# Get Phenopackets
This notebook shows how to export a TAR or ZIP archive containing all of the phenopackets in this repository.
To do so, the code looks in the **notebooks** directory for all subfolders called "phenopackets", copies all of the
"*.json" files in those directories to a temporary location, creates a TAR or ZIP archive, and writes it to the output path passed to the export function.
The code also provides two pandas dataframes that can be used to select the files that satisfy certain criteria, e.g., having a minimum number of HPO terms or having a certain disease diagnosis.
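
The collection step described above can be sketched with the standard library alone. This is a minimal illustration, not the `PPKtStore` implementation: it builds a throwaway directory tree (the file names `case1.json`, `case2.json`, and `notes.json` are made up) and gathers every "*.json" file whose parent folder is called "phenopackets".

```python
import tempfile
from pathlib import Path

# Build a small stand-in for the notebooks directory (hypothetical content).
root = Path(tempfile.mkdtemp()) / "notebooks"
(root / "TRAF7" / "phenopackets").mkdir(parents=True)
(root / "TRAF7" / "phenopackets" / "case1.json").write_text("{}")
(root / "TRAF7" / "notes.json").write_text("{}")  # not inside a "phenopackets" folder
(root / "COQ7" / "phenopackets").mkdir(parents=True)
(root / "COQ7" / "phenopackets" / "case2.json").write_text("{}")

# Keep only JSON files that sit directly inside a folder named "phenopackets".
found = sorted(
    p.relative_to(root).as_posix()
    for p in root.rglob("*.json")
    if p.parent.name == "phenopackets"
)
print(found)
```

Filtering on the parent folder name rather than on the file extension alone skips stray JSON files such as `notes.json` in the sketch.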

In [1]:
from ppktstore.archive import PPKtStore

In [2]:
notebook_dir = "notebooks"
store = PPKtStore(notebook_dir)

In [3]:
df = store.get_phenopacket_dataframe()
df.head()

Unnamed: 0,cohort,directory,filename,phenopacket.id,disease,n_hpo,n_var,n_alleles,n_encounters
0,TRAF7,notebooks/TRAF7/phenopackets,notebooks/TRAF7/phenopackets/PMID_32376980_37....,PMID_32376980_37,"Cardiac, facial, and digital anomalies with de...",14,1,1,1
1,TRAF7,notebooks/TRAF7/phenopackets,notebooks/TRAF7/phenopackets/PMID_32376980_43....,PMID_32376980_43,"Cardiac, facial, and digital anomalies with de...",29,1,1,1
2,TRAF7,notebooks/TRAF7/phenopackets,notebooks/TRAF7/phenopackets/PMID_32376980_6pa...,PMID_32376980_6_(=_patient_DDD4K.01539_in_DDD_...,"Cardiac, facial, and digital anomalies with de...",28,1,1,1
3,TRAF7,notebooks/TRAF7/phenopackets,notebooks/TRAF7/phenopackets/PMID_32376980_25....,PMID_32376980_25,"Cardiac, facial, and digital anomalies with de...",11,1,1,1
4,TRAF7,notebooks/TRAF7/phenopackets,notebooks/TRAF7/phenopackets/PMID_32376980_30....,PMID_32376980_30,"Cardiac, facial, and digital anomalies with de...",19,1,1,1
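
A dataframe with these columns can be filtered with ordinary pandas indexing to select phenopackets by criteria. The sketch below uses a small stand-in dataframe so that it runs without the repository: the column names `filename`, `disease`, and `n_hpo` follow the output above, while the rows, the threshold, and the disease label are invented for illustration.

```python
import pandas as pd

# Stand-in for the dataframe returned by store.get_phenopacket_dataframe();
# column names match the output above, the rows are made up.
df = pd.DataFrame(
    {
        "cohort": ["TRAF7", "TRAF7", "SLC4A1"],
        "filename": ["a.json", "b.json", "c.json"],
        "disease": ["Disease A", "Disease A", "Disease B"],
        "n_hpo": [14, 29, 8],
    }
)

min_hpo = 20  # hypothetical minimum number of HPO terms
has_enough_terms = df["n_hpo"] >= min_hpo
has_diagnosis = df["disease"].str.contains("Disease A", regex=False)

# Combine both criteria and collect the matching file names.
selected_files = df.loc[has_enough_terms & has_diagnosis, "filename"].tolist()
print(selected_files)
```

The resulting list of file names can then be used to pick the corresponding entries from an exported archive.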


In [4]:
summary_df = store.get_summary_dir()
summary_df.head(50)

Unnamed: 0,Cohort,Directory,Count
0,TRAF7,notebooks/TRAF7/phenopackets,45
1,SLC4A1,notebooks/SLC4A1/phenopackets,33
2,TGFB2,notebooks/TGFB2/phenopackets,34
3,TAF4,notebooks/TAF4/phenopackets,10
4,KCNH5,notebooks/KCNH5/phenopackets,22
5,SPTSSA,notebooks/SPTSSA/phenopackets,3
6,SMAD2,notebooks/SMAD2/phenopackets,16
7,COQ7,notebooks/COQ7/phenopackets,6
8,MRAS,notebooks/MRAS/phenopackets,3
9,POT1,notebooks/POT1/phenopackets,4


# Export gzip archive

In [5]:
store.get_store_gzip("all_phenopackets")

Adding archive suffix to outfilename
Added 4189 files to tar archive all_phenopackets.tgz
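
Individual phenopackets can be read back from a gzipped TAR archive with the standard `tarfile` module. The following sketch creates a tiny demonstration archive first so that it is self-contained. The member paths and JSON content are assumptions for illustration, and the layout inside the real `all_phenopackets.tgz` may differ.

```python
import io
import json
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "demo_phenopackets.tgz")

# Build a tiny stand-in archive (hypothetical member names and content).
demo_files = {
    "TRAF7/phenopackets/A.json": {"id": "A"},
    "TRAF7/phenopackets/B.json": {"id": "B"},
}
with tarfile.open(archive_path, "w:gz") as tar:
    for name, content in demo_files.items():
        data = json.dumps(content).encode("utf-8")
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Read only the members we are interested in, without unpacking to disk.
wanted = {"TRAF7/phenopackets/B.json"}
extracted = {}
with tarfile.open(archive_path, "r:gz") as tar:
    for member in tar.getmembers():
        if member.isfile() and member.name in wanted:
            with tar.extractfile(member) as fh:
                extracted[member.name] = json.load(fh)
print(sorted(extracted))
```

Reading members through `extractfile` avoids writing files to disk, which is convenient when only a few phenopackets are needed.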


# Export zip archive

In [6]:
store.get_store_zip("all_phenopackets")

Added 4189 files to zip archive at /home/ielis/data/phenopacket-store/all_phenopackets
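
A quick sanity check on a ZIP archive is to count its JSON members and verify the stored checksums without extracting anything. The sketch below does this on a small demonstration archive built on the fly. The member names are made up, and the same two calls (`namelist` and `testzip`) could be pointed at the exported file instead.

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "demo_phenopackets.zip")

# Build a tiny stand-in archive (hypothetical member names).
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("TRAF7/phenopackets/A.json", "{}")
    zf.writestr("COQ7/phenopackets/B.json", "{}")
    zf.writestr("README.txt", "not a phenopacket")

with zipfile.ZipFile(zip_path) as zf:
    # Count JSON entries and check CRCs; testzip() returns None if all are intact.
    json_members = [n for n in zf.namelist() if n.endswith(".json")]
    first_bad = zf.testzip()
print(len(json_members), first_bad)
```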


# Create Markdown file
We use this function to update the Markdown file that lists the phenopacket collections in the online documentation.

In [7]:
import os
from ppktstore.archive import PPKtListing

notebook_dir = "notebooks"
outfile = os.path.join('docs', 'collections.md')
plisting = PPKtListing(notebook_dir=notebook_dir)
plisting.createMDFile(outFile=outfile)

We found 99 cohorts
Wrote phenopacket collection MarkDown file to docs/collections.md

