# Get Phenopackets
This notebook shows how to create a TAR or ZIP archive containing all of the Phenopackets in this repository.
To do so, the code looks in the **notebooks** directory for all subfolders called "phenopackets", copies all of the
"*.json" files in those directories to a temporary location, creates a TAR or ZIP archive, and copies it to the location specified in the code.
The code also provides two pandas dataframes that can be used to select files from the archives that satisfy certain criteria, e.g., having a minimum number of HPO terms or a certain disease diagnosis.

In [1]:
from ppktstore import PPKtStore, Cohort, PPacket

In [2]:
notebook_dir = "notebooks"
store = PPKtStore(notebook_dir)

In [3]:
df = store.get_phenopacket_dataframe()
df.head()

Unnamed: 0,cohort,directory,filename,phenopacket.id,disease,n_hpo,n_var,n_alleles,n_encounters
0,ESAM,notebooks/ESAM/phenopackets,notebooks/ESAM/phenopackets/PMID_36996813_Indi...,PMID_36996813_Individual_KCHYD24-1,Neurodevelopmental disorder with intracranial ...,1,1,2,1
1,ESAM,notebooks/ESAM/phenopackets,notebooks/ESAM/phenopackets/PMID_36996813_Indi...,PMID_36996813_Individual_1,Neurodevelopmental disorder with intracranial ...,27,1,2,1
2,ESAM,notebooks/ESAM/phenopackets,notebooks/ESAM/phenopackets/PMID_36996813_Indi...,PMID_36996813_Individual_13,Neurodevelopmental disorder with intracranial ...,9,1,2,1
3,ESAM,notebooks/ESAM/phenopackets,notebooks/ESAM/phenopackets/PMID_36996813_Indi...,PMID_36996813_Individual_7,Neurodevelopmental disorder with intracranial ...,20,1,2,1
4,ESAM,notebooks/ESAM/phenopackets,notebooks/ESAM/phenopackets/PMID_36996813_Indi...,PMID_36996813_Individual_12,Neurodevelopmental disorder with intracranial ...,9,1,2,1
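
The dataframe returned in cell In [3] can be filtered with ordinary pandas indexing, for example to keep only phenopackets with a minimum number of HPO terms. The following is a minimal sketch on a toy dataframe that reuses the column names shown above (`cohort`, `filename`, `n_hpo`); the file names, the values, and the helper `select_by_min_hpo` are illustrative assumptions and not part of ppktstore.

```python
import pandas as pd

# Toy rows that reuse column names from the output above; the values are made up.
df = pd.DataFrame(
    {
        "cohort": ["ESAM", "ESAM", "ESAM"],
        "filename": ["a.json", "b.json", "c.json"],
        "n_hpo": [1, 27, 9],
    }
)


def select_by_min_hpo(frame: pd.DataFrame, min_hpo: int) -> pd.DataFrame:
    """Return the rows whose phenopacket has at least `min_hpo` HPO terms."""
    return frame[frame["n_hpo"] >= min_hpo]


# Hypothetical helper, not a ppktstore function.
selected = select_by_min_hpo(df, min_hpo=5)
print(selected["filename"].tolist())
```

The same pattern applies to the other columns, e.g. comparing `disease` against a diagnosis label of interest.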


In [4]:
summary_df = store.get_summary_dir()
summary_df.head(50)

Unnamed: 0,Cohort,Directory,Count
0,ESAM,notebooks/ESAM/phenopackets,14
1,HMGCR,notebooks/HMGCR/phenopackets,15
2,SMARCB1,notebooks/SMARCB1/phenopackets,17
3,LMNA,notebooks/LMNA/phenopackets,127
4,11q_terminal_deletion,notebooks/11q_terminal_deletion/phenopackets,69
5,TBX5,notebooks/TBX5/phenopackets,103
6,NUP54,notebooks/NUP54/phenopackets,3
7,WWOX,notebooks/WWOX/phenopackets,9
8,SRSF1,notebooks/SRSF1/phenopackets,15
9,ANKRD11,notebooks/ANKRD11/phenopackets,328
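
Per-cohort counts of this kind can also be derived from a per-phenopacket dataframe with a pandas `groupby`. The sketch below uses a toy dataframe with the `cohort` and `directory` columns shown earlier; it illustrates the idea only and is not a claim about how ppktstore computes its summary.

```python
import pandas as pd

# Toy per-phenopacket rows; the values are illustrative only.
per_packet = pd.DataFrame(
    {
        "cohort": ["ESAM", "ESAM", "HMGCR"],
        "directory": [
            "notebooks/ESAM/phenopackets",
            "notebooks/ESAM/phenopackets",
            "notebooks/HMGCR/phenopackets",
        ],
    }
)

# Count the rows (phenopackets) in each cohort directory.
summary = (
    per_packet.groupby(["cohort", "directory"], sort=False)
    .size()
    .reset_index(name="Count")
)
print(summary)
```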


# Export gzip archive

In [5]:
store.get_store_gzip("all_phenopackets")

Adding archive suffix to outfilename
Added 3283 files to tar archive at /Users/robinp/GIT/phenopacket-store/all_phenopackets.tgz
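
The steps described at the top of this notebook (find every "phenopackets" subfolder, collect its JSON files, write a gzipped TAR archive) can be sketched with the Python standard library alone. The function `archive_phenopackets` and the demo directory layout are assumptions made for illustration; the actual implementation in ppktstore may differ.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_phenopackets(root: Path, out_path: Path) -> int:
    """Add every *.json file found in a 'phenopackets' subfolder of root to a gzipped tar."""
    count = 0
    with tarfile.open(out_path, "w:gz") as tar:
        for folder in sorted(root.rglob("phenopackets")):
            if not folder.is_dir():
                continue
            for json_file in sorted(folder.glob("*.json")):
                # Store paths relative to root so the archive is machine-independent.
                tar.add(json_file, arcname=str(json_file.relative_to(root)))
                count += 1
    return count


# Demo on a throwaway directory tree (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "notebooks"
    (root / "ESAM" / "phenopackets").mkdir(parents=True)
    (root / "ESAM" / "phenopackets" / "example.json").write_text("{}")
    (root / "ESAM" / "notes.txt").write_text("not a phenopacket")
    n_added = archive_phenopackets(root, Path(tmp) / "demo.tgz")
    print(f"Added {n_added} files")
```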


# Export zip archive

In [6]:
store.get_store_zip("all_phenopackets")

Added 3283 files to zip archive at /Users/robinp/GIT/phenopacket-store/all_phenopackets
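
Once an archive exists, a list of file names (for instance, one selected from the dataframe above) can be used to read only the matching phenopackets back out. This is a hedged sketch using the standard `zipfile` and `json` modules; the helper `read_selected` and the member paths inside the demo archive are assumptions, since the internal layout of the archive written by `get_store_zip` is not shown here.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def read_selected(zip_path, wanted_names):
    """Parse the JSON members of the archive whose base file name is in wanted_names."""
    wanted = set(wanted_names)
    packets = {}
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            if Path(member).name in wanted:
                packets[member] = json.loads(zf.read(member))
    return packets


# Demo on a small archive built on the fly (hypothetical member paths).
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("ESAM/phenopackets/a.json", json.dumps({"id": "a"}))
        zf.writestr("ESAM/phenopackets/b.json", json.dumps({"id": "b"}))
    packets = read_selected(zip_path, ["b.json"])
    print(sorted(packets))
```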


# Create Markdown file
We use this function to update the Markdown file for the online documentation.

In [7]:
from ppktstore import PPKtListing
notebook_dir = "notebooks"
outfile = "collections.md"
plisting = PPKtListing(notebook_dir=notebook_dir)
plisting.createMDFile(outFile=outfile)

We found 46 cohorts
Wrote phenopacket collection MarkDown file to collections.md
