# Get Phenopackets
This notebook shows how to create a TAR or ZIP archive containing all of the Phenopackets in this repository.
To do so, the code looks in the **notebooks** directory for all subfolders called "phenopackets", copies all of the
"*.json" files in those directories to a temporary location, creates a TAR or ZIP archive, and copies the archive to the output location passed to the export call.
The code also provides two pandas dataframes that can be used to select files from the archives that satisfy certain criteria, e.g., having a minimum number of HPO terms, having a certain disease diagnosis, etc.
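
The collection and archiving steps described above can be sketched with the Python standard library alone. This is a minimal illustration, not the `ppktstore` implementation: the function names `collect_phenopackets` and `make_archives` are hypothetical, and keeping one subfolder per cohort inside the archive is an assumption made here to avoid file name clashes.

```python
import shutil
import tarfile
import tempfile
import zipfile
from pathlib import Path


def collect_phenopackets(notebook_dir, staging_dir):
    """Copy every *.json file found in a 'phenopackets' subfolder into staging_dir.

    Hypothetical helper: files are grouped by the name of the folder that
    contains the 'phenopackets' subfolder (the cohort), which is an assumption.
    """
    staging = Path(staging_dir)
    copied = []
    for folder in sorted(Path(notebook_dir).rglob("phenopackets")):
        if not folder.is_dir():
            continue
        target = staging / folder.parent.name
        target.mkdir(parents=True, exist_ok=True)
        for json_file in sorted(folder.glob("*.json")):
            copied.append(Path(shutil.copy2(json_file, target)))
    return copied


def make_archives(notebook_dir, out_base):
    """Write <out_base>.tgz and <out_base>.zip; return both paths and the file count."""
    tgz_path = Path(f"{out_base}.tgz")
    zip_path = Path(f"{out_base}.zip")
    with tempfile.TemporaryDirectory() as tmp:
        files = collect_phenopackets(notebook_dir, tmp)
        # Store paths relative to the staging directory so the archive is portable.
        with tarfile.open(tgz_path, "w:gz") as tar:
            for f in files:
                tar.add(f, arcname=f.relative_to(tmp).as_posix())
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for f in files:
                zf.write(f, arcname=f.relative_to(tmp).as_posix())
    return tgz_path, zip_path, len(files)
```

Under these assumptions, `make_archives("notebooks", "all_phenopackets")` would write both archives next to the notebook and return the number of files added.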

In [1]:
from ppktstore import PPKtStore, Cohort, PPacket

In [2]:
notebook_dir = "notebooks"
store = PPKtStore(notebook_dir)

In [3]:
df = store.get_phenopacket_dataframe()
df.head()

Unnamed: 0,cohort,directory,filename,phenopacket.id,disease,n_hpo,n_var,n_alleles,n_encounters
0,WWOX,notebooks/WWOX/phenopackets,notebooks/WWOX/phenopackets/PMID_17470496_2.json,PMID_17470496_2,"Spinocerebellar ataxia, autosomal recessive 12...",9,1,2,1
1,WWOX,notebooks/WWOX/phenopackets,notebooks/WWOX/phenopackets/PMID_17470496_3.json,PMID_17470496_3,"Spinocerebellar ataxia, autosomal recessive 12...",9,1,2,1
2,WWOX,notebooks/WWOX/phenopackets,notebooks/WWOX/phenopackets/PMID_17470496_0.json,PMID_17470496_0,"Spinocerebellar ataxia, autosomal recessive 12...",9,1,2,1
3,WWOX,notebooks/WWOX/phenopackets,notebooks/WWOX/phenopackets/PMID_17470496_1.json,PMID_17470496_1,"Spinocerebellar ataxia, autosomal recessive 12...",9,1,2,1
4,ANKRD11,notebooks/ANKRD11/phenopackets,notebooks/ANKRD11/phenopackets/PMID_36446582_N...,"PMID_36446582_Novara,_2017_P2",KBG syndrome (OMIM:148050),5,1,1,1
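
Selecting phenopackets by criteria such as a minimum number of HPO terms or a particular diagnosis could look roughly like the sketch below. It uses only standard pandas operations on the column names shown in the output above; the helper `select_phenopackets` is hypothetical, and the small frame is a stand-in built from two of the displayed rows so that the example runs on its own.

```python
import pandas as pd

# Stand-in for store.get_phenopacket_dataframe(), restricted to four of the
# columns shown above. The WWOX disease label is truncated exactly as displayed.
df = pd.DataFrame(
    {
        "cohort": ["WWOX", "ANKRD11"],
        "phenopacket.id": ["PMID_17470496_2", "PMID_36446582_Novara,_2017_P2"],
        "disease": [
            "Spinocerebellar ataxia, autosomal recessive 12...",
            "KBG syndrome (OMIM:148050)",
        ],
        "n_hpo": [9, 5],
    }
)


def select_phenopackets(frame, min_hpo=0, disease_substring=None):
    """Return the rows with at least min_hpo HPO terms and, optionally,
    a disease label containing disease_substring (plain text, not a regex)."""
    mask = frame["n_hpo"] >= min_hpo
    if disease_substring is not None:
        mask &= frame["disease"].str.contains(disease_substring, regex=False)
    return frame[mask]


many_terms = select_phenopackets(df, min_hpo=6)
kbg = select_phenopackets(df, disease_substring="KBG syndrome")
```

With the real dataframe, the `filename` column of the selected rows would then identify which files to pull from the archive.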


In [4]:
summary_df = store.get_summary_dir()
summary_df.head(50)

Unnamed: 0,Cohort,Directory,Count
0,SMARCB1,notebooks/SMARCB1/phenopackets,0
1,WWOX,notebooks/WWOX/phenopackets,4
2,ANKRD11,notebooks/ANKRD11/phenopackets,328
3,GLI3,notebooks/GLI3/phenopackets,77
4,SETD2,notebooks/SETD2/phenopackets,14
5,ZSWIM6,notebooks/ZSWIM6/phenopackets,0
6,ANKH,notebooks/ANKH/phenopackets,7
7,KDM6B,notebooks/KDM6B/phenopackets,73
8,SMARCC2,notebooks/SMARCC2/phenopackets,0
9,MAPK8IP3,notebooks/MAPK8IP3/phenopackets,20


# Export gzipped TAR archive

In [5]:
store.get_store_gzip("all_phenopackets")

Adding archive suffix to outfilename
Added 177 files to tar archive at /Users/robinp/GIT/phenopacket-store/all_phenopackets.tgz
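
Once the `.tgz` file exists, the phenopackets can be read back without unpacking the archive to disk. The following is a sketch using the standard `tarfile` and `json` modules; `iter_phenopackets` is a hypothetical helper, and the internal layout of the archive produced by `get_store_gzip` is not assumed beyond the members being "*.json" files.

```python
import json
import tarfile


def iter_phenopackets(archive_path):
    """Yield (member_name, parsed_json) for each *.json file in a gzipped TAR archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            # Skip directories and anything that is not a JSON file.
            if member.isfile() and member.name.endswith(".json"):
                with tar.extractfile(member) as handle:
                    yield member.name, json.load(handle)
```

For example, `for name, packet in iter_phenopackets("all_phenopackets.tgz"): ...` would loop over every phenopacket as a Python dictionary.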


# Export zip archive

In [6]:
store.get_store_zip("all_phenopackets")

Added 177 files to zip archive at /Users/robinp/GIT/phenopacket-store/all_phenopackets
