**Prepare Phenopacket store release**

This notebook shows how to pack all phenopackets in this repository into a TAR or ZIP archive.

To do so, the code looks in the *notebooks* directory for all subfolders called "phenopackets", copies all of the
"*.json" files in those directories to a temporary folder, and creates a TAR or ZIP archive from the folder.
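
The discovery, copy, and archive steps described above can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the actual `ppktstore` implementation: the helper names `collect_phenopacket_files` and `copy_and_archive` are hypothetical, and the layout `<cohort>/phenopackets/*.json` is inferred from the file paths shown in this notebook.

```python
import shutil
import tempfile
from pathlib import Path


def collect_phenopacket_files(notebook_dir: str) -> list[Path]:
    """Return all JSON files whose parent folder is named 'phenopackets'."""
    root = Path(notebook_dir)
    return sorted(
        path
        for path in root.rglob("*.json")
        if path.parent.name == "phenopackets"
    )


def copy_and_archive(notebook_dir: str, archive_base: str, fmt: str = "gztar") -> str:
    """Copy the collected files to a temporary folder and archive that folder.

    `fmt` is a shutil archive format name such as "gztar" or "zip".
    Returns the path of the archive that was written.
    """
    root = Path(notebook_dir)
    with tempfile.TemporaryDirectory() as tmp:
        for src in collect_phenopacket_files(notebook_dir):
            # Keep the cohort folder in the path so equal file names do not collide.
            dest = Path(tmp) / src.relative_to(root)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
        return shutil.make_archive(archive_base, fmt, root_dir=tmp)
```

Calling `copy_and_archive("notebooks", "all_phenopackets", "zip")` would write `all_phenopackets.zip` in the current directory; the real archiver may lay out the archive differently.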

The code also provides two pandas DataFrames that can be used to select the files in the archives that satisfy certain criteria, e.g., having a minimum number of HPO terms or a certain disease diagnosis.

In [1]:
from ppktstore.model import PhenopacketStore

notebook_dir = "notebooks"
store = PhenopacketStore.from_notebook_dir(notebook_dir)

We show a table with a gene and variant summary for each phenopacket:

In [2]:
from ppktstore.stats import summarize_diseases_and_genotype

df = summarize_diseases_and_genotype(store)
df.head(2)

,disease,disease_id,patient_id,gene,allele_1,allele_2,PMID,cohort,filename
0,"Fanconi anemia, complementation group C",OMIM:227645,proband,FANCC,NM_000136.3:c.67del,NM_000136.3:c.67del,PMID:22701786,FANCC,FANCC/phenopackets/PMID_22701786_proband.json
1,"Fanconi anemia, complementation group C",OMIM:227645,first patient,FANCC,NM_000136.3:c.455dup,NM_000136.3:c.1393C>T,PMID:16429406,FANCC,FANCC/phenopackets/PMID_16429406_firstpatient....


We show the number of phenotypic features, variants, alleles, and encounters for each phenopacket:

In [3]:
from ppktstore.stats import summarize_genotype_phenotype

df = summarize_genotype_phenotype(store)
df.head()

,cohort,directory,filename,phenopacket.id,disease,n_hpo,n_var,n_alleles,n_encounters
0,FANCC,FANCC/phenopackets,PMID_22701786_proband.json,PMID_22701786_proband,"Fanconi anemia, complementation group C (OMIM:...",11,1,2,2
1,FANCC,FANCC/phenopackets,PMID_16429406_firstpatient.json,PMID_16429406_first_patient,"Fanconi anemia, complementation group C (OMIM:...",4,2,2,1
2,FANCC,FANCC/phenopackets,PMID_31044565_proband.json,PMID_31044565_proband,"Fanconi anemia, complementation group C (OMIM:...",12,1,2,3
3,FANCC,FANCC/phenopackets,PMID_16429406_secondpatient.json,PMID_16429406_second_patient,"Fanconi anemia, complementation group C (OMIM:...",10,2,2,2
4,SAMD7,SAMD7/phenopackets,PMID_38272031_Individual1-1.json,PMID_38272031_Individual_1_1,Macular dystrophy with or without cone dysfunc...,7,1,2,1
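
A table like this one can be used to pick files by a criterion such as a minimum number of HPO terms. The sketch below rebuilds a few columns from the rows shown above so that it runs on its own; in the notebook, `df` would instead be the frame returned by `summarize_genotype_phenotype(store)`. The threshold `min_hpo` and the way the relative path is assembled from `directory` and `filename` are assumptions made for illustration.

```python
import pandas as pd

# Illustrative rows copied from the table above (only the columns needed here).
df = pd.DataFrame(
    {
        "cohort": ["FANCC", "FANCC", "FANCC", "FANCC", "SAMD7"],
        "directory": ["FANCC/phenopackets"] * 4 + ["SAMD7/phenopackets"],
        "filename": [
            "PMID_22701786_proband.json",
            "PMID_16429406_firstpatient.json",
            "PMID_31044565_proband.json",
            "PMID_16429406_secondpatient.json",
            "PMID_38272031_Individual1-1.json",
        ],
        "n_hpo": [11, 4, 12, 10, 7],
    }
)

# Keep only phenopackets with at least `min_hpo` HPO terms (hypothetical threshold).
min_hpo = 10
selected = df[df["n_hpo"] >= min_hpo]

# Build relative paths that could later be matched against archive members.
selected_paths = (selected["directory"] + "/" + selected["filename"]).tolist()
print(selected_paths)
```

The same pattern applies to other columns, for example `df[df["cohort"] == "FANCC"]` to restrict the selection to one cohort.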


# Create release archives

In [4]:

from ppktstore.archive import PhenopacketStoreArchiver, ArchiveFormat

archiver = PhenopacketStoreArchiver()

## Export TAR GZ archive

Write phenopackets into a TAR GZ archive `all_phenopackets.tar.gz`.

In [5]:
archiver.prepare_archive(
    store=store,
    format=ArchiveFormat.TGZ,
    filename="all_phenopackets",
)
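
Once a TAR GZ archive exists, a subset of its files can be pulled out with the standard `tarfile` module. The following is a minimal sketch, not part of `ppktstore`: the helper `extract_selected` is hypothetical, and because the folder layout inside the archive is not documented here, members are matched by path suffix rather than by exact name.

```python
import tarfile
from pathlib import Path


def extract_selected(archive_path: str, wanted: list[str], out_dir: str) -> list[Path]:
    """Write out only the members whose path ends with one of the `wanted` paths."""
    out_root = Path(out_dir)
    written = []
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Match by suffix because the archive may add a top-level folder.
            match = next((w for w in wanted if member.name.endswith(w)), None)
            if match is None:
                continue
            # The destination is derived from the wanted path, not from the
            # member name, so nothing is written outside `out_dir`.
            dest = out_root / match
            dest.parent.mkdir(parents=True, exist_ok=True)
            with tar.extractfile(member) as src:
                dest.write_bytes(src.read())
            written.append(dest)
    return written
```

For example, `extract_selected("all_phenopackets.tar.gz", ["FANCC/phenopackets/PMID_22701786_proband.json"], "subset")` would write that single file under `subset/`, assuming the archive name given above and that the member path ends with the relative path shown in the summary table.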

## Export ZIP archive

Write phenopackets into a ZIP archive `all_phenopackets.zip`.

In [None]:
archiver.prepare_archive(
    store=store,
    format=ArchiveFormat.ZIP,
    filename="all_phenopackets",
)
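
Files in a ZIP archive can also be read in place, without extracting them first. The sketch below uses the standard `zipfile` and `json` modules; the helper `count_features_in_zip` is hypothetical. It counts the entries in each phenopacket's `phenotypicFeatures` list, which is the field name used by the Phenopacket Schema JSON serialization. That count is not guaranteed to equal the `n_hpo` column shown earlier, since how `n_hpo` is computed is not stated here.

```python
import json
import zipfile


def count_features_in_zip(archive_path: str) -> dict[str, int]:
    """Map each JSON member of the archive to its number of phenotypic features."""
    counts = {}
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            if not name.endswith(".json"):
                continue
            # Read the member directly from the archive.
            with zf.open(name) as fh:
                phenopacket = json.load(fh)
            counts[name] = len(phenopacket.get("phenotypicFeatures", []))
    return counts
```

Calling `count_features_in_zip("all_phenopackets.zip")` on the archive written above would return one entry per JSON file, keyed by the member path inside the archive.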
