<h1>Generate table from collection of phenopackets</h1>
<p>A common task when analyzing a cohort of individuals with pathogenic variants in a given gene is to generate a table that summarizes the findings. The pyphetools package can ingest a collection of phenopackets and generate several kinds of tables that may be useful for publications or supplementary material sections.</p>

In [1]:
import json
import os
import sys
from collections import defaultdict

import hpotk
import pandas as pd
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson, Parse, ParseDict
from phenopackets import Phenopacket

# Show entire column contents; otherwise long cell values are truncated.
pd.set_option('display.max_colwidth', None)

# Use the local checkout of pyphetools rather than an installed version.
sys.path.insert(0, os.path.abspath('../../pyphetools'))
from pyphetools.output import FocusCountTable, PhenopacketIngestor, PhenopacketTable

In [2]:
phenopacket_dir = "phenopackets"

In [3]:
# Not needed anymore: PhenopacketIngestor (next cell) takes the directory directly. Kept for reference.
phenopacket_paths = []
if not os.path.isdir(phenopacket_dir):
    raise ValueError(f"{phenopacket_dir} is not a directory")
for root, dirs, files in os.walk(phenopacket_dir):
    for file in files:
        if file.endswith(".json"):
            phenopacket_paths.append(os.path.join(root,file))
print(f"We extracted {len(phenopacket_paths)} GA4GH phenopackets")

We extracted 32 GA4GH phenopackets


In [4]:
ingestor = PhenopacketIngestor(indir=phenopacket_dir)

In [5]:
patient_d = ingestor.get_patient_dictionary()
print(f"We got {len(patient_d)} phenopackets")

We got 19 phenopackets
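
<p>For orientation, the sketch below builds a comparable dictionary keyed by phenopacket id using only the standard library. It is not how <code>PhenopacketIngestor</code> is implemented. The two JSON documents are hand-written, heavily trimmed toy examples, and the names <code>RAW</code>, <code>observed_terms</code> and <code>toy_patient_d</code> are hypothetical. The field names <code>phenotypicFeatures</code>, <code>type</code> and <code>excluded</code> follow the GA4GH Phenopacket Schema JSON representation.</p>

```python
import json

# Hand-written, trimmed phenopacket-like documents (illustrative only).
RAW = [
    '{"id": "toy_1", "phenotypicFeatures": ['
    '{"type": {"id": "HP:0001252", "label": "Hypotonia"}}, '
    '{"type": {"id": "HP:0001250", "label": "Seizure"}, "excluded": true}]}',
    '{"id": "toy_2", "phenotypicFeatures": ['
    '{"type": {"id": "HP:0001250", "label": "Seizure"}}]}',
]


def observed_terms(packet):
    """Map HPO id -> label for every feature that is not marked as excluded."""
    return {
        feature["type"]["id"]: feature["type"]["label"]
        for feature in packet.get("phenotypicFeatures", [])
        if not feature.get("excluded", False)
    }


# Dictionary keyed by phenopacket id, analogous in spirit to patient_d.
toy_patient_d = {}
for text in RAW:
    packet = json.loads(text)
    toy_patient_d[packet["id"]] = observed_terms(packet)

print(f"We got {len(toy_patient_d)} toy phenopackets")
```

<p>Excluded features are dropped here so that only observed phenotypes reach a count table. Whether pyphetools treats excluded features the same way is not shown in this notebook.</p>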


In [6]:
from hpotk.ontology import Ontology
from hpotk.ontology.load.obographs import load_ontology
if os.path.isfile('hpo_data/hp.json'):
    hpo_ontology = load_ontology('hpo_data/hp.json')
else:
    hpo_ontology = load_ontology('https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.json')

In [7]:
focus_id = 'PMID_35146895_Individual_5'
ftab = FocusCountTable(patient_d=patient_d, focus_id=focus_id, ontology=hpo_ontology)

In [8]:
df = ftab.get_simple_table()

In [9]:
pd.set_option('display.max_rows', None)
df

category,term,HP:id,focus,other,total,total_count
constitutional,Pain,HP:0012531,0,1,1/19 (5.3%),1
ear,Hearing impairment,HP:0000365,0,1,1/19 (5.3%),1
endocrine,Increased circulating osteocalcin level,HP:0031428,0,1,1/19 (5.3%),1
eye,Strabismus,HP:0000486,1,1,2/19 (10.5%),2
eye,Cerebral visual impairment,HP:0100704,0,2,2/19 (10.5%),2
eye,Ptosis,HP:0000508,0,1,1/19 (5.3%),1
eye,Exotropia,HP:0000577,0,1,1/19 (5.3%),1
eye,Deeply set eye,HP:0000490,1,0,1/19 (5.3%),1
eye,Hypertelorism,HP:0000316,0,1,1/19 (5.3%),1
genitourinary,Cryptorchidism,HP:0000028,1,0,1/19 (5.3%),1
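
<p>The <code>focus</code>, <code>other</code> and <code>total</code> columns can be read as follows: <code>focus</code> is 1 if the focus individual has the term, <code>other</code> counts the remaining individuals with the term, and <code>total</code> shows the sum as a fraction of the cohort. The sketch below reproduces that arithmetic on toy data. It assumes this reading of the columns and is not the <code>FocusCountTable</code> implementation; <code>format_total</code>, <code>focus_count_row</code> and the toy cohort are hypothetical.</p>

```python
def format_total(count, n):
    """Format a count as 'count/n (percent%)' with one decimal place."""
    return f"{count}/{n} ({100 * count / n:.1f}%)"


def focus_count_row(term_id, patients, focus_id):
    """Build one table row for term_id; patients maps individual id -> set of HPO ids."""
    focus = int(term_id in patients[focus_id])
    other = sum(
        term_id in terms for pid, terms in patients.items() if pid != focus_id
    )
    total = focus + other
    return {
        "HP:id": term_id,
        "focus": focus,
        "other": other,
        "total": format_total(total, len(patients)),
        "total_count": total,
    }


# Toy cohort of four individuals (not the data used in this notebook).
toy_cohort = {
    "focus_individual": {"HP:0000486"},            # Strabismus
    "other_1": {"HP:0000486", "HP:0000508"},       # Strabismus, Ptosis
    "other_2": {"HP:0000508"},                     # Ptosis
    "other_3": set(),
}

row = focus_count_row("HP:0000486", toy_cohort, "focus_individual")
print(row)
```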


In [11]:
df2 = ftab.get_thresholded_table(min_proportion=0.33)

Output terms with at least 6 counts
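
<p>The printed message suggests that <code>min_proportion</code> is converted into an absolute count. With 19 individuals, 0.33 × 19 = 6.27 is reported as 6, and the call with 0.2 in the next cell (0.2 × 19 = 3.8) is reported as 4, which is consistent with rounding to the nearest integer. The sketch below assumes that rule; it is inferred from the two messages, not taken from the pyphetools source, and <code>min_count_for</code> is a hypothetical helper. The counts are copied from the thresholded table in this notebook.</p>

```python
def min_count_for(min_proportion, n_individuals):
    """Convert a proportion into an absolute count (assumed rule: round to nearest)."""
    return round(min_proportion * n_individuals)


threshold_033 = min_count_for(0.33, 19)  # 0.33 * 19 = 6.27
threshold_020 = min_count_for(0.2, 19)   # 0.2 * 19 = 3.8

# Total counts for four terms, copied from the thresholded table in this notebook.
total_counts = {
    "HP:0000271": 9,  # Abnormality of the face
    "HP:0000163": 7,  # Abnormal oral cavity morphology
    "HP:0000478": 5,  # Abnormality of the eye
    "HP:0004322": 4,  # Short stature
}

# Keep only the terms that reach the stricter threshold.
kept = sorted(t for t, c in total_counts.items() if c >= threshold_033)
print(threshold_033, threshold_020, kept)
```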


In [12]:
ftab.get_thresholded_table(min_proportion=0.2)

Output terms with at least 4 counts


category,term,HP:id,focus,other,total,total_count
eye,Abnormality of the eye,HP:0000478,1,4,5/19 (26.3%),5
eye,Abnormal eye physiology,HP:0012373,1,3,4/19 (21.1%),4
growth,Abnormality of body height,HP:0000002,0,4,4/19 (21.1%),4
growth,Short stature,HP:0004322,0,4,4/19 (21.1%),4
growth,Growth abnormality,HP:0001507,0,4,4/19 (21.1%),4
growth,Growth delay,HP:0001510,0,4,4/19 (21.1%),4
head/neck,Abnormality of the face,HP:0000271,2,7,9/19 (47.4%),9
head/neck,Abnormality of head or neck,HP:0000152,2,7,9/19 (47.4%),9
head/neck,Abnormality of the head,HP:0000234,2,7,9/19 (47.4%),9
head/neck,Abnormal oral cavity morphology,HP:0000163,1,6,7/19 (36.8%),7
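
<p>Unlike the simple table, this table contains general terms such as "Abnormality of the eye", which suggests that each annotation is also counted for its ancestor terms. The sketch below shows one way to do such propagation, counting each individual at most once per term. The <code>PARENTS</code> map is a hand-written, simplified excerpt and is not loaded from <code>hp.json</code>; all names are hypothetical, and this is an illustration of the idea rather than the pyphetools implementation.</p>

```python
from collections import Counter

# Hand-written toy excerpt of is-a links (child -> parents); NOT loaded from hp.json.
PARENTS = {
    "HP:0004322": ["HP:0000002"],  # Short stature -> Abnormality of body height
    "HP:0000002": ["HP:0001507"],  # Abnormality of body height -> Growth abnormality
    "HP:0001507": [],              # Growth abnormality (top of this excerpt)
}


def with_ancestors(term_id, parents=PARENTS):
    """Return the term together with all of its ancestors in the toy hierarchy."""
    seen = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagated_counts(patients):
    """Count, per term, the individuals annotated with it or any descendant."""
    counts = Counter()
    for terms in patients.values():
        closure = set()
        for term_id in terms:
            closure |= with_ancestors(term_id)
        counts.update(closure)  # a set, so each individual counts once per term
    return counts


toy_patients = {
    "A": {"HP:0004322"},  # annotated with the specific term
    "B": {"HP:0000002"},  # annotated with the more general term
    "C": set(),
}
counts = propagated_counts(toy_patients)
print(dict(counts))
```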


In [13]:
from IPython.display import HTML, display

In [14]:
# Collect all patient objects in a list, as expected by PhenopacketTable.
pplist = list(patient_d.values())

In [15]:
table = PhenopacketTable(phenopacket_list=pplist)

In [16]:
display(HTML(table.to_html()))

Individual,Genotype,Phenotypic features
Individual 3 (MALE; P16Y),NM_015133.4:c.1732C>T(heterozygous),"Motor delay (HP:0001270); Delayed ability to sit (HP:0025336); Delayed ability to walk (HP:0031936); Intellectual disability, profound (HP:0002187); Delayed speech and language development (HP:0000750); Spastic diplegia (HP:0001264); Loss of ambulation (HP:0002505); Seizure (HP:0001250); Hypoplasia of the corpus callosum (HP:0002079); Prominent nasal bridge (HP:0000426); Thin upper lip vermilion (HP:0000219); Obesity (HP:0004322)"
7 (MALE; P38Y),NM_020952.6:c.2509G>A(heterozygous),"Intellectual disability, moderate (HP:0002342); Delayed ability to walk (HP:0031936); Delayed speech and language development (HP:0000750); Typical absence seizure (HP:0011147); Hypotonia (HP:0001252); Facial asymmetry (HP:0000324); Ptosis (HP:0000508); Telecanthus (HP:0000506); Bulbous nose (HP:0000414); Micrognathia (HP:0000347); Short neck (HP:0000470); Exotropia (HP:0000577); Strabismus (HP:0000486); Athetosis (HP:0002305); Pes planus (HP:0001763)"
8 (FEMALE; P5Y),NM_015133.4:c.1732C>T(heterozygous),Spastic paraplegia (HP:0001258); Hypotonia (HP:0001252); Seizure (HP:0001250); Global developmental delay (HP:0001263)
10 (MALE; P4Y),NM_015133.4:c.2982C>G(heterozygous),Hypotonia (HP:0001252); Seizure (HP:0001250); Global developmental delay (HP:0001263)
1 (MALE; P14Y8M),NM_015133.4:c.65del(heterozygous),Ataxia (HP:0001251); Autism (HP:0000717); Hypotonia (HP:0001252); Global developmental delay (HP:0001263)
6 (FEMALE; P9Y),NM_015133.4:c.1331T>C(heterozygous),Hypotonia (HP:0001252); Global developmental delay (HP:0001263)
13 (FEMALE; P19Y),NM_015133.4:c.3436C>T(heterozygous),Oromotor apraxia (HP:0007301); Poor coordination (HP:0002370); Unsteady gait (HP:0002317); Hypotonia (HP:0001252); Cerebral visual impairment (HP:0100704); Global developmental delay (HP:0001263)
Individual 1 (MALE; P29Y),NM_015133.4:c.1732C>T(heterozygous),"Motor delay (HP:0001270); Delayed ability to walk (HP:0031936); Intellectual disability, severe (HP:0010864); Delayed speech and language development (HP:0000750); Spastic diplegia (HP:0001264); Loss of ambulation (HP:0002505); Cerebral atrophy (HP:0002059); Delayed CNS myelination (HP:0002188); Hypoplasia of the corpus callosum (HP:0002079); Round face (HP:0000311); Thin upper lip vermilion (HP:0000219); Precocious puberty (HP:0004322)"
Individual 4 (MALE; P5Y),NM_015133.4:c.3436C>T(heterozygous),"Motor delay (HP:0001270); Persistent head lag (HP:0032988); Delayed ability to roll over (HP:0032989); Delayed ability to sit (HP:0025336); Delayed ability to walk (HP:0031936); Intellectual disability, severe (HP:0010864); Autistic behavior (HP:0000729); Absent speech (HP:0001344); Delayed gross motor development (HP:0002194); Infantile muscular hypotonia (HP:0008947); Cerebral atrophy (HP:0002059); Hypoplasia of the corpus callosum (HP:0002079); Round face (HP:0000311); Prominent nasal bridge (HP:0000426); Thin upper lip vermilion (HP:0000219); Short stature (HP:0004322)"
5 (MALE; P6Y3M),NM_020952.6:c.2509G>A(heterozygous),"Intellectual disability, severe (HP:0010864); Delayed ability to walk (HP:0031936); Delayed speech and language development (HP:0000750); Autistic behavior (HP:0000729); Hypotonia (HP:0001252); Broad forehead (HP:0000337); Depressed nasal bridge (HP:0005280); Preauricular pit (HP:0004467); Broad thumb (HP:0011304); Cryptorchidism (HP:0000028); Micropenis (HP:0000054); Bilateral talipes equinovarus (HP:0001776)"
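
<p>As a rough illustration of what rendering such a table involves, the sketch below turns rows into an HTML string with the standard library. It is not the <code>PhenopacketTable.to_html</code> implementation, and <code>rows_to_html</code> is a hypothetical helper. Note that HGVS strings such as <code>c.1732C&gt;T</code> contain a <code>&gt;</code> character, so cell values are escaped before being placed in the markup.</p>

```python
from html import escape


def rows_to_html(header, rows):
    """Render a header and a list of rows as a minimal HTML table string."""
    parts = ["<table>"]
    parts.append("<tr>" + "".join(f"<th>{escape(h)}</th>" for h in header) + "</tr>")
    for row in rows:
        cells = "".join(f"<td>{escape(str(cell))}</td>" for cell in row)
        parts.append("<tr>" + cells + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)


header = ["Individual", "Genotype", "Phenotypic features"]
rows = [
    # One row copied from the table in this notebook.
    (
        "6 (FEMALE; P9Y)",
        "NM_015133.4:c.1331T>C(heterozygous)",
        "Hypotonia (HP:0001252); Global developmental delay (HP:0001263)",
    ),
]
html_text = rows_to_html(header, rows)
print(html_text)
```

<p>In a notebook, the resulting string can be shown with <code>display(HTML(html_text))</code>, as in the cell above.</p>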


In [17]:
i3 = patient_d.get('Individual 3')
sv = i3.get_variant_list()