<h1>Generate table from collection of phenopackets</h1>
<p>A common task when analyzing a cohort of individuals with pathogenic variants in a given gene is to generate a table that summarizes the findings. The pyphetools package can ingest a collection of phenopackets and generate several kinds of tables that may be useful for publications or supplementary material sections.</p>

In [1]:
import os
import json
import importlib.metadata
from collections import defaultdict

import pandas as pd
import hpotk
import phenopackets as php
from phenopackets import Phenopacket
from google.protobuf.json_format import MessageToDict, MessageToJson, Parse, ParseDict
from pyphetools.output import *

pd.set_option('display.max_colwidth', None)  # show entire column contents, important!

__version__ = importlib.metadata.version("pyphetools")
print(f"Using pyphetools version {__version__}")

Using pyphetools version 0.4.13


In [2]:
phenopacket_dir = "phenopackets"
ingestor = PhenopacketIngestor(indir=phenopacket_dir)

In [3]:
patient_d = ingestor.get_patient_dictionary()
print(f"We got {len(patient_d)} phenopackets")

We got 20 phenopackets
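
<p>As a rough sketch of what reading a directory of phenopackets involves, the following collects the top-level <code>id</code> field from every JSON file in a folder, using only the standard library. The helper name <code>collect_phenopacket_ids</code> is hypothetical, and this is not a description of how <code>PhenopacketIngestor</code> works internally.</p>

```python
import json
from pathlib import Path


def collect_phenopacket_ids(indir):
    """Return the sorted ids of all phenopacket JSON files found in `indir`.

    Hypothetical helper for illustration; not part of pyphetools.
    """
    ids = []
    for path in sorted(Path(indir).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        # Files without a top-level "id" are skipped rather than raising.
        if isinstance(data, dict) and "id" in data:
            ids.append(data["id"])
    return sorted(ids)
```

<p>Calling <code>collect_phenopacket_ids("phenopackets")</code> on the directory used above is a quick way to check which individuals are available before choosing a focus individual.</p>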


In [4]:
from hpotk.ontology import Ontology
from hpotk.ontology.load.obographs import load_ontology
if os.path.isfile('hpo_data/hp.json'):
    hpo_ontology = load_ontology('hpo_data/hp.json')
else:
    hpo_ontology = load_ontology('https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.json')

In [5]:
focus_id = 'PMID_35146895_Individual_5'
ftab = FocusCountTable(patient_d=patient_d, focus_id=focus_id, ontology=hpo_ontology)

In [6]:
df = ftab.get_simple_table()

In [7]:
pd.set_option('display.max_rows', None)
df

category,term,HP:id,focus,other,total,total_count
cardiovascular,Bradycardia,HP:0001662,0,1,1/20 (5.0%),1
constitutional,Pain,HP:0012531,0,1,1/20 (5.0%),1
digestive,Gastroesophageal reflux,HP:0002020,0,1,1/20 (5.0%),1
ear,Hearing impairment,HP:0000365,0,1,1/20 (5.0%),1
endocrine,Precocious puberty,HP:0000826,0,2,2/20 (10.0%),2
endocrine,Increased circulating osteocalcin level,HP:0031428,0,1,1/20 (5.0%),1
eye,Cerebral visual impairment,HP:0100704,0,2,2/20 (10.0%),2
eye,Hypertelorism,HP:0000316,0,1,1/20 (5.0%),1
growth,Short stature,HP:0004322,0,4,4/20 (20.0%),4
growth,Obesity,HP:0001513,0,3,3/20 (15.0%),3
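
<p>The <code>total</code> column combines a count, the cohort size, and a percentage with one decimal place. A minimal sketch of that formatting is shown here; <code>format_frequency</code> is a hypothetical name used for illustration and is not taken from pyphetools.</p>

```python
def format_frequency(count, total):
    """Format a count as 'count/total (percent%)' with one decimal place."""
    if total <= 0:
        raise ValueError("total must be a positive integer")
    percent = 100 * count / total
    return f"{count}/{total} ({percent:.1f}%)"
```

<p>For example, <code>format_frequency(4, 20)</code> returns <code>'4/20 (20.0%)'</code>, matching the row for Short stature.</p>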


In [8]:
df2 = ftab.get_thresholded_table(min_proportion=0.33)

Output terms with at least 7 counts


In [9]:
ftab.get_thresholded_table(min_proportion=0.2)

Output terms with at least 4 counts
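
<p>The two messages above are consistent with converting <code>min_proportion</code> into a minimum count by rounding up: 0.33 of 20 individuals is 6.6, which becomes 7, and 0.2 of 20 is 4. The sketch below assumes that ceiling rule; rounding to the nearest integer would give the same two results, and the exact rule used by pyphetools is not confirmed here. The function name is hypothetical.</p>

```python
import math


def min_count_for_proportion(min_proportion, n_individuals):
    """Smallest count that reaches `min_proportion` of `n_individuals`.

    Assumes a ceiling rule. The product is rounded to 9 decimals first so
    that floating-point noise (e.g. 7.000000000000001) is not rounded up.
    """
    return math.ceil(round(min_proportion * n_individuals, 9))
```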


category,term,HP:id,focus,other,total,total_count
growth,Growth abnormality,HP:0001507,0,5,5/20 (25.0%),5
growth,Abnormality of body height,HP:0000002,0,4,4/20 (20.0%),4
growth,Short stature,HP:0004322,0,4,4/20 (20.0%),4
growth,Growth delay,HP:0001510,0,4,4/20 (20.0%),4
head/neck,Abnormality of the head,HP:0000234,0,7,7/20 (35.0%),7
head/neck,Abnormality of head or neck,HP:0000152,0,7,7/20 (35.0%),7
head/neck,Abnormal oral morphology,HP:0031816,0,6,6/20 (30.0%),6
head/neck,Abnormality of the face,HP:0000271,0,6,6/20 (30.0%),6
head/neck,Abnormality of the mouth,HP:0000153,0,6,6/20 (30.0%),6
head/neck,Abnormal oral cavity morphology,HP:0000163,0,6,6/20 (30.0%),6
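
<p>The thresholded table lists general terms such as Growth abnormality and Abnormality of the head alongside specific ones. One plausible reading is that each individual's terms are counted together with their ontology ancestors. The sketch below illustrates that idea with a simplified, hand-written parent map; it does not reproduce the real HPO hierarchy or the pyphetools implementation, and all names in it are hypothetical.</p>

```python
from collections import Counter

# Simplified child -> parents map for illustration only (not the real HPO graph).
TOY_PARENTS = {
    "Short stature": ["Abnormality of body height"],
    "Abnormality of body height": ["Growth abnormality"],
    "Obesity": ["Growth abnormality"],
}


def ancestors(term, parents):
    """Return every ancestor of `term` reachable through the parent map."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def count_with_ancestors(patients, parents):
    """Count, per term, how many individuals have it directly or via a descendant."""
    counts = Counter()
    for observed in patients:
        expanded = set(observed)
        for term in observed:
            expanded |= ancestors(term, parents)
        counts.update(expanded)  # a set, so each individual counts once per term
    return counts
```

<p>With this kind of propagation, a general term can pass a threshold even when no single specific term under it does, which would explain why a parent term can have a higher count than any of its children.</p>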


In [10]:
from IPython.display import HTML, display

In [11]:
pplist = list(patient_d.values())

In [15]:
type(pplist[0])

pyphetools.output.simple_patient.SimplePatient

In [12]:
table = PhenopacketTable(phenopacket_list = pplist)

In [13]:
display(HTML(table.to_html()))

Individual,Genotype,Phenotypic features
Individual 2 (FEMALE; P27Y),NM_015133.4:c.1732C>T (heterozygous),"Motor delay (HP:0001270); Delayed ability to roll over (HP:0032989); Delayed ability to walk (HP:0031936); Spastic diplegia (HP:0001264); Cerebral atrophy (HP:0002059); Delayed CNS myelination (HP:0002188); Thin upper lip vermilion (HP:0000219); Round face (HP:0000311); Short stature (HP:0004322); Obesity (HP:0001513); Precocious puberty (HP:0000826); Intellectual disability, severe (HP:0010864); Delayed speech and language development (HP:0000750); Loss of ambulation (HP:0002505)"
12 (FEMALE; P4Y6M),NM_015133.4:c.3436C>T (heterozygous),Spastic diplegia (HP:0001264); Cerebral palsy (HP:0100021); Global developmental delay (HP:0001263)
2 (MALE; P4Y9M),NM_020952.6:c.2509G>A (heterozygous),"Intellectual disability, moderate (HP:0002342); Delayed ability to walk (HP:0031936); Delayed speech and language development (HP:0000750); Infantile spasms (HP:0012469); Hypotonia (HP:0001252)"
3 (FEMALE; P6Y),NM_020952.6:c.2509G>A (heterozygous),"Intellectual disability, moderate (HP:0002342); Delayed ability to walk (HP:0031936); Delayed speech and language development (HP:0000750); Autistic behavior (HP:0000729); Bilateral tonic-clonic seizure (HP:0002069); Hypotonia (HP:0001252)"
13 (FEMALE; P19Y),NM_015133.4:c.3436C>T (heterozygous),Oromotor apraxia (HP:0007301); Poor coordination (HP:0002370); Unsteady gait (HP:0002317); Hypotonia (HP:0001252); Cerebral visual impairment (HP:0100704); Global developmental delay (HP:0001263)
1 (MALE; P14Y8M),NM_015133.4:c.65del (heterozygous),Ataxia (HP:0001251); Autism (HP:0000717); Hypotonia (HP:0001252); Global developmental delay (HP:0001263)
17-mo-old boy (MALE; P1Y8M),NM_054027.4:c.1129_1131del (),Hearing impairment (HP:0000365); Nasal congestion (HP:0001742); Snoring (HP:0025267); Reduced bone mineral density (HP:0004349); Macrocephaly (HP:0000256); Mandibular prognathia (HP:0000303); Wide nasal bridge (HP:0000431); Choanal stenosis (HP:0000452); Prominent forehead (HP:0011220); Widely spaced teeth (HP:0000687); Hypertelorism (HP:0000316); Facial palsy (HP:0010628); Pain (HP:0012531); Flared metaphysis (HP:0003015); Increased bone mineral density (HP:0011001); Clavicular sclerosis (HP:0100923); Erlenmeyer flask deformity of the femurs (HP:0004975); Thickened calvaria (HP:0002684); Sclerosis of skull base (HP:0002694); Increased circulating osteocalcin level (HP:0031428); Increased circulating beta-C-terminal telopeptide concentration (HP:0031425); Decreased circulating calcifediol concentration (HP:0012053); Elevated circulating alkaline phosphatase concentration (HP:0003155)
Individual 3 (MALE; P16Y),NM_015133.4:c.1732C>T (heterozygous),"Motor delay (HP:0001270); Delayed ability to sit (HP:0025336); Delayed ability to walk (HP:0031936); Spastic diplegia (HP:0001264); Prominent nasal bridge (HP:0000426); Thin upper lip vermilion (HP:0000219); Short stature (HP:0004322); Obesity (HP:0001513); Intellectual disability, profound (HP:0002187); Delayed speech and language development (HP:0000750); Loss of ambulation (HP:0002505)"
4 (MALE; P5Y11M),NM_020952.6:c.2509G>A (heterozygous),"Intellectual disability, severe (HP:0010864); Delayed ability to walk (HP:0031936); Delayed speech and language development (HP:0000750); Autistic behavior (HP:0000729); Status epilepticus (HP:0002133); Hypotonia (HP:0001252)"
probandA (FEMALE; P1Y6M),NM_015133.4:c.1735C>T (),Bradycardia (HP:0001662); Apnea (HP:0002104); Delayed ability to crawl (HP:0033128); Cerebral palsy (HP:0100021); Thin corpus callosum (HP:0033725); Spastic diplegia (HP:0001264); Hypertonia (HP:0001276); Global developmental delay (HP:0001263); Microcephaly (HP:0000252); Central apnea (HP:0002871); Gastroesophageal reflux (HP:0002020); Lower limb asymmetry (HP:0100559); Scoliosis (HP:0002650); Dystonia (HP:0001332)
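
<p>The ages in the first column (for example <code>P27Y</code> or <code>P4Y6M</code>) are ISO 8601 durations. If they are needed as numbers, for instance to sort individuals by age, they can be split into years, months, and days. The sketch below handles only the year, month, and day designators seen in this table; the function name is hypothetical.</p>

```python
import re

# Matches durations such as P27Y, P4Y6M, or P1Y8M3D (date part only).
_AGE_PATTERN = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$")


def parse_iso8601_age(age):
    """Return (years, months, days) for an ISO 8601 duration like 'P4Y6M'."""
    match = _AGE_PATTERN.match(age)
    if match is None or age == "P":
        raise ValueError(f"Unsupported age string: {age!r}")
    years, months, days = (int(g) if g else 0 for g in match.groups())
    return years, months, days
```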


In [22]:
i3 = patient_d.get('Individual 3')
sv = i3.get_variant_list()

In [23]:
sv

[<pyphetools.output.simple_variant.SimpleVariant at 0x11f9eb690>]