<H1>Creation of phenopackets from PMID:31278393</H1>
<P>In this notebook, we show how to create phenopackets from Table 1 of <a href="https://pubmed.ncbi.nlm.nih.gov/31278393/" target="_blank">Dyment DA et al. (2019) De novo substitutions of TRPM3 cause intellectual disability and epilepsy. Eur J Hum Genet. 27:1611-1618</a>.</P>

In [1]:
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.json_format import Parse, ParseDict
from IPython.display import display, HTML
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
import pyphetools
from pyphetools.creation import *
from pyphetools.validation import CohortValidator
from pyphetools.visualization import *

print(f"pyphetools version {pyphetools.__version__}")

pyphetools version 0.9.77




In [2]:
# Import HPO data
parser = HpoParser()
hpo_ontology = parser.get_ontology()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
PMID = "PMID:31278393"
title = "De novo substitutions of TRPM3 cause intellectual disability and epilepsy"
cite = Citation(pmid=PMID, title=title)
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-04-26


In [3]:
df = pd.read_excel('data/PMID_31278393.xlsx')
df.head(2)

Unnamed: 0,Individual,1,2,3,4,5,6,7,8
0,cDNA (NM_020952.4),c.2509G>A,c.2509G>A,c.2509G>A,c.2509G>A,c.2509G>A,c.2509G>A,c.2509G>A,c.2810C>A
1,Polypeptide (NP_066003.3),p.(Val837Met),p.(Val837Met),p.(Val837Met),p.(Val837Met),p.(Val837Met),p.(Val837Met),p.(Val837Met),p.(Pro937Gln)


In [4]:
# Convert to a row-based format: one row per individual
dft = df.transpose()
dft.columns = dft.iloc[0]  # the first row holds the feature names
dft.drop(dft.index[0], inplace=True)
# The individual identifier is now the row index, but we need it as a column,
# so add it as an explicit new column
dft['individual_id'] = dft.index
dft.head()

Individual,cDNA (NM_020952.4),Polypeptide (NP_066003.3),Genomic DNA (NC_000009.11),Zygosity,Segregation,Clinical features,Gestation (weeks),Perinatal history,Birth weight (kg),Sex,...,Craniofacial gestalt,Morphological features,Other clinical features,Brain MRI,Apparent heat or pain insensitivity,Genetic investigations,aCGH,Fragile X,Other (nondiagnostic) genetic investigations,individual_id
1,c.2509G>A,p.(Val837Met),g.73213379C>T,Heterozygous,De novo,,38,C/S,NR,M,...,Nondysmorphic,"Broad forehead, deeply set eyes, ptosis, bulbous nasal tip, micrognathia, prominent lobule of ear, tapering fingers",C1 spinal stenosis; Chiari I malformation; scoliosis; torticollis; plagiocephaly; thickened filum terminale; bilateral talipes equinovarus; strabismus (exotropia OU),Possible mild cerebral volume loss,+ (Heat),,Normal,Normal,"ID panel (170 genes), PHF6",1
2,c.2509G>A,p.(Val837Met),g.73213379C>T,Heterozygous,De novo,,40,N,3.6,M,...,Nondysmorphic,"Short philtrum, long nose, turricephaly",EMG/NCS normal,Normal,NR,,Normal,Normal,NR,2
3,c.2509G>A,p.(Val837Met),g.73213379C>T,Heterozygous,De novo,,42,N,3.2,F,...,Nondysmorphic,NR,−,Normal,NR,,Normal,Normal,"MECP2, SMA",3
4,c.2509G>A,p.(Val837Met),g.73213379C>T,Heterozygous,De novo,,39,N,3.48,M,...,NR,"Broad forehead, deeply set eyes, flat midface, short philtrum, micrognathia, broad halluces, fifth-finger clinodactyly, pectus excavatum",Strabismus,Normal,NR,,Normal,Normal,NR,4
5,c.2509G>A,p.(Val837Met),g.73213379C>T,Heterozygous,De novo,,38 + 3,N,3.378,M,...,NR,"Broad forehead, low nasal bridge, unilateral preauricular pit, short broad thumbs","Cryptorchidism, micropenis, bilateral talipes equinovarus","Ventriculomegaly, nonspecific periventricular white matter hyperintensities",+ (Pain),,Normal,,NR,5
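
<p>The reshaping step above can be sketched without pandas. This is a minimal illustration, not the notebook's method: the <code>rows</code> list is a hand-made stand-in for the spreadsheet (values copied from the output above), and <code>transpose_to_records</code> is a hypothetical helper. Unlike the notebook, it also strips whitespace from the feature names.</p>

```python
# Toy stand-in for the spreadsheet: one row per feature, one column per individual.
# Values are copied from the tables shown above; the layout is simplified.
rows = [
    ["Individual", "1", "2", "8"],
    ["cDNA (NM_020952.4) ", "c.2509G>A", "c.2509G>A", "c.2810C>A"],
    ["Sex", "M", "M", "F"],
]


def transpose_to_records(rows):
    """Turn feature-per-row data into one dict per individual (hypothetical helper)."""
    # The first cell of every row is the feature name; strip stray whitespace.
    feature_names = [row[0].strip() for row in rows]
    # zip(*...) regroups the remaining cells column by column.
    columns = zip(*(row[1:] for row in rows))
    return [dict(zip(feature_names, column)) for column in columns]


records = transpose_to_records(rows)
for record in records:
    # Mirror dft['individual_id'] = dft.index from the cell above.
    record["individual_id"] = record["Individual"]

print(records[2])
```

<p>Each resulting record corresponds to one row of <code>dft</code>, keyed by feature name.</p>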


<h2>Column mappers</h2>

In [5]:
column_mapper_list = list()

In [6]:
# Developmental delay/intellectual disability: map the severity annotations to HPO intellectual disability terms
severity_id = {'+ (Severe)': 'Intellectual disability, severe',
               '+ (Moderate)': 'Intellectual disability, moderate',
               '+ (Moderate-to-severe)':'Intellectual disability, moderate'}
idMapper = OptionColumnMapper(column_name='Developmental delay/intellectual disability',
                              concept_recognizer=hpo_cr, option_d=severity_id)
column_mapper_list.append(idMapper)
idMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,"Intellectual disability, severe (HP:0010864) (observed)",4
1,"Intellectual disability, moderate (HP:0002342) (observed)",4
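
<p>To make the role of the option dictionary concrete, here is a simplified sketch of key-based lookup. It is an assumption-laden stand-in and not a description of how pyphetools matches text: <code>map_options</code> is a hypothetical function that only does substring matching and ignores the concept recognizer.</p>

```python
def map_options(cell, option_d):
    """Return the labels whose keys occur in the cell text (simplified stand-in)."""
    labels = []
    for key, label in option_d.items():
        # Plain substring test; a label is reported at most once.
        if key in cell and label not in labels:
            labels.append(label)
    return labels


# Same dictionary as in the cell above.
severity_id = {'+ (Severe)': 'Intellectual disability, severe',
               '+ (Moderate)': 'Intellectual disability, moderate',
               '+ (Moderate-to-severe)': 'Intellectual disability, moderate'}

print(map_options('+ (Moderate-to-severe)', severity_id))
print(map_options('NR', severity_id))
```

<p>Note that '+ (Moderate-to-severe)' is deliberately coded as the moderate term, so the preview reports only two distinct HPO terms.</p>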


In [7]:
# By inspection, all entries of this column indicate delayed ability to walk. Therefore, use ConstantColumnMapper
# the alternative would be to code each of the varied entries
delayedWalkColumn = ConstantColumnMapper(column_name='Ambulate independently (age achieved)',
                                         hpo_id='HP:0031936', hpo_label='Delayed ability to walk')
column_mapper_list.append(delayedWalkColumn)
delayedWalkColumn.preview_column(dft)

Unnamed: 0,mapping,count
0,+ (5 years) -> observed,1
1,+ (With walker) (3 years) -> observed,1
2,− -> observed,2
3,+ (With walker) -> observed,1
4,+ (4.5 years) -> observed,1
5,+ (4 years) -> observed,1
6,+ (3.5 years) -> observed,1


In [8]:
# As for walking, all entries of this column indicate delayed speech, so use ConstantColumnMapper
delayedSpeechColumn = ConstantColumnMapper(column_name='Any speech (age attained)',
                                           hpo_id='HP:0000750', hpo_label='Delayed speech and language development')
column_mapper_list.append(delayedSpeechColumn)
delayedSpeechColumn.preview_column(dft)

Unnamed: 0,mapping,count
0,+ (5 years) -> observed,2
1,− -> observed,5
2,+ (2.5 years) -> observed,1


In [9]:
# 'Autism-like features' -> Autistic behavior (HP:0000729)
autisticFeaturesMapper = SimpleColumnMapper(column_name='Autism-like features',
                                            hpo_id='HP:0000729', hpo_label='Autistic behavior', observed="+", excluded="−")
column_mapper_list.append(autisticFeaturesMapper)
autisticFeaturesMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,"original value: ""+ "" -> HP: Autistic behavior (HP:0000729) (observed)",4
1,"original value: ""NR "" -> HP: Autistic behavior (HP:0000729) (not measured)",2
2,"original value: ""− "" -> HP: Autistic behavior (HP:0000729) (excluded)",2
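
<p>The three outcomes in the preview (observed, excluded, not measured) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pyphetools implementation: <code>classify</code> is a hypothetical function, and the excluded symbol is assumed to be the Unicode minus sign (U+2212) that appears in the table rather than an ASCII hyphen.</p>

```python
MINUS_SIGN = "\u2212"  # assumed: the table uses a Unicode minus sign, not "-"


def classify(cell, observed="+", excluded=MINUS_SIGN):
    """Classify a cell as observed, excluded, or not measured (hypothetical helper)."""
    value = cell.strip()  # cells in the preview carry a trailing space, e.g. "+ "
    if value == observed:
        return "observed"
    if value == excluded:
        return "excluded"
    # Anything else, such as "NR" (not reported), is treated as not measured.
    return "not measured"


for cell in ["+ ", "\u2212 ", "NR ", "- "]:
    print(repr(cell), "->", classify(cell))
```

<p>Under this sketch an ASCII hyphen would be classified as not measured, which is why the symbol passed as <code>excluded</code> needs to match the character used in the source table exactly.</p>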


In [10]:
seizure_d = {'Absence': 'Typical absence seizure',
             'Infantile spasms': 'Infantile spasms',
             'GTC':'Bilateral tonic-clonic seizure',
             'ESES': 'Status epilepticus'}
seizureMapper = OptionColumnMapper(column_name='Seizure types',
                                   concept_recognizer=hpo_cr, option_d=seizure_d)
column_mapper_list.append(seizureMapper)
seizureMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,Typical absence seizure (HP:0011147) (observed),4
1,Infantile spasms (HP:0012469) (observed),1
2,Bilateral tonic-clonic seizure (HP:0002069) (observed),2
3,Status epilepticus (HP:0002133) (observed),1


In [11]:
# Hypotonia (HP:0001252) -- note that we code '+ (mixed tone abnormality)' as Hypotonia
hypotoniaMapper = SimpleColumnMapper(column_name='Hypotonia',
                                     hpo_id='HP:0001252', hpo_label='Hypotonia', 
                                     observed=['+', '+ (mixed tone abnormality)'], excluded='−')
column_mapper_list.append(hypotoniaMapper)
hypotoniaMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,"original value: ""+ "" -> HP: Hypotonia (HP:0001252) (observed)",6
1,"original value: ""− "" -> HP: Hypotonia (HP:0001252) (excluded)",1
2,"original value: ""+ (mixed tone abnormality) "" -> HP: Hypotonia (HP:0001252) (observed)",1


In [12]:
morph_d = {
    'bulbous nasal tip': 'Bulbous nose'
}
morphologicalMapper = OptionColumnMapper(column_name='Morphological features',
                                         concept_recognizer=hpo_cr, option_d=morph_d)
column_mapper_list.append(morphologicalMapper)
morphologicalMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,Broad forehead (HP:0000337) (observed),4
1,Ptosis (HP:0000508) (observed),2
2,Bulbous nose (HP:0000414) (observed),2
3,Micrognathia (HP:0000347) (observed),4
4,Tapered finger (HP:0001182) (observed),1
5,Short philtrum (HP:0000322) (observed),3
6,Long nose (HP:0003189) (observed),1
7,Turricephaly (HP:0000262) (observed),1
8,Midface retrusion (HP:0011800) (observed),1
9,Broad hallux (HP:0010055) (observed),1


In [13]:
other_d = {
    'Chiari I malformation': 'Chiari type I malformation',
    'C1 spinal stenosis':'Cervical spinal canal stenosis'
}
otherMapper = OptionColumnMapper(column_name='Other clinical features',
                                 concept_recognizer=hpo_cr, option_d=other_d)
column_mapper_list.append(otherMapper)
otherMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,Cervical spinal canal stenosis (HP:0008445) (observed),1
1,Chiari type I malformation (HP:0007099) (observed),1
2,Scoliosis (HP:0002650) (observed),3
3,Torticollis (HP:0000473) (observed),1
4,Plagiocephaly (HP:0001357) (observed),1
5,Bilateral talipes equinovarus (HP:0001776) (observed),2
6,Exotropia (HP:0000577) (observed),2
7,Strabismus (HP:0000486) (observed),4
8,Cryptorchidism (HP:0000028) (observed),1
9,Micropenis (HP:0000054) (observed),1


In [14]:
ageMapper = AgeColumnMapper.by_year('Age (years)')
#ageMapper.preview_column(dft)
sexMapper = SexColumnMapper(male_symbol='M', female_symbol='F', column_name='Sex')
#sexMapper.preview_column(dft)
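
<p>Ages appear in the final table as ISO 8601 durations such as P4Y9M. The sketch below shows one way to derive such a string from an age in decimal years. It is an assumption, not the code behind <code>AgeColumnMapper.by_year</code>: the helper name <code>years_to_iso8601</code> and the rounding to the nearest month are illustrative choices.</p>

```python
def years_to_iso8601(years):
    """Convert an age in (possibly fractional) years to an ISO 8601 duration."""
    # Work in whole months so that a fraction close to 1.0 carries into the years.
    total_months = round(float(years) * 12)
    whole_years, months = divmod(total_months, 12)
    duration = f"P{whole_years}Y"
    if months:
        duration += f"{months}M"
    return duration


print(years_to_iso8601(16))
print(years_to_iso8601(4.75))
```

<p>Working in total months avoids emitting an invalid component such as 12M when the fractional part rounds up.</p>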

In [15]:
TRPM3_transcript='NM_020952.6'
TRPM3_id = "HGNC:17992"
vman = VariantManager(df=dft,
                      individual_column_name="individual_id",
                      gene_id=TRPM3_id,
                      gene_symbol="TRPM3",
                      transcript=TRPM3_transcript,
                      allele_1_column_name='cDNA (NM_020952.4) ')

In [16]:
# Note there is an extra space at the end of the column name
varMapper = VariantColumnMapper(variant_d=vman.get_variant_d(),
                                variant_column_name='cDNA (NM_020952.4) ', 
                                default_genotype='heterozygous')
#varMapper.preview_column(column=dft['cDNA (NM_020952.4) '])

In [17]:
encoder = CohortEncoder(df=dft, 
                        hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="individual_id", 
                        age_at_last_encounter_mapper=ageMapper, 
                        sexmapper=sexMapper,
                        variant_mapper=varMapper,
                        metadata=metadata)
omim_id = "OMIM:620224"
omim_label = "Neurodevelopmental disorder with hypotonia, dysmorphic facies, and skeletal anomalies, with or without seizures"
disease = Disease(disease_id=omim_id, disease_label=omim_label)
encoder.set_disease(disease=disease)

In [18]:
individuals = encoder.get_individuals()

<h2>Validate</h2>
<p>pyphetools offers a quick validation that phenopackets contain a minimum number of variants and HPO terms.
We recommend additional validation with <a href="https://github.com/phenopackets/phenopacket-tools">phenopacket-tools</a>.</p>

In [19]:
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1,
                                allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
WARNING,REDUNDANT,2
INFORMATION,NOT_MEASURED,2
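
<p>One plausible reading of the REDUNDANT warnings is that an individual was annotated with both a term and one of its ancestors, for example Strabismus alongside the more specific Exotropia. The sketch below illustrates that idea with a toy hierarchy. It is an assumption for illustration only: the single-edge <code>PARENT</code> map and the <code>drop_redundant</code> helper are hypothetical and do not use the real HPO ontology or the pyphetools validator.</p>

```python
# Toy child -> parent map; only one edge is modelled here (assumed for illustration).
PARENT = {"HP:0000577": "HP:0000486"}  # Exotropia is treated as a child of Strabismus


def ancestors(term):
    """Collect every ancestor of a term by walking the toy parent map."""
    found = set()
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def drop_redundant(terms):
    """Remove terms that are already implied by a more specific term in the list."""
    implied = set()
    for term in terms:
        implied |= ancestors(term)
    return [term for term in terms if term not in implied]


annotations = ["HP:0000486", "HP:0000577", "HP:0002650"]
print(drop_redundant(annotations))
```

<p>Keeping only the most specific term loses no information, because the ancestor is implied by the ontology.</p>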


<h2>Visualization</h2>
<p>pyphetools can output summary tables of the main data contained in the cohort.</p>

In [20]:
individuals = cvalidator.get_error_free_individual_list()
table = IndividualTable(individuals)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
1 (MALE; P16Y),"Neurodevelopmental disorder with hypotonia, dysmorphic facies, and skeletal anomalies, with or without seizures (OMIM:620224)",NM_020952.6:c.2509G>A (heterozygous),"Intellectual disability, severe (HP:0010864); Delayed ability to walk (HP:0031936); Delayed speech and language development (HP:0000750); Autistic behavior (HP:0000729); Typical absence seizure (HP:0011147); Hypotonia (HP:0001252); Broad forehead (HP:0000337); Ptosis (HP:0000508); Bulbous nose (HP:0000414); Micrognathia (HP:0000347); Tapered finger (HP:0001182); Cervical spinal canal stenosis (HP:0008445); Chiari type I malformation (HP:0007099); Scoliosis (HP:0002650); Torticollis (HP:0000473); Plagiocephaly (HP:0001357); Bilateral talipes equinovarus (HP:0001776); Exotropia (HP:0000577)"
2 (MALE; P4Y9M),"Neurodevelopmental disorder with hypotonia, dysmorphic facies, and skeletal anomalies, with or without seizures (OMIM:620224)",NM_020952.6:c.2509G>A (heterozygous),"Intellectual disability, moderate (HP:0002342); Delayed ability to walk (HP:0031936); Delayed speech and language development (HP:0000750); Infantile spasms (HP:0012469); Hypotonia (HP:0001252); Short philtrum (HP:0000322); Long nose (HP:0003189); Turricephaly (HP:0000262)"
3 (FEMALE; P6Y),"Neurodevelopmental disorder with hypotonia, dysmorphic facies, and skeletal anomalies, with or without seizures (OMIM:620224)",NM_020952.6:c.2509G>A (heterozygous),"Intellectual disability, moderate (HP:0002342); Delayed ability to walk (HP:0031936); Delayed speech and language development (HP:0000750); Autistic behavior (HP:0000729); Bilateral tonic-clonic seizure (HP:0002069); Hypotonia (HP:0001252)"
4 (MALE; P5Y11M),"Neurodevelopmental disorder with hypotonia, dysmorphic facies, and skeletal anomalies, with or without seizures (OMIM:620224)",NM_020952.6:c.2509G>A (heterozygous),"Intellectual disability, severe (HP:0010864); Delayed ability to walk (HP:0031936); Delayed speech and language development (HP:0000750); Autistic behavior (HP:0000729); Status epilepticus (HP:0002133); Hypotonia (HP:0001252); Broad forehead (HP:0000337); Midface retrusion (HP:0011800); Short philtrum (HP:0000322); Micrognathia (HP:0000347); Broad hallux (HP:0010055); Finger clinodactyly (HP:0040019); Pectus excavatum (HP:0000767); Strabismus (HP:0000486)"
5 (MALE; P6Y3M),"Neurodevelopmental disorder with hypotonia, dysmorphic facies, and skeletal anomalies, with or without seizures (OMIM:620224)",NM_020952.6:c.2509G>A (heterozygous),"Intellectual disability, severe (HP:0010864); Delayed ability to walk (HP:0031936); Delayed speech and language development (HP:0000750); Autistic behavior (HP:0000729); Hypotonia (HP:0001252); Broad forehead (HP:0000337); Depressed nasal bridge (HP:0005280); Preauricular pit (HP:0004467); Broad thumb (HP:0011304); Cryptorchidism (HP:0000028); Micropenis (HP:0000054); Bilateral talipes equinovarus (HP:0001776)"
6 (MALE; P28Y),"Neurodevelopmental disorder with hypotonia, dysmorphic facies, and skeletal anomalies, with or without seizures (OMIM:620224)",NM_020952.6:c.2509G>A (heterozygous),"Intellectual disability, severe (HP:0010864); Delayed ability to walk (HP:0031936); Delayed speech and language development (HP:0000750); Bilateral tonic-clonic seizure (HP:0002069); Typical absence seizure (HP:0011147); Micrognathia (HP:0000347); High palate (HP:0000218); Neonatal hypoglycemia (HP:0001998); Hip dysplasia (HP:0001385); Scoliosis (HP:0002650); excluded: Autistic behavior (HP:0000729); excluded: Hypotonia (HP:0001252)"
7 (MALE; P38Y),"Neurodevelopmental disorder with hypotonia, dysmorphic facies, and skeletal anomalies, with or without seizures (OMIM:620224)",NM_020952.6:c.2509G>A (heterozygous),"Intellectual disability, moderate (HP:0002342); Delayed ability to walk (HP:0031936); Delayed speech and language development (HP:0000750); Typical absence seizure (HP:0011147); Hypotonia (HP:0001252); Facial asymmetry (HP:0000324); Ptosis (HP:0000508); Telecanthus (HP:0000506); Bulbous nose (HP:0000414); Micrognathia (HP:0000347); Short neck (HP:0000470); Exotropia (HP:0000577); Athetosis (HP:0002305); Pes planus (HP:0001763)"
8 (FEMALE; P8Y1M),"Neurodevelopmental disorder with hypotonia, dysmorphic facies, and skeletal anomalies, with or without seizures (OMIM:620224)",NM_020952.6:c.2810C>A (heterozygous),"Intellectual disability, moderate (HP:0002342); Delayed ability to walk (HP:0031936); Delayed speech and language development (HP:0000750); Typical absence seizure (HP:0011147); Hypotonia (HP:0001252); Broad forehead (HP:0000337); Upslanted palpebral fissure (HP:0000582); Anteverted nares (HP:0000463); Short philtrum (HP:0000322); Wide mouth (HP:0000154); Facial capillary hemangioma (HP:0000996); Choreoathetosis (HP:0001266); Strabismus (HP:0000486); Scoliosis (HP:0002650); excluded: Autistic behavior (HP:0000729)"


<h2>Output</h2>

In [21]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                             metadata=metadata,
                                             outdir=output_directory)

We output 8 GA4GH phenopackets to the directory phenopackets
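
<p>As a quick check of the output, the JSON files can be read back with the standard library. This is a minimal sketch under stated assumptions: it assumes the files end in <code>.json</code> and follow the GA4GH Phenopacket JSON layout (<code>phenotypicFeatures</code> entries with a <code>type</code> label and an optional <code>excluded</code> flag); <code>count_observed_terms</code> is a hypothetical helper.</p>

```python
import json
from collections import Counter
from pathlib import Path


def count_observed_terms(directory):
    """Count how often each observed HPO label occurs across phenopacket files."""
    counts = Counter()
    for path in sorted(Path(directory).glob("*.json")):
        packet = json.loads(path.read_text(encoding="utf-8"))
        for feature in packet.get("phenotypicFeatures", []):
            # Excluded features are explicitly absent findings; skip them.
            if not feature.get("excluded", False):
                counts[feature["type"]["label"]] += 1
    return counts


if Path("phenopackets").is_dir():
    for label, n in count_observed_terms("phenopackets").most_common(5):
        print(n, label)
```

<p>The counts can then be compared against the preview tables produced by the column mappers.</p>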
