<h1>Noonan syndrome 1 phenopackets</h1>
<p>Data from <a href="https://pubmed.ncbi.nlm.nih.gov/28074573/">Pannone L, et al. (2017) Structural, Functional, and Clinical Characterization of a Novel PTPN11 Mutation Cluster Underlying Noonan Syndrome. Hum Mutat 38(4):451-459</a>.</p>

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
import pyphetools
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import CohortValidator
print(f"pyphetools version {pyphetools.__version__}")

pyphetools version 0.9.79




In [2]:
PMID = "PMID:28074573"
title = "Structural, Functional, and Clinical Characterization of a Novel PTPN11 Mutation Cluster Underlying Noonan Syndrome"
citation = Citation(pmid=PMID, title=title)
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", citation=citation)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-04-26


In [3]:
df = pd.read_excel('input/panone2017.xlsx')

In [4]:
df.head()

Unnamed: 0,Patient code,CR1,58760,KK1,HC1,AS1,GB1,GB2,GB3,ST1,...,MA1,MA2,HC2,GC1,HC3,HC4,HC5,HC5.1,HC6,ADL1
0,Sporadic/familial,sporadic,nd,sporadic,familial,sporadic,familial,familial,familial(father),familial,...,familial,familial(mother),sporadic,sporadic,familial,sporadic,familial,familial(mother),nd,sporadic
1,Amino acid substitution,L261F,L261F,L261F,L261F,L261F,L261H,L261H,L261H,L262F,...,L262R,L262R,L262R,R265Q,R265Q,R265Q,R265Q,R265Q,R265Q,L261F;R265Q
2,HGVS,NM_002834.5(PTPN11):c.781C>T (p.Leu261Phe),NM_002834.5(PTPN11):c.781C>T (p.Leu261Phe),NM_002834.5(PTPN11):c.781C>T (p.Leu261Phe),NM_002834.5(PTPN11):c.781C>T (p.Leu261Phe),NM_002834.5(PTPN11):c.781C>T (p.Leu261Phe),NM_002834.5(PTPN11):c.782T>A (p.Leu261His),NM_002834.5(PTPN11):c.782T>A (p.Leu261His),NM_002834.5(PTPN11):c.782T>A (p.Leu261His),NM_002834.5(PTPN11):c.784C>T (p.Leu262Phe),...,NM_002834.5(PTPN11):c.785T>G (p.Leu262Arg),NM_002834.5(PTPN11):c.785T>G (p.Leu262Arg),NM_002834.5(PTPN11):c.785T>G (p.Leu262Arg),NM_002834.5(PTPN11):c.794G>A (p.Arg265Gln),NM_002834.5(PTPN11):c.794G>A (p.Arg265Gln),NM_002834.5(PTPN11):c.794G>A (p.Arg265Gln),NM_002834.5(PTPN11):c.794G>A (p.Arg265Gln),NM_002834.5(PTPN11):c.794G>A (p.Arg265Gln),NM_002834.5(PTPN11):c.794G>A (p.Arg265Gln),NM_002834.5(PTPN11):c.794G>A (p.Arg265Gln)
3,transcript,c.781C>T,c.781C>T,c.781C>T,c.781C>T,c.781C>T,c.782T>A,c.782T>A,c.782T>A,c.784C>T,...,c.785T>G,c.785T>G,c.785T>G,c.794G>A,c.794G>A,c.794G>A,c.794G>A,c.794G>A,c.794G>A,c.794G>A
4,Sex,female,male,female,male,male,female,female,male,female,...,male,female,male,male,female,male,male,female,female,male


In [5]:
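# Transpose so that each row is one individual and each column is one attribute
dft = df.transpose()
dft.columns = dft.iloc[0]  # the first transposed row holds the attribute names
dft.drop(dft.index[0], inplace=True)  # drop that header row from the data
dft['individual_id'] = dft.index  # keep the patient code as an explicit column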

In [6]:
dft.head()

Patient code,Sporadic/familial,Amino acid substitution,HGVS,transcript,Sex,age at diagnosis (yrs),Feeding difficulties,Developmental delay,Short stature (<3rd percentile),Triangular face,...,Wide spaced nipples,Hyperextensibility of joints,Cognitive delay,Psychomotor delay,Hypotonia,Hearing loss,MRI findings,Ocular anomalies,Malignancy/hematologic findings,individual_id
CR1,sporadic,L261F,NM_002834.5(PTPN11):c.781C>T (p.Leu261Phe),c.781C>T,female,2.3,-,-,+,-,...,-,-,-,-,-,nd,nd,nd,-,CR1
58760,nd,L261F,NM_002834.5(PTPN11):c.781C>T (p.Leu261Phe),c.781C>T,male,2.6,nd,-,+,nd,...,nd,nd,-,-,-,-,nd,-,-,58760
KK1,sporadic,L261F,NM_002834.5(PTPN11):c.781C>T (p.Leu261Phe),c.781C>T,female,27.0,-,-,+,+,...,+,-,-,-,-,-,nd,-,-,KK1
HC1,familial,L261F,NM_002834.5(PTPN11):c.781C>T (p.Leu261Phe),c.781C>T,male,15.0,nd,nd,-a,+,...,+,-,-,nd,nd,-,nd,-,-,HC1
AS1,sporadic,L261F,NM_002834.5(PTPN11):c.781C>T (p.Leu261Phe),c.781C>T,male,6.3,-,-,+,+,...,+,+,-,-,+,-,IL,anisomethropia,-,AS1
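
The reshaping in cells 5 and 6 can be illustrated without pandas. The following is a minimal sketch on a toy subset of the table (two individuals, two attributes); the function name `transpose_to_records` is hypothetical and is not part of pyphetools or pandas.

```python
# Toy rows in the same layout as the spreadsheet: the first entry of each row
# names the attribute, the remaining entries belong to one individual each.
rows = [
    ["Patient code", "CR1", "AS1"],
    ["Sex", "female", "male"],
    ["Hypotonia", "-", "+"],
]


def transpose_to_records(rows):
    """Return one dict per individual, keyed by attribute name (hypothetical helper)."""
    header, *attribute_rows = rows
    individual_ids = header[1:]  # skip the "Patient code" label itself
    records = []
    for position, individual_id in enumerate(individual_ids, start=1):
        record = {row[0]: row[position] for row in attribute_rows}
        record["individual_id"] = individual_id  # mirrors dft['individual_id']
        records.append(record)
    return records


records = transpose_to_records(rows)
for record in records:
    print(record)
```

Each resulting record corresponds to one row of the transposed data frame, which is the shape the column mappers below expect.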


In [7]:
column_mapper_list = list()

In [8]:
items = {
    'Feeding difficulties': ["Feeding difficulties","HP:0011968"],
    "Developmental delay" : ["Global developmental delay", "HP:0001263"],
    "Short stature (<3rd percentile)" :["Short stature", "HP:0004322"],
    "Triangular face": ["Triangular face", "HP:0000325"],
    "Hypertelorism" : ["Hypertelorism", "HP:0000316"],
    "Palpebral ptosis" : ["Ptosis", "HP:0000508"],
    "Downslanting palpebral fissures" : ["Downslanted palpebral fissures", "HP:0000494"],
    "Epicanthal folds" : ["Epicanthus", "HP:0000286"],
    "High/prominent forehead" : ["Prominent forehead","HP:0011220"],
    "Flat nasal bridge" : ["Depressed nasal bridge","HP:0005280"],
    "Long/deep philtrum": ["Long philtrum","HP:0000343"],
    "Dysmorphic/low set ears":["Low-set ears","HP:0000369"],
    "Low posterior hairline": ["Low posterior hairline","HP:0002162"],
    "Pulmonary valve stenosis": ["Pulmonic stenosis","HP:0001642"],
    "Pulmonary valve dysplasia":["Dysplastic pulmonary valve","HP:0005164"],
    "Septal defect": ["Atrial septal defect","HP:0001631"], # assume atrial, the most common. No other info
    "Hypertrophic cardiomyopathy":["Hypertrophic cardiomyopathy","HP:0001639"],
    "Café-au-lait spots": ["Few cafe-au-lait spots","HP:0007429"],
    # "Dark skin" is not mapped: it is unclear what this column refers to
    "Keratosis pilaris": ["Keratosis pilaris","HP:0032152"],
    "Thin/sparse hair": ["Sparse hair","HP:0008070"],
    "Sparse eyebrows": ["Sparse eyebrow","HP:0045075"],
    "Short/webbed neck": ["Webbed neck","HP:0000465"],
    "Chest abnormalities": ["Shield chest","HP:0000914"],
    "Cubitus valgus" : ["Cubitus valgus","HP:0002967"],
    "Wide spaced nipples": ["Wide intermamillary distance","HP:0006610"],
    "Hyperextensibility of joints": ["Joint hypermobility","HP:0001382"],
    "Cognitive delay": ["Intellectual disability","HP:0001249"],
    "Psychomotor delay": ["Global developmental delay","HP:0001263"],
    "Hypotonia": ["Hypotonia","HP:0001252"],
    "Hearing loss": ["Hearing impairment","HP:0000365"]
    }
# The following two require Option mappers!
#MRI findings
#Ocular anomalies

In [9]:
for original_text, hpo_term in items.items():
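    # NOTE: excluded is the Unicode minus sign (U+2212). The cells displayed above appear to
    # contain the ASCII hyphen-minus '-'; if so, no cell is mapped as excluded.
    myMapper = SimpleColumnMapper(column_name=original_text,
                                  hpo_id=hpo_term[1], hpo_label=hpo_term[0], observed='+', excluded='−')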
    #print(myMapper.preview_column(dft))
    column_mapper_list.append(myMapper)
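
To make the observed/excluded convention concrete, here is a minimal sketch of how a single cell could be interpreted. It is not the pyphetools implementation; `interpret_cell` is a hypothetical helper. Spreadsheets may contain either the ASCII hyphen-minus or the Unicode minus sign (U+2212), so the sketch normalizes both, which is an illustrative choice rather than documented library behavior.

```python
from typing import Optional

OBSERVED = "+"
EXCLUDED = "-"            # ASCII hyphen-minus
UNICODE_MINUS = "\u2212"  # looks similar but is a different character


def interpret_cell(cell: str) -> Optional[bool]:
    """Return True (observed), False (excluded) or None (not measured).

    Hypothetical helper: anything other than the two symbols, such as 'nd'
    or '-a', is treated as not measured.
    """
    value = str(cell).strip().replace(UNICODE_MINUS, EXCLUDED)
    if value == OBSERVED:
        return True
    if value == EXCLUDED:
        return False
    return None


for cell in ["+", "-", "\u2212", "nd", "-a"]:
    print(repr(cell), interpret_cell(cell))
```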

In [10]:
# MRI findings	nd	nd	nd	nd	IL	nd	nd	nd	nd	ChiariI	-	nd	nd	nd	nd	nd	nd	nd	nd	-
# ChiariI - Chiari type I malformation HP:0007099
# no term for infrapeduncular lipoma (IL)
mri_d = {'ChiariI': 'Chiari type I malformation', }
mriMapper = OptionColumnMapper(column_name='MRI findings',concept_recognizer=hpo_cr, option_d=mri_d)
column_mapper_list.append(mriMapper)
mriMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,Chiari type I malformation (HP:0007099) (observed),1


In [11]:
# Ocular anomalies	anisomethropia myopia	strabismus		strabismus	-
ocular_d = {"anisomethropia" : "Anisometropia",
           "strabismus": "Strabismus",
           "myopia": "Myopia"}
ocularMapper = OptionColumnMapper(column_name='Ocular anomalies',concept_recognizer=hpo_cr, option_d=ocular_d)
column_mapper_list.append(ocularMapper)
ocularMapper.preview_column(dft)

Unnamed: 0,mapping,count
0,Anisometropia (HP:0012803) (observed),1
1,Myopia (HP:0000545) (observed),1
2,Strabismus (HP:0000486) (observed),2
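
The idea behind the two option mappers can be sketched in plain Python: free-text keywords in a cell are looked up in a dictionary and translated to HPO labels. The sketch assumes case-insensitive substring matching, which may differ from what `OptionColumnMapper` actually does; `map_options` is a hypothetical helper, and the HPO identifiers are the ones shown in the preview above.

```python
# HPO identifiers copied from the preview output above
HPO_IDS = {
    "Anisometropia": "HP:0012803",
    "Myopia": "HP:0000545",
    "Strabismus": "HP:0000486",
}

# Same keyword-to-label dictionary as ocular_d
OCULAR_OPTIONS = {
    "anisomethropia": "Anisometropia",  # spelling as it appears in the source table
    "strabismus": "Strabismus",
    "myopia": "Myopia",
}


def map_options(cell, options=OCULAR_OPTIONS, hpo_ids=HPO_IDS):
    """Return (label, HPO id) pairs for every keyword found in the cell.

    Hypothetical helper; substring matching is an assumption of this sketch.
    """
    text = str(cell).lower()
    hits = []
    for keyword, label in options.items():
        if keyword.lower() in text:
            hits.append((label, hpo_ids[label]))
    return hits


print(map_options("anisomethropia"))
print(map_options("nd"))
```

Cells such as `-` or `nd` match no keyword and therefore yield no terms.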


In [12]:
ptpn11_transcript='NM_002834.5'
vman = VariantManager(df=dft, 
                      gene_symbol="PTPN11",
                      allele_1_column_name="transcript", 
                      individual_column_name="individual_id", 
                      transcript=ptpn11_transcript)

[INFO] encoding variant "c.781C>T"
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_002834.5%3Ac.781C>T/NM_002834.5?content-type=application%2Fjson
[INFO] encoding variant "c.784C>T"
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_002834.5%3Ac.784C>T/NM_002834.5?content-type=application%2Fjson
[INFO] encoding variant "c.785T>G"
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_002834.5%3Ac.785T>G/NM_002834.5?content-type=application%2Fjson
[INFO] encoding variant "c.794G>A"
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_002834.5%3Ac.794G>A/NM_002834.5?content-type=application%2Fjson
[INFO] encoding variant "c.782T>A"
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_002834.5%3Ac.782T>A/NM_002834.5?content-type=application%2Fjson
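
The logged URLs follow a regular pattern that can be reproduced with the standard library. This sketch is derived only from the log lines above; how pyphetools builds the URL internally is not shown here, and `variantvalidator_url` is a hypothetical helper. No request is sent.

```python
from urllib.parse import quote

# Base path as it appears in the logged URLs
BASE = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"


def variantvalidator_url(transcript: str, hgvs_c: str, genome: str = "hg38") -> str:
    """Rebuild a URL of the form seen in the log (hypothetical helper)."""
    # ':' is percent-encoded to %3A; '>' is left unencoded, as in the log
    variant = quote(f"{transcript}:{hgvs_c}", safe=">")
    content_type = quote("application/json", safe="")  # '/' becomes %2F
    return f"{BASE}/{genome}/{variant}/{transcript}?content-type={content_type}"


print(variantvalidator_url("NM_002834.5", "c.781C>T"))
```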


In [13]:
varMapper = VariantColumnMapper(variant_d=vman.get_variant_d(),
                                variant_column_name='transcript', 
                                default_genotype="heterozygous")

In [14]:
ageMapper = AgeColumnMapper.by_year('age at diagnosis (yrs)')
#ageMapper.preview_column(dft['age at diagnosis (yrs)'])
sexMapper = SexColumnMapper(male_symbol='male', female_symbol='female', column_name='Sex')
#sexMapper.preview_column(dft['Sex'])

In [15]:
encoder = CohortEncoder(df=dft, 
                        hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="individual_id", 
                        age_of_onset_mapper=ageMapper, 
                        sexmapper=sexMapper,
                        metadata=metadata,
                        variant_mapper=varMapper)
noonan = Disease(disease_id="OMIM:163950", disease_label="Noonan syndrome 1")
encoder.set_disease(noonan)

In [16]:
individuals = encoder.get_individuals()
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
ERROR,INSUFFICIENT_HPOS,1
INFORMATION,NOT_MEASURED,450

ID,Level,Category,Message,HPO Term
PMID_28074573_HC4,ERROR,INSUFFICIENT_HPOS,Minimum HPO terms required 1 but only 0 found,
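
The `INSUFFICIENT_HPOS` error reflects a simple threshold check on the number of HPO terms per individual. The following sketch shows the idea under the assumption that only observed terms are counted; it is not the `CohortValidator` implementation, and `split_by_min_hpo` is a hypothetical helper.

```python
def split_by_min_hpo(cohort, min_hpo=1):
    """Split individuals into those that pass and those that fail the threshold.

    cohort maps an individual id to its list of observed HPO ids.
    Hypothetical helper; counting only observed terms is an assumption.
    """
    passed, errors = {}, {}
    for individual_id, hpo_ids in cohort.items():
        if len(hpo_ids) >= min_hpo:
            passed[individual_id] = hpo_ids
        else:
            errors[individual_id] = (
                f"Minimum HPO terms required {min_hpo} but only {len(hpo_ids)} found"
            )
    return passed, errors


# Toy cohort: 58760 has the two terms listed in the table below, HC4 has none
toy_cohort = {
    "58760": ["HP:0004322", "HP:0001642"],
    "HC4": [],
}
passed, errors = split_by_min_hpo(toy_cohort, min_hpo=1)
print(sorted(passed))
print(errors)
```

In the notebook, the next cell keeps only the individuals without errors, which corresponds to the `passed` dictionary in this sketch.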


In [17]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
CR1 (FEMALE; n/a),Noonan syndrome 1 (OMIM:163950),NM_002834.5:c.781C>T (heterozygous),Short stature (HP:0004322); Hypertelorism (HP:0000316); Prominent forehead (HP:0011220); Low-set ears (HP:0000369); Low posterior hairline (HP:0002162); Few cafe-au-lait spots (HP:0007429); Sparse hair (HP:0008070); Shield chest (HP:0000914)
58760 (MALE; n/a),Noonan syndrome 1 (OMIM:163950),NM_002834.5:c.781C>T (heterozygous),Short stature (HP:0004322); Pulmonic stenosis (HP:0001642)
KK1 (FEMALE; n/a),Noonan syndrome 1 (OMIM:163950),NM_002834.5:c.781C>T (heterozygous),Short stature (HP:0004322); Triangular face (HP:0000325); Epicanthus (HP:0000286); Long philtrum (HP:0000343); Low posterior hairline (HP:0002162); Sparse eyebrow (HP:0045075); Webbed neck (HP:0000465); Shield chest (HP:0000914); Wide intermamillary distance (HP:0006610)
HC1 (MALE; n/a),Noonan syndrome 1 (OMIM:163950),NM_002834.5:c.781C>T (heterozygous),Triangular face (HP:0000325); Hypertelorism (HP:0000316); Prominent forehead (HP:0011220); Webbed neck (HP:0000465); Shield chest (HP:0000914); Wide intermamillary distance (HP:0006610)
AS1 (MALE; n/a),Noonan syndrome 1 (OMIM:163950),NM_002834.5:c.781C>T (heterozygous),Short stature (HP:0004322); Triangular face (HP:0000325); Ptosis (HP:0000508); Downslanted palpebral fissures (HP:0000494); Prominent forehead (HP:0011220); Long philtrum (HP:0000343); Low-set ears (HP:0000369); Low posterior hairline (HP:0002162); Sparse eyebrow (HP:0045075); Webbed neck (HP:0000465); Shield chest (HP:0000914); Wide intermamillary distance (HP:0006610); Joint hypermobility (HP:0001382); Hypotonia (HP:0001252); Anisometropia (HP:0012803)
GB1 (FEMALE; n/a),Noonan syndrome 1 (OMIM:163950),NM_002834.5:c.782T>A (heterozygous),Hypertelorism (HP:0000316); Epicanthus (HP:0000286); Prominent forehead (HP:0011220); Depressed nasal bridge (HP:0005280); Low-set ears (HP:0000369); Low posterior hairline (HP:0002162); Pulmonic stenosis (HP:0001642); Webbed neck (HP:0000465)
GB2 (FEMALE; n/a),Noonan syndrome 1 (OMIM:163950),NM_002834.5:c.782T>A (heterozygous),Triangular face (HP:0000325); Hypertelorism (HP:0000316); Downslanted palpebral fissures (HP:0000494); Epicanthus (HP:0000286); Prominent forehead (HP:0011220); Depressed nasal bridge (HP:0005280); Low-set ears (HP:0000369); Low posterior hairline (HP:0002162); Pulmonic stenosis (HP:0001642); Webbed neck (HP:0000465)
GB3 (MALE; n/a),Noonan syndrome 1 (OMIM:163950),NM_002834.5:c.782T>A (heterozygous),Triangular face (HP:0000325); Hypertelorism (HP:0000316); Ptosis (HP:0000508); Downslanted palpebral fissures (HP:0000494); Epicanthus (HP:0000286); Depressed nasal bridge (HP:0005280); Low-set ears (HP:0000369); Low posterior hairline (HP:0002162); Webbed neck (HP:0000465); Myopia (HP:0000545)
ST1 (FEMALE; n/a),Noonan syndrome 1 (OMIM:163950),NM_002834.5:c.784C>T (heterozygous),Feeding difficulties (HP:0011968); Global developmental delay (HP:0001263); Short stature (HP:0004322); Triangular face (HP:0000325); Hypertelorism (HP:0000316); Downslanted palpebral fissures (HP:0000494); Low-set ears (HP:0000369); Low posterior hairline (HP:0002162); Webbed neck (HP:0000465); Shield chest (HP:0000914); Wide intermamillary distance (HP:0006610); Hypotonia (HP:0001252)
GB4 (FEMALE; n/a),Noonan syndrome 1 (OMIM:163950),NM_002834.5:c.785T>G (heterozygous),Global developmental delay (HP:0001263); Short stature (HP:0004322); Triangular face (HP:0000325); Hypertelorism (HP:0000316); Depressed nasal bridge (HP:0005280); Low-set ears (HP:0000369); Low posterior hairline (HP:0002162); Pulmonic stenosis (HP:0001642); Atrial septal defect (HP:0001631); Webbed neck (HP:0000465); Shield chest (HP:0000914); Intellectual disability (HP:0001249); Chiari type I malformation (HP:0007099)


In [18]:
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                              metadata=metadata)

We output 19 GA4GH phenopackets to the directory phenopackets
