<H1>MAPK8IP3: Iwasawa et al. (2019)</H1>
<p>This notebook uses the <a href="https://github.com/monarch-initiative/pyphetools" target="__blank">pyphetools</a> library
to create GA4GH phenopackets from the data in  
<a href="https://pubmed.ncbi.nlm.nih.gov/30945334/" target="__blank">Iwasawa S, et al. (2019) Recurrent de novo MAPK8IP3 variants cause neurological phenotypes. Ann Neurol. 85:927-933</a>. See the <a href="https://monarch-initiative.github.io/pyphetools/index.html" target="__blank">Pyphetools documentation</a> for more information about the code.</p>
<p>The original article describes five individuals from four families with recurrent de novo variants in MAPK8IP3.</p>
<p>This notebook parses the information in the Supplemental Table (an Excel file).</p>

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.4


<h2>Importing HPO data</h2>

In [2]:
PMID="PMID:30945334"
title = "Recurrent de novo MAPK8IP3 variants cause neurological phenotypes"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2023-10-09


<h2>Importing the supplemental file</h2>

In [3]:
df = pd.read_excel('input/PMID_30945334.xlsx')
df.head()

Unnamed: 0,identifier,Individual 1,Individual 2,Individual 3,Individual 4,Individual 5
0,"Variant (hg19, NM_015133.4)",c.1732C>T,c.1732C>T,c.1732C>T,c.3436C>T,c.3436C>T
1,Protein variant,(p.Arg578Cys),(p.Arg578Cys),(p.Arg578Cys),(p.Arg1146Cys),(p.Arg1146Cys)
2,Age (yr),29,27,16,5,5
3,Sex,Male,Female,Male,Male,Female
4,Gestational ages (weeks),39,40,40,36,41


<h2>Converting to row-based format</h2>

In [4]:
dft = df.transpose()
dft.columns = dft.iloc[0]
dft.drop(dft.index[0], inplace=True)
dft['patient_id'] = dft.index
dft.head()

identifier,"Variant (hg19, NM_015133.4)",Protein variant,Age (yr),Sex,Gestational ages (weeks),Delayed motor development,Age at head control (months),Age at rolling (months),Age at unsupported sitting (months),Age at crawling (months),...,Facial dysmorphism,Round face,Prominent nasal bridge,Thin upper lip,Others,Other,Short stature,Obesity,Precocious puberty,patient_id
Individual 1,c.1732C>T,(p.Arg578Cys),29,Male,39,+,2.5,ND,7,Not acquired,...,,+,−,+,,,+,+,+,Individual 1
Individual 2,c.1732C>T,(p.Arg578Cys),27,Female,40,+,3.5,11,6,11,...,,+,−,+,,,+,+,+,Individual 2
Individual 3,c.1732C>T,(p.Arg578Cys),16,Male,40,+,4.0,6,Not acquired,ND,...,,−,+,+,,,+,+,ND,Individual 3
Individual 4,c.3436C>T,(p.Arg1146Cys),5,Male,36,+,5.0,7,15,18,...,,+,+,+,,,+,−,−,Individual 4
Individual 5,c.3436C>T,(p.Arg1146Cys),5,Female,41,+,5.0,6,11,18,...,,+,+,+,"Long and thick eyebrows, upper slanted palpebral fissures, anteverted nares, short philtrum",,−,−,−,Individual 5
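
<p>The conversion to a row-based layout can also be pictured without pandas. The following is a minimal sketch in plain Python; the three-row excerpt and the helper <code>to_row_based</code> are hypothetical and are not part of pyphetools.</p>

```python
# Hypothetical excerpt of the column-based supplemental table:
# the first row holds the individual labels, each later row holds one attribute.
column_based = [
    ["identifier", "Individual 1", "Individual 2"],
    ["Age (yr)", "29", "27"],
    ["Sex", "Male", "Female"],
]


def to_row_based(rows):
    """Return one dict per individual (hypothetical helper, not a pyphetools function)."""
    header, *attribute_rows = rows
    records = []
    for col, patient_id in enumerate(header[1:], start=1):
        record = {"patient_id": patient_id}
        for attribute_row in attribute_rows:
            # attribute_row[0] is the attribute name, attribute_row[col] its value
            record[attribute_row[0]] = attribute_row[col]
        records.append(record)
    return records


records = to_row_based(column_based)
for record in records:
    print(record)
```

<p>Each resulting record corresponds to one row of the transposed DataFrame, with <code>patient_id</code> playing the role of the index column added above.</p>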


<h2>Column mappers</h2>

In [5]:
column_mapper_d = {}

In [6]:
delayedMotorMapper = SimpleColumnMapper(hpo_id='HP:0001270',
    hpo_label='Motor delay',
    observed='+',
    excluded='-')
delayedMotorMapper.preview_column(dft['Delayed motor development'])
column_mapper_d['Delayed motor development'] = delayedMotorMapper

In [7]:
headLagMapper = ThresholdedColumnMapper(hpo_id="HP:0032988", hpo_label="Persistent head lag", 
                                        threshold=4, call_if_above=True)
headLagMapper.preview_column(dft["Age at head control (months)"])
column_mapper_d["Age at head control (months)"] = headLagMapper

In [8]:
rollOverMapper = ThresholdedColumnMapper(hpo_id="HP:0032989", hpo_label="Delayed ability to roll over",
                                        threshold=6, call_if_above=True)
rollOverMapper.preview_column(dft["Age at rolling (months)"])
column_mapper_d["Age at rolling (months)"] = rollOverMapper

In [9]:
# Age at unsupported sitting (months); threshold: 9 months
delayedSittingMapper =  ThresholdedColumnMapper(hpo_id="HP:0025336", hpo_label="Delayed ability to sit", 
                                        threshold=9, call_if_above=True, observed_code='Not acquired')
delayedSittingMapper.preview_column(dft["Age at unsupported sitting (months)"])
column_mapper_d["Age at unsupported sitting (months)"] = delayedSittingMapper

In [10]:
# Age at walking (months) - 15 months -- Delayed ability to walk HP:0031936
delayedWalkingMapper =  ThresholdedColumnMapper(hpo_id="HP:0031936", hpo_label="Delayed ability to walk", 
                                        threshold=15, call_if_above=True, observed_code='Not acquired')
delayedWalkingMapper.preview_column(dft["Age at walking (months)"])
column_mapper_d["Age at walking (months)"] = delayedWalkingMapper
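
<p>The thresholded mappers above turn an age at milestone into an observed or excluded HPO term. The sketch below illustrates that decision rule in plain Python. It is an assumption-laden illustration, not the pyphetools implementation: the helper <code>classify_milestone</code> is hypothetical, and it assumes that values strictly above the threshold count as observed, that the <code>observed_code</code> string counts as observed, and that non-numeric entries such as "ND" are treated as not measured.</p>

```python
def classify_milestone(cell, threshold, observed_code="Not acquired"):
    """Classify a milestone age (hypothetical helper, not the pyphetools code).

    Returns "observed", "excluded", or "not measured".
    """
    text = str(cell).strip()
    if text == observed_code:
        return "observed"        # milestone never reached counts as delayed
    try:
        age = float(text)
    except ValueError:
        return "not measured"    # e.g. "ND"
    return "observed" if age > threshold else "excluded"


# Values taken from the "Age at unsupported sitting (months)" column shown earlier
sitting = ["7", "6", "Not acquired", "15", "11"]
calls = [classify_milestone(value, threshold=9) for value in sitting]
print(calls)
```

<p>Under these assumptions, a value equal to the threshold (for example head control at 4.0 months with a threshold of 4) is classified as excluded.</p>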

In [11]:
items = {
    'History of regression': ['Developmental regression', 'HP:0002376'],
    'Spastic diplegia': ['Spastic diplegia', 'HP:0001264'],
    'Autistic behavior': ['Autistic behavior', 'HP:0000729'],
    'Infantile hypotonia': ['Infantile muscular hypotonia', 'HP:0008947'],
    'Cerebral atrophy': ['Cerebral atrophy', 'HP:0002059'],
    'Delayed myelination': ['Delayed CNS myelination', 'HP:0002188'],
    'Corpus callosum hypoplasia': ['Hypoplasia of the corpus callosum', 'HP:0002079'],
    'Prominent nasal bridge': ['Prominent nasal bridge', 'HP:0000426'],
    'Thin upper lip': ['Thin upper lip vermilion', 'HP:0000219'],
    'Round face': ['Round face', 'HP:0000311'],
    'Short stature': ['Short stature', 'HP:0004322'],
    'Obesity': ['Obesity', 'HP:0001513'],
    'Precocious puberty': ['Precocious puberty', 'HP:0000826'],
}
item_column_mapper_d = hpo_cr.initialize_simple_column_maps(column_name_to_hpo_label_map=items, observed='+',
    excluded='-')
print(f"We created {len(item_column_mapper_d)} simple column mappers")
# Transfer to column_mapper_d
for k, v in item_column_mapper_d.items():
    column_mapper_d[k] = v

We created 13 simple column mappers
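
<p>In the table preview earlier in this notebook, absent features appear to be written with the Unicode minus sign (U+2212) rather than the ASCII hyphen-minus that is passed as <code>excluded='-'</code>. If the spreadsheet really contains U+2212, an exact string comparison would not recognize those cells as excluded. The sketch below shows the difference and one possible normalization; <code>normalize_symbol</code> is a hypothetical helper, not a pyphetools function.</p>

```python
UNICODE_MINUS = "\u2212"   # the minus sign that appears in the table preview
ASCII_HYPHEN = "-"         # the symbol passed to the mappers as excluded='-'


def normalize_symbol(cell):
    """Replace Unicode minus/dash variants with an ASCII hyphen (hypothetical helper)."""
    text = str(cell).strip()
    for dash in (UNICODE_MINUS, "\u2013", "\u2014"):
        text = text.replace(dash, ASCII_HYPHEN)
    return text


print(UNICODE_MINUS == ASCII_HYPHEN)                    # False
print(normalize_symbol(UNICODE_MINUS) == ASCII_HYPHEN)  # True
```

<p>One option would be to apply such a function to the affected columns before the mappers are built; another would be to pass the Unicode character itself as the excluded symbol.</p>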


In [12]:
severity_d = {'Severe': 'Intellectual disability, severe',
             'Profound': 'Intellectual disability, profound'}
idMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=severity_d)
#idMapper.preview_column(dft['Intellectual disability'])
column_mapper_d['Intellectual disability'] = idMapper

In [13]:
# Language skills
language_d = {'Simple two-word sentences': 'Delayed speech and language development',
             'Simple words': 'Delayed speech and language development',
             'Nonverbal': 'Absent speech'}
languageMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=language_d)
# languageMapper.preview_column(dft['Language skills'])
column_mapper_d['Language skills'] = languageMapper

In [14]:
# Gross motor skills: Wheelchair bound (Individuals 1-3), Cruising (5y) (Individual 4), Walking (5y) (Individual 5)
gms_d = {
    "Wheelchair bound": "Loss of ambulation",
    "Cruising": "Delayed gross motor development"
}
gmsMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=gms_d)
# gmsMapper.preview_column(dft['Gross motor skills'])
column_mapper_d['Gross motor skills'] = gmsMapper

In [15]:
# Others
other_d = {'upper slanted palpebral fissures': 'Upslanted palpebral fissure'}
otherMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=other_d)
#otherMapper.preview_column(dft['Others'])
column_mapper_d['Others'] = otherMapper
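
<p>The option dictionaries above associate fragments of free text with HPO labels. The sketch below illustrates the idea under the assumption that each key is searched as a substring of the cell text. The helper <code>match_options</code> is hypothetical; it does not reproduce how <code>OptionColumnMapper</code> works internally, nor any additional text mining performed by the concept recognizer.</p>

```python
def match_options(cell, option_d):
    """Return the labels whose key occurs in the cell text (hypothetical helper)."""
    text = str(cell)
    labels = []
    for key, label in option_d.items():
        if key in text and label not in labels:
            labels.append(label)
    return labels


# Same dictionary as in the gross motor skills cell above
gms_d = {
    "Wheelchair bound": "Loss of ambulation",
    "Cruising": "Delayed gross motor development",
}

print(match_options("Cruising (5y)", gms_d))
print(match_options("Walking (5y)", gms_d))
```

<p>In this sketch a cell such as "Walking (5y)" yields no label, because none of the dictionary keys occurs in it.</p>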

<h2>Variant Data</h2>
<p>The dictionary below lists MAPK8IP3 variants reported by Platzer et al., Iwasawa et al., and Yechieli et al. The variants were originally expressed using the transcript NM_015133.4; we have converted them to the MANE Select transcript NM_001318852.2.</p>
<p>pyphetools maps variants using the VariantValidator API.</p>

In [16]:
d_NM_015133_to_NM_001318852 = {
"c.45C>G": "c.45C>G",
"c.65delG":"c.65del",
"c.79G>T":"c.79G>T",
"c.111C>G": "c.111C>G",
"c.1198G>A": "c.1201G>A",
"c.1331T>C": "c.1334T>C",
"c.1574G>A": "c.1577G>A",
"c.1732C>T": "c.1735C>T",
"c.2982C>G": "c.2985C>G",
"c.3436C>T": "c.3439C>T"
}

dft['NM_001318852'] = dft['Variant (hg19, NM_015133.4)'].apply(lambda x: d_NM_015133_to_NM_001318852.get(x.replace(" ","")))
dft['NM_001318852']

Individual 1    c.1735C>T
Individual 2    c.1735C>T
Individual 3    c.1735C>T
Individual 4    c.3439C>T
Individual 5    c.3439C>T
Name: NM_001318852, dtype: object
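
<p>Because <code>dict.get</code> returns <code>None</code> for a key that is not in the dictionary, an unmapped variant would pass through the lookup above silently and only cause trouble later. The following is a minimal sketch of a stricter lookup; <code>convert_variant</code> is a hypothetical helper and the two-entry mapping is copied from the dictionary above.</p>

```python
def convert_variant(hgvs_c, mapping):
    """Look up a cDNA variant, failing loudly if it is unmapped (hypothetical helper)."""
    key = hgvs_c.replace(" ", "")   # same whitespace clean-up as in the cell above
    if key not in mapping:
        raise KeyError(f"No NM_001318852.2 equivalent recorded for {key!r}")
    return mapping[key]


# Two entries copied from d_NM_015133_to_NM_001318852
mapping = {"c.1732C>T": "c.1735C>T", "c.3436C>T": "c.3439C>T"}

converted = convert_variant("c.1732C>T ", mapping)
print(converted)
```

<p>Raising an error at this point makes a missing dictionary entry visible before any remote variant lookup is attempted.</p>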

In [17]:
transcript='NM_001318852.2'
vvalidator = VariantValidator(genome_build="hg38", transcript=transcript)
var_d = {}
for v in dft['NM_001318852']:
    var = vvalidator.encode_hgvs(v)
    var_d[v] = var

varMapper = VariantColumnMapper(variant_d=var_d,
                                variant_column_name='NM_001318852', 
                                default_genotype='heterozygous')
#varMapper.preview_column(dft['NM_001318852'])

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001318852.2%3Ac.1735C>T/NM_001318852.2?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001318852.2%3Ac.1735C>T/NM_001318852.2?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001318852.2%3Ac.1735C>T/NM_001318852.2?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001318852.2%3Ac.3439C>T/NM_001318852.2?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001318852.2%3Ac.3439C>T/NM_001318852.2?content-type=application%2Fjson
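
<p>The log above shows five requests for only two distinct variants. The sketch below removes duplicates before any request would be made and assembles URLs with the same shape as the logged ones. It performs no network access; <code>build_request_url</code> is a hypothetical helper whose URL layout is inferred solely from the log lines, not from the VariantValidator documentation.</p>

```python
def build_request_url(genome_build, transcript, hgvs_c):
    """Assemble a URL shaped like the ones logged above (hypothetical helper)."""
    base = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"
    variant = f"{transcript}%3A{hgvs_c}"   # %3A is the URL-encoded colon
    return f"{base}/{genome_build}/{variant}/{transcript}?content-type=application%2Fjson"


# The five values of the NM_001318852 column
variants = ["c.1735C>T", "c.1735C>T", "c.1735C>T", "c.3439C>T", "c.3439C>T"]
unique_variants = list(dict.fromkeys(variants))   # drops repeats, keeps first-seen order

urls = [build_request_url("hg38", "NM_001318852.2", v) for v in unique_variants]
for url in urls:
    print(url)
```

<p>Since the results are stored in a dictionary keyed by the variant string, encoding each distinct variant once would presumably yield the same <code>var_d</code> with fewer API calls.</p>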


<h2>Demographic data</h2>

In [18]:
ageMapper = AgeColumnMapper.by_year('Age (yr)')
#ageMapper.preview_column(dft['Age (yr)'])

In [19]:
sexMapper = SexColumnMapper(male_symbol='Male', female_symbol='Female', column_name='Sex')
#sexMapper.preview_column(dft['Sex'])
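
<p>The two demographic mappers translate spreadsheet values into the forms that appear in the summary table later in this notebook, such as <code>P29Y</code> and <code>MALE</code>. The sketch below illustrates those conversions in plain Python; both helpers are hypothetical and are not the pyphetools implementation.</p>

```python
def age_in_years_to_iso8601(age):
    """Format a whole number of years as an ISO 8601 duration, e.g. 29 -> 'P29Y'."""
    return f"P{int(age)}Y"


def normalize_sex(cell, male_symbol="Male", female_symbol="Female"):
    """Map a spreadsheet sex label to an upper-case code (hypothetical helper)."""
    if cell == male_symbol:
        return "MALE"
    if cell == female_symbol:
        return "FEMALE"
    return "UNKNOWN_SEX"   # anything else is left undetermined


print(age_in_years_to_iso8601(29), normalize_sex("Male"))
print(age_in_years_to_iso8601(5), normalize_sex("Female"))
```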

In [20]:
encoder = CohortEncoder(df=dft, hpo_cr=hpo_cr, column_mapper_d=column_mapper_d, 
                        individual_column_name="patient_id", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        metadata=metadata,
                        variant_mapper=varMapper)
disease_id = "OMIM:618443"
disease_label = "Neurodevelopmental disorder with or without variable brain abnormalities"
disease = Disease(disease_id=disease_id, disease_label=disease_label)
encoder.set_disease(disease=disease)

In [21]:
individuals = encoder.get_individuals()
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
WARNING,REDUNDANT,6
INFORMATION,NOT_MEASURED,30
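
<p>As we understand it, a REDUNDANT warning indicates that an individual was annotated with both a term and one of its more specific descendants, in which case only the more specific term needs to be kept. The sketch below illustrates such a check on a hand-written toy hierarchy. The parent links are assumed for illustration only; the real check relies on the HPO ontology loaded earlier, and <code>redundant_terms</code> is a hypothetical helper.</p>

```python
# Toy fragment of a term hierarchy (child -> parent), assumed for illustration only.
PARENT = {
    "Delayed ability to walk": "Delayed gross motor development",
    "Delayed gross motor development": "Motor delay",
}


def ancestors(term):
    """Yield every ancestor of a term in the toy hierarchy."""
    while term in PARENT:
        term = PARENT[term]
        yield term


def redundant_terms(observed):
    """Return observed terms that are ancestors of another observed term."""
    observed = set(observed)
    redundant = set()
    for term in observed:
        redundant.update(a for a in ancestors(term) if a in observed)
    return redundant


print(sorted(redundant_terms(["Motor delay", "Delayed ability to walk"])))
```

<p>In this toy example the general term is flagged, and the more specific term would be retained.</p>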


In [22]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
Individual 1 (MALE; P29Y),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.1735C>T (heterozygous),"Delayed ability to walk (HP:0031936); Spastic diplegia (HP:0001264); Cerebral atrophy (HP:0002059); Delayed CNS myelination (HP:0002188); Thin upper lip vermilion (HP:0000219); Round face (HP:0000311); Short stature (HP:0004322); Obesity (HP:0001513); Precocious puberty (HP:0000826); Intellectual disability, severe (HP:0010864); Delayed speech and language development (HP:0000750); Loss of ambulation (HP:0002505); excluded: Persistent head lag (HP:0032988); excluded: Delayed ability to sit (HP:0025336)"
Individual 2 (FEMALE; P27Y),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.1735C>T (heterozygous),"Delayed ability to roll over (HP:0032989); Delayed ability to walk (HP:0031936); Spastic diplegia (HP:0001264); Cerebral atrophy (HP:0002059); Delayed CNS myelination (HP:0002188); Thin upper lip vermilion (HP:0000219); Round face (HP:0000311); Short stature (HP:0004322); Obesity (HP:0001513); Precocious puberty (HP:0000826); Intellectual disability, severe (HP:0010864); Delayed speech and language development (HP:0000750); Loss of ambulation (HP:0002505); excluded: Persistent head lag (HP:0032988); excluded: Delayed ability to sit (HP:0025336)"
Individual 3 (MALE; P16Y),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.1735C>T (heterozygous),"Delayed ability to sit (HP:0025336); Delayed ability to walk (HP:0031936); Spastic diplegia (HP:0001264); Prominent nasal bridge (HP:0000426); Thin upper lip vermilion (HP:0000219); Short stature (HP:0004322); Obesity (HP:0001513); Intellectual disability, profound (HP:0002187); Delayed speech and language development (HP:0000750); Loss of ambulation (HP:0002505); excluded: Persistent head lag (HP:0032988); excluded: Delayed ability to roll over (HP:0032989)"
Individual 4 (MALE; P5Y),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.3439C>T (heterozygous),"Persistent head lag (HP:0032988); Delayed ability to roll over (HP:0032989); Delayed ability to sit (HP:0025336); Delayed ability to walk (HP:0031936); Autistic behavior (HP:0000729); Infantile muscular hypotonia (HP:0008947); Cerebral atrophy (HP:0002059); Prominent nasal bridge (HP:0000426); Thin upper lip vermilion (HP:0000219); Round face (HP:0000311); Short stature (HP:0004322); Intellectual disability, severe (HP:0010864); Absent speech (HP:0001344)"
Individual 5 (FEMALE; P5Y),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.3439C>T (heterozygous),"Persistent head lag (HP:0032988); Delayed ability to sit (HP:0025336); Delayed ability to walk (HP:0031936); Spastic diplegia (HP:0001264); Autistic behavior (HP:0000729); Infantile muscular hypotonia (HP:0008947); Cerebral atrophy (HP:0002059); Delayed CNS myelination (HP:0002188); Prominent nasal bridge (HP:0000426); Thin upper lip vermilion (HP:0000219); Round face (HP:0000311); Intellectual disability, severe (HP:0010864); Absent speech (HP:0001344); Thick eyebrow (HP:0000574); Upslanted palpebral fissure (HP:0000582); Anteverted nares (HP:0000463); Short philtrum (HP:0000322); excluded: Delayed ability to roll over (HP:0032989)"


In [23]:
Individual.output_individuals_as_phenopackets(individual_list=individuals, 
                                              metadata=metadata, 
                                              outdir="phenopackets")

We output 5 GA4GH phenopackets to the directory phenopackets


In [24]:
# pxf validate --hpo hp.json *.json
# no errors
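
<p>As a lightweight complement to the command-line validation noted above, the exported files can be given a quick sanity check from Python. The sketch below only confirms that each JSON file parses and contains a few top-level keys. The key list is an assumed minimal set, <code>check_phenopacket_dir</code> is a hypothetical helper, and the check is far weaker than schema validation. The demonstration runs on two toy files in a temporary directory, not on the real output.</p>

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("id", "subject", "metaData")   # assumed minimal set for this check


def check_phenopacket_dir(outdir):
    """Return {filename: [missing keys]} for the JSON files in outdir (hypothetical helper)."""
    problems = {}
    for path in sorted(Path(outdir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            packet = json.load(handle)
        missing = [key for key in REQUIRED_KEYS if key not in packet]
        if missing:
            problems[path.name] = missing
    return problems


# Demonstration on toy files
with tempfile.TemporaryDirectory() as tmp:
    complete = {"id": "x", "subject": {}, "metaData": {}}
    Path(tmp, "ok.json").write_text(json.dumps(complete), encoding="utf-8")
    Path(tmp, "bad.json").write_text(json.dumps({"id": "y"}), encoding="utf-8")
    demo_result = check_phenopacket_dir(tmp)

print(demo_result)
```

<p>Calling <code>check_phenopacket_dir("phenopackets")</code> after the export would return an empty dictionary if every file contains the listed keys.</p>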