<H1>MAPK8IP3: Platzer et al (2019)</H1>
<p>This notebook uses the <a href="https://github.com/monarch-initiative/pyphetools" target="__blank">pyphetools</a> library
to create GA4GH phenopackets from the data in  <a href="https://pubmed.ncbi.nlm.nih.gov/30612693/" target="__blank">Platzer K., et al. (2019) De Novo Variants in MAPK8IP3 Cause Intellectual Disability with Variable Brain Anomalies</a>. See the <a href="https://monarch-initiative.github.io/pyphetools/index.html" target="__blank">Pyphetools documentation</a> for more information about the code.</p>
<p>The original article describes de novo variants in MAPK8IP3 in 13 unrelated individuals presenting with an overlapping phenotype of mild to severe intellectual disability. </p>
<p>This notebook parses the information in Supplemental Table S1 (an Excel file).</p>

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.78


<h2>Importing HPO data</h2>

In [2]:
PMID = "PMID:30612693"
title = "De Novo Variants in MAPK8IP3 Cause Intellectual Disability with Variable Brain Anomalies"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-04-03


<H2>Importing supplemental file S1.</H2>

In [3]:
df = pd.read_excel('input/platzer_2019_supplement.xlsx')
df.head(2)

Unnamed: 0,Indvidual\nin\nmanuscript,g.(hg19) Chr16:,Transcript\nNM_015133.4\nc.,p.,origin,genetic testing,Sex,age at last assesment,prenatal period,Exam at birth,...,neurological examination,result of external MRI,seizures,Sz onset and Sz types,AEDs used,Sz outcome,EEG,Additional symptoms,family history,further results of genetic testing
0,1,1756405,c.65delG,p.Gly22Alafs*3,de novo,TrioWES,M,14 y 8 m,,41 weeks:\nlength: 53.3 cm\nweight: 3.941 kg\nOFC: NA,...,ataxia,"mild cerebellar atrophy, hypointensity of the globi pallidi and substantia nigra, possible mild degree of abnormal iron or mineral deposition",no,,,,,"speech is ataxic but speaks in sentences/short phrases; attention issues, impulse control and emotional lability, OCD symptoms; recently developed scoliosis",unremarkable,
1,2,1756419,c.79G>T,p.Glu27*,de novo,SingleWES,M,4 y,,length: 49 cm\nweigth: 3215 g\nOFC: 35 cm,...,ataxia,normal,no,,,,,pre-natal pelvi-ureteric junction stenosis (spontaneous resolution at 6 m),,


<h2>Collecting column mappers</h2>

In [4]:
column_mapper_list = list()

In [5]:
neuro_exam_custom_map = {'low extremity weakness': 'Lower limb muscle weakness',
                         'unstable gait': 'Unsteady gait',
                         'dysfunction of the corticospinal pathways':'Upper motor neuron dysfunction',
                         'spastic': 'Spasticity',
                         'orobuccal dyspraxia': 'Oromotor apraxia',
                         'difficulty in coordination':'Poor coordination'
                        }
neuroMapper = OptionColumnMapper(column_name='neurological examination',concept_recognizer=hpo_cr, option_d=neuro_exam_custom_map, )
column_mapper_list.append(neuroMapper)
neuroMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Ataxia (HP:0001251) (observed),2
1,Spastic paraplegia (HP:0001258) (observed),1
2,Spasticity (HP:0001257) (observed),2
3,Upper motor neuron dysfunction (HP:0002493) (observed),1
4,Lower limb muscle weakness (HP:0007340) (observed),1
5,Spastic diplegia (HP:0001264) (observed),1
6,Cerebral palsy (HP:0100021) (observed),1
7,Oromotor apraxia (HP:0007301) (observed),1
8,Poor coordination (HP:0002370) (observed),1
9,Unsteady gait (HP:0002317) (observed),1


In [6]:
severity_d = {'moderate\n(IQ 48)':'Intellectual disability, moderate',
             'moderate':'Intellectual disability, moderate',
             'moderate\n(IQ 49)': 'Intellectual disability, moderate',
             'severe': 'Intellectual disability, severe',
             'mild': 'Intellectual disability, mild'}
severityOfIdMapper = OptionColumnMapper(column_name='severity of ID',concept_recognizer=hpo_cr, option_d=severity_d)
column_mapper_list.append(severityOfIdMapper)
severityOfIdMapper.preview_column(df)

Unnamed: 0,mapping,count
0,"Intellectual disability, moderate (HP:0002342) (observed)",7
1,"Intellectual disability, severe (HP:0010864) (observed)",3
2,"Intellectual disability, mild (HP:0001256) (observed)",3


In [7]:
mri_custom_map = {'hypomyelination': 'CNS hypomyelination',
                  'thinning of CC': 'Thin corpus callosum',
                  'white matter volume loss':'Reduced cerebral white matter volume',
                  'widened lateral ventricles': 'Lateral ventricle dilatation',
                  'dysgenesis of corpus callosum': 'Dysplastic corpus callosum',
                  'hypoplasia of mesencephalon and brainstem': 'Hypoplasia of the brainstem'
                  }
mriMapper = OptionColumnMapper(column_name='result of external MRI',concept_recognizer=hpo_cr, option_d=mri_custom_map, )
column_mapper_list.append(mriMapper)
mriMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Cerebellar atrophy (HP:0001272) (observed),1
1,Perisylvian polymicrogyria (HP:0012650) (observed),2
2,Thin corpus callosum (HP:0033725) (observed),4
3,CNS hypomyelination (HP:0003429) (observed),1
4,Hypoplasia of the brainstem (HP:0002365) (observed),1
5,Reduced cerebral white matter volume (HP:0034295) (observed),2
6,Polymicrogyria (HP:0002126) (observed),1
7,Syringomyelia (HP:0003396) (observed),1
8,Lateral ventricle dilatation (HP:0006956) (observed),1
9,Periventricular leukomalacia (HP:0006970) (observed),1


In [8]:
additional_custom_map = {'OCD': 'Obsessive-compulsive behavior',
                         '5th finger clinodactyly': 'Clinodactyly of the 5th finger',
                         'small teeth':'Reduced cerebral white matter volume',
                         'widened lateral ventricles': 'Lateral ventricle dilatation',
                         'dysgenesis of corpus callosum': 'Dysplastic corpus callosum',
                         'dramatic increased weight': 'Obesity'
                        }
excluded = {'pseudostrabismus': "Strabismus"}
additionalFeaturesMapper = OptionColumnMapper(column_name='Additional symptoms',
                                              concept_recognizer=hpo_cr,
                                              option_d=mri_custom_map,
                                              excluded_d=excluded)
column_mapper_list.append(additionalFeaturesMapper)
additionalFeaturesMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Emotional lability (HP:0000712) (observed),1
1,Scoliosis (HP:0002650) (observed),3
2,Nystagmus (HP:0000639) (observed),1
3,Hearing impairment (HP:0000365) (observed),1
4,Hypertelorism (HP:0000316) (observed),1
5,Protruding ear (HP:0000411) (observed),1
6,Hypodontia (HP:0000668) (observed),1
7,Finger clinodactyly (HP:0040019) (observed),1
8,Synophrys (HP:0000664) (observed),1
9,Encopresis (HP:0040183) (observed),1


<h2>Simple mappers</h2>

In [9]:
items = {
    'regression': ["Developmental regression","HP:0002376"],
    'autism': ['Autism', 'HP:0000717'],
    'hypotonia': ['Hypotonia', 'HP:0001252'],
    'movement disorder': ['Abnormality of movement', 'HP:0100022'],
    'CVI': ['Cerebral visual impairment', 'HP:0100704'], # CVI stands for Cortical visual impairment HP:0100704
    'seizures': ['Seizure','HP:0001250'],
    'DD': ['Global developmental delay', 'HP:0001263']
}
item_column_mapper_d = hpo_cr.initialize_simple_column_maps(column_name_to_hpo_label_map=items, observed='yes',
    excluded='no')
print(f"We created {len(item_column_mapper_d)} simple column mappers")
# Transfer to column_mapper_d
for k, v in item_column_mapper_d.items():
    column_mapper_list.append(v)

We created 7 simple column mappers


<h2>Option mapper</h2>

In [10]:
severity_d = {'moderate\n(IQ 48)':'Intellectual disability, moderate',
             'moderate':'Intellectual disability, moderate',
             'moderate\n(IQ 49)': 'Intellectual disability, moderate',
             'severe': 'Intellectual disability, severe',
             'mild': 'Intellectual disability, mild'}
idMapper = OptionColumnMapper(column_name='severity of ID',concept_recognizer=hpo_cr, option_d=severity_d)
column_mapper_list.append(idMapper)
idMapper.preview_column(df)

Unnamed: 0,mapping,count
0,"Intellectual disability, moderate (HP:0002342) (observed)",7
1,"Intellectual disability, severe (HP:0010864) (observed)",3
2,"Intellectual disability, mild (HP:0001256) (observed)",3


In [11]:
mri_custom_map = {'hypomyelination': 'CNS hypomyelination',
                  'thinning of CC': 'Thin corpus callosum',
                  'white matter volume loss':'Reduced cerebral white matter volume',
                  'widened lateral ventricles': 'Lateral ventricle dilatation',
                  'dysgenesis of corpus callosum': 'Dysplastic corpus callosum',
                  'hypoplasia of mesencephalon and brainstem': 'Hypoplasia of the brainstem'
                  }
mriMapper = OptionColumnMapper(column_name='result of external MRI',concept_recognizer=hpo_cr, option_d=mri_custom_map, )
column_mapper_list.append(mriMapper)
mriMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Cerebellar atrophy (HP:0001272) (observed),1
1,Perisylvian polymicrogyria (HP:0012650) (observed),2
2,Thin corpus callosum (HP:0033725) (observed),4
3,CNS hypomyelination (HP:0003429) (observed),1
4,Hypoplasia of the brainstem (HP:0002365) (observed),1
5,Polymicrogyria (HP:0002126) (observed),1
6,Reduced cerebral white matter volume (HP:0034295) (observed),2
7,Syringomyelia (HP:0003396) (observed),1
8,Lateral ventricle dilatation (HP:0006956) (observed),1
9,Periventricular leukomalacia (HP:0006970) (observed),1


In [12]:
additional_custom_map = {'OCD': 'Obsessive-compulsive behavior',
                         '5th finger clinodactyly': 'Clinodactyly of the 5th finger',
                         'small teeth':'Reduced cerebral white matter volume',
                         'widened lateral ventricles': 'Lateral ventricle dilatation',
                         'dysgenesis of corpus callosum': 'Dysplastic corpus callosum',
                         'dramatic increased weight': 'Obesity'
                        }
excluded = {'pseudostrabismus': "Strabismus"}
additionalFeaturesMapper = OptionColumnMapper(column_name='Additional symptoms',
                                              concept_recognizer=hpo_cr,
                                              option_d=mri_custom_map,
                                              excluded_d=excluded)
column_mapper_list.append(additionalFeaturesMapper)
additionalFeaturesMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Emotional lability (HP:0000712) (observed),1
1,Scoliosis (HP:0002650) (observed),3
2,Nystagmus (HP:0000639) (observed),1
3,Hearing impairment (HP:0000365) (observed),1
4,Hypertelorism (HP:0000316) (observed),1
5,Protruding ear (HP:0000411) (observed),1
6,Hypodontia (HP:0000668) (observed),1
7,Finger clinodactyly (HP:0040019) (observed),1
8,Synophrys (HP:0000664) (observed),1
9,Encopresis (HP:0040183) (observed),1


<H1>Mapping variants</H1>
<p>MAPK8IP3 variants reported by Platzer et al, Iwasama et al., and Yechieli et al. We have transformed the variants, which were originally expressed using the transcript  NM_015133.4 to be expressed using the MANE select transcript NM_001318852.2</p>
<p>pyphetools maps variants using the VariantValidator API.</p>

In [13]:
d_NM_015133_to_NM_001318852 = {
"c.45C>G": "c.45C>G",
"c.65delG":"c.65del",
"c.79G>T":"c.79G>T",
"c.111C>G": "c.111C>G",
"c.1198G>A": "c.1201G>A",
"c.1331T>C": "c.1334T>C",
"c.1574G>A": "c.1577G>A",
"c.1732C>T": "c.1735C>T",
"c.2982C>G": "c.2985C>G",
"c.3436C>T": "c.3439C>T"
}

df['NM_001318852'] = df['Transcript\nNM_015133.4\nc.'].apply(lambda x: d_NM_015133_to_NM_001318852.get(x.replace(" ","")))

In [14]:
df['NM_001318852']

0       c.65del
1       c.79G>T
2      c.111C>G
3     c.1201G>A
4     c.1334T>C
5     c.1334T>C
6     c.1577G>A
7     c.1735C>T
8     c.1735C>T
9     c.2985C>G
10    c.3439C>T
11    c.3439C>T
12    c.3439C>T
Name: NM_001318852, dtype: object

In [15]:
MAKP8IP3_transcript='NM_001318852.2'
individual_column_name = 'Indvidual\nin\nmanuscript'
vman = VariantManager(df=df, individual_column_name='Indvidual\nin\nmanuscript', allele_1_column_name='NM_001318852',
                     gene_symbol="MAKP8IP3", transcript=MAKP8IP3_transcript)
var_d = vman.get_variant_d()
varMapper = VariantColumnMapper(variant_d=var_d,
                                variant_column_name='NM_001318852',
                                default_genotype='heterozygous')
# varMapper.preview_column(df['NM_001318852'])

In [16]:
ageMapper = AgeColumnMapper.by_year_and_month('age at last assesment')
#ageMapper.preview_column(df['age at last assesment'])
sexMapper = SexColumnMapper(male_symbol='M', female_symbol='F', column_name='Sex')
#sexMapper.preview_column(df['Sex'])



encoder = CohortEncoder(df=df,
                        hpo_cr=hpo_cr,
                        column_mapper_list=column_mapper_list,
                        individual_column_name=individual_column_name,
                        age_at_last_encounter_mapper=ageMapper,
                        sexmapper=sexMapper,
                        variant_mapper=varMapper,
                        metadata=metadata)
disease_id = 'OMIM:618443'
disease_name = 'Neurodevelopmental disorder with or without variable brain abnormalities'
disease = Disease(disease_id=disease_id, disease_label=disease_name)
encoder.set_disease(disease=disease)

<h2>Getting individual data and exporting to GA4GH Phenopacket Schema format</h2>

In [17]:
individuals = encoder.get_individuals()
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
ERROR,CONFLICT,1
WARNING,REDUNDANT,1
INFORMATION,NOT_MEASURED,14


In [18]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
1 (MALE; P14Y8M),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.65del (heterozygous),"Cerebellar atrophy (HP:0001272); Hypotonia (HP:0001252); Scoliosis (HP:0002650); Emotional lability (HP:0000712); Ataxia (HP:0001251); Global developmental delay (HP:0001263); Autism (HP:0000717); Intellectual disability, moderate (HP:0002342); excluded: Developmental regression (HP:0002376); excluded: Abnormality of movement (HP:0100022); excluded: Cerebral visual impairment (HP:0100704); excluded: Seizure (HP:0001250)"
2 (MALE; P4Y),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.79G>T (heterozygous),"Ataxia (HP:0001251); Hypotonia (HP:0001252); Global developmental delay (HP:0001263); Intellectual disability, severe (HP:0010864); excluded: Developmental regression (HP:0002376); excluded: Autism (HP:0000717); excluded: Abnormality of movement (HP:0100022); excluded: Cerebral visual impairment (HP:0100704); excluded: Seizure (HP:0001250)"
3 (MALE; P4Y),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.111C>G (heterozygous),"Nystagmus (HP:0000639); Hypotonia (HP:0001252); Global developmental delay (HP:0001263); Intellectual disability, moderate (HP:0002342); excluded: Developmental regression (HP:0002376); excluded: Autism (HP:0000717); excluded: Abnormality of movement (HP:0100022); excluded: Cerebral visual impairment (HP:0100704); excluded: Seizure (HP:0001250)"
4 (MALE; P7Y6M),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.1201G>A (heterozygous),"Synophrys (HP:0000664); Finger clinodactyly (HP:0040019); Hypertelorism (HP:0000316); Hypodontia (HP:0000668); Encopresis (HP:0040183); Hearing impairment (HP:0000365); Protruding ear (HP:0000411); Global developmental delay (HP:0001263); Autism (HP:0000717); Intellectual disability, mild (HP:0001256); excluded: Developmental regression (HP:0002376); excluded: Hypotonia (HP:0001252); excluded: Abnormality of movement (HP:0100022); excluded: Cerebral visual impairment (HP:0100704); excluded: Seizure (HP:0001250)"
5 (MALE; P10Y),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.1334T>C (heterozygous),"Hypotonia (HP:0001252); Scoliosis (HP:0002650); Seizure (HP:0001250); Perisylvian polymicrogyria (HP:0012650); Global developmental delay (HP:0001263); Microdontia (HP:0000691); Intellectual disability, moderate (HP:0002342); excluded: Developmental regression (HP:0002376); excluded: Autism (HP:0000717); excluded: Abnormality of movement (HP:0100022)"
6 (FEMALE; P9Y),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.1334T>C (heterozygous),"Perisylvian polymicrogyria (HP:0012650); Hypotonia (HP:0001252); Intellectual disability, mild (HP:0001256); Global developmental delay (HP:0001263); excluded: Developmental regression (HP:0002376); excluded: Autism (HP:0000717); excluded: Seizure (HP:0001250)"
7 (FEMALE; P3Y),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.1577G>A (heterozygous),"Global developmental delay (HP:0001263); Intellectual disability, mild (HP:0001256); excluded: Developmental regression (HP:0002376); excluded: Autism (HP:0000717); excluded: Hypotonia (HP:0001252); excluded: Seizure (HP:0001250)"
8 (FEMALE; P5Y),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.1735C>T (heterozygous),"Hypotonia (HP:0001252); Long philtrum (HP:0000343); Spastic paraplegia (HP:0001258); Seizure (HP:0001250); CNS hypomyelination (HP:0003429); Global developmental delay (HP:0001263); Full cheeks (HP:0000293); Intellectual disability, severe (HP:0010864); Thin corpus callosum (HP:0033725); excluded: Developmental regression (HP:0002376); excluded: Autism (HP:0000717)"
9 (FEMALE; P6Y),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.1735C>T (heterozygous),"Hypotonia (HP:0001252); Thin corpus callosum (HP:0033725); Polymicrogyria (HP:0002126); Small hand (HP:0200055); Seizure (HP:0001250); Hypoplasia of the brainstem (HP:0002365); Reduced cerebral white matter volume (HP:0034295); Global developmental delay (HP:0001263); Syringomyelia (HP:0003396); Lower limb muscle weakness (HP:0007340); Spasticity (HP:0001257); Intellectual disability, moderate (HP:0002342); excluded: Developmental regression (HP:0002376); excluded: Autism (HP:0000717)"
10 (MALE; P4Y),Neurodevelopmental disorder with or without variable brain abnormalities (OMIM:618443),NM_001318852.2:c.2985C>G (heterozygous),"Global developmental delay (HP:0001263); Hypotonia (HP:0001252); Seizure (HP:0001250); Intellectual disability, moderate (HP:0002342); excluded: Developmental regression (HP:0002376); excluded: Autism (HP:0000717)"


<h2>Output results in phenopacket format</h2>

In [19]:
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                              metadata=metadata,
                                              outdir="phenopackets")

We output 13 GA4GH phenopackets to the directory phenopackets


In [20]:
# pxf validate --hpo hp.json *.json
# no errors