<H1>MAPK8IP3: Platzer et al (2019)</H1>
<p>This notebook uses the <a href="https://github.com/monarch-initiative/pyphetools" target="__blank">pyphetools</a> library
to create GA4GH phenopackets from the data in  <a href="https://pubmed.ncbi.nlm.nih.gov/30612693/" target="__blank">Platzer K., et al. (2019) De Novo Variants in MAPK8IP3 Cause Intellectual Disability with Variable Brain Anomalies</a>. See the <a href="https://monarch-initiative.github.io/pyphetools/index.html" target="__blank">Pyphetools documentation</a> for more information about the code.</p>
<p>The original article describes de novo variants in MAPK8IP3 in 13 unrelated individuals presenting with an overlapping phenotype of mild to severe intellectual disability. </p>
<p>This notebook parses the information in Supplemental Table S1 (an Excel file).</p>

In [1]:
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.json_format import Parse, ParseDict
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
from pyphetools.creation import *
import importlib.metadata
__version__ = importlib.metadata.version("pyphetools")
print(f"Using pyphetools version {__version__}")

Using pyphetools version 0.4.12


<h2>Importing HPO data</h2>

In [2]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199")
metadata.default_versions_with_hpo(version=hpo_version)

<H2>Importing supplemental file S1.</H2>

In [3]:
df = pd.read_excel('input/platzer_2019_supplement.xlsx')
df.head()

Unnamed: 0,Indvidual\nin\nmanuscript,g.(hg19) Chr16:,Transcript\nNM_015133.4\nc.,p.,origin,genetic testing,Sex,age at last assesment,prenatal period,Exam at birth,...,neurological examination,result of external MRI,seizures,Sz onset and Sz types,AEDs used,Sz outcome,EEG,Additional symptoms,family history,further results of genetic testing
0,1,1756405,c.65delG,p.Gly22Alafs*3,de novo,TrioWES,M,14 y 8 m,,41 weeks:\nlength: 53.3 cm\nweight: 3.941 kg\nOFC: NA,...,ataxia,"mild cerebellar atrophy, hypointensity of the globi pallidi and substantia nigra, possible mild degree of abnormal iron or mineral deposition",no,,,,,"speech is ataxic but speaks in sentences/short phrases; attention issues, impulse control and emotional lability, OCD symptoms; recently developed scoliosis",unremarkable,
1,2,1756419,c.79G>T,p.Glu27*,de novo,SingleWES,M,4 y,,length: 49 cm\nweigth: 3215 g\nOFC: 35 cm,...,ataxia,normal,no,,,,,pre-natal pelvi-ureteric junction stenosis (spontaneous resolution at 6 m),,
2,3,1756451,c.111C>G,p.Tyr37*,de novo,TrioWES,M,4 y,,length: 20.5 in\nweight: 8 lb 2 oz\nOFC: NA,...,,Stable areas of T2 hyperintensity involving the central tegmental tracts,no,,,,,Nystagmus,unremarkable,770 kb duplicaion of 20p12.3 on chromosome microarray
3,4,1798706,c.1198G>A,p.Gly400Arg,de novo,TrioWES,M,7 y 6 m,"no prenatal care, no known problems","32 weeks:\nlength: NA,\nweight: 4 lbs,\nOFC: NA\n\nhad a 30 day hospital course",...,,no MRI done,no,,,,,"Left hearing loss; Dysmorphic features: hypertelorism inner canthal distance 4.3cm; low set prominent ears, slight overhangin columella, hypodontia; 5th finger clinodactyly and 5th finger brachydactylky; synophrys; Encopresis",Mother with learning disorder; finished 11th grade; Father with ADHD and learning disorder; finished 9th grade; Full sister with learning disorder; Full sister no known problems; Full brother with learning disorder,
4,5,1810410,c.1331T>C,p.Leu444Pro,de novo,TrioWES,M,10 y,,"40 weeks, length: 52 cm\nweight: 3810 g\nOFC: 36 cm",...,,perisylvian polymicrogyria,yes,10 y:\none event of a generalized seizure,,,"pathological EEG with normal age-related background activity (alpha-type), increased appearance of slowing over temporal and occipital regions","no dysmorphism, small teeth, severe s-configured scoliosis of thoracic and lumbar spine",,


<h2>Collecting column mappers</h2>

In [4]:
column_mapper_d = defaultdict(ColumnMapper)

In [5]:
neuro_exam_custom_map = {'low extremity weakness': 'Lower limb muscle weakness',  
                         'unstable gait': 'Unsteady gait',
                         'dysfunction of the corticospinal pathways':'Upper motor neuron dysfunction',
                         'spastic': 'Spasticity',
                         'orobuccal dyspraxia': 'Oromotor apraxia',
                         'difficulty in coordination':'Poor coordination'
                        }
neuroMapper = CustomColumnMapper(concept_recognizer=hpo_cr, custom_map_d=neuro_exam_custom_map, )
#neuroMapper.preview_column(df['neurological examination'])
column_mapper_d['neurological examination'] = neuroMapper

In [6]:
severity_d = {'moderate\n(IQ 48)':'Intellectual disability, moderate',
             'moderate':'Intellectual disability, moderate',
             'moderate\n(IQ 49)': 'Intellectual disability, moderate',
             'severe': 'Intellectual disability, severe',
             'mild': 'Intellectual disability, mild'}
severityOfIdMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=severity_d)
#severityOfIdMapper.preview_column(df['severity of ID'])
column_mapper_d['severity of ID'] = severityOfIdMapper

In [7]:
mri_custom_map = {'hypomyelination': 'CNS hypomyelination',  
                  'thinning of CC': 'Thin corpus callosum',
                  'white matter volume loss':'Reduced cerebral white matter volume',
                  'widened lateral ventricles': 'Lateral ventricle dilatation',
                  'dysgenesis of corpus callosum': 'Dysplastic corpus callosum',
                  'hypoplasia of mesencephalon and brainstem': 'Hypoplasia of the brainstem'
                  }
mriMapper = CustomColumnMapper(concept_recognizer=hpo_cr, custom_map_d=mri_custom_map, )
#mriMapper.preview_column(df['result of external MRI'])
column_mapper_d['result of external MRI'] = mriMapper

In [8]:
additional_custom_map = {'OCD': 'Obsessive-compulsive behavior',  
                         '5th finger clinodactyly': 'Clinodactyly of the 5th finger',
                         'small teeth':'Reduced cerebral white matter volume',
                         'widened lateral ventricles': 'Lateral ventricle dilatation',
                         'dysgenesis of corpus callosum': 'Dysplastic corpus callosum',
                         'dramatic increased weight': 'Obesity'
                        }
excluded = {'pseudostrabismus'}
additionalFeaturesMapper = CustomColumnMapper(concept_recognizer=hpo_cr, 
                                              custom_map_d=mri_custom_map, 
                                              excluded_set=excluded)
additionalFeaturesMapper.preview_column(df['Additional symptoms'])
column_mapper_d['Additional symptoms'] = additionalFeaturesMapper

<h2>Simple mappers</h2>

In [9]:
items = {
    'regression': ["Developmental regression","HP:0002376"],
    'autism': ['Autism', 'HP:0000717'],
    'hypotonia': ['Hypotonia', 'HP:0001252'],
    'movement disorder': ['Abnormality of movement', 'HP:0100022'],
    'CVI': ['Cerebral visual impairment', 'HP:0100704'], # CVI stands for Cortical visual impairment HP:0100704
    'seizures': ['Seizure','HP:0001250'],
    'DD': ['Global developmental delay', 'HP:0001263']
}
item_column_mapper_d = hpo_cr.initialize_simple_column_maps(column_name_to_hpo_label_map=items, observed='yes',
    excluded='no')
print(f"We created {len(item_column_mapper_d)} simple column mappers")
# Transfer to column_mapper_d
for k, v in item_column_mapper_d.items():
    column_mapper_d[k] = v

We created 7 simple column mappers


<h2>Option mapper</h2>

In [11]:
severity_d = {'moderate\n(IQ 48)':'Intellectual disability, moderate',
             'moderate':'Intellectual disability, moderate',
             'moderate\n(IQ 49)': 'Intellectual disability, moderate',
             'severe': 'Intellectual disability, severe',
             'mild': 'Intellectual disability, mild'}
severityOfIdMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=severity_d)
severityOfIdMapper.preview_column(df['severity of ID'])
column_mapper_d['severity of ID'] = severityOfIdMapper

In [12]:
mri_custom_map = {'hypomyelination': 'CNS hypomyelination',  
                  'thinning of CC': 'Thin corpus callosum',
                  'white matter volume loss':'Reduced cerebral white matter volume',
                  'widened lateral ventricles': 'Lateral ventricle dilatation',
                  'dysgenesis of corpus callosum': 'Dysplastic corpus callosum',
                  'hypoplasia of mesencephalon and brainstem': 'Hypoplasia of the brainstem'
                  }
mriMapper = CustomColumnMapper(concept_recognizer=hpo_cr, custom_map_d=mri_custom_map, )
mriMapper.preview_column(df['result of external MRI'])
column_mapper_d['result of external MRI'] = mriMapper

In [13]:
additional_custom_map = {'OCD': 'Obsessive-compulsive behavior',  
                         '5th finger clinodactyly': 'Clinodactyly of the 5th finger',
                         'small teeth':'Reduced cerebral white matter volume',
                         'widened lateral ventricles': 'Lateral ventricle dilatation',
                         'dysgenesis of corpus callosum': 'Dysplastic corpus callosum',
                         'dramatic increased weight': 'Obesity'
                        }
excluded = {'pseudostrabismus'}
additionalFeaturesMapper = CustomColumnMapper(concept_recognizer=hpo_cr, 
                                              custom_map_d=mri_custom_map, 
                                              excluded_set=excluded)
additionalFeaturesMapper.preview_column(df['Additional symptoms'])
column_mapper_d['Additional symptoms'] = additionalFeaturesMapper

<H1>Mapping variants</H1>
<p>pyphetools maps variants using the VariantValidator API.</p>

In [14]:
genome = 'hg38'
transcript='NM_015133.4'
varMapper = VariantColumnMapper(assembly=genome,
                                column_name='Transcript\nNM_015133.4\nc.', 
                                transcript=transcript, 
                                default_genotype='heterozygous')

In [15]:
varMapper.preview_column(df['Transcript\nNM_015133.4\nc.'])

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.65delG/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.79G>T/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.111C>G/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1198G>A  /NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1331T>C/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1331T>C/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1574G>A/NM_015133.4?content-type=application%2Fjson
https://rest.va

Unnamed: 0,variant
0,NM_015133.4:c.65del
1,NM_015133.4:c.79G>T
2,NM_015133.4:c.111C>G
3,NM_015133.4:c.1198G>A
4,NM_015133.4:c.1331T>C
5,NM_015133.4:c.1331T>C
6,NM_015133.4:c.1574G>A
7,NM_015133.4:c.1732C>T
8,NM_015133.4:c.1732C>T
9,NM_015133.4:c.2982C>G


In [16]:
ageMapper = AgeColumnMapper.by_year_and_month('age at last assesment')
ageMapper.preview_column(df['age at last assesment'])

sexMapper = SexColumnMapper(male_symbol='M', female_symbol='F', column_name='Sex')
sexMapper.preview_column(df['Sex'])

individual_column_name = 'Indvidual\nin\nmanuscript'

pmid = "PMID:30612693"
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_d=column_mapper_d, 
                        individual_column_name=individual_column_name,
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        variant_mapper=varMapper,
                        metadata=metadata,
                        pmid=pmid)
disease_id = 'OMIM:618443'
disease_name = 'Neurodevelopmental disorder with or without variable brain abnormalities'

encoder.set_disease(disease_id=disease_id, label=disease_name)

In [17]:
encoder.preview_dataframe()

Unnamed: 0_level_0,sex,age,phenotypic features
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,MALE,P14Y8M,"Ataxia (HP:0001251)\nIntellectual disability, moderate (HP:0002342)\nIntellectual disability, moderate (HP:0002342)\nCerebellar atrophy (HP:0001272)\nEmotional lability (HP:0000712)\nScoliosis (HP:0002650)\nexcluded: Developmental regression (HP:0002376)\nAutism (HP:0000717)\nHypotonia (HP:0001252)\nexcluded: Abnormality of movement (HP:0100022)\nexcluded: Cerebral visual impairment (HP:0100704)\nexcluded: Seizure (HP:0001250)\nGlobal developmental delay (HP:0001263)"
2,MALE,P4Y,"Ataxia (HP:0001251)\nIntellectual disability, severe (HP:0010864)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nHypotonia (HP:0001252)\nexcluded: Abnormality of movement (HP:0100022)\nexcluded: Cerebral visual impairment (HP:0100704)\nexcluded: Seizure (HP:0001250)\nGlobal developmental delay (HP:0001263)"
3,MALE,P4Y,"Intellectual disability, moderate (HP:0002342)\nNystagmus (HP:0000639)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nHypotonia (HP:0001252)\nexcluded: Abnormality of movement (HP:0100022)\nexcluded: Cerebral visual impairment (HP:0100704)\nexcluded: Seizure (HP:0001250)\nGlobal developmental delay (HP:0001263)"
4,MALE,P7Y6M,"Intellectual disability, mild (HP:0001256)\nHearing impairment (HP:0000365)\nHypertelorism (HP:0000316)\nProtruding ear (HP:0000411)\nHypodontia (HP:0000668)\nFinger clinodactyly (HP:0040019)\nSynophrys (HP:0000664)\nEncopresis (HP:0040183)\nexcluded: Developmental regression (HP:0002376)\nAutism (HP:0000717)\nexcluded: Hypotonia (HP:0001252)\nexcluded: Abnormality of movement (HP:0100022)\nexcluded: Cerebral visual impairment (HP:0100704)\nexcluded: Seizure (HP:0001250)\nGlobal developmental delay (HP:0001263)"
5,MALE,P10Y,"Intellectual disability, moderate (HP:0002342)\nPerisylvian polymicrogyria (HP:0012650)\nMicrodontia (HP:0000691)\nScoliosis (HP:0002650)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nHypotonia (HP:0001252)\nexcluded: Abnormality of movement (HP:0100022)\nnot measured: Cerebral visual impairment (HP:0100704)\nSeizure (HP:0001250)\nGlobal developmental delay (HP:0001263)"
6,FEMALE,P9Y,"Intellectual disability, mild (HP:0001256)\nPerisylvian polymicrogyria (HP:0012650)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nHypotonia (HP:0001252)\nnot measured: Abnormality of movement (HP:0100022)\nnot measured: Cerebral visual impairment (HP:0100704)\nexcluded: Seizure (HP:0001250)\nGlobal developmental delay (HP:0001263)"
7,FEMALE,P3Y,"Intellectual disability, mild (HP:0001256)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nexcluded: Hypotonia (HP:0001252)\nnot measured: Abnormality of movement (HP:0100022)\nnot measured: Cerebral visual impairment (HP:0100704)\nexcluded: Seizure (HP:0001250)\nGlobal developmental delay (HP:0001263)"
8,FEMALE,P5Y,"Spastic paraplegia (HP:0001258)\nIntellectual disability, severe (HP:0010864)\nCNS hypomyelination (HP:0003429)\nThin corpus callosum (HP:0033725)\nFull cheeks (HP:0000293)\nLong philtrum (HP:0000343)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nHypotonia (HP:0001252)\nnot measured: Abnormality of movement (HP:0100022)\nnot measured: Cerebral visual impairment (HP:0100704)\nSeizure (HP:0001250)\nGlobal developmental delay (HP:0001263)"
9,FEMALE,P6Y,"Spasticity (HP:0001257)\nLower limb muscle weakness (HP:0007340)\nUpper motor neuron dysfunction (HP:0002493)\nIntellectual disability, moderate (HP:0002342)\nThin corpus callosum (HP:0033725)\nHypoplasia of the brainstem (HP:0002365)\nReduced cerebral white matter volume (HP:0034295)\nThin corpus callosum (HP:0033725)\nPolymicrogyria (HP:0002126)\nReduced cerebral white matter volume (HP:0034295)\nPica (HP:0011856)\nSyringomyelia (HP:0003396)\nSmall hand (HP:0200055)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nHypotonia (HP:0001252)\nnot measured: Abnormality of movement (HP:0100022)\nnot measured: Cerebral visual impairment (HP:0100704)\nSeizure (HP:0001250)\nGlobal developmental delay (HP:0001263)"
10,MALE,P4Y,"Intellectual disability, moderate (HP:0002342)\nexcluded: Developmental regression (HP:0002376)\nexcluded: Autism (HP:0000717)\nHypotonia (HP:0001252)\nnot measured: Abnormality of movement (HP:0100022)\nnot measured: Cerebral visual impairment (HP:0100704)\nSeizure (HP:0001250)\nGlobal developmental delay (HP:0001263)"


<h2>Getting individual data and exporting to GA4GH Phenopacket Schema format</h2>

In [18]:
individuals = encoder.get_individuals()
print(f"We extracted {len(individuals)} individuals")

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.65delG/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.79G>T/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.111C>G/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1198G>A  /NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1331T>C/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1331T>C/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1574G>A/NM_015133.4?content-type=application%2Fjson
https://rest.va

In [19]:
i1 = individuals[0]
phenopacket1 = i1.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh())
json_string = MessageToJson(phenopacket1)
print(json_string)

{
  "id": "1",
  "subject": {
    "id": "1",
    "timeAtLastEncounter": {
      "age": {
        "iso8601duration": "P14Y8M"
      }
    },
    "sex": "MALE"
  },
  "phenotypicFeatures": [
    {
      "type": {
        "id": "HP:0001251",
        "label": "Ataxia"
      }
    },
    {
      "type": {
        "id": "HP:0002342",
        "label": "Intellectual disability, moderate"
      }
    },
    {
      "type": {
        "id": "HP:0002342",
        "label": "Intellectual disability, moderate"
      }
    },
    {
      "type": {
        "id": "HP:0001272",
        "label": "Cerebellar atrophy"
      }
    },
    {
      "type": {
        "id": "HP:0000712",
        "label": "Emotional lability"
      }
    },
    {
      "type": {
        "id": "HP:0002650",
        "label": "Scoliosis"
      }
    },
    {
      "type": {
        "id": "HP:0002376",
        "label": "Developmental regression"
      },
      "excluded": true
    },
    {
      "type": {
        "id": "HP:0000717",
 

<h2>Output results in phenopacket format</h2>

In [20]:
Individual.output_individuals_as_phenopackets(individual_list=individuals, 
                                              pmid=pmid, 
                                              metadata=metadata.to_ga4gh(), 
                                              outdir="phenopackets")

13