<H1>MAPK8IP3: Iwasawa et al. (2019)</H1>
<p>This notebook uses the <a href="https://github.com/monarch-initiative/pyphetools" target="__blank">pyphetools</a> library
to create GA4GH phenopackets from the data in <a href="https://pubmed.ncbi.nlm.nih.gov/30945334/" target="__blank">Iwasawa S., et al. (2019) Recurrent de novo MAPK8IP3 variants cause neurological phenotypes</a>. See the <a href="https://monarch-initiative.github.io/pyphetools/index.html" target="__blank">Pyphetools documentation</a> for more information about the code.</p>
<p>The original article describes five individuals from four families with the recurrent de novo variants c.1732C>T (p.Arg578Cys) and c.3436C>T (p.Arg1146Cys) in MAPK8IP3.</p>
<p>This notebook parses the information in the Supplemental Table (an Excel file).</p>

In [1]:
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.json_format import Parse, ParseDict
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
from pyphetools.creation import *
import importlib.metadata
__version__ = importlib.metadata.version("pyphetools")
print(f"Using pyphetools version {__version__}")

Using pyphetools version 0.4.13


<h2>Importing HPO data</h2>

In [2]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199")
metadata.default_versions_with_hpo(version=hpo_version)

<H2>Importing the supplemental file</H2>

In [4]:
df = pd.read_excel('input/PMID_30945334.xlsx')
df.head()

Unnamed: 0,identifier,Individual 1,Individual 2,Individual 3,Individual 4,Individual 5
0,"Variant (hg19, NM_015133.4)",c.1732C>T,c.1732C>T,c.1732C>T,c.3436C>T,c.3436C>T
1,Protein variant,(p.Arg578Cys),(p.Arg578Cys),(p.Arg578Cys),(p.Arg1146Cys),(p.Arg1146Cys)
2,Age (yr),29,27,16,5,5
3,Sex,Male,Female,Male,Male,Female
4,Gestational ages (weeks),39,40,40,36,41


<h2>Converting to row-based format</h2>

In [5]:
dft = df.transpose()
dft.columns = dft.iloc[0]
dft.drop(dft.index[0], inplace=True)
dft['patient_id'] = dft.index
dft.head()

identifier,"Variant (hg19, NM_015133.4)",Protein variant,Age (yr),Sex,Gestational ages (weeks),Delayed motor development,Age at head control (months),Age at rolling (months),Age at unsupported sitting (months),Age at crawling (months),...,Facial dysmorphism,Round face,Prominent nasal bridge,Thin upper lip,Others,Other,Short stature,Obesity,Precocious puberty,patient_id
Individual 1,c.1732C>T,(p.Arg578Cys),29,Male,39,+,2.5,ND,7,Not acquired,...,,+,−,+,,,+,+,+,Individual 1
Individual 2,c.1732C>T,(p.Arg578Cys),27,Female,40,+,3.5,11,6,11,...,,+,−,+,,,+,+,+,Individual 2
Individual 3,c.1732C>T,(p.Arg578Cys),16,Male,40,+,4.0,6,Not acquired,ND,...,,−,+,+,,,+,+,ND,Individual 3
Individual 4,c.3436C>T,(p.Arg1146Cys),5,Male,36,+,5.0,7,15,18,...,,+,+,+,,,+,−,−,Individual 4
Individual 5,c.3436C>T,(p.Arg1146Cys),5,Female,41,+,5.0,6,11,18,...,,+,+,+,"Long and thick eyebrows, upper slanted palpebral fissures, anteverted nares, short philtrum",,−,−,−,Individual 5
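<p>The transposition above turns a table with one column per individual into one row per individual. The following is a minimal sketch of the same idea using only the Python standard library; the function name <code>transpose_to_records</code> and the two-row miniature table are hypothetical and are not part of pyphetools or pandas.</p>

```python
# Hypothetical miniature of the supplemental table: one row per attribute,
# one column per individual (values copied from the preview above).
header = ["identifier", "Individual 1", "Individual 2"]
rows = [
    ["Age (yr)", "29", "27"],
    ["Sex", "Male", "Female"],
]


def transpose_to_records(header, rows):
    """Return one dict per individual, keyed by attribute name."""
    attribute_names = [row[0] for row in rows]
    records = []
    # header[1:] holds the individual labels; column i holds their values.
    for col_index, patient_id in enumerate(header[1:], start=1):
        record = {name: row[col_index] for name, row in zip(attribute_names, rows)}
        # Mirror the notebook, which stores the row label in 'patient_id'.
        record["patient_id"] = patient_id
        records.append(record)
    return records


records = transpose_to_records(header, rows)
for record in records:
    print(record)
```

<p>Each resulting record corresponds to one row of <code>dft</code>, with the former row labels of the spreadsheet acting as column names.</p>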


<h2>Column mappers</h2>

In [6]:
column_mapper_d = defaultdict(ColumnMapper)

In [24]:
delayedMotorMapper = SimpleColumnMapper(hpo_id='HP:0001270',
    hpo_label='Motor delay',
    observed='+',
    excluded='-')
delayedMotorMapper.preview_column(dft['Delayed motor development'])
column_mapper_d['Delayed motor development'] = delayedMotorMapper

Unnamed: 0,term,status
0,Motor delay (HP:0001270),observed
1,Motor delay (HP:0001270),observed
2,Motor delay (HP:0001270),observed
3,Motor delay (HP:0001270),observed
4,Motor delay (HP:0001270),observed
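<p>A simple column mapper calls a term as observed or excluded depending on the symbol in the cell. The table preview above displays some cells as "−" (U+2212, the Unicode minus sign), while the mapper is configured with the ASCII hyphen "-". If the spreadsheet really contains U+2212, those cells would presumably not match the excluded symbol. The sketch below shows one way to normalize such symbols before mapping; <code>normalize_symbol</code> and <code>call_status</code> are hypothetical helpers, not pyphetools functions.</p>

```python
# Characters that can look like a minus sign in spreadsheet cells,
# written as escapes so the code itself stays plain ASCII.
MINUS_LIKE = {
    "\u2212": "-",  # Unicode minus sign
    "\u2013": "-",  # en dash
}


def normalize_symbol(cell):
    """Strip whitespace and replace minus-like characters with ASCII '-'."""
    if not isinstance(cell, str):
        return cell
    cleaned = cell.strip()
    for char, replacement in MINUS_LIKE.items():
        cleaned = cleaned.replace(char, replacement)
    return cleaned


def call_status(cell, observed="+", excluded="-"):
    """Return 'observed', 'excluded', or None when the cell matches neither."""
    symbol = normalize_symbol(cell)
    if symbol == observed:
        return "observed"
    if symbol == excluded:
        return "excluded"
    return None


for cell in ["+", "\u2212", "ND"]:
    print(repr(cell), call_status(cell))
```

<p>With pandas, a comparable cleanup could be applied to the whole table before the mappers run, for example with <code>dft.replace("\u2212", "-")</code>; whether this is needed depends on the actual contents of the Excel file.</p>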


In [8]:
headLagMapper = ThresholdedColumnMapper(hpo_id="HP:0032988", hpo_label="Persistent head lag", 
                                        threshold=4, call_if_above=True)
headLagMapper.preview_column(dft["Age at head control (months)"])
column_mapper_d["Age at head control (months)"] = headLagMapper

In [9]:
rollOverMapper = ThresholdedColumnMapper(hpo_id="HP:0032989", hpo_label="Delayed ability to roll over",
                                        threshold=6, call_if_above=True)
rollOverMapper.preview_column(dft["Age at rolling (months)"])
column_mapper_d["Age at rolling (months)"] = rollOverMapper

In [10]:
# Age at unsupported sitting (months) 	threshold: 9 months
delayedSittingMapper =  ThresholdedColumnMapper(hpo_id="HP:0025336", hpo_label="Delayed ability to sit", 
                                        threshold=9, call_if_above=True, observed_code='Not acquired')
delayedSittingMapper.preview_column(dft["Age at unsupported sitting (months)"])
column_mapper_d["Age at unsupported sitting (months)"] = delayedSittingMapper

In [11]:
# Age at walking (months) - 15 months -- Delayed ability to walk HP:0031936
delayedWalkingMapper =  ThresholdedColumnMapper(hpo_id="HP:0031936", hpo_label="Delayed ability to walk", 
                                        threshold=15, call_if_above=True, observed_code='Not acquired')
delayedWalkingMapper.preview_column(dft["Age at walking (months)"])
column_mapper_d["Age at walking (months)"] = delayedWalkingMapper
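<p>The thresholded mappers above turn a milestone age into an observed or excluded HPO term. The sketch below illustrates the decision rule as this notebook uses it; it is not the pyphetools implementation. The function name <code>call_threshold</code> is hypothetical, and treating a value exactly equal to the threshold as not delayed is an assumption.</p>

```python
def call_threshold(cell, threshold, observed_code=None, call_if_above=True):
    """Return 'observed', 'excluded', or None if the cell cannot be interpreted.

    Assumption: the comparison is strict, so a value equal to the
    threshold is not called as abnormal.
    """
    if cell is None:
        return None
    if isinstance(cell, str):
        text = cell.strip()
        # A code such as 'Not acquired' means the milestone was never reached.
        if observed_code is not None and text == observed_code:
            return "observed"
        try:
            value = float(text)
        except ValueError:
            return None  # e.g. 'ND' (no data): make no call
    else:
        value = float(cell)
    abnormal = value > threshold if call_if_above else value < threshold
    return "observed" if abnormal else "excluded"


# Values for Individual 1 taken from the table preview above.
print(call_threshold(2.5, threshold=4))    # head control at 2.5 months
print(call_threshold("ND", threshold=6))   # rolling: no data
print(call_threshold(7, threshold=9, observed_code="Not acquired"))
```

<p>Under these assumptions, head control at 2.5 months and sitting at 7 months give excluded calls and "ND" gives no call, which agrees with the phenopacket printed for Individual 1 further down.</p>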

In [12]:
# Map each column name to [HPO label, HPO id].
items = {
    'History of regression': ['Developmental regression', 'HP:0002376'],
    'Spastic diplegia': ['Spastic diplegia', 'HP:0001264'],
    'Autistic behavior': ['Autistic behavior', 'HP:0000729'],
    'Infantile hypotonia': ['Infantile muscular hypotonia', 'HP:0008947'],
    'Cerebral atrophy': ['Cerebral atrophy', 'HP:0002059'],
    'Delayed myelination': ['Delayed CNS myelination', 'HP:0002188'],
    'Corpus callosum hypoplasia': ['Hypoplasia of the corpus callosum', 'HP:0002079'],
    'Prominent nasal bridge': ['Prominent nasal bridge', 'HP:0000426'],
    'Thin upper lip': ['Thin upper lip vermilion', 'HP:0000219'],
    'Round face': ['Round face', 'HP:0000311'],
    'Short stature': ['Short stature', 'HP:0004322'],
    'Obesity': ['Obesity', 'HP:0001513'],
    'Precocious puberty': ['Precocious puberty', 'HP:0000826'],
}
item_column_mapper_d = hpo_cr.initialize_simple_column_maps(column_name_to_hpo_label_map=items,
                                                            observed='+',
                                                            excluded='-')
print(f"We created {len(item_column_mapper_d)} simple column mappers")
# Transfer to column_mapper_d
for k, v in item_column_mapper_d.items():
    column_mapper_d[k] = v

We created 13 simple column mappers


In [13]:
severity_d = {'Severe': 'Intellectual disability, severe',
             'Profound': 'Intellectual disability, profound'}
idMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=severity_d)
#idMapper.preview_column(dft['Intellectual disability'])
column_mapper_d['Intellectual disability'] = idMapper

In [14]:
# Language skills
language_d = {'Simple two-word sentences': 'Delayed speech and language development',
             'Simple words': 'Delayed speech and language development',
             'Nonverbal': 'Absent speech'}
languageMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=language_d)
# languageMapper.preview_column(dft['Language skills'])
column_mapper_d['Language skills'] = languageMapper

In [15]:
# Gross motor skills Wheelchair bound 	Wheelchair bound 	Wheelchair bound 	Cruising (5y)	Walking  (5y)
gms_d = {
    "Wheelchair bound": "Loss of ambulation",
    "Cruising": "Delayed gross motor development"
}
gmsMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=gms_d)
# gmsMapper.preview_column(dft['Gross motor skills'])
column_mapper_d['Gross motor skills'] = gmsMapper

In [16]:
# Others
other_d = {'upper slanted palpebral fissures': 'Upslanted palpebral fissure'}
otherMapper = CustomColumnMapper(concept_recognizer=hpo_cr, custom_map_d=other_d)
#otherMapper.preview_column(dft['Others'])
column_mapper_d['Others'] = otherMapper
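<p>The option mappers above associate free-text cell contents with HPO labels. The sketch below shows the general idea with case-insensitive substring matching, which is an assumption about the matching strategy and not a description of how pyphetools does it; <code>match_options</code> is a hypothetical helper. The example cells come from the comment in the gross motor skills cell.</p>

```python
gms_options = {
    "Wheelchair bound": "Loss of ambulation",
    "Cruising": "Delayed gross motor development",
}


def match_options(cell, option_d):
    """Return the HPO labels whose option key occurs in the cell text."""
    if not isinstance(cell, str):
        return []
    lowered = cell.lower()
    # Assumption: a key matches if it appears anywhere in the cell.
    return [label for key, label in option_d.items() if key.lower() in lowered]


for cell in ["Wheelchair bound", "Cruising (5y)", "Walking  (5y)"]:
    print(repr(cell), "->", match_options(cell, gms_options))
```

<p>Cells without a matching key, such as "Walking  (5y)", yield no term, so no phenotypic feature is recorded for that column.</p>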

<h2>Variant Data</h2>
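<p>The variant data (HGVS notation on transcript NM_015133.4) are listed in the Variant (hg19, NM_015133.4) column. Because the c. notation is relative to the transcript, the variants can be resolved against hg38 even though the column header mentions hg19.</p>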

In [17]:
genome = 'hg38'
transcript='NM_015133.4'
varMapper = VariantColumnMapper(assembly=genome,
                                column_name='Variant (hg19, NM_015133.4)', 
                                transcript=transcript, 
                                default_genotype='heterozygous')
varMapper.preview_column(dft['Variant (hg19, NM_015133.4)'])

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1732C>T/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1732C>T/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1732C>T/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.3436C>T/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.3436C>T/NM_015133.4?content-type=application%2Fjson


Unnamed: 0,variant
0,NM_015133.4:c.1732C>T
1,NM_015133.4:c.1732C>T
2,NM_015133.4:c.1732C>T
3,NM_015133.4:c.3436C>T
4,NM_015133.4:c.3436C>T
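<p>The log lines above show the VariantValidator request made for each variant. The sketch below rebuilds such a URL from its parts using only the standard library, to make the structure of the request explicit. It follows the pattern visible in the log and sends no request; the function name <code>variant_validator_url</code> is hypothetical, and pyphetools may construct the URL differently.</p>

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"


def variant_validator_url(assembly, transcript, hgvs_c):
    """Build a request URL following the pattern shown in the log output."""
    hgvs = f"{transcript}:{hgvs_c}"
    # Encode ':' as %3A but leave '>' as is, matching the logged URLs.
    encoded_hgvs = quote(hgvs, safe=">")
    query = urlencode({"content-type": "application/json"})
    return f"{BASE_URL}/{assembly}/{encoded_hgvs}/{transcript}?{query}"


url = variant_validator_url("hg38", "NM_015133.4", "c.1732C>T")
print(url)
```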


<h2>Demographic data</h2>

In [18]:
ageMapper = AgeColumnMapper.by_year('Age (yr)')
ageMapper.preview_column(dft['Age (yr)'])

Unnamed: 0,original column contents,age
0,29,P29Y
1,27,P27Y
2,16,P16Y
3,5,P5Y
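<p>The age column is converted to ISO 8601 durations such as <code>P29Y</code>. A minimal sketch of that conversion for whole years is shown below; <code>years_to_iso8601</code> is a hypothetical helper and handles only integer year values, which is all this table contains.</p>

```python
def years_to_iso8601(cell):
    """Convert a whole number of years to an ISO 8601 duration, or None."""
    try:
        years = int(str(cell).strip())
    except ValueError:
        return None  # not a whole number, e.g. 'ND'
    if years < 0:
        return None
    return f"P{years}Y"


for cell in ["29", 16, "ND"]:
    print(repr(cell), "->", years_to_iso8601(cell))
```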


In [19]:
sexMapper = SexColumnMapper(male_symbol='Male', female_symbol='Female', column_name='Sex')
sexMapper.preview_column(dft['Sex'])


Unnamed: 0,original column contents,sex
0,Male,MALE
1,Female,FEMALE
2,Male,MALE
3,Male,MALE
4,Female,FEMALE


In [20]:
pmid = "PMID:30945334"
encoder = CohortEncoder(df=dft, hpo_cr=hpo_cr, column_mapper_d=column_mapper_d, 
                        individual_column_name="patient_id", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        metadata=metadata,
                        variant_mapper=varMapper,
                        pmid=pmid)
disease_id = "OMIM:618443"
disease_label = "Neurodevelopmental disorder with or without variable brain abnormalities"
encoder.set_disease(disease_id=disease_id, label=disease_label)

In [21]:
individuals = encoder.get_individuals()

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1732C>T/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1732C>T/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.1732C>T/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.3436C>T/NM_015133.4?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_015133.4%3Ac.3436C>T/NM_015133.4?content-type=application%2Fjson


In [22]:
i1 = individuals[0]
phenopacket1 = i1.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh())
json_string = MessageToJson(phenopacket1)
print(json_string)

{
  "id": "PMID_30945334_individual_Individual 1",
  "subject": {
    "id": "Individual 1",
    "timeAtLastEncounter": {
      "age": {
        "iso8601duration": "P29Y"
      }
    },
    "sex": "MALE"
  },
  "phenotypicFeatures": [
    {
      "type": {
        "id": "HP:0001270",
        "label": "Motor delay"
      }
    },
    {
      "type": {
        "id": "HP:0032988",
        "label": "Persistent head lag"
      },
      "excluded": true
    },
    {
      "type": {
        "id": "HP:0025336",
        "label": "Delayed ability to sit"
      },
      "excluded": true
    },
    {
      "type": {
        "id": "HP:0031936",
        "label": "Delayed ability to walk"
      }
    },
    {
      "type": {
        "id": "HP:0001264",
        "label": "Spastic diplegia"
      }
    },
    {
      "type": {
        "id": "HP:0002059",
        "label": "Cerebral atrophy"
      }
    },
    {
      "type": {
        "id": "HP:0002188",
        "label": "Delayed CNS myelination"
      }


In [23]:
Individual.output_individuals_as_phenopackets(individual_list=individuals, pmid=pmid, metadata=metadata.to_ga4gh(), outdir="phenopackets")

We output 5 GA4GH phenopackets to the directory phenopackets
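<p>Once the phenopackets are written, a quick check is to count the observed and excluded features in each one. The sketch below does this for an inline excerpt of the JSON printed above for Individual 1. The helper <code>split_features</code> is hypothetical; the names of the files in the output directory are not shown here, so the example parses a string rather than reading from disk.</p>

```python
import json

# Excerpt of the phenopacket printed above (first three features only).
excerpt = """
{
  "phenotypicFeatures": [
    {"type": {"id": "HP:0001270", "label": "Motor delay"}},
    {"type": {"id": "HP:0032988", "label": "Persistent head lag"}, "excluded": true},
    {"type": {"id": "HP:0025336", "label": "Delayed ability to sit"}, "excluded": true}
  ]
}
"""


def split_features(phenopacket):
    """Return (observed_labels, excluded_labels) for a phenopacket dict."""
    observed, excluded = [], []
    for feature in phenopacket.get("phenotypicFeatures", []):
        label = feature["type"]["label"]
        # In the JSON form, 'excluded' is only present when it is true.
        if feature.get("excluded", False):
            excluded.append(label)
        else:
            observed.append(label)
    return observed, excluded


observed, excluded = split_features(json.loads(excerpt))
print("observed:", observed)
print("excluded:", excluded)
```

<p>The same function could be applied to each file in the output directory after loading it with <code>json.load</code>.</p>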
