<h1>Gain and loss of function variants in EZH1 disrupt neurogenesis and cause dominant and recessive neurodevelopmental disorders</h1>
<p>Extract the clinical data from <a href="https://pubmed.ncbi.nlm.nih.gov/37433783/" target="__blank">Gracia-Diaz C, et al. (2023) Gain and loss of function variants in EZH1 disrupt neurogenesis and cause dominant and recessive neurodevelopmental disorders. Nat Commun. 14:4109. (PMID:37433783)</a>.</p>
<p>The authors found differences between individuals with monoallelic and biallelic variants in EZH1.</p>

In [1]:
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.json_format import Parse, ParseDict
import pandas as pd
import math
from csv import DictReader
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
import re
import pyphetools
from pyphetools.creation import *
from pyphetools.output import PhenopacketTable
print(f"pyphetools version {pyphetools.__version__}")

pyphetools version 0.5.8


In [2]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
pmid="PMID:37433783"
title = "Gain and loss of function variants in EZH1 disrupt neurogenesis and cause dominant and recessive neurodevelopmental disorders"
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", pmid=pmid, pubmed_title=title)
metadata.default_versions_with_hpo(version=hpo_version)
metadata.mondo()

In [3]:
df = pd.read_excel("input/41467_2023_39645_MOESM4_ESM.xlsx")
df.head()

Unnamed: 0,Patient ID,P1,P2,P3,P4,P5,P6,P7,P8,P9,P10,P11,P12,P13,P14,P15,P16,P17 new branch of P13-14),P18,P19 (New patient)
0,Type of mutation,p.R406H (heterozygous),p.E438D (heterozygous),p.K612M (heterozygous),p.A678G (heterozygous),p.R728G (heterozygous),p.R728G (heterozygous),p.Q731E (heterozygous),p.L735F (heterozygous),p.L735F (heterozygous),Stop gain (p.R258X) homozyogus,Stop gain (p.R258X) homozyogus,Splice+deletion (compound heterozygous),Stop gain (p.E485X) homozyogus,Stop gain (p.E485X) homozyogus,Stop gain (p.E485X) homozyogus,Stop gain (p.E485X) homozyogus,Stop gain (p.E485X) homozyogus,Stop gain (p.E485X) homozyogus,Stop gain (p.Q413X) homozygous
1,Gender,Female,NR,Male,Male,Male,Female,Male,Male,Female,Male,Female,Female,Female,Female,Female,Male,Female,Female,Male
2,Gestational age full term (yes/no),no,NR,yes,NR,no,yes,NR,no,yes,no,no,yes,NR,NR,NR,NR,no,yes,yes
3,"Weight (kg, %ile)",<1%ile,NR,25%ile,30%ile,>97%ile,52%ile,4%ile,<1%ile,92%ile,<3%ile,<3%ile,<5%ile,NR,NR,NR,NR,25%ile,NR,Z+ 3.31
4,"Height (cm, %ile)",<1%,NR,3%ile,14%ile,75-90%ile,17%ile,NR,41%ile,75%ile,25%ile,50-75%ile,6%ile,NR,NR,NR,NR,25%ile,NR,Z +2.41


In [4]:
# Transpose the table so that each row is one patient, then add a patient_id column
df = df.set_index('Patient ID').T.reset_index()
df["patient_id"] = df["index"]
df.head()

Patient ID,index,Type of mutation,Gender,Gestational age full term (yes/no),"Weight (kg, %ile)","Height (cm, %ile)","Head circunference (cm, %ile, SD)",Intellectual Disability (yes/no),Autistic-like behavior (yes/no),Agressivenes,...,Joint stiffness,Skeletal deformities,Metabolic screening,Urine screening,Peripheral blood smear,Muscle wasting,VEP/ERG,EEG,Other features,patient_id
0,P1,p.R406H (heterozygous),Female,no,<1%ile,<1%,<1%ile,"yes, IQ 61",no,NR,...,yes- right shoulder pain and disuse,scoliosis,NR,normal,NR,no,NR,NR,"1) Microarray duplications at 13q12.11 and 13q31.1. 2) Genome also found VUS in KMT2A c.11578G>A,p.Gly3860Ser, but features not consistent with Wiedemann-Steiner 3) nearsighted",P1
1,P2,p.E438D (heterozygous),NR,NR,NR,NR,NR,NR,NR,NR,...,NR,scolosis,NR,NR,NR,NR,NR,NR,Provisional Wooster-Drought Clinical Diagnosis,P2
2,P3,p.K612M (heterozygous),Male,yes,25%ile,3%ile,1%ile,not checked,no,no,...,no,no,normal,NR,normochromic normocytic anemia,no,NR,NR. No histroiy of seizures.,"1) Provisional Wolf-Hirschorn. Karyotype normal. 2)History of thretened abortion at 3 months of gestation , elective caesarian section , immediate cry but became cyanosed and needed NICU care for 12 days ultrasound head showed grade 1 IVH .",P3
3,P4,p.A678G (heterozygous),Male,NR,30%ile,14%ile,40%ile,yes,no (no formal neuropsych eval yet),no,...,none,none,NR,NR,NR,no,NR,mild generalized background slowing,"1) Cryptorchidism\n2) EMG/Nerve conduction studies: normal\n3) Hypopigmentation, hair biopsy: focal absence of pigment and focal clumping of melanin pigment (has been reported in Griselli syndrome but is non diagnostic)\n4) Muscle biopsy: \n- skeletal muscle with preserved sarcomeric organization\nand no degeneration or inclusio\n- occasional clusters of mitochondria and possibly larger mitochondria.",P4
4,P5,p.R728G (heterozygous),Male,no,>97%ile,75-90%ile,>97%ile,yes,stereotypies,"yes, auto and heteroagressivity",...,yes (knie + upper ankle),NR,normal,normal,NR,no,NR,normal,"obesity, sleep disturbances, constipation",P5
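<p>The reshaping step above can be illustrated without pandas. The following is a minimal sketch on a three-patient toy table in the same layout as the supplement (one row per attribute, one column per patient). The helper name <code>transpose_to_records</code> is hypothetical and is not part of pandas or pyphetools.</p>

```python
# Toy table in the supplement's layout: one row per attribute, one column per patient.
rows = [
    ["Patient ID", "P1", "P2", "P3"],
    ["Type of mutation", "p.R406H (heterozygous)", "p.E438D (heterozygous)", "p.K612M (heterozygous)"],
    ["Gender", "Female", "NR", "Male"],
]


def transpose_to_records(table):
    """Turn attribute rows into one dict per patient (hypothetical helper)."""
    header, *attribute_rows = table
    patient_ids = header[1:]  # the first cell is the label "Patient ID"
    records = []
    for col, pid in enumerate(patient_ids, start=1):
        record = {"patient_id": pid}
        for row in attribute_rows:
            record[row[0]] = row[col]  # row[0] is the attribute name
        records.append(record)
    return records


records = transpose_to_records(rows)
for record in records:
    print(record)
```

<p>Each resulting record corresponds to one row of the transposed DataFrame, keyed by the attribute names that become column headers.</p>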


<h2>Ingest clinical data</h2>

In [5]:
scg = SimpleColumnMapperGenerator(df=df,
                                  observed='yes',
                                  excluded='no',
                                  hpo_cr=hpo_cr)

In [6]:
column_mapper_d = scg.try_mapping_columns()

In [7]:
from IPython.display import display, HTML
display(HTML(scg.to_html()))

Result,Columns
Mapped,Joint stiffness; Muscle wasting
Unmapped,"index; Type of mutation; Gender; Gestational age full term (yes/no); Weight (kg, %ile); Height (cm, %ile); Head circunference (cm, %ile, SD); Intellectual Disability (yes/no) ; Autistic-like behavior (yes/no); Agressivenes; Seizures (yes/no); Regression of skills; Gross motor (normal/delayed/absent); Fine motor (normal/delayed/absent) ; Language (normal/delayed/absent); Social interaction (normal/delayed/absent); Cognitive (normal/delayed/absent); Hypertonia (yes/no); Hypotonia (yes/no); Dystonia (yes/no); Reflexes (normal/reduced/absent); Postural control (normal/reduced/absent); Dysmetria (yes/no); Gate (normal/abnormal(describe)); Muscle wasting (yes/no); Central visual impairment (yes/no); Primary optic atrophy (yes/no); Nistagmus (yes/no); Visual fixation and following; Cerebellum; Pons; Cerebral cortex; Ventricles; White mater; Others; Facial features; Cardiovascular findings; Hearing loss (yes/no); Skeletal deformities; Metabolic screening; Urine screening; Peripheral blood smear; VEP/ERG; EEG; Other features; patient_id"


In [8]:
# Now get the unmapped columns and try option mappers
# The following was only needed to write the notebook
#unmapped_columns = scg.get_unmapped_columns()
#omit_columns = set(column_mapper_d.keys())
#omit_columns.update(['index'])
#auto_results = OptionColumnMapper.autoformat(df=df, concept_recognizer=hpo_cr, omit_columns=omit_columns)
#print(auto_results)

In [9]:
intellectual_disability_d = {
 'IQ 61': 'Intellectual disability, mild',
 'yes mild': 'Intellectual disability, mild',
 'yes (severe to profound)': 'Intellectual disability, severe',
 'too young for formal testing (Developmental delay - yes)': 'Global developmental delay',
 'yes IQ: 48': 'Intellectual disability, moderate',
 'yes': 'Intellectual disability',
}
intellectual_disabilityMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=intellectual_disability_d)
intellectual_disabilityMapper.preview_column(df['Intellectual Disability (yes/no) '])
#column_mapper_d['Intellectual Disability (yes/no) '] =  intellectual_disabilityMapper

Unnamed: 0,terms
0,"HP:0001249 (Intellectual disability/observed); HP:0001256 (Intellectual disability, mild/observed)"
1,
2,
3,HP:0001249 (Intellectual disability/observed)
4,HP:0001249 (Intellectual disability/observed)
5,"HP:0001256 (Intellectual disability, mild/observed); HP:0001249 (Intellectual disability/observed)"
6,HP:0001249 (Intellectual disability/observed)
7,HP:0001249 (Intellectual disability/observed)
8,"HP:0010864 (Intellectual disability, severe/observed); HP:0001249 (Intellectual disability/observed)"
9,HP:0001249 (Intellectual disability/observed)


In [10]:

autistic_like_behavior_d = {
 'stereotypies': 'Motor stereotypy',
 'yes - was being investigated': 'Autism',
 'Severe autism': 'Autism',
 'yes': 'Autism'}
excluded_d = {'no': 'Autism',
              'no formal testing (no concern for ASD)': 'Autism',}
autisticMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=autistic_like_behavior_d, excluded_d=excluded_d)
autisticMapper.preview_column(df['Autistic-like behavior (yes/no)'])
column_mapper_d['Autistic-like behavior (yes/no)'] = autisticMapper
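<p>The idea behind the option dictionaries used in these cells can be sketched in plain Python. This is a conceptual illustration only and not the pyphetools implementation: the function <code>map_cell</code> is hypothetical, and splitting on commas and semicolons is an assumption about how a free-text cell might be tokenized.</p>

```python
import re


def map_cell(cell, observed_d, excluded_d):
    """Return (label, status) pairs for one free-text cell.

    Assumption: the cell is split on commas and semicolons and each chunk is
    looked up verbatim. The real library may tokenize and match differently.
    """
    results = []
    chunks = [c.strip() for c in re.split(r"[,;]", cell) if c.strip()]
    for chunk in chunks:
        if chunk in observed_d:
            results.append((observed_d[chunk], "observed"))
        elif chunk in excluded_d:
            results.append((excluded_d[chunk], "excluded"))
    return results


# Entries taken from the autism dictionaries in this notebook.
observed = {"stereotypies": "Motor stereotypy", "yes": "Autism"}
excluded = {"no": "Autism"}

print(map_cell("stereotypies", observed, excluded))
print(map_cell("no", observed, excluded))
print(map_cell("NR", observed, excluded))  # unreported values yield no terms
```

<p>Keeping observed and excluded mappings in separate dictionaries lets the same label (here "Autism") be recorded as present for one patient and explicitly ruled out for another.</p>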

In [11]:
agressivenes_d = {
 'yes': 'Aggressive behavior',
 'auto and heteroagressivity': ['Self-injurious behavior', 'Aggressive behavior']}
excluded_D = {"no":'Aggressive behavior' }
agressivenesMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=agressivenes_d, excluded_d=excluded_D)
agressivenesMapper.preview_column(df['Agressivenes'])
column_mapper_d['Agressivenes'] = agressivenesMapper

In [12]:
seizures_d = {
 'yes': 'Seizure',
 'generalized tonic/astatic type epilepsy': 'Seizure',
 '1 episode 3 years ago while sleeping.': 'Seizure'}
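excluded_D = {"no": 'Seizure'}
# Pass the excluded dictionary so that "no" is recorded as an excluded seizure phenotype
seizureMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=seizures_d, excluded_d=excluded_D)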
seizureMapper.preview_column(df['Seizures (yes/no)'])
column_mapper_d['Seizures (yes/no)'] = seizureMapper

In [13]:
regression_of_skills_d = {
 'less than 10 single words by age 2.5 but stopped talking at 4yo': 'Developmental regression'}
excluded = { 'no': "Developmental regression"}
regression_of_skillsMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=regression_of_skills_d, 
                                                excluded_d=excluded)
regression_of_skillsMapper.preview_column(df['Regression of skills'])
column_mapper_d['Regression of skills'] = regression_of_skillsMapper

In [14]:
gross_motor_d = {'delayed': 'Delayed gross motor development',
 'lower limb: tone low': 'Appendicular hypotonia',
 'bulk and power (3/5) decreased reflexes brisk  walking started after 5 years of age': 'Delayed gross motor development'}
gross_motorMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=gross_motor_d)
gross_motorMapper.preview_column(df['Gross motor (normal/delayed/absent)'])
column_mapper_d['Gross motor (normal/delayed/absent)'] = gross_motorMapper

In [15]:
fine_motor_d = {'delayed': 'Delayed fine motor development',
 'upper limb tone power normal grip poor': 'Weak grip'}
fine_motorMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=fine_motor_d)
fine_motorMapper.preview_column(df['Fine motor (normal/delayed/absent) '])
column_mapper_d['Fine motor (normal/delayed/absent) '] = fine_motorMapper

In [16]:
language_d = {'delayed': 'Delayed speech and language development',
 'started speaking after 2 1/2 years of age': 'Delayed speech and language development',
 'absent (cries and vocalizes)': 'Absent speech',
 'poor': 'Delayed speech and language development',
 'limited': 'Delayed speech and language development',
 'few words at age 5': 'Delayed speech and language development',
 'non verbal': 'Absent speech'}
languageMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=language_d)
languageMapper.preview_column(df['Language (normal/delayed/absent)'])
column_mapper_d['Language (normal/delayed/absent)'] = languageMapper

In [17]:
social_interaction_d = {'delayed': 'Delayed social development',
 'poor': 'Impaired social interactions',
}
excluded = {'apparently nromal': 'Delayed social development','normal': 'Delayed social development'}
social_interaction_Mapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=social_interaction_d,
                                              excluded_d=excluded)
social_interaction_Mapper.preview_column(df['Social interaction (normal/delayed/absent)'])
column_mapper_d['Social interaction (normal/delayed/absent)'] = social_interaction_Mapper

In [18]:
cognitive__d = {'delayed': 'Intellectual disability',
 'delayed not formally assesed': 'Intellectual disability',
 'moderate to severe ID': 'Intellectual disability'}
cognitive_Mapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=cognitive__d)
cognitive_Mapper.preview_column(df['Cognitive (normal/delayed/absent)'])
column_mapper_d['Cognitive (normal/delayed/absent)'] = cognitive_Mapper

In [19]:
hypertonia_d ={
 'yes': 'Hypertonia'}
hypertoniaMapper = OptionColumnMapper(concept_recognizer=hpo_cr, 
                                               option_d=hypertonia_d, excluded_d={'no': 'Hypertonia',})
hypertoniaMapper.preview_column(df['Hypertonia (yes/no)'])
column_mapper_d['Hypertonia (yes/no)'] = hypertoniaMapper

In [20]:
hypotonia_d = {
 'yes hyper elastic': 'Hypotonia',
 'yes': 'Hypotonia',
 'axial tone more affected than  appendicular': 'Axial hypotonia',
 'muscular hypotonia in the first years': 'Hypotonia'}
hypotoniaMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=hypotonia_d,
                                    excluded_d={'no': 'Hypotonia'})
hypotoniaMapper.preview_column(df['Hypotonia (yes/no)'])
column_mapper_d['Hypotonia (yes/no)'] = hypotoniaMapper

In [21]:
dystonia_d = {
 'but hyperkinetic and dyskinetic movements': 'Dystonia',
 'brisk': 'Dystonia'}
dystonia_Mapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=dystonia_d,
                                    excluded_d={"no":"Dystonia"})
dystonia_Mapper.preview_column(df['Dystonia (yes/no)'])
column_mapper_d['Dystonia (yes/no)'] = dystonia_Mapper

In [22]:
reflexes_d = {
 'brisk with clonus present': 'Clonus',
 'increased': 'Hyperreflexia',
 'resuced': 'Hyporeflexia'}
reflexesMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=reflexes_d,
                                   excluded_d={"no":"Abnormal reflex"})
reflexesMapper.preview_column(df['Reflexes (normal/reduced/absent)'])
column_mapper_d['Reflexes (normal/reduced/absent)'] = reflexesMapper

In [23]:
postural_control_d = {
 'poor balance': 'Postural instability',
 'significantly reduced': 'Postural instability',
 'clumsiness': 'Clumsiness',
 'Absent': 'Postural instability',
 'reduced': 'Postural instability',
 'absent': 'Postural instability'}
postural_controlMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=postural_control_d,
                                           excluded_d={"no":"Postural instability"})
postural_controlMapper.preview_column(df['Postural control (normal/reduced/absent)'])
column_mapper_d['Postural control (normal/reduced/absent)'] = postural_controlMapper

In [24]:
dysmetria_d = {
 'dysmetria': 'Dysmetria',
 'Tremor': 'Tremor',
 'yes': 'Dysmetria',}
dysmetria_Mapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=dysmetria_d,
                                     excluded_d={"no":"Dysmetria", "normal":"Dysmetria"})
dysmetria_Mapper.preview_column(df['Dysmetria (yes/no)'])
column_mapper_d['Dysmetria (yes/no)'] = dysmetria_Mapper

In [25]:
gate_d = {
 'ataxic gait': 'Gait ataxia',
 'no walking or crawling': 'Delayed ability to walk',
 'ataxic gate': 'Gait ataxia',
 'mild ataxia': 'Ataxia',
 'delayed': 'Delayed ability to walk',
 'wide-base gait and toe walking since she started walking at 2.5 yo': 'Tip-toe gait',
 'abnormal/broad based': 'Broad-based gait',
 'started walking at 2 years can not run at 4yo': 'Delayed ability to walk',
 'abnormal': 'Gait disturbance',
 'abnormal - unsteady': 'Unsteady gait'}
gate_Mapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=gate_d,
                                excluded_d={"no":"Gait disturbance",'normal':"Gait disturbance"})
gate_Mapper.preview_column(df['Gate (normal/abnormal(describe))'])
column_mapper_d['Gate (normal/abnormal(describe))'] = gate_Mapper

In [26]:
muscle_wasting_d = {
 'yes': 'Skeletal muscle atrophy'}
muscle_wasting_Mapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=muscle_wasting_d,
                                          excluded_d={"no":"Skeletal muscle atrophy"})
muscle_wasting_Mapper.preview_column(df['Muscle wasting (yes/no)'])
column_mapper_d['Muscle wasting (yes/no)'] = muscle_wasting_Mapper

In [27]:
central_visual_impairment_d = {
  'yes': 'Cerebral visual impairment'}
central_visual_impairment_Mapper = OptionColumnMapper(concept_recognizer=hpo_cr,
                                                              option_d=central_visual_impairment_d,
                                                             excluded_d={"no":"Cerebral visual impairment"})
central_visual_impairment_Mapper.preview_column(df['Central visual impairment (yes/no)'])
column_mapper_d['Central visual impairment (yes/no)'] = central_visual_impairment_Mapper

In [28]:
primary_optic_atrophy_d = {
 'yes': 'Optic atrophy'}
primary_optic_atrophy_Mapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=primary_optic_atrophy_d,
                                                 excluded_d={"no":"Optic atrophy"})
primary_optic_atrophy_Mapper.preview_column(df['Primary optic atrophy (yes/no)'])
column_mapper_d['Primary optic atrophy (yes/no)'] = primary_optic_atrophy_Mapper

In [29]:
nistagmus_d = {}
nistagmus_Mapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=nistagmus_d,
                                     excluded_d={"no":"Nystagmus"})
nistagmus_Mapper.preview_column(df['Nistagmus (yes/no)'])
column_mapper_d['Nistagmus (yes/no)'] = nistagmus_Mapper


In [30]:
visual_fixation_and_following_d = {
 'esotropia': 'Esotropia',
}
visual_fixation_and_followingMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=visual_fixation_and_following_d)
visual_fixation_and_followingMapper.preview_column(df['Visual fixation and following'])
column_mapper_d['Visual fixation and following'] = visual_fixation_and_followingMapper

In [31]:
ventricles_d = {
 'moderate enlargement of the lateral ventricles': 'Lateral ventricle dilatation',
 'left greater than right': 'Lateral ventricle dilatation',
}
ventriclesMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=ventricles_d,
                                     excluded_d={"normal": "Ventriculomegaly", 'no ventriculomegaly': 'Ventriculomegaly'})
ventriclesMapper.preview_column(df['Ventricles'])
column_mapper_d['Ventricles'] = ventriclesMapper

In [32]:
white_mater_d = {
 'diffuse white matter loss': 'Reduced cerebral white matter volume',
 'particularly diffuse on the posterior left': 'Reduced cerebral white matter volume',
 'atrophy': 'Cerebral white matter atrophy'}
white_materMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=white_mater_d)
white_materMapper.preview_column(df['White mater'])
column_mapper_d['White mater'] = white_materMapper

In [33]:
facial_features_d = {'microcephaly': 'Microcephaly',
 'upslanting palpebral fissures': 'Upslanted palpebral fissure',
 'small ears': 'Microtia',
 'high palate': 'High palate',
 'craniofacial dysmorphism': 'Abnormal facial shape',
 'trigonocephaly': 'Trigonocephaly',
 'almond shaped down slanting eyes': 'Downslanted palpebral fissures',
 'thin upper lip': 'Thin upper lip vermilion',
 'prominent metopic suture giving a helmet  shaped appearance': 'Prominent metopic ridge',
 'frontal bossing': 'Frontal bossing',
 'prognathism': 'Mandibular prognathia',
 'flattened midface': 'Midface retrusion',
 'deep-set eyes': 'Deeply set eye',
 'down-turned corners of mouth': 'Downturned corners of mouth',
 'widely spaced teeth': 'Widely spaced teeth',
 'sparse hair': 'Sparse hair',
 'macrocephally': 'Macrocephaly',
 'coarse face': 'Coarse facial features',
 'deep set eyes': 'Deeply set eye',
 'short philtrum': 'Short philtrum',
 'open mouth': 'Open mouth',
 'everted lips': 'Eclabion',
 'prognathia': 'Mandibular prognathia',
 'mild drooling': 'Drooling',
 'hypoplastic nasal bridge': 'Hypoplastic nasal bridge',
 'thick eyebrows': 'Thick eyebrow',
 'midface retrusion': 'Midface retrusion',
 'prominent fetal finger pads': 'Prominent fingertip pads',
 'frontal bossin with tall forehead': 'High forehead',
 'infraorbital creases': 'Infra-orbital crease',
 'flat nasal bridge with small nose': 'Short nose',
 'macrocephaly': 'Macrocephaly',
 'plagiocephaly': 'Plagiocephaly',
 'intermittent extropia': 'Exotropia',
 'telecanthus': 'Telecanthus',
 'hypertelorism': 'Hypertelorism',
 'horizontal nystagmus': 'Horizontal nystagmus',
 'over folded upper helix': 'Overfolded helix',
 'brad and prominent nasal bridge with saddle nose': 'Concave nasal ridge',
 'flat facial profile': 'Flat face',
 'mild prognathia': 'Mandibular prognathia',
 'elongated facies': 'Long face',
 'thick full lips': 'Thick vermilion border',
 'bulbous tip of nose': 'Prominent nasal tip',
 'proad forehead': 'Broad forehead',
 'hypertolerism': 'Hypertelorism',
 'down slanting palpebral fissure and depressed nasal bridge': 'Depressed nasal bridge',
 'dysmorphic (deep sited eyes hypertelorism': 'Hypertelorism',
 'thick lips': 'Thick vermilion border',
 'prominent ears)': 'Protruding ear'}
facial_featuresMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=facial_features_d)
facial_featuresMapper.preview_column(df['Facial features'])
column_mapper_d['Facial features'] = facial_featuresMapper

In [34]:
cardiovascular_findings_d = {'hypertension with normal renal ultrasound': 'Hypertension',
 'mitral valve prolapse': 'Mitral valve prolapse',
 }
cardiovascular_findingsMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=cardiovascular_findings_d)
cardiovascular_findingsMapper.preview_column(df['Cardiovascular findings'])
column_mapper_d['Cardiovascular findings'] = cardiovascular_findingsMapper

In [35]:
hearing_loss_d = {
 'yes': 'Hearing impairment',}
hearing_loss_Mapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=hearing_loss_d,
                                        excluded_d={'no':'Hearing impairment', 'normal': 'Hearing impairment'})
hearing_loss_Mapper.preview_column(df['Hearing loss (yes/no)'])
column_mapper_d['Hearing loss (yes/no)'] = hearing_loss_Mapper

In [36]:
joint_stiffness_d = {'yes- right shoulder pain and disuse': 'Shoulder pain',
 'yes (knie + upper ankle)': 'Joint stiffness',
 'yes due to lack of use': 'Joint stiffness',}
joint_stiffnessMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=joint_stiffness_d,
                                          excluded_d={"no":"Joint stiffness",
                                                     "none":"Joint stiffness",
                                                      "normal":"Joint stiffness",
                                                     }
                                          )
joint_stiffnessMapper.preview_column(df['Joint stiffness'])
column_mapper_d['Joint stiffness'] = joint_stiffnessMapper

In [37]:
skeletal_deformities_d = {'scoliosis': 'Scoliosis',
 'scolosis': 'Scoliosis',
 'flat feet': 'Pes planus',
 'short limbs relative to trunk': 'Limb undergrowth',
 'pectus carinatum': 'Pectus carinatum',
 'right leg bowed': 'Bowing of the legs',}
skeletal_deformitiesMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=skeletal_deformities_d)
skeletal_deformitiesMapper.preview_column(df['Skeletal deformities'])
column_mapper_d['Skeletal deformities'] = skeletal_deformitiesMapper

In [38]:
others_d = {
 '3) thin corpus callosum 4) hypoplastic optic nerves/chiasm': 'Thin corpus callosum',
 'septo-optic dysplasia with hypoplasia of optic chiasma nd optic nerves anterior pituitary small without a bright spot': 'Septo-optic dysplasia',}
othersMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=others_d)
othersMapper.preview_column(df['Others'])
column_mapper_d['Others'] = othersMapper

In [39]:
metabolic_screening_d = {
 'hypoglycemia': 'Hypoglycemia',
 'High Albumin': 'Hyperalbuminemia',
}
metabolic_screeningMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=metabolic_screening_d)
metabolic_screeningMapper.preview_column(df['Metabolic screening'])
column_mapper_d['Metabolic screening'] = metabolic_screeningMapper

In [40]:
peripheral_blood_smear_d = {
 'normochromic normocytic anemia': 'Normocytic anemia',}
peripheral_blood_smearMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=peripheral_blood_smear_d,
                                                 excluded_d={"no":"Anemia", "normal":"Anemia"})
peripheral_blood_smearMapper.preview_column(df['Peripheral blood smear'])
column_mapper_d['Peripheral blood smear'] = peripheral_blood_smearMapper

In [41]:
df['Type of mutation']

0                      p.R406H (heterozygous)
1                      p.E438D (heterozygous)
2                      p.K612M (heterozygous)
3                      p.A678G (heterozygous)
4                      p.R728G (heterozygous)
5                      p.R728G (heterozygous)
6                      p.Q731E (heterozygous)
7                      p.L735F (heterozygous)
8                      p.L735F (heterozygous)
9              Stop gain (p.R258X) homozyogus
10             Stop gain (p.R258X) homozyogus
11    Splice+deletion (compound heterozygous)
12             Stop gain (p.E485X) homozyogus
13             Stop gain (p.E485X) homozyogus
14             Stop gain (p.E485X) homozyogus
15             Stop gain (p.E485X) homozyogus
16             Stop gain (p.E485X) homozyogus
17             Stop gain (p.E485X) homozyogus
18             Stop gain (p.Q413X) homozygous
Name: Type of mutation, dtype: object
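<p>The labels in this column combine a protein change with a zygosity statement, and some contain the misspelling "homozyogus". The following sketch shows one way such labels could be split with a regular expression. The function <code>parse_mutation_label</code> is hypothetical and is not used elsewhere in this notebook, which relies on a hand-curated dictionary instead.</p>

```python
import re

# Single-letter protein change such as p.R406H or p.R258X
PROTEIN_CHANGE = re.compile(r"p\.[A-Z]\d+[A-Z]")


def parse_mutation_label(label):
    """Split a supplement label into (protein_change, zygosity)."""
    match = PROTEIN_CHANGE.search(label)
    protein_change = match.group(0) if match else None
    lowered = label.lower()
    if "compound heterozygous" in lowered:
        zygosity = "compound heterozygous"
    elif "heterozygous" in lowered:
        zygosity = "heterozygous"
    elif "homozy" in lowered:  # also catches the misspelling "homozyogus"
        zygosity = "homozygous"
    else:
        zygosity = None
    return protein_change, zygosity


for label in ["p.R406H (heterozygous)",
              "Stop gain (p.R258X) homozyogus",
              "Splice+deletion (compound heterozygous)"]:
    print(label, "->", parse_mutation_label(label))
```

<p>The compound heterozygous entry carries no protein change in its label, so it still needs to be handled separately.</p>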

In [42]:
# Finally, the only non-consanguineous case is a child that carries two variants in compound heterozygosity:
# a large deletion covering exons 8–12 of EZH1 inherited from the mother
# (c.[664 + 1_665-1]_[1401 + 1_1402-1]del; p.?)
# and a de novo splice variant affecting the exon 10 splice acceptor site (c.932-1G>A; p.?)
mutation_d = {
    "p.R406H (heterozygous)": ["c.1217G>A", "heterozygous", "VCV000828189.1"],
    "p.E438D (heterozygous)":["c.1314A>T", "heterozygous", "NP_001982.2:p.(Glu438Asp)"],
    "p.K612M (heterozygous)":["c.1835A>T", "heterozygous", "NP_001982.2:p.(Lys612Met)"],
    "p.A678G (heterozygous)": ["c.2033C>G", "heterozygous", "VCV000977755.3"],
    "p.R728G (heterozygous)":["c.2182A>G", "heterozygous", "NP_001982.2:p.(Arg728Gly) - inferred" ],
    "p.Q731E (heterozygous)": ["c.2191C>G", "heterozygous", "VCV002570596.1"],
    "p.L735F (heterozygous)": ["c.2203C>T", "heterozygous", "VCV002570597.1"],
    "Stop gain (p.R258X) homozyogus": ["c.772C>T", "homozygous", "c.772C>T; p.R258X"],
  # "Splice+deletion (compound heterozygous)": [],
    "splice":  ["c.1401+3_1403del", "heterozygous", "Variant validator"],
    "deletion":  ["c.664+2_665-1del", "heterozygous", "Variant validator"],
    "Stop gain (p.E485X) homozyogus": ["c.1453G>T", "homozygous", "VCV002570598.1"],
    "Stop gain (p.Q413X) homozygous": [  "c.1237C>T", "homozygous", "c.1137C>T; p.Q413X - in original pub. Author corrected via email 17/7/23"]  
}
# Variant validator
# NM_001991.5:c.1401+3_1403del
# NM_001991.5:c.664+2_665-1del
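<p>The correction noted for p.Q413X (c.1237C&gt;T rather than c.1137C&gt;T) can be checked with simple codon arithmetic. In HGVS coding numbering, c.1 is the A of the start codon, so codon n spans c.(3n-2) to c.3n. The sketch below applies this to the coding substitutions in the dictionary above. The helper names are hypothetical, and the check covers positions only, not the base or amino acid change.</p>

```python
import re


def codon_of(c_position):
    """1-based codon number containing a coding position (c.1 = A of ATG)."""
    return (c_position + 2) // 3


def is_consistent(c_hgvs, p_change):
    """True if the c. position falls in the codon named by the p. change."""
    c_pos = int(re.match(r"c\.(\d+)", c_hgvs).group(1))
    p_pos = int(re.search(r"\d+", p_change).group(0))
    return codon_of(c_pos) == p_pos


pairs = [
    ("c.1217G>A", "p.R406H"), ("c.1314A>T", "p.E438D"), ("c.1835A>T", "p.K612M"),
    ("c.2033C>G", "p.A678G"), ("c.2182A>G", "p.R728G"), ("c.2191C>G", "p.Q731E"),
    ("c.2203C>T", "p.L735F"), ("c.772C>T", "p.R258X"), ("c.1453G>T", "p.E485X"),
    ("c.1237C>T", "p.Q413X"),
]
for c_hgvs, p_change in pairs:
    print(c_hgvs, p_change, is_consistent(c_hgvs, p_change))

# The position printed in the original publication lies in a different codon.
print("c.1137 lies in codon", codon_of(1137))
```

<p>All ten pairs are positionally consistent, whereas c.1137 falls in codon 379, which agrees with the author's correction to c.1237C&gt;T.</p>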

In [43]:
EZH1_transcript = "NM_001991.5"
validator = VariantValidator(genome_build='hg38', transcript=EZH1_transcript)
validated_var_d = {}  # maps each key of mutation_d to its validated variant object

In [44]:
for k, var_arr in mutation_d.items():
    var = var_arr[0]
    genotype = var_arr[1]
    print(f"Validating {var}")
    var_object = validator.encode_hgvs(hgvs=var)
    var_object.set_genotype(genotype)
    validated_var_d[k] = var_object
print(f"We got {len(validated_var_d)} variant objects")

Validating c.1217G>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001991.5%3Ac.1217G>A/NM_001991.5?content-type=application%2Fjson
Validating c.1314A>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001991.5%3Ac.1314A>T/NM_001991.5?content-type=application%2Fjson
Validating c.1835A>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001991.5%3Ac.1835A>T/NM_001991.5?content-type=application%2Fjson
Validating c.2033C>G
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001991.5%3Ac.2033C>G/NM_001991.5?content-type=application%2Fjson
Validating c.2182A>G
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001991.5%3Ac.2182A>G/NM_001991.5?content-type=application%2Fjson
Validating c.2191C>G
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_001991.5%3Ac.2191C>G/NM_001991.5?content-type=application%2Fjson
Validating c.2203C>T
https:/
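<p>The URLs in the log above follow a regular pattern: genome build, then the transcript-qualified HGVS expression with the colon percent-encoded, then the transcript, then a content-type query. The sketch below reproduces that pattern with the standard library. It is inferred from the printed URLs only; how pyphetools builds the request internally is not shown here, and <code>build_query_url</code> is a hypothetical name. No network request is made.</p>

```python
from urllib.parse import quote, urlencode

BASE = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"


def build_query_url(hgvs, transcript="NM_001991.5", genome_build="hg38"):
    """Rebuild a URL in the shape printed in the validation log (assumption)."""
    # Keep '>' literal, as in the log; ':' is percent-encoded to %3A.
    variant = quote(f"{transcript}:{hgvs}", safe=">")
    query = urlencode({"content-type": "application/json"})
    return f"{BASE}/{genome_build}/{variant}/{transcript}?{query}"


print(build_query_url("c.1217G>A"))
```

<p>Rebuilding the URL this way can help when a single variant needs to be checked by hand in a browser.</p>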

In [45]:
sexMapper = SexColumnMapper(male_symbol='Male', female_symbol='Female', column_name='Gender')
sexMapper.preview_column(df['Gender'])
ageMapper = AgeColumnMapper.not_provided()
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_d=column_mapper_d, 
                        individual_column_name="patient_id", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        metadata=metadata,
                        pmid=pmid)
omim_label = "EZH1-related neurodevelopmental disorder"
omim_id = "MONDO:pending"
encoder.set_disease(disease_id=omim_id, label=omim_label)

Could not map sex symbol NR


In [46]:
individual_list = encoder.get_individuals()

Could not map sex symbol NR


In [47]:
for i in individual_list:
    rows = df.loc[df['patient_id'] == i.id]
    if len(rows) != 1:
        raise ValueError(f"Got {len(rows)} rows but expected only 1")
    variant = rows.iloc[0]['Type of mutation']
    if variant in validated_var_d:
        var_object = validated_var_d.get(variant)
        i.add_variant(var_object)
    elif variant == "Splice+deletion (compound heterozygous)":
        v1 = validated_var_d.get("splice")
        v2 = validated_var_d.get("deletion")
        i.add_variant(v1)
        i.add_variant(v2)
    else:
        raise ValueError(f"Could not find variant data for {variant}")

In [48]:
ppacket_list = [i.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh()) for i in individual_list]
table = PhenopacketTable(phenopacket_list=ppacket_list)
display(HTML(table.to_html()))

Individual,Genotype,Phenotypic features
P1 (FEMALE; ),NM_001991.5:c.1217G>A (heterozygous),Shoulder pain (HP:0030834); Delayed gross motor development (HP:0002194); Delayed fine motor development (HP:0010862); Delayed speech and language development (HP:0000750); Delayed social development (HP:0012434); Intellectual disability (HP:0001249); Microcephaly (HP:0000252); Upslanted palpebral fissure (HP:0000582); Microtia (HP:0008551); High palate (HP:0000218); Hypertension (HP:0000822); Scoliosis (HP:0002650)
P2 (UNKNOWN; ),NM_001991.5:c.1314A>T (heterozygous),Delayed gross motor development (HP:0002194); Delayed fine motor development (HP:0010862); Delayed speech and language development (HP:0000750); Intellectual disability (HP:0001249); Mitral valve prolapse (HP:0001634); Scoliosis (HP:0002650)
P3 (MALE; ),NM_001991.5:c.1835A>T (heterozygous),Delayed gross motor development (HP:0002194); Appendicular hypotonia (HP:0012389); Delayed fine motor development (HP:0010862); Weak grip (HP:0033466); Delayed speech and language development (HP:0000750); Intellectual disability (HP:0001249); Hypotonia (HP:0001252); Clonus (HP:0002169); Postural instability (HP:0002172); Gait ataxia (HP:0002066); Abnormal facial shape (HP:0001999); Trigonocephaly (HP:0000243); Downslanted palpebral fissures (HP:0000494); Thin upper lip vermilion (HP:0000219); Prominent metopic ridge (HP:0005487); Normocytic anemia (HP:0001897)
P4 (MALE; ),NM_001991.5:c.2033C>G (heterozygous),Delayed gross motor development (HP:0002194); Delayed fine motor development (HP:0010862); Absent speech (HP:0001344); Delayed social development (HP:0012434); Intellectual disability (HP:0001249); Hypotonia (HP:0001252); Axial hypotonia (HP:0008936); Dystonia (HP:0001332); Postural instability (HP:0002172); Delayed ability to walk (HP:0031936); Frontal bossing (HP:0002007); Mandibular prognathia (HP:0000303); Midface retrusion (HP:0011800); Deeply set eye (HP:0000490); Downturned corners of mouth (HP:0002714); Widely spaced teeth (HP:0000687); Sparse hair (HP:0008070)
P5 (MALE; ),NM_001991.5:c.2182A>G (heterozygous),Joint stiffness (HP:0001387); Motor stereotypy (HP:0000733); Aggressive behavior (HP:0000718); Self-injurious behavior (HP:0100716); Delayed gross motor development (HP:0002194); Delayed fine motor development (HP:0010862); Delayed speech and language development (HP:0000750); Impaired social interactions (HP:0000735); Intellectual disability (HP:0001249); Hypertonia (HP:0001276); Hypotonia (HP:0001252); Hyperreflexia (HP:0001347); Dysmetria (HP:0001310); Tremor (HP:0001337); Gait ataxia (HP:0002066); Esotropia (HP:0000565); Macrocephaly (HP:0000256); Coarse facial features (HP:0000280); Deeply set eye (HP:0000490); Short philtrum (HP:0000322); Open mouth (HP:0000194); Eclabion (HP:0012472); Mandibular prognathia (HP:0000303)
P6 (FEMALE; ),NM_001991.5:c.2182A>G (heterozygous),Aggressive behavior (HP:0000718); Seizure (HP:0001250); Delayed gross motor development (HP:0002194); Delayed fine motor development (HP:0010862); Delayed speech and language development (HP:0000750); Intellectual disability (HP:0001249); Hypotonia (HP:0001252); Clumsiness (HP:0002312); Dysmetria (HP:0001310); Ataxia (HP:0001251); Drooling (HP:0002307); Pes planus (HP:0001763); Hypoglycemia (HP:0001943)
P7 (MALE; ),NM_001991.5:c.2191C>G (heterozygous),Autism (HP:0000717); Delayed gross motor development (HP:0002194); Delayed fine motor development (HP:0010862); Delayed speech and language development (HP:0000750); Delayed social development (HP:0012434); Intellectual disability (HP:0001249); Hypoplastic nasal bridge (HP:0005281); Thick eyebrow (HP:0000574); Midface retrusion (HP:0011800); Prominent fingertip pads (HP:0001212)
P8 (MALE; ),NM_001991.5:c.2203C>T (heterozygous),Aggressive behavior (HP:0000718); Delayed gross motor development (HP:0002194); Delayed fine motor development (HP:0010862); Delayed speech and language development (HP:0000750); Intellectual disability (HP:0001249); Hypotonia (HP:0001252); Delayed ability to walk (HP:0031936); Lateral ventricle dilatation (HP:0006956); Reduced cerebral white matter volume (HP:0034295); High forehead (HP:0000348); Infra-orbital crease (HP:0100876); Short nose (HP:0003196); Limb undergrowth (HP:0009826)
P9 (FEMALE; ),NM_001991.5:c.2203C>T (heterozygous),Joint stiffness (HP:0001387); Autism (HP:0000717); Developmental regression (HP:0002376); Delayed gross motor development (HP:0002194); Delayed fine motor development (HP:0010862); Absent speech (HP:0001344); Delayed social development (HP:0012434); Intellectual disability (HP:0001249); Postural instability (HP:0002172); Tip-toe gait (HP:0030051); Cerebral visual impairment (HP:0100704); Optic atrophy (HP:0000648); Macrocephaly (HP:0000256); Plagiocephaly (HP:0001357); Exotropia (HP:0000577); Telecanthus (HP:0000506); Hypertelorism (HP:0000316); Horizontal nystagmus (HP:0000666); Overfolded helix (HP:0000396); Concave nasal ridge (HP:0011120); Flat face (HP:0012368); Mandibular prognathia (HP:0000303); Hearing impairment (HP:0000365); Septo-optic dysplasia (HP:0100842)
P10 (MALE; ),NM_001991.5:c.772C>T (homozygous),Skeletal muscle atrophy (HP:0003202); Aggressive behavior (HP:0000718); Delayed gross motor development (HP:0002194); Delayed fine motor development (HP:0010862); Delayed speech and language development (HP:0000750); Delayed social development (HP:0012434); Intellectual disability (HP:0001249); Postural instability (HP:0002172); Dysmetria (HP:0001310); Gait disturbance (HP:0001288); Microcephaly (HP:0000252); Long face (HP:0000276); Thick vermilion border (HP:0012471); Microtia (HP:0008551); Prominent nasal tip (HP:0005274); Pectus carinatum (HP:0000768)
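
The phenotype column above packs each individual's HPO terms into one semicolon-separated string of `Label (HP:nnnnnnn)` entries. For tallying features across individuals it can help to split such a cell back into (label, id) pairs. The following is a minimal sketch, not part of pyphetools; the helper name `parse_hpo_cell` is hypothetical, and it assumes every entry ends with a seven-digit HPO id in parentheses, as in the rows shown.

```python
import re

# Matches one "Label (HP:1234567)" entry; assumes seven-digit HPO ids.
HPO_ENTRY = re.compile(r"^(?P<label>.+?)\s*\((?P<id>HP:\d{7})\)$")


def parse_hpo_cell(cell):
    """Split a semicolon-separated phenotype cell into (label, HPO id) tuples.

    Hypothetical helper for illustration; raises ValueError on an entry
    that does not follow the "Label (HP:nnnnnnn)" pattern.
    """
    terms = []
    for entry in cell.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip empty fragments, e.g. from a trailing semicolon
        match = HPO_ENTRY.match(entry)
        if match is None:
            raise ValueError(f"Unrecognized phenotype entry: {entry!r}")
        terms.append((match.group("label"), match.group("id")))
    return terms


example = "Microcephaly (HP:0000252); Scoliosis (HP:0002650)"
print(parse_hpo_cell(example))
```

Raising on malformed entries, rather than skipping them, makes it harder for a truncated row to go unnoticed.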


In [49]:
# Write one GA4GH phenopacket per individual into the output directory.
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individual_list,
                                              metadata=metadata.to_ga4gh(),
                                              pmid=pmid,
                                              outdir=output_directory)

We output 19 GA4GH phenopackets to the directory phenopackets
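
As a quick sanity check on the export, the written files can be read back and counted. The sketch below uses only the standard library. It assumes the phenopackets were written as `.json` files directly inside the output directory (the file extension and layout are assumptions, not confirmed by the output above), and `summarize_phenopackets` is a hypothetical helper name. The `id` and `phenotypicFeatures` keys are the field names of the GA4GH Phenopacket JSON representation.

```python
import json
from pathlib import Path


def summarize_phenopackets(directory):
    """Return (phenopacket id, number of phenotypic features) per JSON file.

    Hypothetical helper; assumes one phenopacket per *.json file.
    Returns an empty list if the directory does not exist.
    """
    folder = Path(directory)
    if not folder.is_dir():
        return []
    summaries = []
    for path in sorted(folder.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            packet = json.load(handle)
        features = packet.get("phenotypicFeatures", [])
        # Fall back to the file name if the packet carries no id.
        summaries.append((packet.get("id", path.stem), len(features)))
    return summaries


summaries = summarize_phenopackets("phenopackets")
print(f"Found {len(summaries)} phenopacket files")
```

If the count differs from the number of individuals reported by the export step, the directory may contain files from an earlier run or use a different extension.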
