<h1>Genotype–phenotype correlation at codon 1740 of SETD2</h1>
<p>This notebook generates phenopackets from the data reported in <a href="https://pubmed.ncbi.nlm.nih.gov/32710489/">Rabin et al. (2020) Genotype-phenotype correlation at codon 1740 of SETD2</a>.</p>

In [1]:
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.json_format import Parse, ParseDict
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
from pyphetools.creation import *
from pyphetools.creation.simple_column_mapper import try_mapping_columns
import numpy as np

In [2]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199")
metadata.default_versions_with_hpo(version=hpo_version)

CC: corpus callosum; SS: subarachnoid spaces; LSVC: left superior vena cava; MVP: mitral valve prolapse; IVC: inferior vena cava; VSD: ventricular septal defect; PDA: patent ductus arteriosus; ASD: atrial septal defect; LVOT: left ventricular outflow tract; DORV: double outlet right ventricle; GTT: gastrostomy tube; NG: nasogastric; PEG: percutaneous endoscopic gastrostomy; VCU: vesicoureteral; SIADH: syndrome of inappropriate antidiuretic hormone secretion; cm: centimeter; kg: kilogram; LP: likely pathogenic; VUS: variant of uncertain significance; NA: not available; N/A: not applicable; 1: Fenton preterm growth charts; 2: CDC growth charts; 3: Rollins, J. D., Collins, J. S., & Holden, K. R. (2010). United States head circumference growth reference charts: birth to 21 years. The Journal of Pediatrics, 156(6), 907-913.
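
The legend above defines abbreviations that recur in the free-text cells of the table. Expanding them before text mining may make the cell contents easier to match against ontology labels. The following is only a sketch under that assumption: `ABBREVIATIONS` and `expand_abbreviations` are hypothetical names, not part of pyphetools, and the dictionary covers only a few entries from the legend.

```python
import re

# Hypothetical subset of the legend; extend as needed.
ABBREVIATIONS = {
    "VSD": "ventricular septal defect",
    "ASD": "atrial septal defect",
    "MVP": "mitral valve prolapse",
    "LVOT": "left ventricular outflow tract",
    "CC": "corpus callosum",
}

# Longest keys first so that a longer abbreviation is never shadowed by a shorter one.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(ABBREVIATIONS, key=len, reverse=True)) + r")\b"
)


def expand_abbreviations(text: str) -> str:
    """Replace whole-word abbreviations with their expansions (case-sensitive)."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], text)


print(expand_abbreviations("VSD with narrow LVOT"))
```

The match is on whole words only, so a plural such as `ASDs` is left unchanged rather than partially rewritten.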

In [9]:
df = pd.read_excel("input/RabinSupplementaryTable1-SETD2.xlsx")

In [13]:
df[1:5]

Unnamed: 0,Feature,Group 1 Patient 1\n,Group 1 Patient 2,Group 1 Patient 3\n,Group 1 Patient 4,Group 1 Patient 5,Group 1 Patient 6,Group 1 Patient 7,Group 1 Patient 8,Group 1 Patient 9,Group 1 Patient 10,Group 1 Patient 11,Group 1 Patient 12,Group 2 Patient 1\n,Group 2 Patient 2,Group 2 Patient 3
1,Sex,Male,Male,Male,Female,Female,Female,Male,Female,Female,Male,Female,Female,Male,Male,Female
2,Prenatal complications,"Extra fluid in the back of the cerebellum at 35 weeks; fetal MRI at 35 weeks showed VSD, small cerebellum, and agenesis of the corpus callosum; pre-eclampsia; \nIUGR","Polyhydramnios, maternal asthma, Maternal MVP, maternal cholelithiasis",Pre-eclampsia; fetal ventriculomegaly,,"Fetal cerebellar hypoplasia, ventriculomegaly, intraventricular hemorrhage",Preterm labor,"Twin gestation; pre-eclampsia, heart defect (VSD, cardiomegaly), urogenital anomaly, suspected toxemia of pregnancy",Perterm labor,IVF pregnancy conceived with frozen sperm; increased nuchal translucency; pre-eclampsia,Ambiguous genitalia; enlarged cisterna magna; right dysplastic multi cystic kidney; dandy walker variant; cardiac defect,Polyhydramnios noted at 20 wks,,,Polyhydramnios,
3,Gestational age,36 weeks,36 4/7 weeks,36 weeks,full term,33 weeks,32 2/7 weeks,30 6/7 weeks,35 5/7 weeks,35 weeks,34 3/7 weeks,40 weeks,39 2/7 weeks,Full term,39 weeks,40 weels
4,Perinatal complications,Caesarean,,Caesarean,,,,,,,,Emergency caesarean for fetal decelerations,,,,
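
The "Gestational age" row mixes several notations ("36 weeks", "36 4/7 weeks", "full term"). If a numeric value is needed downstream, the strings could be parsed along the following lines. This is a minimal sketch: `parse_gestational_age` is a hypothetical helper, not a pyphetools function, and it assumes that only the notations visible in the row above occur.

```python
import re
from typing import Optional, Tuple


def parse_gestational_age(text) -> Optional[Tuple[int, int]]:
    """Return (weeks, days) for strings such as '36 4/7 weeks'.

    Returns None for missing values and for entries without a number
    (for example 'full term').
    """
    if not isinstance(text, str):
        return None  # NaN or other non-string cell
    cleaned = text.strip().lower()
    # Whole weeks, an optional 'd/7' day fraction, then a word starting with 'w'
    match = re.match(r"(\d+)(?:\s+(\d)/7)?\s*w", cleaned)
    if match is None:
        return None
    weeks = int(match.group(1))
    days = int(match.group(2)) if match.group(2) else 0
    return weeks, days


for value in ["36 weeks", "36 4/7 weeks", "full term"]:
    print(value, "->", parse_gestational_age(value))
```

Because the pattern only requires a word beginning with "w" after the number, a misspelled unit in the source data is still parsed.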


In [14]:
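# Convert to row-based format: one row per patient, one column per feature
dft = df.transpose()

# The first transposed row holds the feature names; promote it to the header
dft.columns = dft.iloc[0]
dft.drop(dft.index[0], inplace=True)
# Keep the original column label (e.g. 'Group 1 Patient 1\n') as the patient identifier
dft['patient_id'] = dft.index
dft.head()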

Feature,Variant,Sex,Prenatal complications,Gestational age,Perinatal complications,Birth weight,Birth length,Birth head circumference,Growth,Weight,...,Cardiac,Gastrointestinal,Renal / urinary tract,Genital,Skeletal,Neuromuscular,Neuroimaging,Other genetic findings,Other,patient_id
Group 1 Patient 1\n,p.(Arg1740Trp),Male,"Extra fluid in the back of the cerebellum at 35 weeks; fetal MRI at 35 weeks showed VSD, small cerebellum, and agenesis of the corpus callosum; pre-eclampsia; \nIUGR",36 weeks,Caesarean,2126 grams (10%)1,47 cm (50%)1,32.5 cm (45%)1,,7.6 kg at 6 months (40%)2,...,VSD with narrow LVOT; hypoplastic aortic valve; transverse arch hypoplasia and coarctation of the aorta; PDA; ASD,GTT at 4 months,Dilated collecting system; malrotation of right kidney,Cryptorchidism; incomplete foreskin; shawl scrotum,Hip dysplasia at birth,Hypotonia; seizure onset at 5 months; neuromuscular scoliosis,"Widening of SS; enlargement of cisterna magna / extra fuid around the cerebellum; dysgenesis of CC; small pons,",,,Group 1 Patient 1\n
Group 1 Patient 2,p.(Arg1740Trp),Male,"Polyhydramnios, maternal asthma, Maternal MVP, maternal cholelithiasis",36 4/7 weeks,,3175 grams (80%)1,,33 cm (50%)1,,26.4 kg at 10 years (15%)2,...,Normal echocardiogram,GGT in infancy; GE reflux; constipation,Bilateral duplicated kidneys; hydronephrosis; left VCU reflux,Cryptorchidism; penoscrotal transposition,Hip dysplasia,Scoliosis; seizure onset at 4 months (seizure free after age 6); spastic paraplegia,At 4 days old showed microcephaly with simplified gyral pattern; inferior cerebellar hypoplasia; mega cisterna magna; and hypogensis of the genu and rostrum of the corpus callosum,"SMPD1 paternal LP and maternal VUS, CEP290 paternal and maternal VUS; positive Factor V Leiden",Polycythemia at birth; blood clot in IVC,Group 1 Patient 2
Group 1 Patient 3\n,p.(Arg1740Trp),Male,Pre-eclampsia; fetal ventriculomegaly,36 weeks,Caesarean,2465 grams (30%)1,47 cm (65%)1,32 cm (45%)1,,,...,Membranous VSD; mild left pulmonary artery hypoplasia and pulmonary vein stenosis,,Normal abdominal ultrasound (death at 1 month),Cryptorchidism; shawl scrotum,,Neonatal rhythmic jerky movements; abnormal EEG; no clinical diagnosis of seizures,Pontocerebellar hypoplasia; thin corpus callosum; dysplasia of the vestibular apparatus,,,Group 1 Patient 3\n
Group 1 Patient 4,p.(Arg1740Trp),Female,,full term,,2750 grams (7%)2,48 cm (25%)2,31.5 cm (<3%)2,,3840 grams at 3 months (<3%)2,...,Tetralogy of Fallot,NG tube feeding; calcified gallblader,Dysplastic/ multicystic kidneys,,Narrow thorax; deep sacral dimple; low conus,"Birth: stiffening of upper and lower extremities and neck, poor suck",Dandy Walker malformation; hypoplasia of cerebellar vermis (imaging by CT scan),,,Group 1 Patient 4
Group 1 Patient 5,p.(Arg1740Trp),Female,"Fetal cerebellar hypoplasia, ventriculomegaly, intraventricular hemorrhage",33 weeks,,2074 grams (70%)1,,,,14.79 kg 3.5 years (50%)2,...,Congenital heart block (maternal anti-SSA),Poor feeding; GTT,,,Small legs; diffuse osteopenia; contractures; dislocated radial heads; disorganized ossification of the femoral and proximal tibial metaphases; 13 ribs,Hypotonia; many seizures with fever; neuromuscular scoliosis/chest deformity,Enlarged extra axial spaces; small uprotated hippocampi; diffuse reduced volume white matter; moderately enlarged 3rd and lateral ventricles; thin corpus callosum; mild cerebellar vermis and borderline hemisphere hypoplasia; and mildly enlarged posterior fossa consistent with mega cisterna magna,Maternal missense variant in the IGBP1 gene,,Group 1 Patient 5
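
Some patient identifiers in the output above carry a trailing newline inherited from the spreadsheet headers. The toy example below sketches the same transposition with the labels stripped of surrounding whitespace. The frame `toy` is invented for illustration and is not the supplementary table; whether the identifiers should be cleaned at this point is a judgment call.

```python
import pandas as pd

# Invented two-feature, two-patient frame that mimics the layout of the input table
toy = pd.DataFrame({
    "Feature": ["Variant", "Sex"],
    "Group 1 Patient 1\n": ["p.(Arg1740Trp)", "Male"],
    "Group 1 Patient 2": ["p.(Arg1740Trp)", "Male"],
})

# Use the feature names as the index, then transpose to one row per patient
toy_t = toy.set_index("Feature").transpose()
# Remove stray whitespace and newlines from the patient labels
toy_t.index = toy_t.index.str.strip()
toy_t["patient_id"] = toy_t.index

print(toy_t)
```

Setting the index before transposing avoids the separate steps of promoting the first row to the header and then dropping it.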


In [15]:
dft.columns

Index(['Variant', 'Sex', 'Prenatal complications', 'Gestational age',
       'Perinatal complications', 'Birth weight', 'Birth length',
       'Birth head circumference', 'Growth', 'Weight', 'Height',
       'Head circumference', 'Development', 'Walking independently ',
       'Sitting independently', 'Rolling', 'Head control', 'use of hands',
       'speech', 'Fontanelle/ skull',
       'midface hypoplasia/maxillary hypoplasia', 'wide nasal bridge',
       'broad nasal tip', 'Low hanging columella',
       'upslanted palbebral fissures', 'narrow/short palbebral fissures',
       'Periorbital fullness', 'arched eyebrows', 'hypertelorism',
       'micrognathia', 'Minor malfromations of hands and feet',
       'Malformations of the ears', 'Other malformations', 'Ophthalmology',
       'Audiology', 'Endocrine', 'Respiratory', 'Cardiac', 'Gastrointestinal',
       'Renal / urinary tract', 'Genital', 'Skeletal', 'Neuromuscular',
       'Neuroimaging', 'Other genetic findings', 'Other', 'pat