<h1>ERI1: Guo et al. 2023</h1>
<p>Extract the clinical data from <a href="https://pubmed.ncbi.nlm.nih.gov/37352860/" target="_blank">Guo L, et al. (2023) Null and missense mutations of ERI1 cause a recessive phenotypic dichotomy in humans. Am J Hum Genet. PMID:37352860</a>.</p>
<p>The authors describe a phenotypic dichotomy associated with bi-allelic ERI1 variants in eight affected individuals from seven unrelated families. Severe spondyloepimetaphyseal dysplasia (SEMD) was identified in the five affected individuals with missense variants but not in those with bi-allelic null variants, who showed mild intellectual disability and digital anomalies.</p>

In [28]:
import pandas as pd
import math
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
import re
from pyphetools.creation import *
from pyphetools.visualization import PhenopacketTable, QcVisualizer, HpoaTableBuilder
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.73


In [2]:
PMID="PMID:37352860"
title = "Null and missense mutations of ERI1 cause a recessive phenotypic dichotomy in humans"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-03-06


In [3]:
df = pd.read_excel("input/Guo_2023.xlsx")

In [4]:
df = df.set_index('Individual').T.reset_index()
df["individual_id"] = df["index"].apply(lambda x: f"Individual {x}")
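The cell above transposes the table, which suggests that the spreadsheet stores one attribute per row and one individual per column. The following is a minimal plain-Python sketch of that reshaping, with no pandas involved. The toy rows and the helper `transpose_to_records` are hypothetical and do not reproduce the real contents of `Guo_2023.xlsx`.

```python
# Toy stand-in for the spreadsheet: one row per attribute, one column per individual.
# These values are illustrative assumptions, not the real input file.
rows = [
    ["Individual", "1A", "1B", "5"],
    ["Sex", "F", "M", "F"],
    ["Syndactyly", "+", "+", "+"],
]


def transpose_to_records(rows):
    """Turn attribute-per-row data into one dict per individual (hypothetical helper)."""
    header, *attribute_rows = rows
    individual_labels = header[1:]
    records = []
    for position, label in enumerate(individual_labels, start=1):
        # Mirrors the "index" and "individual_id" columns created in the notebook.
        record = {"index": label, "individual_id": f"Individual {label}"}
        for attribute_row in attribute_rows:
            record[attribute_row[0]] = attribute_row[position]
        records.append(record)
    return records


records = transpose_to_records(rows)
for record in records:
    print(record)
```

After this step each record corresponds to one individual, which is the layout the column mappers below expect.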

In [5]:
scg = SimpleColumnMapperGenerator(df=df,
                                  observed='+',
                                  excluded='-',
                                  hpo_cr=hpo_cr)

In [6]:
column_mapper_list = scg.try_mapping_columns()

In [7]:
from IPython.display import display, HTML
display(HTML(scg.to_html()))

Result,Columns
Mapped,Syndactyly; Cardiac anomaly; Hydronephrosis; Vesicoureteral reflux; Asthma; Conductive hearing impairment; hypernasal speech; Dislocated radial head; Scoliosis; Hip pain; Short stature; Long face; Narrow face; proptosis; Coarse facies; Low-set ears; Limited elbow extension; Finger joint hypermobility; Clinodactyly of the 5th finger; Pes planus; Slender metacarpals; Increased vertebral height; Velopharyngeal insufficiency; Hip dislocation; Patellar dislocation; Narrow forehead; Upslanted palpebral fissure; High palate; Pectus excavatum; Tapered finger; Prominent forehead; Depressed nasal bridge; Micrognathia; Cutaneous syndactyly; Macrotia; Narrow chest; Pulmonary arterial hypertension; Oligodactyly; Tricuspid regurgitation; Platyspondyly; Intrauterine growth retardation; Motor delay; Failure to thrive; Trigonocephaly; Frontal bossing; Sparse hair; Pectus carinatum; Wormian bones; Osteopenia; Delayed skeletal maturation; Inguinal hernia; Ventricular septal defect; Brachycephaly; Anonychia; Strabismus; Low anterior hairline; Epicanthus
Unmapped,index; DNA; allele_1; allele_2; Protein; Sex; age_of_onset; Age at last follow-up; Weight; Height; Consanguinity; Fetal ultrasound; Gestation age; Birth weight; Birth length; Spine anomaly; Metaphyseal anomaly; Epiphyseal anomaly; Brachydactyly/clinodactyly/camptodactyly; Intellectual disability/developmental delay; Zygomatic hypoplasia; Posteriorly rotated ear; Cupped ear ; individual_id


In [8]:
# Now get the unmapped columns and try option mappers
# The following was only needed to write the notebook
# unmapped_columns = scg.get_unmapped_columns()
# omit_columns = set(column_mapper_d.keys())
# omit_columns.update(['index','DNA','Protein','Age at last follow-up','Consanguinity'])
# auto_results = OptionColumnMapper.autoformat(df=df, concept_recognizer=hpo_cr, omit_columns=omit_columns)
# print(auto_results)

In [9]:
weight_d = {'24\xa0kg (−5 SD)': 'Decreased body weight',
 '26\xa0kg (−5 SD)': 'Decreased body weight',
 '3.3\xa0kg (- 4 SD)': 'Decreased body weight',
 'failure to thrive': 'Failure to thrive'}
excluded_d = {
    '22\xa0kg (8th centile)': 'Decreased body weight',
    '62\xa0kg (85th centile)': 'Decreased body weight',
    '27.6\xa0kg (50th centile)': 'Decreased body weight',
    'normal': 'Failure to thrive',
}
weightMapper = OptionColumnMapper(column_name="Weight", concept_recognizer=hpo_cr, option_d=weight_d,
                                 excluded_d=excluded_d)
column_mapper_list.append(weightMapper)
weightMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Decreased body weight (HP:0004325) (observed),3
1,Failure to thrive (HP:0001508) (observed),1
2,Decreased body weight (HP:0004325) (excluded),3
3,Failure to thrive (HP:0001508) (excluded),1
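The idea behind the two dictionaries can be sketched in plain Python: one dictionary lists cell contents that indicate an observed term, the other lists cell contents that indicate an explicitly excluded term. The helper `map_cell` below is hypothetical and assumes simple substring matching; it is not the pyphetools implementation. It also shows why the keys keep the non-breaking space (`\xa0`) from the spreadsheet.

```python
def map_cell(cell, option_d, excluded_d):
    """Return (label, status) pairs for one cell; hypothetical helper, substring match assumed."""
    results = []
    for key, label in option_d.items():
        if key in cell:
            results.append((label, "observed"))
    for key, label in excluded_d.items():
        if key in cell:
            results.append((label, "excluded"))
    return results


# Subset of the dictionaries used in the notebook cell above.
option_d = {
    '24\xa0kg (−5 SD)': 'Decreased body weight',
    'failure to thrive': 'Failure to thrive',
}
excluded_d = {
    '22\xa0kg (8th centile)': 'Decreased body weight',
    'normal': 'Failure to thrive',
}

print(map_cell('24\xa0kg (−5 SD)', option_d, excluded_d))
print(map_cell('normal', option_d, excluded_d))
# An ordinary space instead of \xa0 does not match any key.
print(map_cell('24 kg (−5 SD)', option_d, excluded_d))
```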


In [10]:
height_d = {
    '112\xa0cm (−8 SD)': 'Short stature',
    '128\xa0cm (−7 SD)': 'Short stature',
    '50.3\xa0cm (−5 SD)': 'Short stature',
    'short stature': 'Short stature',
    '105\xa0cm (<3rd centile)': 'Short stature',
}

excluded_d = {
    '130.8\xa0cm (46th centile)': 'Short stature',
    '155\xa0cm (25th centile)': 'Short stature',
    '130\xa0cm (90th centile)': 'Short stature',
}
heightMapper = OptionColumnMapper(column_name="Height", concept_recognizer=hpo_cr, option_d=height_d,
                                excluded_d=excluded_d)
column_mapper_list.append(heightMapper)
heightMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Short stature (HP:0004322) (observed),5
1,Short stature (HP:0004322) (excluded),3
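The height and weight dictionaries above were assigned by hand from the SD scores and centiles in each cell. A minimal sketch of how the SD score could be parsed automatically is shown below. The names `SD_PATTERN`, `parse_sd` and `is_short_stature` are hypothetical, and the cutoff of more than 2 SD below the mean is the commonly used convention for short stature, stated here as an assumption rather than something the notebook applies.

```python
import re

# Accepts "(−8 SD)", "(-4 SD)" and "(- 4 SD)": ASCII hyphen or Unicode minus,
# with optional whitespace between the sign and the number.
SD_PATTERN = re.compile(r"\(\s*([+\-−])?\s*(\d+(?:\.\d+)?)\s*SD\s*\)")


def parse_sd(cell):
    """Return the SD score found in a cell as a float, or None if there is none."""
    match = SD_PATTERN.search(cell)
    if match is None:
        return None
    sign, number = match.groups()
    value = float(number)
    return -value if sign in ("-", "−") else value


def is_short_stature(cell, threshold=-2.0):
    """True/False if an SD score is present, None otherwise (threshold is an assumption)."""
    sd = parse_sd(cell)
    if sd is None:
        return None
    return sd < threshold


for cell in ['112\xa0cm (−8 SD)', '3.3\xa0kg (- 4 SD)', '155\xa0cm (25th centile)']:
    print(repr(cell), parse_sd(cell))
```

Cells that report a centile rather than an SD score return `None` and would still need manual curation.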


In [11]:
fetal_ultrasound_d = {'hydronephrosis': 'Hydronephrosis',
 'short limbs': 'Limb undergrowth',
 'severe IUGR': 'Intrauterine growth retardation',
 }
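# Note: this dictionary is defined but never passed to the mapper below (there is
# no excluded_d argument), so 'unremarkable' is not recorded as an excluded finding.
excluded = {
    'unremarkable': 'Intrauterine growth retardation'
}
fetal_ultrasoundMapper = OptionColumnMapper(column_name='Fetal ultrasound', concept_recognizer=hpo_cr, option_d=fetal_ultrasound_d)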
column_mapper_list.append(fetal_ultrasoundMapper)
fetal_ultrasoundMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Hydronephrosis (HP:0000126) (observed),2
1,Limb undergrowth (HP:0009826) (observed),1
2,Intrauterine growth retardation (HP:0001511) (observed),1


In [12]:
birth_weight_d = {
 '2180\xa0g (−3.2 SD)': 'Small for gestational age',
 '000\xa0g (−3.3 SD)': 'Small for gestational age',}
birth_weightMapper = OptionColumnMapper(column_name='Birth weight',concept_recognizer=hpo_cr, option_d=birth_weight_d)
column_mapper_list.append(birth_weightMapper)
birth_weightMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Small for gestational age (HP:0001518) (observed),2


In [13]:
# Omitting these because we manually curated detailed phenotypes and added them to the input table
#spine_anomaly_d 
#metaphyseal_anomaly_d = {'nan': 'PLACEHOLDER'}
#epiphyseal_anomaly_d = {'+ (wrists)': 'PLACEHOLDER'}

In [14]:
id_gdd_d = {
 'Motor delay': 'Motor delay',
 'Delayed speech and language development': 'Delayed speech and language development',
 'generalized hypotonia': 'Generalized hypotonia',
 'Global developmental delay': 'Global developmental delay',
 'Autism': 'Autism',
 'Intellectual disability mild': 'Intellectual disability, mild',}
id_gddMapper = OptionColumnMapper(column_name='Intellectual disability/developmental delay',
                                  concept_recognizer=hpo_cr, option_d=id_gdd_d)
column_mapper_list.append(id_gddMapper)
id_gddMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Motor delay (HP:0001270) (observed),1
1,Delayed speech and language development (HP:0000750) (observed),1
2,Generalized hypotonia (HP:0001290) (observed),1
3,Global developmental delay (HP:0001263) (observed),4
4,Autism (HP:0000717) (observed),1
5,"Intellectual disability, mild (HP:0001256) (observed)",2
6,Intellectual disability (HP:0001249) (observed),2


In [15]:
ERI1_transcript="NM_153332.4"
vman = VariantManager(df=df,individual_column_name="individual_id",allele_1_column_name="allele_1",
                     allele_2_column_name="allele_2", gene_id="HGNC:23994", gene_symbol="ERI1",
                     transcript=ERI1_transcript)

In [16]:
vman.to_summary()

Unnamed: 0,status,count,alleles
0,mapped,11,"c.464C>T, c.582+1G>A, c.893A>G, c.893A>C, c.352A>T, c.401A>G, c.730C>T, c.450A>T, c.514C>T, c.895T>C, c.62C>A"
1,unmapped,1,g.8783887_9068578del


In [17]:
vman.code_as_chromosomal_deletion({"g.8783887_9068578del"})
vman.to_summary()

Unnamed: 0,status,count,alleles
0,mapped,12,"c.464C>T, c.582+1G>A, c.893A>G, c.893A>C, c.352A>T, c.401A>G, c.730C>T, c.450A>T, c.514C>T, c.895T>C, c.62C>A, g.8783887_9068578del"
1,unmapped,0,


In [18]:
ageMapper = AgeColumnMapper.by_year('Age at last follow-up')
#ageMapper.preview_column(df)
sexMapper = SexColumnMapper(male_symbol='M', female_symbol='F', column_name='Sex')
#sexMapper.preview_column(df)
onsetMapper = AgeColumnMapper.hpo_onset(column_name="age_of_onset")
#onsetMapper.preview_column(df)
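Ages in phenopackets are encoded as ISO 8601 durations such as `P28Y` or `P3Y6M`. The sketch below shows one way an age given in years could be converted to that form. The function `years_to_iso8601` is a hypothetical illustration and not the pyphetools `AgeColumnMapper`; it also assumes that fractional years are stored as decimals (for example 3.5).

```python
import math


def years_to_iso8601(value):
    """Convert an age in years (e.g. 3.5) to an ISO 8601 duration (e.g. 'P3Y6M').

    Returns None for entries that are not non-negative finite numbers, such as 'na'.
    """
    try:
        years_float = float(value)
    except (TypeError, ValueError):
        return None
    if not math.isfinite(years_float) or years_float < 0:
        return None
    # Round to whole months, then split into years and remaining months.
    total_months = round(years_float * 12)
    years, months = divmod(total_months, 12)
    if months == 0:
        return f"P{years}Y"
    if years == 0:
        return f"P{months}M"
    return f"P{years}Y{months}M"


for raw in [28, 3.5, 0.5, "na"]:
    print(raw, years_to_iso8601(raw))
```

Entries that cannot be interpreted as a number are returned as `None` rather than guessed.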

<h2>Disease diagnosis</h2>
<p>Diseases related to ERI1 are currently not represented in OMIM. For this reason, we represent the diagnosis as preliminary below. The authors write: "SEMD was present in the five individuals with at least one missense variant (Table 1). In contrast, three individuals with ERI1 null mutations and the Eri1 KO mice showed a much milder skeletal phenotype without any evidence for SEMD, consistent with the two individuals reported previously, who had homozygous a 284 kb deletion and p.Lys118∗. Notably, of the five individuals with SEMD, three died within 2 years after birth, suggesting missense variants lead to a poor prognosis."</p>

In [19]:
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="individual_id", 
                        age_at_last_encounter_mapper=ageMapper, 
                        age_of_onset_mapper=onsetMapper,
                        sexmapper=sexMapper,
                        metadata=metadata)
eri1 = Disease(disease_id='OMIM:608739', disease_label='ERI1-related disease')
encoder.set_disease(eri1)

In [20]:
individuals = encoder.get_individuals()

Could not parse the following as ISO8601 ages: na (n=2)


In [21]:
patient_to_variant_d = defaultdict(list)
for _, row in df.iterrows():
    patient_id = row["individual_id"]
    a1 = row["allele_1"]
    a2 = row["allele_2"]
    patient_to_variant_d[patient_id].append(a1)
    patient_to_variant_d[patient_id].append(a2)
validated_var_d = vman.get_variant_d()

In [22]:
for indi in individuals:
    if indi.id not in patient_to_variant_d:
        for k,v in patient_to_variant_d.items():
            print(k, '-'.join(v))
        raise ValueError(f"Error, individual id \"{indi.id}\" without variant data")
    var_list = patient_to_variant_d.get(indi.id)
    if len(var_list) != 2:
        
        raise ValueError(f"Error, malformed variant list for {indi.id}")
    v1 = var_list[0]
    v2 = var_list[1]
    if v1 == v2:
        if v1 == "g.8783887_9068578del":
            #This is a whole-gene deletion
            eri1_id = "HGNC:23994"
            eri1_symbol = "ERI1"
            var = StructuralVariant.chromosomal_deletion(cell_contents=v1, gene_id=eri1_id, gene_symbol=eri1_symbol)
            var.set_homozygous()
            indi.add_variant(var)
        else:
            vvar = validated_var_d.get(v1)
            vvar.set_homozygous()
            indi.add_variant(vvar)
    else:
        vvar1 = validated_var_d.get(v1)
        vvar1.set_heterozygous()
        indi.add_variant(vvar1)
        vvar2 = validated_var_d.get(v2)
        vvar2.set_heterozygous()
        indi.add_variant(vvar2)     
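The zygosity rule applied in the loop above can be stated on its own: two identical alleles give one homozygous variant, and two different alleles give two heterozygous variants (compound heterozygosity). The sketch below uses a hypothetical helper, `assign_zygosity`, on plain strings rather than pyphetools variant objects.

```python
def assign_zygosity(allele_1, allele_2):
    """Return (allele, zygosity) pairs for a bi-allelic genotype (hypothetical helper)."""
    if allele_1 == allele_2:
        return [(allele_1, "homozygous")]
    return [(allele_1, "heterozygous"), (allele_2, "heterozygous")]


# Genotypes taken from the cohort table in this notebook.
print(assign_zygosity("c.514C>T", "c.514C>T"))
print(assign_zygosity("c.450A>T", "c.893A>G"))
```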

<h2>Validation</h2>

In [23]:
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.BI_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
WARNING,REDUNDANT,5
INFORMATION,NOT_MEASURED,444
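A REDUNDANT warning is assumed here to mean that an individual was annotated with both a term and one of its ancestors, in which case the more general term adds no information. The sketch below illustrates that check with a hand-written toy parent map. `PARENT`, `ancestors` and `redundant_terms` are hypothetical names, and the map is not loaded from `hp.json`.

```python
# Hand-written toy hierarchy (child -> parent); an assumption for illustration only.
PARENT = {
    "Intellectual disability, mild": "Intellectual disability",
}


def ancestors(term):
    """Walk the toy parent map upwards and collect all ancestors of a term."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found


def redundant_terms(observed):
    """Return observed terms that are ancestors of another observed term."""
    observed_set = set(observed)
    redundant = set()
    for term in observed_set:
        for ancestor in ancestors(term):
            if ancestor in observed_set:
                redundant.add(ancestor)
    return redundant


annotations = ["Intellectual disability, mild", "Intellectual disability", "Strabismus"]
print(redundant_terms(annotations))
```

In this toy example the general term would be dropped and the more specific term kept.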


In [24]:
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
Individual 1A (FEMALE; P28Y),ERI1-related disease (OMIM:608739),NM_153332.4:c.450A>T (heterozygous) NM_153332.4:c.893A>G (heterozygous),Clinodactyly of the 5th finger (HP:0004209); Pes planus (HP:0001763); Long face (HP:0000276); Increased vertebral height (HP:0004570); Slender metacarpals (HP:0006236); Hydronephrosis (HP:0000126); Dislocated radial head (HP:0003083); Coarse facial features (HP:0000280); Asthma (HP:0002099); Proptosis (HP:0000520); Finger joint hypermobility (HP:0006094); Decreased body weight (HP:0004325); Hip pain (HP:0030838); Narrow face (HP:0000275); Limited elbow extension (HP:0001377); Scoliosis (HP:0002650); Low-set ears (HP:0000369); Cutaneous syndactyly (HP:0012725); Vesicoureteral reflux (HP:0000076); Conductive hearing impairment (HP:0000405); Hypernasal speech (HP:0001611); Short stature (HP:0004322); excluded: Platyspondyly (HP:0000926); excluded: Failure to thrive (HP:0001508)
Individual 1B (MALE; P26Y),ERI1-related disease (OMIM:608739),NM_153332.4:c.450A>T (heterozygous) NM_153332.4:c.893A>G (heterozygous),Pes planus (HP:0001763); Patellar dislocation (HP:0002999); Narrow forehead (HP:0000341); Increased vertebral height (HP:0004570); Hip dislocation (HP:0002827); Slender metacarpals (HP:0006236); Hydronephrosis (HP:0000126); Pectus excavatum (HP:0000767); Coarse facial features (HP:0000280); Asthma (HP:0002099); Upslanted palpebral fissure (HP:0000582); Decreased body weight (HP:0004325); Velopharyngeal insufficiency (HP:0000220); Limited elbow extension (HP:0001377); High palate (HP:0000218); Scoliosis (HP:0002650); Low-set ears (HP:0000369); Tapered finger (HP:0001182); Cutaneous syndactyly (HP:0012725); Vesicoureteral reflux (HP:0000076); Conductive hearing impairment (HP:0000405); Hypernasal speech (HP:0001611); Short stature (HP:0004322); excluded: Platyspondyly (HP:0000926); excluded: Failure to thrive (HP:0001508)
Individual 2 (MALE; P3Y6M),ERI1-related disease (OMIM:608739),NM_153332.4:c.464C>T (heterozygous) NM_153332.4:c.893A>C (heterozygous),Abnormal heart morphology (HP:0001627); Short stature (HP:0004322); Prominent forehead (HP:0011220); Depressed nasal bridge (HP:0005280); Micrognathia (HP:0000347); Cutaneous syndactyly (HP:0012725); Macrotia (HP:0000400); Narrow chest (HP:0000774); Pulmonary arterial hypertension (HP:0002092); Limb undergrowth (HP:0009826); Small for gestational age (HP:0001518); excluded: Platyspondyly (HP:0000926); excluded: Failure to thrive (HP:0001508)
Individual 3 (MALE; P3Y6M),ERI1-related disease (OMIM:608739),NM_153332.4:c.401A>G (heterozygous) NM_153332.4:c.895T>C (heterozygous),Pulmonary arterial hypertension (HP:0002092); Tricuspid regurgitation (HP:0005180); Small for gestational age (HP:0001518); Oligodactyly (HP:0012165); Syndactyly (HP:0001159); Platyspondyly (HP:0000926); Short stature (HP:0004322); excluded: Failure to thrive (HP:0001508)
Individual 4 (MALE; P2Y),ERI1-related disease (OMIM:608739),NM_153332.4:c.464C>T (heterozygous) NM_153332.4:c.62C>A (heterozygous),Delayed skeletal maturation (HP:0002750); Pectus carinatum (HP:0000768); Wormian bones (HP:0002645); Motor delay (HP:0001270); Hydronephrosis (HP:0000126); Failure to thrive (HP:0001508); Delayed speech and language development (HP:0000750); Intrauterine growth retardation (HP:0001511); Frontal bossing (HP:0002007); Abnormal heart morphology (HP:0001627); Trigonocephaly (HP:0000243); Sparse hair (HP:0008070); Low-set ears (HP:0000369); Global developmental delay (HP:0001263); Micrognathia (HP:0000347); Generalized hypotonia (HP:0001290); Osteopenia (HP:0000938); Vesicoureteral reflux (HP:0000076); Syndactyly (HP:0001159); Short stature (HP:0004322)
Individual 5 (FEMALE; P8Y),ERI1-related disease (OMIM:608739),NM_153332.4:c.514C>T (homozygous),Syndactyly (HP:0001159); Narrow forehead (HP:0000341); Inguinal hernia (HP:0000023); Ventricular septal defect (HP:0001629); Brachycephaly (HP:0000248); Anonychia (HP:0001798); Global developmental delay (HP:0001263); Autism (HP:0000717); excluded: Scoliosis (HP:0002650); excluded: Increased vertebral height (HP:0004570); excluded: Pectus excavatum (HP:0000767); excluded: Platyspondyly (HP:0000926); excluded: Pectus carinatum (HP:0000768); excluded: Wormian bones (HP:0002645); excluded: Osteopenia (HP:0000938); excluded: Delayed skeletal maturation (HP:0002750); excluded: Decreased body weight (HP:0004325); excluded: Short stature (HP:0004322)
Individual 6 (MALE; P13Y),ERI1-related disease (OMIM:608739),NM_153332.4:c.730C>T (homozygous),"Low-set ears (HP:0000369); Strabismus (HP:0000486); Low anterior hairline (HP:0000294); Intellectual disability, mild (HP:0001256); excluded: Scoliosis (HP:0002650); excluded: Increased vertebral height (HP:0004570); excluded: Hip dislocation (HP:0002827); excluded: Patellar dislocation (HP:0002999); excluded: Pectus excavatum (HP:0000767); excluded: Platyspondyly (HP:0000926); excluded: Pectus carinatum (HP:0000768); excluded: Wormian bones (HP:0002645); excluded: Osteopenia (HP:0000938); excluded: Delayed skeletal maturation (HP:0002750); excluded: Decreased body weight (HP:0004325); excluded: Short stature (HP:0004322)"
Individual 7 (FEMALE; P7Y),ERI1-related disease (OMIM:608739),NM_153332.4:c.582+1G>A (homozygous),"Syndactyly (HP:0001159); Hydronephrosis (HP:0000126); Vesicoureteral reflux (HP:0000076); Epicanthus (HP:0000286); Intellectual disability, mild (HP:0001256); excluded: Increased vertebral height (HP:0004570); excluded: Hip dislocation (HP:0002827); excluded: Patellar dislocation (HP:0002999); excluded: Platyspondyly (HP:0000926); excluded: Pectus carinatum (HP:0000768); excluded: Wormian bones (HP:0002645); excluded: Osteopenia (HP:0000938); excluded: Delayed skeletal maturation (HP:0002750); excluded: Decreased body weight (HP:0004325); excluded: Short stature (HP:0004322)"
Individual Hoxha (FEMALE; P7Y),ERI1-related disease (OMIM:608739),NM_153332.4:c.352A>T (homozygous),Intellectual disability (HP:0001249); Global developmental delay (HP:0001263); excluded: Failure to thrive (HP:0001508)
Individual Choucair (MALE; P5Y),ERI1-related disease (OMIM:608739),g.8783887_9068578del: chromosomal_deletion (SO:1000029),Syndactyly (HP:0001159); Abnormal heart morphology (HP:0001627); Short stature (HP:0004322); Intellectual disability (HP:0001249); Global developmental delay (HP:0001263)


In [25]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals, 
                                             metadata=metadata,
                                             outdir=output_directory)

We output 10 GA4GH phenopackets to the directory phenopackets


In [26]:
ppkt_list = [i.to_ga4gh_phenopacket(metadata=metadata) for i in individuals]

In [29]:
builder = HpoaTableBuilder(phenopacket_list=ppkt_list)

In [30]:
PMID = "PMID:37352860" 
builder.autosomal_recessive(PMID)
hpoa_table_creator = builder.build()
df = hpoa_table_creator.get_dataframe()
df.head()

We found a total of 66 unique HPO terms
Extracted disease: ERI1-related disease (OMIM:608739)


Unnamed: 0,#diseaseID,diseaseName,phenotypeID,phenotypeName,onsetID,onsetName,frequency,sex,negation,modifier,description,publication,evidence,biocuration
0,OMIM:608739,ERI1-related disease,HP:0004209,Clinodactyly of the 5th finger,,,1/1,,,,,PMID:37352860,PCS,ORCID:0000-0002-0736-9199
1,OMIM:608739,ERI1-related disease,HP:0001763,Pes planus,,,2/2,,,,,PMID:37352860,PCS,ORCID:0000-0002-0736-9199
2,OMIM:608739,ERI1-related disease,HP:0000276,Long face,,,1/1,,,,,PMID:37352860,PCS,ORCID:0000-0002-0736-9199
3,OMIM:608739,ERI1-related disease,HP:0004570,Increased vertebral height,,,2/5,,,,,PMID:37352860,PCS,ORCID:0000-0002-0736-9199
4,OMIM:608739,ERI1-related disease,HP:0006236,Slender metacarpals,,,2/2,,,,,PMID:37352860,PCS,ORCID:0000-0002-0736-9199
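The `frequency` column reads as n/m, where n is the number of individuals in whom the term was observed and m is the number in whom it was either observed or explicitly excluded. The sketch below recomputes two of the values shown above from a subset of the cohort table. The function `frequency_strings` is a hypothetical illustration, and the counting rule is inferred from the table rather than taken from the pyphetools source.

```python
from collections import Counter


def frequency_strings(individuals):
    """Return {term: 'observed/measured'} across a list of individuals (hypothetical helper)."""
    observed = Counter()
    measured = Counter()
    for individual in individuals:
        for term in individual["observed"]:
            observed[term] += 1
            measured[term] += 1
        for term in individual["excluded"]:
            measured[term] += 1
    return {term: f"{observed[term]}/{measured[term]}" for term in measured}


# Two terms for five individuals, transcribed from the cohort table in this notebook.
cohort = [
    {"id": "1A", "observed": ["Pes planus", "Increased vertebral height"], "excluded": []},
    {"id": "1B", "observed": ["Pes planus", "Increased vertebral height"], "excluded": []},
    {"id": "5", "observed": [], "excluded": ["Increased vertebral height"]},
    {"id": "6", "observed": [], "excluded": ["Increased vertebral height"]},
    {"id": "7", "observed": [], "excluded": ["Increased vertebral height"]},
]

frequencies = frequency_strings(cohort)
print(frequencies)
```

Individuals in whom a term was neither observed nor excluded do not contribute to its denominator.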


In [31]:
hpoa_table_creator.write_data_frame()

Wrote HPOA disease file to OMIM-608739.tab
