<h1>ERI1: Guo et al 2013 </h1>
<p>Extract the clinical data from <a href="https://pubmed.ncbi.nlm.nih.gov/37352860/"target="__blank">Guo L, et al. (2023) Null and missense mutations of ERI1 cause a recessive phenotypic dichotomy in humans. Am J Hum Genet.  PMID:37352860</a>.<p>
<p>The authors report a phenotypic dichotomy associated with bi-allelic ERI1 variants by reporting eight affected individuals from seven unrelated families. A severe spondyloepimetaphyseal dysplasia (SEMD) was identified in five affected individuals with missense variants but not in those with bi-allelic null variants, who showed mild intellectual disability and digital anomalies.</p>

In [1]:
import pandas as pd
import math
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
import re
from pyphetools.creation import *
from pyphetools.visualization import PhenopacketTable, QcVisualizer
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.35


In [2]:
PMID="PMID:37352860"
title = "Null and missense mutations of ERI1 cause a recessive phenotypic dichotomy in humans"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-01-16


In [3]:
df = pd.read_excel("input/Guo_2023.xlsx")

In [4]:
#patient_id = df.columns

In [5]:
df = df.set_index('Individual').T.reset_index()
df["patient_id"] = df["index"]

In [6]:
scg = SimpleColumnMapperGenerator(df=df,
                                  observed='+',
                                  excluded='-',
                                  hpo_cr=hpo_cr)

In [7]:
column_mapper_list = scg.try_mapping_columns()

In [8]:
from IPython.display import display, HTML
display(HTML(scg.to_html()))

Result,Columns
Mapped,Syndactyly; Cardiac anomaly; Hydronephrosis; Vesicoureteral reflux; Asthma; Conductive hearing impairment; hypernasal speech; Dislocated radial head; Scoliosis; Hip pain; Short stature; Long face; Narrow face; proptosis; Coarse facies; Low-set ears; Limited elbow extension; Finger joint hypermobility; Clinodactyly of the 5th finger; Pes planus; Slender metacarpals; Increased vertebral height; Velopharyngeal insufficiency; Hip dislocation; Patellar dislocation; Narrow forehead; Upslanted palpebral fissure; High palate; Pectus excavatum; Tapered finger; Prominent forehead; Depressed nasal bridge; Micrognathia; Cutaneous syndactyly; Macrotia; Narrow chest; Pulmonary arterial hypertension; Oligodactyly; Tricuspid regurgitation; Platyspondyly; Intrauterine growth retardation; Motor delay; Failure to thrive; Trigonocephaly; Frontal bossing; Sparse hair; Pectus carinatum; Wormian bones; Osteopenia; Delayed skeletal maturation; Inguinal hernia; Ventricular septal defect; Brachycephaly; Anonychia; Strabismus; Low anterior hairline; Epicanthus
Unmapped,index; DNA; Protein; Sex; Age at last follow-up; Weight; Height; Consanguinity; Fetal ultrasound; Gestation age; Birth weight; Birth length; Spine anomaly; Metaphyseal anomaly; Epiphyseal anomaly; Brachydactyly/clinodactyly/camptodactyly; Intellectual disability/developmental delay; Zygomatic hypoplasia; Posteriorly rotated ear; Cupped ear ; patient_id


In [9]:
# Now get the unmapped columns and try option mappers
# The following was only needed to write the notebook
# unmapped_columns = scg.get_unmapped_columns()
# omit_columns = set(column_mapper_d.keys())
# omit_columns.update(['index','DNA','Protein','Age at last follow-up','Consanguinity'])
# auto_results = OptionColumnMapper.autoformat(df=df, concept_recognizer=hpo_cr, omit_columns=omit_columns)
# print(auto_results)

In [10]:
weight_d = {'24\xa0kg (−5 SD)': 'Decreased body weight',
 '26\xa0kg (−5 SD)': 'Decreased body weight',
 '3.3\xa0kg (- 4 SD)': 'Decreased body weight',
 'failure to thrive': 'Failure to thrive'}
excluded_d = {
    '22\xa0kg (8th centile)': 'Decreased body weight',
    '62\xa0kg (85th centile)': 'Decreased body weight',
    '27.6\xa0kg (50th centile)': 'Decreased body weight',
    'normal': 'Failure to thrive',
}
weightMapper = OptionColumnMapper(column_name="Weight", concept_recognizer=hpo_cr, option_d=weight_d,
                                 excluded_d=excluded_d)
column_mapper_list.append(weightMapper)
weightMapper.preview_column(df)

Unnamed: 0,mapping,count
0,"original value: ""24 kg (−5 SD)"" -> HP: Decreased body weight (HP:0004325) (observed)",1
1,"original value: ""26 kg (−5 SD)"" -> HP: Decreased body weight (HP:0004325) (observed)",1
2,"original value: ""3.3 kg (- 4 SD)"" -> HP: Decreased body weight (HP:0004325) (observed)",1
3,"original value: ""failure to thrive"" -> HP: Failure to thrive (HP:0001508) (observed)",1
4,"original value: ""22 kg (8th centile)"" -> HP: Decreased body weight (HP:0004325) (excluded)",1
5,"original value: ""62 kg (85th centile)"" -> HP: Decreased body weight (HP:0004325) (excluded)",1
6,"original value: ""27.6 kg (50th centile)"" -> HP: Decreased body weight (HP:0004325) (excluded)",1
7,"original value: ""normal"" -> HP: Failure to thrive (HP:0001508) (excluded)",1


In [11]:
height_d = {'112\xa0cm (−8 SD)': 'Short stature',
 '128\xa0cm (−7 SD)': 'Short stature',
 '50.3\xa0cm (−5 SD)': 'Short stature',
 'short stature': 'Short stature',
 
 '105\xa0cm (<3rd centile)': 'Short stature'}

excluded_d = {
    '130.8\xa0cm (46th centile)': 'Short stature',
 '155\xa0cm (25th centile)': 'Short stature',
 '130\xa0cm (90th centile)': 'Short stature',
}
heightMapper = OptionColumnMapper(column_name="Height", concept_recognizer=hpo_cr, option_d=height_d,
                                excluded_d=excluded_d)
column_mapper_list.append(heightMapper)
heightMapper.preview_column(df)

Unnamed: 0,mapping,count
0,"original value: ""112 cm (−8 SD)"" -> HP: Short stature (HP:0004322) (observed)",1
1,"original value: ""128 cm (−7 SD)"" -> HP: Short stature (HP:0004322) (observed)",1
2,"original value: ""50.3 cm (−5 SD)"" -> HP: Short stature (HP:0004322) (observed)",1
3,"original value: ""short stature"" -> HP: Short stature (HP:0004322) (observed)",1
4,"original value: ""130.8 cm (46th centile)"" -> HP: Short stature (HP:0004322) (excluded)",1
5,"original value: ""155 cm (25th centile)"" -> HP: Short stature (HP:0004322) (excluded)",1
6,"original value: ""130 cm (90th centile)"" -> HP: Short stature (HP:0004322) (excluded)",1
7,"original value: ""105 cm (<3rd centile)"" -> HP: Short stature (HP:0004322) (observed)",1


In [12]:
fetal_ultrasound_d = {'hydronephrosis': 'Hydronephrosis',
 'short limbs': 'Limb undergrowth',
 'severe IUGR': 'Intrauterine growth retardation',
 }
excluded = {
    'unremarkable': 'Intrauterine growth retardation'
}
fetal_ultrasoundMapper = OptionColumnMapper(column_name='Fetal ultrasound',concept_recognizer=hpo_cr, option_d=fetal_ultrasound_d)
column_mapper_list.append(fetal_ultrasoundMapper)
fetal_ultrasoundMapper.preview_column(df)

Unnamed: 0,mapping,count
0,"original value: ""hydronephrosis"" -> HP: Hydronephrosis (HP:0000126) (observed)",2
1,"original value: ""short limbs"" -> HP: Limb undergrowth (HP:0009826) (observed)",1
2,"original value: ""severe IUGR"" -> HP: Intrauterine growth retardation (HP:0001511) (observed)",1


In [13]:
birth_weight_d = {
 '2180\xa0g (−3.2 SD)': 'Small for gestational age',
 '000\xa0g (−3.3 SD)': 'Small for gestational age',}
birth_weightMapper = OptionColumnMapper(column_name='Birth weight',concept_recognizer=hpo_cr, option_d=birth_weight_d)
column_mapper_list.append(birth_weightMapper)
birth_weightMapper.preview_column(df)

Unnamed: 0,mapping,count
0,"original value: ""2180 g (−3.2 SD)"" -> HP: Small for gestational age (HP:0001518) (observed)",1
1,"original value: ""2,000 g (−3.3 SD)"" -> HP: Small for gestational age (HP:0001518) (observed)",1


In [14]:
# Omitting these because we manually curated detailed phenotypes and added them to the input table
#spine_anomaly_d 
#metaphyseal_anomaly_d = {'nan': 'PLACEHOLDER'}
#epiphyseal_anomaly_d = {'+ (wrists)': 'PLACEHOLDER'}

In [15]:
id_gdd_d = {
 'Motor delay': 'Motor delay',
 'Delayed speech and language development': 'Delayed speech and language development',
 'generalized hypotonia': 'Generalized hypotonia',
 'Global developmental delay': 'Global developmental delay',
 'Autism': 'Autism',
 'Intellectual disability mild': 'Intellectual disability, mild',}
id_gddMapper = OptionColumnMapper(column_name='Intellectual disability/developmental delay',
                                  concept_recognizer=hpo_cr, option_d=id_gdd_d)
column_mapper_list.append(id_gddMapper)
id_gddMapper.preview_column(df)

Unnamed: 0,mapping,count
0,"original value: ""Motor delay, Delayed speech and language development, generalized hypotonia, Global developmental delay"" -> HP: Motor delay (HP:0001270) (observed)",1
1,"original value: ""Global developmental delay, Autism"" -> HP: Global developmental delay (HP:0001263) (observed)",1
2,"original value: ""Intellectual disability mild"" -> HP: Intellectual disability, mild (HP:0001256) (observed)",2
3,"original value: ""Intellectual disability, Global developmental delay"" -> HP: Intellectual disability (HP:0001249) (observed)",2


<H2>Variants</H2>

In [16]:
patient_to_variant_d = defaultdict(list)
variant_set = set()   
for _, row in df.iterrows():
    pat_id = str(row['patient_id']) # needed because some IDs come out as ints
    dna_string = row['DNA']    
    fields = dna_string.split(";")
    if len(fields) != 2:
        raise ValueError(f"Malformed variant line {dna_string}")
    for var in fields:
        variant_str = var.strip()
        variant_str = re.sub(r"(c.)?\[","", variant_str)
        variant_str = variant_str.replace("]", "").strip()
        if "8783887" in variant_str:  # this is the structural variant
            variant_set.add("g.8783887_9068578del")
            patient_to_variant_d[pat_id].append("g.8783887_9068578del")
        else:
            variant_str = "c." + variant_str
            variant_set.add(variant_str)
            patient_to_variant_d[pat_id].append(variant_str)
print(f"We got {len(variant_set)} distinct variants.")

We got 12 distinct variants.


In [17]:
validator = VariantValidator(genome_build='hg38', transcript="NM_153332.4")
validated_var_d = defaultdict()
for var in variant_set:
    print(f"Validating {var}")
    if var == 'g.8783887_9068578del':
        sv = StructuralVariant.chromosomal_deletion(cell_contents='Deletion exons 1-4',
                 gene_symbol="ERI1",
                 gene_id="HGNC:23994")
        validated_var_d[var] = sv
    else:
        var_object = validator.encode_hgvs(hgvs=var)
        validated_var_d[var] = var_object
print(f"We got {len(validated_var_d)} variant objects")

Validating c.582+1G>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_153332.4%3Ac.582+1G>A/NM_153332.4?content-type=application%2Fjson
Validating c.895T>C
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_153332.4%3Ac.895T>C/NM_153332.4?content-type=application%2Fjson
Validating c.893A>G
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_153332.4%3Ac.893A>G/NM_153332.4?content-type=application%2Fjson
Validating c.464C>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_153332.4%3Ac.464C>T/NM_153332.4?content-type=application%2Fjson
Validating c.401A>G
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_153332.4%3Ac.401A>G/NM_153332.4?content-type=application%2Fjson
Validating c.893A>C
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_153332.4%3Ac.893A>C/NM_153332.4?content-type=application%2Fjson
Validating c.514C>T
https://rest.var

In [18]:
ageMapper = AgeColumnMapper.by_year('Age at last follow-up')
#ageMapper.preview_column(df)
sexMapper = SexColumnMapper(male_symbol='M', female_symbol='F', column_name='Sex')
#sexMapper.preview_column(df)

<h2>Disease diagnosis</h2>
<p>Diseases related to ERI1 are currenttly not represented in OMIM. For this reason, we represent the diagnosis as preliminary below. The authors write:  SEMD was present in the five individuals with at least one missense variant (Table 1). In contrast, three individuals with ERI1 null mutations and the Eri1 KO mice showed a much milder skeletal phenotype without any evidence for SEMD, consistent with the two individuals reported previously, who had homozygous a 284 kb deletion and p.Lys118∗. Notably, of the five individuals with SEMD, three died within 2 years after birth, suggesting missense variants lead to a poor prognosis.</p>

In [19]:
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="patient_id", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        metadata=metadata)
eri1 = Disease(disease_id='OMIM: 608739', disease_label='ERI1-related disease')
encoder.set_disease(eri1)

In [20]:
individuals = encoder.get_individuals()

In [21]:
for indi in individuals:
    if indi.id not in patient_to_variant_d:
        raise ValueError(f"Error, individual id \"{indi.id}\" without variant data")
    var_list = patient_to_variant_d.get(indi.id)
    if len(var_list) != 2:
        raise ValueError(f"Error, malformed variant list for {indi.id}")
    v1 = var_list[0]
    v2 = var_list[1]
    if v1 == v2:
        if v1 == "g.8783887_9068578del":
            #This is a whole-gene deletion
            eri1_id = "HGNC:23994"
            eri1_sumbol = "ERI1"
            var = StructuralVariant.chromosomal_deletion(cell_contents=v1, gene_id=eri1_id, gene_symbol=eri1_sumbol)
            var.set_homozygous()
            indi.add_variant(var)
        else:
            vvar = validated_var_d.get(v1)
            vvar.set_homozygous()
            indi.add_variant(vvar)
    else:
        vvar1 = validated_var_d.get(v1)
        vvar1.set_heterozygous()
        indi.add_variant(vvar1)
        vvar2 = validated_var_d.get(v2)
        vvar2.set_heterozygous()
        indi.add_variant(vvar2)     

In [22]:
# Validation

In [23]:
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.BI_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
WARNING,REDUNDANT,5
INFORMATION,NOT_MEASURED,444


In [24]:
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
1A (FEMALE; P28Y),ERI1-related disease (OMIM: 608739),NM_153332.4:c.450A>T (heterozygous) NM_153332.4:c.893A>G (heterozygous),Low-set ears (HP:0000369); Hydronephrosis (HP:0000126); Limited elbow extension (HP:0001377); Pes planus (HP:0001763); Dislocated radial head (HP:0003083); Scoliosis (HP:0002650); Narrow face (HP:0000275); Hip pain (HP:0030838); Long face (HP:0000276); Asthma (HP:0002099); Slender metacarpals (HP:0006236); Short stature (HP:0004322); Finger joint hypermobility (HP:0006094); Conductive hearing impairment (HP:0000405); Cutaneous syndactyly (HP:0012725); Increased vertebral height (HP:0004570); Coarse facial features (HP:0000280); Proptosis (HP:0000520); Hypernasal speech (HP:0001611); Vesicoureteral reflux (HP:0000076); Decreased body weight (HP:0004325); Clinodactyly of the 5th finger (HP:0004209); excluded: Platyspondyly (HP:0000926); excluded: Failure to thrive (HP:0001508)
1B (MALE; P26Y),ERI1-related disease (OMIM: 608739),NM_153332.4:c.450A>T (heterozygous) NM_153332.4:c.893A>G (heterozygous),High palate (HP:0000218); Low-set ears (HP:0000369); Hydronephrosis (HP:0000126); Limited elbow extension (HP:0001377); Pes planus (HP:0001763); Pectus excavatum (HP:0000767); Upslanted palpebral fissure (HP:0000582); Scoliosis (HP:0002650); Narrow forehead (HP:0000341); Asthma (HP:0002099); Slender metacarpals (HP:0006236); Short stature (HP:0004322); Velopharyngeal insufficiency (HP:0000220); Conductive hearing impairment (HP:0000405); Cutaneous syndactyly (HP:0012725); Increased vertebral height (HP:0004570); Coarse facial features (HP:0000280); Hypernasal speech (HP:0001611); Vesicoureteral reflux (HP:0000076); Decreased body weight (HP:0004325); Hip dislocation (HP:0002827); Tapered finger (HP:0001182); Patellar dislocation (HP:0002999); excluded: Platyspondyly (HP:0000926); excluded: Failure to thrive (HP:0001508)
2 (MALE; P3Y6M),ERI1-related disease (OMIM: 608739),NM_153332.4:c.464C>T (heterozygous) NM_153332.4:c.893A>C (heterozygous),Abnormal heart morphology (HP:0001627); Short stature (HP:0004322); Prominent forehead (HP:0011220); Depressed nasal bridge (HP:0005280); Micrognathia (HP:0000347); Cutaneous syndactyly (HP:0012725); Macrotia (HP:0000400); Narrow chest (HP:0000774); Pulmonary arterial hypertension (HP:0002092); Limb undergrowth (HP:0009826); Small for gestational age (HP:0001518); excluded: Platyspondyly (HP:0000926); excluded: Failure to thrive (HP:0001508)
3 (MALE; P3Y6M),ERI1-related disease (OMIM: 608739),NM_153332.4:c.401A>G (heterozygous) NM_153332.4:c.895T>C (heterozygous),Short stature (HP:0004322); Platyspondyly (HP:0000926); Syndactyly (HP:0001159); Tricuspid regurgitation (HP:0005180); Small for gestational age (HP:0001518); Pulmonary arterial hypertension (HP:0002092); Oligodactyly (HP:0012165); excluded: Failure to thrive (HP:0001508)
4 (MALE; P2Y),ERI1-related disease (OMIM: 608739),NM_153332.4:c.464C>T (heterozygous) NM_153332.4:c.62C>A (heterozygous),Low-set ears (HP:0000369); Hydronephrosis (HP:0000126); Frontal bossing (HP:0002007); Delayed speech and language development (HP:0000750); Wormian bones (HP:0002645); Micrognathia (HP:0000347); Motor delay (HP:0001270); Intrauterine growth retardation (HP:0001511); Osteopenia (HP:0000938); Trigonocephaly (HP:0000243); Global developmental delay (HP:0001263); Pectus carinatum (HP:0000768); Short stature (HP:0004322); Failure to thrive (HP:0001508); Vesicoureteral reflux (HP:0000076); Abnormal heart morphology (HP:0001627); Generalized hypotonia (HP:0001290); Sparse hair (HP:0008070); Syndactyly (HP:0001159); Delayed skeletal maturation (HP:0002750)
5 (FEMALE; P8Y),ERI1-related disease (OMIM: 608739),NM_153332.4:c.514C>T (homozygous),Syndactyly (HP:0001159); Narrow forehead (HP:0000341); Inguinal hernia (HP:0000023); Ventricular septal defect (HP:0001629); Brachycephaly (HP:0000248); Anonychia (HP:0001798); Global developmental delay (HP:0001263); Autism (HP:0000717); excluded: Scoliosis (HP:0002650); excluded: Increased vertebral height (HP:0004570); excluded: Pectus excavatum (HP:0000767); excluded: Platyspondyly (HP:0000926); excluded: Pectus carinatum (HP:0000768); excluded: Wormian bones (HP:0002645); excluded: Osteopenia (HP:0000938); excluded: Delayed skeletal maturation (HP:0002750); excluded: Decreased body weight (HP:0004325); excluded: Short stature (HP:0004322)
6 (MALE; P13Y),ERI1-related disease (OMIM: 608739),NM_153332.4:c.730C>T (homozygous),"Low-set ears (HP:0000369); Strabismus (HP:0000486); Low anterior hairline (HP:0000294); Intellectual disability, mild (HP:0001256); excluded: Scoliosis (HP:0002650); excluded: Increased vertebral height (HP:0004570); excluded: Hip dislocation (HP:0002827); excluded: Patellar dislocation (HP:0002999); excluded: Pectus excavatum (HP:0000767); excluded: Platyspondyly (HP:0000926); excluded: Pectus carinatum (HP:0000768); excluded: Wormian bones (HP:0002645); excluded: Osteopenia (HP:0000938); excluded: Delayed skeletal maturation (HP:0002750); excluded: Decreased body weight (HP:0004325); excluded: Short stature (HP:0004322)"
7 (FEMALE; P7Y),ERI1-related disease (OMIM: 608739),NM_153332.4:c.582+1G>A (homozygous),"Syndactyly (HP:0001159); Hydronephrosis (HP:0000126); Vesicoureteral reflux (HP:0000076); Epicanthus (HP:0000286); Intellectual disability, mild (HP:0001256); excluded: Increased vertebral height (HP:0004570); excluded: Hip dislocation (HP:0002827); excluded: Patellar dislocation (HP:0002999); excluded: Platyspondyly (HP:0000926); excluded: Pectus carinatum (HP:0000768); excluded: Wormian bones (HP:0002645); excluded: Osteopenia (HP:0000938); excluded: Delayed skeletal maturation (HP:0002750); excluded: Decreased body weight (HP:0004325); excluded: Short stature (HP:0004322)"
Hoxha (FEMALE; P7Y),ERI1-related disease (OMIM: 608739),NM_153332.4:c.352A>T (homozygous),Intellectual disability (HP:0001249); Global developmental delay (HP:0001263); excluded: Failure to thrive (HP:0001508)
Choucair (MALE; P5Y),ERI1-related disease (OMIM: 608739),g.8783887_9068578del: chromosomal_deletion (SO:1000029),Syndactyly (HP:0001159); Abnormal heart morphology (HP:0001627); Short stature (HP:0004322); Intellectual disability (HP:0001249); Global developmental delay (HP:0001263)


In [25]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals, 
                                             metadata=metadata,
                                             outdir=output_directory)

We output 10 GA4GH phenopackets to the directory phenopackets


In [26]:
# pxf validate --hpo=hp.json *.json
# no errors