<H1>Sulfite Oxidase Deficiency SUOX</H1>
<P>Data from <a href="https://pubmed.ncbi.nlm.nih.gov/36303223/" target="_blank">Li JT, et al. (2022) Mutation analysis of SUOX in isolated sulfite oxidase deficiency with ectopia lentis as the presenting feature: insights into genotype-phenotype correlation. Orphanet J Rare Dis. 17(1):392. PMID:36303223</a>.</P>
<P>We transferred information from Additional Files 5, 6, and 7 to two Excel files to parse the data.</P>

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.70


In [2]:
PMID = "PMID:36303223"
title = "Mutation analysis of SUOX in isolated sulfite oxidase deficiency with ectopia lentis as the presenting feature: insights into genotype-phenotype correlation"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0003-2598-6622", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-03-06


In [3]:
df = pd.read_excel("input/Suox-Li-2022.xlsx")
individual_column_name = 'individual_id'
def get_individual_id(arr):
    """Build a unique individual identifier from the proband ID and the source PMID."""
    iid = arr.iloc[0]
    pmid = arr.iloc[1]
    if pmid == "our patient":
        return "individual_35_PMID_36303223" # from current manuscript
    return f"individual_{iid}_PMID_{pmid}"
df[individual_column_name] = df[['Proband ID', 'Resource (PMID)']].apply(get_individual_id, axis=1)
df.head(2)

Unnamed: 0,Proband ID,PMID,Ethnicity,Gender,Parental consanguity,Age at onset (months),Variants,Amino acid,status,Typical type/Mild type,...,Thiosulfate NR: 0,Urine SSC (umol/mmolCr) NR: 0.1-10,Urine Taurine (mmol/molCr) NR: 12-150,Urine XA NR: 0-0.46mmol/L or <40umol/mmolCr or <0.29XA/Cr,Urine HypoXA NR: 0-0.18mmol/L or <8umol/mmolCr or <0.5HypoXA/Cr,Urine UA NR: 0.44-4.50mmol/L or 50-980umol/mmolCr,allele_1,allele_2,comment,individual_id
0,1,9050047,EUR,M,No,0,c.(433delC); (433delC),p.(Q145Sfs*16); (Q145Sfs*16),Homo,T,...,n.a.,320umol/L,95,n.a.,n.a.,n.a.,c.433del,c.433del,c.433delC,individual_1_PMID_9050047
1,2,9600976,EUR,F,Yes,5,c.(650G>A); (650G>A),p.(R217Q); (R217Q),Homo,T,...,0.297-1.632mmol/L,240umol/L,n.a.,0.04mmol/L,0.05mmol/L,0.14mmol/L,c.650G>A,c.650G>A,,individual_2_PMID_9600976


<H2>SUOX variants</H2>
<P>The original supplemental file reported a variant that VariantValidator flags as erroneous:</p>
<pre>NM_001032386.2:c.1355C>A: Variant reference (C) does not agree with reference sequence (G)</pre>
<p>This is how the variant was reported in the original publication. We changed the reference base from C to G (<tt>c.1355G>A</tt>) and obtained the
same amino acid change as reported in the original publication: <tt>NP_001027558.1:p.(G452D)</tt>.</p>
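
<p>The correction described above can be sketched in plain Python. This is only an illustration: the helper <tt>correct_reference_base</tt> and its regular expression are hypothetical and not part of pyphetools or VariantValidator, and the true reference base (G) is taken from the error message quoted above.</p>

```python
import re

# Matches simple coding-sequence substitutions such as "c.1355C>A" (assumption: no other HGVS forms)
SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")

def correct_reference_base(hgvs, true_ref):
    """Return the HGVS string with its reference base replaced by true_ref if they disagree."""
    match = SUBSTITUTION.match(hgvs)
    if match is None:
        raise ValueError(f"Not a simple substitution: {hgvs}")
    position, ref, alt = match.groups()
    if ref == true_ref:
        return hgvs  # already consistent with the reference sequence
    return f"c.{position}{true_ref}>{alt}"

print(correct_reference_base("c.1355C>A", true_ref="G"))  # c.1355G>A
```

<p>The corrected string still has to be checked against the transcript, which is what the VariantManager step below does.</p>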

In [4]:
SUOX_transcript = 'NM_001032386.2'
vmanager = VariantManager(df=df, allele_1_column_name="allele_1", allele_2_column_name="allele_2",
                          individual_column_name=individual_column_name,
                          gene_symbol="SUOX",
                          transcript=SUOX_transcript)

In [5]:
vmanager.to_summary()

Unnamed: 0,status,count,alleles
0,mapped,32,"c.1096C>T, c.1187A>G, c.1261C>T, c.352C>T, c.1280C>A, c.1376G>A, c.475G>T, c.1549_1574dup, c.205G>C, c.1348T>C, c.1136A>G, c.182T>C, c.1521_1524del, c.433del, c.1201A>G, c.400_403del, c.650G>A, c.284_285insC, c.1097G>A, c.772A>C, c.520del, c.733_736del, c.794C>A, c.427C>A, c.884G>A, c.803G>A, c.1382A>T, c.1084G>A, c.1355G>A, c.1126C>T, c.1200C>G, c.649C>G"
1,unmapped,0,


In [6]:
items = {
    'Developmental delay': ['Neurodevelopmental delay', 'HP:0012758'],
    'Regression': ['Cognitive regression', 'HP:0034332'],
    'Seizure': ['Seizure', 'HP:0001250'],
    'Extrapyramidal symptoms': ['Abnormality of extrapyramidal motor function', 'HP:0002071'],
    'Hypertonia':['Hypertonia','HP:0001276'],
    'Hypotonia': ['Hypotonia','HP:0001252'],
    'Microcephaly':['Microcephaly', 'HP:0000252'],
    'Ectopia lentis':['Ectopia lentis', 'HP:0001083'],
}

column_mapper_list = SimpleColumnMapperGenerator.from_map(items)

<H2>Threshold mappers</H2>
<p>The data report biochemical abnormalities as test values with reference ranges. We can capture these using threshold mappers.</p>
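
<p>The idea behind a threshold mapper can be sketched in a few lines of plain Python. This is a simplified illustration, not the pyphetools implementation: the function <tt>classify</tt> is hypothetical, and it assumes that each cell holds either a plain number or a placeholder such as <tt>n.a.</tt>.</p>

```python
import math

def classify(value, low=None, high=None):
    """Classify one table cell as 'not measured', 'low', 'high', or 'normal'."""
    try:
        x = float(value)
    except (TypeError, ValueError):
        return "not measured"  # e.g. "n.a." or an empty cell
    if math.isnan(x):
        return "not measured"
    if high is not None and x > high:
        return "high"
    if low is not None and x < low:
        return "low"
    return "normal"

print(classify("n.a.", low=5, high=15))  # not measured
print(classify(3.2, low=5, high=15))     # low
print(classify(320, high=145))           # high
```

<p>In the cells below, the "high" and "low" outcomes correspond to the HPO terms passed as <tt>hpo_term_high</tt> and <tt>hpo_term_low</tt>.</p>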

<h3>SSC (umol/L) NR: 0</h3>
<p>SSC refers to S-sulfocysteine, which is normally absent, so the normal range is given as zero and any measurable amount counts as elevated. The corresponding
HPO term is Elevated circulating S-sulfocysteine concentration (HP:0034745).</p>

In [7]:
elevatedSSC = HpTerm(hpo_id="HP:0034745", label="Elevated circulating S-sulfocysteine concentration")
ssc = Thresholder(unit="umol/L",threshold_high=0, hpo_term_high=elevatedSSC)

sscMapper = ThresholdedColumnMapper(column_name='SSC (umol/L) NR: 0',thresholder=ssc)
column_mapper_list.append(sscMapper)
sscMapper.preview_column(df)

Unnamed: 0,mapping: None-0.0 umol/L,count
0,Elevated circulating S-sulfocysteine concentration (HP:0034745): not measured,33
1,Elevated circulating S-sulfocysteine concentration (HP:0034745): observed,2


In [8]:
ht= HpTerm(hpo_id="HP:0500181", label="Hypertaurinemia")
taurine = Thresholder(unit="umol/L", threshold_high=145, hpo_term_high=ht)

taurineMapper = ThresholdedColumnMapper(column_name='Taurine (umol/L) NR: 15-145', thresholder=taurine)
column_mapper_list.append(taurineMapper)
taurineMapper.preview_column(df)

Unnamed: 0,mapping: None-145.0 umol/L,count
0,Hypertaurinemia (HP:0500181): observed,1
1,Hypertaurinemia (HP:0500181): not measured,34


In [9]:
# 'Homocys (umol/L) NR: 5-15' -- Hyperhomocystinemia (high) or Hypohomocysteinemia (low)
hc_high = HpTerm(hpo_id="HP:0002160", label="Hyperhomocystinemia")
hc_low = HpTerm(hpo_id="HP:0020222", label="Hypohomocysteinemia")
homocysteine = Thresholder(unit="umol/L", threshold_low=5, threshold_high=15, hpo_term_high=hc_high, hpo_term_low=hc_low)
homocysteineMapper = ThresholdedColumnMapper(column_name='Homocys (umol/L) NR: 5-15', thresholder=homocysteine)
column_mapper_list.append(homocysteineMapper)
homocysteineMapper.preview_column(df)

Unnamed: 0,mapping: 5.0-15.0 umol/L,count
0,Hyperhomocystinemia (HP:0002160): not measured,25
1,Hypohomocysteinemia (HP:0020222): observed,10


In [10]:
# 'Cys (umol/L) NR: 20-70'
# Note: this manifests as low circulating cystine (not cysteine)
hcy_low = HpTerm(hpo_id="HP:0500152", label="Hypocystinemia")
hypocystinemia = Thresholder(unit="umol/L", threshold_low=20, threshold_high=70, hpo_term_low=hcy_low)
cystineMapper = ThresholdedColumnMapper(column_name='Cys (umol/L) NR: 20-70', thresholder=hypocystinemia)
column_mapper_list.append(cystineMapper)
cystineMapper.preview_column(df)

Unnamed: 0,mapping: 20.0-70.0 umol/L,count
0,Hypocystinemia (HP:0500152): observed,7
1,Hypocystinemia (HP:0500152): not measured,28


In [11]:
# 'UA (umol/L) NR: 210-430' -- Hypouricemia
# Note: this mapper is only previewed here; it is not appended to column_mapper_list
ua_low = HpTerm(hpo_id="HP:0003537", label="Hypouricemia")
uat = Thresholder(unit="umol/L", threshold_low=210, threshold_high=430, hpo_term_low=ua_low)
uricAcidMapper = ThresholdedColumnMapper(column_name='UA (umol/L) NR: 210-430', thresholder=uat)
uricAcidMapper.preview_column(df)

Unnamed: 0,mapping: 210.0-430.0 umol/L,count
0,Hypouricemia (HP:0003537): not measured,28
1,Hypouricemia (HP:0003537): observed,7


In [12]:
# 'Sulfite (mg/L) NR: 0' 
# df_clinical['Sulfite (mg/L) NR: 0']
# requires new HPO term

In [13]:
# 'Thiosulfate NR: 0 ' -- requires new HPO term

In [14]:
# 'Urine SSC (umol/mmolCr) NR: 0.1-10' 
scu = HpTerm(hpo_id="HP:0032350", label="Sulfocysteinuria")
scu_high = Thresholder(unit="umol/mmolCr", threshold_low=0.1, threshold_high=10, hpo_term_high=scu)
urineSscMapper = ThresholdedColumnMapper(column_name='Urine SSC (umol/mmolCr) NR: 0.1-10', thresholder=scu_high)
column_mapper_list.append(urineSscMapper)
urineSscMapper.preview_column(df)

Unnamed: 0,mapping: 0.1-10.0 umol/mmolCr,count
0,Sulfocysteinuria (HP:0032350): not measured,22
1,Sulfocysteinuria (HP:0032350): observed,13


In [15]:
# 'Urine Taurine (mmol/molCr) NR: 12-150'  -- Increased urinary taurine 
ut = HpTerm(hpo_id="HP:0003166", label="Increased urinary taurine")
utt = Thresholder(unit="mmol/molCr", threshold_high=150, threshold_low=12, hpo_term_high=ut)
urineTaurineMapper = ThresholdedColumnMapper(column_name='Urine Taurine (mmol/molCr) NR: 12-150', thresholder=utt)
column_mapper_list.append(urineTaurineMapper)
urineTaurineMapper.preview_column(df)

Unnamed: 0,mapping: 12.0-150.0 mmol/molCr,count
0,Increased urinary taurine (HP:0003166): not measured,30
1,Increased urinary taurine (HP:0003166): observed,5


In [16]:
# 'Urine XA NR: 0-0.46mmol/L or <40umol/mmolCr or <0.29XA/Cr'
# Here we need to use an OptionColumnMapper because the values are reported in three different units, each with its own reference range
urine_xa_d = { '11.7umol/mmolCr':"Xanthinuria",
       '1.7umol/mmolCr':"Xanthinuria" }
urine_not_xa_d = {'0.04mmol/L': "Xanthinuria", 
                  "1.6umol/mmolCr": "Xanthinuria", 
                  "0.0214XA/Cr": "Xanthinuria",
                 "normal": "Xanthinuria"}
urineXAmapper = OptionColumnMapper(column_name="Urine XA NR: 0-0.46mmol/L or <40umol/mmolCr or <0.29XA/Cr",
                                   concept_recognizer=hpo_cr, 
                                   option_d=urine_xa_d, 
                                   excluded_d=urine_not_xa_d)
column_mapper_list.append(urineXAmapper)
urineXAmapper.preview_column(df)

Unnamed: 0,mapping,count
0,Xanthinuria (HP:0010934) (excluded),9
1,Xanthinuria (HP:0010934) (observed),2
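
<p>The option-based mapping used for this column can be pictured as a dictionary lookup. The sketch below is a hedged illustration only: <tt>map_option</tt> is a hypothetical helper, and exact string matching is an assumption of the sketch that may differ from how OptionColumnMapper compares cell contents.</p>

```python
def map_option(cell, observed_d, excluded_d):
    """Return (HPO label, status) for one cell, based on two lookup dictionaries."""
    text = str(cell).strip()
    if text in observed_d:
        return observed_d[text], "observed"
    if text in excluded_d:
        return excluded_d[text], "excluded"
    return None, "not measured"  # anything unlisted, e.g. "n.a."

# Dictionaries copied from the cell above
observed = {"11.7umol/mmolCr": "Xanthinuria", "1.7umol/mmolCr": "Xanthinuria"}
excluded = {"0.04mmol/L": "Xanthinuria", "1.6umol/mmolCr": "Xanthinuria",
            "0.0214XA/Cr": "Xanthinuria", "normal": "Xanthinuria"}

print(map_option("normal", observed, excluded))           # ('Xanthinuria', 'excluded')
print(map_option("11.7umol/mmolCr", observed, excluded))  # ('Xanthinuria', 'observed')
print(map_option("n.a.", observed, excluded))             # (None, 'not measured')
```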


In [17]:
# 'Urine HypoXA NR: 0-0.18mmol/L or <8umol/mmolCr or <0.5HypoXA/Cr',
# Increased urinary hypoxanthine level HP:0011814
urine_hxa_d = {'8umol/mmolCr': "Increased urinary hypoxanthine level",}
urine_hxa_excluded_d = {
    'normal': "Increased urinary hypoxanthine level",
    '0.05mmol/L': "Increased urinary hypoxanthine level",
    '0.0264HypoXA/Cr': "Increased urinary hypoxanthine level",
}
urineHXAmapper = OptionColumnMapper(column_name='Urine HypoXA NR: 0-0.18mmol/L or <8umol/mmolCr or <0.5HypoXA/Cr',
    concept_recognizer=hpo_cr, option_d=urine_hxa_d, excluded_d=urine_hxa_excluded_d)
column_mapper_list.append(urineHXAmapper)
urineHXAmapper.preview_column(df)

Unnamed: 0,mapping,count
0,Increased urinary hypoxanthine level (HP:0011814) (excluded),8
1,Increased urinary hypoxanthine level (HP:0011814) (observed),1


In [18]:
# 'Urine UA NR: 0.44-4.50mmol/L or 50-980umol/mmolCr'
# Hyperuricosuria HP:0003149
# Decreased urinary urate HP:0011935
# Abnormality of urinary uric acid level HP:0012610
urine_ua_d = {'0.14mmol/L': "Decreased urinary urate",
             '21umol/mmolCr': "Decreased urinary urate",}
urine_ua_excluded_d = {'normal' : "Abnormality of urinary uric acid level",
                      '385umol/mmolCr': "Abnormality of urinary uric acid level",
                      '430umol/mmolCr': "Abnormality of urinary uric acid level",}
urineUaMapper = OptionColumnMapper(column_name='Urine UA NR: 0.44-4.50mmol/L or 50-980umol/mmolCr',
                                   concept_recognizer=hpo_cr,
                                  option_d=urine_ua_d,
                                  excluded_d=urine_ua_excluded_d)
column_mapper_list.append(urineUaMapper)
urineUaMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Decreased urinary urate (HP:0011935) (observed),2
1,Abnormality of urinary uric acid level (HP:0012610) (excluded),5


<H2>Putting it all together</H2>

In [19]:
ageMapper = AgeColumnMapper.by_month(column_name="Age at onset (months)")
#ageMapper.preview_column(df)

In [20]:
sexMapper = SexColumnMapper(male_symbol='M', female_symbol='F', column_name='Gender', unknown_symbol='n.a.')
#sexMapper.preview_column(df)

In [21]:
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name=individual_column_name,
                        age_of_onset_mapper=ageMapper, 
                        sexmapper=sexMapper,
                        metadata=metadata)

sod = Disease(disease_id='OMIM:272300', disease_label='Sulfite oxidase deficiency')
encoder.set_disease(sod)

In [22]:
individuals = encoder.get_individuals()
individual_d = { i.id:i for i in individuals}

In [23]:
from collections import defaultdict
var_d = vmanager.get_variant_d()
in_to_var_d = defaultdict(list)
for _, row in df.iterrows():
    individ_id = row["individual_id"]
    if individ_id not in individual_d:
        raise ValueError(f"Could not find individual {individ_id}")
    individual = individual_d.get(individ_id)
    allele_1 = row["allele_1"]
    allele_2 = row["allele_2"]
    var_1 = var_d.get(allele_1)
    if allele_1 == allele_2:
        # identical alleles: record a single homozygous variant
        var_1.set_homozygous()
        individual.add_variant(var_1)
    else:
        # two different alleles: record both as heterozygous
        var_2 = var_d.get(allele_2)
        var_1.set_heterozygous()
        var_2.set_heterozygous()
        individual.add_variant(var_1)
        individual.add_variant(var_2)

In [24]:
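# Note: in_to_var_d is never populated in the previous cell, so this loop adds no variants;
# the variants were already attached to each individual above.
for indi in individuals:
    var_list = in_to_var_d[indi.id]
    for v in var_list:
        indi.add_variant(v)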

In [25]:
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.BI_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
INFORMATION,NOT_MEASURED,257
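
<p>The bi-allelic requirement checked above can be pictured as an allele count. The sketch below is a simplified illustration, not the CohortValidator implementation: the function names are hypothetical, and it assumes that a homozygous variant contributes two alleles and a heterozygous variant contributes one.</p>

```python
# Assumed contribution of each genotype to the allele count
ALLELE_WEIGHT = {"homozygous": 2, "heterozygous": 1}

def allele_count(genotypes):
    """Sum the alleles contributed by a list of genotype labels."""
    return sum(ALLELE_WEIGHT[g] for g in genotypes)

def is_biallelic(genotypes):
    """True if the genotypes add up to exactly two alleles."""
    return allele_count(genotypes) == 2

print(is_biallelic(["homozygous"]))                    # True
print(is_biallelic(["heterozygous", "heterozygous"]))  # True
print(is_biallelic(["heterozygous"]))                  # False
```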


<H2>Visualization</H2>

In [26]:
table = IndividualTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
individual_1_PMID_9050047 (MALE; n/a),Sulfite oxidase deficiency (OMIM:272300),NM_001032386.2:c.433del (homozygous),Neurodevelopmental delay (HP:0012758); Seizure (HP:0001250); Abnormality of extrapyramidal motor function (HP:0002071); Hypertonia (HP:0001276); Hypotonia (HP:0001252); Microcephaly (HP:0000252); Ectopia lentis (HP:0001083); Hypertaurinemia (HP:0500181); Hypocystinemia (HP:0500152); excluded: Cognitive regression (HP:0034332)
individual_2_PMID_9600976 (FEMALE; n/a),Sulfite oxidase deficiency (OMIM:272300),NM_001032386.2:c.650G>A (homozygous),Neurodevelopmental delay (HP:0012758); Seizure (HP:0001250); Abnormality of extrapyramidal motor function (HP:0002071); Hypertonia (HP:0001276); Hypotonia (HP:0001252); Ectopia lentis (HP:0001083); Decreased urinary urate (HP:0011935); excluded: Cognitive regression (HP:0034332); excluded: Xanthinuria (HP:0010934); excluded: Increased urinary hypoxanthine level (HP:0011814)
individual_3_PMID_10519592 (MALE; n/a),Sulfite oxidase deficiency (OMIM:272300),NM_001032386.2:c.794C>A (heterozygous) NM_001032386.2:c.1280C>A (heterozygous),Seizure (HP:0001250); Abnormality of extrapyramidal motor function (HP:0002071); Hypertonia (HP:0001276); Ectopia lentis (HP:0001083); excluded: Neurodevelopmental delay (HP:0012758); excluded: Cognitive regression (HP:0034332); excluded: Hypotonia (HP:0001252); excluded: Microcephaly (HP:0000252); excluded: Xanthinuria (HP:0010934); excluded: Increased urinary hypoxanthine level (HP:0011814); excluded: Abnormality of urinary uric acid level (HP:0012610)
individual_4_PMID_12112661 (UNKNOWN; n/a),Sulfite oxidase deficiency (OMIM:272300),NM_001032386.2:c.734_737del (homozygous),Seizure (HP:0001250)
individual_5_PMID_12112661 (UNKNOWN; n/a),Sulfite oxidase deficiency (OMIM:272300),NM_001032386.2:c.287dup (heterozygous) NM_001032386.2:c.1126C>T (heterozygous),Seizure (HP:0001250)
individual_6_PMID_12112661 (UNKNOWN; n/a),Sulfite oxidase deficiency (OMIM:272300),NM_001032386.2:c.772A>C (homozygous),Seizure (HP:0001250)
individual_7_PMID_12112661 (UNKNOWN; n/a),Sulfite oxidase deficiency (OMIM:272300),NM_001032386.2:c.803G>A (homozygous),Seizure (HP:0001250)
individual_8_PMID_12112661 (UNKNOWN; n/a),Sulfite oxidase deficiency (OMIM:272300),NM_001032386.2:c.1200C>G (homozygous),Seizure (HP:0001250)
individual_9_PMID_12112661 (UNKNOWN; n/a),Sulfite oxidase deficiency (OMIM:272300),NM_001032386.2:c.1261C>T (homozygous),Seizure (HP:0001250)
individual_10_PMID_12112661 (UNKNOWN; n/a),Sulfite oxidase deficiency (OMIM:272300),NM_001032386.2:c.1084G>A (homozygous),Seizure (HP:0001250)


In [27]:
Individual.output_individuals_as_phenopackets(individual_list=individuals, 
                                              metadata=metadata)

We output 35 GA4GH phenopackets to the directory phenopackets


<H2>Validation</H2>

<p>The phenopackets were also validated with phenopacket-tools:</p>
<pre>pxf validate --hpo hp.json *.json</pre>
<p>No errors were found.</p>