# SUOX


Sulfite Oxidase Deficiency

Data from [Li JT, et al. Mutation analysis of SUOX in isolated sulfite oxidase deficiency with ectopia lentis as the presenting feature: insights into genotype-phenotype correlation](https://pubmed.ncbi.nlm.nih.gov/36303223/)

In [1]:
import genophenocorr

print(f"Using genophenocorr version {genophenocorr.__version__}")

Using genophenocorr version 0.1.1dev


## Setup

### Logging

Set up logging to get notified about progress, issues, etc.

In [2]:
import hpotk

hpotk.util.setup_logging()

### Load HPO

We use the HPO `v2023-10-09` release for this analysis.

In [3]:
fpath_hpo = 'https://github.com/obophenotype/human-phenotype-ontology/releases/download/v2023-10-09/hp.json'
hpo = hpotk.load_minimal_ontology(fpath_hpo)

### Load Phenopackets

We load the phenopacket JSON files from the `phenopackets` folder located next to the notebook.

In [4]:
from genophenocorr.preprocessing import configure_caching_cohort_creator, load_phenopacket_folder

fpath_phenopackets = 'phenopackets'
cohort_creator = configure_caching_cohort_creator(hpo)
cohort = load_phenopacket_folder(fpath_phenopackets, cohort_creator)

Patients Created: 100%|██████████| 35/35 [00:00<00:00, 365.76it/s]
2024-01-16 09:36:24,078 genophenocorr.preprocessing INFO : Validated under lenient policy


### Pick transcript

We choose the [MANE Select](https://www.ncbi.nlm.nih.gov/nuccore/NM_001032386.2) transcript for *SUOX*.

In [5]:
tx_id = 'NM_001032386.2'

## Explore cohort

Explore the cohort to guide selection of the genotype-phenotype analysis.

In [6]:
from IPython.display import HTML, display
from genophenocorr.view import CohortViewer

viewer = CohortViewer(hpo)

In [7]:
cohort.list_all_variants(10)

[('12_56004589_56004589_C_G', 7),
 ('12_56004039_56004039_G_A', 3),
 ('12_56004485_56004485_C_T', 3),
 ('12_56004765_56004765_G_A', 3),
 ('12_56004771_56004771_A_T', 2),
 ('12_56004273_56004273_G_A', 2),
 ('12_56004905_56004909_ATTGT_A', 2),
 ('12_56004933_56004933_A_ACAATGTGCAGCCAGACACCGTGGCCC', 2),
 ('12_56004192_56004192_G_A', 1),
 ('12_56004473_56004473_G_A', 1)]
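
The keys in the listing above appear to follow a `chromosome_start_end_ref_alt` pattern. As a minimal sketch under that assumption (the `VariantKey` tuple and the `parse_variant_key` and `variant_kind` helpers are hypothetical and not part of genophenocorr), a key can be split into its fields and roughly classified by comparing the lengths of the reference and alternate alleles:

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    """Hypothetical container for the fields of a variant key."""
    chrom: str
    start: int
    end: int
    ref: str
    alt: str


def parse_variant_key(key: str) -> VariantKey:
    # Assumes exactly five underscore-separated fields:
    # chromosome, start, end, reference allele, alternate allele.
    chrom, start, end, ref, alt = key.split('_')
    return VariantKey(chrom, int(start), int(end), ref, alt)


def variant_kind(vk: VariantKey) -> str:
    # Rough classification based only on allele lengths.
    if len(vk.ref) == 1 and len(vk.alt) == 1:
        return 'SNV'
    if len(vk.ref) > len(vk.alt):
        return 'deletion'
    if len(vk.ref) < len(vk.alt):
        return 'insertion'
    return 'MNV'


for key in ('12_56004589_56004589_C_G', '12_56004905_56004909_ATTGT_A'):
    print(key, variant_kind(parse_variant_key(key)))
```

The split would fail for contig names that contain underscores or for keys that use a different layout, so this is only suitable for keys shaped like the ones shown here.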

In [8]:
cohort.list_all_phenotypes()

[('HP:0001250', 28),
 ('HP:0001252', 15),
 ('HP:0032350', 13),
 ('HP:0001276', 11),
 ('HP:0002071', 11),
 ('HP:0000252', 10),
 ('HP:0012758', 8),
 ('HP:0001083', 7),
 ('HP:0500152', 7),
 ('HP:0003537', 7),
 ('HP:0034332', 6),
 ('HP:0003166', 5),
 ('HP:0010934', 2),
 ('HP:0011935', 2),
 ('HP:0034745', 2),
 ('HP:0011814', 1),
 ('HP:0500181', 1)]

In [9]:
cohort.list_data_by_tx()

{'NM_001032387.2': Counter({'FRAMESHIFT_VARIANT': 9,
          'MISSENSE_VARIANT': 29,
          'STOP_GAINED': 10}),
 'NM_001032386.2': Counter({'FRAMESHIFT_VARIANT': 9,
          'MISSENSE_VARIANT': 29,
          'STOP_GAINED': 10}),
 'NM_000456.3': Counter({'FRAMESHIFT_VARIANT': 9,
          'MISSENSE_VARIANT': 29,
          'STOP_GAINED': 10})}
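
All three transcripts report identical counts, so the choice of transcript does not change the grouping by variant effect in this cohort. To see how the effects are distributed, the counts for the chosen transcript can be turned into shares with the standard library. This is a small sketch that copies the numbers from the output above; it makes no assumption about what unit the library counts in:

```python
from collections import Counter

# Counts copied from the output for NM_001032386.2.
effects = Counter({
    'FRAMESHIFT_VARIANT': 9,
    'MISSENSE_VARIANT': 29,
    'STOP_GAINED': 10,
})

total = sum(effects.values())
shares = {effect: n / total for effect, n in effects.items()}

# List the effects from most to least frequent.
for effect, n in effects.most_common():
    print(f"{effect}: {n}/{total} ({shares[effect]:.1%})")
```

Missense variants make up the majority of the counted effects, which motivates comparing them against all other variant effects.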

In [10]:
len(cohort.list_all_patients())

35

## Configure the analysis

In [11]:
from genophenocorr.analysis import configure_cohort_analysis
from genophenocorr.analysis.predicate import BooleanPredicate

analysis = configure_cohort_analysis(cohort, hpo)

## Run the analyses

Test for presence of genotype-phenotype correlations between subjects with missense variants vs. the other subjects.

In [12]:
from genophenocorr.model import VariantEffect

missense = analysis.compare_by_variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id)
missense.summarize(hpo, BooleanPredicate.YES)

**MISSENSE_VARIANT on NM_001032386.2**

| Phenotype | No: Count | No: Percent | Yes: Count | Yes: Percent | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Seizure [HP:0001250] | 11 | 31.428571 | 17 | 48.571429 | 0.072129 | 1.0 |
| Hypouricemia [HP:0003537] | 4 | 26.666667 | 3 | 20.000000 | 0.118881 | 1.0 |
| Cognitive regression [HP:0034332] | 0 | 0.000000 | 6 | 24.000000 | 0.129170 | 1.0 |
| Increased urinary taurine [HP:0003166] | 0 | 0.000000 | 5 | 83.333333 | 0.166667 | 1.0 |
| Hypotonia [HP:0001252] | 3 | 13.043478 | 12 | 52.173913 | 0.181896 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... |
| Abnormal circulating nitrogen compound concentration [HP:0004364] | 4 | 57.142857 | 3 | 42.857143 | 1.000000 | 1.0 |
| Elevated circulating S-sulfocysteine concentration [HP:0034745] | 0 | 0.000000 | 2 | 100.000000 | 1.000000 | 1.0 |
| Abnormal circulating non-proteinogenic amino acid concentration [HP:0033109] | 0 | 0.000000 | 2 | 100.000000 | 1.000000 | 1.0 |
| Hypertaurinemia [HP:0500181] | 1 | 50.000000 | 0 | 0.000000 | 1.000000 | 1.0 |
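
Each row of the table above carries a nominal p value for one phenotype term. The notebook does not state which test produces it. Purely as an illustration, and assuming a two-sided Fisher exact test on a 2x2 table of genotype group versus phenotype status, the calculation can be sketched with the standard library. The `fisher_exact_two_sided` helper and the example counts are hypothetical and are not taken from the cohort:

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    # The small tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: rows are genotype groups, columns are phenotype yes/no.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(f"{p:.4f}")  # 0.4857
```

The actual p values in the table depend on the full contingency table for each term, including subjects in whom the phenotype was excluded, which the summary does not show.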


Test for presence of genotype-phenotype correlations between subjects carrying at least one allele of `12_56004589_56004589_C_G`, the most frequent variant in the cohort, and the remaining subjects.

In [13]:
by_variant = analysis.compare_by_variant_key('12_56004589_56004589_C_G')
by_variant.summarize(hpo, BooleanPredicate.YES)

**>=1 allele of the variant 12_56004589_56004589_C_G**

| Phenotype | No: Count | No: Percent | Yes: Count | Yes: Percent | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Hypotonia [HP:0001252] | 14 | 60.869565 | 1 | 4.347826 | 0.032869 | 1.0 |
| Neurodevelopmental delay [HP:0012758] | 8 | 32.000000 | 0 | 0.000000 | 0.129170 | 1.0 |
| Abnormality of extrapyramidal motor function [HP:0002071] | 10 | 40.000000 | 1 | 4.000000 | 0.180435 | 1.0 |
| Cognitive regression [HP:0034332] | 6 | 24.000000 | 0 | 0.000000 | 0.277764 | 1.0 |
| Ectopia lentis [HP:0001083] | 5 | 27.777778 | 2 | 11.111111 | 0.528186 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... |
| Abnormality of urinary uric acid level [HP:0012610] | 2 | 28.571429 | 0 | 0.000000 | 1.000000 | 1.0 |
| Elevated circulating S-sulfocysteine concentration [HP:0034745] | 2 | 100.000000 | 0 | 0.000000 | 1.000000 | 1.0 |
| Abnormal circulating non-proteinogenic amino acid concentration [HP:0033109] | 2 | 100.000000 | 0 | 0.000000 | 1.000000 | 1.0 |
| Hypertaurinemia [HP:0500181] | 1 | 50.000000 | 0 | 0.000000 | 1.000000 | 1.0 |
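
Every corrected p value reported above is 1.0, even for hypotonia with a nominal p value of 0.032869. This is what one would expect when many phenotype terms are tested and a conservative multiple-testing correction is applied. The notebook does not name the correction method. Assuming a Bonferroni correction, the effect can be sketched as follows; the `bonferroni` helper and the number of tests are hypothetical:

```python
from typing import List, Optional, Sequence


def bonferroni(p_values: Sequence[float], n_tests: Optional[int] = None) -> List[float]:
    """Multiply each p value by the number of tests and cap the result at 1.0."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


# Nominal p values copied from the visible rows of the table above.
nominal = [0.032869, 0.129170, 0.180435, 0.277764, 0.528186]

# Hypothetical number of tested phenotype terms; the real number is not shown.
n_tests = 40

corrected = bonferroni(nominal, n_tests)
print(corrected)
```

Under this assumption, a nominal p value of 0.032869 saturates at 1.0 as soon as 31 or more terms are tested, so none of the comparisons would remain significant after correction.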


TODO - finalize!