# MAPK8IP3 genotype-phenotype correlations

In [1]:
import genophenocorr

print(f"Using genophenocorr version {genophenocorr.__version__}")

Using genophenocorr version 0.1.1dev


## Setup

### Load HPO

We need the Human Phenotype Ontology (HPO) for the analysis. Here we load release `v2023-10-09` with `hpotk`.

In [2]:
import hpotk

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo(release='v2023-10-09')
print(f'Loaded HPO v{hpo.version}')

Loaded HPO v2023-10-09
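
One reason the analysis needs the ontology, and not just a list of term IDs, is the HPO "true path rule": a patient annotated with a specific term is implicitly annotated with all of that term's ancestors. This is presumably why the summary table in the correlation section lists broad terms such as *Abnormality of movement* alongside specific ones. The sketch below shows the propagation in plain Python. It does not use the `hpotk` API, and the parent map `PARENTS` is a simplified, hypothetical fragment written for illustration, not edges read from the loaded HPO release.

```python
# Sketch of the HPO "true path rule": a patient annotated with a specific term is
# implicitly annotated with every ancestor of that term.
# PARENTS is a simplified, hypothetical fragment written for illustration; it is
# NOT read from the HPO release loaded above and real HPO edges may differ.
from collections import deque

PARENTS: dict[str, list[str]] = {
    "Inability to walk": ["Gait disturbance"],
    "Gait disturbance": ["Abnormality of movement"],
    "Abnormality of movement": ["Abnormality of the nervous system"],
    "Abnormality of the nervous system": [],
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return all ancestors of `term` (excluding the term itself), found breadth-first."""
    seen: set[str] = set()
    queue = deque(parents.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(parents.get(parent, []))
    return seen


def propagate(observed: set[str], parents: dict[str, list[str]]) -> set[str]:
    """Return the observed terms together with all of their ancestors."""
    expanded = set(observed)
    for term in observed:
        expanded |= ancestors(term, parents)
    return expanded


patient_terms = propagate({"Inability to walk"}, PARENTS)
print(sorted(patient_terms))
```

The breadth-first search tracks visited terms, so it also terminates on a real ontology, where a term can reach the same ancestor through several paths.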


### Load phenopackets

Load the phenopackets located in the `phenopackets` folder.

In [3]:
from genophenocorr.preprocessing import configure_caching_cohort_creator, load_phenopacket_folder

fpath_phenopackets = 'phenopackets'
cohort_creator = configure_caching_cohort_creator(hpo)
cohort = load_phenopacket_folder(fpath_phenopackets, cohort_creator)

Patients Created:   5%|▌         | 1/20 [00:06<02:05,  6.61s/it]
Expected a result but got an Error for variant: 16_1706450_1706450_C_G
{"error":"Could not connect to database homo_sapiens_core_111_38 as user ensro using [DBI:mysql:database=homo_sapiens_core_111_38;host=fb1-mysql-ens-rest-web.ebi.ac.uk;port=4571] as a locator:DBI connect('database=homo_sapiens_core_111_38;host=fb1-mysql-ens-rest-web.ebi.ac.uk;port=4571','ensro',...) failed: Can't connect to MySQL server on 'fb1-mysql-ens-rest-web.ebi.ac.uk' (111) at /nfs/public/ro/ensweb/live/rest/www_111/ensembl/modules/Bio/EnsEMBL/DBSQL/DBConnection.pm line 260."}
Patients Created: 100%|██████████| 20/20 [00:09<00:00,  2.01it/s]
Validated under none policy
20 phenopacket(s) found at `phenopackets`
  patient #0
    phenotype-features
     ·No diseases found.
  patient #1
    phenotype-features
     ·No diseases found.
    variants
     ·Patient PMID_30612693_3 has an error with variant 16_1706450_1706450_C_G. Try again or remove varian
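
For readers new to the format, the sketch below reads a folder of phenopackets with the standard library and collects, for each phenopacket, the HPO terms recorded as present or explicitly excluded. It assumes GA4GH Phenopacket Schema v2 JSON files with the `phenotypicFeatures`, `type.id` and `excluded` fields. It only illustrates where the phenotype data lives in the files and is not how genophenocorr loads them.

```python
# Minimal sketch: list present and excluded HPO terms in a folder of
# GA4GH phenopackets (v2 JSON). This is NOT genophenocorr's loader; it only shows
# which JSON fields carry the phenotype information.
import json
from pathlib import Path


def read_phenotypes(folder: str) -> dict[str, dict[str, list[str]]]:
    """Map each phenopacket `id` to its present and excluded HPO term IDs."""
    result: dict[str, dict[str, list[str]]] = {}
    for path in sorted(Path(folder).glob("*.json")):
        packet = json.loads(path.read_text())
        present: list[str] = []
        excluded: list[str] = []
        for feature in packet.get("phenotypicFeatures", []):
            term_id = feature["type"]["id"]
            # A missing `excluded` field means the feature was observed.
            (excluded if feature.get("excluded", False) else present).append(term_id)
        result[packet["id"]] = {"present": present, "excluded": excluded}
    return result


if __name__ == "__main__":
    # Prints nothing if the folder does not exist or holds no *.json files.
    for packet_id, terms in read_phenotypes("phenopackets").items():
        print(packet_id, len(terms["present"]), "present,", len(terms["excluded"]), "excluded")
```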

### Pick a transcript

We use the transcript `NM_001318852.2`, which is the MANE transcript of the *MAPK8IP3* gene.

In [4]:
tx_id = 'NM_001318852.2'

## Explore cohort

We explore the cohort to guide the choice of genotype-phenotype analysis.

In [5]:
from genophenocorr.view import CohortViewer
from IPython.display import display, HTML

viewer = CohortViewer(hpo=hpo)

In [6]:
display(HTML(viewer.hpo_term_counts_table(cohort=cohort, min_count=2)))

| HPO Term | Count |
|---|---|
| Global developmental delay (HP:0001263) | 13 |
| Hypotonia (HP:0001252) | 8 |
| Intellectual disability, moderate (HP:0002342) | 8 |
| Intellectual disability, severe (HP:0010864) | 7 |
| Delayed ability to walk (HP:0031936) | 6 |
| Spastic diplegia (HP:0001264) | 6 |
| Motor delay (HP:0001270) | 5 |
| Thin upper lip vermilion (HP:0000219) | 5 |
| Thin corpus callosum (HP:0033725) | 5 |
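
The `min_count` argument restricts the table to terms annotated in at least that many patients. The idea can be sketched with `collections.Counter`. The helper `term_counts` is hypothetical, not the implementation of `CohortViewer.hpo_term_counts_table`. Counting each term at most once per patient is an assumption of the sketch, and the real viewer may also limit how many rows it shows.

```python
# Sketch of an HPO term frequency table: count how many patients carry each term
# and keep terms seen in at least `min_count` patients. Hypothetical stand-alone
# helper, not the implementation of CohortViewer.hpo_term_counts_table.
from collections import Counter


def term_counts(patients: dict[str, set[str]], min_count: int = 2) -> list[tuple[str, int]]:
    """Return (term, n_patients) pairs with n_patients >= min_count, most frequent first."""
    # Each patient contributes a set, so a term is counted at most once per patient.
    counts = Counter(term for terms in patients.values() for term in terms)
    kept = [(term, n) for term, n in counts.items() if n >= min_count]
    # Sort by descending count, then by term ID for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


toy_cohort = {  # hypothetical patients, not taken from this cohort
    "A": {"HP:0001263", "HP:0001252"},
    "B": {"HP:0001263"},
    "C": {"HP:0001263", "HP:0001252", "HP:0000219"},
}
print(term_counts(toy_cohort))
```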


In [7]:
display(HTML(viewer.variants_table(cohort=cohort, preferred_transcript=tx_id, min_count=2)))

[WARN] could not identify a single variant for target transcript (got 0), variant 1_2408780_2408780_C_T


| Variant | Effect | Count | Key |
|---|---|---|---|
| c.1735C>T | MISSENSE_VARIANT | 6 | 16_1762843_1762843_C_T |
| c.3439C>T | MISSENSE_VARIANT | 4 | 16_1767834_1767834_C_T |
| c.1334T>C | MISSENSE_VARIANT | 2 | 16_1760409_1760409_T_C |
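
The `Variant` column uses HGVS coding-DNA (`c.`) notation relative to the chosen transcript. For example, `c.1735C>T` means that the C at position 1735 of the coding sequence is replaced by a T. The sketch below parses such simple substitutions with a regular expression and derives the affected codon from the 1-based `c.` position. `parse_coding_substitution` is a hypothetical helper for illustration that handles only plain exonic substitutions, not a full HGVS parser.

```python
# Sketch: split a simple HGVS coding-DNA substitution (e.g. "c.1735C>T") into its parts.
# Only plain substitutions at coding positions are handled; intronic offsets
# (c.100+1G>A), UTR positions (c.-5A>G), indels, etc. are rejected with ValueError.
import re
from typing import NamedTuple

_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


class CodingSubstitution(NamedTuple):
    position: int  # 1-based position in the coding sequence
    ref: str
    alt: str


def parse_coding_substitution(hgvs: str) -> CodingSubstitution:
    match = _SUBSTITUTION.match(hgvs)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {hgvs!r}")
    pos, ref, alt = match.groups()
    return CodingSubstitution(int(pos), ref, alt)


for hgvs in ("c.1735C>T", "c.3439C>T", "c.1334T>C"):
    sub = parse_coding_substitution(hgvs)
    # Codon n spans c.(3n-2)..c.3n, so the codon and the position within it follow directly.
    codon, codon_pos = (sub.position - 1) // 3 + 1, (sub.position - 1) % 3 + 1
    print(hgvs, sub, f"codon {codon}, codon position {codon_pos}")
```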


## Configure the analysis

In [8]:
from genophenocorr.analysis import configure_cohort_analysis

analysis = configure_cohort_analysis(cohort, hpo)
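
The comparison in the next section splits patients into those with and without at least one allele of a variant and, for each HPO term, counts how many patients in each group have the term. The sketch below builds that 2x2 contingency table from hypothetical patient records. `contingency_table` is an illustrative helper, not part of genophenocorr, and skipping patients in whom a term was neither observed nor excluded is an assumption, although it is consistent with the differing denominators (for example 3/3 versus 2/2) in the summary table.

```python
# Sketch of the 2x2 contingency table behind one row of a genotype-phenotype comparison.
# Assumption (not confirmed for genophenocorr): patients in whom a term was neither
# observed nor explicitly excluded are left out of that term's counts.
from typing import Optional

# (carries the variant?, term present? -- None if the term was not assessed)
Patient = tuple[bool, Optional[bool]]


def contingency_table(patients: list[Patient]) -> list[list[int]]:
    """Return [[carrier present, carrier absent], [non-carrier present, non-carrier absent]]."""
    table = [[0, 0], [0, 0]]
    for is_carrier, has_term in patients:
        if has_term is None:  # not assessed: skip this patient for this term
            continue
        row = 0 if is_carrier else 1
        col = 0 if has_term else 1
        table[row][col] += 1
    return table


# Hypothetical patients, not taken from this cohort.
toy_patients: list[Patient] = [
    (True, True), (True, True), (True, None),
    (False, True), (False, False), (False, False), (False, None),
]
print(contingency_table(toy_patients))
```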


## Correlation analysis for c.1735C>T

`NM_001318852.2:c.1735C>T` is the most common variant in our cohort. Below, we test whether patients with at least one allele of this variant differ from the other patients in the frequency of HPO terms.

For the purpose of the analysis, the variant is denoted by its key: `16_1762843_1762843_C_T`.
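
The keys in this notebook appear to follow a `contig_start_end_ref_alt` layout. That layout is inferred from the examples here, not taken from the genophenocorr documentation. The sketch below splits a key into those fields with a hypothetical helper, `parse_variant_key`.

```python
# Sketch: split a variant key such as "16_1762843_1762843_C_T" into its fields.
# Assumption: the layout is contig_start_end_ref_alt, inferred from the keys in this
# notebook; check the genophenocorr documentation before relying on it.
from typing import NamedTuple


class VariantKey(NamedTuple):
    contig: str
    start: int
    end: int
    ref: str
    alt: str


def parse_variant_key(key: str) -> VariantKey:
    parts = key.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated fields, got {len(parts)}: {key!r}")
    contig, start, end, ref, alt = parts
    return VariantKey(contig, int(start), int(end), ref, alt)


key = parse_variant_key("16_1762843_1762843_C_T")
print(key)
print(f"chr{key.contig}:{key.start} {key.ref}>{key.alt}")
```

Applied to `1_2408780_2408780_C_T` from the warning above, the contig field is `1`, whereas every key in the variants table starts with `16`. This is consistent with that variant not mapping to the *MAPK8IP3* transcript.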

Let's run the analysis and summarize the results.

In [9]:
from genophenocorr.analysis.predicate import PatientCategories

variant_key = '16_1762843_1762843_C_T'

by_variant = analysis.compare_by_variant_key(variant_key=variant_key)
by_variant.summarize(hpo, PatientCategories.YES)

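Patients are grouped by whether they have at least one allele of the variant `16_1762843_1762843_C_T` (Yes) or not (No).

| HPO term | Yes: count | Yes: % | No: count | No: % | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Inability to walk [HP:0002540] | 3/3 | 100% | 0/6 | 0% | 0.011905 | 1.0 |
| Loss of ambulation [HP:0002505] | 3/3 | 100% | 0/6 | 0% | 0.011905 | 1.0 |
| Gait disturbance [HP:0001288] | 3/3 | 100% | 1/6 | 17% | 0.047619 | 1.0 |
| Abnormality of movement [HP:0100022] | 4/4 | 100% | 3/8 | 38% | 0.080808 | 1.0 |
| Seizure [HP:0001250] | 2/2 | 100% | 2/10 | 20% | 0.090909 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... |
| Hypertelorism [HP:0000316] | 0/0 | 0% | 1/1 | 100% | 1.000000 | 1.0 |
| Aplasia/Hypoplasia of the brainstem [HP:0007362] | 1/1 | 100% | 0/0 | 0% | 1.000000 | 1.0 |
| Hearing impairment [HP:0000365] | 0/0 | 0% | 1/1 | 100% | 1.000000 | 1.0 |
| Narrow mouth [HP:0000160] | 0/0 | 0% | 1/1 | 100% | 1.000000 | 1.0 |

Several terms are nominally significant (p < 0.05), for example *Inability to walk* and *Loss of ambulation* (3/3 patients with the variant versus 0/6 without, p = 0.011905). However, every displayed term has a corrected p value of 1.0, so none of these associations survives multiple-testing correction.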
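
The nominal p values in the summary table agree with a two-sided Fisher's exact test on each row's 2x2 counts. For example, 3/3 versus 0/6 gives 1/C(9, 3) = 1/84 ≈ 0.011905. The sketch below recomputes the first rows with a standard-library implementation. That genophenocorr uses exactly this test is an inference from the matching numbers, not something stated in this notebook. The multiple-testing correction behind the corrected p values is not reproduced here.

```python
# Sketch: two-sided Fisher's exact test on a 2x2 table, standard library only.
# Rows: patients with / without the variant; columns: term present / absent.
from math import comb


def fisher_exact_two_sided(table: list[list[int]]) -> float:
    """Sum the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed one."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x: int) -> int:
        # Numerator of P(top-left cell == x); all tables share the denominator comb(n, row1),
        # so comparing integer numerators avoids floating-point ties.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return total / comb(n, row1)


# Counts from the first rows of the summary table:
# [[yes present, yes absent], [no present, no absent]]
rows = {
    "Inability to walk": [[3, 0], [0, 6]],
    "Gait disturbance": [[3, 0], [1, 5]],
    "Abnormality of movement": [[4, 0], [3, 5]],
    "Seizure": [[2, 0], [2, 8]],
}
for term, table in rows.items():
    print(f"{term}: p = {fisher_exact_two_sided(table):.6f}")
```

The loop prints 0.011905, 0.047619, 0.080808 and 0.090909, matching the p value column. With only a handful of patients per group, even a perfect 3/3 versus 0/6 split gives a p value that a correction over many HPO terms easily pushes to 1.0.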


TODO - finalize!