<H1>MAPK8IP3 genotype-phenotype correlations</H1>

Install `genophenocorr` into your Jupyter kernel or virtual environment before running this notebook. With the `--editable` option, the 
latest library features become available after a kernel restart, with no re-installation needed.

```shell
source your/virtual/environment/bin/activate

cd genophenocorr
python3 -m pip install --editable .
```

In [1]:
import genophenocorr
print(f"Using genophenocorr version {genophenocorr.__version__}")

Using genophenocorr version 0.1.0


## Initial setup

The analysis needs the Human Phenotype Ontology (HPO), which we load with `hpotk`.

In [2]:
import hpotk
hpo_purl = "http://purl.obolibrary.org/obo/hp.json"
hpo = hpotk.load_minimal_ontology(hpo_purl)
print(f'Loaded HPO v{hpo.version}')

Loaded HPO v2023-09-01


## Load phenopackets

### Configure the phenopacket patient creator

In [3]:
from genophenocorr.preprocessing import configure_caching_patient_creator
patient_creator = configure_caching_patient_creator(hpo)

### Parse the phenopackets

Load the phenopackets located in the `input` folder, searching subfolders recursively for `.json` files.

In [4]:
import os

from google.protobuf.json_format import Parse
from phenopackets import Phenopacket

fpath_pp_dir = 'input'

patients = []

for dirpath, dirnames, filenames in os.walk(fpath_pp_dir):
    for filename in filenames:
        if filename.endswith('.json'):
            fpath_pp = os.path.join(dirpath, filename)
            pp = Phenopacket()
            with open(fpath_pp) as fh:
                Parse(fh.read(), pp)
            patient = patient_creator.create_patient(pp)
            patients.append(patient)

print(f'Loaded {len(patients)} patients')

Loaded 20 patients
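
The file discovery in the loop above can be separated from parsing, which makes it easy to check which files will be loaded before any phenopacket is read. The following is a minimal sketch; `find_phenopacket_files` is a hypothetical helper name and is not part of `genophenocorr`.

```python
import os
from typing import List


def find_phenopacket_files(root: str, suffix: str = '.json') -> List[str]:
    """Hypothetical helper: recursively collect paths under `root` ending with `suffix`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(suffix):
                found.append(os.path.join(dirpath, filename))
    # Sort so that the load order does not depend on the file system.
    return sorted(found)
```

Each returned path could then be parsed with `Parse(fh.read(), pp)` and passed to `patient_creator.create_patient(pp)` exactly as in the cell above.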


## Create a cohort from the patients


In [5]:
from genophenocorr.model import Cohort
from genophenocorr.view import CohortViewer
from IPython.display import display, HTML
cohort = Cohort.from_patients(patients)
#cohort.all_patients
viewer = CohortViewer(hpo=hpo)

In [6]:
display(HTML(viewer.hpo_term_counts_table(cohort=cohort, min_count=2)))

| HPO Term | Count |
|---|---|
| Global developmental delay (HP:0001263) | 14 |
| Intellectual disability, moderate (HP:0002342) | 9 |
| Hypotonia (HP:0001252) | 9 |
| Intellectual disability, severe (HP:0010864) | 7 |
| Delayed ability to walk (HP:0031936) | 6 |
| Spastic diplegia (HP:0001264) | 6 |
| Motor delay (HP:0001270) | 5 |
| Thin upper lip vermilion (HP:0000219) | 5 |
| Thin corpus callosum (HP:0033725) | 5 |
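
To illustrate what a count table with a `min_count` threshold computes, here is a minimal sketch using only the standard library. It assumes each patient is represented as a collection of HPO term IDs. The function `count_terms` and the toy data are hypothetical and do not reflect how `CohortViewer` is implemented.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_terms(patients_terms: Iterable[Iterable[str]], min_count: int = 2) -> List[Tuple[str, int]]:
    """Hypothetical helper: count how many patients carry each term, keeping terms seen >= min_count times."""
    counts = Counter()
    for terms in patients_terms:
        # A set ensures that each patient contributes at most once per term.
        counts.update(set(terms))
    # most_common() orders the terms by descending count.
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


# Toy data for illustration only (not the cohort from this notebook).
toy_patients = [
    {'HP:0001263', 'HP:0001252'},
    {'HP:0001263'},
    {'HP:0001263', 'HP:0001250'},
    {'HP:0001252'},
]
print(count_terms(toy_patients, min_count=2))
```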


In [7]:
display(HTML(viewer.variants_table(cohort=cohort, preferred_transcript="NM_001318852.2", min_count=2)))

[WARN] could not identify a single variant for target transcript (got 0), variant 1_2408780_C/T
c.1735C>T - 6
c.3439C>T - 4
c.1334T>C - 2
1_2408780_C/T - 1
c.1201G>A - 1
c.1577G>A - 1
c.45C>G - 1
c.111C>G - 1
c.2985C>G - 1
c.65del - 1
c.79G>T - 1


| Variant | Effect | Count | Key |
|---|---|---|---|
| c.1735C>T | missense_variant | 6 | 16_1762843_C/T |
| c.3439C>T | missense_variant | 4 | 16_1767834_C/T |
| c.1334T>C | missense_variant | 2 | 16_1760409_T/C |

<h1>Correlation analysis for c.1735C>T</h1>
<p>c.1735C>T is the most common variant in our cohort (6 occurrences). In the following code, we investigate whether this variant shows significant genotype-phenotype correlations.</p>

In [9]:
from pprint import pprint
from genophenocorr.analysis import CohortAnalysis
from genophenocorr.constants import VariantEffect
mane_select_MAPK8IP3 = "NM_001318852.2"
cohort_analysis = CohortAnalysis(cohort, mane_select_MAPK8IP3, hpo, include_unmeasured=False)

Divide by 0 error with HPO HP:0000582, not included in this analysis.
Divide by 0 error with HPO HP:0025336, not included in this analysis.
Divide by 0 error with HPO HP:0006956, not included in this analysis.
Divide by 0 error with HPO HP:0002194, not included in this analysis.
Divide by 0 error with HPO HP:0002187, not included in this analysis.
Divide by 0 error with HPO HP:0000574, not included in this analysis.
Divide by 0 error with HPO HP:0500041, not included in this analysis.
Divide by 0 error with HPO HP:0000668, not included in this analysis.
Divide by 0 error with HPO HP:0002370, not included in this analysis.
Divide by 0 error with HPO HP:0000365, not included in this analysis.
Divide by 0 error with HPO HP:0000411, not included in this analysis.
Divide by 0 error with HPO HP:0006989, not included in this analysis.
Divide by 0 error with HPO HP:0045025, not included in this analysis.
Divide by 0 error with HPO HP:0003307, not included in this analysis.
Divide by 0 error wi

                                               With frameshift_variant  \
                                                                 Count   
HP:0000717 (Autism)                                                  1   
HP:0002342 (Intellectual disability, moderate)                       1   
HP:0001250 (Seizure)                                                 0   
HP:0034183 (Spastic triplegia)                                       0   
HP:0002505 (Loss of ambulation)                                      0   
HP:0100704 (Cerebral visual impairment)                              0   
HP:0000729 (Autistic behavior)                                       1   
HP:0002650 (Scoliosis)                                               1   
HP:0001252 (Hypotonia)                                               1   
HP:0001249 (Intellectual disability)                                 1   
HP:0001263 (Global developmental delay)                              1   
HP:0100021 (Cerebral palsy)           

In [10]:
cohort_analysis.compare_by_variant("16_1762843_C/T")

Divide by 0 error with HPO HP:0000582, not included in this analysis.
Divide by 0 error with HPO HP:0006956, not included in this analysis.
Divide by 0 error with HPO HP:0002187, not included in this analysis.
Divide by 0 error with HPO HP:0000574, not included in this analysis.
Divide by 0 error with HPO HP:0500041, not included in this analysis.
Divide by 0 error with HPO HP:0000668, not included in this analysis.
Divide by 0 error with HPO HP:0002370, not included in this analysis.
Divide by 0 error with HPO HP:0000365, not included in this analysis.
Divide by 0 error with HPO HP:0000411, not included in this analysis.
Divide by 0 error with HPO HP:0006989, not included in this analysis.
Divide by 0 error with HPO HP:0045025, not included in this analysis.
Divide by 0 error with HPO HP:0003307, not included in this analysis.
Divide by 0 error with HPO HP:0034183, not included in this analysis.
Divide by 0 error with HPO HP:0000486, not included in this analysis.
Divide by 0 error wi

| HPO term | With 16_1762843_C/T (Count) | With 16_1762843_C/T (Percent) | Without 16_1762843_C/T (Count) | Without 16_1762843_C/T (Percent) | p-value |
|---|---|---|---|---|---|
| HP:0002505 (Loss of ambulation) | 3 | 100.00% | 0 | 0.00% | 0.008333 |
| HP:0001250 (Seizure) | 2 | 100.00% | 2 | 18.18% | 0.076923 |
| HP:0032988 (Persistent head lag) | 0 | 0.00% | 2 | 100.00% | 0.1 |
| HP:0100021 (Cerebral palsy) | 1 | 100.00% | 2 | 22.22% | 0.3 |
| HP:0025336 (Delayed ability to sit) | 1 | 33.33% | 2 | 100.00% | 0.4 |
| HP:0032989 (Delayed ability to roll over) | 1 | 50.00% | 1 | 50.00% | 1.0 |
| HP:0002059 (Cerebral atrophy) | 2 | 100.00% | 2 | 100.00% | 1.0 |
| HP:0006970 (Periventricular leukomalacia) | 1 | 100.00% | 1 | 100.00% | 1.0 |
| HP:0002493 (Upper motor neuron dysfunction) | 6 | 100.00% | 4 | 100.00% | 1.0 |
| HP:0002188 (Delayed CNS myelination) | 2 | 100.00% | 1 | 100.00% | 1.0 |