# MAPK8IP3 genotype-phenotype correlations

Install `genophenocorr` into your Jupyter kernel or virtual environment before running this notebook. Installing with the `--editable` option makes
the latest library changes available after a kernel restart, with no re-installation needed.

```shell
source your/virtual/environment/bin/activate

cd genophenocorr
python3 -m pip install --editable .
```
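
If the notebook kernel runs in a different environment than the one activated above, the import below fails. A quick check along these lines may help confirm that the kernel can see the package. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name and not part of `genophenocorr`.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the top-level package can be found in the current environment."""
    return importlib.util.find_spec(package_name) is not None


# Report a likely environment mismatch before running the rest of the notebook.
if not is_installed("genophenocorr"):
    print("genophenocorr is not importable; check that the kernel uses the intended environment")
```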

In [7]:
import genophenocorr
print(f"Using genophenocorr version {genophenocorr.__version__}")

Using genophenocorr version 0.1.0


## Initial setup

We need the Human Phenotype Ontology (HPO) for the analysis.

In [9]:
import hpotk

fpath_hpo = '/home/ielis/data/ontologies/hpo/2023-07-21/hp.2023-07-21.json'
hpo = hpotk.load_minimal_ontology(fpath_hpo)

print(f'Loaded HPO v{hpo.version}')

Loaded HPO v2023-07-21


## Load phenopackets

### Configure the phenopacket patient creator

In [10]:
from genophenocorr.preprocessing import configure_caching_patient_creator

patient_creator = configure_caching_patient_creator(hpo)

### Parse the phenopackets

Load the phenopackets located in the `phenopackets` folder.

In [11]:
import os

from google.protobuf.json_format import Parse
from phenopackets import Phenopacket

fpath_pp_dir = 'phenopackets'

patients = []

# Walk the folder recursively and parse every JSON file as a phenopacket.
for dirpath, _, filenames in os.walk(fpath_pp_dir):
    for filename in filenames:
        if filename.endswith('.json'):
            fpath_pp = os.path.join(dirpath, filename)
            pp = Phenopacket()
            with open(fpath_pp) as fh:
                Parse(fh.read(), pp)
            # Map the phenopacket to a genophenocorr patient.
            patient = patient_creator.create_patient(pp)
            patients.append(patient)

print(f'Loaded {len(patients)} patients')

Loaded 22 patients
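
`os.walk` yields files in a filesystem-dependent order, so the order of `patients` can differ between machines. If a reproducible order matters, the paths could be collected and sorted before parsing. The sketch below assumes only the standard library; `find_phenopacket_paths` is a hypothetical helper, not part of `genophenocorr`.

```python
from pathlib import Path
from typing import List


def find_phenopacket_paths(root: str) -> List[Path]:
    """Recursively collect `*.json` files under `root`, sorted by path."""
    return sorted(p for p in Path(root).rglob("*.json") if p.is_file())
```

The returned paths could then replace the nested `os.walk` loop, with each path opened and parsed as in the cell above.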


## Create a cohort from the patients


In [16]:
from genophenocorr.model import Cohort

cohort = Cohort.from_patients(patients)

cohort.all_patients

frozenset({Patient(patient_id:1, variants:['16_1706403_G/-'], phenotypes:[DefaultTermId(idx=2, value=HP_0001251), DefaultTermId(idx=2, value=HP_0001263), DefaultTermId(idx=2, value=HP_0002376), DefaultTermId(idx=2, value=HP_0000717), DefaultTermId(idx=2, value=HP_0001252), DefaultTermId(idx=2, value=HP_0100022), DefaultTermId(idx=2, value=HP_0100704), DefaultTermId(idx=2, value=HP_0001250)], proteins:['NP_001305781.1', 'NP_653171.1', 'NP_001035529.1', 'NP_055948.2']),
           Patient(patient_id:1, variants:['9_70598463_C/T'], phenotypes:[DefaultTermId(idx=2, value=HP_0010864), DefaultTermId(idx=2, value=HP_0031936), DefaultTermId(idx=2, value=HP_0000750), DefaultTermId(idx=2, value=HP_0000729), DefaultTermId(idx=2, value=HP_0011147), DefaultTermId(idx=2, value=HP_0001252)], proteins:['NP_001007472.2']),
           Patient(patient_id:10, variants:['16_1766768_C/G'], phenotypes:[DefaultTermId(idx=2, value=HP_0001263), DefaultTermId(idx=2, value=HP_0002342), DefaultTermId(idx=2, value=
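
The output above lists two patients with `patient_id:1` but different variants, which suggests that identifiers are reused across phenopackets. A check like the following could flag such collisions before the analysis. It is a sketch on stand-in data: `find_duplicate_ids` is a hypothetical helper, and passing `p.patient_id` for each patient assumes that `Patient` exposes a `patient_id` attribute, as its printed form suggests.

```python
from collections import Counter
from typing import Iterable, List


def find_duplicate_ids(patient_ids: Iterable[str]) -> List[str]:
    """Return the identifiers that occur more than once, sorted."""
    counts = Counter(patient_ids)
    return sorted(pid for pid, n in counts.items() if n > 1)


# Stand-in identifiers for illustration only. With a real cohort one might pass
# `(p.patient_id for p in cohort.all_patients)`, assuming that attribute exists.
example_ids = ["1", "1", "10"]
print(find_duplicate_ids(example_ids))
```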

## Run the analysis