# MAPK8IP3 genotype-phenotype correlations

Before running the notebook, install the `genophenocorr` package into the virtual environment used by your Jupyter kernel:

```shell
source path/to/kernel/venv/bin/activate

cd genophenocorr
python3 -m pip install --editable .
```

> The block above installs genophenocorr and its dependencies into the virtual environment.
> The installation uses *editable* mode, so changes made to the code (e.g. in an IDE) become available in the notebook after a kernel restart, with no need to reinstall the library.
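
If the import in the first cell fails, a common cause is that the kernel runs a different interpreter than the one the package was installed into. The sketch below is one way to check this from inside the notebook. The helper name `describe_install` is hypothetical and is not part of genophenocorr; it relies only on the standard library.

```python
import importlib.util
import sys


def describe_install(package_name: str) -> str:
    """Report whether `package_name` is importable from the running interpreter."""
    # find_spec returns None when a top-level package cannot be located.
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return f"{package_name} is NOT importable from {sys.executable}"
    return f"{package_name} found at {spec.origin} (interpreter: {sys.executable})"


# The interpreter path should point into the virtual environment activated above.
print(describe_install("genophenocorr"))
```

For an editable install, the reported location would be expected to sit inside the cloned source tree rather than in `site-packages`.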

In [1]:
import os
import sys

import glob
import pprint

import hpotk
import genophenocorr
print(f"Using genophenocorr version {genophenocorr.__version__}")

Using genophenocorr version 0.1.0


## Set resource paths

In [2]:
fpath_hpo = '/home/ielis/data/ontologies/hpo/2023-07-21/hp.2023-07-21.json'
fpath_pp_dir = 'phenopackets'

hpo = hpotk.load_minimal_ontology(fpath_hpo)

## Load phenopackets into a `Cohort`

### Configure the patient creator

In [3]:
from genophenocorr.preprocessing import configure_caching_patient_creator

patient_creator = configure_caching_patient_creator(hpo)

### Load the patients from phenopackets

In [4]:
from phenopackets import Phenopacket
from google.protobuf.json_format import Parse

patients = []
for dirpath, dirnames, filenames in os.walk(fpath_pp_dir):
    for filename in filenames:
        if filename.endswith('.json'):
            fpath_pp = os.path.join(dirpath, filename)
            print(f'Loading phenopacket from {fpath_pp}')
            with open(fpath_pp) as fh:
                pp = Phenopacket()
                Parse(fh.read(), pp)
            patient = patient_creator.create_patient(pp)
            patients.append(patient)
print(f'Loaded {len(patients)} patients')

Loading phenopacket from phenopackets/PMID_30612693_6.json
Loading phenopacket from phenopackets/PMID_31278393_6.json
Loading phenopacket from phenopackets/PMID_30612693_9.json
Loading phenopacket from phenopackets/PMID_30612693_11.json
Loading phenopacket from phenopackets/PMID_31278393_3.json
Loading phenopacket from phenopackets/PMID_30612693_3.json
Loading phenopacket from phenopackets/PMID_31278393_7.json
Loading phenopacket from phenopackets/PMID_30612693_10.json
Loading phenopacket from phenopackets/PMID_31278393_5.json
Loading phenopacket from phenopackets/PMID_31278393_1.json
Loading phenopacket from phenopackets/PMID_30612693_1.json
Loading phenopacket from phenopackets/PMID_111_probandA.json
Loading phenopacket from phenopackets/PMID_30612693_4.json
Loading phenopacket from phenopackets/PMID_30612693_2.json
Loading phenopacket from phenopackets/PMID_31278393_8.json
Loading phenopacket from phenopackets/PMID_30612693_13.json
Loading phenopacket from phenopackets/PMID_30612693
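
The log above lists the files in no particular order, because `os.walk` yields entries in whatever order the file system reports them. If a reproducible loading order matters, the paths can be collected and sorted first. The following is a minimal sketch under that assumption; `find_phenopacket_paths` is a hypothetical helper, and the demo runs on a temporary directory with placeholder files rather than on the real phenopackets.

```python
import os
import tempfile


def find_phenopacket_paths(root_dir: str, suffix: str = ".json") -> list:
    """Return all file paths under `root_dir` that end with `suffix`, sorted."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename.endswith(suffix):
                paths.append(os.path.join(dirpath, filename))
    # Sorting makes the order independent of the file system.
    return sorted(paths)


# Demo on a throwaway directory holding two JSON files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("b.json", "a.json", "notes.txt"):
        with open(os.path.join(tmp_dir, name), "w") as fh:
            fh.write("{}")
    found = [os.path.basename(p) for p in find_phenopacket_paths(tmp_dir)]
    print(found)
```

In the notebook, the loop in cell 4 could iterate over `find_phenopacket_paths(fpath_pp_dir)` instead of walking the directory directly.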

## Assemble the patients into a cohort

In [5]:
from genophenocorr.model import Cohort

cohort = Cohort.from_patients(patients)

In [6]:
cohort.list_all_patients()

['4',
 '12',
 '7',
 '9',
 '1',
 '13',
 '1',
 '5',
 '3',
 '5',
 '3',
 '11',
 '10',
 '4',
 'PMID_111_probandA',
 '8',
 '6',
 '7',
 '2',
 '2',
 '6',
 '8']
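
Several identifiers in the listing above occur twice, possibly because the source publications number their individuals independently. The sketch below shows one way to spot such repeats with `collections.Counter`. The `patient_ids` list is copied from the output above; in the notebook, the return value of `cohort.list_all_patients()` could be used instead, assuming it is a list of strings as displayed.

```python
from collections import Counter

# Identifiers copied verbatim from the cell output above.
patient_ids = [
    '4', '12', '7', '9', '1', '13', '1', '5', '3', '5', '3',
    '11', '10', '4', 'PMID_111_probandA', '8', '6', '7', '2',
    '2', '6', '8',
]

counts = Counter(patient_ids)
# Identifiers that appear more than once are ambiguous on their own.
duplicated = sorted(pid for pid, n in counts.items() if n > 1)

print(f"{len(patient_ids)} identifiers, {len(counts)} unique")
print("Repeated identifiers:", duplicated)
```

If the repeats do refer to different individuals, prefixing each identifier with its source (for example the PMID found in the file name) would be one way to keep them distinguishable.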