# PPP2R1A

In [1]:
import genophenocorr

print(f"Using genophenocorr version {genophenocorr.__version__}")

Using genophenocorr version 0.1.1dev


## Setup

### Load HPO

We use the HPO `v2023-10-09` release for this analysis.

In [2]:
import hpotk

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo(release='v2023-10-09')
print(f'Loaded HPO v{hpo.version}')

Loaded HPO v2023-10-09


### Load phenopackets

We'll load the phenopacket JSON files stored in the `phenopackets` folder next to the notebook.

In [3]:
from genophenocorr.preprocessing import configure_caching_cohort_creator, load_phenopacket_folder

fpath_phenopackets = 'phenopackets'
cohort_creator = configure_caching_cohort_creator(hpo)
cohort = load_phenopacket_folder(fpath_phenopackets, cohort_creator)

Patients Created: 100%|██████████| 60/60 [00:07<00:00,  7.85it/s]
Validated under none policy
60 phenopacket(s) found at `phenopackets`
  patient #0
    phenotype-features
     ·No diseases found.
  patient #1
    phenotype-features
     errors:
     Terms should not contain both present Intellectual disability, moderate [HP:0002342] and its present or excluded ancestor Intellectual disability [HP:0001249]
     ·No diseases found.
  patient #2
    phenotype-features
     ·No diseases found.
  patient #3
    phenotype-features
     ·No diseases found.
  patient #4
    phenotype-features
     errors:
     Terms should not contain both present Intellectual disability, severe [HP:0010864] and its present or excluded ancestor Intellectual disability [HP:0001249]
     ·No diseases found.
  patient #5
    phenotype-features
     errors:
     Terms should not contain both present Intellectual disability, moderate [HP:0002342] and its present or excluded ancestor Intellectual disability [HP:000
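
The validation errors above flag patients annotated with both a specific term and one of its ancestors. The following is a minimal sketch of that kind of check, not the validator used by `genophenocorr`. It assumes a hypothetical toy fragment of the HPO hierarchy (`PARENTS`) in place of the loaded ontology, the helper names are illustrative only, and it ignores the present/excluded status that the real validator also considers.

```python
# Hypothetical toy fragment of the HPO hierarchy (child -> parents).
# In the notebook, the hierarchy would come from the loaded `hpo` object.
PARENTS = {
    "HP:0002342": ["HP:0001249"],  # Intellectual disability, moderate
    "HP:0010864": ["HP:0001249"],  # Intellectual disability, severe
    "HP:0001249": [],              # Intellectual disability
}


def ancestors(term_id, parents):
    """Collect all ancestors of `term_id` by walking the parent links."""
    seen = set()
    stack = list(parents.get(term_id, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def find_redundant_ancestors(term_ids, parents):
    """Return (descendant, ancestor) pairs where both terms are annotated."""
    annotated = set(term_ids)
    clashes = []
    for term_id in sorted(annotated):
        for ancestor in sorted(ancestors(term_id, parents) & annotated):
            clashes.append((term_id, ancestor))
    return clashes


# A patient annotated like patient #1 in the log above.
patient_terms = ["HP:0002342", "HP:0001249"]
print(find_redundant_ancestors(patient_terms, PARENTS))
```

A common way to resolve such a clash is to keep only the more specific term, since it already implies its ancestor.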

### Pick a transcript

We will use the [MANE Select](https://www.ncbi.nlm.nih.gov/nuccore/NM_014225.6) transcript for *PPP2R1A*.

In [4]:
tx_id = 'NM_014225.6'

## Configure the analysis


In [5]:
from genophenocorr.analysis import configure_cohort_analysis

analysis = configure_cohort_analysis(cohort, hpo)

## Run analysis

Test for genotype-phenotype correlation by comparing patients whose variants affect a protein region (`FeatureType.REGION`) with patients whose variants lie outside any such region.

In [6]:
from genophenocorr.model import FeatureType
from genophenocorr.analysis.predicate import PatientCategories

by_region = analysis.compare_by_protein_feature_type(FeatureType.REGION, tx_id=tx_id)
by_region.summarize(hpo, PatientCategories.YES)

Variant that affects REGION feature type on protein encoded by transcript NM_014225.6:

| Phenotype | Yes: Count | Yes: Percent | No: Count | No: Percent | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Hearing impairment [HP:0000365] | 6/27 | 22% | 1/1 | 100% | 0.25 | 1.0 |
| Abnormal knee physiology [HP:0034670] | 2/2 | 100% | 0/0 | 0% | 1.00 | 1.0 |
| Abnormality of the cardiovascular system [HP:0001626] | 9/9 | 100% | 0/0 | 0% | 1.00 | 1.0 |
| Thoracolumbar scoliosis [HP:0002944] | 1/1 | 100% | 0/0 | 0% | 1.00 | 1.0 |
| Gastroesophageal reflux [HP:0002020] | 1/1 | 100% | 1/1 | 100% | 1.00 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... |
| Absent speech [HP:0001344] | 4/4 | 100% | 0/0 | 0% | 1.00 | 1.0 |
| Thoracic aortic aneurysm [HP:0012727] | 1/1 | 100% | 0/0 | 0% | 1.00 | 1.0 |
| Aortic aneurysm [HP:0004942] | 1/1 | 100% | 0/0 | 0% | 1.00 | 1.0 |
| Autistic behavior [HP:0000729] | 8/8 | 100% | 1/1 | 100% | 1.00 | 1.0 |
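
To make the `p value` and `Corrected p value` columns easier to interpret, here is a minimal sketch of how such numbers can be computed for one row. The notebook does not state which test or correction `genophenocorr` applies, so this assumes a two-sided Fisher's exact test on the 2×2 count table and a Bonferroni correction. The function names and the number of tests (`n_tests`) are hypothetical. The example uses the *Hearing impairment* counts from the summary: 6 of 27 patients in the "Yes" group and 1 of 1 in the "No" group.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are at most as likely as the observed table.
    """
    row1 = a + b            # size of the first group
    col1 = a + c            # patients with the phenotype
    n = a + b + c + d       # all patients
    denom = comb(n, row1)

    def prob(x):
        # Probability that x patients of the first group have the phenotype.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)  # tolerance for float comparison
    )
    return min(1.0, total)


def bonferroni(p_value, n_tests):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)


# Hearing impairment: "Yes" group 6 with / 21 without, "No" group 1 with / 0 without.
p = fisher_exact_two_sided(6, 21, 1, 0)
print(round(p, 2))  # 0.25

# `n_tests=10` is a made-up value for illustration only.
print(bonferroni(p, n_tests=10))  # 1.0
```

Under these assumptions the result agrees with the 0.25 reported for *Hearing impairment*. With only one patient in the "No" group, none of the tested phenotypes can reach significance after correction.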


TODO - finalize!