# KBG Syndrome

Data from [Martinez-Cayuelas E, et al. Clinical description, molecular delineation and genotype-phenotype correlation in 340 patients with KBG syndrome: addition of 67 new patients](https://pubmed.ncbi.nlm.nih.gov/36446582).

We investigate subjects with mutations in *ANKRD11*.

In [1]:
import genophenocorr

print(f"Using genophenocorr version {genophenocorr.__version__}")

Using genophenocorr version 0.1.1dev


## Setup

### Load HPO

We use the `v2023-10-09` release of HPO for this analysis.

In [2]:
import hpotk

fpath_hpo = 'https://github.com/obophenotype/human-phenotype-ontology/releases/download/v2023-10-09/hp.json'
hpo = hpotk.load_minimal_ontology(fpath_hpo)

### Load Phenopackets

We load the phenopacket JSON files from the `phenopackets` folder, which sits next to the notebook.

In [3]:
from genophenocorr.preprocessing import configure_caching_cohort_creator, load_phenopacket_folder

fpath_phenopackets = 'phenopackets'
cohort_creator = configure_caching_cohort_creator(hpo)
cohort = load_phenopacket_folder(fpath_phenopackets, cohort_creator)

Patients Created: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 328/328 [00:00<00:00, 565.93it/s]


328


AttributeError: 'TranscriptAnnotation' object has no attribute '_protein_id'
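As an independent sanity check on the number of loaded patients, the files in the folder can be counted with the standard library alone. This is a minimal sketch, not part of genophenocorr: the helper name `count_phenopackets` is hypothetical, and it assumes one phenopacket per `*.json` file, each carrying the top-level `id` field defined by the Phenopacket Schema.

```python
import json
from pathlib import Path


def count_phenopackets(folder):
    """Count the JSON files in `folder` that parse and carry a top-level `id`.

    Hypothetical helper for a quick sanity check; it does not validate
    the files against the Phenopacket Schema.
    """
    n_valid = 0
    for path in sorted(Path(folder).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as fh:
                data = json.load(fh)
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed files
        if isinstance(data, dict) and "id" in data:
            n_valid += 1
    return n_valid
```

Under these assumptions, `count_phenopackets('phenopackets')` should agree with the patient count reported by the progress bar.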

### Pick transcript

We choose the [MANE Select](https://www.ncbi.nlm.nih.gov/nuccore/NM_013275.6) transcript for *ANKRD11*.

In [None]:
tx_id = 'NM_013275.6'

## Explore cohort

Explore the cohort to decide which genotype-phenotype comparisons are worth testing.


In [None]:
from IPython.display import HTML, display
from genophenocorr.view import CohortViewer

viewer = CohortViewer(hpo)

In [None]:
display(HTML(viewer.cohort_summary_table(cohort)))

In [None]:
display(HTML(viewer.hpo_term_counts_table(cohort)))  # TODO: add labels to the output

In [None]:
display(HTML(viewer.variants_table(cohort, tx_id))) 

## Configure the analysis

In [None]:
from genophenocorr.analysis import configure_cohort_analysis, CohortAnalysisConfiguration
from genophenocorr.analysis.predicate import BooleanPredicate, GroupingPredicate

analysis_config = (
    CohortAnalysisConfiguration.builder()
    .missing_implies_excluded(True)
    .pval_correction('fdr_bh')
    .min_perc_patients_w_hpo(0.1)
    .build()
)
analysis = configure_cohort_analysis(cohort, hpo, analysis_config)
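The `'fdr_bh'` setting above most likely refers to the Benjamini-Hochberg false discovery rate procedure (the name matches the method identifier used by `statsmodels`; that mapping is an assumption, not something this notebook states). The sketch below shows the adjustment in plain Python. The function name `benjamini_hochberg` and the raw p values are illustrative only and are not results from this cohort.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvals)
    # Indices of the p values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the
    # adjusted values never increase as the raw p value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative raw p values, one per tested HPO term.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each raw p value is scaled by the number of tests divided by its rank, so terms with small p values among many tests are penalized more heavily than they would be in isolation.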

Test for genotype-phenotype correlations between subjects with frameshift variants and all other subjects.

In [None]:
from genophenocorr.model import VariantEffect

frameshift = analysis.compare_by_variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id)
frameshift.summarize(hpo, BooleanPredicate.YES)
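To make the comparison concrete: for each HPO term, the two genotype groups and the presence or absence of the term form a 2x2 contingency table. The notebook does not show which test genophenocorr applies, so the following is only a sketch assuming a two-sided Fisher exact test. The function name and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: rows are frameshift vs. other subjects,
# columns are HPO term present vs. absent.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p_value:.4f}")
```

One such p value is produced per HPO term, which is why a multiple-testing correction is configured for the analysis.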

Test for genotype-phenotype correlations between subjects with at least one allele of a given variant and all other subjects:


In [None]:
var_single = analysis.compare_by_variant_key('16_89284634_89284639_GTGTTT_G')
var_single.summarize(hpo, BooleanPredicate.YES)

Or compare subjects carrying one variant with subjects carrying another variant.

In [None]:
var_double = analysis.compare_by_variant_keys('16_89284129_89284134_CTTTTT_C', '16_89284634_89284639_GTGTTT_G')
var_double.summarize(hpo, BooleanPredicate.YES)
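The variant keys used above appear to follow a `chromosome_start_end_ref_alt` layout. That reading is inferred from the two keys in this notebook, not from the genophenocorr documentation, so treat the parser below as a sketch. `VariantKey` and `parse_variant_key` are hypothetical names, and the simple split would not work for contig names that themselves contain underscores.

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    chrom: str
    start: int
    end: int
    ref: str
    alt: str


def parse_variant_key(key):
    """Split a key such as '16_89284634_89284639_GTGTTT_G' into its parts."""
    chrom, start, end, ref, alt = key.split("_")
    return VariantKey(chrom, int(start), int(end), ref, alt)


vk = parse_variant_key("16_89284634_89284639_GTGTTT_G")
# Net change in sequence length introduced by the alternate allele.
net_change = len(vk.alt) - len(vk.ref)
print(vk.chrom, vk.start, vk.end, net_change)  # prints: 16 89284634 89284639 -5
```

A net loss of 5 bases is not a multiple of three, so in coding sequence this variant would shift the reading frame.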

TODO - finalize!