# KBG Syndrome

[KBG syndrome (KBGS)](https://omim.org/entry/148050) is caused by heterozygous mutation in the ANKRD11 gene. In this notebook, we have used
[pyphetools](https://github.com/monarch-initiative/pyphetools) to parse the clinical data included in the supplemental files of
[Martinez-Cayuelas E, et al. Clinical description, molecular delineation and genotype-phenotype correlation in 340 patients with KBG syndrome: addition of 67 new patients](https://pubmed.ncbi.nlm.nih.gov/36446582).

The authors found a significantly higher frequency of triangular face in carriers of sequence variants than in carriers of CNVs. They also reported associations between short stature and variants in exon 9; between the c.1903_1907del variant and a lower incidence of intellectual disability (ID), attention-deficit/hyperactivity disorder (ADHD), and autism spectrum disorder (ASD); and, in CNV carriers, between the size of the deletion and the presence of macrodontia and hand anomalies.

In [1]:
import genophenocorr
import hpotk

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo(release='v2023-10-09')
print(f'Loaded HPO v{hpo.version}')
print(f"Using genophenocorr version {genophenocorr.__version__}")

Loaded HPO v2023-10-09
Using genophenocorr version 0.1.1dev


## Settings
Specify the transcript used to annotate the variants, for example to compute their HGVS names and variant effects (the phenopackets contain VCF representations of small variants).

### Pick transcript

We choose the [MANE Select](https://www.ncbi.nlm.nih.gov/nuccore/NM_013275.6) transcript for *ANKRD11*.

In [2]:
tx_id = 'NM_013275.6'

## Load Phenopackets

We will load the phenopacket JSON files located in the `phenopackets` folder next to the notebook.

In [3]:
from genophenocorr.preprocessing import configure_caching_cohort_creator, load_phenopacket_folder

fpath_phenopackets = 'phenopackets'
cohort_creator = configure_caching_cohort_creator(hpo, timeout=20)
cohort = load_phenopacket_folder(fpath_phenopackets, cohort_creator)



Patients Created: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 337/337 [00:01<00:00, 216.15it/s]
Validated under none policy
337 phenopacket(s) found at `phenopackets`
  patient #0
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Novara, 2017_P2[PMID_36446582_Novara_2017_P2]. Remove variant from testing
     ·Patient PMID_36446582_Novara_2017_P2 has no variants to work with
  patient #1
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Goldenberg2016_P13[PMID_36446582_Goldenberg2016_P13]. Remove variant from testing
     ·Patient PMID_36446582_Goldenberg2016_P13 has no variants to work with
  patient #3
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Ockeloen2015_P20[PMID_36446582
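The log shows that some phenopackets were dropped because none of their variants could be read as a VCF record, a VRS CNV, or an `hgvs.c` expression. To list such files before building the cohort, one could scan the JSON directly. The sketch below assumes Phenopacket Schema v2 field names (`interpretations`, `genomicInterpretations`, `variationDescriptor`, `vcfRecord`, `expressions`) and only checks for VCF records and `hgvs.c` expressions. VRS CNVs are not handled, so packets that contain only CNVs would be reported too. The helper names are hypothetical and not part of genophenocorr.

```python
import json
from pathlib import Path


def has_usable_variant(phenopacket: dict) -> bool:
    """Return True if any variant descriptor has a VCF record or an hgvs.c expression.

    Assumes Phenopacket Schema v2 JSON field names. VRS-encoded CNVs are not
    recognized by this sketch.
    """
    for interpretation in phenopacket.get("interpretations", []):
        diagnosis = interpretation.get("diagnosis", {})
        for gi in diagnosis.get("genomicInterpretations", []):
            descriptor = gi.get("variantInterpretation", {}).get("variationDescriptor", {})
            if "vcfRecord" in descriptor:
                return True
            if any(e.get("syntax") == "hgvs.c" for e in descriptor.get("expressions", [])):
                return True
    return False


def find_packets_without_variants(folder: str) -> list:
    """Return the ids (or file names) of phenopackets in `folder` lacking a usable variant."""
    missing = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open() as fh:
            packet = json.load(fh)
        if not has_usable_variant(packet):
            missing.append(packet.get("id", path.name))
    return missing


# Usage, e.g. in the notebook:
# print(find_packets_without_variants("phenopackets"))
```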

## Summarize the cohort

In [4]:
from IPython.display import display, HTML
from genophenocorr.view import CohortViewable

cv = CohortViewable(hpo=hpo, transcript_id=tx_id)
html = cv.process(cohort=cohort, transcript_id=tx_id)

display(HTML(html))

| HPO Term | ID | Annotation Count |
|---|---|---|
| Macrodontia | HP:0001572 | 211 |
| Intellectual disability | HP:0001249 | 194 |
| Abnormality of the hand | HP:0001155 | 189 |
| Global developmental delay | HP:0001263 | 176 |
| Delayed speech and language development | HP:0000750 | 160 |
| Short stature | HP:0004322 | 150 |
| Thick eyebrow | HP:0000574 | 126 |
| Long philtrum | HP:0000343 | 121 |
| Bulbous nose | HP:0000414 | 89 |
| Triangular face | HP:0000325 | 83 |

| Variant | Variant name | Variant Count |
|---|---|---|
| 16_89284634_89284639_GTGTTT_G | c.1903_1907del | 34 |
| 16_89284129_89284134_CTTTTT_C | c.2408_2412del | 10 |
| 16_89284140_89284144_TTTTC_T | c.2398_2401del | 9 |
| 16_89285157_89285161_GTTTC_G | c.1381_1384del | 8 |
| 16_89279749_89279749_C_CG | c.6792dup | 5 |
| 16_89275180_89275180_A_AG | c.7481dup | 5 |
| 16_89283314_89283318_CCTTT_C | c.3224_3227del | 3 |
| 16_89284358_89284360_GAT_G | c.2182_2183del | 3 |
| 16_89282710_89282710_T_A | c.3832A>T | 3 |
| 16_89282136_89282136_C_T | c.4406G>A | 3 |
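The variant keys in this table appear to follow the pattern `chrom_start_end_ref_alt` (for example, `16_89284634_89284639_GTGTTT_G` is listed as c.1903_1907del). A minimal sketch, assuming that layout (it is inferred from the examples, not taken from genophenocorr's documentation), splits a key and checks that the HGVS names agree with the REF/ALT lengths. `VariantKey` and `parse_variant_key` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantKey:
    """Hypothetical container for the fields of a variant key."""
    chrom: str
    start: int  # first REF position, as written in the key
    end: int    # last REF position, as written in the key
    ref: str
    alt: str

    @property
    def length_change(self) -> int:
        """Net length change: negative for deletions, positive for insertions/duplications."""
        return len(self.alt) - len(self.ref)


def parse_variant_key(key: str) -> VariantKey:
    """Split a key such as '16_89284634_89284639_GTGTTT_G' into its fields.

    Assumes the layout chrom_start_end_ref_alt, inferred from the summary table.
    """
    chrom, start, end, ref, alt = key.split("_")
    vk = VariantKey(chrom, int(start), int(end), ref, alt)
    # In every key shown in the table, end - start + 1 equals the REF length.
    if vk.end - vk.start + 1 != len(vk.ref):
        raise ValueError(f"unexpected coordinates in {key!r}")
    return vk


examples = {
    "16_89284634_89284639_GTGTTT_G": "c.1903_1907del",
    "16_89284140_89284144_TTTTC_T": "c.2398_2401del",
    "16_89279749_89279749_C_CG": "c.6792dup",
    "16_89282710_89282710_T_A": "c.3832A>T",
}
for key, hgvs in examples.items():
    print(hgvs, parse_variant_key(key).length_change)
# c.1903_1907del -5
# c.2398_2401del -4
# c.6792dup 1
# c.3832A>T 0
```

For c.1903_1907del the sketch gives a net change of -5, matching the five deleted nucleotides in the HGVS name.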

| Disease | Annotation Count |
|---|---|
| OMIM:148050 | 337 |

| Variant effect | Annotation Count |
|---|---|
| FRAMESHIFT_VARIANT | 175 |
| STOP_GAINED | 67 |
| MISSENSE_VARIANT | 7 |
| SPLICE_ACCEPTOR_VARIANT | 4 |
| SPLICE_DONOR_VARIANT | 3 |
| CODING_SEQUENCE_VARIANT | 1 |
| INFRAME_DELETION | 2 |
| SPLICE_REGION_VARIANT | 2 |


## Configure the analysis

In [5]:
from genophenocorr.analysis import configure_cohort_analysis, CohortAnalysisConfiguration
from genophenocorr.analysis.predicate import PatientCategories

analysis_config = CohortAnalysisConfiguration()
analysis_config.missing_implies_excluded = True
analysis_config.pval_correction = 'fdr_bh'
analysis_config.min_perc_patients_w_hpo = 0.1
analysis = configure_cohort_analysis(cohort, hpo, config=analysis_config)

Test for genotype-phenotype correlations between patients with frameshift variants and all other patients.

In [6]:
from genophenocorr.model import VariantEffect

frameshift = analysis.compare_by_variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id=tx_id)
frameshift.summarize(hpo, PatientCategories.YES)

| FRAMESHIFT_VARIANT on NM_013275.6 | Yes (Count) | Yes (Percent) | No (Count) | No (Percent) | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Abnormality of the hand [HP:0001155] | 95/144 | 66% | 60/71 | 85% | 0.005661 | 1.0 |
| EEG abnormality [HP:0002353] | 7/33 | 21% | 9/16 | 56% | 0.022884 | 1.0 |
| Feeding difficulties [HP:0011968] | 33/89 | 37% | 26/45 | 58% | 0.027584 | 1.0 |
| Low anterior hairline [HP:0000294] | 40/58 | 69% | 15/30 | 50% | 0.105274 | 1.0 |
| Intellectual disability [HP:0001249] | 99/119 | 83% | 59/64 | 92% | 0.115195 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... |
| Abnormality of the head [HP:0000234] | 153/153 | 100% | 74/74 | 100% | 1.000000 | 1.0 |
| Aplasia/hypoplasia affecting bones of the axial skeleton [HP:0009122] | 13/13 | 100% | 3/3 | 100% | 1.000000 | 1.0 |
| Abnormal cardiovascular system physiology [HP:0011025] | 1/1 | 100% | 2/2 | 100% | 1.000000 | 1.0 |
| Non-motor seizure [HP:0033259] | 1/1 | 100% | 2/2 | 100% | 1.000000 | 1.0 |
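Each row compares how often an HPO term is observed among frameshift carriers ("Yes") and among the remaining patients ("No"), which amounts to a 2x2 contingency table. The skip messages in the report further down mention 2x2 tables, so a Fisher exact test is a plausible choice, but that is an assumption here. The sketch below is a self-contained two-sided Fisher exact test (standard library only), not genophenocorr's own code, applied to the "Abnormality of the hand" row.

```python
from math import comb


def hypergeom_pmf(a: int, row1: int, col1: int, n: int) -> float:
    """Probability of `a` in the top-left cell of a 2x2 table with fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p value for the table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more likely than the observed one.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = hypergeom_pmf(a, row1, col1, n)
    p = 0.0
    for k in range(lo, hi + 1):
        pk = hypergeom_pmf(k, row1, col1, n)
        if pk <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p += pk
    return min(p, 1.0)


# "Abnormality of the hand": observed in 95/144 frameshift carriers vs. 60/71 others.
# Rows = genotype group, columns = term observed / term excluded.
p_hand = fisher_exact_two_sided(95, 144 - 95, 60, 71 - 60)
print(f"p = {p_hand:.6f}")
```

Because the test used by the library is not confirmed here, the printed value should only be compared loosely with the 0.005661 in the table.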
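
All corrected p values in the frameshift table are 1.0, so none of the associations survives the Benjamini-Hochberg correction (`'fdr_bh'`) across all tested HPO terms. The next cells rebuild the analysis with `heuristic_strategy()`, which skips terms that are uninformative or too sparsely annotated to be tested (see the skipped-term report below), so fewer tests enter the correction.
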
In [7]:
from genophenocorr.analysis import configure_cohort_analysis, CohortAnalysisConfiguration
from genophenocorr.analysis.predicate import PatientCategories

analysis_config = CohortAnalysisConfiguration()
analysis_config.missing_implies_excluded = True
analysis_config.pval_correction = 'fdr_bh'
analysis_config.min_perc_patients_w_hpo = 0.1
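# Skip uninformative HPO terms (e.g. top-level or underpowered terms) before the multiple-testing correction
analysis_config.heuristic_strategy()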
analysis = configure_cohort_analysis(cohort, hpo, config=analysis_config)

In [8]:
frameshift = analysis.compare_by_variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id=tx_id)
frameshift.summarize(hpo, PatientCategories.YES)

| FRAMESHIFT_VARIANT on NM_013275.6 | Yes (Count) | Yes (Percent) | No (Count) | No (Percent) | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Abnormality of the hand [HP:0001155] | 95/144 | 66% | 60/71 | 85% | 0.005661 | 0.209453 |
| EEG abnormality [HP:0002353] | 7/33 | 21% | 9/16 | 56% | 0.022884 | 0.340205 |
| Feeding difficulties [HP:0011968] | 33/89 | 37% | 26/45 | 58% | 0.027584 | 0.340205 |
| Low anterior hairline [HP:0000294] | 40/58 | 69% | 15/30 | 50% | 0.105274 | 0.794281 |
| Intellectual disability [HP:0001249] | 99/119 | 83% | 59/64 | 92% | 0.115195 | 0.794281 |
| Delayed skeletal maturation [HP:0002750] | 37/86 | 43% | 17/28 | 61% | 0.128802 | 0.794281 |
| Inguinal hernia [HP:0000023] | 2/31 | 6% | 4/17 | 24% | 0.166824 | 0.881785 |
| Intrauterine growth retardation [HP:0001511] | 6/39 | 15% | 7/23 | 30% | 0.202568 | 0.935054 |
| Bulbous nose [HP:0000414] | 45/71 | 63% | 29/39 | 74% | 0.291265 | 0.935054 |
| Hypertelorism [HP:0000316] | 34/63 | 54% | 25/38 | 66% | 0.29933 | 0.935054 |
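If the corrected p values are Benjamini-Hochberg adjustments (`'fdr_bh'` is the name statsmodels' `multipletests` uses for that method), they can be checked by hand. For instance, 0.209453 / 0.005661 ≈ 37, which suggests that roughly 37 HPO terms remained after the heuristic filtering; this is our inference from the printed numbers, not something the notebook reports. The sketch below is a plain-Python BH adjustment. Its `n_tests` parameter is a hypothetical convenience for the case where only the smallest p values of a larger family are available.

```python
def bh_adjust(pvals, n_tests=None):
    """Benjamini-Hochberg adjusted p values, returned in the input order.

    `n_tests` (hypothetical parameter): total number of tests when `pvals`
    holds only the smallest p values of a larger family. The results are then
    upper bounds, because the missing (larger) p values could only lower them.
    """
    m = len(pvals) if n_tests is None else n_tests
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    adjusted = [0.0] * len(pvals)
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p value to the smallest, enforcing monotonicity.
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# First three rows of the table; 37 is the inferred number of tested terms.
print(bh_adjust([0.005661, 0.022884, 0.027584], n_tests=37))
# roughly [0.2095, 0.3402, 0.3402]
```

These values agree with the corrected p values of the first three rows up to rounding, which is consistent with (but does not prove) a BH correction over about 37 terms.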


In [9]:
from genophenocorr.view import StatsViewable
sv = StatsViewable(frameshift.mtc_filter_report)


In [10]:
display(HTML(sv.process(frameshift)))

| Skipped | Count |
|---|---|
| Skipping top level term | 12 |
| Skipping term with only 3 observations (not powered for 2x2) | 8 |
| Skipping term with only 6 observations (not powered for 2x2) | 5 |
| Skipping non phenotype term | 2 |
| Skipping term with only 5 observations (not powered for 2x2) | 2 |
| Skipping term HP:0012759 because all genotypes have same HPO observed proportions | 1 |
| Skipping term HP:0012638 because all genotypes have same HPO observed proportions | 1 |
| Skipping term HP:0011446 because all genotypes have same HPO observed proportions | 1 |
| Skipping term HP:0012758 because all genotypes have same HPO observed proportions | 1 |
| Skipping term HP:0001999 because all genotypes have same HPO observed proportions | 1 |
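The "not powered for 2x2" entries can be motivated with a small calculation: with very few observations, no 2x2 table can reach p < 0.05 under a two-sided Fisher exact test. The sketch below (standard library only; it assumes a Fisher exact test and is not the library's implementation) enumerates every 2x2 table with n observations and no empty row or column, and reports the smallest attainable p value.

```python
from math import comb


def two_sided_fisher(a, row1, col1, n):
    """Two-sided Fisher p value for top-left cell `a` given the table margins."""
    def pmf(k):
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = pmf(a)
    ks = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    return min(1.0, sum(pmf(k) for k in ks if pmf(k) <= p_obs * (1 + 1e-7)))


def min_attainable_p(n):
    """Smallest two-sided Fisher p value over all 2x2 tables with n observations."""
    best = 1.0
    for row1 in range(1, n):
        for col1 in range(1, n):
            for a in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
                best = min(best, two_sided_fisher(a, row1, col1, n))
    return best


for n in range(3, 9):
    print(n, round(min_attainable_p(n), 4))
# 3 0.3333
# 4 0.25
# 5 0.1
# 6 0.0667
# 7 0.0286
# 8 0.0179
```

Under this assumption, 6 observations can yield at best p ≈ 0.067, while 7 can yield p ≈ 0.029, which is consistent with the skipped terms above having only 3, 5, or 6 observations. The actual cutoff used by genophenocorr is not shown in this notebook.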


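Test for genotype-phenotype correlations between subjects with at least one allele of a given variant and all other subjects. Here, the variant is `16_89284634_89284639_GTGTTT_G` (c.1903_1907del), the most frequent variant in the cohort: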


In [11]:
var_single = analysis.compare_by_variant_key('16_89284634_89284639_GTGTTT_G')
var_single.summarize(hpo, PatientCategories.YES)

| >=1 allele of the variant 16_89284634_89284639_GTGTTT_G | Yes (Count) | Yes (Percent) | No (Count) | No (Percent) | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Sleep abnormality [HP:0002360] | 7/9 | 78% | 12/48 | 25% | 0.004267 | 0.153609 |
| Autistic behavior [HP:0000729] | 1/6 | 17% | 39/66 | 59% | 0.081923 | 1.0 |
| Intellectual disability [HP:0001249] | 14/19 | 74% | 144/164 | 88% | 0.147465 | 1.0 |
| Synophrys [HP:0000664] | 5/14 | 36% | 48/82 | 59% | 0.148677 | 1.0 |
| EEG abnormality [HP:0002353] | 0/6 | 0% | 16/43 | 37% | 0.158804 | 1.0 |
| Low-set ears [HP:0000369] | 5/6 | 83% | 16/35 | 46% | 0.183611 | 1.0 |
| Motor delay [HP:0001270] | 6/11 | 55% | 49/66 | 74% | 0.277389 | 1.0 |
| Abnormality of the hand [HP:0001155] | 22/27 | 81% | 133/188 | 71% | 0.358444 | 1.0 |
| Recurrent otitis media [HP:0000403] | 2/7 | 29% | 22/46 | 48% | 0.436242 | 1.0 |
| Long philtrum [HP:0000343] | 12/13 | 92% | 92/117 | 79% | 0.463475 | 1.0 |
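The comparison above splits the cohort into patients with at least one allele of the variant and all other patients, then counts, per HPO term, who has the term observed or excluded. A minimal sketch of that bookkeeping with plain Python sets follows. It uses a hypothetical toy layout (patient id → set of variant keys) rather than genophenocorr's data model, and the helper names are illustrative. In the sketch, patients without an annotation for the term fall into neither count, which is consistent with the per-term denominators in the table (e.g. 7/9 and 12/48 for sleep abnormality).

```python
def split_by_variant(patients, key):
    """Split patient ids into carriers of >=1 allele of `key` and all others.

    `patients` maps a patient id to the set of variant keys it carries
    (a hypothetical, simplified stand-in for a cohort).
    """
    carriers = {pid for pid, keys in patients.items() if key in keys}
    others = set(patients) - carriers
    return carriers, others


def two_by_two(carriers, others, observed, excluded):
    """Per-group (observed, excluded) counts for one HPO term.

    Patients in neither `observed` nor `excluded` are not counted.
    """
    return (
        (len(carriers & observed), len(carriers & excluded)),
        (len(others & observed), len(others & excluded)),
    )


# Toy data, not taken from the cohort.
toy = {"P1": {"A"}, "P2": {"A", "B"}, "P3": {"B"}, "P4": set()}
carriers, others = split_by_variant(toy, "A")
print(two_by_two(carriers, others, observed={"P1", "P3"}, excluded={"P2"}))
# ((1, 1), (1, 0))
```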


We can also compare subjects carrying one variant with subjects carrying another variant (here c.2408_2412del vs. c.1903_1907del):

In [12]:
var_double = analysis.compare_by_variant_keys('16_89284129_89284134_CTTTTT_C', '16_89284634_89284639_GTGTTT_G')
var_double.summarize(hpo, PatientCategories.YES)

KeyError: PatientCategory(cat_id=0, name=No, description=The patient does not belong to the group.)
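The comparison in this cell is meant to contrast carriers of one variant with carriers of the other. Independently of the error above, the intended grouping can be sketched with a toy layout (patient id → set of variant keys). The helper is hypothetical, and dropping patients who carry both keys or neither is an assumption of this sketch, not documented genophenocorr behavior.

```python
def split_by_two_variants(patients, key_a, key_b):
    """Group patient ids carrying `key_a` vs. those carrying `key_b`.

    Hypothetical helper. Patients carrying both keys, or neither, are left out
    so the two groups do not overlap (an assumption of this sketch).
    """
    group_a, group_b = set(), set()
    for pid, keys in patients.items():
        has_a, has_b = key_a in keys, key_b in keys
        if has_a and not has_b:
            group_a.add(pid)
        elif has_b and not has_a:
            group_b.add(pid)
    return group_a, group_b


# Toy data, not taken from the cohort.
toy = {
    "P1": {"16_89284129_89284134_CTTTTT_C"},
    "P2": {"16_89284634_89284639_GTGTTT_G"},
    "P3": {"16_89284634_89284639_GTGTTT_G"},
    "P4": set(),
}
group_a, group_b = split_by_two_variants(
    toy, "16_89284129_89284134_CTTTTT_C", "16_89284634_89284639_GTGTTT_G"
)
print(sorted(group_a), sorted(group_b))
# ['P1'] ['P2', 'P3']
```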

TODO - finalize!