# KBG Syndrome

[KBG syndrome (KBGS)](https://omim.org/entry/148050) is caused by heterozygous mutations in the *ANKRD11* gene. In this notebook, we have used
[pyphetools](https://github.com/monarch-initiative/pyphetools) to parse the clinical data included in the supplemental files of
[Martinez-Cayuelas E, et al. Clinical description, molecular delineation and genotype-phenotype correlation in 340 patients with KBG syndrome: addition of 67 new patients](https://pubmed.ncbi.nlm.nih.gov/36446582).

The authors found that a triangular face was significantly more frequent in carriers of sequence variants than in carriers of CNVs. They also reported an association between short stature and variants in exon 9, a lower incidence of intellectual disability (ID), ADHD, and autism spectrum disorder (ASD) in carriers of the c.1903_1907del variant, and, among CNV carriers, an association between the size of the deletion and the presence of macrodontia and hand anomalies.

In [1]:
import genophenocorr
import hpotk

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo(release='v2023-10-09')
print(f'Loaded HPO v{hpo.version}')
print(f"Using genophenocorr version {genophenocorr.__version__}")

Loaded HPO v2023-10-09
Using genophenocorr version 0.1.1dev


## Settings
Specify the transcript to be used to encode the variants (the phenopackets contain VCF representations of small variants).

### Pick transcript

We choose the [MANE Select](https://www.ncbi.nlm.nih.gov/nuccore/NM_013275.6) transcript for *ANKRD11*.

In [2]:
tx_id = 'NM_013275.6'

## Load Phenopackets

We load the phenopacket JSON files from the `phenopackets` folder located next to this notebook.

In [3]:
from genophenocorr.preprocessing import configure_caching_cohort_creator, load_phenopacket_folder

fpath_phenopackets = 'phenopackets'
cohort_creator = configure_caching_cohort_creator(hpo, timeout=20)
cohort = load_phenopacket_folder(fpath_phenopackets, cohort_creator)



Patients Created: 100%|██████████| 337/337 [05:55<00:00,  1.05s/it]
Validated under none policy
337 phenopacket(s) found at `phenopackets`
  patient #0
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Novara, 2017_P11[PMID_36446582_Novara_2017_P11]. Remove variant from testing
     ·Patient PMID_36446582_Novara_2017_P11 has no variants to work with
  patient #7
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Ockeloen2015_P20[PMID_36446582_Ockeloen2015_P20]. Remove variant from testing
     ·Patient PMID_36446582_Ockeloen2015_P20 has no variants to work with
  patient #10
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Willemsen2010_P3[PMID_36446582_Willemsen2010_P3]. Remove variant from testing
     ·Patient PMID_36446582_Willemsen2010_P3 has no va

## Summarize the cohort

In [5]:
from IPython.display import display, HTML
from genophenocorr.view import CohortViewable

cv = CohortViewable(hpo=hpo, transcript_id=tx_id)
html = cv.process(cohort=cohort)

display(HTML(html))

| HPO Term | ID | Annotation Count |
| --- | --- | --- |
| Macrodontia | HP:0001572 | 170 |
| Intellectual disability | HP:0001249 | 158 |
| Abnormality of the hand | HP:0001155 | 155 |
| Global developmental delay | HP:0001263 | 132 |
| Delayed speech and language development | HP:0000750 | 121 |
| Short stature | HP:0004322 | 115 |
| Thick eyebrow | HP:0000574 | 106 |
| Long philtrum | HP:0000343 | 104 |
| Bulbous nose | HP:0000414 | 74 |
| Triangular face | HP:0000325 | 68 |

| Variant | Variant name | Variant Count |
| --- | --- | --- |
| 16_89284634_89284639_GTGTTT_G | todo | 34 |
| 16_89284129_89284134_CTTTTT_C | todo | 10 |
| 16_89284140_89284144_TTTTC_T | todo | 9 |
| 16_89285157_89285161_GTTTC_G | todo | 8 |
| 16_89275180_89275180_A_AG | todo | 5 |
| 16_89279749_89279749_C_CG | todo | 5 |
| 16_89283314_89283318_CCTTT_C | todo | 3 |
| 16_89282710_89282710_T_A | todo | 3 |
| 16_89282136_89282136_C_T | todo | 3 |
| 16_89284358_89284360_GAT_G | todo | 3 |

| Disease | Annotation Count |
| --- | --- |
| OMIM:148050 | 257 |

| Variant effect | Annotation Count |
| --- | --- |
| FRAMESHIFT_VARIANT | 175 |
| STOP_GAINED | 67 |
| SPLICE_ACCEPTOR_VARIANT | 4 |
| SPLICE_DONOR_VARIANT | 3 |
| INFRAME_DELETION | 2 |
| MISSENSE_VARIANT | 7 |
| SPLICE_REGION_VARIANT | 2 |
| CODING_SEQUENCE_VARIANT | 1 |
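
The variant keys in the summary above appear to follow a `chromosome_start_end_ref_alt` layout. That layout is inferred from the keys themselves rather than taken from the genophenocorr documentation, and `VariantKey` and `parse_variant_key` are hypothetical helpers introduced only for illustration. Under that assumption, a key can be split into its fields like this:

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    """Fields of a variant key (hypothetical helper, not a genophenocorr class)."""
    chrom: str
    start: int
    end: int
    ref: str
    alt: str


def parse_variant_key(key: str) -> VariantKey:
    """Split a key such as '16_89284634_89284639_GTGTTT_G' into its fields.

    Assumes exactly five underscore-separated fields, so contig names that
    themselves contain underscores are not supported.
    """
    chrom, start, end, ref, alt = key.split('_')
    return VariantKey(chrom, int(start), int(end), ref, alt)


key = parse_variant_key('16_89284634_89284639_GTGTTT_G')
# Net change in allele length: negative for deletions, positive for insertions.
length_change = len(key.alt) - len(key.ref)
print(key, length_change)
```

For the most frequent key in the table, the reference allele is six bases long and the alternate allele is one base long, so the net change is a five-base deletion.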


## Configure the analysis

In [6]:
from genophenocorr.analysis import configure_cohort_analysis, CohortAnalysisConfiguration
from genophenocorr.analysis.predicate import PatientCategories

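analysis_config = CohortAnalysisConfiguration()
# Treat an HPO term that is not recorded for a patient as excluded.
analysis_config.missing_implies_excluded = True
# Correct p values for multiple testing (Benjamini-Hochberg FDR).
analysis_config.pval_correction = 'fdr_bh'
# Only test HPO terms annotated in at least 10% of the patients.
analysis_config.min_perc_patients_w_hpo = 0.1
analysis = configure_cohort_analysis(cohort, hpo, config=analysis_config)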
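
The value `'fdr_bh'` is the name that statsmodels uses for the Benjamini-Hochberg false discovery rate procedure. Whether genophenocorr delegates to statsmodels is an assumption here. The sketch below is a minimal pure-Python version of the procedure, using a hypothetical helper `benjamini_hochberg` and toy p values that are not taken from this cohort:

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvals)
    # Indices of the p values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, scaling each by m / rank and
    # enforcing monotonicity with a running minimum (capped at 1.0).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


toy_pvals = [0.01, 0.04, 0.03, 0.005]  # illustrative values only
print(benjamini_hochberg(toy_pvals))
```

Because each raw p value is multiplied by the number of tests divided by its rank, a small raw p value can still end up with a corrected value of 1.0 when many HPO terms are tested.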

Test for genotype-phenotype correlations by comparing subjects who carry a frameshift variant with all other subjects.

In [8]:
from genophenocorr.model import VariantEffect

frameshift = analysis.compare_by_variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id=tx_id)
frameshift.summarize(hpo, PatientCategories.YES)

FRAMESHIFT_VARIANT on NM_013275.6

| HPO term | Yes: Count | Yes: Percent | No: Count | No: Percent | p value | Corrected p value |
| --- | --- | --- | --- | --- | --- | --- |
| Abnormality of the hand [HP:0001155] | 95/144 | 66% | 60/71 | 85% | 0.005661 | 1.0 |
| EEG abnormality [HP:0002353] | 7/33 | 21% | 9/16 | 56% | 0.022884 | 1.0 |
| Feeding difficulties [HP:0011968] | 33/89 | 37% | 26/45 | 58% | 0.027584 | 1.0 |
| Low anterior hairline [HP:0000294] | 40/58 | 69% | 15/30 | 50% | 0.105274 | 1.0 |
| Intellectual disability [HP:0001249] | 99/119 | 83% | 59/64 | 92% | 0.115195 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... |
| Infection-related seizure [HP:0032892] | 3/3 | 100% | 2/2 | 100% | 1.000000 | 1.0 |
| Abnormality of the cardiovascular system [HP:0001626] | 16/16 | 100% | 6/6 | 100% | 1.000000 | 1.0 |
| Global developmental delay [HP:0001263] | 86/91 | 95% | 46/48 | 96% | 1.000000 | 1.0 |
| Abnormal ear morphology [HP:0031703] | 53/53 | 100% | 25/25 | 100% | 1.000000 | 1.0 |


Test for genotype-phenotype correlations by comparing subjects who carry at least one allele of a given variant with all other subjects:


In [9]:
var_single = analysis.compare_by_variant_key('16_89284634_89284639_GTGTTT_G')
var_single.summarize(hpo, PatientCategories.YES)

>=1 allele of the variant 16_89284634_89284639_GTGTTT_G

| HPO term | Yes: Count | Yes: Percent | No: Count | No: Percent | p value | Corrected p value |
| --- | --- | --- | --- | --- | --- | --- |
| Sleep abnormality [HP:0002360] | 7/9 | 78% | 12/48 | 25% | 0.004267 | 0.810716 |
| Autistic behavior [HP:0000729] | 1/6 | 17% | 39/66 | 59% | 0.081923 | 1.000000 |
| Intellectual disability [HP:0001249] | 14/19 | 74% | 144/164 | 88% | 0.147465 | 1.000000 |
| Synophrys [HP:0000664] | 5/14 | 36% | 48/82 | 59% | 0.148677 | 1.000000 |
| EEG abnormality [HP:0002353] | 0/6 | 0% | 16/43 | 37% | 0.158804 | 1.000000 |
| ... | ... | ... | ... | ... | ... | ... |
| Abnormal skin adnexa morphology [HP:0011138] | 14/14 | 100% | 115/115 | 100% | 1.000000 | 1.000000 |
| Infection-related seizure [HP:0032892] | 1/1 | 100% | 4/4 | 100% | 1.000000 | 1.000000 |
| Abnormality of the cardiovascular system [HP:0001626] | 5/5 | 100% | 17/17 | 100% | 1.000000 | 1.000000 |
| Abnormal ear morphology [HP:0031703] | 7/7 | 100% | 71/71 | 100% | 1.000000 | 1.000000 |
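
As a sanity check, the uncorrected p value in the first row of the table above can be recomputed from the counts. The sketch assumes that the p values come from a two-sided Fisher's exact test on a 2x2 table of observed versus not-observed counts per group. The notebook does not state which test genophenocorr applies, and `fisher_exact_two_sided` is a hypothetical helper rather than a library function.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are at most as probable as the observed table.
    """
    row1 = a + b          # size of the first group
    col1 = a + c          # subjects with the phenotype observed
    n = a + b + c + d     # all subjects
    denom = comb(n, row1)

    def prob(x: int) -> float:
        # Probability that x subjects of the first group have the phenotype.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo = max(0, row1 + col1 - n)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Sleep abnormality: 7 of 9 carriers vs. 12 of 48 non-carriers.
p = fisher_exact_two_sided(7, 9 - 7, 12, 48 - 12)
print(f'{p:.6f}')
```

Under this assumption the result rounds to the 0.004267 reported for Sleep abnormality.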


Alternatively, compare subjects who carry one variant with subjects who carry another variant.

In [10]:
var_double = analysis.compare_by_variant_keys('16_89284129_89284134_CTTTTT_C', '16_89284634_89284639_GTGTTT_G')
var_double.summarize(hpo, PatientCategories.YES)

>=1 allele of either variant 16_89284129_89284134_CTTTTT_C or variant 16_89284634_89284639_GTGTTT_G

| HPO term | First: Count | First: Percent | Second: Count | Second: Percent | p value | Corrected p value |
| --- | --- | --- | --- | --- | --- | --- |
| Autistic behavior [HP:0000729] | 2/2 | 100% | 1/6 | 17% | 0.107143 | 1.0 |
| Macrodontia [HP:0001572] | 5/9 | 56% | 22/27 | 81% | 0.184067 | 1.0 |
| Hypertelorism [HP:0000316] | 3/3 | 100% | 5/10 | 50% | 0.230769 | 1.0 |
| Cryptorchidism [HP:0000028] | 0/4 | 0% | 5/12 | 42% | 0.244505 | 1.0 |
| Low anterior hairline [HP:0000294] | 3/3 | 100% | 7/13 | 54% | 0.250000 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... |
| Mandibular prognathia [HP:0000303] | 0/0 | 0% | 2/5 | 40% | 1.000000 | 1.0 |
| Infection-related seizure [HP:0032892] | 0/0 | 0% | 1/1 | 100% | 1.000000 | 1.0 |
| Abnormality of the cardiovascular system [HP:0001626] | 1/1 | 100% | 5/5 | 100% | 1.000000 | 1.0 |
| Abnormal ear morphology [HP:0031703] | 3/3 | 100% | 7/7 | 100% | 1.000000 | 1.0 |


TODO - finalize!