# KBG Syndrome

[KBG syndrome (KBGS)](https://omim.org/entry/148050) is caused by heterozygous mutation in the ANKRD11 gene. In this notebook, we have used
[pyphetools](https://github.com/monarch-initiative/pyphetools) to parse the clinical data included in the supplemental files of
[Martinez-Cayuelas E, et al. Clinical description, molecular delineation and genotype-phenotype correlation in 340 patients with KBG syndrome: addition of 67 new patients](https://pubmed.ncbi.nlm.nih.gov/36446582).

The authors found that a triangular face was significantly more frequent in carriers of sequence variants than in carriers of copy-number variants (CNVs). They also reported associations between short stature and variants in exon 9; between the c.1903_1907del variant and a lower incidence of intellectual disability (ID), attention-deficit/hyperactivity disorder (ADHD), and autism spectrum disorder (ASD); and, in CNV carriers, between deletion size and the presence of macrodontia and hand anomalies.

```python
import genophenocorr
import hpotk

# Load a specific HPO release through hpotk's ontology store.
store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo(release='v2023-10-09')
print(f'Loaded HPO v{hpo.version}')
print(f'Using genophenocorr version {genophenocorr.__version__}')
```



```
Loaded HPO v2023-10-09
Using genophenocorr version 0.1.1dev
```


## Settings
We specify the transcript against which the variants are annotated (the phenopackets contain VCF representations of small variants).

### Pick transcript

We choose the [MANE Select](https://www.ncbi.nlm.nih.gov/nuccore/NM_013275.6) transcript for *ANKRD11*.

```python
tx_id = 'NM_013275.6'
```

## Load Phenopackets

We load the phenopacket JSON files from the `phenopackets` folder located next to the notebook.
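`load_phenopacket_folder` does the actual parsing. As a rough, library-independent illustration of what the folder contains, the sketch below reads every `*.json` file with the standard library and collects the HPO terms recorded as observed. It assumes GA4GH Phenopacket Schema v2 JSON field names (`id`, `phenotypicFeatures`, `type`, `excluded`); the helper names are hypothetical, and this is not how genophenocorr is implemented.

```python
import json
from pathlib import Path


def observed_hpo_ids(phenopacket: dict) -> set[str]:
    """Return the HPO ids of features that are not marked as excluded (hypothetical helper)."""
    return {
        feature["type"]["id"]
        for feature in phenopacket.get("phenotypicFeatures", [])
        # In Phenopacket Schema v2, `excluded: true` records a feature that was looked for and absent.
        if not feature.get("excluded", False)
    }


def read_phenopacket_folder(folder: str) -> dict[str, set[str]]:
    """Map each phenopacket id to its observed HPO ids, for all *.json files in `folder`."""
    cohort = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            phenopacket = json.load(fh)
        cohort[phenopacket["id"]] = observed_hpo_ids(phenopacket)
    return cohort
```

For example, a phenopacket whose only features are Macrodontia (observed) and Triangular face (excluded) yields `{'HP:0001572'}`.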

```python
from genophenocorr.preprocessing import configure_caching_cohort_creator, load_phenopacket_folder

fpath_phenopackets = 'phenopackets'
cohort_creator = configure_caching_cohort_creator(hpo, timeout=20)
# Parse every phenopacket in the folder into a cohort of patients.
cohort = load_phenopacket_folder(fpath_phenopackets, cohort_creator)
```



```
Patients Created: 100%|██████████| 337/337 [00:00<00:00, 454.45it/s]
Validated under none policy
337 phenopacket(s) found at `phenopackets`
  patient #0
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Novara, 2017_P2[PMID_36446582_Novara_2017_P2]. Remove variant from testing
     ·Patient PMID_36446582_Novara_2017_P2 has no variants to work with
  patient #1
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Goldenberg2016_P13[PMID_36446582_Goldenberg2016_P13]. Remove variant from testing
     ·Patient PMID_36446582_Goldenberg2016_P13 has no variants to work with
  patient #3
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Ockeloen2015_P20[PMID_36446582_Ockeloen2015_P20]. Remove variant from testing
     ·Patient PMID_36446582_Ockeloen2015_P20 has no
```
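The warnings above report that, for some patients, no VCF record, VRS CNV, or `hgvs.c` expression could be retrieved, so these patients have no variants to test. A minimal sketch of a comparable presence check is shown below. It assumes Phenopacket Schema v2 field names (`interpretations`, `diagnosis`, `genomicInterpretations`, `variantInterpretation`, `variationDescriptor`, `vcfRecord`, `variation`, `expressions`) and a VRS `copyNumber` object for CNVs; `has_usable_variant` is a hypothetical helper, not part of genophenocorr, and it only checks that such a field exists, not that it can be processed successfully.

```python
from typing import Iterator


def iter_variation_descriptors(phenopacket: dict) -> Iterator[dict]:
    """Yield every variationDescriptor found in the phenopacket's interpretations."""
    for interpretation in phenopacket.get("interpretations", []):
        diagnosis = interpretation.get("diagnosis", {})
        for genomic_interpretation in diagnosis.get("genomicInterpretations", []):
            variant_interpretation = genomic_interpretation.get("variantInterpretation", {})
            descriptor = variant_interpretation.get("variationDescriptor")
            if descriptor is not None:
                yield descriptor


def has_usable_variant(phenopacket: dict) -> bool:
    """True if any descriptor has a VCF record, a VRS copy number, or an hgvs.c expression."""
    for descriptor in iter_variation_descriptors(phenopacket):
        if "vcfRecord" in descriptor:
            return True
        if "copyNumber" in descriptor.get("variation", {}):
            return True
        if any(expr.get("syntax") == "hgvs.c" for expr in descriptor.get("expressions", [])):
            return True
    return False
```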

## Summarize the cohort

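```python
from IPython.display import display, HTML
from genophenocorr.view import CohortViewable

# Render an HTML summary of the most common HPO terms, variants, diseases, and variant effects.
cv = CohortViewable(hpo=hpo, transcript_id=tx_id)
html = cv.process(cohort=cohort)

display(HTML(html))
```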

| HPO Term | ID | Annotation Count |
|---|---|---|
| Macrodontia | HP:0001572 | 211 |
| Intellectual disability | HP:0001249 | 194 |
| Abnormality of the hand | HP:0001155 | 189 |
| Global developmental delay | HP:0001263 | 176 |
| Delayed speech and language development | HP:0000750 | 160 |
| Short stature | HP:0004322 | 150 |
| Thick eyebrow | HP:0000574 | 126 |
| Long philtrum | HP:0000343 | 121 |
| Bulbous nose | HP:0000414 | 89 |
| Triangular face | HP:0000325 | 83 |
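The *Annotation Count* column tallies how often each HPO term appears across the cohort. A sketch of such a tally with `collections.Counter`, under the simplifying assumption that each subject is just a set of observed HPO ids (genophenocorr's own counting may differ, e.g. in how excluded or ancestor terms are handled):

```python
from collections import Counter
from typing import Iterable


def count_annotations(subjects: Iterable[set[str]], top: int = 10) -> list[tuple[str, int]]:
    """Count, for each HPO id, how many subjects are annotated with it; most common first."""
    counts: Counter[str] = Counter()
    for hpo_ids in subjects:
        # A set contributes each term at most once per subject.
        counts.update(hpo_ids)
    return counts.most_common(top)
```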

| Variant | Variant name | Variant Count |
|---|---|---|
| 16_89284634_89284639_GTGTTT_G | todo | 34 |
| 16_89284129_89284134_CTTTTT_C | todo | 10 |
| 16_89284140_89284144_TTTTC_T | todo | 9 |
| 16_89285157_89285161_GTTTC_G | todo | 8 |
| 16_89275180_89275180_A_AG | todo | 5 |
| 16_89279749_89279749_C_CG | todo | 5 |
| 16_89284345_89284345_G_A | todo | 3 |
| 16_89283314_89283318_CCTTT_C | todo | 3 |
| 16_89284565_89284565_G_C | todo | 3 |
| 16_89284358_89284360_GAT_G | todo | 3 |
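The variant keys in this table appear to follow a `chrom_start_end_ref_alt` pattern: in `16_89284634_89284639_GTGTTT_G`, for instance, the end coordinate equals start + len(ref) - 1. Assuming that convention (our reading of the keys above, not a documented specification), a small hypothetical parser recovers the fields and the net length change of an indel:

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    """Fields of a variant key, assuming the chrom_start_end_ref_alt layout."""
    chrom: str
    start: int
    end: int
    ref: str
    alt: str

    @property
    def length_change(self) -> int:
        """Net change in sequence length: negative for deletions, positive for insertions."""
        return len(self.alt) - len(self.ref)


def parse_variant_key(key: str) -> VariantKey:
    """Split a key such as '16_89284634_89284639_GTGTTT_G' into its five fields."""
    chrom, start, end, ref, alt = key.split("_")
    parsed = VariantKey(chrom, int(start), int(end), ref, alt)
    # Sanity check of the assumed convention: end = start + len(ref) - 1.
    if parsed.end != parsed.start + len(parsed.ref) - 1:
        raise ValueError(f"unexpected coordinates in {key!r}")
    return parsed
```

For the most frequent key, `length_change` is -5, i.e. a 5-bp deletion; a net change that is not a multiple of 3 within a coding sequence shifts the reading frame.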

| Disease | Annotation Count |
|---|---|
| OMIM:148050 | 337 |

| Variant effect | Annotation Count |
|---|---|
| FRAMESHIFT_VARIANT | 175 |
| STOP_GAINED | 67 |
| MISSENSE_VARIANT | 7 |
| SPLICE_ACCEPTOR_VARIANT | 4 |
| SPLICE_REGION_VARIANT | 2 |
| SPLICE_DONOR_VARIANT | 3 |
| INFRAME_DELETION | 2 |
| CODING_SEQUENCE_VARIANT | 1 |
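Frameshift and stop-gained variants dominate these effect counts. As a quick arithmetic check on the table, the counts can be summed as below; grouping FRAMESHIFT_VARIANT and STOP_GAINED as "truncating" is our own label for this illustration, not a genophenocorr category.

```python
# Effect counts copied from the "Variant effect" table above (annotation counts).
effect_counts = {
    "FRAMESHIFT_VARIANT": 175,
    "STOP_GAINED": 67,
    "MISSENSE_VARIANT": 7,
    "SPLICE_ACCEPTOR_VARIANT": 4,
    "SPLICE_REGION_VARIANT": 2,
    "SPLICE_DONOR_VARIANT": 3,
    "INFRAME_DELETION": 2,
    "CODING_SEQUENCE_VARIANT": 1,
}

# Hypothetical grouping: effects usually expected to truncate the protein product.
TRUNCATING = {"FRAMESHIFT_VARIANT", "STOP_GAINED"}

total = sum(effect_counts.values())
truncating = sum(n for effect, n in effect_counts.items() if effect in TRUNCATING)
print(f"{truncating}/{total} annotations ({truncating / total:.1%}) are frameshift or stop-gained")
```

This prints `242/261 annotations (92.7%) are frameshift or stop-gained`.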


## Configure the analysis

```python
from genophenocorr.analysis import configure_cohort_analysis, CohortAnalysisConfiguration
from genophenocorr.analysis.predicate import PatientCategories

analysis_config = CohortAnalysisConfiguration()
analysis_config.missing_implies_excluded = True  # treat a missing HPO annotation as excluded
analysis_config.pval_correction = 'fdr_bh'       # Benjamini-Hochberg false discovery rate
analysis_config.min_perc_patients_w_hpo = 0.1    # minimum proportion of patients with a term for it to be tested
analysis = configure_cohort_analysis(cohort, hpo, config=analysis_config)
```
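`pval_correction = 'fdr_bh'` selects the Benjamini-Hochberg false discovery rate procedure. A minimal pure-Python sketch of that adjustment, for illustration only; we assume it behaves like the standard step-up procedure, as implemented for example by `statsmodels.stats.multitest.multipletests(method='fdr_bh')`:

```python
def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p values, in the original order."""
    m = len(pvals)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values never exceed 1
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An adjusted p value is never smaller than the corresponding raw p value.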

Test for genotype-phenotype correlations between subjects with frameshift variants and the other subjects.

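```python
from genophenocorr.model import VariantEffect

# Compare subjects with a frameshift variant on the selected transcript vs. the other subjects.
frameshift = analysis.compare_by_variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id=tx_id)
frameshift.summarize(hpo, PatientCategories.YES)
```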

| FRAMESHIFT_VARIANT on NM_013275.6 | Yes: Count | Yes: Percent | No: Count | No: Percent | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Abnormality of the hand [HP:0001155] | 95/144 | 66% | 60/71 | 85% | 0.005661 | 1.0 |
| EEG abnormality [HP:0002353] | 7/33 | 21% | 9/16 | 56% | 0.022884 | 1.0 |
| Feeding difficulties [HP:0011968] | 33/89 | 37% | 26/45 | 58% | 0.027584 | 1.0 |
| Low anterior hairline [HP:0000294] | 40/58 | 69% | 15/30 | 50% | 0.105274 | 1.0 |
| Intellectual disability [HP:0001249] | 99/119 | 83% | 59/64 | 92% | 0.115195 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... |
| Long philtrum [HP:0000343] | 66/82 | 80% | 38/48 | 79% | 1.000000 | 1.0 |
| Abnormal eyebrow morphology [HP:0000534] | 77/77 | 100% | 39/39 | 100% | 1.000000 | 1.0 |
| Abnormality of the digestive system [HP:0025031] | 35/35 | 100% | 29/29 | 100% | 1.000000 | 1.0 |
| Hyperactivity [HP:0000752] | 34/34 | 100% | 19/19 | 100% | 1.000000 | 1.0 |
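Each row compares how often a term is present in the two groups. Assuming the underlying test is a two-sided Fisher exact test on a 2x2 table of counts (our assumption for this illustration), and reading 95/144 as "95 of 144 subjects", the *Abnormality of the hand* row corresponds to the table `[[95, 144 - 95], [60, 71 - 60]]`. The sketch below computes a two-sided Fisher exact p value with exact integer arithmetic:

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test p value for the 2x2 table [[a, b], [c, d]].

    Rows are the two genotype groups; columns are term present / term absent.
    """
    r1, r2 = a + b, c + d  # row totals (group sizes)
    c1 = a + c             # column total: subjects with the term
    n = r1 + r2

    def weight(x: int) -> int:
        # Hypergeometric probability of a top-left cell equal to x, multiplied by comb(n, c1).
        return comb(r1, x) * comb(r2, c1 - x)

    observed = weight(a)
    # Sum the weights of all tables with the same margins that are no more probable
    # than the observed one; integer weights avoid floating-point ties.
    extreme = sum(
        weight(x)
        for x in range(max(0, c1 - r2), min(r1, c1) + 1)
        if weight(x) <= observed
    )
    return extreme / comb(n, c1)
```

For example, `fisher_exact_two_sided(95, 49, 60, 11)` evaluates the *Abnormality of the hand* row; we have not checked that it reproduces the reported p value digit for digit. Rows where both groups are at 100%, such as *Abnormal eyebrow morphology* (77/77 vs. 39/39), admit only one table with those margins, so the p value is 1.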


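Next, we configure the analysis again, this time calling `heuristic_strategy()` on the configuration, which, as we understand it, switches the multiple-testing-correction (MTC) strategy to a heuristic filter over the HPO terms that are tested. We then repeat the frameshift comparison.

```python
from genophenocorr.analysis import configure_cohort_analysis, CohortAnalysisConfiguration
from genophenocorr.analysis.predicate import PatientCategories

analysis_config = CohortAnalysisConfiguration()
analysis_config.missing_implies_excluded = True
analysis_config.pval_correction = 'fdr_bh'
analysis_config.min_perc_patients_w_hpo = 0.1
# Use the heuristic MTC strategy for this run.
analysis_config.heuristic_strategy()
analysis = configure_cohort_analysis(cohort, hpo, config=analysis_config)
```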

```python
frameshift = analysis.compare_by_variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id=tx_id)
frameshift.summarize(hpo, PatientCategories.YES)
```

| FRAMESHIFT_VARIANT on NM_013275.6 | Yes: Count | Yes: Percent | No: Count | No: Percent | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Low anterior hairline [HP:0000294] | 40/58 | 69% | 15/30 | 50% | 0.105274 | 1.0 |
| Microretrognathia [HP:0000308] | 13/36 | 36% | 3/16 | 19% | 0.330525 | 1.0 |
| Sleep abnormality [HP:0002360] | 15/40 | 38% | 4/17 | 24% | 0.370187 | 1.0 |
| Exaggerated cupid's bow [HP:0002263] | 11/35 | 31% | 2/12 | 17% | 0.464459 | 1.0 |
| Thick eyebrow [HP:0000574] | 68/82 | 83% | 38/49 | 78% | 0.494742 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... |
| Abnormality of the forehead [HP:0000290] | 40/40 | 100% | 15/15 | 100% | 1.000000 | 1.0 |
| Conductive hearing impairment [HP:0000405] | 18/18 | 100% | 8/8 | 100% | 1.000000 | 1.0 |
| Functional abnormality of the middle ear [HP:0011452] | 18/18 | 100% | 8/8 | 100% | 1.000000 | 1.0 |
| Abnormality of the chin [HP:0000306] | 8/8 | 100% | 5/5 | 100% | 1.000000 | 1.0 |


```python
from genophenocorr.view import StatsViewable

# NOTE: `mtc_name="nana"` and `term_count=42` look like placeholder values.
sv = StatsViewable(
    filter_method_name=analysis_config.mtc_strategy,
    mtc_name="nana",
    filter_results_map=frameshift.mtc_filter_report,
    term_count=42,
)
```

```python
display(HTML(sv.process(frameshift)))
```

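This cell currently fails with `AttributeError: 'HpoMtcReport' object has no attribute 'items'`. The message suggests that `StatsViewable` expects `filter_results_map` to be a mapping (it calls `.items()` on it), whereas `frameshift.mtc_filter_report` is an `HpoMtcReport` object.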

Test for genotype-phenotype correlations between subjects with at least one allele of a given variant and the other subjects. Here we use `16_89284634_89284639_GTGTTT_G`, the most frequent variant in the cohort summary.


```python
# Compare subjects with >=1 allele of this variant vs. the other subjects.
var_single = analysis.compare_by_variant_key('16_89284634_89284639_GTGTTT_G')
var_single.summarize(hpo, PatientCategories.YES)
```

| >=1 allele of the variant 16_89284634_89284639_GTGTTT_G | Yes: Count | Yes: Percent | No: Count | No: Percent | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Sleep abnormality [HP:0002360] | 7/9 | 78% | 12/48 | 25% | 0.004267 | 0.810716 |
| Autistic behavior [HP:0000729] | 1/6 | 17% | 39/66 | 59% | 0.081923 | 1.000000 |
| Intellectual disability [HP:0001249] | 14/19 | 74% | 144/164 | 88% | 0.147465 | 1.000000 |
| Synophrys [HP:0000664] | 5/14 | 36% | 48/82 | 59% | 0.148677 | 1.000000 |
| EEG abnormality [HP:0002353] | 0/6 | 0% | 16/43 | 37% | 0.158804 | 1.000000 |
| ... | ... | ... | ... | ... | ... | ... |
| Growth abnormality [HP:0001507] | 12/12 | 100% | 108/108 | 100% | 1.000000 | 1.000000 |
| Abnormal eyebrow morphology [HP:0000534] | 13/13 | 100% | 103/103 | 100% | 1.000000 | 1.000000 |
| Abnormality of the digestive system [HP:0025031] | 6/6 | 100% | 58/58 | 100% | 1.000000 | 1.000000 |
| Hyperactivity [HP:0000752] | 5/5 | 100% | 48/48 | 100% | 1.000000 | 1.000000 |
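`compare_by_variant_key` splits the cohort into subjects with at least one allele of the given variant and everyone else. A library-independent sketch of that split, assuming each subject is a hypothetical dict with an `id` and a list of `variant_keys` (one entry per allele):

```python
def split_by_variant(subjects: list[dict], key: str) -> tuple[list[str], list[str]]:
    """Return (carriers, others): ids of subjects with >=1 allele of `key`, and the rest."""
    carriers: list[str] = []
    others: list[str] = []
    for subject in subjects:
        if key in subject["variant_keys"]:
            carriers.append(subject["id"])
        else:
            others.append(subject["id"])
    return carriers, others
```

In this sketch, subjects without any usable variant (such as those flagged during loading) fall into `others`; whether genophenocorr treats them the same way is an assumption we have not verified.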


Alternatively, we can compare subjects carrying one variant with subjects carrying another variant.

```python
# Compare carriers of the first variant vs. carriers of the second variant.
var_double = analysis.compare_by_variant_keys('16_89284129_89284134_CTTTTT_C', '16_89284634_89284639_GTGTTT_G')
var_double.summarize(hpo, PatientCategories.YES)
```

| >=1 allele of either variant 16_89284129_89284134_CTTTTT_C or variant 16_89284634_89284639_GTGTTT_G | First: Count | First: Percent | Second: Count | Second: Percent | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Autistic behavior [HP:0000729] | 2/2 | 100% | 1/6 | 17% | 0.107143 | 1.0 |
| Macrodontia [HP:0001572] | 5/9 | 56% | 22/27 | 81% | 0.184067 | 1.0 |
| Hypertelorism [HP:0000316] | 3/3 | 100% | 5/10 | 50% | 0.230769 | 1.0 |
| Cryptorchidism [HP:0000028] | 0/4 | 0% | 5/12 | 42% | 0.244505 | 1.0 |
| Low anterior hairline [HP:0000294] | 3/3 | 100% | 7/13 | 54% | 0.250000 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... |
| Long philtrum [HP:0000343] | 4/4 | 100% | 12/13 | 92% | 1.000000 | 1.0 |
| Abnormal eyebrow morphology [HP:0000534] | 5/5 | 100% | 13/13 | 100% | 1.000000 | 1.0 |
| Abnormality of the digestive system [HP:0025031] | 1/1 | 100% | 6/6 | 100% | 1.000000 | 1.0 |
| Hyperactivity [HP:0000752] | 0/0 | 0% | 5/5 | 100% | 1.000000 | 1.0 |
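`compare_by_variant_keys` compares carriers of the first variant with carriers of the second. In the sketch below, again using hypothetical `id`/`variant_keys` records, subjects carrying neither variant are left out of the comparison, and anyone carrying both is set aside because they cannot be assigned to a single group; how genophenocorr handles such subjects is an assumption we have not verified.

```python
def split_by_two_variants(subjects: list[dict], first: str, second: str) -> dict[str, list[str]]:
    """Group subject ids into carriers of only `first`, only `second`, or both."""
    groups: dict[str, list[str]] = {"first": [], "second": [], "both": []}
    for subject in subjects:
        has_first = first in subject["variant_keys"]
        has_second = second in subject["variant_keys"]
        if has_first and has_second:
            groups["both"].append(subject["id"])  # ambiguous: set aside
        elif has_first:
            groups["first"].append(subject["id"])
        elif has_second:
            groups["second"].append(subject["id"])
        # Subjects with neither variant are not part of this comparison.
    return groups
```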

