# KBG Syndrome

Data from [Martinez-Cayuelas E, et al. Clinical description, molecular delineation and genotype-phenotype correlation in 340 patients with KBG syndrome: addition of 67 new patients](https://pubmed.ncbi.nlm.nih.gov/36446582).

We investigate subjects with mutations in *ANKRD11*.

In [1]:
import genophenocorr

print(f"Using genophenocorr version {genophenocorr.__version__}")

Using genophenocorr version 0.1.1dev


## Setup

### Load HPO

We use the HPO `v2023-10-09` release for this analysis.

In [2]:
import hpotk

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo(release='v2023-10-09')
print(f'Loaded HPO v{hpo.version}')

Loaded HPO v2023-10-09


### Load Phenopackets

We load the phenopacket JSON files from the `phenopackets` folder, which sits next to the notebook.

In [3]:
from genophenocorr.preprocessing import configure_caching_cohort_creator, load_phenopacket_folder

fpath_phenopackets = 'phenopackets'
cohort_creator = configure_caching_cohort_creator(hpo, timeout=20)
cohort = load_phenopacket_folder(fpath_phenopackets, cohort_creator)

Patients Created:  16%|█▋        | 54/328 [00:09<00:55,  4.96it/s]Expected a result but got an Error for variant: 16_89280752_89280752_G_T
{"error":"Could not connect to database homo_sapiens_core_111_38 as user ensro using [DBI:mysql:database=homo_sapiens_core_111_38;host=fb1-mysql-ens-rest-web.ebi.ac.uk;port=4571] as a locator:DBI connect('database=homo_sapiens_core_111_38;host=fb1-mysql-ens-rest-web.ebi.ac.uk;port=4571','ensro',...) failed: Can't connect to MySQL server on 'fb1-mysql-ens-rest-web.ebi.ac.uk' (111) at /nfs/public/ro/ensweb/live/rest/www_111/ensembl/modules/Bio/EnsEMBL/DBSQL/DBConnection.pm line 260."}
Patients Created:  24%|██▍       | 78/328 [00:29<03:01,  1.38it/s]Expected a result but got an Error for variant: 16_89284779_89284779_G_T
{"error":"Could not connect to database homo_sapiens_core_111_38 as user ensro using [DBI:mysql:database=homo_sapiens_core_111_38;host=fb1-mysql-ens-rest-web.ebi.ac.uk;port=4571] as a locator:DBI connect('database=homo_sapiens_core_11
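
Because the log above reports failed variant lookups, it can help to check independently how many phenopacket files are on disk before trusting the cohort size. The following is a minimal sketch that uses only the standard library. The helper name `list_phenopacket_ids` is hypothetical and not part of `genophenocorr`, and the sketch assumes each file is a JSON object with a top-level `id` field, as in the Phenopacket Schema.

```python
import json
from pathlib import Path


def list_phenopacket_ids(folder):
    """Return the top-level `id` of every JSON file found in `folder`.

    Hypothetical helper for a sanity check; it is not part of genophenocorr.
    Files without an `id` field fall back to the file name without extension.
    """
    ids = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            packet = json.load(fh)
        ids.append(packet.get("id", path.stem))
    return ids


if __name__ == "__main__":
    folder = Path("phenopackets")
    if folder.is_dir():
        print(f"Found {len(list_phenopacket_ids(folder))} phenopacket files")
```

Comparing this count with the total reported by the cohort summary shows whether any files were skipped during loading.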

### Pick transcript

We choose the [MANE Select](https://www.ncbi.nlm.nih.gov/nuccore/NM_013275.6) transcript for *ANKRD11*.

In [4]:
tx_id = 'NM_013275.6'

## Explore cohort

We first explore the cohort to decide which genotype-phenotype comparisons to run.


In [5]:
from IPython.display import HTML, display
from genophenocorr.view import CohortViewer

viewer = CohortViewer(hpo)

In [6]:
display(HTML(viewer.cohort_summary_table(cohort)))

Item,Description
Total Individuals,328
Excluded Individuals,"125: Scarano, 2013_P12[PMID_36446582_Scarano,_2013_P12];Willemsen2010_P3[PMID_36446582_Willemsen2010_P3];KBG56[PMID_36446582_KBG56];KBG2[PMID_36446582_KBG2];KBG59[PMID_36446582_KBG59];KBG64[PMID_36446582_KBG64];KBG39[PMID_36446582_KBG39];Crippa2015_P2[PMID_36446582_Crippa2015_P2];Libianto2019[PMID_36446582_Libianto2019];Youngs2011[PMID_36446582_Youngs2011];KBG1[PMID_36446582_KBG1];Goldenberg2016_P3[PMID_36446582_Goldenberg2016_P3];Scarano, 2013_P9[PMID_36446582_Scarano,_2013_P9];KBG38[PMID_36446582_KBG38];KBG63[PMID_36446582_KBG63];KBG14[PMID_36446582_KBG14];Goldenberg2016_P32[PMID_36446582_Goldenberg2016_P32];Goldenberg2016_P15[PMID_36446582_Goldenberg2016_P15];Goldenberg2016_P10[PMID_36446582_Goldenberg2016_P10];Goldenberg2016_P20[PMID_36446582_Goldenberg2016_P20];KBG15[PMID_36446582_KBG15];Novara, 2017_P2[PMID_36446582_Novara,_2017_P2];Willemsen2010_P1[PMID_36446582_Willemsen2010_P1];Goldenberg2016_P24[PMID_36446582_Goldenberg2016_P24];Goldenberg2016_P22[PMID_36446582_Goldenberg2016_P22];KBG22[PMID_36446582_KBG22];Scarano, 2013_P7[PMID_36446582_Scarano,_2013_P7];Ockeloen2015_P19[PMID_36446582_Ockeloen2015_P19];Novara, 2017_P8[PMID_36446582_Novara,_2017_P8];Goldenberg2016_P21[PMID_36446582_Goldenberg2016_P21];Novara, 2017_P12[PMID_36446582_Novara,_2017_P12];KBG18[PMID_36446582_KBG18];Ockeloen2015_P7[PMID_36446582_Ockeloen2015_P7];Novara, 2017_P4[PMID_36446582_Novara,_2017_P4];Ockeloen2015_P20[PMID_36446582_Ockeloen2015_P20];Low2017[PMID_36446582_Low2017];Goldenberg2016_P26[PMID_36446582_Goldenberg2016_P26];Novara, 2017_P7[PMID_36446582_Novara,_2017_P7];KBG51[PMID_36446582_KBG51];Gnazzo, 2020_P29[PMID_36446582_Gnazzo,_2020_P29];Scarano, 2013_P11[PMID_36446582_Scarano,_2013_P11];Ockeloen2015_P14[PMID_36446582_Ockeloen2015_P14];Kutkowska-Kazmierczak2021_P17[PMID_36446582_Kutkowska-Kazmierczak2021_P17];Gnazzo, 
2020_P1[PMID_36446582_Gnazzo,_2020_P1];KBG13[PMID_36446582_KBG13];Kutkowska-Kazmierczak2021_P19[PMID_36446582_Kutkowska-Kazmierczak2021_P19];VanDongen2019_P1[PMID_36446582_VanDongen2019_P1];Crippa2015_P3[PMID_36446582_Crippa2015_P3];Novara, 2017_P1[PMID_36446582_Novara,_2017_P1];Gnazzo, 2020_P30[PMID_36446582_Gnazzo,_2020_P30];Goldenberg2016_P39[PMID_36446582_Goldenberg2016_P39];Palumbo 2016[PMID_36446582_Palumbo_2016];Rentas2021_P1[PMID_36446582_Rentas2021_P1];KBG23[PMID_36446582_KBG23];Gnazzo, 2020_P26[PMID_36446582_Gnazzo,_2020_P26];Sacharow, 2012_P2[PMID_36446582_Sacharow,_2012_P2];Goldenberg2016_P18[PMID_36446582_Goldenberg2016_P18];Lim2014[PMID_36446582_Lim2014];Goldenberg2016_P23[PMID_36446582_Goldenberg2016_P23];Goldenberg2016_P33[PMID_36446582_Goldenberg2016_P33];Kutkowska-Kazmierczak2021_P23[PMID_36446582_Kutkowska-Kazmierczak2021_P23];Murray, 2017_P11 (8.1.)[PMID_36446582_Murray,_2017_P11_(8.1.)];Spengler, 2013[PMID_36446582_Spengler,_2013];Miyatake, 2017_P3[PMID_36446582_Miyatake,_2017_P3];KBG26[PMID_36446582_KBG26];KBG37[PMID_36446582_KBG37];Khalifa, 2013_P1A[PMID_36446582_Khalifa,_2013_P1A];Goldenberg2016_P36[PMID_36446582_Goldenberg2016_P36];Gnazzo, 2020_P31[PMID_36446582_Gnazzo,_2020_P31];Low, 2016_30 (28)[PMID_36446582_Low,_2016_30_(28)];Jin Kim, 2020_P2[PMID_36446582_Jin_Kim,_2020_P2];KBG11[PMID_36446582_KBG11];Goldenberg2016_P29[PMID_36446582_Goldenberg2016_P29];Kutkowska-Kazmierczak2021_P14[PMID_36446582_Kutkowska-Kazmierczak2021_P14];Srivastava, 2017_P1[PMID_36446582_Srivastava,_2017_P1];Kutkowska-Kazmierczak2021_P22[PMID_36446582_Kutkowska-Kazmierczak2021_P22];Gnazzo, 2020_P23[PMID_36446582_Gnazzo,_2020_P23];Goldenberg2016_P35[PMID_36446582_Goldenberg2016_P35];KBG46[PMID_36446582_KBG46];KBG25[PMID_36446582_KBG25];Goldenberg2016_P7[PMID_36446582_Goldenberg2016_P7];Kutkowska-Kazmierczak2021_P16[PMID_36446582_Kutkowska-Kazmierczak2021_P16];Scarano, 
2013_P10[PMID_36446582_Scarano,_2013_P10];Goldenberg2016_P13[PMID_36446582_Goldenberg2016_P13];Kutkowska-Kazmierczak2021_P20[PMID_36446582_Kutkowska-Kazmierczak2021_P20];Goldenberg2016_P12[PMID_36446582_Goldenberg2016_P12];Sacharow, 2012_P1[PMID_36446582_Sacharow,_2012_P1];Kutkowska-Kazmierczak2021_P15[PMID_36446582_Kutkowska-Kazmierczak2021_P15];Isrie, 2012_P1[PMID_36446582_Isrie,_2012_P1];Low, 2016_P17 (10)[PMID_36446582_Low,_2016_P17_(10)];KBG58[PMID_36446582_KBG58];KBG9[PMID_36446582_KBG9];Goldenberg2016_P19[PMID_36446582_Goldenberg2016_P19];Kutkowska-Kazmierczak2021_P18[PMID_36446582_Kutkowska-Kazmierczak2021_P18];Miyatake, 2013[PMID_36446582_Miyatake,_2013];Willemsen2010_P2[PMID_36446582_Willemsen2010_P2];Gnazzo, 2020_P24[PMID_36446582_Gnazzo,_2020_P24];Crippa2015_P1[PMID_36446582_Crippa2015_P1];Novara, 2017_P11[PMID_36446582_Novara,_2017_P11];Walz2015_Pf[PMID_36446582_Walz2015_Pf];Gnazzo, 2020_P21[PMID_36446582_Gnazzo,_2020_P21];Bucerzan2020[PMID_36446582_Bucerzan2020];Parenti2021_P15[PMID_36446582_Parenti2021_P15];Goldenberg2016_P1[PMID_36446582_Goldenberg2016_P1];Behnert, 2018[PMID_36446582_Behnert,_2018];KBG30[PMID_36446582_KBG30];Novara, 2017_P5[PMID_36446582_Novara,_2017_P5];Parenti2021_P23[PMID_36446582_Parenti2021_P23];Isrie, 2012_P2[PMID_36446582_Isrie,_2012_P2];Novara, 2017_P9[PMID_36446582_Novara,_2017_P9];Low, 2016_P28 (25)[PMID_36446582_Low,_2016_P28_(25)];KBG54[PMID_36446582_KBG54];Goldenberg2016_P4[PMID_36446582_Goldenberg2016_P4];KBG21[PMID_36446582_KBG21];Ockeloen2015_P9[PMID_36446582_Ockeloen2015_P9];KBG34[PMID_36446582_KBG34];Khalifa, 2013_P1B[PMID_36446582_Khalifa,_2013_P1B];Kutkowska-Kazmierczak2021_P21[PMID_36446582_Kutkowska-Kazmierczak2021_P21];Novara, 2017_P3[PMID_36446582_Novara,_2017_P3];Gnazzo, 
2020_P25[PMID_36446582_Gnazzo,_2020_P25];Willemsen2010_P4[PMID_36446582_Willemsen2010_P4];Goldenberg2016_P2[PMID_36446582_Goldenberg2016_P2];Parenti2021_P21[PMID_36446582_Parenti2021_P21];Parenti2016_P2[PMID_36446582_Parenti2016_P2];Goldenberg2016_P28[PMID_36446582_Goldenberg2016_P28]"
Total Unique HPO Terms,27
Total Unique Variants,203


In [7]:
display(HTML(viewer.hpo_term_counts_table(cohort)))  # TODO: add labels to the output

HPO Term,Count
Macrodontia (HP:0001572),149
Abnormality of the hand (HP:0001155),129
Intellectual disability (HP:0001249),126
Global developmental delay (HP:0001263),105
Abnormal external nose morphology (HP:0010938),88
Short stature (HP:0004322),88
Thick eyebrow (HP:0000574),85
Long philtrum (HP:0000343),81
Hearing impairment (HP:0000365),56


In [8]:
display(HTML(viewer.variants_table(cohort, tx_id))) 

Variant,Effect,Count,Key
c.1903_1907del,FRAMESHIFT_VARIANT,33,16_89284634_89284639_GTGTTT_G
c.2408_2412del,FRAMESHIFT_VARIANT,10,16_89284129_89284134_CTTTTT_C
c.1381_1384del,FRAMESHIFT_VARIANT,8,16_89285157_89285161_GTTTC_G
c.2398_2401del,FRAMESHIFT_VARIANT,8,16_89284140_89284144_TTTTC_T
c.6792_6793insC,FRAMESHIFT_VARIANT,5,16_89279749_89279749_C_CG
c.7481_7482insC,FRAMESHIFT_VARIANT,5,16_89275180_89275180_A_AG
c.2175_2178del,FRAMESHIFT_VARIANT,3,16_89284363_89284367_CTTTG_C
c.2182_2183del,FRAMESHIFT_VARIANT,3,16_89284358_89284360_GAT_G
c.3224_3227del,FRAMESHIFT_VARIANT,3,16_89283314_89283318_CCTTT_C
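
The values in the `Key` column are reused later to select variants, so it is useful to know how to read them. The sketch below assumes the layout `chromosome_start_end_ref_alt`, which is inferred from the keys shown in the table rather than taken from the library documentation. Keys for other variant types, such as structural variants, may follow a different layout. `VariantKey` and `parse_variant_key` are hypothetical names.

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    """Components of a variant key (layout inferred from the table above)."""
    chrom: str
    start: int
    end: int
    ref: str
    alt: str


def parse_variant_key(key: str) -> VariantKey:
    """Split a key such as '16_89284634_89284639_GTGTTT_G' into its parts."""
    parts = key.split("_")
    if len(parts) != 5:
        raise ValueError(f"Expected 5 underscore-separated fields, got {len(parts)}: {key!r}")
    chrom, start, end, ref, alt = parts
    return VariantKey(chrom, int(start), int(end), ref, alt)


key = parse_variant_key("16_89284634_89284639_GTGTTT_G")
deleted_bases = len(key.ref) - len(key.alt)
print(key.chrom, key.start, deleted_bases)  # prints: 16 89284634 5
```

The five deleted bases agree with the five-position span of `c.1903_1907del` in the first row.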


## Configure the analysis

In [10]:
from genophenocorr.analysis import configure_cohort_analysis, CohortAnalysisConfiguration
from genophenocorr.analysis.predicate import PatientCategories

analysis_config = CohortAnalysisConfiguration()
analysis_config.missing_implies_excluded = True
analysis_config.pval_correction = 'fdr_bh'
analysis_config.min_perc_patients_w_hpo = 0.1
analysis = configure_cohort_analysis(cohort, hpo, config=analysis_config)
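
The setting `pval_correction = 'fdr_bh'` uses the name that `statsmodels` gives to the Benjamini-Hochberg false discovery rate procedure. Whether `genophenocorr` delegates to `statsmodels` is an assumption here. The sketch below is a plain-Python illustration of the procedure itself, not the library's implementation. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Each p value is scaled by m / rank (rank 1 = smallest p value), then a
    running minimum is taken from the largest rank downwards so that the
    adjusted values never decrease as the raw p values increase.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The running minimum is the reason several tests can end up with an identical corrected p value even though their raw p values differ.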

Test for genotype-phenotype correlations by comparing subjects who carry a frameshift variant with subjects who carry other variant types.

In [11]:
from genophenocorr.model import VariantEffect

frameshift = analysis.compare_by_variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id)
frameshift.summarize(hpo, PatientCategories.YES)

FRAMESHIFT_VARIANT on NM_013275.6,Yes,Yes,No,No,,
HPO term,Count,Percent,Count,Percent,p value,Corrected p value
Intellectual disability [HP:0001249],84/141,60%,42/52,81%,0.006363,0.162348
Abnormality of mental function [HP:0011446],96/141,68%,46/53,87%,0.010348,0.162348
Abnormal external nose morphology [HP:0010938],57/125,46%,31/46,67%,0.015374,0.162348
Abnormal nasal morphology [HP:0005105],57/125,46%,31/46,67%,0.015374,0.162348
Abnormality of the nose [HP:0000366],57/125,46%,31/46,67%,0.015374,0.162348
Neurodevelopmental abnormality [HP:0012759],107/142,75%,50/55,91%,0.017089,0.162348
Global developmental delay [HP:0001263],71/121,59%,34/44,77%,0.029584,0.185721
Neurodevelopmental delay [HP:0012758],71/121,59%,34/44,77%,0.029584,0.185721
Abnormal nervous system physiology [HP:0012638],113/142,80%,52/56,93%,0.032583,0.185721
Abnormality of the nervous system [HP:0000707],113/142,80%,52/56,93%,0.032583,0.185721
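
Each row of the table above is a 2x2 contingency table: subjects with and without the HPO term, split by genotype group. The source does not state which test produces the `p value` column. The sketch below assumes a two-sided Fisher's exact test and implements it with the standard library only, so its result may differ from the reported values if the library uses another test. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for x in range(lo, hi + 1) if (p := prob(x)) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Intellectual disability: 84 of 141 in the "Yes" group, 42 of 52 in the "No" group.
p = fisher_exact_two_sided(84, 141 - 84, 42, 52 - 42)
print(f"p = {p:.6f}")
```

A nominally small p value here still has to survive multiple-testing correction, which is why the `Corrected p value` column is the one to read.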


Test for genotype-phenotype correlations by comparing subjects who carry at least one allele of a given variant with all other subjects:


In [12]:
var_single = analysis.compare_by_variant_key('16_89284634_89284639_GTGTTT_G')
var_single.summarize(hpo, PatientCategories.YES)

>=1 allele of the variant 16_89284634_89284639_GTGTTT_G,Yes,Yes,No,No,,
HPO term,Count,Percent,Count,Percent,p value,Corrected p value
Neurodevelopmental abnormality [HP:0012759],21/33,64%,136/164,83%,0.01745,0.218347
Intellectual disability [HP:0001249],15/32,47%,111/161,69%,0.024371,0.218347
Abnormality of head or neck [HP:0000152],27/33,82%,154/164,94%,0.032162,0.218347
Abnormality of the head [HP:0000234],27/33,82%,154/164,94%,0.032162,0.218347
Abnormality of the face [HP:0000271],27/33,82%,154/164,94%,0.032162,0.218347
Abnormal nervous system physiology [HP:0012638],23/33,70%,142/165,86%,0.03716,0.218347
Abnormality of the nervous system [HP:0000707],23/33,70%,142/165,86%,0.03716,0.218347
Abnormality of the ear [HP:0000598],10/28,36%,78/141,55%,0.065063,0.218347
Abnormal eyebrow morphology [HP:0000534],10/30,33%,75/140,54%,0.068973,0.218347
Abnormal ocular adnexa morphology [HP:0030669],10/30,33%,75/140,54%,0.068973,0.218347
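
Several rows in these tables carry identical counts, for example the three head and face terms above. One plausible explanation, assumed here rather than confirmed by the source, is that each observed HPO term is propagated to its ancestors before counting, so a subject annotated with a specific term also counts towards every more general term. The sketch below illustrates this with a small hand-written hierarchy. The `PARENTS` mapping is a simplified toy and is not loaded from HPO, and `with_ancestors` is a hypothetical helper.

```python
# Toy child -> parents mapping, simplified for illustration only.
PARENTS = {
    "HP:0000271": ["HP:0000234"],  # Abnormality of the face -> Abnormality of the head
    "HP:0000234": ["HP:0000152"],  # Abnormality of the head -> Abnormality of head or neck
    "HP:0000152": [],
}


def with_ancestors(term_ids, parents):
    """Return the given terms together with all of their ancestors."""
    closure = set()
    stack = list(term_ids)
    while stack:
        term = stack.pop()
        if term not in closure:
            closure.add(term)
            stack.extend(parents.get(term, []))
    return closure


print(sorted(with_ancestors({"HP:0000271"}, PARENTS)))
# prints: ['HP:0000152', 'HP:0000234', 'HP:0000271']
```

Under this assumption, rows for a term and its ancestors are not independent tests, which is worth keeping in mind when reading the corrected p values.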


Alternatively, compare subjects who carry one variant with subjects who carry a second variant.

In [13]:
var_double = analysis.compare_by_variant_keys('16_89284129_89284134_CTTTTT_C', '16_89284634_89284639_GTGTTT_G')
var_double.summarize(hpo, PatientCategories.YES)

>=1 allele of either variant 16_89284129_89284134_CTTTTT_C or variant 16_89284634_89284639_GTGTTT_G,First,First,Second,Second,,
HPO term,Count,Percent,Count,Percent,p value,Corrected p value
Short attention span [HP:0000736],0/8,0%,7/27,26%,0.165543,1.0
Recurrent maladaptive behavior [HP:5200241],0/8,0%,7/27,26%,0.165543,1.0
Hyperactivity [HP:0000752],0/8,0%,7/27,26%,0.165543,1.0
Cognitive impairment [HP:0100543],0/8,0%,7/27,26%,0.165543,1.0
Reduced attention regulation [HP:5200044],0/8,0%,7/27,26%,0.165543,1.0
Attention deficit hyperactivity disorder [HP:0007018],0/8,0%,7/27,26%,0.165543,1.0
Disinhibition [HP:0000734],0/8,0%,7/27,26%,0.165543,1.0
Growth delay [HP:0001510],5/8,62%,14/32,44%,0.441958,1.0
Short stature [HP:0004322],5/8,62%,14/32,44%,0.441958,1.0
Growth abnormality [HP:0001507],5/8,62%,14/32,44%,0.441958,1.0


TODO - finalize!