# KBG Syndrome

Data from [Martinez-Cayuelas E, et al. Clinical description, molecular delineation and genotype-phenotype correlation in 340 patients with KBG syndrome: addition of 67 new patients](https://pubmed.ncbi.nlm.nih.gov/36446582).

We investigate subjects with mutations in *ANKRD11*.

In [1]:
import genophenocorr

print(f"Using genophenocorr version {genophenocorr.__version__}")

Using genophenocorr version 0.1.1dev


## Setup

### Load HPO

We use the `v2023-10-09` release of HPO for this analysis.

In [2]:
import hpotk

fpath_hpo = 'https://github.com/obophenotype/human-phenotype-ontology/releases/download/v2023-10-09/hp.json'
hpo = hpotk.load_minimal_ontology(fpath_hpo)

### Load Phenopackets

We load the phenopacket JSON files from the `phenopackets` folder, which sits next to the notebook.

In [3]:
from genophenocorr.preprocessing import configure_caching_cohort_creator, load_phenopacket_folder

fpath_phenopackets = 'phenopackets'
cohort_creator = configure_caching_cohort_creator(hpo, cache_dir='temp_cache')
cohort = load_phenopacket_folder(fpath_phenopackets, cohort_creator)

Patients Created: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 328/328 [00:01<00:00, 234.42it/s]
Validated under none policy
328 phenopacket(s) found at `phenopackets`
  patient #0
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Novara, 2017_P2[PMID_36446582_Novara,_2017_P2]. Remove variant from testing
     ·Patient PMID_36446582_Novara,_2017_P2 has no variants to work with
  patient #1
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Goldenberg2016_P13[PMID_36446582_Goldenberg2016_P13]. Remove variant from testing
     ·Patient PMID_36446582_Goldenberg2016_P13 has no variants to work with
  patient #3
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Ockeloen2015_P20[PMID_3644658

### Pick transcript

We choose the [MANE Select](https://www.ncbi.nlm.nih.gov/nuccore/NM_013275.6) transcript for *ANKRD11*.

In [4]:
tx_id = 'NM_013275.6'

## Explore cohort

We first explore the cohort to decide which genotype-phenotype comparisons are worth testing.


In [5]:
from IPython.display import HTML, display
from genophenocorr.view import CohortViewer

viewer = CohortViewer(hpo)

In [6]:
display(HTML(viewer.cohort_summary_table(cohort)))

Item,Description
Total Individuals,328
Excluded Individuals,"77: Scarano, 2013_P12[PMID_36446582_Scarano,_2013_P12];Goldenberg2016_P10[PMID_36446582_Goldenberg2016_P10];Khalifa, 2013_P1B[PMID_36446582_Khalifa,_2013_P1B];Bucerzan2020[PMID_36446582_Bucerzan2020];Goldenberg2016_P33[PMID_36446582_Goldenberg2016_P33];Isrie, 2012_P1[PMID_36446582_Isrie,_2012_P1];KBG38[PMID_36446582_KBG38];Gnazzo, 2020_P29[PMID_36446582_Gnazzo,_2020_P29];Kutkowska-Kazmierczak2021_P14[PMID_36446582_Kutkowska-Kazmierczak2021_P14];Novara, 2017_P4[PMID_36446582_Novara,_2017_P4];Sacharow, 2012_P1[PMID_36446582_Sacharow,_2012_P1];Lim2014[PMID_36446582_Lim2014];Goldenberg2016_P36[PMID_36446582_Goldenberg2016_P36];Willemsen2010_P2[PMID_36446582_Willemsen2010_P2];Goldenberg2016_P22[PMID_36446582_Goldenberg2016_P22];Kutkowska-Kazmierczak2021_P17[PMID_36446582_Kutkowska-Kazmierczak2021_P17];Goldenberg2016_P32[PMID_36446582_Goldenberg2016_P32];Kutkowska-Kazmierczak2021_P23[PMID_36446582_Kutkowska-Kazmierczak2021_P23];Ockeloen2015_P20[PMID_36446582_Ockeloen2015_P20];Novara, 2017_P5[PMID_36446582_Novara,_2017_P5];KBG1[PMID_36446582_KBG1];Khalifa, 2013_P1A[PMID_36446582_Khalifa,_2013_P1A];Willemsen2010_P1[PMID_36446582_Willemsen2010_P1];Novara, 2017_P11[PMID_36446582_Novara,_2017_P11];Goldenberg2016_P20[PMID_36446582_Goldenberg2016_P20];Goldenberg2016_P2[PMID_36446582_Goldenberg2016_P2];KBG23[PMID_36446582_KBG23];Kutkowska-Kazmierczak2021_P19[PMID_36446582_Kutkowska-Kazmierczak2021_P19];KBG26[PMID_36446582_KBG26];Novara, 2017_P2[PMID_36446582_Novara,_2017_P2];KBG9[PMID_36446582_KBG9];Isrie, 2012_P2[PMID_36446582_Isrie,_2012_P2];KBG25[PMID_36446582_KBG25];Kutkowska-Kazmierczak2021_P20[PMID_36446582_Kutkowska-Kazmierczak2021_P20];Scarano, 2013_P11[PMID_36446582_Scarano,_2013_P11];Goldenberg2016_P13[PMID_36446582_Goldenberg2016_P13];Goldenberg2016_P26[PMID_36446582_Goldenberg2016_P26];Scarano, 
2013_P10[PMID_36446582_Scarano,_2013_P10];Goldenberg2016_P21[PMID_36446582_Goldenberg2016_P21];KBG2[PMID_36446582_KBG2];Crippa2015_P3[PMID_36446582_Crippa2015_P3];Spengler, 2013[PMID_36446582_Spengler,_2013];Goldenberg2016_P29[PMID_36446582_Goldenberg2016_P29];Sacharow, 2012_P2[PMID_36446582_Sacharow,_2012_P2];KBG58[PMID_36446582_KBG58];Parenti2021_P23[PMID_36446582_Parenti2021_P23];Novara, 2017_P7[PMID_36446582_Novara,_2017_P7];Palumbo 2016[PMID_36446582_Palumbo_2016];Goldenberg2016_P19[PMID_36446582_Goldenberg2016_P19];Kutkowska-Kazmierczak2021_P22[PMID_36446582_Kutkowska-Kazmierczak2021_P22];Gnazzo, 2020_P31[PMID_36446582_Gnazzo,_2020_P31];Crippa2015_P1[PMID_36446582_Crippa2015_P1];Behnert, 2018[PMID_36446582_Behnert,_2018];Novara, 2017_P12[PMID_36446582_Novara,_2017_P12];Novara, 2017_P8[PMID_36446582_Novara,_2017_P8];Kutkowska-Kazmierczak2021_P15[PMID_36446582_Kutkowska-Kazmierczak2021_P15];Srivastava, 2017_P1[PMID_36446582_Srivastava,_2017_P1];Goldenberg2016_P12[PMID_36446582_Goldenberg2016_P12];Willemsen2010_P3[PMID_36446582_Willemsen2010_P3];Goldenberg2016_P3[PMID_36446582_Goldenberg2016_P3];Kutkowska-Kazmierczak2021_P16[PMID_36446582_Kutkowska-Kazmierczak2021_P16];Novara, 2017_P9[PMID_36446582_Novara,_2017_P9];Novara, 2017_P1[PMID_36446582_Novara,_2017_P1];Youngs2011[PMID_36446582_Youngs2011];KBG22[PMID_36446582_KBG22];Willemsen2010_P4[PMID_36446582_Willemsen2010_P4];Kutkowska-Kazmierczak2021_P21[PMID_36446582_Kutkowska-Kazmierczak2021_P21];Goldenberg2016_P4[PMID_36446582_Goldenberg2016_P4];Goldenberg2016_P18[PMID_36446582_Goldenberg2016_P18];Goldenberg2016_P1[PMID_36446582_Goldenberg2016_P1];Kutkowska-Kazmierczak2021_P18[PMID_36446582_Kutkowska-Kazmierczak2021_P18];Gnazzo, 2020_P30[PMID_36446582_Gnazzo,_2020_P30];Goldenberg2016_P28[PMID_36446582_Goldenberg2016_P28];Goldenberg2016_P24[PMID_36446582_Goldenberg2016_P24];Miyatake, 2013[PMID_36446582_Miyatake,_2013];Crippa2015_P2[PMID_36446582_Crippa2015_P2];Novara, 2017_P3[PMID_36446582_Novara,_2017_P3]"
Total Unique HPO Terms,27
Total Unique Variants,251


In [7]:
display(HTML(viewer.hpo_term_counts_table(cohort)))  # Add labels to output

| HPO Term | Count |
| --- | --- |
| Macrodontia (HP:0001572) | 182 |
| Intellectual disability (HP:0001249) | 159 |
| Abnormality of the hand (HP:0001155) | 156 |
| Global developmental delay (HP:0001263) | 133 |
| Short stature (HP:0004322) | 115 |
| Abnormal external nose morphology (HP:0010938) | 112 |
| Thick eyebrow (HP:0000574) | 105 |
| Long philtrum (HP:0000343) | 103 |
| Hearing impairment (HP:0000365) | 74 |
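
The term counts are easier to compare as cohort frequencies. The sketch below assumes that the counts refer to the 251 individuals left after removing the 77 excluded ones from the 328 in total; the notebook does not state the denominator, so treat `included` as an assumption. The helper `term_frequencies` is hypothetical and not part of `genophenocorr`.

```python
# Counts copied from the HPO term table; the denominator is an assumption.
TOTAL_INDIVIDUALS = 328
EXCLUDED_INDIVIDUALS = 77

term_counts = {
    "Macrodontia (HP:0001572)": 182,
    "Intellectual disability (HP:0001249)": 159,
    "Abnormality of the hand (HP:0001155)": 156,
    "Global developmental delay (HP:0001263)": 133,
    "Short stature (HP:0004322)": 115,
    "Abnormal external nose morphology (HP:0010938)": 112,
    "Thick eyebrow (HP:0000574)": 105,
    "Long philtrum (HP:0000343)": 103,
    "Hearing impairment (HP:0000365)": 74,
}


def term_frequencies(counts, n_individuals):
    """Return the fraction of individuals annotated with each term."""
    return {term: count / n_individuals for term, count in counts.items()}


included = TOTAL_INDIVIDUALS - EXCLUDED_INDIVIDUALS
frequencies = term_frequencies(term_counts, included)

# Print the terms from most to least frequent.
for term, freq in sorted(frequencies.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {freq:.1%}")
```

Under this assumption every listed term is present in more than 10% of the included individuals.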


In [8]:
display(HTML(viewer.variants_table(cohort, tx_id))) 

| Variant | Effect | Count | Key |
| --- | --- | --- | --- |
| c.1903_1907del | FRAMESHIFT_VARIANT | 33 | 16_89284634_89284639_GTGTTT_G |
| c.2408_2412del | FRAMESHIFT_VARIANT | 10 | 16_89284129_89284134_CTTTTT_C |
| c.2398_2401del | FRAMESHIFT_VARIANT | 8 | 16_89284140_89284144_TTTTC_T |
| c.1381_1384del | FRAMESHIFT_VARIANT | 8 | 16_89285157_89285161_GTTTC_G |
| c.6792_6793insC | FRAMESHIFT_VARIANT | 5 | 16_89279749_89279749_C_CG |
| c.7481_7482insC | FRAMESHIFT_VARIANT | 5 | 16_89275180_89275180_A_AG |
| c.2182_2183del | FRAMESHIFT_VARIANT | 3 | 16_89284358_89284360_GAT_G |
| c.3832A>T | STOP_GAINED | 3 | 16_89282710_89282710_T_A |
| c.3224_3227del | FRAMESHIFT_VARIANT | 3 | 16_89283314_89283318_CCTTT_C |
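
The variant keys appear to follow a `chrom_start_end_ref_alt` layout with 1-based inclusive coordinates: for `16_89284634_89284639_GTGTTT_G` the span from start to end covers exactly the six reference bases. That layout is inferred from the keys shown here, not from the library documentation, and keys for structural variants may look different. The `VariantKey` class and `parse_variant_key` function below are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantKey:
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive
    ref: str
    alt: str

    @property
    def ref_span(self) -> int:
        """Number of reference bases covered by start..end."""
        return self.end - self.start + 1


def parse_variant_key(key: str) -> VariantKey:
    # Split from the right so a contig name containing "_" stays intact.
    chrom, start, end, ref, alt = key.rsplit("_", 4)
    return VariantKey(chrom, int(start), int(end), ref, alt)


deletion = parse_variant_key("16_89284634_89284639_GTGTTT_G")
insertion = parse_variant_key("16_89279749_89279749_C_CG")

print(deletion)
# The span matches the reference allele length for both keys.
print(deletion.ref_span == len(deletion.ref), insertion.ref_span == len(insertion.ref))
```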


## Configure the analysis

In [10]:
from genophenocorr.analysis import configure_cohort_analysis, CohortAnalysisConfiguration
from genophenocorr.analysis.predicate import BooleanPredicate, GroupingPredicate

analysis_config = (
    CohortAnalysisConfiguration.builder()
    .missing_implies_excluded(True)
    .pval_correction('fdr_bh')
    .min_perc_patients_w_hpo(0.1)
    .build()
)
analysis = configure_cohort_analysis(cohort, hpo, config=analysis_config)
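
The `'fdr_bh'` setting names the Benjamini-Hochberg false discovery rate procedure; it is the identifier that `statsmodels` uses for this method, although whether `genophenocorr` delegates to `statsmodels` is an assumption here. The following is a minimal pure-Python sketch of the adjustment, with hypothetical p values, to show what the "Corrected p value" column represents. The function name `benjamini_hochberg` is illustrative only.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvals)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, so each adjusted value
    # never exceeds the adjusted value of a larger raw p value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values, not taken from the cohort.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted = benjamini_hochberg(raw)
for p, q in zip(raw, adjusted):
    print(f"raw={p:.3f} adjusted={q:.5f}")
```

The adjusted values depend on the total number of tests, so corrected p values from different comparisons are not directly interchangeable.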

We test for genotype-phenotype correlations by comparing subjects who carry a frameshift variant with all other subjects.

In [11]:
from genophenocorr.model import VariantEffect

frameshift = analysis.compare_by_variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id)
frameshift.summarize(hpo, BooleanPredicate.YES)

Grouping: FRAMESHIFT_VARIANT on NM_013275.6

| HPO term | No: Count | No: Percent | Yes: Count | Yes: Percent | p value | Corrected p value |
| --- | --- | --- | --- | --- | --- | --- |
| Intellectual disability [HP:0001249] | 60/238 | 25.210084 | 99/238 | 41.596639 | 0.000305 | 0.008746 |
| Abnormality of mental function [HP:0011446] | 66/240 | 27.5 | 113/240 | 47.083333 | 0.00038 | 0.008746 |
| Neurodevelopmental abnormality [HP:0012759] | 70/242 | 28.92562 | 127/242 | 52.479339 | 0.001123 | 0.017225 |
| Abnormal nervous system physiology [HP:0012638] | 73/244 | 29.918033 | 135/244 | 55.327869 | 0.003385 | 0.02932 |
| Abnormality of the nervous system [HP:0000707] | 73/244 | 29.918033 | 135/244 | 55.327869 | 0.003385 | 0.02932 |
| Global developmental delay [HP:0001263] | 47/204 | 23.039216 | 86/204 | 42.156863 | 0.005883 | 0.02932 |
| Neurodevelopmental delay [HP:0012758] | 47/204 | 23.039216 | 86/204 | 42.156863 | 0.005883 | 0.02932 |
| Abnormal external nose morphology [HP:0010938] | 42/212 | 19.811321 | 70/212 | 33.018868 | 0.006374 | 0.02932 |
| Abnormal nasal morphology [HP:0005105] | 42/212 | 19.811321 | 70/212 | 33.018868 | 0.006374 | 0.02932 |
| Abnormality of the nose [HP:0000366] | 42/212 | 19.811321 | 70/212 | 33.018868 | 0.006374 | 0.02932 |
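
Each p value in a comparison like this comes from a per-term test on a 2x2 table (genotype group by phenotype present or absent). The notebook does not say which test is used; the sketch below assumes a two-sided Fisher exact test, implemented with the standard library only. The counts in the usage example are hypothetical, because the results report only how many individuals in each group have the term, not how many lack it.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties are not lost to rounding.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Hypothetical counts: rows are genotype groups, columns are term present/absent.
p_value = fisher_exact_two_sided(30, 10, 20, 25)
print(f"p = {p_value:.6f}")
```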


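Next, we test whether subjects carrying at least one allele of a specific variant differ phenotypically from the remaining subjects. We use the most frequent variant in the cohort, `c.1903_1907del` (key `16_89284634_89284639_GTGTTT_G`):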


In [12]:
var_single = analysis.compare_by_variant_key('16_89284634_89284639_GTGTTT_G')
var_single.summarize(hpo, BooleanPredicate.YES)

Grouping: at least one allele of the variant 16_89284634_89284639_GTGTTT_G

| HPO term | No: Count | No: Percent | Yes: Count | Yes: Percent | p value | Corrected p value |
| --- | --- | --- | --- | --- | --- | --- |
| Neurodevelopmental abnormality [HP:0012759] | 176/242 | 72.727273 | 21/242 | 8.677686 | 0.007993 | 0.169079 |
| Abnormal nervous system physiology [HP:0012638] | 185/244 | 75.819672 | 23/244 | 9.42623 | 0.014355 | 0.169079 |
| Abnormality of the nervous system [HP:0000707] | 185/244 | 75.819672 | 23/244 | 9.42623 | 0.014355 | 0.169079 |
| Intellectual disability [HP:0001249] | 144/238 | 60.504202 | 15/238 | 6.302521 | 0.014702 | 0.169079 |
| Phenotypic abnormality [HP:0000118] | 217/251 | 86.454183 | 31/251 | 12.350598 | 0.046296 | 0.214567 |
| All [HP:0000001] | 217/251 | 86.454183 | 31/251 | 12.350598 | 0.046296 | 0.214567 |
| Abnormality of mental function [HP:0011446] | 160/240 | 66.666667 | 19/240 | 7.916667 | 0.048016 | 0.214567 |
| Global developmental delay [HP:0001263] | 119/204 | 58.333333 | 14/204 | 6.862745 | 0.056706 | 0.214567 |
| Neurodevelopmental delay [HP:0012758] | 119/204 | 58.333333 | 14/204 | 6.862745 | 0.056706 | 0.214567 |
| Abnormality of the ear [HP:0000598] | 101/209 | 48.325359 | 10/209 | 4.784689 | 0.066 | 0.214567 |
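
The results list general terms such as `Phenotypic abnormality [HP:0000118]` and the root `All [HP:0000001]` alongside specific ones, which suggests that each observed term is also counted for its ancestors in the ontology. The sketch below shows that propagation on a small hand-written subset of HPO `is_a` edges. The `PARENTS` mapping and the `with_ancestors` helper are assumptions for illustration; in a real analysis the edges would come from the loaded ontology.

```python
# Hand-written child -> parents edges, assumed for illustration only.
PARENTS = {
    "HP:0001263": ["HP:0012758"],  # Global developmental delay -> Neurodevelopmental delay
    "HP:0012758": ["HP:0012759"],  # -> Neurodevelopmental abnormality
    "HP:0012759": ["HP:0012638"],  # -> Abnormal nervous system physiology
    "HP:0012638": ["HP:0000707"],  # -> Abnormality of the nervous system
    "HP:0000707": ["HP:0000118"],  # -> Phenotypic abnormality
    "HP:0000118": ["HP:0000001"],  # -> All
}


def with_ancestors(term_ids, parents):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(term_ids)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            # Terms without an entry (the root) contribute no parents.
            stack.extend(parents.get(term, ()))
    return seen


propagated = with_ancestors({"HP:0001263"}, PARENTS)
print(sorted(propagated))
```

This would explain why a term and its parent can show identical counts, as `Global developmental delay` and `Neurodevelopmental delay` do.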


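We can also compare subjects carrying one variant directly against subjects carrying another, here `c.2408_2412del` (key `16_89284129_89284134_CTTTTT_C`) against `c.1903_1907del` (key `16_89284634_89284639_GTGTTT_G`).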

In [13]:
var_double = analysis.compare_by_variant_keys('16_89284129_89284134_CTTTTT_C', '16_89284634_89284639_GTGTTT_G')
var_double.summarize(hpo, BooleanPredicate.YES)

Grouping: at least one allele of either variant 16_89284129_89284134_CTTTTT_C or variant 16_89284634_89284639_GTGTTT_G

| HPO term | First: Count | First: Percent | Second: Count | Second: Percent | p value | Corrected p value |
| --- | --- | --- | --- | --- | --- | --- |
| Cognitive impairment [HP:0100543] | 0/35 | 0.0 | 7/35 | 20.0 | 0.165543 | 1.0 |
| Short stature [HP:0004322] | 5/40 | 12.5 | 14/40 | 35.0 | 0.441958 | 1.0 |
| Abnormality of body height [HP:0000002] | 5/40 | 12.5 | 14/40 | 35.0 | 0.441958 | 1.0 |
| Growth delay [HP:0001510] | 5/40 | 12.5 | 14/40 | 35.0 | 0.441958 | 1.0 |
| Growth abnormality [HP:0001507] | 5/40 | 12.5 | 14/40 | 35.0 | 0.441958 | 1.0 |
| Hearing abnormality [HP:0000364] | 3/36 | 8.333333 | 7/36 | 19.444444 | 0.657626 | 1.0 |
| Abnormal ear physiology [HP:0031704] | 3/36 | 8.333333 | 7/36 | 19.444444 | 0.657626 | 1.0 |
| Abnormality of the ear [HP:0000598] | 4/36 | 11.111111 | 10/36 | 27.777778 | 0.68323 | 1.0 |
| Abnormality of the hand [HP:0001155] | 8/43 | 18.604651 | 22/43 | 51.162791 | 0.696329 | 1.0 |
| Abnormality of the upper limb [HP:0002817] | 8/43 | 18.604651 | 22/43 | 51.162791 | 0.696329 | 1.0 |


TODO - finalize!