# KBG Syndrome

Data from [Martinez-Cayuelas E, et al. Clinical description, molecular delineation and genotype-phenotype correlation in 340 patients with KBG syndrome: addition of 67 new patients](https://pubmed.ncbi.nlm.nih.gov/36446582).

We investigate subjects with mutations in *ANKRD11*.

In [1]:
import genophenocorr

print(f"Using genophenocorr version {genophenocorr.__version__}")

Using genophenocorr version 0.1.1dev


## Setup

### Load HPO

We use the HPO `v2023-10-09` release for this analysis.

In [2]:
import hpotk

fpath_hpo = 'https://github.com/obophenotype/human-phenotype-ontology/releases/download/v2023-10-09/hp.json'
hpo = hpotk.load_minimal_ontology(fpath_hpo)

### Load Phenopackets

We load the phenopacket JSON files from the `phenopackets` folder located next to this notebook.

In [3]:
from genophenocorr.preprocessing import configure_caching_cohort_creator, load_phenopacket_folder

fpath_phenopackets = 'phenopackets'
cohort_creator = configure_caching_cohort_creator(hpo, cache_dir='temp_cache')
cohort = load_phenopacket_folder(fpath_phenopackets, cohort_creator)

Patients Created: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 328/328 [00:01<00:00, 196.20it/s]

328



Validated under none policy
328 phenopacket(s) found at `phenopackets`
  patient #0
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Novara, 2017_P2[PMID_36446582_Novara,_2017_P2]. Remove variant from testing
     ·Patient PMID_36446582_Novara,_2017_P2 has no variants to work with
  patient #1
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Goldenberg2016_P13[PMID_36446582_Goldenberg2016_P13]. Remove variant from testing
     ·Patient PMID_36446582_Goldenberg2016_P13 has no variants to work with
  patient #3
    variants
     ·Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but had an error retrieving any from patient Ockeloen2015_P20[PMID_36446582_Ockeloen2015_P20]. Remove variant from testing
     ·Patient PMID_36446582_Ockeloen2015_P20 has no variants to work with
  patient #7
    variants
     ·Expected a 
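
The audit messages above say the loader looks for a VCF record, a VRS CNV, or an `hgvs.c` expression in each variant. The following is a minimal sketch of such a check on a phenopacket loaded as a plain dictionary. The helper `has_usable_variant` is hypothetical, the JSON field names are assumed to follow the Phenopacket Schema v2 layout, and the two example phenopackets are made up for illustration. This is not the genophenocorr implementation.

```python
def has_usable_variant(phenopacket: dict) -> bool:
    """Return True if any variation descriptor carries a VCF record,
    a VRS copy number, or an `hgvs.c` expression.

    Field names are assumed (Phenopacket Schema v2 JSON layout).
    """
    for interpretation in phenopacket.get("interpretations", []):
        diagnosis = interpretation.get("diagnosis", {})
        for gi in diagnosis.get("genomicInterpretations", []):
            descriptor = gi.get("variantInterpretation", {}).get("variationDescriptor", {})
            if "vcfRecord" in descriptor:
                return True
            if "copyNumber" in descriptor.get("variation", {}):
                return True
            if any(e.get("syntax") == "hgvs.c" for e in descriptor.get("expressions", [])):
                return True
    return False


# Hypothetical phenopackets, for illustration only.
with_hgvs = {
    "interpretations": [{
        "diagnosis": {
            "genomicInterpretations": [{
                "variantInterpretation": {
                    "variationDescriptor": {
                        "expressions": [{"syntax": "hgvs.c", "value": "NM_013275.6:c.1903_1907del"}]
                    }
                }
            }]
        }
    }]
}
without_variant = {"interpretations": [{"diagnosis": {"genomicInterpretations": []}}]}

print(has_usable_variant(with_hgvs))        # True
print(has_usable_variant(without_variant))  # False
```

Running a check like this over the folder before loading would list the individuals that are likely to end up without variants.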

### Pick transcript

We choose the [MANE Select](https://www.ncbi.nlm.nih.gov/nuccore/NM_013275.6) transcript for *ANKRD11*.

In [4]:
tx_id = 'NM_013275.6'

## Explore cohort

We first explore the cohort to guide the choice of genotype-phenotype analyses.


In [5]:
from IPython.display import HTML, display
from genophenocorr.view import CohortViewer

viewer = CohortViewer(hpo)

In [6]:
display(HTML(viewer.cohort_summary_table(cohort)))

Item,Description
Total Individuals,328
Excluded Individuals,"77: Kutkowska-Kazmierczak2021_P22[PMID_36446582_Kutkowska-Kazmierczak2021_P22];Goldenberg2016_P3[PMID_36446582_Goldenberg2016_P3];KBG22[PMID_36446582_KBG22];Scarano, 2013_P11[PMID_36446582_Scarano,_2013_P11];Gnazzo, 2020_P30[PMID_36446582_Gnazzo,_2020_P30];Sacharow, 2012_P2[PMID_36446582_Sacharow,_2012_P2];Willemsen2010_P1[PMID_36446582_Willemsen2010_P1];Sacharow, 2012_P1[PMID_36446582_Sacharow,_2012_P1];Isrie, 2012_P1[PMID_36446582_Isrie,_2012_P1];Goldenberg2016_P36[PMID_36446582_Goldenberg2016_P36];Scarano, 2013_P12[PMID_36446582_Scarano,_2013_P12];Goldenberg2016_P13[PMID_36446582_Goldenberg2016_P13];Gnazzo, 2020_P31[PMID_36446582_Gnazzo,_2020_P31];Goldenberg2016_P19[PMID_36446582_Goldenberg2016_P19];KBG58[PMID_36446582_KBG58];Khalifa, 2013_P1A[PMID_36446582_Khalifa,_2013_P1A];Willemsen2010_P2[PMID_36446582_Willemsen2010_P2];Goldenberg2016_P24[PMID_36446582_Goldenberg2016_P24];Goldenberg2016_P33[PMID_36446582_Goldenberg2016_P33];Novara, 2017_P3[PMID_36446582_Novara,_2017_P3];Bucerzan2020[PMID_36446582_Bucerzan2020];Crippa2015_P3[PMID_36446582_Crippa2015_P3];Kutkowska-Kazmierczak2021_P18[PMID_36446582_Kutkowska-Kazmierczak2021_P18];Kutkowska-Kazmierczak2021_P16[PMID_36446582_Kutkowska-Kazmierczak2021_P16];Goldenberg2016_P18[PMID_36446582_Goldenberg2016_P18];Goldenberg2016_P26[PMID_36446582_Goldenberg2016_P26];KBG2[PMID_36446582_KBG2];Crippa2015_P2[PMID_36446582_Crippa2015_P2];Crippa2015_P1[PMID_36446582_Crippa2015_P1];Youngs2011[PMID_36446582_Youngs2011];Kutkowska-Kazmierczak2021_P14[PMID_36446582_Kutkowska-Kazmierczak2021_P14];Goldenberg2016_P32[PMID_36446582_Goldenberg2016_P32];Goldenberg2016_P1[PMID_36446582_Goldenberg2016_P1];Palumbo 2016[PMID_36446582_Palumbo_2016];Isrie, 2012_P2[PMID_36446582_Isrie,_2012_P2];Kutkowska-Kazmierczak2021_P19[PMID_36446582_Kutkowska-Kazmierczak2021_P19];KBG23[PMID_36446582_KBG23];Kutkowska-Kazmierczak2021_P21[PMID_36446582_Kutkowska-Kazmierczak2021_P21];Gnazzo, 
2020_P29[PMID_36446582_Gnazzo,_2020_P29];Khalifa, 2013_P1B[PMID_36446582_Khalifa,_2013_P1B];Novara, 2017_P7[PMID_36446582_Novara,_2017_P7];Novara, 2017_P11[PMID_36446582_Novara,_2017_P11];Lim2014[PMID_36446582_Lim2014];Willemsen2010_P3[PMID_36446582_Willemsen2010_P3];Novara, 2017_P9[PMID_36446582_Novara,_2017_P9];Parenti2021_P23[PMID_36446582_Parenti2021_P23];KBG9[PMID_36446582_KBG9];Kutkowska-Kazmierczak2021_P15[PMID_36446582_Kutkowska-Kazmierczak2021_P15];Goldenberg2016_P22[PMID_36446582_Goldenberg2016_P22];Novara, 2017_P8[PMID_36446582_Novara,_2017_P8];Novara, 2017_P4[PMID_36446582_Novara,_2017_P4];Spengler, 2013[PMID_36446582_Spengler,_2013];Goldenberg2016_P20[PMID_36446582_Goldenberg2016_P20];Miyatake, 2013[PMID_36446582_Miyatake,_2013];Novara, 2017_P5[PMID_36446582_Novara,_2017_P5];Kutkowska-Kazmierczak2021_P20[PMID_36446582_Kutkowska-Kazmierczak2021_P20];Scarano, 2013_P10[PMID_36446582_Scarano,_2013_P10];Novara, 2017_P12[PMID_36446582_Novara,_2017_P12];Goldenberg2016_P21[PMID_36446582_Goldenberg2016_P21];Ockeloen2015_P20[PMID_36446582_Ockeloen2015_P20];Willemsen2010_P4[PMID_36446582_Willemsen2010_P4];Goldenberg2016_P12[PMID_36446582_Goldenberg2016_P12];Behnert, 2018[PMID_36446582_Behnert,_2018];Kutkowska-Kazmierczak2021_P17[PMID_36446582_Kutkowska-Kazmierczak2021_P17];KBG38[PMID_36446582_KBG38];KBG25[PMID_36446582_KBG25];KBG26[PMID_36446582_KBG26];Goldenberg2016_P4[PMID_36446582_Goldenberg2016_P4];Novara, 2017_P1[PMID_36446582_Novara,_2017_P1];Srivastava, 2017_P1[PMID_36446582_Srivastava,_2017_P1];Goldenberg2016_P28[PMID_36446582_Goldenberg2016_P28];Goldenberg2016_P29[PMID_36446582_Goldenberg2016_P29];KBG1[PMID_36446582_KBG1];Novara, 2017_P2[PMID_36446582_Novara,_2017_P2];Goldenberg2016_P2[PMID_36446582_Goldenberg2016_P2];Goldenberg2016_P10[PMID_36446582_Goldenberg2016_P10];Kutkowska-Kazmierczak2021_P23[PMID_36446582_Kutkowska-Kazmierczak2021_P23]"
Total Unique HPO Terms,27
Total Unique Variants,251


In [7]:
display(HTML(viewer.hpo_term_counts_table(cohort)))  # TODO: add labels to the output

| HPO Term | Count |
|---|---|
| Macrodontia (HP:0001572) | 182 |
| Intellectual disability (HP:0001249) | 159 |
| Abnormality of the hand (HP:0001155) | 156 |
| Global developmental delay (HP:0001263) | 133 |
| Short stature (HP:0004322) | 115 |
| Abnormal external nose morphology (HP:0010938) | 112 |
| Thick eyebrow (HP:0000574) | 105 |
| Long philtrum (HP:0000343) | 103 |
| Hearing impairment (HP:0000365) | 74 |
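
To judge which terms are frequent enough to test, the counts can be expressed as cohort fractions. The sketch below uses the counts from the table and assumes the denominator is the 251 individuals left after excluding 77 of 328. Whether the counts are really relative to that subset is an assumption, and the helper names are hypothetical.

```python
# Counts copied from the HPO term table above.
term_counts = {
    "Macrodontia (HP:0001572)": 182,
    "Intellectual disability (HP:0001249)": 159,
    "Abnormality of the hand (HP:0001155)": 156,
    "Global developmental delay (HP:0001263)": 133,
    "Short stature (HP:0004322)": 115,
    "Abnormal external nose morphology (HP:0010938)": 112,
    "Thick eyebrow (HP:0000574)": 105,
    "Long philtrum (HP:0000343)": 103,
    "Hearing impairment (HP:0000365)": 74,
}

# Assumption: counts refer to the individuals that were not excluded.
n_individuals = 328 - 77


def term_fractions(counts: dict, n: int) -> dict:
    """Map each term to the fraction of individuals annotated with it."""
    return {term: count / n for term, count in counts.items()}


def passing_terms(counts: dict, n: int, min_fraction: float) -> list:
    """Return the terms whose fraction is at least `min_fraction`."""
    return [t for t, f in term_fractions(counts, n).items() if f >= min_fraction]


for term, fraction in term_fractions(term_counts, n_individuals).items():
    print(f"{term}: {fraction:.3f}")
```

Under this assumption every listed term is present in more than 10% of the individuals, so a 0.1 threshold would keep all nine.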


In [8]:
display(HTML(viewer.variants_table(cohort, tx_id))) 

| Variant | Effect | Count | Key |
|---|---|---|---|
| c.1903_1907del | FRAMESHIFT_VARIANT | 33 | 16_89284634_89284639_GTGTTT_G |
| c.2408_2412del | FRAMESHIFT_VARIANT | 10 | 16_89284129_89284134_CTTTTT_C |
| c.1381_1384del | FRAMESHIFT_VARIANT | 8 | 16_89285157_89285161_GTTTC_G |
| c.2398_2401del | FRAMESHIFT_VARIANT | 8 | 16_89284140_89284144_TTTTC_T |
| c.6792_6793insC | FRAMESHIFT_VARIANT | 5 | 16_89279749_89279749_C_CG |
| c.7481_7482insC | FRAMESHIFT_VARIANT | 5 | 16_89275180_89275180_A_AG |
| c.2182_2183del | FRAMESHIFT_VARIANT | 3 | 16_89284358_89284360_GAT_G |
| c.7570-1G>C | SPLICE_ACCEPTOR_VARIANT | 3 | 16_89274958_89274958_C_G |
| c.2175_2178del | FRAMESHIFT_VARIANT | 3 | 16_89284363_89284367_CTTTG_C |
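
The variant keys in the table appear to follow a `chrom_start_end_ref_alt` pattern. The sketch below splits such a key into its parts. Both the pattern and the `VariantKey` / `parse_variant_key` names are assumptions made for illustration; keys of other variant types (for example large structural variants) may look different.

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    chrom: str
    start: int
    end: int
    ref: str
    alt: str


def parse_variant_key(key: str) -> VariantKey:
    """Split a key of the assumed form `chrom_start_end_ref_alt`."""
    chrom, start, end, ref, alt = key.split("_")
    return VariantKey(chrom, int(start), int(end), ref, alt)


vk = parse_variant_key("16_89284634_89284639_GTGTTT_G")
print(vk.chrom, vk.start, vk.end)  # 16 89284634 89284639
print(len(vk.ref) - len(vk.alt))   # 5
```

The five deleted bases agree with the `c.1903_1907del` label in the same table row.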


## Configure the analysis

In [9]:
from genophenocorr.analysis import configure_cohort_analysis, CohortAnalysisConfiguration
from genophenocorr.analysis.predicate import BooleanPredicate, GroupingPredicate

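# Build the analysis configuration.
analysis_config = (
    CohortAnalysisConfiguration.builder()
    .missing_implies_excluded(True)
    .pval_correction('fdr_bh')
    .min_perc_patients_w_hpo(0.1)
    .build()
)
# Pass the configuration by keyword. Passed positionally as the third argument,
# it was taken for the protein fallback annotator type and raised
# `ValueError: Unknown protein fallback annotator type ...`.
# The keyword name `config` is an assumption about this genophenocorr version.
analysis = configure_cohort_analysis(cohort, hpo, config=analysis_config)
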
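
The configuration requests `'fdr_bh'` as the p-value correction. That string is the name statsmodels uses for the Benjamini-Hochberg procedure; whether genophenocorr delegates to statsmodels is an assumption. A minimal pure-Python sketch of the adjustment, with a hypothetical helper name and made-up p-values:

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(q, 3) for q in benjamini_hochberg(raw)])  # [0.02, 0.04, 0.04, 0.02]
```

Each tested HPO term contributes one p-value, so restricting the analysis to frequent terms also reduces the multiple-testing burden.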

Test for genotype-phenotype correlations between subjects with a frameshift variant and subjects with any other variant.

In [None]:
from genophenocorr.model import VariantEffect

frameshift = analysis.compare_by_variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id)
frameshift.summarize(hpo, BooleanPredicate.YES)
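
For a single HPO term, a comparison like this reduces to a 2x2 table: genotype group (frameshift vs. other) against phenotype (present vs. excluded). The sketch below computes a two-sided Fisher exact p-value for such a table. Using a Fisher exact test is an assumption about the analysis, the function name is hypothetical, and the counts are made up for illustration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Rows are genotype groups, columns are phenotype present / excluded.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: 3 of 4 frameshift carriers and 1 of 4 others show the term.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 3))  # 0.486
```

One such p-value per HPO term would then go through the multiple-testing correction selected in the configuration.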

Test for genotype-phenotype correlations between subjects with at least one allele of a given variant (here the most frequent one, c.1903_1907del) and all other subjects:


In [None]:
var_single = analysis.compare_by_variant_key('16_89284634_89284639_GTGTTT_G')
var_single.summarize(hpo, BooleanPredicate.YES)

Alternatively, compare subjects carrying one variant with subjects carrying another, here c.2408_2412del vs. c.1903_1907del:

In [None]:
var_double = analysis.compare_by_variant_keys('16_89284129_89284134_CTTTTT_C', '16_89284634_89284639_GTGTTT_G')
var_double.summarize(hpo, BooleanPredicate.YES)

TODO - finalize!