<h1>KBG Syndrome</h1>
<p>Data from <a href="https://pubmed.ncbi.nlm.nih.gov/36446582/" target="_blank">Martinez-Cayuelas E, et al. Clinical description, molecular delineation and genotype-phenotype correlation in 340 patients with KBG syndrome: addition of 67 new patients. J Med Genet. 2022 Nov 29:jmedgenet-2022-108632. PMID: 36446582</a>.</p>

In [1]:
import os
import hpotk

In [2]:
from genophenocorr.preprocessing import configure_caching_patient_creator

In [3]:
fpath_hpo = 'hpo_data/hp.json'
cache_dir = 'annotations'
fpath_phenopackets = 'phenopackets'
tx_id = 'NM_013275.6'


In [4]:
hpo: hpotk.ontology.Ontology = hpotk.ontology.load.obographs.load_ontology(fpath_hpo)

In [5]:
pc = configure_caching_patient_creator(hpo, cache_dir=cache_dir)

In [6]:
from genophenocorr.preprocessing import load_phenopacket_folder

In [7]:
patientCohort = load_phenopacket_folder(fpath_phenopackets, pc)

In [8]:
from IPython.display import HTML, display
from genophenocorr.view import CohortViewer

viewer = CohortViewer(hpo)

In [9]:
display(HTML(viewer.cohort_summary_table(patientCohort)))

Item,Description
Total Individuals,340
Excluded Individuals,"14: Novara, 2017_P10;VanDongen2019_P9;VanDongen2019_P2;VanDongen2019_P7;Low, 2016_P7 (8);VanDongen2019_P4;VanDongen2019_P12;KBG42;VanDongen2019_P8;Parenti2016_P1;VanDongen2019_P5;Reuter2020;KBG31B;VanDongen2019_P13"
Total Unique HPO Terms,28
Total Unique Variants,326


In [10]:
display(HTML(viewer.hpo_term_counts_table(patientCohort)))  # TODO: add labels to the output

HPO Term,Count
Abnormality of dental morphology (HP:0006482),223
Abnormality of higher mental function (HP:0011446),218
Intellectual disability (HP:0001249),192
Abnormality of the hand (HP:0001155),186
Neurodevelopmental delay (HP:0012758),174
Short stature (HP:0004322),150
Abnormal external nose morphology (HP:0010938),132
Abnormal eyebrow morphology (HP:0000534),125
Long philtrum (HP:0000343),120


In [11]:
display(HTML(viewer.variants_table(patientCohort, tx_id))) 

[WARN] could not identify a single variant for target transcript (got 0), variant 16_87886394_88066394_DEL


Variant,Effect,Count,Key
c.1903_1907del,FRAMESHIFT_VARIANT,32,16_89284634_89284639_GTGTTT_G
c.2408_2412del,FRAMESHIFT_VARIANT,10,16_89284129_89284134_CTTTTT_C
c.1381_1384del,FRAMESHIFT_VARIANT,8,16_89285157_89285161_GTTTC_G
c.2398_2401del,FRAMESHIFT_VARIANT,8,16_89284140_89284144_TTTTC_T
c.7481_7482insC,FRAMESHIFT_VARIANT,5,16_89275180_89275180_A_AG
c.6792_6793insC,FRAMESHIFT_VARIANT,5,16_89279749_89279749_C_CG
c.1977C>G,STOP_GAINED,3,16_89284565_89284565_G_C
c.4406G>A,STOP_GAINED,3,16_89282136_89282136_C_T
c.3832A>T,STOP_GAINED,3,16_89282710_89282710_T_A
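The `Key` column above, and the key in the `[WARN]` message, look like underscore-separated coordinates. The sketch below splits such a key into its parts. The field layout (`chrom_start_end_ref_alt` for sequence variants, `chrom_start_end_TYPE` for structural variants) is an assumption inferred from the keys shown here, not a documented genophenocorr format, and `VariantKey` and `parse_variant_key` are hypothetical helper names.

```python
from typing import NamedTuple, Optional


class VariantKey(NamedTuple):
    chrom: str
    start: int
    end: int
    ref: Optional[str]  # None for structural variants, which carry no REF allele
    alt: str            # ALT allele, or the SV type such as 'DEL'


def parse_variant_key(key: str) -> VariantKey:
    """Split a key such as '16_89284634_89284639_GTGTTT_G' into its fields.

    Assumed layout (inferred from the table, not from library docs):
      5 fields -> chrom, start, end, ref, alt
      4 fields -> chrom, start, end, sv_type
    """
    parts = key.split("_")
    if len(parts) == 5:
        chrom, start, end, ref, alt = parts
        return VariantKey(chrom, int(start), int(end), ref, alt)
    if len(parts) == 4:
        chrom, start, end, sv_type = parts
        return VariantKey(chrom, int(start), int(end), None, sv_type)
    raise ValueError(f"Unexpected variant key layout: {key!r}")


frameshift_key = parse_variant_key("16_89284634_89284639_GTGTTT_G")
sv_key = parse_variant_key("16_87886394_88066394_DEL")

# Number of bases removed by the most frequent frameshift deletion.
deleted_bases = len(frameshift_key.ref) - len(frameshift_key.alt)
```

Under this reading, the most frequent key removes five bases, which agrees with the `c.1903_1907del` label in the same row.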


In [12]:
patientCohort.list_all_patients()

['Scarano, 2013_P10',
 'Ockeloen2015_P18',
 'Parenti2021_P12',
 'Murray, 2017_P6 (3.2)',
 'Kutkowska-Kazmierczak2021_P23',
 'Kutkowska-Kazmierczak2021_P16',
 'Gnazzo, 2020_P2',
 'KBG58',
 'Ockeloen2015_P7',
 'Goldenberg2016_P27',
 'Kutkowska-Kazmierczak2021_P15',
 'Gnazzo, 2020_P11',
 'Parenti2021_P19',
 'KBG33',
 'Ockeloen2015_P10',
 'KBG10B',
 'Sirmaci2011_P5',
 'Goldenberg2016_P3',
 'KBG17',
 'Walz2015_Pf',
 'Parenti2021_P11',
 'Gnazzo, 2020_P28',
 'Khalifa, 2013_P1B',
 'VanDongen2019_P1',
 'Gnazzo, 2020_P26',
 'KBG11',
 'Gnazzo, 2020_P19',
 'Kutkowska-Kazmierczak2021_P1',
 'KBG41',
 'Gnazzo, 2020_P12',
 'Ockeloen2015_P9',
 'Murray, 2017_P2 (1.2)',
 'Gnazzo, 2020_P14',
 'Gnazzo, 2020_P27',
 'KBG38',
 'Novara, 2017_P10',
 'Goldenberg2016_P32',
 'Sirmaci2011_P2/F1? (previously published Tekin, 2004)',
 'Low, 2016_P14 (2)',
 'Murray, 2017_P12 (9.1)',
 'Murray, 2017_P9 (5.1.)',
 'Youngs2011',
 'Parenti2021_P16',
 'Goldenberg2016_P29',
 'Parenti2016_P2',
 'Murray, 2017_P1 (1.1)',
 'Gnazz

In [13]:
patientCohort.list_all_proteins()

[('NP_037407.4', 325),
 ('NP_001243111.1', 325),
 ('NP_001243112.1', 325),
 ('NP_872337.2', 45),
 ('NP_004924.1', 37),
 ('NP_001230208.1', 29),
 ('NP_001120686.1', 29),
 ('NP_777577.2', 29),
 ('NP_057293.1', 25),
 ('NP_000503.1', 25),
 ('NP_001305454.1', 25),
 ('NP_001136336.2', 25),
 ('NP_001305457.1', 25),
 ('NP_005178.4', 25),
 ('NP_787127.1', 25),
 ('NP_001281257.1', 25),
 ('NP_001305456.1', 25),
 ('NP_001025189.1', 25),
 ('NP_001305459.1', 25),
 ('NP_001305455.1', 25),
 ('NP_001305461.1', 25),
 ('NP_001073956.2', 25),
 ('NP_000476.1', 25),
 ('NP_001305453.1', 25),
 ('NP_112190.2', 25),
 ('NP_001305458.1', 25),
 ('NP_849163.1', 22),
 ('NP_001165286.1', 22),
 ('NP_001305442.1', 22),
 ('NP_001012777.1', 22),
 ('NP_001305436.1', 22),
 ('NP_001165287.1', 22),
 ('NP_001012780.1', 22),
 ('NP_840101.1', 21),
 ('NP_037410.1', 20),
 ('NP_001281269.1', 20),
 ('NP_002452.1', 20),
 ('NP_000092.2', 20),
 ('NP_653205.3', 20),
 ('NP_003110.1', 17),
 ('NP_955399.1', 17),
 ('NP_722520.2', 15),
 ('N

In [14]:
patientCohort.list_data_by_tx('NM_013275.6')

{'NM_013275.6': Counter({'FEATURE_TRUNCATION': 55,
          'CODING_SEQUENCE_VARIANT': 50,
          'FIVE_PRIME_UTR_VARIANT': 44,
          'INTRON_VARIANT': 60,
          'FRAMESHIFT_VARIANT': 170,
          'STOP_GAINED': 66,
          'STOP_LOST': 33,
          'THREE_PRIME_UTR_VARIANT': 34,
          'FEATURE_ELONGATION': 3,
          'SPLICE_ACCEPTOR_VARIANT': 4,
          'TRANSCRIPT_ABLATION': 14,
          'MISSENSE_VARIANT': 6,
          'SPLICE_DONOR_VARIANT': 2,
          'TRANSCRIPT_AMPLIFICATION': 1,
          'DOWNSTREAM_GENE_VARIANT': 1,
          'INFRAME_DELETION': 2,
          'SPLICE_REGION_VARIANT': 2})}
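The `Counter` above is unsorted, which makes the dominant effects hard to spot. As a minimal sketch, the counts can be copied into a standalone `Counter` and ranked with the standard-library `most_common` method. The numbers are transcribed from the output above; the variable names are illustrative only.

```python
from collections import Counter

# Effect counts for NM_013275.6, transcribed from the cell output above.
effects = Counter({
    'FEATURE_TRUNCATION': 55,
    'CODING_SEQUENCE_VARIANT': 50,
    'FIVE_PRIME_UTR_VARIANT': 44,
    'INTRON_VARIANT': 60,
    'FRAMESHIFT_VARIANT': 170,
    'STOP_GAINED': 66,
    'STOP_LOST': 33,
    'THREE_PRIME_UTR_VARIANT': 34,
    'FEATURE_ELONGATION': 3,
    'SPLICE_ACCEPTOR_VARIANT': 4,
    'TRANSCRIPT_ABLATION': 14,
    'MISSENSE_VARIANT': 6,
    'SPLICE_DONOR_VARIANT': 2,
    'TRANSCRIPT_AMPLIFICATION': 1,
    'DOWNSTREAM_GENE_VARIANT': 1,
    'INFRAME_DELETION': 2,
    'SPLICE_REGION_VARIANT': 2,
})

# Rank effects from most to least frequent.
top_three = effects.most_common(3)

# Total number of effect annotations across all categories.
total_annotations = sum(effects.values())

for effect, count in effects.most_common():
    print(f"{effect:28s} {count:4d}")
```

The total number of effect annotations is larger than the 326 unique variants reported in the cohort summary, which suggests that a single variant can contribute to more than one effect category. That reading is an assumption, not something the output states.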

In [15]:
patientCohort.list_data_by_tx()

{'NM_174917.5': Counter({'STOP_LOST': 3,
          'FEATURE_TRUNCATION': 3,
          'CODING_SEQUENCE_VARIANT': 3,
          'THREE_PRIME_UTR_VARIANT': 3,
          'INTRON_VARIANT': 3,
          'TRANSCRIPT_ABLATION': 25,
          'TRANSCRIPT_AMPLIFICATION': 1,
          'FIVE_PRIME_UTR_VARIANT': 2}),
 'NM_001080487.4': Counter({'TRANSCRIPT_ABLATION': 24,
          'TRANSCRIPT_AMPLIFICATION': 1}),
 'NM_000101.4': Counter({'TRANSCRIPT_ABLATION': 19,
          'TRANSCRIPT_AMPLIFICATION': 1}),
 'NM_178310.4': Counter({'TRANSCRIPT_ABLATION': 19,
          'TRANSCRIPT_AMPLIFICATION': 1,
          'UPSTREAM_GENE_VARIANT': 1}),
 'NM_001384944.1': Counter({'FEATURE_TRUNCATION': 1,
          'CODING_SEQUENCE_VARIANT': 1,
          'FIVE_PRIME_UTR_VARIANT': 1,
          'INTRON_VARIANT': 1,
          'TRANSCRIPT_ABLATION': 8,
          'TRANSCRIPT_AMPLIFICATION': 1}),
 'NM_015144.3': Counter({'TRANSCRIPT_ABLATION': 3,
          'FEATURE_TRUNCATION': 1,
          'CODING_SEQUENCE_VARIANT': 2,


In [16]:
patientCohort.all_proteins

{ProteinMetadata(id=NP_000092.2, label=Cytochrome b-245 light chain, features=(SimpleProteinFeature(type=FeatureType.REGION, info=FeatureInfo(name=Disordered, start=134, end=195)),)),
 ProteinMetadata(id=NP_000476.1, label=Adenine phosphoribosyltransferase, features=()),
 ProteinMetadata(id=NP_000503.1, label=N-acetylgalactosamine-6-sulfatase, features=(SimpleProteinFeature(type=FeatureType.REGION, info=FeatureInfo(name=Catalytic domain, start=27, end=379)),)),
 ProteinMetadata(id=NP_000968.2, label=Large ribosomal subunit protein eL13, features=()),
 ProteinMetadata(id=NP_001012777.1, label=Cytoplasmic tRNA 2-thiolation protein 2, features=(SimpleProteinFeature(type=FeatureType.REGION, info=FeatureInfo(name=Disordered, start=1, end=24)), SimpleProteinFeature(type=FeatureType.REGION, info=FeatureInfo(name=Disordered, start=188, end=217)))),
 ProteinMetadata(id=NP_001012780.1, label=Cytoplasmic tRNA 2-thiolation protein 2, features=(SimpleProteinFeature(type=FeatureType.REGION, info=Fea

In [17]:
from genophenocorr.analysis import configure_cohort_analysis, CohortAnalysisConfiguration
from genophenocorr.analysis.predicate import BooleanPredicate, GroupingPredicate
from genophenocorr.model import VariantEffect

In [18]:
analysis = configure_cohort_analysis(patientCohort, hpo, CohortAnalysisConfiguration(
    missing_implies_excluded=True,
    pval_correction='fdr_bh',
    min_perc_patients_w_hpo=0.1,
    include_sv=True,
    recessive=False))

In [19]:
frameshift = analysis.compare_by_variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id)

In [20]:
summary_fs = frameshift.summarize(hpo, BooleanPredicate.YES)
summary_fs.sort_values(('','p value'))

FRAMESHIFT_VARIANT on NM_013275.6,No,No,Yes,Yes,,
HPO Term,Count,Percent,Count,Percent,p value,Corrected p value
Phenotypic abnormality [HP:0000118],137,42.02454,124,38.03681,0.000831,0.017445
All [HP:0000001],137,42.02454,124,38.03681,0.000831,0.017445
Neurodevelopmental abnormality [HP:0012759],121,38.906752,113,36.334405,0.00233,0.032621
Abnormal nervous system physiology [HP:0012638],125,39.808917,117,37.261146,0.00454,0.038133
Abnormality of the nervous system [HP:0000707],125,39.808917,117,37.261146,0.00454,0.038133
Neurodevelopmental delay [HP:0012758],70,26.119403,58,21.641791,0.009995,0.069965
Short stature [HP:0004322],41,13.712375,26,8.695652,0.017875,0.075074
Abnormality of body height [HP:0000002],41,13.712375,26,8.695652,0.017875,0.075074
Growth abnormality [HP:0001507],41,13.712375,26,8.695652,0.017875,0.075074
Growth delay [HP:0001510],41,13.712375,26,8.695652,0.017875,0.075074
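The analysis was configured with `pval_correction='fdr_bh'`, the name commonly used for the Benjamini-Hochberg false discovery rate procedure. The sketch below is a plain-Python version of that procedure, not the library's own code. It is applied to the ten raw p values shown above, assuming a total of 42 tested terms. That total is not stated in the output; it is inferred from the ratio between the raw and corrected columns. The sketch also assumes that none of the hidden, larger p values lowers the adjusted values of these ten rows.

```python
def benjamini_hochberg(p_values, n_tests=None):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    n_tests is the total number of hypotheses. It defaults to len(p_values)
    and is exposed only so that a truncated list of the smallest p values
    can be adjusted as if the full set were present (an assumption).
    """
    m = len(p_values) if n_tests is None else n_tests
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    adjusted = [0.0] * len(p_values)
    running_min = 1.0
    # Walk from the largest p value to the smallest, enforcing monotonicity.
    for rank in range(len(order), 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Raw p values from the frameshift table, top to bottom.
raw = [0.000831, 0.000831, 0.00233, 0.00454, 0.00454,
       0.009995, 0.017875, 0.017875, 0.017875, 0.017875]

# Corrected p values reported in the same rows.
reported = [0.017445, 0.017445, 0.032621, 0.038133, 0.038133,
            0.069965, 0.075074, 0.075074, 0.075074, 0.075074]

# 42 is an inferred number of tests, not a value given by the source.
recomputed = benjamini_hochberg(raw, n_tests=42)
max_difference = max(abs(a - b) for a, b in zip(recomputed, reported))
```

Under these assumptions the recomputed values match the reported column to within the rounding of the displayed p values. Only the first five terms stay below 0.05 after correction.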


In [21]:
var_single = analysis.compare_by_variant_key('16_89284634_89284639_GTGTTT_G')

In [22]:
summary_vs = var_single.summarize(hpo, BooleanPredicate.YES)
summary_vs.sort_values(('','p value'))

>=1 allele of the variant 16_89284634_89284639_GTGTTT_G,No,No,Yes,Yes,,
HPO Term,Count,Percent,Count,Percent,p value,Corrected p value
Phenotypic abnormality [HP:0000118],242,74.233129,19,5.828221,0.004307,0.090438
All [HP:0000001],242,74.233129,19,5.828221,0.004307,0.090438
Abnormal nervous system physiology [HP:0012638],223,71.019108,19,6.050955,0.024006,0.252063
Abnormality of the nervous system [HP:0000707],223,71.019108,19,6.050955,0.024006,0.252063
Abnormality of the ear [HP:0000598],49,17.753623,1,0.362319,0.037354,0.31377
Neurodevelopmental abnormality [HP:0012759],215,69.131833,19,6.109325,0.048989,0.342924
Abnormality of higher mental function [HP:0011446],189,61.363636,16,5.194805,0.072214,0.426679
Intellectual disability [HP:0001249],177,58.032787,15,4.918033,0.081272,0.426679
Abnormal ear morphology [HP:0031703],31,12.015504,1,0.387597,0.218309,0.707731
Behavioral abnormality [HP:0000708],63,24.901186,4,1.581028,0.241365,0.707731
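The `Percent` columns are easy to misread as per-group frequencies. Working backwards from the numbers, both percentages in a row appear to share one denominator, which varies by term. This is an inference from the displayed values, not documented behaviour, and the helper names below are hypothetical.

```python
def implied_total(count: int, percent: float) -> int:
    """Recover the denominator that turns `count` into `percent`."""
    return round(count * 100.0 / percent)


def percent_of_total(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded like the table."""
    return round(100.0 * count / total, 6)


# (term, No count, No percent, Yes count, Yes percent), copied from the table above.
rows = [
    ("Phenotypic abnormality [HP:0000118]", 242, 74.233129, 19, 5.828221),
    ("Abnormality of the ear [HP:0000598]", 49, 17.753623, 1, 0.362319),
]

denominators = {}
for term, no_count, no_pct, yes_count, yes_pct in rows:
    # Both columns should imply the same denominator if the reading is right.
    denominators[term] = (implied_total(no_count, no_pct),
                          implied_total(yes_count, yes_pct))
    print(term, denominators[term])
```

For the top-level term, both columns imply a denominator of 326, which equals the 340 individuals in the cohort summary minus the 14 excluded ones. Whether that is the actual origin of the denominator is an assumption. Either way, the two percentages in a row do not sum to 100, so they should not be read as the fraction of each genotype group that shows the phenotype.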


In [23]:
var_double = analysis.compare_by_variant_keys('16_89284129_89284134_CTTTTT_C', '16_89284634_89284639_GTGTTT_G')

In [24]:
summary_vd = var_double.summarize(hpo, BooleanPredicate.YES)
summary_vd.sort_values(('','p value'))

>=1 allele of either variant 16_89284129_89284134_CTTTTT_C or variant 16_89284634_89284639_GTGTTT_G,First,First,Second,Second,,
HPO Term,Count,Percent,Count,Percent,p value,Corrected p value
Short stature [HP:0004322],0,0.0,5,12.820513,0.562802,1.0
Abnormality of body height [HP:0000002],0,0.0,5,12.820513,0.562802,1.0
Growth delay [HP:0001510],0,0.0,5,12.820513,0.562802,1.0
Growth abnormality [HP:0001507],0,0.0,5,12.820513,0.562802,1.0
Behavioral abnormality [HP:0000708],2,5.714286,4,11.428571,0.635267,1.0
Neurodevelopmental abnormality [HP:0012759],5,11.904762,19,45.238095,0.720095,1.0
Abnormality of higher mental function [HP:0011446],6,14.634146,16,39.02439,0.726865,1.0
Abnormal upper lip morphology [HP:0000177],2,5.263158,5,13.157895,1.0,1.0
Abnormality of the philtrum [HP:0000288],2,5.263158,5,13.157895,1.0,1.0
Abnormality of head or neck [HP:0000152],4,9.52381,14,33.333333,1.0,1.0
