<h1>KBG Syndrome</h1>
<p>Data from <a href="https://pubmed.ncbi.nlm.nih.gov/36446582/" target="_blank">Martinez-Cayuelas E, et al. Clinical description, molecular delineation and genotype-phenotype correlation in 340 patients with KBG syndrome: addition of 67 new patients. J Med Genet. 2022 Nov 29:jmedgenet-2022-108632. PMID: 36446582</a>.</p>

In [1]:
import os
import hpotk

In [2]:
from genophenocorr.preprocessing import configure_caching_patient_creator

In [3]:
fpath_hpo = 'hpo_data/hp.json'
cache_dir = 'annotations'
fpath_phenopackets = 'phenopackets'
tx_id = 'NM_013275.6'


In [4]:
hpo: hpotk.ontology.Ontology = hpotk.ontology.load.obographs.load_ontology(fpath_hpo)

In [5]:
pc = configure_caching_patient_creator(hpo, cache_dir=cache_dir)

In [6]:
from genophenocorr.preprocessing import load_phenopacket_folder

In [7]:
patientCohort = load_phenopacket_folder(fpath_phenopackets, pc)

Expected at least one variant per patient, but received none for patient Parenti2016_P1
Expected at least one variant per patient, but received none for patient Low, 2016_P7 (8)
Expected at least one variant per patient, but received none for patient KBG42


In [8]:
from IPython.display import HTML, display
from genophenocorr.view import CohortViewer

viewer = CohortViewer(hpo)

In [9]:
display(HTML(viewer.cohort_summary_table(patientCohort)))

Item,Description
Total Individuals,340
Excluded Individuals,"11: Reuter2020;VanDongen2019_P2;VanDongen2019_P7;VanDongen2019_P9;VanDongen2019_P12;VanDongen2019_P5;VanDongen2019_P4;Novara, 2017_P10;KBG31B;VanDongen2019_P8;VanDongen2019_P13"
Total Unique HPO Terms,28
Total Unique Variants,326


In [10]:
display(HTML(viewer.hpo_term_counts_table(patientCohort)))  # TODO: add labels to the output

HPO Term,Count
Abnormality of dental morphology (HP:0006482),224
Abnormality of higher mental function (HP:0011446),220
Intellectual disability (HP:0001249),194
Abnormality of the hand (HP:0001155),189
Neurodevelopmental delay (HP:0012758),176
Short stature (HP:0004322),150
Abnormal external nose morphology (HP:0010938),134
Abnormal eyebrow morphology (HP:0000534),126
Long philtrum (HP:0000343),121


In [11]:
display(HTML(viewer.variants_table(patientCohort, tx_id))) 

[WARN] could not identify a single variant for target transcript (got 0), variant 16_87886395_88066394_DEL


Variant,Effect,Count,Key
c.1903_1907del,FRAMESHIFT_VARIANT,32,16_89284634_89284639_GTGTTT_G
c.2408_2412del,FRAMESHIFT_VARIANT,10,16_89284129_89284134_CTTTTT_C
c.1381_1384del,FRAMESHIFT_VARIANT,8,16_89285157_89285161_GTTTC_G
c.2398_2401del,FRAMESHIFT_VARIANT,8,16_89284140_89284144_TTTTC_T
c.7481_7482insC,FRAMESHIFT_VARIANT,5,16_89275180_89275181_A_AG
c.6792_6793insC,FRAMESHIFT_VARIANT,5,16_89279749_89279750_C_CG
c.2175_2178del,FRAMESHIFT_VARIANT,3,16_89284363_89284367_CTTTG_C
c.4406G>A,STOP_GAINED,3,16_89282136_89282136_C_T
c.2197C>T,STOP_GAINED,3,16_89284345_89284345_G_A
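
The Key column above, together with the key in the warning (`16_87886395_88066394_DEL`), suggests two underscore-separated layouts: five fields for sequence variants and four fields for symbolic structural variants. The sketch below splits such a key into its parts. The helper `parse_variant_key`, the `ParsedKey` type, and the field names are hypothetical and not part of genophenocorr; the coordinate convention of the start and end fields is not stated in the source.

```python
from typing import NamedTuple, Optional


class ParsedKey(NamedTuple):
    """Hypothetical container for the fields of a variant key."""
    chrom: str
    start: int
    end: int
    ref: Optional[str]      # None for symbolic structural variants
    alt: Optional[str]      # None for symbolic structural variants
    sv_type: Optional[str]  # e.g. "DEL"; None for sequence variants


def parse_variant_key(key: str) -> ParsedKey:
    """Split a key such as '16_89284634_89284639_GTGTTT_G' into its fields."""
    parts = key.split("_")
    if len(parts) == 5:
        chrom, start, end, ref, alt = parts
        return ParsedKey(chrom, int(start), int(end), ref, alt, None)
    if len(parts) == 4:
        chrom, start, end, sv_type = parts
        return ParsedKey(chrom, int(start), int(end), None, None, sv_type)
    raise ValueError(f"Unrecognized variant key: {key!r}")


small = parse_variant_key("16_89284634_89284639_GTGTTT_G")
large = parse_variant_key("16_87886395_88066394_DEL")
print(small)
print(large)
```

Keeping the structural-variant type in its own field makes it easy to filter large deletions separately from small indels.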


In [12]:
patientCohort.list_all_patients()

['Willemsen2010_P2',
 'Gnazzo, 2020_P2',
 'Bucerzan2020',
 'Gnazzo, 2020_P20',
 'KBG38',
 'Willemsen2010_P3',
 'Ockeloen2015_P20',
 'Parenti2021_P12',
 'Goldenberg2016_P24',
 'Parenti2021_P9',
 'Kutkowska-Kazmierczak2021_P2',
 'KBG11',
 'KBG9',
 'KBG23',
 'Low, 2016_P15 (3)',
 'Murray, 2017_P1 (1.1)',
 'Novara, 2017_P5',
 'Crippa2015_P3',
 'Gnazzo, 2020_P8',
 'Parenti2021_P8',
 'Goldenberg2016_P9',
 'Goldenberg2016_P26',
 'Kutkowska-Kazmierczak2021_P19',
 'Parenti2021_P3',
 'Parenti2021_P19',
 'KBG40',
 'Kutkowska-Kazmierczak2021_P7',
 'Jin Kim, 2020_P1',
 'Reuter2020',
 'Parenti2021_P1',
 'KBG8A',
 'Goldenberg2016_P14',
 'Murray, 2017_P6 (3.2)',
 'Murray, 2017_P4 (2.1)',
 'VanDongen2019_P3',
 'KBG62',
 'Goldenberg2016_P8',
 'Gnazzo, 2020_P25',
 'Scarano, 2013_P8',
 'KBG51',
 'Parenti2021_P23',
 'KBG10A',
 'Gnazzo, 2020_P1',
 'Ockeloen2015_P5',
 'Kutkowska-Kazmierczak2021_P18',
 'Khalifa, 2013_P1B',
 'Novara, 2017_P10',
 'Low, 2016_P22 (16)',
 'Parenti2016_P1',
 'Ockeloen2015_P8',
 'Sc

In [13]:
patientCohort.list_all_proteins()

[('NP_037407.4', 325),
 ('NP_001243111.1', 325),
 ('NP_001243112.1', 325),
 ('NP_872337.2', 45),
 ('NP_004924.1', 37),
 ('NP_777577.2', 29),
 ('NP_001120686.1', 29),
 ('NP_001230208.1', 29),
 ('NP_112190.2', 25),
 ('NP_005178.4', 25),
 ('NP_000476.1', 25),
 ('NP_001305458.1', 25),
 ('NP_000503.1', 25),
 ('NP_001305453.1', 25),
 ('NP_001305461.1', 25),
 ('NP_001136336.2', 25),
 ('NP_001305455.1', 25),
 ('NP_001073956.2', 25),
 ('NP_001305456.1', 25),
 ('NP_001281257.1', 25),
 ('NP_001305459.1', 25),
 ('NP_787127.1', 25),
 ('NP_057293.1', 25),
 ('NP_001305457.1', 25),
 ('NP_001305454.1', 25),
 ('NP_001025189.1', 25),
 ('NP_001305436.1', 22),
 ('NP_001165286.1', 22),
 ('NP_001012780.1', 22),
 ('NP_001305442.1', 22),
 ('NP_001012777.1', 22),
 ('NP_849163.1', 22),
 ('NP_001165287.1', 22),
 ('NP_840101.1', 21),
 ('NP_000092.2', 20),
 ('NP_001281269.1', 20),
 ('NP_653205.3', 20),
 ('NP_037410.1', 20),
 ('NP_955399.1', 17),
 ('NP_003110.1', 17),
 ('NP_722520.2', 15),
 ('NP_001167011.1', 10),
 

In [14]:
patientCohort.list_data_by_tx(tx_id)

{'NM_013275.6': Counter({'TRANSCRIPT_ABLATION': 14,
          'FRAMESHIFT_VARIANT': 171,
          'STOP_GAINED': 65,
          'FEATURE_ELONGATION': 3,
          'CODING_SEQUENCE_VARIANT': 50,
          'FIVE_PRIME_UTR_VARIANT': 44,
          'INTRON_VARIANT': 60,
          'STOP_LOST': 33,
          'FEATURE_TRUNCATION': 55,
          'THREE_PRIME_UTR_VARIANT': 34,
          'MISSENSE_VARIANT': 6,
          'SPLICE_ACCEPTOR_VARIANT': 4,
          'SPLICE_DONOR_VARIANT': 2,
          'DOWNSTREAM_GENE_VARIANT': 1,
          'SPLICE_REGION_VARIANT': 2,
          'INFRAME_DELETION': 2,
          'TRANSCRIPT_AMPLIFICATION': 1})}
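
Because the value returned for each transcript is a `collections.Counter`, the standard `most_common` method can rank the effects without extra code. The sketch below copies the counts from the output above into a literal so that it runs on its own; in the notebook, the same calls would presumably apply to `patientCohort.list_data_by_tx(tx_id)[tx_id]`.

```python
from collections import Counter

# Effect counts copied from the list_data_by_tx output for NM_013275.6 above.
effects = Counter({
    "TRANSCRIPT_ABLATION": 14,
    "FRAMESHIFT_VARIANT": 171,
    "STOP_GAINED": 65,
    "FEATURE_ELONGATION": 3,
    "CODING_SEQUENCE_VARIANT": 50,
    "FIVE_PRIME_UTR_VARIANT": 44,
    "INTRON_VARIANT": 60,
    "STOP_LOST": 33,
    "FEATURE_TRUNCATION": 55,
    "THREE_PRIME_UTR_VARIANT": 34,
    "MISSENSE_VARIANT": 6,
    "SPLICE_ACCEPTOR_VARIANT": 4,
    "SPLICE_DONOR_VARIANT": 2,
    "DOWNSTREAM_GENE_VARIANT": 1,
    "SPLICE_REGION_VARIANT": 2,
    "INFRAME_DELETION": 2,
    "TRANSCRIPT_AMPLIFICATION": 1,
})

# Rank the effects from most to least frequent and show the top three.
top_three = effects.most_common(3)
for effect, count in top_three:
    print(f"{effect}: {count}")

# Sum of all effect annotations for this transcript.
total_annotations = sum(effects.values())
print(total_annotations)
```

The total of these counts is larger than the 326 unique variants in the cohort summary table. That would be consistent with some variants carrying more than one effect annotation, although the source does not state how the counts are tallied.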

In [15]:
patientCohort.list_data_by_tx()

{'NM_178841.4': Counter({'TRANSCRIPT_ABLATION': 20,
          'STOP_LOST': 1,
          'FEATURE_TRUNCATION': 1,
          'CODING_SEQUENCE_VARIANT': 1,
          'FIVE_PRIME_UTR_VARIANT': 1,
          'THREE_PRIME_UTR_VARIANT': 1,
          'INTRON_VARIANT': 1,
          'TRANSCRIPT_AMPLIFICATION': 1}),
 'NM_001384764.1': Counter({'TRANSCRIPT_ABLATION': 35,
          'TRANSCRIPT_AMPLIFICATION': 2}),
 'NM_001351937.2': Counter({'TRANSCRIPT_ABLATION': 5,
          'UPSTREAM_GENE_VARIANT': 1,
          'TRANSCRIPT_AMPLIFICATION': 1}),
 'NM_001384939.1': Counter({'TRANSCRIPT_ABLATION': 8,
          'FEATURE_TRUNCATION': 1,
          'CODING_SEQUENCE_VARIANT': 1,
          'FIVE_PRIME_UTR_VARIANT': 1,
          'INTRON_VARIANT': 1,
          'TRANSCRIPT_AMPLIFICATION': 1}),
 'NM_001318529.2': Counter({'TRANSCRIPT_ABLATION': 24,
          'TRANSCRIPT_AMPLIFICATION': 1}),
 'NM_001351938.2': Counter({'TRANSCRIPT_ABLATION': 5,
          'UPSTREAM_GENE_VARIANT': 1,
          'TRANSCRIPT_AMPLIFI

In [16]:
patientCohort.all_proteins

{ProteinMetadata(id=NP_000092.2, label=Cytochrome b-245 light chain, features=(SimpleProteinFeature(type=FeatureType.REGION, info=FeatureInfo(name=Disordered, start=134, end=195)),)),
 ProteinMetadata(id=NP_000476.1, label=Adenine phosphoribosyltransferase, features=()),
 ProteinMetadata(id=NP_000503.1, label=N-acetylgalactosamine-6-sulfatase, features=(SimpleProteinFeature(type=FeatureType.REGION, info=FeatureInfo(name=Catalytic domain, start=27, end=379)),)),
 ProteinMetadata(id=NP_000968.2, label=Large ribosomal subunit protein eL13, features=()),
 ProteinMetadata(id=NP_001012777.1, label=Cytoplasmic tRNA 2-thiolation protein 2, features=(SimpleProteinFeature(type=FeatureType.REGION, info=FeatureInfo(name=Disordered, start=1, end=24)), SimpleProteinFeature(type=FeatureType.REGION, info=FeatureInfo(name=Disordered, start=188, end=217)))),
 ProteinMetadata(id=NP_001012780.1, label=Cytoplasmic tRNA 2-thiolation protein 2, features=(SimpleProteinFeature(type=FeatureType.REGION, info=Fea

In [17]:
from genophenocorr.analysis import CohortAnalysis

In [18]:
analysis = CohortAnalysis(patientCohort, tx_id, hpo, include_unmeasured=False)

In [19]:
from genophenocorr.model import VariantEffect

In [20]:
VariantEffect.FRAMESHIFT_VARIANT.value

'SO:0001589'

In [21]:
analysis.compare_by_variant_type(VariantEffect.FRAMESHIFT_VARIANT)

HPO term,With frameshift_variant (Count),With frameshift_variant (Percent),Without frameshift_variant (Count),Without frameshift_variant (Percent),p-value,Corrected p-value
HP:0011446 (Abnormality of higher mental function),114,82.61%,106,94.64%,0.003347,0.046856
HP:0001249 (Intellectual disability),100,71.43%,94,86.24%,0.005575,0.078048
HP:0007018 (Attention deficit hyperactivity disorder),35,81.40%,26,66.67%,0.13903,1.0
HP:0000325 (Triangular face),45,71.43%,38,58.46%,0.141518,1.0
HP:0001155 (Abnormality of the hand),100,67.11%,89,72.36%,0.35836,1.0
HP:0012758 (Neurodevelopmental delay),86,94.51%,90,96.77%,0.494487,1.0
HP:0006482 (Abnormality of dental morphology),124,85.52%,100,81.97%,0.504534,1.0
HP:0000365 (Hearing impairment),52,80.00%,45,76.27%,0.666875,1.0
HP:0010938 (Abnormal external nose morphology),71,89.87%,63,92.65%,0.772039,1.0
HP:0000729 (Autistic behavior),27,56.25%,29,60.42%,0.836156,1.0
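
Each row of the table above reports a count and a percentage per group, so the number of individuals in whom the term was actually assessed can be recovered as count / percent (the denominators differ between rows, presumably because of `include_unmeasured=False`). The notebook does not say which test produces the p-values. The sketch below assumes a two-sided Fisher exact test, a common choice for 2x2 count tables, and implements it with the standard library only. `measured_total` and `fisher_exact_two_sided` are hypothetical helpers, not genophenocorr functions.

```python
from math import comb


def measured_total(count: int, percent: float) -> int:
    """Recover the group size from a count and its reported percentage."""
    return round(count / (percent / 100))


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# HP:0011446 row: 114 (82.61%) with a frameshift, 106 (94.64%) without.
n_with = measured_total(114, 82.61)
n_without = measured_total(106, 94.64)
p = fisher_exact_two_sided(114, n_with - 114, 106, n_without - 106)
print(n_with, n_without, p)
```

If SciPy is available, `scipy.stats.fisher_exact` should give the same two-sided value for the same table.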


In [22]:
analysis.compare_by_variant('16_89284634_89284639_GTGTTT_G')

HPO term,With 16_89284634_89284639_GTGTTT_G (Count),With 16_89284634_89284639_GTGTTT_G (Percent),Without 16_89284634_89284639_GTGTTT_G (Count),Without 16_89284634_89284639_GTGTTT_G (Percent),p-value,Corrected p-value
HP:0001249 (Intellectual disability),15,57.69%,179,80.27%,0.012939,0.18115
HP:0011446 (Abnormality of higher mental function),19,73.08%,201,89.73%,0.022575,0.316053
HP:0010938 (Abnormal external nose morphology),13,81.25%,121,92.37%,0.153085,1.0
HP:0007018 (Attention deficit hyperactivity disorder),7,100.00%,54,72.00%,0.182109,1.0
HP:0000534 (Abnormal eyebrow morphology),10,71.43%,116,82.27%,0.299278,1.0
HP:0001155 (Abnormality of the hand),21,77.78%,168,68.57%,0.384609,1.0
HP:0000729 (Autistic behavior),3,42.86%,53,59.55%,0.445576,1.0
HP:0000365 (Hearing impairment),7,70.00%,90,78.95%,0.452661,1.0
HP:0012758 (Neurodevelopmental delay),14,93.33%,162,95.86%,0.500526,1.0
HP:0000343 (Long philtrum),11,73.33%,110,80.29%,0.50892,1.0
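
In the table above, each corrected p-value equals the raw p-value multiplied by 14 and capped at 1.0 (for example, 0.012939 × 14 ≈ 0.18115). That pattern is what a Bonferroni correction over 14 tested terms would produce. Both the method and the number of tests are inferred from the numbers, not stated in the notebook, and only ten rows are shown in the output. A minimal sketch of the check, with `bonferroni` as a hypothetical helper:

```python
from typing import List, Sequence


def bonferroni(p_values: Sequence[float], n_tests: int) -> List[float]:
    """Multiply each p-value by the number of tests and cap the result at 1.0."""
    return [min(1.0, p * n_tests) for p in p_values]


N_TESTS = 14  # assumption: inferred from the corrected/raw ratio in the table

# First three rows of the table above: raw and reported corrected p-values.
raw = [0.012939, 0.022575, 0.153085]
reported = [0.18115, 0.316053, 1.0]

corrected = bonferroni(raw, N_TESTS)
for p, c, r in zip(raw, corrected, reported):
    print(f"raw={p:.6f} recomputed={c:.6f} reported={r:.6f}")
```

The recomputed values agree with the reported ones up to the rounding of the displayed raw p-values.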


In [23]:
analysis2 = CohortAnalysis(patientCohort, tx_id, hpo, include_unmeasured=False, include_large_SV=False)

analysis2.compare_by_exon(9)

HPO term,Inside Exon 9 (Count),Inside Exon 9 (Percent),Outside Exon 9 (Count),Outside Exon 9 (Percent),p-value,Corrected p-value
HP:0004322 (Short stature),107,62.94%,8,36.36%,0.021011,0.294157
HP:0011446 (Abnormality of higher mental function),155,85.64%,23,100.00%,0.050196,0.702743
HP:0001155 (Abnormality of the hand),135,69.95%,19,90.48%,0.070128,0.98179
HP:0001249 (Intellectual disability),137,75.27%,21,91.30%,0.113838,1.0
HP:0007018 (Attention deficit hyperactivity disorder),43,81.13%,8,61.54%,0.151609,1.0
HP:0000365 (Hearing impairment),65,78.31%,9,69.23%,0.48669,1.0
HP:0000534 (Abnormal eyebrow morphology),94,82.46%,10,76.92%,0.703322,1.0
HP:0000729 (Autistic behavior),33,56.90%,8,66.67%,0.749049,1.0
HP:0010938 (Abnormal external nose morphology),98,90.74%,13,92.86%,1.0,1.0
HP:0000325 (Triangular face),57,71.25%,11,73.33%,1.0,1.0


In [24]:
from genophenocorr.model import FeatureType

In [25]:
analysis2.compare_by_protein_feature_type(FeatureType.REGION)

HPO term,Inside REGION (Count),Inside REGION (Percent),Outside REGION (Count),Outside REGION (Percent),p-value,Corrected p-value
HP:0000534 (Abnormal eyebrow morphology),75,78.95%,29,90.62%,0.186927,1.0
HP:0006482 (Abnormality of dental morphology),134,83.23%,47,90.38%,0.267154,1.0
HP:0004322 (Short stature),90,58.06%,25,67.57%,0.352109,1.0
HP:0001155 (Abnormality of the hand),115,70.12%,39,78.00%,0.368574,1.0
HP:0000365 (Hearing impairment),56,74.67%,18,85.71%,0.385103,1.0
HP:0000325 (Triangular face),49,69.01%,19,79.17%,0.436847,1.0
HP:0011446 (Abnormality of higher mental function),135,88.24%,43,84.31%,0.472516,1.0
HP:0000729 (Autistic behavior),32,56.14%,9,69.23%,0.535957,1.0
HP:0001249 (Intellectual disability),119,76.28%,39,79.59%,0.70046,1.0
HP:0007018 (Attention deficit hyperactivity disorder),41,75.93%,10,83.33%,0.718813,1.0


In [26]:
patientCohort.get_protein_features_affected(tx_id)

16_88489784_89491503_DEL does not have a Protein Effect Location
16_87892207_89455452_DEL does not have a Protein Effect Location
16_88743576_89406219_DEL does not have a Protein Effect Location
16_86647052_89511661_DEL does not have a Protein Effect Location
16_89228900_89593971_DEL does not have a Protein Effect Location
16_89481148_89489612_DEL does not have a Protein Effect Location
16_87306530_89269020_DEL does not have a Protein Effect Location
16_89277486_89499248_DEL does not have a Protein Effect Location
16_89182742_89309778_DEL does not have a Protein Effect Location
16_89182742_89309778_DEL does not have a Protein Effect Location
16_89321706_89475518_DEL does not have a Protein Effect Location
16_89182742_89309778_DEL does not have a Protein Effect Location
16_88697053_89277641_DEL does not have a Protein Effect Location
16_88788350_89454555_DEL does not have a Protein Effect Location
16_89056332_89434622_DEL does not have a Protein Effect Location
16_89217282_89506042_DEL 

AttributeError: 'tuple' object has no attribute 'get_features_variant_overlaps'

In [None]:
for var in patientCohort.all_variants:
    print(var.variant_string)
    for tx in var.tx_annotations:
        print(tx.is_preferred)