<h1>KBG Syndrome</h1>
<p>Data from <a href="https://pubmed.ncbi.nlm.nih.gov/36446582/" target="_blank">Martinez-Cayuelas E, et al. Clinical description, molecular delineation and genotype-phenotype correlation in 340 patients with KBG syndrome: addition of 67 new patients. J Med Genet. 2022 Nov 29:jmedgenet-2022-108632. PMID: 36446582</a>.</p>

In [1]:
import os
import hpotk

In [2]:
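# Input locations
fpath_hpo = 'hpo_data/hp.json'        # HPO in OBO Graphs JSON format
cache_dir = 'annotations'             # cache for variant and protein annotations
fpath_phenopackets = 'phenopackets'   # folder with the cohort phenopackets
# Transcript and protein of interest
tx_id = 'NM_013275.6'
protein_id = 'Q15327'
# Create the cache folder if it does not exist yet
os.makedirs(cache_dir, exist_ok=True)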

In [3]:
from genophenocorr.phenotype import PhenotypeCreator

In [4]:
hpo: hpotk.ontology.Ontology = hpotk.ontology.load.obographs.load_ontology(fpath_hpo)
validators = [
    hpotk.validate.AnnotationPropagationValidator(hpo),
    hpotk.validate.ObsoleteTermIdsValidator(hpo),
    hpotk.validate.PhenotypicAbnormalityValidator(hpo)
]
phenotype_creator = PhenotypeCreator(hpo, hpotk.validate.ValidationRunner(validators))

In [5]:
from genophenocorr.protein import UniprotProteinMetadataService, ProteinAnnotationCache, ProtCachingFunctionalAnnotator
from genophenocorr.variant import VarCachingFunctionalAnnotator, VariantAnnotationCache, VepFunctionalAnnotator
from genophenocorr.patient import PhenopacketPatientCreator

In [6]:
# Protein metadata
pm = UniprotProteinMetadataService()
pac = ProteinAnnotationCache(cache_dir)
pfa = ProtCachingFunctionalAnnotator(pac, pm)

# Functional annotator
vac = VariantAnnotationCache(cache_dir)
vep = VepFunctionalAnnotator(pfa)
vfa = VarCachingFunctionalAnnotator(vac, vep)


# Assemble the patient creator
pc = PhenopacketPatientCreator(phenotype_creator, vfa)

In [8]:
from genophenocorr.cohort import load_phenopacket_folder

In [11]:
patientCohort = load_phenopacket_folder(fpath_phenopackets, pc)

Expected at least one HPO term per patient, but received none for patient VanDongen2019_P2
Expected at least one HPO term per patient, but received none for patient VanDongen2019_P12
Expected at least one HPO term per patient, but received none for patient Reuter2020
Expected at least one HPO term per patient, but received none for patient Novara, 2017_P10
Expected at least one variant per patient, but received none for patient Parenti2016_P1
Expected at least one HPO term per patient, but received none for patient VanDongen2019_P13
Expected at least one HPO term per patient, but received none for patient VanDongen2019_P8
Expected at least one HPO term per patient, but received none for patient VanDongen2019_P4
Expected at least one HPO term per patient, but received none for patient VanDongen2019_P5
Expected at least one HPO term per patient, but received none for patient KBG31B
Expected at least one HPO term per patient, but received none for patient VanDongen2019_P9
Expected at leas
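
The messages above flag patients that were loaded without HPO terms or without variants. As a minimal sketch, assuming the messages have been captured as plain text lines (how the library emits them is not shown here), they could be grouped by the kind of missing data. The pattern and the `group_warnings` helper are illustrative and not part of genophenocorr.

```python
import re
from collections import defaultdict

# Pattern inferred from the messages printed above. Patient IDs may contain
# commas and spaces (e.g. "Novara, 2017_P10"), so the ID runs to the end of the line.
WARNING_RE = re.compile(
    r"^Expected at least one (HPO term|variant) per patient, "
    r"but received none for patient (.+)$"
)

def group_warnings(lines):
    """Group patient IDs by the kind of data reported as missing."""
    missing = defaultdict(list)
    for line in lines:
        match = WARNING_RE.match(line.strip())
        if match:  # incomplete or unrelated lines are skipped
            kind, patient_id = match.groups()
            missing[kind].append(patient_id)
    return dict(missing)

sample = [
    "Expected at least one HPO term per patient, but received none for patient VanDongen2019_P2",
    "Expected at least one HPO term per patient, but received none for patient Novara, 2017_P10",
    "Expected at least one variant per patient, but received none for patient Parenti2016_P1",
    "Expected at leas",  # truncated line, ignored by the pattern
]
missing = group_warnings(sample)
for kind, patients in missing.items():
    print(f"Missing {kind}: {len(patients)} patient(s)")
```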

In [12]:
patientCohort.list_all_phenotypes()

[('HP:0006482', 224),
 ('HP:0011446', 220),
 ('HP:0001249', 194),
 ('HP:0001155', 189),
 ('HP:0012758', 176),
 ('HP:0004322', 150),
 ('HP:0010938', 134),
 ('HP:0000534', 126),
 ('HP:0000343', 121),
 ('HP:0000365', 97),
 ('HP:0000325', 83),
 ('HP:0000356', 77),
 ('HP:0007018', 61),
 ('HP:0000729', 56)]
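
The list above reports bare HPO identifiers. A small sketch of attaching human-readable labels follows. The labels are copied from the comparison tables later in this notebook, and `with_labels` is a hypothetical helper. With the loaded ontology, the labels could probably be looked up directly (for example via something like `hpo.get_term(term_id).name`, which is an assumption about the hpotk API to check against its documentation).

```python
# Labels as they appear in the comparison tables of this notebook.
HPO_LABELS = {
    "HP:0006482": "Abnormality of dental morphology",
    "HP:0011446": "Abnormality of higher mental function",
    "HP:0001249": "Intellectual disability",
    "HP:0001155": "Abnormality of the hand",
    "HP:0012758": "Neurodevelopmental delay",
    "HP:0004322": "Short stature",
    "HP:0010938": "Abnormal external nose morphology",
    "HP:0000534": "Abnormal eyebrow morphology",
    "HP:0000343": "Long philtrum",
    "HP:0000365": "Hearing impairment",
    "HP:0000325": "Triangular face",
    "HP:0000356": "Abnormality of the outer ear",
    "HP:0007018": "Attention deficit hyperactivity disorder",
    "HP:0000729": "Autistic behavior",
}

# Output of list_all_phenotypes() as printed above.
phenotype_counts = [
    ("HP:0006482", 224), ("HP:0011446", 220), ("HP:0001249", 194),
    ("HP:0001155", 189), ("HP:0012758", 176), ("HP:0004322", 150),
    ("HP:0010938", 134), ("HP:0000534", 126), ("HP:0000343", 121),
    ("HP:0000365", 97), ("HP:0000325", 83), ("HP:0000356", 77),
    ("HP:0007018", 61), ("HP:0000729", 56),
]

def with_labels(counts, labels):
    """Return (term_id, label, count) triples; unknown IDs get a placeholder label."""
    return [(term_id, labels.get(term_id, "<unknown>"), n) for term_id, n in counts]

labelled = with_labels(phenotype_counts, HPO_LABELS)
for term_id, label, n in labelled:
    print(f"{term_id}  {label:<45} {n}")
```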

In [13]:
patientCohort.list_all_variants()

[('16_89284634_GTGTTT/G', 34),
 ('16_89284129_CTTTTT/C', 10),
 ('16_89284140_TTTTC/T', 9),
 ('16_89285157_GTTTC/G', 8),
 ('16_89275181_-/G', 5),
 ('16_89279750_-/G', 5),
 ('16_89217282_deletion', 4),
 ('16_89277486_deletion', 4),
 ('16_89182742_deletion', 4),
 ('16_89284363_CTTTG/C', 3),
 ('16_89274958_C/G', 3),
 ('16_88197356_deletion', 3),
 ('16_89284358_GAT/G', 3),
 ('16_89284345_G/A', 3),
 ('16_89284565_G/C', 3),
 ('16_89284524_duplication', 3),
 ('16_89282136_C/T', 3),
 ('16_89282710_T/A', 3),
 ('16_89283314_CCTTT/C', 3),
 ('16_89285224_G/A', 2),
 ('16_89282455_G/A', 2),
 ('16_89206685_deletion', 2),
 ('16_89279326_G/A', 2),
 ('16_89275128_G/A', 2),
 ('16_89283496_CG/C', 2),
 ('16_89262070_deletion', 2),
 ('16_89321706_deletion', 2),
 ('16_89280752_G/T', 2),
 ('16_89282158_-/T', 2),
 ('16_89285153_TTTTG/T', 2),
 ('16_89268636_C/A', 2),
 ('16_89283233_-/T', 2),
 ('16_88788350_deletion', 2),
 ('16_89281054_C/A', 2),
 ('16_89282834_CTGTT/C', 2),
 ('16_89228900_deletion', 2),
 ('16_89
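
The variant keys printed above appear to follow one of two shapes: `chrom_pos_ref/alt` for sequence variants and `chrom_pos_type` for large structural variants such as `deletion` or `duplication`. That layout is inferred from the output only, not from the library documentation. Under that assumption, a key could be split into its parts as sketched below. `VariantKey` and `parse_variant_key` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class VariantKey:
    chrom: str
    pos: int
    ref: Optional[str] = None      # set for sequence variants
    alt: Optional[str] = None      # set for sequence variants
    sv_type: Optional[str] = None  # set for structural variants, e.g. "deletion"

def parse_variant_key(key: str) -> VariantKey:
    """Split a key such as '16_89284634_GTGTTT/G' or '16_89217282_deletion'."""
    chrom, pos, rest = key.split("_", 2)
    if "/" in rest:
        ref, alt = rest.split("/", 1)
        return VariantKey(chrom, int(pos), ref=ref, alt=alt)
    return VariantKey(chrom, int(pos), sv_type=rest)

for key in ("16_89284634_GTGTTT/G", "16_89275181_-/G", "16_89217282_deletion"):
    print(parse_variant_key(key))
```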

In [14]:
patientCohort.list_all_patients()

['KBG18',
 'Ockeloen2015_P1',
 'KBG55',
 'Goldenberg2016_P16',
 'Low, 2016_P5 (6)',
 'KBG59',
 'Novara, 2017_P7',
 'Ockeloen2015_P19',
 'Low, 2016_34 (32)',
 'Ockeloen2015_P7',
 'Sirmaci2011_P4 (previously published Brancati, 2004)',
 'Scarano, 2013_P8',
 'KBG38',
 'Kutkowska-Kazmierczak2021_P10',
 'KBG64',
 'Goldenberg2016_P8',
 'Ockeloen2015_P20',
 'Goldenberg2016_P3',
 'Youngs2011',
 'VanDongen2019_P6',
 'Low, 2016_P24 (21)',
 'Parenti2021_P11',
 'Kutkowska-Kazmierczak2021_P13',
 'Parenti2021_P15',
 'Goldenberg2016_P14',
 'KBG31B',
 'VanDongen2019_P9',
 'KBG20',
 'Ockeloen2015_P8',
 'KBG33',
 'Low, 2016_P27 (24)',
 'Alves, 2019',
 'Goldenberg2016_P25',
 'Parenti2021_P7',
 'Gnazzo, 2020_P5',
 'Parenti2021_P2',
 'Low, 2016_P10 (18)',
 'Low, 2016_P2 (26)',
 'Scarano, 2013_P12',
 'Parenti2021_P10',
 'Goldenberg2016_P27',
 'Gnazzo, 2020_P30',
 'Sayed, 2020_P2',
 'Sirmaci2011_P3/F1? (previously published Tekin, 2004)',
 'Low, 2016_P15 (3)',
 'Scarano, 2013_P10',
 'Sirmaci2011_P5',
 'Golde

In [15]:
patientCohort.list_all_proteins()

[('NP_001243112.1', 337),
 ('NP_001243111.1', 337),
 ('NP_037407.4', 337),
 ('NP_872337.2', 46),
 ('NP_004924.1', 37),
 ('NP_777577.2', 29),
 ('NP_001230208.1', 29),
 ('NP_001120686.1', 29),
 ('NP_001136336.2', 25),
 ('NP_001305459.1', 25),
 ('NP_000503.1', 25),
 ('NP_001025189.1', 25),
 ('NP_001305453.1', 25),
 ('NP_000476.1', 25),
 ('NP_112190.2', 25),
 ('NP_001305454.1', 25),
 ('NP_057293.1', 25),
 ('NP_005178.4', 25),
 ('NP_787127.1', 25),
 ('NP_001305457.1', 25),
 ('NP_001073956.2', 25),
 ('NP_001305458.1', 25),
 ('NP_001281257.1', 25),
 ('NP_001305456.1', 25),
 ('NP_001305461.1', 25),
 ('NP_001305455.1', 25),
 ('NP_001012780.1', 22),
 ('NP_849163.1', 22),
 ('NP_001165286.1', 22),
 ('NP_001305436.1', 22),
 ('NP_001012777.1', 22),
 ('NP_001165287.1', 22),
 ('NP_001305442.1', 22),
 ('NP_840101.1', 21),
 ('NP_000092.2', 20),
 ('NP_002452.1', 20),
 ('NP_001281269.1', 20),
 ('NP_037410.1', 20),
 ('NP_653205.3', 20),
 ('NP_003110.1', 19),
 ('NP_955399.1', 19),
 ('NP_722520.2', 15),
 ('N

In [16]:
patientCohort.list_data_by_tx(tx_id)

{'NM_013275.6': Counter({'stop_gained': 52,
          'downstream_gene_variant': 1,
          'stop_lost': 28,
          'feature_truncation': 51,
          'coding_sequence_variant': 43,
          '5_prime_UTR_variant': 42,
          '3_prime_UTR_variant': 29,
          'intron_variant': 54,
          'frameshift_variant': 97,
          'transcript_ablation': 14,
          'splice_donor_variant': 2,
          'transcript_amplification': 1,
          'missense_variant': 7,
          'inframe_deletion': 2,
          'splice_acceptor_variant': 2,
          'splice_region_variant': 2,
          'feature_elongation': 1})}
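
Because the per-transcript result is a `collections.Counter`, the variant effects can be ranked with its standard methods. The sketch below rebuilds the counter from the values printed above instead of calling the library, and reports each effect's share of all counted annotations.

```python
from collections import Counter

# Counts for NM_013275.6 as printed above.
effects = Counter({
    "stop_gained": 52, "downstream_gene_variant": 1, "stop_lost": 28,
    "feature_truncation": 51, "coding_sequence_variant": 43,
    "5_prime_UTR_variant": 42, "3_prime_UTR_variant": 29,
    "intron_variant": 54, "frameshift_variant": 97,
    "transcript_ablation": 14, "splice_donor_variant": 2,
    "transcript_amplification": 1, "missense_variant": 7,
    "inframe_deletion": 2, "splice_acceptor_variant": 2,
    "splice_region_variant": 2, "feature_elongation": 1,
})

total = sum(effects.values())
top_three = effects.most_common(3)  # sorted by count, highest first

for effect, n in top_three:
    print(f"{effect}: {n} ({n / total:.1%} of {total} annotations)")
```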

In [17]:
patientCohort.list_data_by_tx()

{'NM_001256182.2': Counter({'stop_gained': 52,
          'downstream_gene_variant': 1,
          'stop_lost': 28,
          'feature_truncation': 51,
          'coding_sequence_variant': 43,
          '5_prime_UTR_variant': 42,
          '3_prime_UTR_variant': 29,
          'intron_variant': 54,
          'frameshift_variant': 97,
          'transcript_ablation': 14,
          'splice_donor_variant': 2,
          'transcript_amplification': 1,
          'missense_variant': 7,
          'inframe_deletion': 2,
          'splice_acceptor_variant': 2,
          'splice_region_variant': 2,
          'feature_elongation': 1}),
 'NM_001384929.1': Counter({'transcript_ablation': 8,
          'feature_truncation': 1,
          'coding_sequence_variant': 1,
          '5_prime_UTR_variant': 1,
          'intron_variant': 1,
          'transcript_amplification': 1}),
 'NM_001384918.1': Counter({'transcript_ablation': 8,
          'feature_truncation': 1,
          'coding_sequence_variant': 1,
   

In [18]:
patientCohort.all_proteins

{ProteinMetadata(id=NP_000092.2, label=Cytochrome b-245 light chain, features=(SimpleProteinFeature(type=FeatureType.REGION, info=FeatureInfo(name=Disordered, start=134, end=195)),)),
 ProteinMetadata(id=NP_000476.1, label=Adenine phosphoribosyltransferase, features=()),
 ProteinMetadata(id=NP_000503.1, label=N-acetylgalactosamine-6-sulfatase, features=(SimpleProteinFeature(type=FeatureType.REGION, info=FeatureInfo(name=Catalytic domain, start=27, end=379)),)),
 ProteinMetadata(id=NP_000968.2, label=Large ribosomal subunit protein eL13, features=()),
 ProteinMetadata(id=NP_001012777.1, label=Cytoplasmic tRNA 2-thiolation protein 2, features=(SimpleProteinFeature(type=FeatureType.REGION, info=FeatureInfo(name=Disordered, start=1, end=24)), SimpleProteinFeature(type=FeatureType.REGION, info=FeatureInfo(name=Disordered, start=188, end=217)))),
 ProteinMetadata(id=NP_001012780.1, label=Cytoplasmic tRNA 2-thiolation protein 2, features=(SimpleProteinFeature(type=FeatureType.REGION, info=Fea

In [19]:
from genophenocorr.cohort import CohortAnalysis

In [20]:
analysis = CohortAnalysis(patientCohort, 'NM_013275.6', hpo, include_unmeasured=False)

In [21]:
from genophenocorr.constants import VariantEffect

In [22]:
VariantEffect.FRAMESHIFT_VARIANT.value

'SO:0001589'

In [23]:
analysis.compare_by_variant_type(VariantEffect.FRAMESHIFT_VARIANT)

HPO term,With frameshift_variant (Count),With frameshift_variant (Percent),Without frameshift_variant (Count),Without frameshift_variant (Percent),p-value
HP:0011446 (Abnormality of higher mental function),114,82.01%,106,95.50%,0.001371
HP:0001249 (Intellectual disability),100,70.92%,94,87.04%,0.003175
HP:0007018 (Attention deficit hyperactivity disorder),35,81.40%,26,66.67%,0.13903
HP:0000325 (Triangular face),45,71.43%,38,58.46%,0.141518
HP:0006482 (Abnormality of dental morphology),125,85.62%,99,81.82%,0.408845
HP:0001155 (Abnormality of the hand),101,67.33%,88,72.13%,0.428461
HP:0012758 (Neurodevelopmental delay),86,94.51%,90,96.77%,0.494487
HP:0000365 (Hearing impairment),52,80.00%,45,76.27%,0.666875
HP:0010938 (Abnormal external nose morphology),71,89.87%,63,92.65%,0.772039
HP:0000729 (Autistic behavior),27,56.25%,29,60.42%,0.836156
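
The output does not state which statistical test produces the p-values. As a hedged illustration only, the sketch below assumes a two-sided Fisher's exact test on a 2x2 table of patients with and without the phenotype in each group, which is a common choice for this kind of comparison. The counts for HP:0011446 are derived from the Count and Percent columns (114 of 139 with a frameshift variant, 106 of 111 without). `fisher_exact_two_sided` is a hypothetical helper written with the standard library, not the library's own routine.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability of x "phenotype present" patients in the first group.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

# HP:0011446: present/absent with frameshift vs. present/absent without.
p_value = fisher_exact_two_sided(114, 139 - 114, 106, 111 - 106)
print(f"p = {p_value:.6f}")
```

If the library uses a different test or a multiple-testing correction, the value printed here will not match the table.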


In [24]:
analysis.compare_by_variant('16_89284634_GTGTTT/G')

16_89270870_G/A == 16_89284634_GTGTTT/G
16_89275181_-/G == 16_89284634_GTGTTT/G
16_89279840_AT/A == 16_89284634_GTGTTT/G
16_89284811_-/A == 16_89284634_GTGTTT/G
16_89283304_deletion == 16_89284634_GTGTTT/G
16_87886395_deletion == 16_89284634_GTGTTT/G
16_89284634_GTGTTT/G == 16_89284634_GTGTTT/G
16_89283415_GGATT/G == 16_89284634_GTGTTT/G
16_89290674_GATGC/G == 16_89284634_GTGTTT/G
16_89056332_deletion == 16_89284634_GTGTTT/G
16_89275192_C/G == 16_89284634_GTGTTT/G
16_89476288_duplication == 16_89284634_GTGTTT/G
16_89284358_GAT/G == 16_89284634_GTGTTT/G
16_89279914_C/A == 16_89284634_GTGTTT/G
16_89275181_-/G == 16_89284634_GTGTTT/G
16_89282158_-/T == 16_89284634_GTGTTT/G
16_89285369_G/C == 16_89284634_GTGTTT/G
16_89269019_deletion == 16_89284634_GTGTTT/G
16_89284601_GG/A == 16_89284634_GTGTTT/G
16_89283349_T/A == 16_89284634_GTGTTT/G
16_89283775_GCT/G == 16_89284634_GTGTTT/G
16_89283094_G/A == 16_89284634_GTGTTT/G
16_89280029_-/G == 16_89284634_GTGTTT/G
16_89279566_CCTTCGGGG/C == 16_892

HPO term,With 16_89284634_GTGTTT/G (Count),With 16_89284634_GTGTTT/G (Percent),Without 16_89284634_GTGTTT/G (Count),Without 16_89284634_GTGTTT/G (Percent),p-value
HP:0001249 (Intellectual disability),15,55.56%,179,80.63%,0.005872
HP:0011446 (Abnormality of higher mental function),19,70.37%,201,90.13%,0.007468
HP:0010938 (Abnormal external nose morphology),13,81.25%,121,92.37%,0.153085
HP:0007018 (Attention deficit hyperactivity disorder),7,100.00%,54,72.00%,0.182109
HP:0000534 (Abnormal eyebrow morphology),10,71.43%,116,82.27%,0.299278
HP:0001155 (Abnormality of the hand),22,78.57%,167,68.44%,0.385988
HP:0000729 (Autistic behavior),3,42.86%,53,59.55%,0.445576
HP:0000365 (Hearing impairment),7,70.00%,90,78.95%,0.452661
HP:0012758 (Neurodevelopmental delay),14,93.33%,162,95.86%,0.500526
HP:0000343 (Long philtrum),11,73.33%,110,80.29%,0.50892


In [25]:
analysis2 = CohortAnalysis(patientCohort, 'NM_013275.6', hpo, include_unmeasured=False, include_large_SV=False)

analysis2.compare_by_exon(9)

HPO term,Inside Exon 9 (Count),Inside Exon 9 (Percent),Outside Exon 9 (Count),Outside Exon 9 (Percent),p-value
HP:0004322 (Short stature),107,62.94%,8,36.36%,0.021011
HP:0011446 (Abnormality of higher mental function),155,85.16%,23,100.00%,0.049316
HP:0001155 (Abnormality of the hand),136,70.10%,19,90.48%,0.070007
HP:0001249 (Intellectual disability),137,74.86%,21,91.30%,0.114119
HP:0007018 (Attention deficit hyperactivity disorder),43,81.13%,8,61.54%,0.151609
HP:0000365 (Hearing impairment),65,78.31%,9,69.23%,0.48669
HP:0000534 (Abnormal eyebrow morphology),94,82.46%,10,76.92%,0.703322
HP:0000729 (Autistic behavior),33,56.90%,8,66.67%,0.749049
HP:0000325 (Triangular face),57,71.25%,11,73.33%,1.0
HP:0010938 (Abnormal external nose morphology),98,90.74%,13,92.86%,1.0
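
The table lists counts and percentages but not the group sizes behind them. Since `include_unmeasured=False` was passed, the denominators presumably differ per phenotype. They can be estimated by dividing each count by its percentage, as in the sketch below. `group_size` is a hypothetical helper, and the estimate is only as precise as the two-decimal rounding of the percentages.

```python
def group_size(count: int, percent: float) -> int:
    """Estimate how many patients a percentage refers to (percent in 0-100)."""
    if percent <= 0:
        raise ValueError("group size cannot be recovered from a zero percentage")
    return round(count / (percent / 100))

# Short stature (HP:0004322), from the table above.
inside_exon_9 = group_size(107, 62.94)   # 170
outside_exon_9 = group_size(8, 36.36)    # 22

print(f"Inside exon 9: 107 of {inside_exon_9} patients")
print(f"Outside exon 9: 8 of {outside_exon_9} patients")
```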


In [26]:
from genophenocorr.protein import FeatureType

In [27]:
analysis.compare_by_protein_feature_type(FeatureType.REGION)

HPO term,Inside REGION (Count),Inside REGION (Percent),Outside REGION (Count),Outside REGION (Percent),p-value
HP:0000325 (Triangular face),54,71.05%,29,55.77%,0.091036
HP:0000365 (Hearing impairment),57,75.00%,40,83.33%,0.372195
HP:0001249 (Intellectual disability),122,76.25%,72,80.90%,0.429425
HP:0000356 (Abnormality of the outer ear),48,82.76%,29,74.36%,0.443133
HP:0000534 (Abnormal eyebrow morphology),79,79.80%,47,83.93%,0.66886
HP:0000343 (Long philtrum),79,80.61%,42,77.78%,0.679394
HP:0001155 (Abnormality of the hand),119,70.41%,70,67.96%,0.685622
HP:0012758 (Neurodevelopmental delay),108,96.43%,68,94.44%,0.713501
HP:0000729 (Autistic behavior),33,56.90%,23,60.53%,0.833053
HP:0004322 (Short stature),93,58.49%,57,57.00%,0.897211
