# KBG Syndrome (Variants in ANKRD11)

This notebook extracts data from [Martinez-Cayuelas E, et al. (2022) Clinical description, molecular delineation and genotype-phenotype correlation in 340 patients with KBG syndrome: addition of 67 new patients](https://pubmed.ncbi.nlm.nih.gov/36446582/).

According to the authors, all variants are reported relative to the hg19 genome assembly and the ANKRD11 transcript NM_013275.6.

There appears to be an error with the variant shown for individual KBG42 (VariantValidator says: "NM_013275.6:c.227G>A: Variant reference (G) does not agree with reference sequence (A)"). Therefore, we remove this row from the analysis.

Note also that c.5483G>T;p.Ser1828* (Parenti2016_P1) should be c.5483C>A, according to the chromatogram in PMID:25652421. We corrected this in the Excel file.
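
The same kind of correction could alternatively be applied in code rather than in the spreadsheet, which would keep the change visible in the notebook. The following is a minimal sketch only and is not what was done here: `apply_variant_correction` is a hypothetical helper, the toy data frame stands in for the supplemental table, and the protein-level part of the string is left untouched on the assumption that only the cDNA notation needed fixing.

```python
import pandas as pd

def apply_variant_correction(df, patient_id, old_cdna, new_cdna,
                             id_col="Patient ID", variant_col="Variant"):
    """Return a copy of df in which old_cdna is replaced by new_cdna for one patient.

    Hypothetical helper for illustration; the column names follow the supplemental table.
    """
    out = df.copy()
    mask = out[id_col] == patient_id
    # regex=False so that '.' and '>' in the HGVS string are treated literally
    out.loc[mask, variant_col] = out.loc[mask, variant_col].str.replace(
        old_cdna, new_cdna, regex=False
    )
    return out

# Toy stand-in for the supplemental table (two rows only)
toy = pd.DataFrame({
    "Patient ID": ["Parenti2016_P1", "KBG3"],
    "Variant": ["c.5483G>T;p.Ser1828*", "c.2398_2401del;p.Glu800Asnfs*62"],
})
fixed = apply_variant_correction(toy, "Parenti2016_P1", "c.5483G>T", "c.5483C>A")
print(fixed["Variant"].tolist())
```

Because the helper works on a copy, the original table is left unchanged and the correction can be reviewed before it is used downstream.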

The authors do not provide all phenotypic data reported in the original publication, but in some cases provide summaries.

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import HTML, display
from pyphetools.creation import *
from pyphetools.validation import *
from pyphetools.visualization import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.61


In [2]:
PMID = "PMID:36446582"
title = "Clinical description, molecular delineation and genotype-phenotype correlation in 340 patients with KBG syndrome: addition of 67 new patients"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser("../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-02-27


In [3]:
df = pd.read_excel("input/Martinez-KBG-SupplTable-340-v2.xlsx")
# Drop the row with the erroneous variant as mentioned above
df = df.drop(df[df['Patient ID'] == "KBG42"].index)
df.head()

Unnamed: 0,Patient origin (1=our cohort; 2=literature),Patient ID,Gender (1=male; 2=female),ID,ASD,ADHD,History of developmental delay (motor and language >18m),ID/ADHD/ASD,Macrodontia and/or other dental anomalies,"Characteristic nose (anteverted, bulbous, and/or prominent)",...,Postnatal short stature <p10,N comorbidities (N=7),Hearing Loss and/or otitis media,"Other comorbidities (seizures, cardiopathy, visual, feeding, cryptorchydism)",Phenotypic score (columns F:P),Variant type (SNV vs CNV),Exon 9 (Yes/No),c.1903_1907del;p.Lys635GInfs*26 (Yes/No),Deletion size (CNV),Variant
0,1,KBG1,1.0,Yes,,,Yes,Yes,,,...,,3.0,,Yes,8,CNV,CNV,CNV,17000,16q24.3(89336307_89354085)x1
1,1,KBG2,1.0,Yes,,,No,Yes,,,...,,1.0,,Yes,4,CNV,CNV,CNV,250000,16q24.3(89256478_89506223)x1
2,1,KBG3,2.0,,,,Yes,,,,...,,3.0,Yes,Yes,5,SNV,Yes,No,SNV,c.2398_2401del;p.Glu800Asnfs*62
3,1,KBG4,2.0,Yes,,,Yes,Yes,,No,...,No,1.0,No,Yes,6,SNV,Yes,No,SNV,c.7083del;p.Thr2362Profs*39
4,1,KBG5,1.0,,,,,,,,...,,,,,0,SNV,Yes,Yes,SNV,c.1903_1907del;p.Lys635GInfs*26


In [4]:
column_mapper_d = {}

items = {
    'ID':['Intellectual disability','HP:0001249'],
    'ASD':['Autistic behavior','HP:0000729'],
    'ADHD':['Attention deficit hyperactivity disorder', 'HP:0007018'],
    'History of developmental delay (motor and language >18m)': ['Global developmental delay', 'HP:0001263'],
    'ID/ADHD/ASD':['Abnormality of mental function', 'HP:0011446'],
    'Macrodontia and/or other dental anomalies':['Macrodontia', 'HP:0001572'],
    'Characteristic nose (anteverted, bulbous, and/or prominent)': ['Abnormal external nose morphology', 'HP:0010938'],
    'Triangular face': ['Triangular face', 'HP:0000325'],
    'Characteristic eyebrows': ['Thick eyebrow', 'HP:0000574'],
    'Long philtrum': ['Long philtrum', 'HP:0000343'],
    'Characteristic ears (large, prominent, and/or low-set)': ['Abnormality of the outer ear', 'HP:0000356'],
    'Hand anomalies (brachydactyly or clinodactyly)': ['Abnormality of the hand', 'HP:0001155'],
    'Postnatal short stature <p10': ['Short stature', 'HP:0004322'],
    'Hearing Loss and/or otitis media': ['Hearing impairment', 'HP:0000365']
}

item_column_mapper_d = hpo_cr.initialize_simple_column_maps(column_name_to_hpo_label_map=items, observed='Yes', excluded='No')
for k, v in item_column_mapper_d.items():
    column_mapper_d[k] = v
print(f"We created {len(column_mapper_d)} simple column mappers")
column_mapper_list = list(column_mapper_d.values())

We created 14 simple column mappers
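
To illustrate what the `observed='Yes'` and `excluded='No'` arguments imply, here is a plain-Python sketch that does not use pyphetools. The function `classify_cell` and its status labels are hypothetical. The sketch assumes that any cell that is neither `Yes` nor `No` (for example an empty cell) is treated as not measured, which would be consistent with the NOT_MEASURED entries reported by the quality-control step.

```python
def classify_cell(value, observed="Yes", excluded="No"):
    """Classify one table cell as observed, excluded, or not measured (hypothetical helper)."""
    if value == observed:
        return "observed"
    if value == excluded:
        return "excluded"
    # Anything else, including NaN from empty spreadsheet cells, carries no information
    return "not measured"

# Toy row using three column names from the mapping above
row = {"ID": "Yes", "ASD": float("nan"), "Long philtrum": "No"}
statuses = {column: classify_cell(value) for column, value in row.items()}
print(statuses)
```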


In [5]:
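def get_hgvs_if_possible_otherwise_cnv(value):
    """Tag a variant string as 'SNV;<cDNA HGVS>' or 'CNV;<original string>'."""
    # HGVS entries have the form 'c.…;p.…', so a semicolon marks a small variant
    if ";" in value:
        fields = value.split(";")
        return f"SNV;{fields[0]}"
    else:
        # No semicolon: the entry is a chromosomal CNV such as 16q24.3(start_end)x1
        return f"CNV;{value}"

df["ANKRD11_variant"] = df["Variant"].apply(get_hgvs_if_possible_otherwise_cnv)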
df.head()

Unnamed: 0,Patient origin (1=our cohort; 2=literature),Patient ID,Gender (1=male; 2=female),ID,ASD,ADHD,History of developmental delay (motor and language >18m),ID/ADHD/ASD,Macrodontia and/or other dental anomalies,"Characteristic nose (anteverted, bulbous, and/or prominent)",...,N comorbidities (N=7),Hearing Loss and/or otitis media,"Other comorbidities (seizures, cardiopathy, visual, feeding, cryptorchydism)",Phenotypic score (columns F:P),Variant type (SNV vs CNV),Exon 9 (Yes/No),c.1903_1907del;p.Lys635GInfs*26 (Yes/No),Deletion size (CNV),Variant,ANKRD11_variant
0,1,KBG1,1.0,Yes,,,Yes,Yes,,,...,3.0,,Yes,8,CNV,CNV,CNV,17000,16q24.3(89336307_89354085)x1,CNV;16q24.3(89336307_89354085)x1
1,1,KBG2,1.0,Yes,,,No,Yes,,,...,1.0,,Yes,4,CNV,CNV,CNV,250000,16q24.3(89256478_89506223)x1,CNV;16q24.3(89256478_89506223)x1
2,1,KBG3,2.0,,,,Yes,,,,...,3.0,Yes,Yes,5,SNV,Yes,No,SNV,c.2398_2401del;p.Glu800Asnfs*62,SNV;c.2398_2401del
3,1,KBG4,2.0,Yes,,,Yes,Yes,,No,...,1.0,No,Yes,6,SNV,Yes,No,SNV,c.7083del;p.Thr2362Profs*39,SNV;c.7083del
4,1,KBG5,1.0,,,,,,,,...,,,,0,SNV,Yes,Yes,SNV,c.1903_1907del;p.Lys635GInfs*26,SNV;c.1903_1907del
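
The rule used above relies on the `;` separator, which appears only in the `cDNA;protein` notation. A minimal standalone sketch of the same rule with an explicit check for empty cells follows. `classify_variant` is a hypothetical name, and raising an error for missing values is an assumption about the desired behavior, not something the notebook does.

```python
def classify_variant(value):
    """Return 'SNV;<cDNA>' for HGVS entries and 'CNV;<string>' otherwise (hypothetical helper)."""
    # Empty spreadsheet cells arrive as NaN (a float), so reject anything that is not text
    if not isinstance(value, str) or not value.strip():
        raise ValueError(f"Missing or non-string variant: {value!r}")
    if ";" in value:
        cdna = value.split(";")[0].strip()
        return f"SNV;{cdna}"
    return f"CNV;{value.strip()}"

# Two entries taken from the table above
print(classify_variant("c.7083del;p.Thr2362Profs*39"))
print(classify_variant("16q24.3(89336307_89354085)x1"))
```

Note that the `SNV` prefix here covers any small variant described in HGVS notation, including the deletions and duplications shown in the table.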


In [6]:
ANKRD11_transcript = 'NM_013275.6'
vvalidator = VariantValidator(genome_build="hg38", transcript=ANKRD11_transcript)
var_d = {}
for item in df["ANKRD11_variant"].unique():
    fields = item.split(";")
    if fields[0] == "SNV":
        v = fields[1]
        var = vvalidator.encode_hgvs(v)
        var_d[item] = var
    elif fields[0] == "CNV":
        # CNV coordinates in the source table refer to hg19, so record the assembly in the label
        cnv = fields[1] + " (hg19)"
        if "x1" in cnv:
            var = StructuralVariant.chromosomal_deletion(cell_contents=cnv, gene_symbol="ANKRD11", gene_id="HGNC:21316")
        elif "x3" in cnv:
            var = StructuralVariant.chromosomal_duplication(cell_contents=cnv, gene_symbol="ANKRD11", gene_id="HGNC:21316")
        else:
            raise ValueError(f"Unrecognized cnv type: {cnv}")
        var_d[item] = var
print(f"Extracted {len(var_d)} distinct variants")

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_013275.6%3Ac.2398_2401del/NM_013275.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_013275.6%3Ac.7083del/NM_013275.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_013275.6%3Ac.1903_1907del/NM_013275.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_013275.6%3Ac.7407C>G/NM_013275.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_013275.6%3Ac.6691dup/NM_013275.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_013275.6%3Ac.3590_3594del/NM_013275.6?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_013275.6%3Ac.6792dup/NM_013275.6?content-type=application%2Fjso
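
The CNV strings are stored verbatim in the structural variant label. If the coordinates are needed separately, they could be extracted with a regular expression. The sketch below is an assumption about how such strings might be parsed and is not part of pyphetools: `parse_cnv` and the pattern are hypothetical, the pattern covers only the `band(start_end)xN` form seen in this table, and the span is computed on the assumption that both coordinates are inclusive.

```python
import re

# Matches strings such as 16q24.3(89336307_89354085)x1
CNV_PATTERN = re.compile(
    r"^(?P<chrom>\d+|X|Y)(?P<band>[pq][\d.]+)"
    r"\((?P<start>\d+)_(?P<end>\d+)\)x(?P<copies>\d+)$"
)

def parse_cnv(value):
    """Split a CNV string into chromosome, band, coordinates, and copy number (hypothetical helper)."""
    match = CNV_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"Unrecognized CNV format: {value!r}")
    start = int(match.group("start"))
    end = int(match.group("end"))
    return {
        "chrom": match.group("chrom"),
        "band": match.group("band"),
        "start": start,
        "end": end,
        "copies": int(match.group("copies")),
        "length": end - start + 1,  # assumes inclusive coordinates
    }

cnv = parse_cnv("16q24.3(89336307_89354085)x1")
print(cnv)
```

A copy number of 1 corresponds to the `x1` deletions and 3 to the `x3` duplications handled in the cell above. The computed span can be compared against the rounded values in the `Deletion size (CNV)` column as a quick sanity check.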

In [7]:
varMapper = VariantColumnMapper(variant_d=var_d,
                                variant_column_name="ANKRD11_variant",
                                default_genotype="heterozygous")

In [10]:
sexMapper = SexColumnMapper(male_symbol='1.0', female_symbol='2.0', unknown_symbol='nan', column_name='Gender (1=male; 2=female)')
#sexMapper.preview_column(df['Gender (1=male; 2=female)'])

In [11]:
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr,
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="Patient ID",
                        sexmapper=sexMapper,
                        age_of_onset_mapper=AgeColumnMapper.not_provided(), 
                        age_at_last_encounter_mapper=AgeColumnMapper.not_provided(),
                        variant_mapper=varMapper,
                        metadata=metadata)
kbg = Disease(disease_id='OMIM:148050', disease_label='KBG syndrome')
encoder.set_disease(kbg)

In [12]:
individuals = encoder.get_individuals()
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
ERROR,INSUFFICIENT_HPOS,11
WARNING,REDUNDANT,224
INFORMATION,NOT_MEASURED,2321

ID,Level,Category,Message,HPO Term
PMID_36446582_KBG31B,ERROR,INSUFFICIENT_HPOS,Minimum HPO terms required 1 but only 0 found,
"PMID_36446582_Novara,_2017_P10",ERROR,INSUFFICIENT_HPOS,Minimum HPO terms required 1 but only 0 found,
PMID_36446582_Reuter2020,ERROR,INSUFFICIENT_HPOS,Minimum HPO terms required 1 but only 0 found,
PMID_36446582_VanDongen2019_P12,ERROR,INSUFFICIENT_HPOS,Minimum HPO terms required 1 but only 0 found,
PMID_36446582_VanDongen2019_P13,ERROR,INSUFFICIENT_HPOS,Minimum HPO terms required 1 but only 0 found,
PMID_36446582_VanDongen2019_P2,ERROR,INSUFFICIENT_HPOS,Minimum HPO terms required 1 but only 0 found,
PMID_36446582_VanDongen2019_P4,ERROR,INSUFFICIENT_HPOS,Minimum HPO terms required 1 but only 0 found,
PMID_36446582_VanDongen2019_P5,ERROR,INSUFFICIENT_HPOS,Minimum HPO terms required 1 but only 0 found,
PMID_36446582_VanDongen2019_P7,ERROR,INSUFFICIENT_HPOS,Minimum HPO terms required 1 but only 0 found,
PMID_36446582_VanDongen2019_P8,ERROR,INSUFFICIENT_HPOS,Minimum HPO terms required 1 but only 0 found,


In [13]:
individuals = cvalidator.get_error_free_individual_list()
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_html()))

In [21]:
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
KBG1 (MALE; P1Y),KBG syndrome (OMIM:148050),16q24.3(89336307_89354085)x1 (hg19): chromosomal_deletion (SO:1000029),Intellectual disability (HP:0001249); Global developmental delay (HP:0001263); Triangular face (HP:0000325); Thick eyebrow (HP:0000574); Long philtrum (HP:0000343); excluded: Abnormality of the outer ear (HP:0000356)
KBG2 (MALE; P1Y),KBG syndrome (OMIM:148050),16q24.3(89256478_89506223)x1 (hg19): chromosomal_deletion (SO:1000029),Intellectual disability (HP:0001249); Triangular face (HP:0000325); Thick eyebrow (HP:0000574); excluded: Global developmental delay (HP:0001263); excluded: Long philtrum (HP:0000343)
KBG3 (FEMALE; P1Y),KBG syndrome (OMIM:148050),NM_013275.6:c.2398_2401del (heterozygous),Global developmental delay (HP:0001263); Thick eyebrow (HP:0000574); Hearing impairment (HP:0000365); excluded: Long philtrum (HP:0000343)
KBG4 (FEMALE; P1Y),KBG syndrome (OMIM:148050),NM_013275.6:c.7083del (heterozygous),Intellectual disability (HP:0001249); Global developmental delay (HP:0001263); Triangular face (HP:0000325); Thick eyebrow (HP:0000574); Long philtrum (HP:0000343); excluded: Abnormal external nose morphology (HP:0010938); excluded: Abnormality of the hand (HP:0001155); excluded: Short stature (HP:0004322); excluded: Hearing impairment (HP:0000365)
KBG5 (MALE; P1Y),KBG syndrome (OMIM:148050),NM_013275.6:c.1903_1907del (heterozygous),excluded: Long philtrum (HP:0000343)
KBG64 (MALE; P1Y),KBG syndrome (OMIM:148050),NM_013275.6:c.7407C>G (heterozygous),Intellectual disability (HP:0001249); Short stature (HP:0004322); Hearing impairment (HP:0000365)
KBG6 (MALE; P1Y),KBG syndrome (OMIM:148050),NM_013275.6:c.6691dup (heterozygous),Intellectual disability (HP:0001249); Attention deficit hyperactivity disorder (HP:0007018); Global developmental delay (HP:0001263); Macrodontia (HP:0001572); Abnormal external nose morphology (HP:0010938); Triangular face (HP:0000325); Thick eyebrow (HP:0000574); Long philtrum (HP:0000343); Abnormality of the outer ear (HP:0000356); Abnormality of the hand (HP:0001155); Short stature (HP:0004322); Hearing impairment (HP:0000365); excluded: Autistic behavior (HP:0000729)
KBG10A (MALE; P1Y),KBG syndrome (OMIM:148050),NM_013275.6:c.3590_3594del (heterozygous),Autistic behavior (HP:0000729); Attention deficit hyperactivity disorder (HP:0007018); Global developmental delay (HP:0001263); Hearing impairment (HP:0000365); excluded: Short stature (HP:0004322)
KBG10B (FEMALE; P1Y),KBG syndrome (OMIM:148050),NM_013275.6:c.3590_3594del (heterozygous),Macrodontia (HP:0001572); Triangular face (HP:0000325); Abnormality of the outer ear (HP:0000356); Abnormality of the hand (HP:0001155); excluded: Abnormal external nose morphology (HP:0010938); excluded: Thick eyebrow (HP:0000574); excluded: Long philtrum (HP:0000343); excluded: Short stature (HP:0004322)
KBG8A (FEMALE; P1Y),KBG syndrome (OMIM:148050),NM_013275.6:c.6792dup (heterozygous),Global developmental delay (HP:0001263); Abnormal external nose morphology (HP:0010938); Triangular face (HP:0000325); Abnormality of the outer ear (HP:0000356); Abnormality of the hand (HP:0001155); Short stature (HP:0004322); excluded: Autistic behavior (HP:0000729); excluded: Thick eyebrow (HP:0000574); excluded: Long philtrum (HP:0000343); excluded: Hearing impairment (HP:0000365)


In [14]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals, 
                                             metadata=metadata,
                                             outdir=output_directory)

We output 328 GA4GH phenopackets to the directory phenopackets


In [None]:
# pxf validate --hpo hp.json *.json
# no errors
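
As a lightweight complement to the external validator, the output directory could also be checked from Python. The following is only a sketch: `summarize_phenopacket_dir` is a hypothetical helper, it assumes one JSON file per individual, and it checks nothing beyond the presence of the top-level `id` field. The demonstration uses toy files in a temporary directory rather than the real output.

```python
import json
import tempfile
from pathlib import Path

def summarize_phenopacket_dir(directory):
    """Count JSON files in a directory and list those without a top-level 'id' (hypothetical helper)."""
    paths = sorted(Path(directory).glob("*.json"))
    missing_id = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
        if not data.get("id"):
            missing_id.append(path.name)
    return len(paths), missing_id

# Toy demonstration with two made-up files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.json").write_text(json.dumps({"id": "toy_individual_1"}), encoding="utf-8")
    (Path(tmp) / "b.json").write_text(json.dumps({"subject": {}}), encoding="utf-8")
    n_files, without_id = summarize_phenopacket_dir(tmp)
print(n_files, without_id)
```

Calling the helper on the `phenopackets` directory would give a file count that can be compared with the number reported when the phenopackets were written.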