# DOCK8 Zhang et al. (2009)

Data in this notebook were taken from [Zhang Q, et al. (2009) Combined immunodeficiency associated with DOCK8 mutations. N Engl J Med. 2009 Nov 19;361(21):2046-55. PMID:19776401](https://pubmed.ncbi.nlm.nih.gov/19776401/).

The variants described in the publication were coded as follows.

- Family 1: NM_203447.3(DOCK8):c.1126-395_2971-2751del; chr9:333830-394034 (GRCh38); 
  60.2 kb deletion (ClinVar VCV000000950), homozygous. Deletion A in Family 1 spanned exons 10 through 23.
- Family 2: NC_000009.12:g.311735_398140del, homozygous. Deletion B in Family 2 spanned exons 5 through 24.
- Family 3: patient 3-1 carries the heterozygous c.538-15T>G intron 5 splice mutation (inferred to be NM_203447.4:c.529-15T>G) and the heterozygous deletion chr9:g(278220_285346)_(395056_406292)del.
- Family 4: pE385X (inferred to be NM_203447.4:c.1141G>T (NP_982272.2:p.(E381*))), heterozygous, and chr9:g.(153190_194193)_(528588_534450)del.
- Family 5: R249X (inferred to be NM_203447.4(DOCK8):c.949C>T (p.Arg317Ter), because R249* is listed as an alias in the ClinVar record), heterozygous, and chr9:g(153190_194193)_(351777_360184)del.
- Patient 6-1 carries the c.538-18C>G intron 5 splice mutation (inferred to be NM_203447.4:c.529-18C>G) and also chr9:g(330142_346079)_(395056_292)del.
- Patient 7: compound heterozygous frameshift mutations in the DOCK8 gene: a 1-bp deletion (c.3290delC) and an 8-bp insertion (c.3303_3304insTGGCTGCT).
- Family 8: NM_203447.4(DOCK8):c.1418A>G (p.Lys473Arg), homozygous
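
The deletion descriptions in this list mix precise breakpoints (`g.311735_398140del`) with uncertain ones (`g(a_b)_(c_d)del`). As a rough consistency check, the size implied by each description can be computed from its coordinates. The sketch below is not part of pyphetools: `deletion_size_range` is a hypothetical helper, and it assumes the HGVS convention that in `(a_b)_(c_d)` the outer positions `a` and `d` are the nearest positions known not to be deleted.

```python
import re

# Precise form: g.START_ENDdel (the dot after "g" is optional in this notebook's strings).
PRECISE = re.compile(r"g\.?(\d+)_(\d+)del$")
# Uncertain-breakpoint form: g.(a_b)_(c_d)del
UNCERTAIN = re.compile(r"g\.?\((\d+)_(\d+)\)_\((\d+)_(\d+)\)del$")


def deletion_size_range(description):
    """Return (min_bp, max_bp) for a deletion description, or None if it
    cannot be parsed or its coordinates are not in ascending order."""
    m = UNCERTAIN.search(description)
    if m:
        a, b, c, d = map(int, m.groups())
        if not (a < b <= c < d):
            return None
        # Smallest deletion spans b..c; largest spans a+1..d-1 (assumed HGVS reading).
        return (c - b + 1, d - a - 1)
    m = PRECISE.search(description)
    if m:
        start, end = map(int, m.groups())
        if start > end:
            return None
        size = end - start + 1
        return (size, size)
    return None


for desc in ("NC_000009.12:g.311735_398140del",
             "chr9:g(278220_285346)_(395056_406292)del"):
    print(desc, deletion_size_range(desc))
```

A description whose coordinates are out of order (for example, a truncated end position) returns `None`, which makes such entries easy to spot before they are passed on to variant encoding.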

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.16


In [3]:
PMID = "PMID:19776401"
title = "Combined immunodeficiency associated with DOCK8 mutations"
cite = Citation(pmid=PMID, title=title)
metadata = MetaData(created_by="ORCID:0000-0002-5648-2155", citation=cite)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2023-10-09


In [54]:
df = pd.read_excel("input/Zhang_2009_DOCK8.xlsx")

In [55]:
dft = df.transpose()
dft.columns = dft.iloc[0]
dft.drop(dft.index[0], inplace=True)
dft['patient_id'] = dft.index 
dft.head()

Variable,Age (yr),Deceased,Sex,Ethnicity,Atopic dermatitis,Allergies,Asthma,Skin and soft tissue infections,Respiratory tract infections,Viral infections,Other,Malignancies,Additional history,HIES score,NM_203447.3(var1),NM_203447.3(var2),patient_id
Patient 1-1,6,No,female,Yemeni,(+),"Food - beef, cow's milk, egg, sesame, Environmental - Bermuda grass, mountain cedar",(-),Diaper cellulitis,"Recurrent otitis media, pneumonias",Recurrent orolabial HSV,"Oral candidiasis, tooth decay",No,"Poor growth, high forehead, thinning hair",40,c.1126-395_2971-2751del,c.1126-395_2971-2751del,Patient 1-1
Patient 2-1,21,Yes,male,Lebanese,(+),"Food - avocado, banana, beef, cantaloupe, carrot, cow's milk, cucumber, egg, kiwi, lamb, lentil, mango, mustard, pea, pineapple, pistachio, pomegranate, salmon, sesame, shrimp, watermelon, wheat, yeast, Environmental - seasonal rhinitis to Alternaria, ragweed, trees",(+),"Staphylococcus aureus skin abscesses, axillary lymphadenitis, otitis externa","Recurrent otitis media, sinusitis, adenoviral pneumonia","Diffuse flat warts, herpes zoster",Pericarditis,Metastatic anal squamous cell carcinoma,"Eosinophilic esophagitis, eosinophilic dermatitis, eosinophilic lung disease, bronchiectasis, hypospadius",na,NC_000009.12:g.311735_398140del,NC_000009.12:g.311735_398140del,Patient 2-1
Patient 3-1,18,Yes,female,Caucasian,(+),"Food- catfish, eggs, peanuts, shellfish*",(+),"Staphylococcus aureus skin abscesses, otitis externa","Streptococcus pneumonia, non-typeable Haemophilus influenzae, RSV pneumonias, otitis media","HSV - keratitis, eczema herpeticum, recurrent genital infections; diffuse molluscum contagiosum","Salmonella enteritis, giardiasis, Staphyloccocus aureus osteomyelitis, vaginal candidiasis","Paranasal squamous cell carcinoma, vulvar squamous cell carcinoma, cutaneous T-cell lymphoma",Poor growth,na,c.529-15T>G,chr9:g(278220_285346)_(395056_406292)del,Patient 3-1
Patient 4-1,17,No,male,Caucasian,(+),Drug - cefaclor,(+),No,"Recurrent otitis media, sinusitis, bronchitis, croup","Diffuse flat warts, diffuse molluscum contagiosum, severe primary varicella",No,No,Cavernous angioma,27,c.1141G>T,chr9:g.(153190_194193)_(528588_534450)del,Patient 4-1
Patient 4-2,14,No,female,Caucasian,(+),Drug - penicillin,(+),No,"Recurrent otitis media, sinusitis, Recurrent pneumonia, bronchitis","Recurrent orolabial HSV, diffuse flat warts, diffuse molluscum contagiosum",Salmonella enteritis,No,No,30,c.1141G>T,chr9:g.(153190_194193)_(528588_534450)del,Patient 4-2


In [56]:
generator = SimpleColumnMapperGenerator(df=dft, hpo_cr=hpo_cr, observed="(+)", excluded="(-)")
column_mapper_d = generator.try_mapping_columns()
display(HTML(generator.to_html()))

| Result | Columns |
|---|---|
| Mapped | Atopic dermatitis; Asthma |
| Unmapped | Age (yr); Deceased; Sex; Ethnicity; Allergies; Skin and soft tissue infections; Respiratory tract infections; Viral infections; Other; Malignancies; Additional history; HIES score; NM_203447.3(var1); NM_203447.3(var2); patient_id |


In [57]:
res = OptionColumnMapper.autoformat(df=dft, concept_recognizer=hpo_cr)
print(res)

age_(yr)_d = {'21': 'PLACEHOLDER',
 '18': 'PLACEHOLDER',
 '17': 'PLACEHOLDER',
 '14': 'PLACEHOLDER',
 '13': 'PLACEHOLDER',
 '16': 'PLACEHOLDER'}
age_(yr)Mapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=age_(yr)_d)
age_(yr)Mapper.preview_column(df['Age (yr)'])
column_mapper_d['Age (yr)'] = age_(yr)Mapper

deceased_d = {'No': 'PLACEHOLDER',
 'Yes': 'PLACEHOLDER'}
deceasedMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=deceased_d)
deceasedMapper.preview_column(df['Deceased'])
column_mapper_d['Deceased'] = deceasedMapper

ethnicity_d = {'Yemeni': 'PLACEHOLDER',
 'Lebanese': 'PLACEHOLDER',
 'Caucasian': 'PLACEHOLDER',
 'Mexican': 'PLACEHOLDER'}
ethnicityMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=ethnicity_d)
ethnicityMapper.preview_column(df['Ethnicity'])
column_mapper_d['Ethnicity'] = ethnicityMapper

atopic_dermatitis_d = {'(+)': 'PLACEHOLDER'}
atopic_dermatitisMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=atopic_dermat

In [53]:
allergies_d = {'Food - beef': 'Meat allergen allergy',
 "cow's milk": 'Dairy allergy',
 'egg': 'Egg allergy',
 'pistachio': 'Nut food product allergy',
 'salmon': 'Seafood allergy',
 'shrimp': 'Seafood allergy',
 'ragweed': 'Seasonal allergy',
 'trees': 'Seasonal allergy',
 'Food- catfish': 'Seafood allergy',
 'eggs': 'Egg allergy',
 'peanuts': 'Nut food product allergy',
 'shellfish*': 'Seafood allergy',
 'Drug - cefaclor': 'Drug allergy',
 'Drug - penicillin': 'Drug allergy',
 'fish': 'Seafood allergy',
 'peanut': 'Nut food product allergy',
 'shellfish': 'Seafood allergy',
 'tree nuts': 'Nut food product allergy',
 'sulfa;\nEnvironmental -\ngrass': 'Drug allergy',
 "Food - cow's milk": 'Dairy allergy',
 'tree nuts*;\nDrug -\nclarithromycin': 'Nut food product allergy',
 'penicillin': 'Drug allergy',
 'sulfa;\nEnvironmental - cat': 'Drug allergy',
 'Food - crab': 'Seafood allergy',
 'tree nuts*;\nEnvironmental -\nseasonal rhinitis': 'Nut food product allergy',
 "goat's milk": 'Dairy allergy',
 'tuna*;\nDrug - sulfa': 'Drug allergy',
 'chicken': 'Meat allergen allergy',
 'pork': 'Meat allergen allergy',
 'tomato*; Drug -\nCefipime': 'Drug allergy',
 'Lactinex': 'Drug allergy',
 'pork*;\nEnvironmental -\ndust': 'Meat allergen allergy'}
allergiesMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=allergies_d)
allergiesMapper.preview_column(dft['Allergies'])
column_mapper_d['Allergies'] = allergiesMapper
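
Several keys in `allergies_d` carry spreadsheet line wraps and more than one item (for example `'tuna*;\nDrug - sulfa'`), so only one of the items in such a cell gets a term. One possible way to inspect the individual items first is to split each cell into (category, item) pairs. This is a sketch independent of pyphetools; `split_allergy_cell` is a hypothetical helper, and the three category prefixes are assumed from the values visible in the table.

```python
import re

# Category prefixes as they appear in the Allergies column (assumed to be exhaustive).
CATEGORY = re.compile(r"(Food|Drug|Environmental)\s*-\s*(.*)")


def split_allergy_cell(text):
    """Split a free-text allergy cell into (category, item) pairs.

    Items listed before any category prefix get the category None.
    """
    text = re.sub(r"\s*\n\s*", " ", text)  # undo line wraps inside the cell
    pairs = []
    category = None
    for chunk in re.split(r"[;,]", text):
        chunk = chunk.strip().rstrip("*").strip()  # drop footnote asterisks
        if not chunk:
            continue
        m = CATEGORY.match(chunk)
        if m:
            category, chunk = m.group(1), m.group(2).strip()
        if chunk:
            pairs.append((category, chunk))
    return pairs


print(split_allergy_cell("Food- catfish, eggs, peanuts, shellfish*"))
print(split_allergy_cell("tuna*;\nDrug - sulfa"))
```

The resulting items could then be used to review whether each one has a suitable entry in `allergies_d`.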


In [41]:
skin_and_soft_tissue_infections_d = {'Diaper cellulitis': 'Cellulitis',
 'Staphylococcus aureus skin abscesses': 'Skin abscess',
 'axillary lymphadenitis': 'Lymphadenitis',
 'otitis externa': 'Otitis externa',
 'Staphylococcus aureus impetigo and skin infections': 'Cellulitis',
 'Skin abscesses': 'Skin abscess',
 'impetigo': 'Recurrent bacterial skin infections',
 'Staphylococcus aureus skin infections': 'Cellulitis',
 'Acinetobacter baumanii otitis externa': 'Otitis externa'}
excluded = {'No': 'Skin abscess',}
skin_and_soft_tissue_infectionsMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=skin_and_soft_tissue_infections_d,
                                                          excluded_d=excluded)
skin_and_soft_tissue_infectionsMapper.preview_column(dft['Skin and soft tissue infections'])
column_mapper_d['Skin and soft tissue infections'] = skin_and_soft_tissue_infectionsMapper

| | original text | terms |
|---|---|---|
| 0 | Diaper cellulitis | HP:0100658 (Cellulitis/observed) |
| 1 | Staphylococcus aureus skin abscesses, axillary lymphadenitis, otitis externa | HP:0031292 (Skin abscess/observed); HP:0002840 (Lymphadenitis/observed); HP:0410017 (Otitis externa/observed) |
| 2 | Staphylococcus aureus skin abscesses, otitis externa | HP:0031292 (Skin abscess/observed); HP:0410017 (Otitis externa/observed) |
| 3 | No | HP:0031292 (Skin abscess/excluded) |
| 4 | No | HP:0031292 (Skin abscess/excluded) |
| 5 | Staphylococcus aureus skin abscesses | HP:0031292 (Skin abscess/observed) |
| 6 | Staphylococcus aureus impetigo and skin infections, otitis externa | HP:0100658 (Cellulitis/observed); HP:0410017 (Otitis externa/observed) |
| 7 | Skin abscesses, impetigo | HP:0031292 (Skin abscess/observed); HP:0005406 (Recurrent bacterial skin infections/observed) |
| 8 | Staphylococcus aureus skin abscesses, otitis externa | HP:0031292 (Skin abscess/observed); HP:0410017 (Otitis externa/observed) |
| 9 | Staphylococcus aureus skin infections, otitis externa | HP:0100658 (Cellulitis/observed); HP:0410017 (Otitis externa/observed) |


In [59]:
respiratory_tract_infections_d = {'Recurrent otitis media': 'Recurrent otitis media',
 'pneumonias': 'Recurrent pneumonia',
 'sinusitis': 'Sinusitis',
 'adenoviral pneumonia': 'Pneumonia',
 'Streptococcus pneumonia': 'Pneumonia',
 'RSV pneumonias': 'Recurrent pneumonia',
 'otitis media': 'Recurrent otitis media',
 'bronchitis': 'Recurrent bronchitis',
 'croup': 'Subglottic laryngitis',
 'Recurrent pneumonia': 'Recurrent pneumonia',
 'mastoiditis': 'Mastoiditis',
 'Pneumocystis jirovecii pneumonia': 'Pneumocystis jirovecii pneumonia',
 'Haemophilus influenzae pneumonia': 'Pneumonia',
 'Recurrent pneumonias': 'Recurrent pneumonia',
 'Otitis media': 'Recurrent otitis media',
 'pneumonia': 'Recurrent pneumonia'}

excluded = { 'No': 'Pneumonia', }
respiratory_tract_infectionsMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=respiratory_tract_infections_d, excluded_d=excluded)
respiratory_tract_infectionsMapper.preview_column(dft['Respiratory tract infections'])
column_mapper_d['Respiratory tract infections'] = respiratory_tract_infectionsMapper

In [48]:
viral_infections_d = {'Recurrent orolabial HSV': 'Recurrent oral herpes',
 'Diffuse flat warts': 'Verrucae',
 #'herpes zoster': 'PLACEHOLDER',
 'HSV - keratitis': 'Keratitis',
 'eczema herpeticum': 'Eczema',
 'recurrent genital infections; diffuse molluscum contagiosum': 'Molluscum contagiosum',
 'diffuse molluscum contagiosum': 'Molluscum contagiosum',
 'severe primary varicella': 'Severe varicella zoster infection',
 'diffuse flat warts': 'Verrucae',
 'Diffuse molluscum contagiosum': 'Molluscum contagiosum',
 'verrucous warts on fingers': 'Verrucae',
 'recurrent herpes zoster': 'Recurrent herpes',
 'persistent orolabial HSV': 'Recurrent oral herpes',
 'Flat and verrucous warts on face': 'Verrucae',}
viral_infectionsMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=viral_infections_d)
viral_infectionsMapper.preview_column(dft['Viral infections'])
column_mapper_d['Viral infections'] = viral_infectionsMapper

In [28]:
other_d = {'Oral candidiasis': 'Chronic mucocutaneous candidiasis',
 'tooth decay': 'Carious teeth',
 'Pericarditis': 'Pericarditis',
# 'Salmonella enteritis': 'PLACEHOLDER',
 'giardiasis': 'Severe giardiasis', # infer severe
 'Staphyloccocus aureus osteomyelitis': 'Osteomyelitis',
 'vaginal candidiasis': 'Recurrent vulvovaginal candidiasis',
 'nail candidiasis': 'Onychomycosis',
 'Vaginal candidiasis': 'Recurrent vulvovaginal candidiasis',
 'Haemophilus\ninfluenzae and\ncryptococcal meningitis; recurrent\nStaphylococcus\naureus and\nAcinetobacter\nbaumanii sepsis': 'Cryptococcal meningitis'}
otherMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=other_d)
otherMapper.preview_column(dft['Other'])
column_mapper_d['Other'] = otherMapper

In [62]:
malignancies_d = {
 'Metastatic anal squamous cell carcinoma': 'Squamous cell carcinoma',
 'Paranasal squamous cell carcinoma': 'Squamous cell carcinoma',
 'vulvar squamous cell carcinoma': 'Squamous cell carcinoma of the vulva',
 'cutaneous T-cell lymphoma': 'Cutaneous T-cell lymphoma',
 'Vulvar squamous cell carcinoma': 'Squamous cell carcinoma of the vulva'}
excluded = {'No': 'Neoplasm',}
malignanciesMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=malignancies_d, excluded_d=excluded)
malignanciesMapper.preview_column(dft['Malignancies'])
column_mapper_d['Malignancies'] = malignanciesMapper

In [65]:
additional_history_d = {'Poor growth': 'Growth delay',
 'high forehead': 'High forehead',

 'Eosinophilic esophagitis': 'Eosinophilic infiltration of the esophagus',
 'eosinophilic dermatitis': 'Eosinophilic dermal infiltration',
 'eosinophilic lung disease': 'Eosinophilic pneumonia',
 'bronchiectasis': 'Bronchiectasis',
 'hypospadius': 'Hypospadias',
 'Cavernous angioma': 'Cavernous hemangioma',
 'bronchiectasis and lung cyst': 'Pulmonary cyst',
 'high-arched palate': 'High palate',
 'hyperextensibility': 'Joint hyperextensibility',
 'High-arched palate': 'High palate',
 #'minimal trauma fracture': 'PLACEHOLDER',
 'Bronchiectasis': 'Bronchiectasis',
 'scoliosis': 'Scoliosis',
 'Retained primary teeth': 'Persistence of primary teeth',
 'pneumatocele': 'Pulmonary pneumatocele',
 'poor growth': 'Growth delay',
 'Delayed puberty': 'Delayed puberty'}
additional_historyMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=additional_history_d)
additional_historyMapper.preview_column(dft['Additional history'])
column_mapper_d['Additional history'] = additional_historyMapper

In [68]:
var_list = list(dft["NM_203447.3(var1)"].unique())
var_list.extend(list(dft["NM_203447.3(var2)"].unique()))
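
Because `unique()` is applied to each column separately, a variant that occurs in both columns (as for the homozygous cases) appears twice in `var_list`, and any empty cells would come through as non-string values. A minimal sketch of building an order-preserving list of distinct variant strings follows; `unique_variant_strings` is a hypothetical helper, not a pyphetools function.

```python
def unique_variant_strings(*columns):
    """Return distinct, non-empty string values from the given iterables,
    in order of first appearance. Non-string values (e.g. NaN) are skipped."""
    seen = {}
    for column in columns:
        for value in column:
            if isinstance(value, str) and value.strip():
                seen.setdefault(value, None)  # dict keeps insertion order
    return list(seen)


# Toy input; in the notebook the two variant columns of `dft` would be passed.
var1 = ["c.1126-395_2971-2751del", "c.529-15T>G", "c.1141G>T"]
var2 = ["c.1126-395_2971-2751del", float("nan"),
        "chr9:g.(153190_194193)_(528588_534450)del"]
print(unique_variant_strings(var1, var2))
```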

In [71]:
var_d = {}
dock8_transcript = "NM_203447.3"
vvalidator = VariantValidator(genome_build="hg38", transcript=dock8_transcript)
for v in var_list:
    print(v)
    if v.startswith("c."):
        var = vvalidator.encode_hgvs(v)
    else:
        var = StructuralVariant.chromosomal_deletion(cell_contents=v, 
                                                     gene_symbol="DOCK8",
                                                     gene_id="HGNC:19191"
                                                    )
    var.set_heterozygous()
    var_d[v] = var

c.1126-395_2971-2751del
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_203447.3%3Ac.1126-395_2971-2751del/NM_203447.3?content-type=application%2Fjson
NC_000009.12:g.311735_398140del
c.529-15T>G
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_203447.3%3Ac.529-15T>G/NM_203447.3?content-type=application%2Fjson
c.1141G>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_203447.3%3Ac.1141G>T/NM_203447.3?content-type=application%2Fjson
c.949C>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_203447.3%3Ac.949C>T/NM_203447.3?content-type=application%2Fjson
c.529-18C>G
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_203447.3%3Ac.529-18C>G/NM_203447.3?content-type=application%2Fjson
c.3290delC
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_203447.3%3Ac.3290delC/NM_203447.3?content-type=application%2Fjson


ValueError: Expecting to get a gene_variant from Variant Validator but got warning
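
The loop stops at the first variant that is rejected (here `c.3290delC`), so the variants after it are never checked. One way to see all problem variants in a single pass is to collect failures instead of aborting. The sketch below uses hypothetical names: `encode_all` is an illustrative helper and `fake_encode` is a stand-in for `vvalidator.encode_hgvs`. It assumes failures surface as `ValueError`, as in the output above; the cause of the rejection itself is not shown in this notebook.

```python
def encode_all(variants, encode):
    """Apply `encode` to each variant string.

    Returns (encoded, failures): two dicts keyed by the variant string,
    holding the encoded object or the error message respectively.
    """
    encoded, failures = {}, {}
    for v in variants:
        try:
            encoded[v] = encode(v)
        except ValueError as err:
            failures[v] = str(err)
    return encoded, failures


def fake_encode(v):
    """Stand-in encoder that rejects one variant, mimicking the error above."""
    if v == "c.3290delC":
        raise ValueError("Expecting to get a gene_variant from Variant Validator but got warning")
    return f"encoded:{v}"


encoded, failures = encode_all(["c.1141G>T", "c.3290delC", "c.949C>T"], fake_encode)
for variant, message in failures.items():
    print(f"FAILED {variant}: {message}")
```

The variants listed in `failures` can then be reviewed individually against the transcript before rerunning the cell.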