<h1>WFS1: Strom et al. (1998)</h1>
<p>Data derived from <a href="https://pubmed.ncbi.nlm.nih.gov/9817917/" target="_blank">Strom, et al. (1998) Diabetes insipidus, diabetes mellitus, optic atrophy and deafness (DIDMOAD) caused by mutations in a novel gene (wolframin) coding for a predicted transmembrane protein</a>.</p>

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.39


<h2>Importing HPO data</h2>

In [2]:
PMID = "PMID:9817917"
title = "Diabetes insipidus, diabetes mellitus, optic atrophy and deafness (DIDMOAD) caused by mutations in a novel gene (wolframin) coding for a predicted transmembrane protein"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-5648-2155", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-01-16


<h2>Importing the supplemental table</h2>

In [3]:
df = pd.read_excel('input/PMID_9817917.xlsx')

In [4]:
df.head(2)

Unnamed: 0,Family,Patient,Sex,Age,Diabetes mellitus,Progressive optic atrophy,Hearing impairment,Diabetes insipidus,Abnormality of the kidney,Neurological abnormalities,Other complications,Consangui,Variant
0,1.0,5519,f,22.0,+,+,+,+,+,"Ataxia, nystagmus","Retarded sexual maturation, depression",-,1380del9
1,1.0,13883,f,11.0,+,-,-,-,+,-,-,-,1380del9


Some column names may have leading or trailing spaces, and a couple of columns are subheadings that contain only NaNs, so let's fix both. We also remove individuals who lack either an age or a variant in this gene.

In [5]:
df.columns = df.columns.str.strip()
df = df.dropna(axis=1, how='all')
df['patient_id'] = df['Patient']
df = df[~df['Age'].isna()]
df = df[~df['Variant'].isna()]
df.head()

Unnamed: 0,Family,Patient,Sex,Age,Diabetes mellitus,Progressive optic atrophy,Hearing impairment,Diabetes insipidus,Abnormality of the kidney,Neurological abnormalities,Other complications,Consangui,Variant,patient_id
0,1.0,5519,f,22.0,+,+,+,+,+,"Ataxia, nystagmus","Retarded sexual maturation, depression",-,1380del9,5519
1,1.0,13883,f,11.0,+,-,-,-,+,-,-,-,1380del9,13883
2,2.0,13775,f,20.0,+,+,+,+,-,-,-,-,460+1G>A,13775
3,2.0,13776,m,17.0,+,+,+,+,+,-,Retarded sexual maturation,-,460+1G>A,13776
4,4.0,13070,f,22.0,+,+,+,-,+,Abnormal EEG,Psychiatric illness,-,599delT,13070
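The cleaning steps above can be tried on a toy table first. This is a minimal sketch: the `raw` frame and its values are made up for illustration, and `dropna(subset=...)` is used as a compact equivalent of the two `~isna()` filters.

```python
import pandas as pd

# Toy table (illustrative values only) with the same kinds of problems:
# padded headers, an all-NaN subheading column, and rows missing an age
# or a variant.
raw = pd.DataFrame({
    " Patient ": [1, 2, 3],
    "Subheading": [None, None, None],
    "Age ": [22.0, None, 17.0],
    " Variant": ["599delT", "676C>T", None],
})

clean = raw.copy()
clean.columns = clean.columns.str.strip()          # remove padding around headers
clean = clean.dropna(axis=1, how="all")            # drop columns that are entirely NaN
clean = clean.dropna(subset=["Age", "Variant"])    # keep rows with both age and variant
print(clean)
```

Only the first toy row survives, because the second lacks an age and the third lacks a variant.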


<h2>Column mappers</h2>

In [6]:
generator = SimpleColumnMapperGenerator(df=df, observed='+', excluded='-', hpo_cr=hpo_cr)
column_mapper_list = generator.try_mapping_columns()
display(HTML(generator.to_html()))

Result,Columns
Mapped,Diabetes mellitus; Hearing impairment; Diabetes insipidus; Abnormality of the kidney
Unmapped,Family; Patient; Sex; Age; Progressive optic atrophy; Neurological abnormalities; Other complications; Consangui; Variant; patient_id
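For the mapped columns, each cell is read as observed (`+`) or excluded (`-`). The sketch below illustrates that idea in plain Python; `classify_cell` is a hypothetical helper written for this example and is not part of pyphetools. The sample row uses the values of patient 13883 from the table above.

```python
def classify_cell(cell, observed="+", excluded="-"):
    """Hypothetical helper: classify one table cell by its symbol."""
    value = str(cell).strip()
    if value == observed:
        return "observed"
    if value == excluded:
        return "excluded"
    return "not_measured"  # anything else is left unclassified


# Values of patient 13883 for the four automatically mapped columns.
row = {
    "Diabetes mellitus": "+",
    "Hearing impairment": "-",
    "Diabetes insipidus": "-",
    "Abnormality of the kidney": "+",
}
status = {column: classify_cell(value) for column, value in row.items()}
print(status)
```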


In [7]:
neurological = {'Ataxia': 'Ataxia',
                'nystagmus': 'Nystagmus',
                'Abnormal EEG': 'EEG abnormality'}
neurologicalMapper = OptionColumnMapper(column_name='Neurological abnormalities',
                                        concept_recognizer=hpo_cr, option_d=neurological)
column_mapper_list.append(neurologicalMapper)
neurologicalMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Ataxia (HP:0001251) (observed),3
1,Nystagmus (HP:0000639) (observed),2
2,EEG abnormality (HP:0002353) (observed),2


In [8]:
other = {'Retarded sexual maturation': 'Puberty and gonadal disorders',
         'depression': 'Depression',
         'psychiatric illness': 'Atypical behavior',
         'cataract': 'Cataract',
         'mental retardation': 'Intellectual disability',
         'ragged red fibers': 'Ragged-red muscle fibers'}
otherMapper = OptionColumnMapper(column_name='Other complications',concept_recognizer=hpo_cr, option_d=other)
column_mapper_list.append(otherMapper)
otherMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Puberty and gonadal disorders (HP:0008373) (observed),4
1,Depression (HP:0000716) (observed),1
2,Atypical behavior (HP:0000708) (observed),2
3,Cataract (HP:0000518) (observed),2
4,Intellectual disability (HP:0001249) (observed),1
5,Ragged-red muscle fibers (HP:0003200) (observed),1
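The free-text columns hold several findings per cell, so each dictionary key is treated as an option to look for in the cell text. The following is a rough sketch of that lookup, assuming case-insensitive substring matching; `map_options` is a hypothetical function for illustration and may differ from what `OptionColumnMapper` does internally.

```python
def map_options(cell, option_d):
    """Hypothetical helper: return labels whose keys occur in the cell text."""
    text = str(cell).lower()
    return [label for key, label in option_d.items() if key.lower() in text]


other_options = {
    "Retarded sexual maturation": "Puberty and gonadal disorders",
    "depression": "Depression",
    "psychiatric illness": "Atypical behavior",
}

print(map_options("Retarded sexual maturation, depression", other_options))
print(map_options("Psychiatric illness", other_options))
print(map_options("-", other_options))  # no option matches
```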


<h2>Variant Data</h2>
<p>The variants are listed in the Variant column in the paper's cDNA notation. Below they are converted to HGVS expressions on transcript NM_006005.3 and validated against hg38.</p>

In [9]:
hg38 = 'hg38'
default_genotype = 'heterozygous'
WFS1_transcript='NM_006005.3'
vvalidator = VariantValidator(genome_build=hg38, transcript=WFS1_transcript)
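# Note: some cells carry a leading space (e.g. ' 599delT' vs '599delT'),
# so unique() can list the same variant twice.
variant_list = df['Variant'].unique()
print(variant_list)
variant_d = {}
for v in variant_list:
    # The paper's '1380del9' is entered as the HGVS deletion c.1385_1393del.
    if v == "1380del9":
        hgvs = "c.1385_1393del"
    else:
        hgvs = f"c.{v}"
    print(f"{v} - {hgvs}")
    var = vvalidator.encode_hgvs(hgvs)
    print(f"{v}: {var}")
    # The dictionary is keyed by the raw cell value, whitespace included.
    variant_d[v] = var
print(f"Extracted {len(variant_d)} unique variants")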

['1380del9' ' 460+1G>A' ' 599delT' ' 1096C>T' '676C>T' '599delT' '1096C>T'
 '1558C>T']
1380del9 - c.1385_1393del
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_006005.3%3Ac.1385_1393del/NM_006005.3?content-type=application%2Fjson
1380del9: NM_006005.3:c.1385_1393del(chr4:6301174CCACCGAGGT>C)
 460+1G>A - c. 460+1G>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_006005.3%3Ac. 460+1G>A/NM_006005.3?content-type=application%2Fjson
 460+1G>A: NM_006005.3:c.460+1G>A(chr4:6289132G>A)
 599delT - c. 599delT
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_006005.3%3Ac. 599delT/NM_006005.3?content-type=application%2Fjson
 599delT: NM_006005.3:c.599del(chr4:6291334CT>C)
 1096C>T - c. 1096C>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_006005.3%3Ac. 1096C>T/NM_006005.3?content-type=application%2Fjson
 1096C>T: NM_006005.3:c.1096C>T(chr4:6300891C>T)
676C>T - c.676C>T
https://rest.var
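The printed list above contains entries that differ only by a leading space. As a minimal sketch, the raw strings can be normalised before building the HGVS expression; `to_hgvs` and `SPECIAL_CASES` are names introduced here for illustration, and no call to VariantValidator is made.

```python
# Raw values as printed by the cell above (note the leading spaces).
RAW_VARIANTS = ['1380del9', ' 460+1G>A', ' 599delT', ' 1096C>T',
                '676C>T', '599delT', '1096C>T', '1558C>T']

# The one variant whose notation in the paper is not valid after a 'c.' prefix.
SPECIAL_CASES = {"1380del9": "c.1385_1393del"}


def to_hgvs(raw):
    """Strip whitespace and return a coding-DNA HGVS string."""
    key = raw.strip()
    return SPECIAL_CASES.get(key, f"c.{key}")


unique_hgvs = sorted({to_hgvs(v) for v in RAW_VARIANTS})
print(len(RAW_VARIANTS), "raw strings ->", len(unique_hgvs), "distinct variants")
for hgvs in unique_hgvs:
    print(hgvs)
```

After stripping, the eight raw strings collapse to six distinct variants.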

In [10]:
varMapper = VariantColumnMapper(variant_d=variant_d,variant_column_name='Variant', default_genotype=default_genotype)

<h2>Demographic data</h2>

In [11]:
ageMapper = AgeColumnMapper.by_year('Age')
#ageMapper.preview_column(df['Age'])
sexMapper = SexColumnMapper(male_symbol='m', female_symbol='f', column_name='Sex')
#sexMapper.preview_column(df['Sex'])
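Ages in the final table appear as ISO 8601 durations such as `P22Y0M`. The sketch below shows one way an age in years could be turned into that form; `years_to_iso8601` is a hypothetical helper and not the pyphetools implementation.

```python
def years_to_iso8601(age_years):
    """Hypothetical helper: convert an age in years to a PnYnM duration."""
    total_months = round(float(age_years) * 12)
    years, months = divmod(total_months, 12)
    return f"P{years}Y{months}M"


for age in (22.0, 11.0, 17.5):
    print(age, "->", years_to_iso8601(age))
```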

In [13]:
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="patient_id", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        variant_mapper=varMapper, 
                        metadata=metadata)
wolfram1 = Disease(disease_id='OMIM:222300', disease_label='Wolfram syndrome 1')
encoder.set_disease(wolfram1)

In [14]:
individuals = encoder.get_individuals()
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.MONO_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

In [15]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
5519 (FEMALE; P22Y0M),Wolfram syndrome 1 (OMIM:222300),NM_006005.3:c.1385_1393del (heterozygous),Diabetes mellitus (HP:0000819); Hearing impairment (HP:0000365); Diabetes insipidus (HP:0000873); Abnormality of the kidney (HP:0000077); Ataxia (HP:0001251); Nystagmus (HP:0000639); Puberty and gonadal disorders (HP:0008373); Depression (HP:0000716)
13883 (FEMALE; P11Y0M),Wolfram syndrome 1 (OMIM:222300),NM_006005.3:c.1385_1393del (heterozygous),Diabetes mellitus (HP:0000819); Abnormality of the kidney (HP:0000077); excluded: Hearing impairment (HP:0000365); excluded: Diabetes insipidus (HP:0000873)
13775 (FEMALE; P20Y0M),Wolfram syndrome 1 (OMIM:222300),NM_006005.3:c.460+1G>A (heterozygous),Diabetes mellitus (HP:0000819); Hearing impairment (HP:0000365); Diabetes insipidus (HP:0000873); excluded: Abnormality of the kidney (HP:0000077)
13776 (MALE; P17Y0M),Wolfram syndrome 1 (OMIM:222300),NM_006005.3:c.460+1G>A (heterozygous),Diabetes mellitus (HP:0000819); Hearing impairment (HP:0000365); Diabetes insipidus (HP:0000873); Abnormality of the kidney (HP:0000077); Puberty and gonadal disorders (HP:0008373)
13070 (FEMALE; P22Y0M),Wolfram syndrome 1 (OMIM:222300),NM_006005.3:c.599del (heterozygous),Diabetes mellitus (HP:0000819); Hearing impairment (HP:0000365); Abnormality of the kidney (HP:0000077); EEG abnormality (HP:0002353); Atypical behavior (HP:0000708); excluded: Diabetes insipidus (HP:0000873)
13885 (FEMALE; P35Y0M),Wolfram syndrome 1 (OMIM:222300),NM_006005.3:c.1096C>T (heterozygous),Diabetes mellitus (HP:0000819); Hearing impairment (HP:0000365); Abnormality of the kidney (HP:0000077); Cataract (HP:0000518); excluded: Diabetes insipidus (HP:0000873)
13062 (FEMALE; P25Y0M),Wolfram syndrome 1 (OMIM:222300),NM_006005.3:c.676C>T (heterozygous),Diabetes mellitus (HP:0000819); Hearing impairment (HP:0000365); Diabetes insipidus (HP:0000873); Abnormality of the kidney (HP:0000077); Ataxia (HP:0001251); Nystagmus (HP:0000639)
13076 (MALE; P26Y0M),Wolfram syndrome 1 (OMIM:222300),NM_006005.3:c.599del (heterozygous),Diabetes mellitus (HP:0000819); Hearing impairment (HP:0000365); Diabetes insipidus (HP:0000873); Puberty and gonadal disorders (HP:0008373); Intellectual disability (HP:0001249); excluded: Abnormality of the kidney (HP:0000077)
13073 (FEMALE; P35Y0M),Wolfram syndrome 1 (OMIM:222300),NM_006005.3:c.1096C>T (heterozygous),Diabetes mellitus (HP:0000819); Hearing impairment (HP:0000365); Abnormality of the kidney (HP:0000077); Ataxia (HP:0001251); Cataract (HP:0000518); Atypical behavior (HP:0000708); Ragged-red muscle fibers (HP:0003200); excluded: Diabetes insipidus (HP:0000873)
13781 (MALE; P19Y0M),Wolfram syndrome 1 (OMIM:222300),NM_006005.3:c.1558C>T (heterozygous),Diabetes mellitus (HP:0000819); Hearing impairment (HP:0000365); Diabetes insipidus (HP:0000873); Abnormality of the kidney (HP:0000077); EEG abnormality (HP:0002353); Puberty and gonadal disorders (HP:0008373)


In [16]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                             metadata=metadata,
                                             outdir=output_directory)

We output 10 GA4GH phenopackets to the directory phenopackets
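Each phenopacket is a JSON document, so the observed and excluded features can be inspected with the standard library. The fragment below is a simplified, hand-written example based on individual 13883 from the table above; it is an assumption about the shape of the output (using the GA4GH `phenotypicFeatures`, `type` and `excluded` fields) rather than a copy of a generated file.

```python
import json

# Simplified, hand-written fragment for illustration; real files contain
# more fields (interpretations, diseases, metaData, ...).
example_json = """
{
  "id": "example",
  "subject": {"id": "13883", "sex": "FEMALE"},
  "phenotypicFeatures": [
    {"type": {"id": "HP:0000819", "label": "Diabetes mellitus"}},
    {"type": {"id": "HP:0000077", "label": "Abnormality of the kidney"}},
    {"type": {"id": "HP:0000365", "label": "Hearing impairment"}, "excluded": true},
    {"type": {"id": "HP:0000873", "label": "Diabetes insipidus"}, "excluded": true}
  ]
}
"""

packet = json.loads(example_json)
features = packet["phenotypicFeatures"]
# A feature counts as observed unless it is explicitly flagged as excluded.
observed = [f["type"]["label"] for f in features if not f.get("excluded", False)]
excluded = [f["type"]["label"] for f in features if f.get("excluded", False)]
print("observed:", observed)
print("excluded:", excluded)
```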
