<h1>WWOX: Gribaa (2007)</h1>
<p>In this notebook, we encode the individuals described in <a href="https://pubmed.ncbi.nlm.nih.gov/17470496/" target="_blank">Gribaa et al. (2007) A new form of childhood onset, autosomal recessive spinocerebellar ataxia and epilepsy is localized at 16q21-q23</a> as GA4GH phenopackets.</p>

In [1]:
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.json_format import Parse, ParseDict
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
import os
import sys
import numpy as np
import pyphetools
from pyphetools.creation import *
from pyphetools.visualization import *
print(f"pyphetools version {pyphetools.__version__}")

pyphetools version 0.6.5


<h2>Importing HPO data</h2>
<p>pyphetools uses the Human Phenotype Ontology (HPO) to encode phenotypic features. The recommended way of doing this is to ingest the hp.json file using HpoParser, which in turn creates an HpoConceptRecognizer object. </p>
<p>The HpoParser accepts an hpo_json_file argument if you want to use a specific file. If the argument is not passed, it downloads the latest hp.json file from the HPO GitHub site and stores it in a new subdirectory called hpo_data. The file is not downloaded again if it is already present.</p>

In [2]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
PMID = "PMID:17470496"
title = "A new form of childhood onset, autosomal recessive spinocerebellar ataxia and epilepsy is localized at 16q21-q23"
metadata = MetaData(created_by="ORCID:0000-0002-5648-2155", pmid=PMID, pubmed_title=title)
metadata.default_versions_with_hpo(version=hpo_version)
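
<p>The "download only if missing" behavior described above can be sketched with the standard library alone. This is not the pyphetools implementation: <code>ensure_local_copy</code> and <code>fake_download</code> are hypothetical names, and the downloader is injected so that the example runs without network access.</p>

```python
import os
import tempfile
from typing import Callable


def ensure_local_copy(path: str, download: Callable[[str], None]) -> bool:
    """Call download(path) only if path does not exist yet.

    Returns True if a download was performed, False if the cached file was reused.
    """
    if os.path.isfile(path):
        return False
    # Create the cache directory (for example hpo_data) on first use
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    download(path)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "hpo_data", "hp.json")
    calls = []

    def fake_download(dest: str) -> None:
        # Stand-in for a real HTTP download (assumption for illustration)
        calls.append(dest)
        with open(dest, "w", encoding="utf-8") as handle:
            handle.write("{}")

    first = ensure_local_copy(target, fake_download)   # file missing: downloads
    second = ensure_local_copy(target, fake_download)  # file present: skipped
    print(first, second, len(calls))
```

<p>To force a refresh under this scheme, one would delete the cached file before calling the function again.</p>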

<h2>Importing the supplemental table</h2>
<p>Here, we use the pandas library to import the table, which is stored as an Excel file (note that the Python package openpyxl must be installed to read Excel files with pandas, although it does not need to be imported in this notebook). pyphetools expects a pandas DataFrame as input. Users can choose any input format supported by pandas, including CSV, TSV, and Excel, or can use any other method to transform their input data into a pandas DataFrame before using pyphetools.</p>

In [3]:
df = pd.read_excel('input/PMID_17470496.xlsx')
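
<p>Because only the resulting DataFrame matters, the same table could equally be loaded from a delimited text file. The following minimal sketch reads a tab-separated excerpt from an in-memory string; the excerpt and the variable names <code>tsv_text</code> and <code>df_tsv</code> are made up for illustration and are not part of the original notebook.</p>

```python
import io

import pandas as pd

# Hypothetical tab-separated excerpt that mirrors the layout of the Excel table
tsv_text = "Patient\tII\tII2\nSex\tfemale\tfemale\nAge\t19\t18\n"

# read_csv accepts any file-like object; sep="\t" selects TSV parsing
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(df_tsv.shape)
```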

In [4]:
df

Unnamed: 0,Patient,II,II2,II3,II4
0,Sex,female,female,male,female
1,Age,19,18,16,10
2,Seizures,+,+,+,+
3,Motor delay,+,+,+,+
4,Developmental delay,+,+,+,+
5,Ataxia,+,+,+,+
6,Gait ataxia,+,+,+,+
7,Dysarthria,+,+,+,+
8,Hyporeflexia,+,+,+,+
9,Impaired continence,-,-,-,+
10,Nystagmus,+,+,+,+
11,Variant,c.139C>A,c.139C>A,c.139C>A,c.139C>A


In [5]:
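# Transpose the table so that each row is one individual and each column is one feature
df = df.set_index('Patient').T.reset_index()
# Use the row number as the individual identifier (the original labels remain in the 'index' column)
df['patient_id'] = df.index
df.head()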

Patient,index,Sex,Age,Seizures,Motor delay,Developmental delay,Ataxia,Gait ataxia,Dysarthria,Hyporeflexia,Impaired continence,Nystagmus,Variant,patient_id
0,II,female,19,+,+,+,+,+,+,+,-,+,c.139C>A,0
1,II2,female,18,+,+,+,+,+,+,+,-,+,c.139C>A,1
2,II3,male,16,+,+,+,+,+,+,+,-,+,c.139C>A,2
3,II4,female,10,+,+,+,+,+,+,+,+,+,c.139C>A,3
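
<p>The transposition step can be traced on a toy table. The sketch below uses a made-up three-row excerpt (the names <code>wide</code> and <code>per_patient</code> are hypothetical) to show why the former column headers end up in a column called <code>index</code> after <code>reset_index()</code>.</p>

```python
import pandas as pd

# Toy excerpt in the published layout: one row per feature, one column per individual
wide = pd.DataFrame({
    "Patient": ["Sex", "Age", "Seizures"],
    "II": ["female", "19", "+"],
    "II2": ["female", "18", "+"],
})

# Feature names become the column headers; the individual labels move into
# the row index and are turned into a regular column named "index"
per_patient = wide.set_index("Patient").T.reset_index()
per_patient["patient_id"] = per_patient.index

print(list(per_patient.columns))
```

<p>If the original labels are preferred as identifiers, the <code>index</code> column could be used as the individual column instead of the row number.</p>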


In [6]:
generator = SimpleColumnMapperGenerator(df=df, observed='+', excluded='-', hpo_cr=hpo_cr)
column_mappers_d = generator.try_mapping_columns()

In [7]:
from IPython.display import display, HTML
display(HTML(generator.to_html()))

Result,Columns
Mapped,Seizures; Motor delay; Developmental delay; Ataxia; Gait ataxia; Dysarthria; Hyporeflexia; Impaired continence; Nystagmus
Unmapped,index; Sex; Age; Variant; patient_id
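
<p>The role of the <code>observed</code> and <code>excluded</code> symbols can be illustrated with a small standalone function. This is only a sketch of the cell-level logic under the assumption that any other value carries no phenotype information; it is not the code of SimpleColumnMapperGenerator, which additionally matches column names against HPO labels. The name <code>classify_cell</code> is hypothetical.</p>

```python
from typing import Optional


def classify_cell(cell: str, observed: str = "+", excluded: str = "-") -> Optional[bool]:
    """Return True for observed, False for excluded, None for anything else."""
    value = cell.strip()
    if value == observed:
        return True
    if value == excluded:
        return False
    return None  # not a phenotype status, e.g. an age or a variant string


# One hypothetical row of the transposed table
row = {"Seizures": "+", "Impaired continence": "-", "Age": "19"}
status = {column: classify_cell(value) for column, value in row.items()}
print(status)
```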


<h2>Variant Data</h2>
<p>The variant data (HGVS notation, relative to transcript NM_016373.2) are listed in the Variant column.</p>

In [8]:
genome = 'hg38'
# All affected individuals are homozygous for the variant (autosomal recessive disease)
default_genotype = 'homozygous'
WWOX_transcript = 'NM_016373.2'
vvalidator = VariantValidator(genome_build=genome, transcript=WWOX_transcript)
var = vvalidator.encode_hgvs("c.139C>A")
var_d = {"c.139C>A": var}
varMapper = VariantColumnMapper(variant_d=var_d, variant_column_name='Variant', default_genotype=default_genotype)

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_016373.2%3Ac.139C>A/NM_016373.2?content-type=application%2Fjson
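
<p>The request URL printed above can be reproduced with the standard library, which makes it easier to see how the genome build, transcript, and HGVS expression are combined. This sketch only rebuilds the logged URL; it is an assumption that pyphetools assembles the string in a comparable way, and <code>build_validator_url</code> is a hypothetical helper.</p>

```python
from urllib.parse import quote

BASE_URL = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"


def build_validator_url(genome: str, transcript: str, hgvs_c: str) -> str:
    """Assemble the REST URL for one transcript-level HGVS expression."""
    # The colon is percent-encoded; '>' is kept as-is to match the logged URL
    expression = quote(f"{transcript}:{hgvs_c}", safe=">")
    content_type = quote("application/json", safe="")
    return f"{BASE_URL}/{genome}/{expression}/{transcript}?content-type={content_type}"


url = build_validator_url("hg38", "NM_016373.2", "c.139C>A")
print(url)
```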


<h2>Demographic data</h2>

In [9]:
ageMapper = AgeColumnMapper.by_year('Age')
ageMapper.preview_column(df['Age'])

Unnamed: 0,original column contents,age
0,19,P19Y
1,18,P18Y
2,16,P16Y
3,10,P10Y
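
<p>The preview shows ages in whole years rendered as ISO 8601 durations. A minimal sketch of that conversion follows; <code>years_to_iso8601</code> is a hypothetical helper, not a pyphetools function.</p>

```python
def years_to_iso8601(years: int) -> str:
    """Format an age in whole years as an ISO 8601 duration such as P19Y."""
    if years < 0:
        raise ValueError("age must not be negative")
    return f"P{years}Y"


ages = [19, 18, 16, 10]  # values from the Age column
iso_ages = [years_to_iso8601(age) for age in ages]
print(iso_ages)
```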


In [10]:
sexMapper = SexColumnMapper(male_symbol='male', female_symbol='female', column_name='Sex')
sexMapper.preview_column(df['Sex'])

Unnamed: 0,original column contents,sex
0,female,FEMALE
1,female,FEMALE
2,male,MALE
3,female,FEMALE
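
<p>The sex mapping amounts to a lookup from the symbols used in the table to the Phenopacket Schema values. The sketch below assumes that unrecognized entries should fall back to <code>UNKNOWN_SEX</code> and that matching is case-insensitive; both choices, and the name <code>normalize_sex</code>, are assumptions made for illustration rather than documented SexColumnMapper behavior.</p>

```python
# Symbols as they appear in the Sex column, mapped to Phenopacket Schema values
SEX_SYMBOLS = {"male": "MALE", "female": "FEMALE"}


def normalize_sex(cell: str) -> str:
    """Map a table entry to MALE or FEMALE, or UNKNOWN_SEX if unrecognized."""
    return SEX_SYMBOLS.get(cell.strip().lower(), "UNKNOWN_SEX")


sex_column = ["female", "female", "male", "female"]
normalized = [normalize_sex(cell) for cell in sex_column]
print(normalized)
```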


In [11]:
encoder = CohortEncoder(df=df, hpo_cr=hpo_cr, 
                        column_mapper_d=column_mappers_d, 
                        individual_column_name="patient_id", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        variant_mapper=varMapper, 
                        metadata=metadata,
                        pmid=PMID)
encoder.set_disease(disease_id='OMIM:614322', label='Spinocerebellar ataxia, autosomal recessive 12')

In [12]:
individuals = encoder.get_individuals()

In [13]:
from IPython.display import HTML, display

phenopackets = [i.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh()) for i in individuals]
table = PhenopacketTable(phenopacket_list=phenopackets)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
0 (FEMALE; P19Y),"Spinocerebellar ataxia, autosomal recessive 12 (OMIM:614322)",NM_016373.2:c.139C>A (homozygous),Seizure (HP:0001250); Motor delay (HP:0001270); Global developmental delay (HP:0001263); Ataxia (HP:0001251); Gait ataxia (HP:0002066); Dysarthria (HP:0001260); Hyporeflexia (HP:0001265); Nystagmus (HP:0000639)
1 (FEMALE; P18Y),"Spinocerebellar ataxia, autosomal recessive 12 (OMIM:614322)",NM_016373.2:c.139C>A (homozygous),Seizure (HP:0001250); Motor delay (HP:0001270); Global developmental delay (HP:0001263); Ataxia (HP:0001251); Gait ataxia (HP:0002066); Dysarthria (HP:0001260); Hyporeflexia (HP:0001265); Nystagmus (HP:0000639)
2 (MALE; P16Y),"Spinocerebellar ataxia, autosomal recessive 12 (OMIM:614322)",NM_016373.2:c.139C>A (homozygous),Seizure (HP:0001250); Motor delay (HP:0001270); Global developmental delay (HP:0001263); Ataxia (HP:0001251); Gait ataxia (HP:0002066); Dysarthria (HP:0001260); Hyporeflexia (HP:0001265); Nystagmus (HP:0000639)
3 (FEMALE; P10Y),"Spinocerebellar ataxia, autosomal recessive 12 (OMIM:614322)",NM_016373.2:c.139C>A (homozygous),Seizure (HP:0001250); Motor delay (HP:0001270); Global developmental delay (HP:0001263); Ataxia (HP:0001251); Gait ataxia (HP:0002066); Dysarthria (HP:0001260); Hyporeflexia (HP:0001265); Impaired continence (HP:0031064); Nystagmus (HP:0000639)


In [15]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                              pmid=PMID,
                                              metadata=metadata.to_ga4gh(),
                                              outdir=output_directory)

We output 4 GA4GH phenopackets to the directory phenopackets
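
<p>As a quick sanity check, the exported JSON files can be read back with the standard library to confirm how many phenotypic features each phenopacket contains. The sketch relies on the <code>id</code> and <code>phenotypicFeatures</code> fields of the Phenopacket Schema JSON representation; <code>summarize_phenopackets</code> is a hypothetical helper, and the exact file names written by pyphetools are not assumed.</p>

```python
import json
from pathlib import Path
from typing import Dict


def summarize_phenopackets(directory: str) -> Dict[str, int]:
    """Map each phenopacket id to its number of phenotypic features."""
    root = Path(directory)
    summary: Dict[str, int] = {}
    if not root.is_dir():
        return summary  # nothing exported yet
    for path in sorted(root.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            packet = json.load(handle)
        # Fall back to the file name if the id field is absent
        packet_id = packet.get("id", path.stem)
        summary[packet_id] = len(packet.get("phenotypicFeatures", []))
    return summary


print(summarize_phenopackets("phenopackets"))
```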
