<H1>SLC45A2: Oculo-Cutaneous Albinism Type 4 (OCA4) - Moreno-Artero et al., 2022</H1>
<p>Extract clinical data from <a href="https://pubmed.ncbi.nlm.nih.gov/36553465/" target="__blank">
Moreno-Artero E, et al. (2022). Oculo-Cutaneous Albinism Type 4 (OCA4): Phenotype-Genotype Correlation. Genes (Basel). 2022 Nov 23;13(12):2198</a>:  PMID:36553465.</p>
<p>The authors classify patients 1-20 as group 1 and patients 21-30 as group 2. They describe the following genotype-phenotype correlation. The first phenotype, found in 20 patients, is clinically indistinguishable from the classical OCA1 phenotype. The genotype-to-phenotype correlation suggests that <b>this phenotype is associated with homozygous or compound heterozygous nonsense or deletion variants with frameshift</b> leading to translation interruption in the SLC45A2 gene. The second phenotype, found in 10 patients, is characterized by very mild hypopigmentation of the hair (light brown or even dark hair) and skin pigmentation similar to that of the general population. In this group, visual acuity is variable but can be subnormal, foveal hypoplasia can be low grade or absent, and nystagmus may be lacking. These <b>mild to moderate phenotypes are associated with at least one missense mutation in SLC45A2</b>.</p>

In [1]:
import phenopackets as php
from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.json_format import Parse, ParseDict
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from collections import defaultdict
import os
import sys
import numpy as np
from IPython.display import display, HTML, JSON
from pyphetools.creation import *
import pyphetools
from pyphetools.visualization import *
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.6.3


In [2]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
PMID = "PMID:36553465"
title = "Oculo-Cutaneous Albinism Type 4 (OCA4): Phenotype-Genotype Correlation"
metadata = MetaData(created_by="ORCID:0000-0002-5648-2155", pmid=PMID, pubmed_title=title)
metadata.default_versions_with_hpo(version=hpo_version)

<h3>Ingest the data</h3>
<p>The clinical and variant data were copied from Table 1 of the publication. For ease of parsing, we manually split the Gender,Age column into two columns.</p>
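<p>The split was done by hand for this notebook. As a rough sketch of how it could be scripted, the helper below (<tt>split_gender_age</tt>, a hypothetical name) separates a combined cell into sex and age. The cell format <tt>'M, 20'</tt> is an assumption, not copied from the publication.</p>

```python
def split_gender_age(cell: str) -> tuple[str, int]:
    """Split a combined 'Gender, Age' cell such as 'M, 20' into ('M', 20)."""
    # Split on the first comma only and strip surrounding whitespace.
    gender, age = (part.strip() for part in cell.split(",", 1))
    return gender, int(age)

# Hypothetical cells; the exact format in the published table is assumed.
cells = ["M, 20", "F, 7"]
rows = [split_gender_age(c) for c in cells]
print(rows)
```

<p>The resulting tuples could then be assigned to two DataFrame columns, for example <tt>Gender</tt> and <tt>Age (Years)</tt> as used in the spreadsheet read below.</p>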

In [3]:
df = pd.read_excel('input/Moreno-Artero2022_table1.xlsx')

In [4]:
df.head()

Unnamed: 0,Patients,Gender,Age (Years),Genetic Background,Consanguinity,Nevi,Eyes,Hair,Eyebrows,Eyelashes,Nystagmus,Strabismus,VA,Refraction,ITI,MT,FHP,Variant 1 (SLC45A2 NM_016180.5),Variant 2 (SLC45A2 NM_016180.5)
0,P1,M,20,Morocco,Yes,"Present, amelanotic",Blue,White,White,White,Yes,"Yes, esotropia",1.6/10 RE; 2/10 LE,Hypermetropia astigmatism,,,Grade IV,NM_016180.5(SLC45A2):c.267_271del\nChr5(GRCh37):g.33984422_33984426del\np.(Ser90Glnfs*42),NM_016180.5(SLC45A2):c.267_271del\nChr5(GRCh37):g.33984422_33984426del\np.(Ser90Glnfs*42)
1,P2,F,7,Morocco,Yes,"Present, pigmented",Blue,White blond,White,White,Yes,"Yes, left exotropia",1/20 RE; 1/20 LE,Hypermetropia Astigmatism,Grade IV,Grade II,Grade IV,NM_016180.5(SLC45A2):c.1028_1029del\nChr5(GRCh37):g.33954469_33954470del\np.(Gly343Alafs*10),NM_016180.5(SLC45A2):c.1028_1029del\nChr5(GRCh37):g.33954469_33954470del\np.(Gly343Alafs*10)
2,P3,M,7,Morocco,Yes,"Present, pigmented",Blue,White blond,White,White,Yes,No,,Hypermetropia\nAstigmatism,Grade III,Grade II,Grade IV,NM_016180.5(SLC45A2):c.1028_1029del\nChr5(GRCh37):g.33954469_33954470del\np.(Gly343Alafs*10),NM_016180.5(SLC45A2):c.1028_1029del\nChr5(GRCh37):g.33954469_33954470del\np.(Gly343Alafs*10)
3,P4,F,49,France,No,Absent,Blue grey,White,White,White,Yes,No,2/10 RE; 2/10 LE,,Grade IV,,Grade IV,NM_016180.5(SLC45A2):c.273del\nChr5(GRCh37):g.33984417del\np.(Ser92Alafs*21),NM_016180.5(SLC45A2):c.1068C>G\nChr5(GRCh37):g.33951747G>C\np.(Asn356Lys) (N356K)
4,P5,F,63,France,No,"Present, amelanotic",Blue,White,White,White,Yes,No,2/10 RE; 3/10 LE,,Grade III,,Grade III,NM_016180.5(SLC45A2):c.986del\nChr5(GRCh37):g.33954512del\np.(Thr329Lysfs*69),NM_016180.5(SLC45A2):c.1036G>T\nChr5(GRCh37):g.33951779C>A\np.(Val346Leu) (V346L)


In [5]:
column_mapper_d = defaultdict(ColumnMapper)
nystagmusMapper = SimpleColumnMapper(hpo_id="HP:0000639", hpo_label="Nystagmus",observed='Yes',excluded='No')
nystagmusMapper.preview_column(df["Nystagmus"])
column_mapper_d["Nystagmus"] = nystagmusMapper

In [6]:
# This was used to conveniently generate OptionColumnMapper code, but is no longer needed.
#result = OptionColumnMapper.autoformat(df, hpo_cr)
#print(result)

In [7]:
nevi_d = {'Present': 'Nevus',
 'amelanotic': 'Nevus',  ## TODO needs new HPO term
 'pigmented': 'Melanocytic nevus',
}
excluded = {"Absent": "Nevus"}
neviMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=nevi_d, excluded_d=excluded)
neviMapper.preview_column(df['Nevi'])
column_mapper_d['Nevi'] = neviMapper

In [8]:
eyes_d = {'Blue': 'Iris hypopigmentation',
 'Blue grey': 'Iris hypopigmentation',}
excluded = {"Brown": "Iris hypopigmentation"}
eyesMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=eyes_d,excluded_d=excluded)
eyesMapper.preview_column(df['Eyes'])
column_mapper_d['Eyes'] = eyesMapper

In [9]:
hair_d = {'White': 'Hypopigmentation of hair',
 'White blond': 'Hypopigmentation of hair',
 'Blond': 'Hypopigmentation of hair',
 'Dark blond': 'Hypopigmentation of hair',
 'Red blond': 'Hypopigmentation of hair'}
hairMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=hair_d)
hairMapper.preview_column(df['Hair'])
column_mapper_d['Hair'] = hairMapper

In [10]:
eyebrows_d = {'White': 'White eyebrow',
 'Blond': 'White eyebrow',
 'White + Blond': 'White eyebrow'}
excluded = {"Brown": "White eyebrow"}
eyebrowsMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=eyebrows_d,  excluded_d=excluded)
eyebrowsMapper.preview_column(df['Eyebrows'])
column_mapper_d['Eyebrows'] = eyebrowsMapper

In [11]:
eyelashes_d = {'White': 'White eyelashes',
 'Blond': 'White eyelashes',
 'White + Blond': 'White eyelashes'}
eyelashesMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=eyelashes_d)
eyelashesMapper.preview_column(df['Eyelashes'])
column_mapper_d['Eyelashes'] = eyelashesMapper

In [12]:
strabismus_d = {'Yes': 'Strabismus',
 'esotropia': 'Esotropia',
 'left exotropia': 'Exotropia',
 'No': 'PLACEHOLDER',
 'exotropia': 'Exotropia',
 'Yes microexotropia': 'Exotropia'}
excluded = {"No": 'Strabismus'}
strabismusMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=strabismus_d, excluded_d=excluded)
strabismusMapper.preview_column(df['Strabismus'])
column_mapper_d['Strabismus'] = strabismusMapper

<h2>Reduced visual acuity</h2>
<p>Visual acuity is reported as fractions separated by a semicolon (e.g., <tt>1.6/10 RE; 2/10 LE</tt>), and pyphetools interprets both the slash and the semicolon as delimiters. For this reason, we map only the numerators to the abnormal finding. The denominator is ten in most rows, although some entries use another value (e.g., 1/20 for P2). We do not distinguish between left and right eyes here.</p>
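<p>To illustrate which tokens end up as keys in the option dictionary, here is a minimal sketch that pulls the numerators out of a visual acuity cell with a regular expression. The function <tt>va_numerators</tt> is hypothetical and does not reproduce how pyphetools tokenizes cells internally.</p>

```python
import re

def va_numerators(cell: str) -> list[str]:
    """Return the numerators of fractions such as '1.6/10 RE; 2/10 LE'."""
    # Each eye is reported as <numerator>/<denominator>, followed by RE or LE.
    return re.findall(r"(\d+(?:\.\d+)?)\s*/\s*\d+", cell)

# Example cells taken from the first rows of the table shown above.
print(va_numerators("1.6/10 RE; 2/10 LE"))
print(va_numerators("1/20 RE; 1/20 LE"))
```

<p>The strings returned here correspond to the keys that the mapping dictionary needs to cover.</p>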

In [13]:
va_d = {'1.6': 'Reduced visual acuity',
        '2': 'Reduced visual acuity',
         '1': 'Reduced visual acuity',
        '3': 'Reduced visual acuity',
         '5': 'Reduced visual acuity',
         '7': 'Reduced visual acuity',
         '1.2': 'Reduced visual acuity',
         '1.4': 'Reduced visual acuity'}
vaMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=va_d)
vaMapper.preview_column(df['VA'])
column_mapper_d['VA'] = vaMapper

In [14]:
refraction_d = {'Hypermetropia astigmatism': 'Hypermetropia',
 'Hypermetropia Astigmatism': 'Astigmatism',
 'Hypermetropia\nAstigmatism': 'Astigmatism',
 'Hypermetropia': 'Hypermetropia',
 'HypermetropiaAstigmatism': 'Hypermetropia',
 'Myopia Astigmatism': 'Myopia'}
refractionMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=refraction_d)
refractionMapper.preview_column(df['Refraction'])
column_mapper_d['Refraction'] = refractionMapper

In [15]:
iti_d = {
 'Grade IV': 'Iris transillumination defect',
 'Grade III': 'Iris transillumination defect',
 'Grade II': 'Iris transillumination defect',
 'Grade I': 'Iris transillumination defect'}
excluded = {"No": "Iris transillumination defect"}
itiMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=iti_d, excluded_d=excluded)
itiMapper.preview_column(df['ITI'])
column_mapper_d['ITI'] = itiMapper

In [16]:
mt_d = {'nan': 'PLACEHOLDER',
 'Grade II': 'PLACEHOLDER',
 'Grade III': 'PLACEHOLDER',
 'Grade I': 'PLACEHOLDER'}
#mtMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=mt_d)
#mtMapper.preview_column(df['MT'])
#column_mapper_d['MT'] = mtMapper
# Macular transparency -- need HPO term

In [17]:
fhp_d = {'Grade IV': 'Hypoplasia of the fovea',
 'Grade III': 'Hypoplasia of the fovea',
 'Grade II': 'Hypoplasia of the fovea',
 'Grade I': 'Hypoplasia of the fovea'}
fhpMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=fhp_d)
fhpMapper.preview_column(df['FHP'])
column_mapper_d['FHP'] = fhpMapper

<h2>Variants</h2>
<p>The original table describes variants like this: <tt>NM_016180.5(SLC45A2):c.267_271del\nChr5(GRCh37):g.33984422_33984426del\np.(Ser90Glnfs*42)</tt>.
    The following code extracts the transcript variant - c.267_271del in this example.</p>

In [18]:
def extract_var(cell_contents):
    """Return the transcript-level (c.) variant from a Table 1 variant cell."""
    prefix = "NM_016180.5(SLC45A2):"
    if not cell_contents.startswith(prefix):
        # e.g. 'Deletion exons 1-4', which has no HGVS prefix
        return cell_contents
    # drop the prefix, then keep only the first line (the c. notation)
    return cell_contents[len(prefix):].split('\n')[0]
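
<p>As a complementary sketch, the same kind of cell can be split into its transcript, genomic, and protein components. The helper <tt>split_variant_cell</tt> and its dictionary keys are hypothetical and are not used elsewhere in this notebook. It assumes the three parts are separated by newlines, as in the table above.</p>

```python
def split_variant_cell(cell: str) -> dict[str, str]:
    """Split a Table 1 variant cell into coding, genomic and protein parts."""
    parts = {}
    for line in cell.split("\n"):
        line = line.strip()
        if ":c." in line:
            parts["coding"] = line.split(":", 1)[1]
        elif ":g." in line:
            parts["genomic"] = line.split(":", 1)[1]
        elif line.startswith("p."):
            # Drop a trailing short name such as '(N356K)' if present.
            parts["protein"] = line.split()[0]
    return parts

# Second variant of P4, as shown in the table above.
cell = "NM_016180.5(SLC45A2):c.1068C>G\nChr5(GRCh37):g.33951747G>C\np.(Asn356Lys) (N356K)"
print(split_variant_cell(cell))
```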

In [19]:
df["var1"] = df["Variant 1 (SLC45A2 NM_016180.5)"].transform(extract_var)

In [20]:
df["var2"] = df["Variant 2 (SLC45A2 NM_016180.5)"].transform(extract_var)

In [21]:
all_variant_set = set(df["var1"]).union(df["var2"])
validator = VariantValidator(genome_build='hg38', transcript="NM_016180.5")
validated_var_d = {}  # a plain dict; defaultdict() without a factory behaves the same

In [22]:
for var in all_variant_set:
    if var == 'Deletion exons 1-4':
        sv = StructuralVariant.chromosomal_deletion(cell_contents='Deletion exons 1-4',
                 gene_symbol="SLC45A2",
                 gene_id="HGNC:16472")
        validated_var_d[var] = sv
    else:
        var_object = validator.encode_hgvs(hgvs=var)
        validated_var_d[var] = var_object
print(f"We got {len(validated_var_d)} variant objects")

https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_016180.5%3Ac.1166_1167del/NM_016180.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_016180.5%3Ac.1273del/NM_016180.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_016180.5%3Ac.273del/NM_016180.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_016180.5%3Ac.258del/NM_016180.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_016180.5%3Ac.1532C>T/NM_016180.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_016180.5%3Ac.1036G>T/NM_016180.5?content-type=application%2Fjson
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_016180.5%3Ac.1506del/NM_016180.5?content-type=application%2Fjson
https://re
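
<p>The logged request URLs follow a regular pattern. The sketch below rebuilds that pattern from the output above without making any network call. The function <tt>variantvalidator_url</tt> is hypothetical and only mirrors the printed URLs. It is not taken from the pyphetools source.</p>

```python
BASE = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"

def variantvalidator_url(hgvs_c: str, transcript: str = "NM_016180.5",
                         build: str = "hg38") -> str:
    """Compose a URL of the form printed in the log above (no request is sent)."""
    # '%3A' is the percent-encoded ':' between transcript and c. notation.
    return (f"{BASE}/{build}/{transcript}%3A{hgvs_c}/{transcript}"
            "?content-type=application%2Fjson")

print(variantvalidator_url("c.273del"))
```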

In [23]:
ageMapper = AgeColumnMapper.by_year('Age (Years)')
#ageMapper.preview_column(df['Age (Years)'])
sexMapper = SexColumnMapper(male_symbol='M', female_symbol='F', column_name='Gender')
#sexMapper.preview_column(df['Gender'])

In [24]:
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_d=column_mapper_d, 
                        individual_column_name="Patients", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        metadata=metadata,
                        pmid=PMID)
encoder.set_disease(disease_id='OMIM:606574', label='Albinism, oculocutaneous, type IV')

In [25]:
individuals = encoder.get_individuals()

In [26]:
for i in individuals:
    rows = df.loc[df['Patients'] == i.id]
    if len(rows) != 1:
        raise ValueError(f"Got {len(rows)} rows but expected only 1")
    var1 = rows.iloc[0]['var1']
    var2 = rows.iloc[0]['var2']
    if var1 == var2:
        # homozygous
        var_object = validated_var_d.get(var1)
        var_object.set_homozygous()
        i.add_variant(var_object)
    else:
        var1_object  = validated_var_d.get(var1) 
        var2_object  = validated_var_d.get(var2)
        var1_object.set_heterozygous()
        var2_object.set_heterozygous()
        i.add_variant(var1_object)
        i.add_variant(var2_object)

In [27]:
i1 = individuals[-1]
phenopacket1 = i1.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh())
json_string = MessageToJson(phenopacket1)
print(json_string)

{
  "id": "PMID_36553465_P30",
  "subject": {
    "id": "P30",
    "timeAtLastEncounter": {
      "age": {
        "iso8601duration": "P30Y"
      }
    },
    "sex": "MALE"
  },
  "phenotypicFeatures": [
    {
      "type": {
        "id": "HP:0000639",
        "label": "Nystagmus"
      }
    },
    {
      "type": {
        "id": "HP:0000995",
        "label": "Melanocytic nevus"
      }
    },
    {
      "type": {
        "id": "HP:0003764",
        "label": "Nevus"
      }
    },
    {
      "type": {
        "id": "HP:0007730",
        "label": "Iris hypopigmentation"
      }
    },
    {
      "type": {
        "id": "HP:0005599",
        "label": "Hypopigmentation of hair"
      }
    },
    {
      "type": {
        "id": "HP:0002226",
        "label": "White eyebrow"
      }
    },
    {
      "type": {
        "id": "HP:0002227",
        "label": "White eyelashes"
      }
    },
    {
      "type": {
        "id": "HP:0000486",
        "label": "Strabismus"
      },
      "

In [28]:
ppacket_list = [i.to_ga4gh_phenopacket(metadata=metadata.to_ga4gh()) for i in individuals]
table = PhenopacketTable(phenopacket_list=ppacket_list)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
P1 (MALE; P20Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.267_271del (homozygous),Nystagmus (HP:0000639); Nevus (HP:0003764); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); White eyebrow (HP:0002226); White eyelashes (HP:0002227); Esotropia (HP:0000565); Strabismus (HP:0000486); Reduced visual acuity (HP:0007663); Hypermetropia (HP:0000540); Hypoplasia of the fovea (HP:0007750)
P2 (FEMALE; P7Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.1028_1029del (homozygous),Nystagmus (HP:0000639); Melanocytic nevus (HP:0000995); Nevus (HP:0003764); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); White eyebrow (HP:0002226); White eyelashes (HP:0002227); Exotropia (HP:0000577); Strabismus (HP:0000486); Reduced visual acuity (HP:0007663); Astigmatism (HP:0000483); Hypermetropia (HP:0000540); Iris transillumination defect (HP:0012805); Hypoplasia of the fovea (HP:0007750)
P3 (MALE; P7Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.1028_1029del (homozygous),Nystagmus (HP:0000639); Melanocytic nevus (HP:0000995); Nevus (HP:0003764); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); White eyebrow (HP:0002226); White eyelashes (HP:0002227); Astigmatism (HP:0000483); Hypermetropia (HP:0000540); Iris transillumination defect (HP:0012805); Hypoplasia of the fovea (HP:0007750)
P4 (FEMALE; P49Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.273del (heterozygous) NM_016180.5:c.1068C>G (heterozygous),Nystagmus (HP:0000639); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); White eyebrow (HP:0002226); White eyelashes (HP:0002227); Reduced visual acuity (HP:0007663); Iris transillumination defect (HP:0012805); Hypoplasia of the fovea (HP:0007750)
P5 (FEMALE; P63Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.986del (heterozygous) NM_016180.5:c.1036G>T (heterozygous),Nystagmus (HP:0000639); Nevus (HP:0003764); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); White eyebrow (HP:0002226); White eyelashes (HP:0002227); Reduced visual acuity (HP:0007663); Iris transillumination defect (HP:0012805); Hypoplasia of the fovea (HP:0007750)
P6 (MALE; P18Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.986del (heterozygous) NM_016180.5:c.1471G>A (heterozygous),Nystagmus (HP:0000639); Melanocytic nevus (HP:0000995); Nevus (HP:0003764); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); White eyebrow (HP:0002226); White eyelashes (HP:0002227); Reduced visual acuity (HP:0007663); Iris transillumination defect (HP:0012805); Hypoplasia of the fovea (HP:0007750)
P7 (MALE; P9Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.986del (heterozygous) NM_016180.5:c.1471G>A (heterozygous),Nystagmus (HP:0000639); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); White eyebrow (HP:0002226); White eyelashes (HP:0002227); Reduced visual acuity (HP:0007663); Iris transillumination defect (HP:0012805); Hypoplasia of the fovea (HP:0007750)
P8 (MALE; P7Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.986del (heterozygous) NM_016180.5:c.1036G>T (heterozygous),Nystagmus (HP:0000639); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); White eyebrow (HP:0002226); White eyelashes (HP:0002227); Esotropia (HP:0000565); Strabismus (HP:0000486); Reduced visual acuity (HP:0007663); Iris transillumination defect (HP:0012805); Hypoplasia of the fovea (HP:0007750)
P9 (MALE; P16Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.986del (homozygous),Nystagmus (HP:0000639); Melanocytic nevus (HP:0000995); Nevus (HP:0003764); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); White eyebrow (HP:0002226); White eyelashes (HP:0002227); Reduced visual acuity (HP:0007663); Iris transillumination defect (HP:0012805); Hypoplasia of the fovea (HP:0007750)
P10 (FEMALE; P52Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.986del (heterozygous) NM_016180.5:c.1166_1167del (heterozygous),Nystagmus (HP:0000639); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); White eyebrow (HP:0002226); White eyelashes (HP:0002227); Esotropia (HP:0000565); Strabismus (HP:0000486); Reduced visual acuity (HP:0007663); Iris transillumination defect (HP:0012805); Hypoplasia of the fovea (HP:0007750)


In [29]:
output_dir = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                             metadata=metadata.to_ga4gh(),
                                             pmid=PMID,
                                             outdir=output_dir)

We output 30 GA4GH phenopackets to the directory phenopackets
