<H1>SLC45A2: Oculo-Cutaneous Albinism Type 4 (OCA4) - Moreno-Artero et al., 2022</H1>
<p>Extract clinical data from <a href="https://pubmed.ncbi.nlm.nih.gov/36553465/" target="_blank">
Moreno-Artero E, et al. (2022). Oculo-Cutaneous Albinism Type 4 (OCA4): Phenotype-Genotype Correlation. Genes (Basel). 2022 Nov 23;13(12):2198</a>: PMID:36553465.</p>
<p>The authors classify patients 1-20 as group 1 and patients 21-30 as group 2, and describe the following genotype-phenotype correlation. The first phenotype, found in 20 patients, is clinically indistinguishable from the classical OCA1 phenotype. The genotype-to-phenotype correlation suggests that <b>this phenotype is associated with homozygous or compound heterozygous nonsense or deletion variants with frameshift</b> leading to translation interruption in the SLC45A2 gene. The second phenotype, found in 10 patients, is characterized by very mild hypopigmentation of the hair (light brown or even dark hair) and skin pigmentation similar to that of the general population. In this group, visual acuity is variable but can be subnormal, foveal hypoplasia can be low grade or absent, and nystagmus may be lacking. These <b>mild to moderate phenotypes are associated with at least one missense mutation in SLC45A2</b>.</p>
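<p>The grouping described above can be expressed as a small helper. This is only a sketch: the function name <tt>phenotype_group</tt> is hypothetical, and it assumes that patient identifiers have the form <tt>P1</tt> to <tt>P30</tt>, as in the Patients column of Table 1.</p>

```python
import re

def phenotype_group(patient_id: str) -> int:
    """Return 1 for P1-P20 (classical OCA1-like phenotype) and 2 for P21-P30 (mild phenotype)."""
    match = re.fullmatch(r"P(\d+)", patient_id.strip())
    if match is None:
        raise ValueError(f"Unexpected patient identifier: {patient_id!r}")
    number = int(match.group(1))
    if 1 <= number <= 20:
        return 1
    if 21 <= number <= 30:
        return 2
    raise ValueError(f"Patient number {number} is outside the cohort (1-30)")

# Group membership for the first and last patient of each group
groups = {pid: phenotype_group(pid) for pid in ("P1", "P20", "P21", "P30")}
print(groups)
```

<p>A column built this way (for example with <tt>df["Patients"].map(phenotype_group)</tt>) could be used to compare variant classes between the two groups.</p>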

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.66


In [2]:
PMID = "PMID:36553465"
title = "Oculo-Cutaneous Albinism Type 4 (OCA4): Phenotype-Genotype Correlation"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser(hpo_json_file="../hp.json")
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-5648-2155", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2024-03-06


<h3>Ingest the data</h3>
<p>The clinical and variant data were copied from Table 1 of the publication. For ease of parsing, we manually split the combined "Gender, Age" column into two columns.</p>

In [3]:
df = pd.read_excel('input/Moreno-Artero2022_table1.xlsx')

In [4]:
df.head(2)

Unnamed: 0,Patients,Gender,Age (Years),Genetic Background,Consanguinity,Nevi,Eyes,Hair,Eyebrows,Eyelashes,Nystagmus,Strabismus,VA,Refraction,ITI,MT,FHP,Variant 1 (SLC45A2 NM_016180.5),Variant 2 (SLC45A2 NM_016180.5)
0,P1,M,20,Morocco,Yes,"Present, amelanotic",Blue,White,White,White,Yes,"Yes, esotropia",1.6/10 RE; 2/10 LE,Hypermetropia astigmatism,,,Grade IV,NM_016180.5(SLC45A2):c.267_271del\nChr5(GRCh37):g.33984422_33984426del\np.(Ser90Glnfs*42),NM_016180.5(SLC45A2):c.267_271del\nChr5(GRCh37):g.33984422_33984426del\np.(Ser90Glnfs*42)
1,P2,F,7,Morocco,Yes,"Present, pigmented",Blue,White blond,White,White,Yes,"Yes, left exotropia",1/20 RE; 1/20 LE,Hypermetropia Astigmatism,Grade IV,Grade II,Grade IV,NM_016180.5(SLC45A2):c.1028_1029del\nChr5(GRCh37):g.33954469_33954470del\np.(Gly343Alafs*10),NM_016180.5(SLC45A2):c.1028_1029del\nChr5(GRCh37):g.33954469_33954470del\np.(Gly343Alafs*10)


In [5]:
column_mapper_list = list()
nystagmusMapper = SimpleColumnMapper(column_name="Nystagmus",hpo_id="HP:0000639", hpo_label="Nystagmus",observed='Yes',excluded='No')
column_mapper_list.append(nystagmusMapper)
nystagmusMapper.preview_column(df)

Unnamed: 0,mapping,count
0,"original value: ""Yes"" -> HP: Nystagmus (HP:0000639) (observed)",28
1,"original value: ""No"" -> HP: Nystagmus (HP:0000639) (excluded)",2


In [6]:
# This was used to conveniently generate OptionColumnMapper code, but is no longer needed.
#result = OptionColumnMapper.autoformat(df, hpo_cr)
#print(result)

In [7]:
nevi_d = {'Present': 'Nevus',
 'amelanotic': 'Nevus',  ## TODO needs new HPO term
 'pigmented': 'Melanocytic nevus',
}
excluded = {"Absent": "Nevus"}
neviMapper = OptionColumnMapper(column_name='Nevi',concept_recognizer=hpo_cr, option_d=nevi_d, excluded_d=excluded)
column_mapper_list.append(neviMapper)
neviMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Nevus (HP:0003764) (observed),32
1,Melanocytic nevus (HP:0000995) (observed),12
2,Nevus (HP:0003764) (excluded),8


In [8]:
eyes_d = {'Blue': 'Iris hypopigmentation',
 'Blue grey': 'Iris hypopigmentation',}
excluded = {"Brown": "Iris hypopigmentation"}
eyesMapper = OptionColumnMapper(column_name='Eyes',concept_recognizer=hpo_cr, option_d=eyes_d,excluded_d=excluded)
column_mapper_list.append(eyesMapper)
eyesMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Iris hypopigmentation (HP:0007730) (observed),28
1,Iris hypopigmentation (HP:0007730) (excluded),2


In [9]:
hair_d = {'White': 'Hypopigmentation of hair',
 'White blond': 'Hypopigmentation of hair',
 'Blond': 'Hypopigmentation of hair',
 'Dark blond': 'Hypopigmentation of hair',
 'Red blond': 'Hypopigmentation of hair'}
hairMapper = OptionColumnMapper(column_name='Hair',concept_recognizer=hpo_cr, option_d=hair_d)
column_mapper_list.append(hairMapper)
hairMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Hypopigmentation of hair (HP:0005599) (observed),30


In [10]:
eyebrows_d = {'White': 'White eyebrow',
 'Blond': 'White eyebrow',
 'White + Blond': 'White eyebrow'}
excluded = {"Brown": "White eyebrow"}
eyebrowsMapper = OptionColumnMapper(column_name='Eyebrows',concept_recognizer=hpo_cr, option_d=eyebrows_d,  excluded_d=excluded)
column_mapper_list.append(eyebrowsMapper)
eyebrowsMapper.preview_column(df)

Unnamed: 0,mapping,count
0,White eyebrow (HP:0002226) (observed),29
1,White eyebrow (HP:0002226) (excluded),1


In [11]:
eyelashes_d = {'White': 'White eyelashes',
 'Blond': 'White eyelashes',
 'White + Blond': 'White eyelashes'}
eyelashesMapper = OptionColumnMapper(column_name='Eyelashes',concept_recognizer=hpo_cr, option_d=eyelashes_d)
column_mapper_list.append(eyelashesMapper)
eyelashesMapper.preview_column(df)

Unnamed: 0,mapping,count
0,White eyelashes (HP:0002227) (observed),29


In [12]:
strabismus_d = {'Yes': 'Strabismus',
 'esotropia': 'Esotropia',
 'left exotropia': 'Exotropia',
 'exotropia': 'Exotropia',
 'Yes microexotropia': 'Exotropia'}
excluded = {"No": 'Strabismus'}
strabismusMapper = OptionColumnMapper(column_name='Strabismus',concept_recognizer=hpo_cr, option_d=strabismus_d, excluded_d=excluded)
column_mapper_list.append(strabismusMapper)
strabismusMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Strabismus (HP:0000486) (observed),12
1,Esotropia (HP:0000565) (observed),9
2,Exotropia (HP:0000577) (observed),4
3,Strabismus (HP:0000486) (excluded),17


<h2>Reduced visual acuity</h2>
<p>Table 1 reports visual acuity as one fraction per eye, written with slashes and semicolons (for example, <tt>1.6/10 RE; 2/10 LE</tt>), and pyphetools interprets both characters as delimiters. For this reason, we map only the numerators to the abnormal finding. The denominator is typically ten, although P2, for example, is reported as 1/20. We do not distinguish between left and right eyes here.</p>
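<p>If the per-eye values were needed, the cells could be parsed before mapping. The following is a minimal sketch rather than part of the pyphetools workflow: <tt>VA_PATTERN</tt> and <tt>parse_visual_acuity</tt> are hypothetical names, and the sketch assumes every entry follows the <tt>numerator/denominator RE|LE</tt> layout seen in the first two rows of the table.</p>

```python
import re

# numerator / denominator followed by the eye label (RE = right eye, LE = left eye)
VA_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*/\s*(\d+(?:\.\d+)?)\s*(RE|LE)")

def parse_visual_acuity(cell: str) -> dict:
    """Map each eye label found in the cell to its decimal acuity (numerator / denominator)."""
    return {eye: float(num) / float(den) for num, den, eye in VA_PATTERN.findall(cell)}

p1 = parse_visual_acuity("1.6/10 RE; 2/10 LE")  # P1 in Table 1
p2 = parse_visual_acuity("1/20 RE; 1/20 LE")    # P2 in Table 1
print(p1)
print(p2)
```

<p>Converting to a decimal makes entries with different denominators comparable, which the numerator-only mapping does not attempt.</p>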

In [13]:
va_d = {'1.6': 'Reduced visual acuity',
        '2': 'Reduced visual acuity',
         '1': 'Reduced visual acuity',
        '3': 'Reduced visual acuity',
         '5': 'Reduced visual acuity',
         '7': 'Reduced visual acuity',
         '1.2': 'Reduced visual acuity',
         '1.4': 'Reduced visual acuity'}
vaMapper = OptionColumnMapper(column_name='VA',concept_recognizer=hpo_cr, option_d=va_d)
column_mapper_list.append(vaMapper)
vaMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Reduced visual acuity (HP:0007663) (observed),104


In [14]:
refraction_d = {'Hypermetropia astigmatism': 'Hypermetropia',
 'Hypermetropia Astigmatism': 'Astigmatism',
 'Hypermetropia\nAstigmatism': 'Astigmatism',
 'Hypermetropia': 'Hypermetropia',
 'HypermetropiaAstigmatism': 'Hypermetropia',
 'Myopia Astigmatism': 'Myopia'}
refractionMapper = OptionColumnMapper(column_name='Refraction',concept_recognizer=hpo_cr, option_d=refraction_d)
column_mapper_list.append(refractionMapper)
refractionMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Hypermetropia (HP:0000540) (observed),5
1,Astigmatism (HP:0000483) (observed),9
2,Myopia (HP:0000545) (observed),1
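<p>Each key in <tt>refraction_d</tt> is paired with a single HPO label, so a cell that names two refractive errors appears to contribute only one of them. A possible way to recover both is to split the cell text before mapping. This is a sketch in plain Python: <tt>REFRACTION_TERMS</tt> and <tt>split_refraction</tt> are hypothetical names, and it makes no assumption about whether <tt>OptionColumnMapper</tt> can accept more than one label per option.</p>

```python
# Refractive errors that occur in the Refraction column of Table 1
REFRACTION_TERMS = ("Hypermetropia", "Astigmatism", "Myopia")

def split_refraction(cell: str) -> list:
    """Return every known refractive error named in the cell, in REFRACTION_TERMS order."""
    lowered = cell.lower()
    # Substring matching copes with missing spaces, newlines, and inconsistent capitalization
    return [term for term in REFRACTION_TERMS if term.lower() in lowered]

for raw in ("Hypermetropia astigmatism", "HypermetropiaAstigmatism", "Myopia Astigmatism"):
    print(repr(raw), "->", split_refraction(raw))
```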


In [15]:
iti_d = {
 'Grade IV': 'Iris transillumination defect',
 'Grade III': 'Iris transillumination defect',
 'Grade II': 'Iris transillumination defect',
 'Grade I': 'Iris transillumination defect'}
excluded = {"No": "Iris transillumination defect"}
itiMapper = OptionColumnMapper(column_name='ITI',concept_recognizer=hpo_cr, option_d=iti_d, excluded_d=excluded)
column_mapper_list.append(itiMapper)
itiMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Iris transillumination defect (HP:0012805) (observed),24
1,Iris transillumination defect (HP:0012805) (excluded),4


In [16]:
mt_d = {'nan': 'PLACEHOLDER',
 'Grade II': 'PLACEHOLDER',
 'Grade III': 'PLACEHOLDER',
 'Grade I': 'PLACEHOLDER'}
# Macular transparency (MT) -- needs a new HPO term, so this column is not mapped yet.
#mtMapper = OptionColumnMapper(column_name='MT',concept_recognizer=hpo_cr, option_d=mt_d)
#mtMapper.preview_column(df)
#column_mapper_list.append(mtMapper)

In [17]:
fhp_d = {'Grade IV': 'Hypoplasia of the fovea',
 'Grade III': 'Hypoplasia of the fovea',
 'Grade II': 'Hypoplasia of the fovea',
 'Grade I': 'Hypoplasia of the fovea'}
fhpMapper = OptionColumnMapper(column_name='FHP',concept_recognizer=hpo_cr, option_d=fhp_d)
column_mapper_list.append(fhpMapper)
fhpMapper.preview_column(df)

Unnamed: 0,mapping,count
0,Hypoplasia of the fovea (HP:0007750) (observed),26


<h2>Variants</h2>
<p>The original table describes variants like this: <tt>NM_016180.5(SLC45A2):c.267_271del\nChr5(GRCh37):g.33984422_33984426del\np.(Ser90Glnfs*42)</tt>.
    The following code extracts the transcript-level (c.) variant, which is c.267_271del in this example. Cells that do not start with the transcript prefix are returned unchanged.</p>

In [18]:
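def extract_var(cell_contents):
    """Return the transcript-level (c.) variant from a Table 1 cell, e.g. 'c.267_271del'."""
    prefix = "NM_016180.5(SLC45A2):"
    if not cell_contents.startswith(prefix):
        return cell_contents  # not an HGVS entry, e.g. 'Deletion exons 1-4'
    # The c. notation is the first of the newline-separated notations in the cell
    return cell_contents[len(prefix):].split('\n')[0]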
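<p>For reference, the same cell can be split into all three notations. This is a sketch only: <tt>split_variant_cell</tt> is a hypothetical helper that is not used elsewhere in this notebook, and it assumes the cell contains exactly three newline-separated parts as in the example above.</p>

```python
def split_variant_cell(cell: str) -> dict:
    """Split a Table 1 variant cell into its transcript, genomic and protein notations."""
    parts = [part.strip() for part in cell.split("\n")]
    if len(parts) != 3:
        raise ValueError(f"Expected 3 newline-separated parts but got {len(parts)}")
    transcript_part, genomic, protein = parts
    # 'NM_016180.5(SLC45A2):c.267_271del' -> transcript label and c. notation
    transcript, _, cdna = transcript_part.partition(":")
    return {"transcript": transcript, "cdna": cdna, "genomic": genomic, "protein": protein}

example = "NM_016180.5(SLC45A2):c.267_271del\nChr5(GRCh37):g.33984422_33984426del\np.(Ser90Glnfs*42)"
parsed = split_variant_cell(example)
print(parsed["cdna"], parsed["protein"])
```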

In [19]:
df["var1"] = df["Variant 1 (SLC45A2 NM_016180.5)"].apply(extract_var)
df["var2"] = df["Variant 2 (SLC45A2 NM_016180.5)"].apply(extract_var)

In [20]:
gene_symbol="SLC45A2"
gene_id="HGNC:16472"
transcript="NM_016180.5"
vman = VariantManager(df=df, individual_column_name="Patients", allele_1_column_name="var1",
                      allele_2_column_name="var2", gene_id=gene_id, gene_symbol=gene_symbol, transcript=transcript)

In [21]:
vman.to_summary()

Unnamed: 0,status,count,alleles
0,mapped,27,"c.806G>A, c.1518C>T, c.258del, c.1255C>A, c.1466T>C, c.1045G>A, c.950A>G, c.1506del, c.130G>A, c.267_271del, c.1028_1029del, c.1532C>T, c.179T>G, c.1471G>A, c.533_534dup, c.977T>A, c.147C>G, c.273del, c.1068C>G, c.606G>C, c.1532C>A, c.1036G>T, c.1166_1167del, c.1033-6_1033-3del, c.1273del, c.953G>A, c.986del"
1,unmapped,1,Deletion exons 1-4


In [22]:
vman.code_as_chromosomal_deletion({'Deletion exons 1-4'})

In [23]:
validated_var_d = vman.get_variant_d()
print(f"We got {len(validated_var_d)} variant objects")

We got 28 variant objects


In [24]:
ageMapper = AgeColumnMapper.by_year('Age (Years)')
#ageMapper.preview_column(df)
sexMapper = SexColumnMapper(male_symbol='M', female_symbol='F', column_name='Gender')
#sexMapper.preview_column(df)

In [25]:
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_list=column_mapper_list, 
                        individual_column_name="Patients", 
                        age_at_last_encounter_mapper=ageMapper, 
                        sexmapper=sexMapper,
                        metadata=metadata)
oca4 = Disease(disease_id='OMIM:606574',  disease_label='Albinism, oculocutaneous, type IV')
encoder.set_disease(oca4)

In [26]:
individuals = encoder.get_individuals()

In [27]:
for i in individuals:
    rows = df.loc[df['Patients'] == i.id]
    if len(rows) != 1:
        raise ValueError(f"Got {len(rows)} rows but expected only 1")
    var1 = rows.iloc[0]['var1']
    var2 = rows.iloc[0]['var2']
    if var1 == var2:
        # homozygous
        var_object = validated_var_d.get(var1)
        var_object.set_homozygous()
        i.add_variant(var_object)
    else:
        var1_object  = validated_var_d.get(var1) 
        var2_object  = validated_var_d.get(var2)
        var1_object.set_heterozygous()
        var2_object.set_heterozygous()
        i.add_variant(var1_object)
        i.add_variant(var2_object)
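<p>The zygosity rule applied in the loop above can be stated on its own. The following sketch uses a hypothetical helper, <tt>allele_zygosity</tt>, that works on plain allele strings and does not touch the pyphetools variant objects: identical alleles are treated as one homozygous variant, and different alleles as two heterozygous variants (compound heterozygosity).</p>

```python
def allele_zygosity(var1: str, var2: str) -> list:
    """Return (allele, zygosity) pairs for a biallelic genotype."""
    if var1 == var2:
        # Same allele on both chromosomes: report it once as homozygous
        return [(var1, "homozygous")]
    # Two different alleles: each is reported as heterozygous
    return [(var1, "heterozygous"), (var2, "heterozygous")]

print(allele_zygosity("c.267_271del", "c.267_271del"))  # P1
print(allele_zygosity("c.273del", "c.1068C>G"))         # P4
```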

In [28]:
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.BI_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
WARNING,REDUNDANT,24


In [29]:
individuals = cvalidator.get_error_free_individual_list()
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.BI_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

In [30]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
P1 (MALE; P20Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.267_271del (homozygous),Nevus (HP:0003764); White eyelashes (HP:0002227); White eyebrow (HP:0002226); Reduced visual acuity (HP:0007663); Esotropia (HP:0000565); Hypermetropia (HP:0000540); Nystagmus (HP:0000639); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); Hypoplasia of the fovea (HP:0007750)
P2 (FEMALE; P7Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.1028_1029del (homozygous),Melanocytic nevus (HP:0000995); Iris transillumination defect (HP:0012805); White eyelashes (HP:0002227); White eyebrow (HP:0002226); Reduced visual acuity (HP:0007663); Exotropia (HP:0000577); Astigmatism (HP:0000483); Nystagmus (HP:0000639); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); Hypoplasia of the fovea (HP:0007750)
P3 (MALE; P7Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.1028_1029del (homozygous),Nystagmus (HP:0000639); Melanocytic nevus (HP:0000995); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); White eyebrow (HP:0002226); White eyelashes (HP:0002227); Astigmatism (HP:0000483); Iris transillumination defect (HP:0012805); Hypoplasia of the fovea (HP:0007750); excluded: Strabismus (HP:0000486)
P4 (FEMALE; P49Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.273del (heterozygous) NM_016180.5:c.1068C>G (heterozygous),Iris transillumination defect (HP:0012805); White eyelashes (HP:0002227); White eyebrow (HP:0002226); Reduced visual acuity (HP:0007663); Nystagmus (HP:0000639); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); Hypoplasia of the fovea (HP:0007750); excluded: Nevus (HP:0003764); excluded: Strabismus (HP:0000486)
P5 (FEMALE; P63Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.986del (heterozygous) NM_016180.5:c.1036G>T (heterozygous),Nevus (HP:0003764); Iris transillumination defect (HP:0012805); White eyelashes (HP:0002227); White eyebrow (HP:0002226); Reduced visual acuity (HP:0007663); Nystagmus (HP:0000639); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); Hypoplasia of the fovea (HP:0007750); excluded: Strabismus (HP:0000486)
P6 (MALE; P18Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.986del (heterozygous) NM_016180.5:c.1471G>A (heterozygous),Melanocytic nevus (HP:0000995); Iris transillumination defect (HP:0012805); White eyelashes (HP:0002227); White eyebrow (HP:0002226); Reduced visual acuity (HP:0007663); Nystagmus (HP:0000639); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); Hypoplasia of the fovea (HP:0007750); excluded: Strabismus (HP:0000486)
P7 (MALE; P9Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.986del (heterozygous) NM_016180.5:c.1471G>A (heterozygous),Iris transillumination defect (HP:0012805); White eyelashes (HP:0002227); White eyebrow (HP:0002226); Reduced visual acuity (HP:0007663); Nystagmus (HP:0000639); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); Hypoplasia of the fovea (HP:0007750); excluded: Nevus (HP:0003764); excluded: Strabismus (HP:0000486)
P8 (MALE; P7Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.986del (heterozygous) NM_016180.5:c.1036G>T (heterozygous),Iris transillumination defect (HP:0012805); White eyelashes (HP:0002227); White eyebrow (HP:0002226); Reduced visual acuity (HP:0007663); Esotropia (HP:0000565); Nystagmus (HP:0000639); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); Hypoplasia of the fovea (HP:0007750); excluded: Nevus (HP:0003764)
P9 (MALE; P16Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.986del (homozygous),Melanocytic nevus (HP:0000995); Iris transillumination defect (HP:0012805); White eyelashes (HP:0002227); White eyebrow (HP:0002226); Reduced visual acuity (HP:0007663); Nystagmus (HP:0000639); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); Hypoplasia of the fovea (HP:0007750); excluded: Strabismus (HP:0000486)
P10 (FEMALE; P52Y),"Albinism, oculocutaneous, type IV (OMIM:606574)",NM_016180.5:c.986del (heterozygous) NM_016180.5:c.1166_1167del (heterozygous),Iris transillumination defect (HP:0012805); White eyelashes (HP:0002227); White eyebrow (HP:0002226); Reduced visual acuity (HP:0007663); Esotropia (HP:0000565); Nystagmus (HP:0000639); Iris hypopigmentation (HP:0007730); Hypopigmentation of hair (HP:0005599); Hypoplasia of the fovea (HP:0007750); excluded: Nevus (HP:0003764)


In [31]:
output_dir = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                             metadata=metadata,
                                             outdir=output_dir)

We output 30 GA4GH phenopackets to the directory phenopackets
