# RPGRIP1
Data from [Beryozkin A, et al. (2021) Retinal Degeneration Associated With RPGRIP1: A Review of Natural History, Mutation Spectrum, and Genotype-Phenotype Correlation in 228 Patients. Front Cell Dev Biol. 2021 Oct 14;9:746781. doi: 10.3389/fcell.2021.746781. PMID: 34722527; PMCID: PMC8551679.](https://doi.org/10.3389/fcell.2021.746781).
RPGRIP1 encodes a ciliary protein expressed in the photoreceptor connecting cilium. Mutations in this gene cause ∼5% of Leber congenital amaurosis (LCA) worldwide, but are also associated with cone-rod dystrophy (CRD) and retinitis pigmentosa (RP) phenotypes. Our purpose was to clinically characterize RPGRIP1 patients from our cohort, collect clinical data of additional RPGRIP1 patients reported previously in the literature, identify common clinical features, and seek genotype-phenotype correlations.

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.9.3


In [2]:
PMID = "PMID:34722527"
title = "Retinal Degeneration Associated With RPGRIP1: A Review of Natural History, Mutation Spectrum, and Genotype-Phenotype Correlation in 228 Patients"
cite = Citation(pmid=PMID, title=title)
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
metadata = MetaData(created_by="ORCID:0000-0002-0736-9199", citation=cite)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2023-10-09


In [3]:
df = pd.read_excel('input/Table_2_Beryozkin.xlsx')
df = df.astype(str)

In [4]:
df.head(2)

Unnamed: 0,# of family or patient,Article title,First author,First mutation (C.),First mutation (P.),Second mutation (C.),Second mutation (P.),Diagnosis,Intelectual disability/ Neurodevelopmental delay,First sighns and symptoms,...,Nyctalopia (night blidness),ODR (oculodigital reflex),Refraction/RNS (cycloplaegic retinoscopy spherical equivalent),VA,age of VA,VF,age of VF,rod,cone,age of ERG
0,P030,Targeted next generation sequencing identified novel mutations in RPGRIP1 associated with both retinitis pigmentosa and Leber's congenital amaurosis in unrelated Chinese patients,Hui Huang,Exon 1-22 deletion,,Exon 1-22 deletion,,LCA,no,0.0,...,,,,0.001,36.0,,,not detectable,not detectable,36.0
1,41,Clinical and Genetic Evaluation of a Cohort of Pediatric Patients with Severe Inherited Retinal Dystrophies,Valentina Di Iorio,c.86-3T>G,Splicing,c.2225_2226del,p.(G742fs),LCA,no,0.1,...,,,,0.05,8.0,,,0,0,8.0


In [5]:
# Create a column with a unique patient id.
# The original supplemental file does not always provide an id, but we need one,
# so for each publication (first author + truncated title) we count from 1 to k.
from collections import defaultdict

patient_id_d = defaultdict(int)

def unique_id(row: pd.Series) -> str:
    number = str(row["# of family or patient"]).replace(" ", "")
    # first 15 characters of the article title, with spaces removed
    title = str(row["Article title"])[:15].replace(" ", "")
    first_author = str(row["First author"])
    first_author = first_author.replace(" ", "_").replace(",", "_").replace(".", "_")
    patient_id_part = f"{first_author}_{title}"
    # running count of individuals seen so far for this publication
    patient_id_d[patient_id_part] += 1
    count = patient_id_d[patient_id_part]
    if number == "nan" or len(number) < 2:
        # no usable id in the source table: fall back to the running count
        p_id = f"individual_{count}"
    else:
        p_id = f"{number}_{count}"
    return f"individual_{p_id}_{patient_id_part}"

df["individual_id"] = df.apply(unique_id, axis=1)

In [6]:
df["individual_id"]

0                   individual_P030_1_Hui_Huang_Targetednextg
1            individual_41_1_Valentina_Di_Iorio_ClinicalandGe
2      individual_arRP-F083_1_Leen_Abu-Safieh_Autozygome-guid
3      individual_individual_1_Juan_C__Zenteno_Extensivegenic
4                   individual_P024_2_Hui_Huang_Targetednextg
                                ...                          
224                 individual_MOL0358_6_13_Beryozkin_Current
225             individual_individual_4_Lin_Li__DetectionofVa
226       individual_237–523_6_Farzad_Jamshidi_Contributionof
227        individual_79–194_7_Farzad_Jamshidi_Contributionof
228                  individual_GUY_6_Gerber_S_Completeexon-i
Name: individual_id, Length: 229, dtype: object
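
Because the identifiers are generated rather than taken from the source table, it can be worth confirming that none of them collide. The following is a minimal sketch using only the standard library; `find_duplicates` is a hypothetical helper (not part of pyphetools) and the example ids are illustrative.

```python
from collections import Counter
from typing import Iterable, List


def find_duplicates(ids: Iterable[str]) -> List[str]:
    """Return the identifiers that occur more than once, sorted alphabetically."""
    counts = Counter(ids)
    return sorted(identifier for identifier, n in counts.items() if n > 1)


# Illustrative ids only; in the notebook one would pass df["individual_id"]
example_ids = [
    "individual_P030_1_Hui_Huang_Targetednextg",
    "individual_P024_2_Hui_Huang_Targetednextg",
    "individual_P030_1_Hui_Huang_Targetednextg",
]
duplicates = find_duplicates(example_ids)
```

An empty list means every identifier is unique.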

In [7]:
column_mapper_d = {}

In [8]:
idMapper = SimpleColumnMapper(hpo_id="HP:0001263", hpo_label="Global developmental delay", observed="yes", excluded="no")
idMapper.preview_column(df['Intelectual disability/ Neurodevelopmental delay'])
column_mapper_d['Intelectual disability/ Neurodevelopmental delay'] = idMapper
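
The idea behind a simple yes/no column mapper can be illustrated with a small stand-alone function. This is a sketch of the concept only and is not the pyphetools implementation; the function name `map_simple_cell` and its exact matching rule (case-insensitive equality after stripping whitespace) are assumptions made for illustration.

```python
from typing import List, Optional


def map_simple_cell(cell: str, observed: str = "yes", excluded: str = "no") -> Optional[bool]:
    """Return True if the term is observed, False if excluded, None if not measured."""
    value = cell.strip().lower()
    if value == observed:
        return True
    if value == excluded:
        return False
    # anything else (e.g. "nan" after df.astype(str)) carries no information
    return None


# Illustrative cell contents
column: List[str] = ["no", "yes", "nan", " No "]
calls = [map_simple_cell(cell) for cell in column]
```

Cells that match neither string are left unannotated rather than being treated as excluded.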

In [9]:
df['First sighns and symptoms'].unique()
ageMapper = AgeColumnMapper.by_year(column_name='First sighns and symptoms')
#ageMapper.preview_column(df['First sighns and symptoms'])
# todo make robust against none errors

In [10]:
nystagmusMapper = SimpleColumnMapper(hpo_id="HP:0000639", hpo_label="Nystagmus", observed="1.", excluded="no")
nystagmusMapper.preview_column(df['Nystagmus/wandering/no fixation'])
column_mapper_d['Nystagmus/wandering/no fixation'] = nystagmusMapper

In [11]:
photophobiaMapper = SimpleColumnMapper(hpo_id="HP:0000613", hpo_label="Photophobia", observed="1.", excluded="n")
photophobiaMapper.preview_column(df['Photophobia'])
column_mapper_d['Photophobia'] = photophobiaMapper

In [12]:
# Nyctalopia (night blidness)
nyctalopiaMapper = SimpleColumnMapper(hpo_id="HP:0000662", hpo_label="Nyctalopia", observed="1.", excluded="n")
nyctalopiaMapper.preview_column(df['Nyctalopia (night blidness)'])
column_mapper_d['Nyctalopia (night blidness)'] = nyctalopiaMapper

In [13]:
# Eye poking HP:0001483
odrMapper = SimpleColumnMapper(hpo_id="HP:0001483", hpo_label="Eye poking", observed=["yes","1"], excluded="no")
odrMapper.preview_column(df['ODR (oculodigital reflex)'])
column_mapper_d['ODR (oculodigital reflex)'] = odrMapper

In [14]:
#results = OptionColumnMapper.autoformat(df, concept_recognizer=hpo_cr)
#print(results)

In [15]:
refraction_d = {
 '2.5': 'Moderate hypermetropia',
 '6.5': 'High hypermetropia',
 '10': 'High hypermetropia',
 '2.25': 'Moderate hypermetropia',
 '1.5': 'Mild hypermetropia',
 '3.75': 'Moderate hypermetropia',
 '2.75': 'Moderate hypermetropia',
 '1.25': 'Mild hypermetropia',
 'hyperopia': 'Hypermetropia',
 '4.5': 'Moderate hypermetropia',
 '10.25': 'High hypermetropia',
 '3.5': 'Moderate hypermetropia',
 '2.6': 'Moderate hypermetropia',
 '6.9': 'High hypermetropia',
 'high myopia': 'High myopia',
 '3.25': 'Moderate hypermetropia',
 '5.5': 'High hypermetropia'}
refractionMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=refraction_d)
refractionMapper.preview_column(df['Refraction/RNS (cycloplaegic retinoscopy spherical equivalent)'])
column_mapper_d['Refraction/RNS (cycloplaegic retinoscopy spherical equivalent)'] = refractionMapper
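
The numeric entries in `refraction_d` are consistent with a simple threshold rule on the spherical equivalent. The sketch below makes that rule explicit; the cut-offs of 2.0 and 5.0 dioptres are assumptions inferred from the dictionary above (any cut-off between 1.5 and 2.25, and between 4.5 and 5.5, would reproduce the same table), and `classify_hypermetropia` is a hypothetical helper.

```python
def classify_hypermetropia(spherical_equivalent: float) -> str:
    """Map a positive spherical equivalent (dioptres) to a hypermetropia label."""
    if spherical_equivalent <= 0:
        raise ValueError("expected a positive (hypermetropic) spherical equivalent")
    # Assumed cut-offs, inferred from the entries of refraction_d
    if spherical_equivalent <= 2.0:
        return "Mild hypermetropia"
    if spherical_equivalent <= 5.0:
        return "Moderate hypermetropia"
    return "High hypermetropia"


# Values taken from refraction_d
labels = {value: classify_hypermetropia(value) for value in (1.25, 2.25, 4.5, 5.5, 10.25)}
```

A rule of this kind avoids having to list every observed value by hand, but free-text entries such as 'hyperopia' or 'high myopia' would still need explicit handling.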

In [16]:
va_d = {'0.001': 'Very low visual acuity',
 '0.05': 'Severely reduced visual acuity',
 '0.03': 'Very low visual acuity',
 '0.0': 'Very low visual acuity',
 '0.1': 'Moderately reduced visual acuity',
 '0.01': 'Very low visual acuity',
 '0.0001': 'Very low visual acuity',
 '0.025': 'Very low visual acuity',
 '0.15': 'Moderately reduced visual acuity',
 '0.15037593984962405': 'Moderately reduced visual acuity',
 '0.25': 'Moderately reduced visual acuity',
 '0.41': 'Mildly reduced visual acuity',
 '0.13333333333333333': 'Moderately reduced visual acuity',
 '0.06': 'Severely reduced visual acuity',
 '0.2': 'Moderately reduced visual acuity',
 '0.5': 'Mildly reduced visual acuity',
 '0.41500000000000004': 'Mildly reduced visual acuity',
 '0.4': 'Mildly reduced visual acuity',
 '0.09': 'Severely reduced visual acuity',
 '0.02': 'Very low visual acuity',
 '0.07500000000000001': 'Severely reduced visual acuity'}
vaMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=va_d)
vaMapper.preview_column(df['VA'])
column_mapper_d['VA'] = vaMapper

In [17]:
# A normal visual field is an island of vision measuring 90 degrees temporally to central fixation,
# 50 degrees superiorly and nasally, and 60 degrees inferiorly.
vf_d = {
 '30': 'Constriction of peripheral visual field',
 '12.5': 'Constriction of peripheral visual field',
 '40': 'Constriction of peripheral visual field',
 '10': 'Constriction of peripheral visual field',
 '15': 'Constriction of peripheral visual field',
 '20': 'Constriction of peripheral visual field',
 'severly constricted': 'Constriction of peripheral visual field'}
vfMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=vf_d)
vfMapper.preview_column(df['VF'])
column_mapper_d['VF'] = vfMapper

In [18]:
# We will not attempt to map ERG findings in more detail because very scant information is provided
rod_d = {'not detectable': 'Abnormal electroretinogram',
 'no detectable responses': 'Abnormal electroretinogram',
 'flat': 'Abnormal electroretinogram',
 'subnormal': 'Abnormal electroretinogram',
 'extinguished': 'Abnormal electroretinogram',
 'non-recordable': 'Abnormal electroretinogram',
 'Reduced and delayed': 'Abnormal electroretinogram',
 'reduced': 'Abnormal electroretinogram',
 'not detected': 'Abnormal electroretinogram'}
excluded = { 'norm': 'Abnormal electroretinogram',}
rodMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=rod_d, excluded_d=excluded)
rodMapper.preview_column(df['rod'])
column_mapper_d['rod'] = rodMapper

In [19]:
cone_d = {'not detectable': 'Abnormal electroretinogram',
 'no detectable responses': 'Abnormal electroretinogram',
 'flat': 'Abnormal electroretinogram',
 'extinguished': 'Abnormal electroretinogram',
 'non-recordable': 'Abnormal electroretinogram',
 'severly reduced': 'Abnormal electroretinogram',
 'not detected': 'Abnormal electroretinogram'}
excluded = { 'norm': 'Abnormal electroretinogram',}
coneMapper = OptionColumnMapper(concept_recognizer=hpo_cr, option_d=cone_d, excluded_d=excluded)
coneMapper.preview_column(df['cone'])
column_mapper_d['cone'] = coneMapper

# Variants

Note that we corrected some HGVS expressions; for example, c.3618-5_c.3618-1del should be c.3618-1_3621del.
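
One pattern behind such corrections, a coordinate prefix repeated inside a range, can be flagged automatically before variants are sent for validation. The sketch below is a hypothetical pre-check written for illustration; it is not how the corrections in this notebook were made, and it only detects the pattern without proposing a corrected expression.

```python
import re

# A range such as "c.100_c.105del" repeats the coordinate prefix after the underscore
REPEATED_PREFIX = re.compile(r"_[cgnmpr]\.")


def has_repeated_prefix(hgvs: str) -> bool:
    """Return True if a coordinate-type prefix (e.g. 'c.') reappears inside a range."""
    return REPEATED_PREFIX.search(hgvs) is not None


flagged = [
    v
    for v in ["c.3618-5_c.3618-1del", "c.3618-1_3621del", "c.2710+372_2895+76del"]
    if has_repeated_prefix(v)
]
```

Flagged expressions still need manual review against the reference transcript.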

In [20]:
v1_list = df['First mutation (C.)'].unique()
v2_list = df['Second mutation (C.)'].unique()
var_set = set(v1_list)
var_set.update(v2_list)
RPGRIP1_transcript = "NM_020366.4"
var_d = {}

# Exon-level events cannot be expressed as small HGVS variants, so they are encoded as structural variants
svars_del = {"Exon 10–18 deletion", "Exon19 del", "Exon 1-22 deletion", "Exon 17–19 deletion"}
svar_dup = {"Exon 1–2 dup", "Exon 2 duplication"}
symbol = "RPGRIP1"
hgnc = "HGNC:13436"
vvalidator = VariantValidator(genome_build="hg38", transcript=RPGRIP1_transcript)
for v in var_set:
    print(f"encoding {v}")
    if v in svars_del:
        var = StructuralVariant.chromosomal_deletion(cell_contents=v, gene_id=hgnc, gene_symbol=symbol)
    elif v in svar_dup:
        var = StructuralVariant.chromosomal_duplication(cell_contents=v, gene_id=hgnc, gene_symbol=symbol)
    else:
        # all remaining entries are HGVS expressions relative to the RPGRIP1 transcript
        var = vvalidator.encode_hgvs(v)
    var_d[v] = var
print(f"Encoded {len(var_d)} variants with Variant Validator")

encoding c.2710+372_2895+76del
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_020366.4%3Ac.2710+372_2895+76del/NM_020366.4?content-type=application%2Fjson
encoding c.3358A>G
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_020366.4%3Ac.3358A>G/NM_020366.4?content-type=application%2Fjson
encoding c.2890delT
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_020366.4%3Ac.2890delT/NM_020366.4?content-type=application%2Fjson
encoding c.3620T>G
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_020366.4%3Ac.3620T>G/NM_020366.4?content-type=application%2Fjson
encoding c.3100_3238del139
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_020366.4%3Ac.3100_3238del139/NM_020366.4?content-type=application%2Fjson
encoding c.2314C>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_020366.4%3Ac.2314C>T/NM_020366.4?content-type=application%2Fjso
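
The request URLs printed above follow a regular pattern: genome build, transcript-qualified HGVS expression, and transcript. The sketch below reproduces that pattern with the standard library. It is an illustration based only on the URLs shown in the output, not the pyphetools code; the function name `variant_validator_url` and the choice of which characters to leave unescaped are assumptions.

```python
from urllib.parse import quote

BASE_URL = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"


def variant_validator_url(hgvs_c: str,
                          transcript: str = "NM_020366.4",
                          genome_build: str = "hg38") -> str:
    """Assemble a request URL in the same shape as those printed above."""
    # ':' is percent-encoded, while '+' and '>' are left as-is to match the printed URLs
    variant = quote(f"{transcript}:{hgvs_c}", safe="+>")
    content_type = quote("application/json", safe="")
    return f"{BASE_URL}/{genome_build}/{variant}/{transcript}?content-type={content_type}"


url = variant_validator_url("c.3358A>G")
```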

# Diagnosis

OMIM lists two diagnoses for RPGRIP1. The authors use eight diagnosis labels, which we map as follows:

- [Cone-rod dystrophy 13; OMIM:608194](https://omim.org/entry/608194): 'RP', 'CRD', 'eoRP', 'CRD (because cones was reduced more than rods, among 16 patients from 2 families, ages 14-16)', 'CRD ', 'RP\u2009'
- [Leber congenital amaurosis 6; OMIM:613826](https://omim.org/entry/613826): 'LCA', 'CRD/LCA'

In [21]:
crd13 = Disease(disease_id="OMIM:608194", disease_label="Cone-rod dystrophy 13")
lca6 = Disease(disease_id="OMIM:613826", disease_label="Leber congenital amaurosis 6")
disease_d = {
    'RP': crd13,
    'CRD': crd13,
    'eoRP': crd13,
    'CRD (because cones was reduced more than rods, among 16 patients from 2 families, ages 14-16)': crd13,
    'CRD ': crd13,
    'RP\u2009': crd13,
    'LCA': lca6,
    'CRD/LCA': lca6
}
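
Several of the eight labels differ only by trailing whitespace (including a thin space, U+2009) or by a parenthetical comment. An alternative to listing every raw string is to normalise the label first, as in the sketch below; `normalise_diagnosis` and `LABEL_TO_OMIM` are hypothetical names introduced for illustration and are not used elsewhere in this notebook.

```python
LABEL_TO_OMIM = {
    "RP": "OMIM:608194",
    "CRD": "OMIM:608194",
    "eoRP": "OMIM:608194",
    "LCA": "OMIM:613826",
    "CRD/LCA": "OMIM:613826",
}


def normalise_diagnosis(raw: str) -> str:
    """Strip surrounding whitespace and drop any parenthetical explanation."""
    # str.strip() also removes Unicode whitespace such as the thin space U+2009
    text = raw.strip()
    return text.split("(", 1)[0].strip()


raw_labels = ["RP\u2009", "CRD ", "CRD (because cones was reduced more than rods)", "CRD/LCA"]
omim_ids = [LABEL_TO_OMIM[normalise_diagnosis(label)] for label in raw_labels]
```

With normalisation, five keys cover all eight raw labels.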

In [22]:
from collections import defaultdict
patient_id_to_var_list_d = defaultdict(list)
patient_id_to_disease_d = {}
for _, row in df.iterrows():
    iid = row['individual_id']
    v1 = row['First mutation (C.)']
    v2 = row['Second mutation (C.)']
    var1 = var_d.get(v1)
    if var1 is None:
        raise ValueError(f"Could not retrieve variant for {v1}")
    var2 = var_d.get(v2)
    if var2 is None:
        raise ValueError(f"Could not retrieve variant for {v2}")
    patient_id_to_var_list_d[iid].append(var1)
    patient_id_to_var_list_d[iid].append(var2)
    dx = row['Diagnosis']
    disease = disease_d.get(dx)
    if disease is None:
        raise ValueError(f"Could not retrieve disease for {dx}")
    patient_id_to_disease_d[iid] = disease

In [23]:
encoder = CohortEncoder(df=df,
                        hpo_cr=hpo_cr,
                        column_mapper_d=column_mapper_d,
                        individual_column_name='individual_id',
                        metadata=metadata,
                        agemapper=AgeColumnMapper.not_provided(),
                        sexmapper=SexColumnMapper.not_provided())
encoder.set_disease_dictionary(patient_id_to_disease_d)
individuals = encoder.get_individuals()
for ii in individuals:
    i_id = ii.id
    var_list = patient_id_to_var_list_d.get(i_id)
    if var_list is None or len(var_list) != 2:
        raise ValueError(f"Malformed variant list for {i_id}: {var_list}")
    var1, var2 = var_list
    if var1 == var2:
        # the same variant on both alleles: record it once as homozygous
        var1.set_homozygous()
        ii.add_variant(var1)
    else:
        # two different variants: record both as heterozygous (compound heterozygous)
        var1.set_heterozygous()
        var2.set_heterozygous()
        ii.add_variant(var1)
        ii.add_variant(var2)
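
The zygosity rule used in this cell can be stated independently of the pyphetools classes. The sketch below is an illustration with plain strings; `assign_zygosity` is a hypothetical helper, and the example alleles are taken from the genotype table shown further down.

```python
from typing import List, Tuple


def assign_zygosity(allele1: str, allele2: str) -> List[Tuple[str, str]]:
    """Return (variant, zygosity) pairs for one individual with two reported alleles."""
    if allele1 == allele2:
        # identical alleles are recorded once, as a homozygous variant
        return [(allele1, "homozygous")]
    # different alleles are recorded separately, each as heterozygous
    return [(allele1, "heterozygous"), (allele2, "heterozygous")]


homozygous_case = assign_zygosity("c.154C>T", "c.154C>T")
compound_case = assign_zygosity("c.86-3T>G", "c.2227_2228del")
```

Either outcome satisfies a bi-allelic requirement: one homozygous variant or two heterozygous variants.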
    
        

In [24]:
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.BI_ALLELIC)
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

Level,Error category,Count
WARNING,REDUNDANT,28
INFORMATION,NOT_MEASURED,877


In [25]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
individual_P030_1_Hui_Huang_Targetednextg (UNKNOWN; ),Leber congenital amaurosis 6 (OMIM:613826),Exon 1-22 deletion: chromosomal_deletion (SO:1000029),Very low visual acuity (HP:0032122); Abnormal electroretinogram (HP:0000512); excluded: Global developmental delay (HP:0001263)
individual_41_1_Valentina_Di_Iorio_ClinicalandGe (UNKNOWN; ),Leber congenital amaurosis 6 (OMIM:613826),NM_020366.4:c.86-3T>G (heterozygous) NM_020366.4:c.2227_2228del (heterozygous),Severely reduced visual acuity (HP:0001141); excluded: Global developmental delay (HP:0001263)
individual_arRP-F083_1_Leen_Abu-Safieh_Autozygome-guid (UNKNOWN; ),Cone-rod dystrophy 13 (OMIM:608194),NM_020366.4:c.154C>T (homozygous),excluded: Global developmental delay (HP:0001263)
individual_individual_1_Juan_C__Zenteno_Extensivegenic (UNKNOWN; ),Cone-rod dystrophy 13 (OMIM:608194),NM_020366.4:c.154C>T (homozygous),excluded: Global developmental delay (HP:0001263)
individual_P024_2_Hui_Huang_Targetednextg (UNKNOWN; ),Leber congenital amaurosis 6 (OMIM:613826),NM_020366.4:c.154C>T (heterozygous) NM_020366.4:c.2020C>T (heterozygous),Very low visual acuity (HP:0032122); Abnormal electroretinogram (HP:0000512); excluded: Global developmental delay (HP:0001263)
individual_individual_1_Cathrine_Jespersgaard_Moleculargenet (UNKNOWN; ),Leber congenital amaurosis 6 (OMIM:613826),NM_020366.4:c.194G>A (homozygous),Global developmental delay (HP:0001263)
individual_individual_1_Jana_Zernant_GenotypingMicr (UNKNOWN; ),Leber congenital amaurosis 6 (OMIM:613826),NM_020366.4:c.194G>A (homozygous),excluded: Global developmental delay (HP:0001263)
individual_individual_1_Lin_Li_DetectionofVa (UNKNOWN; ),Leber congenital amaurosis 6 (OMIM:613826),NM_020366.4:c.242T>A (homozygous),excluded: Global developmental delay (HP:0001263)
individual_QT587_2_Lin_Li_DetectionofVa (UNKNOWN; ),Leber congenital amaurosis 6 (OMIM:613826),NM_020366.4:c.358C>T (homozygous),excluded: Global developmental delay (HP:0001263)
individual_QT491_3_Lin_Li_DetectionofVa (UNKNOWN; ),Leber congenital amaurosis 6 (OMIM:613826),NM_020366.4:c.358C>T (homozygous),excluded: Global developmental delay (HP:0001263)


In [26]:
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                             metadata=metadata)

We output 229 GA4GH phenopackets to the directory phenopackets


In [None]:
# pxf validate --hpo hp.json  *.json
# no errors