# Retinal Degeneration Associated With RPGRIP1


Data from [Beryozkin A, et al. Retinal Degeneration Associated With RPGRIP1: A Review of Natural History, Mutation Spectrum, and Genotype-Phenotype Correlation in 228 Patients](https://pubmed.ncbi.nlm.nih.gov/34722527)

In [1]:
import genophenocorr

print(f"Using genophenocorr version {genophenocorr.__version__}")

Using genophenocorr version 0.1.1dev


## Setup

### Load HPO

We use the HPO `v2023-10-09` release for this analysis.

In [2]:
import hpotk

fpath_hpo = 'https://github.com/obophenotype/human-phenotype-ontology/releases/download/v2023-10-09/hp.json'
hpo = hpotk.load_minimal_ontology(fpath_hpo)

### Load Phenopackets

We load the phenopacket JSON files from the `phenopackets` folder located next to the notebook.

In [3]:
from genophenocorr.preprocessing import configure_caching_patient_creator, load_phenopacket_folder

fpath_phenopackets = 'phenopackets'
pc = configure_caching_patient_creator(hpo)
cohort = load_phenopacket_folder(fpath_phenopackets, pc)

Patients Created:   0%|          | 0/229 [00:00<?, ?it/s]
Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but did not find one in patient PMID_34722527_individual_individual_5_Juliana_Maria_Ferraz_Sallum1_3_Clinicalandmo
Expected at least one variant per patient, but received none for patient PMID_34722527_individual_individual_5_Juliana_Maria_Ferraz_Sallum1_3_Clinicalandmo
Patient individual_individual_5_Juliana_Maria_Ferraz_Sallum1_3_Clinicalandmo has no variants listed and will not be included in this analysis.
Patients Created:  16%|█▌        | 36/229 [00:01<00:10, 18.37it/s]
Expected a VCF record, a VRS CNV, or an expression with `hgvs.c` but did not find one in patient PMID_34722527_individual_individual_3_Juliana_Maria_Ferraz_Sallum1_3_Clinicalandmo
Expected at least one variant per patient, but received none for patient PMID_34722527_individual_individual_3_Juliana_Maria_Ferraz_Sallum1_3_Clinicalandmo
Patient individual_individual_3_Juliana_Maria_Ferraz_Sallum1_3

### Pick transcript

We choose the [MANE Select](https://www.ncbi.nlm.nih.gov/nuccore/NM_020366.4) transcript for *RPGRIP1*.

In [4]:
tx_id = 'NM_020366.4'

## Explore cohort

Explore the cohort to guide selection of the genotype-phenotype analysis.


In [5]:
cohort.list_all_variants()

[('14_21312457_21312458_GA_G', 25),
 ('14_21325943_21325943_G_T', 12),
 ('14_21302530_21302531_AG_A', 8),
 ('14_21345145_21345145_C_T', 8),
 ('14_21345139_21345146_CAAGGCCG_C', 7),
 ('14_21325252_21325252_G_A', 7),
 ('14_21327671_21327671_A_AT', 6),
 ('14_21317724_21317724_C_T', 5),
 ('14_21325265_21325265_A_G', 5),
 ('14_21303542_21303542_C_T', 5),
 ('14_21326131_21326131_C_T', 4),
 ('14_21327800_21327801_CT_C', 4),
 ('14_21348174_21348174_T_G', 4),
 ('14_21348210_21348214_AAAAG_A', 4),
 ('14_21326544_21327883_ATTTTTAGTAGAGATGGGATTTCTCCATGTTGGTCAGGCTGGTCTTCAACTCCCGACCTCAGGTGAACCTCCCACCTGAGCCTCCCAAAGTGCTGGGATTACAGACGTGAGCCACCGCGCCTGGCTGAACAAACTTTTTCAAGCTCTGTAATGCTGTCTAGTATCTGTCTTTACTAAAGGCCTGTTGTTTCTTAGTGCATGACTACATAGATATCTGATTATAAACTGAGACCTTAACACTCCCCCATCATTCTCTCACTTCTTTTAAACACTGGACACAAGTTAGAGAGATTTCCACACCAGATCATGACAAACACAAATTTCTTGGATTTTTTTTTTCCTCCCAATGTGGAGCTGAGCTCCATACTGTCTTTCCTAACTTTTATACCTAGGATTGTGGGGGTGTACCAAGAGGGGTCAACTCTTTGACTACAGTCCTGGGAGGGTGAGGTGGGGGTATCCATGTTTTCCTTAGGAAGTGGG
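The variant keys listed in this output appear to follow a `chromosome_start_end_ref_alt` layout. The sketch below splits a key into those five fields and labels the change by comparing allele lengths. The field names, the `VariantKey` type, and both helper functions are hypothetical illustrations, not part of `genophenocorr`, and keys for other variant types may use a different layout.

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    """Hypothetical container; the field meanings are assumed from the key layout."""
    chrom: str
    start: int
    end: int
    ref: str
    alt: str


def parse_variant_key(key: str) -> VariantKey:
    """Split a key such as '14_21312457_21312458_GA_G' into its five fields."""
    chrom, start, end, ref, alt = key.split('_')
    return VariantKey(chrom, int(start), int(end), ref, alt)


def describe(variant: VariantKey) -> str:
    """Crude label based only on the lengths of the two alleles."""
    if len(variant.ref) == len(variant.alt):
        return 'substitution'
    return 'deletion' if len(variant.ref) > len(variant.alt) else 'insertion'


# Three keys taken from the listing in this notebook.
for key in ('14_21312457_21312458_GA_G',
            '14_21325943_21325943_G_T',
            '14_21327671_21327671_A_AT'):
    parsed = parse_variant_key(key)
    print(key, '->', describe(parsed))
```

A length comparison is enough to tell the most frequent key, `14_21312457_21312458_GA_G`, apart as a one-base deletion, but it says nothing about the functional effect, which depends on the transcript.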

In [6]:
cohort.list_data_by_tx()

{'NM_001377523.1': Counter({'INTRON_VARIANT': 85,
          'STOP_GAINED': 37,
          'FRAMESHIFT_VARIANT': 36,
          'SPLICE_ACCEPTOR_VARIANT': 10,
          'SPLICE_POLYPYRIMIDINE_TRACT_VARIANT': 1,
          'MISSENSE_VARIANT': 7,
          'SPLICE_REGION_VARIANT': 11,
          'SPLICE_DONOR_VARIANT': 8,
          'SPLICE_DONOR_5TH_BASE_VARIANT': 4,
          'CODING_SEQUENCE_VARIANT': 5,
          'INFRAME_DELETION': 1,
          'SYNONYMOUS_VARIANT': 1}),
 'NM_020366.4': Counter({'FRAMESHIFT_VARIANT': 95,
          'STOP_GAINED': 81,
          'MISSENSE_VARIANT': 50,
          'SPLICE_ACCEPTOR_VARIANT': 12,
          'SPLICE_POLYPYRIMIDINE_TRACT_VARIANT': 2,
          'INTRON_VARIANT': 10,
          'SPLICE_REGION_VARIANT': 20,
          'CODING_SEQUENCE_VARIANT': 7,
          'SPLICE_DONOR_VARIANT': 19,
          'SPLICE_DONOR_5TH_BASE_VARIANT': 5,
          'INFRAME_DELETION': 2,
          'SYNONYMOUS_VARIANT': 1}),
 'NM_001377948.1': Counter({'FRAMESHIFT_VARIANT': 47,
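Because each transcript maps to a `collections.Counter`, the standard `Counter` methods can rank the variant effects. The sketch below copies the counts printed for `NM_020366.4` into a literal so that it runs on its own; in the notebook one would presumably index the returned dictionary by transcript ID instead. What exactly is being tallied (alleles, variants, or annotations) is not stated in this output, so the shares are only a rough guide.

```python
from collections import Counter

# Counts copied from the NM_020366.4 entry of the output above.
effects = Counter({
    'FRAMESHIFT_VARIANT': 95,
    'STOP_GAINED': 81,
    'MISSENSE_VARIANT': 50,
    'SPLICE_ACCEPTOR_VARIANT': 12,
    'SPLICE_POLYPYRIMIDINE_TRACT_VARIANT': 2,
    'INTRON_VARIANT': 10,
    'SPLICE_REGION_VARIANT': 20,
    'CODING_SEQUENCE_VARIANT': 7,
    'SPLICE_DONOR_VARIANT': 19,
    'SPLICE_DONOR_5TH_BASE_VARIANT': 5,
    'INFRAME_DELETION': 2,
    'SYNONYMOUS_VARIANT': 1,
})

total = sum(effects.values())

# Print the three most frequent effects with their share of all tallied effects.
for effect, count in effects.most_common(3):
    print(f"{effect}: {count} ({count / total:.1%})")
```

Frameshift, stop-gained, and missense effects dominate on this transcript, which motivates comparing by those effect categories.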
 

## Configure the analysis

In [7]:
from genophenocorr.analysis import configure_cohort_analysis
from genophenocorr.analysis.predicate import BooleanPredicate

analysis = configure_cohort_analysis(cohort, hpo)

## Run the analyses

Test for the presence of genotype-phenotype correlations, comparing missense variants with all others.

In [8]:
from genophenocorr.model import VariantEffect

by_missense = analysis.compare_by_variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id=tx_id)
by_missense.summarize(hpo, BooleanPredicate.YES)

| MISSENSE_VARIANT on NM_020366.4 | No: Count | No: Percent | Yes: Count | Yes: Percent | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Eye poking [HP:0001483] | 25 | 64.102564 | 3 | 7.692308 | 0.608819 | 1.0 |
| Very low visual acuity [HP:0032122] | 66 | 81.481481 | 15 | 18.518519 | 1.0 | 1.0 |
| Reduced visual acuity [HP:0007663] | 90 | 78.947368 | 24 | 21.052632 | 1.0 | 1.0 |
| Visual impairment [HP:0000505] | 90 | 78.947368 | 24 | 21.052632 | 1.0 | 1.0 |
| Abnormality of vision [HP:0000504] | 90 | 78.947368 | 24 | 21.052632 | 1.0 | 1.0 |
| Abnormal eye physiology [HP:0012373] | 95 | 79.831933 | 24 | 20.168067 | 1.0 | 1.0 |
| Abnormality of the eye [HP:0000478] | 95 | 79.831933 | 24 | 20.168067 | 1.0 | 1.0 |
| Phenotypic abnormality [HP:0000118] | 97 | 80.165289 | 24 | 19.834711 | 1.0 | 1.0 |
| All [HP:0000001] | 97 | 80.165289 | 24 | 19.834711 | 1.0 | 1.0 |
| Neurodevelopmental delay [HP:0012758] | 11 | 100.0 | 0 | 0.0 | 1.0 | 1.0 |


Test for the presence of genotype-phenotype correlations, comparing frameshift variants with all others.

In [9]:
by_frameshift = analysis.compare_by_variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id=tx_id)
by_frameshift.summarize(hpo, BooleanPredicate.YES)

| FRAMESHIFT_VARIANT on NM_020366.4 | No: Count | No: Percent | Yes: Count | Yes: Percent | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Eye poking [HP:0001483] | 5 | 12.820513 | 23 | 58.974359 | 0.016983 | 0.628369 |
| Very low visual acuity [HP:0032122] | 45 | 55.555556 | 36 | 44.444444 | 1.0 | 1.0 |
| Reduced visual acuity [HP:0007663] | 66 | 57.894737 | 48 | 42.105263 | 1.0 | 1.0 |
| Visual impairment [HP:0000505] | 66 | 57.894737 | 48 | 42.105263 | 1.0 | 1.0 |
| Abnormality of vision [HP:0000504] | 66 | 57.894737 | 48 | 42.105263 | 1.0 | 1.0 |
| Abnormal eye physiology [HP:0012373] | 67 | 56.302521 | 52 | 43.697479 | 1.0 | 1.0 |
| Abnormality of the eye [HP:0000478] | 67 | 56.302521 | 52 | 43.697479 | 1.0 | 1.0 |
| Phenotypic abnormality [HP:0000118] | 69 | 57.024793 | 52 | 42.975207 | 1.0 | 1.0 |
| All [HP:0000001] | 69 | 57.024793 | 52 | 42.975207 | 1.0 | 1.0 |
| Neurodevelopmental delay [HP:0012758] | 3 | 27.272727 | 8 | 72.727273 | 1.0 | 1.0 |
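The "Corrected p value" column accounts for testing many HPO terms at once. The reported pairs (0.016983 to 0.628369 for frameshift variants, 0.608819 to 1.0 for missense variants) are numerically consistent with a Bonferroni correction over 37 tests. Both the method and the number of tests are inferred from those ratios and are not stated in the notebook, so the sketch below is an illustration under that assumption, not a description of the library's implementation.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Multiply the nominal p value by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


# ASSUMPTION: 37 tests, inferred from the ratio of corrected to nominal p values.
N_TESTS = 37

for nominal in (0.016983, 0.608819):
    print(f"{nominal} -> {bonferroni(nominal, N_TESTS):.6f}")
```

Under this correction a nominal p value must fall below 0.05 / 37 (about 0.00135) to remain significant at the 0.05 level, which the eye-poking result for frameshift variants does not.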


Or compare subjects carrying >=1 allele of a given variant with the remaining subjects:

In [10]:
variant_key = '14_21312457_21312458_GA_G'

by_var = analysis.compare_by_variant_key(variant_key)
by_var.summarize(hpo, BooleanPredicate.YES)

| >=1 allele of the variant 14_21312457_21312458_GA_G | No: Count | No: Percent | Yes: Count | Yes: Percent | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Eye poking [HP:0001483] | 12 | 30.769231 | 16 | 41.025641 | 0.000919 | 0.03401 |
| Very low visual acuity [HP:0032122] | 64 | 79.012346 | 17 | 20.987654 | 1.0 | 1.0 |
| Reduced visual acuity [HP:0007663] | 97 | 85.087719 | 17 | 14.912281 | 1.0 | 1.0 |
| Visual impairment [HP:0000505] | 97 | 85.087719 | 17 | 14.912281 | 1.0 | 1.0 |
| Abnormality of vision [HP:0000504] | 97 | 85.087719 | 17 | 14.912281 | 1.0 | 1.0 |
| Abnormal eye physiology [HP:0012373] | 102 | 85.714286 | 17 | 14.285714 | 1.0 | 1.0 |
| Abnormality of the eye [HP:0000478] | 102 | 85.714286 | 17 | 14.285714 | 1.0 | 1.0 |
| Phenotypic abnormality [HP:0000118] | 104 | 85.950413 | 17 | 14.049587 | 1.0 | 1.0 |
| All [HP:0000001] | 104 | 85.950413 | 17 | 14.049587 | 1.0 | 1.0 |
| Neurodevelopmental delay [HP:0012758] | 7 | 63.636364 | 4 | 36.363636 | 1.0 | 1.0 |
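To make the eye-poking p value more concrete, here is a small two-sided Fisher's exact test written with the standard library only. It rests on several assumptions. The percentages imply 39 individuals with an eye-poking observation (12 is 30.77% of 39), so 28 have the phenotype and 11 do not. The table does not show how those 11 split between the genotype groups; the sketch uses a hypothetical split in which all 11 lack the variant. Whether the library uses Fisher's exact test for this comparison is also an assumption.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_observed * (1 + 1e-7))


# Rows: variant carriers, non-carriers. Columns: eye poking present, excluded.
# 16 and 12 come from the table; the 0 and 11 are a HYPOTHETICAL split.
p = fisher_exact_two_sided(16, 0, 12, 11)
print(f"p = {p:.6f}")
```

With this hypothetical split the sketch yields a p value of about 0.00092, close to the reported 0.000919. That agreement is suggestive but does not confirm the split or the test used.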


TODO - finalize!