r2redux

The ‘r2redux’ package can be used to derive test statistics for R² values from polygenic risk score (PGS) models (variance and covariance of R² values, p-value and 95% confidence intervals (CI)). For example, it can test if two sets of R² values from two different PGS models are significantly different to each other whether the two sets of PGS are independent or dependent. Because R² value is often regarded as the predictive ability of PGS, r2redux package can be useful to assess the performances of PGS methods or multiple sets of PGS based on different information sources. Furthermore, the package can derive the information matrix of β ̂_1^2and β ̂_2^2 from a multiple regression (see olkin_beta1_2 or olkin_beta_info function in the manual), which is a basis of a novel PGS-based genomic partitioning method (see r2_enrich or r2_enrich_beta function in the manual). It is recommended that the target sample size in the PGS study should be more than 2,000 for quantitative traits and more than 5,000 for binary responses or case-control studies. The P-value generated from the r2redux package provides two types of p-values (for one- and two-tailed test) unless the comparison is for nested models (e.g. y=PGS_1+PGS_2+e vs. y=PGS_2+e) where the R² of the full model is expected to be always higher than the reduced model. When there are multiple covariates (e.g. age, sex and other demographic variables), the phenotypes can be adjusted for the covariates, and pre-adjusted phenotypes (residuals) should be used in the r2redux.

INSTALLATION

To use r2redux:

install.packages("r2redux") 
library(r2redux)

or

install.packages("devtools")
library(devtools)
devtools::install_github("mommy003/r2redux")
library(r2redux)

QUICK START

We illustrate the usage of r2redux using multiple sets of PGS estimated based on GWAS summary statistics from UK Biobank or Biobank Japan (reference datasets). In a target dataset, the phenotypes of target samples (y) can be predicted with PGS (a PGS model, e.g. y=PRS+e, where y and PGS are column-standardised 1. Note that the target individuals should be independent from reference individuals. We can test the significant differences of the predictive ability (R²) between a pair of PGS (see r2_diff function and example in the manual).

DATA PREPARATION

a. Statistical testing of significant difference between R2 values for p-value thresholds: r2redux requires only phenotype and estimated PGS (from PLINK or any other software). Note that any missing value in the phenotypes and PGS tested in the model should be removed. If we want to test the significant difference of R² values for p-value thresholds, r2_diff function can be used with an input file that includes the following fields (also see test_ukbb_thresholds_scaled in the example directory form github (https://github.com/mommy003/r2redux) or read dat1 file embedded within the package and r2_diff function in the manual).

Phenotype (y)
PGS for p value 1 (x1)
PGS for p value 0.5 (x2)
PGS for p value 0.4 (x3)
PGS for p value 0.3 (x4)
PGS for p value 0.2 (x5)
PGS for p value 0.1 (x6)
PGS for p value 0.05 (x7)
PGS for p value 0.01 (x8)
PGS for p value 0.001 (x9)
PGS for p value 0.0001 (x10)

To get the test statistics for the difference between R2(y=x[,v1]) and R2(yx[,v2]). (here we define R_1^2= R^2(y=x[,v1])) and R_2^2=R^2(y=x[,v2])))

dat=read.table("test_ukbb_thresholds_scaled") #(see example files) or
dat=dat1 #(this example embedded within the package)
nv=length(dat$V1)
v1=c(1)
v2=c(2)
output=r2_diff(dat,v1,v2,nv)

r2redux output
output$var1 (variance of R_1^2)
0.0001436128
output$var2 (variance of R_2^2)
0.0001451358
output$var_diff (variance of difference between R_1^2and R_2^2)
5.678517e-07
output$r2_based_p (p-value for significant difference between R_1^2 and R_2^2)
0.5514562
output$mean_diff (differences between R_1^2 and R_2^2)
-0.0004488044
output$upper_diff (upper limit of 95% CI for the difference)
0.001028172
output$lower_diff (lower limit of 95% CI for the difference)
-0.001925781
output$p$nested
1
output$p$nonnested
0.5514562
output$p$LRT
1

$\color{black}{Note:}$ If the directions are not consistent, for instance, if one correlation is positive (R_1) and another is negative (R_2), or vice versa, it is crucial to approach the interpretation of the comparative test with caution. It's important to note that R² alone does not provide information about the direction or sign of the relationships between predictors and the response variable.

When faced with multiple predictors common between two models (see a few scenarios below), please see r2_diff section in the manual (page no 14-17)

y = PRS+any_cov1+any_cov2+...+any_covN+e vs. y = any_cov1+any_cov2+...+any_covN+e
y = PRS1+any_cov1+any_cov2+...+any_covN+e vs. y = PRS2+any_cov1+any_cov2+...+any_covN+e

b. PGS-based genomic enrichment analysis: If we want to perform some enrichment analysis (e.g., regulatory vs non_regulatory) in the PGS context to test significantly different from the expectation (p_exp= # SNPs in the regulatory / total # SNPs = 4%). We simultaneously fit two sets of PGS from regulatory and non-regulatory to get β ̂_regu^2 and β ̂_(non-regu)^2, using a multiple regression, and assess if the ratio, (β ̂_1^2)/(r_(y,(x_1,x_2))^2 ) are significantly different from the expectation, p_exp. To test this, we need to prepare input file for r2redux that includes the following fields (e.g. test_ukbb_enrichment_choles in example directory or read dat2 file embedded within the package and r2_enrich_beta function in the manual).

Phenotype (y)
PGS for regulatory region (x1)
PGS for non-regulatory region (x2)

To get the test statistic for the ratio which is significantly different from the expectation. var(β ̂_1^2/r_(y,(x_1,x_2))^2), where β ̂_1^2 is the squared regression coefficient of x_1 from a multiple regression model, i.e. y=x_1 β_1+ x_2 β_2+e, and r_(y,(x_1,x_2))^2 is the coefficient of determination of the model. It is noted that y, x_1 and x_2 are column standardised (mean 0 and variance 1).

dat=read.table("test_ukbb_enrichment_choles") #(see example file) or 
dat=dat2 #(this example embedded within the package)
nv=length(dat$V1)
v1=c(1)
v2=c(2)
output=r2_beta_var(dat,v1,v2,nv)

r2redux output
output$beta1_sq (beta1^2)
0.01118301
output$beta2_sq (beta2^2)
0.004980285
output$var1 (variance of beta1^2)
7.072931e-05
output$var2 (variance of beta2^2)
3.161929e-05
output$var1_2 (variance of difference between beta1^2 and beta2^2)
0.000162113
output$cov (covariance between beta1^2 and beta2^2)
-2.988221e-05
output$upper_beta1_sq (upper limit of 95% CI for beta1^2)
0.03037793
output$lower_beta1_sq (lower limit of 95% CI for beta1^2)
-0.00123582
output$upper_beta2_sq (upper limit of 95% CI for beta2^2)
0.02490076
output$lower_beta2_sq (lower limit of 95% CI for beta2^2)
-0.005127546

dat=dat2 #(this example embedded within the package)
nv=length(dat$V1)
v1=c(1)
v2=c(2)
expected_ratio=0.04
output=r2_enrich_beta(dat,v1,v2,nv,expected_ratio)

r2redux output
output$beta1_sq (beta1^2)
0.01118301
output$beta2_sq (beta2^2)
0.004980285
output$ratio1 (beta1^2/R^2)
0.4392572
output$ratio2 (beta2^2/R^2)
0.1956205
output$ratio_var1 (variance of ratio 1)
0.08042288
output$ratio_var2 (variance of ratio 2)
0.0431134
output$upper_ratio1 (upper limit of 95% CI for ratio 1)
0.9950922
output$lower_ratio1 (lower limit of 95% CI for ratio 1)
-0.1165778
output$upper_ratio2 upper limit of 95% CI for ratio 2)
0.6025904
output$lower_ratio2 (lower limit of 95% CI for ratio 2)
-0.2113493
output$enrich_p1 (two tailed P-value for beta1^2/R^2 is significantly different from exp1)
0.1591692
output$enrich_p1_one_tail (one tailed P-value for beta1^2/R^2 is significantly different from exp1)
0.07958459
output$enrich_p2 (P-value for beta2^2/R2 is significantly different from (1-exp1))
0.000232035
output$enrich_p2_one_tail (one tailed P-value for beta2^2/R2 is significantly different from (1-exp1))
0.0001160175

References

Olkin, I. and Finn, J.D. Correlations redux. Psychological Bulletin, 1995. 118(1): p. 155.
Momin, M.M., Lee, S., Wray, N.R. and Lee S.h. 2022. Significance tests for R2 of out-of-sample prediction using polygenic scores. The American Journal of Human Genetics,110: p. 349-358. https://doi.org/10.1016/j.ajhg.2023.01.004 .

Contact information

Please contact Md Moksedul Momin (cvasu.momin@gmail.com) or Hong Lee (hong.lee@unisa.edu.au) if you have any queries.

Name		Name	Last commit message	Last commit date
Latest commit History 164 Commits
R		R
data		data
example		example
man		man
tests		tests
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
README.tex		README.tex
r2redux_1.0.17.pdf		r2redux_1.0.17.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

r2redux

INSTALLATION

QUICK START

DATA PREPARATION

References

Contact information

About

Releases

Packages

Languages

mommy003/r2redux

Folders and files

Latest commit

History

Repository files navigation

r2redux

INSTALLATION

QUICK START

DATA PREPARATION

References

Contact information

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages