csrnaseq: Identifying relevant covariates in RNA-seq analysis by pseudo-variable augmentation

Summary

The R package csrnaseq performs backward variable selection with pseudo-variables for RNA-seq differential expression analysis (Nguyen and Nettleton 2024). The goal is to select the most relevant covariates while keeping the false selection rate below a pre-specified threshold. The method builds on the approach of Wu et al. (2007): whereas their method handles a single response variable, ours handles multiple response variables, as in RNA-seq data. The selected covariates are then included in a differential expression analysis using the voom-limma pipeline (Law et al. 2014). The method is implemented in the function FSRAnalysisBS.
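To make the pseudo-variable idea concrete, the following is a minimal single-response sketch in base R. It is not the algorithm implemented in FSRAnalysisBS: the simulated data, the one-step p-value selection rule, and the simplified false selection rate estimate (in the spirit of Wu et al. 2007) are all assumptions chosen for illustration. Pseudo-variables are created by permuting the rows of the covariate matrix, so they are irrelevant to the response by construction, and the rate at which they pass the selection threshold indicates how often irrelevant real covariates would be selected.

```r
set.seed(1)

# Simulated data (hypothetical): 6 candidate covariates, only x1 and x2 matter.
n <- 100
p <- 6
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n)

alpha <- 0.05  # p-value threshold used as the selection rule
B <- 50        # number of pseudo-variable replicates

# Number of real covariates selected at level alpha (intercept row dropped).
pvals_real <- summary(lm(y ~ X))$coefficients[-1, 4]
S <- sum(pvals_real < alpha)

# Average fraction of pseudo-variables that pass the same threshold.
pseudo_rate <- mean(replicate(B, {
  Z <- X[sample(n), , drop = FALSE]  # row-permuted copy: unrelated to y
  XZ <- cbind(X, Z)
  pv <- summary(lm(y ~ XZ))$coefficients[-1, 4]
  mean(pv[(p + 1):(2 * p)] < alpha)
}))

# Simplified estimate of the false selection rate (an assumption of this sketch):
# expected number of irrelevant covariates selected, divided by 1 + number selected.
fsr_hat <- (p - S) * pseudo_rate / (1 + S)
fsr_hat
```

In a selection procedure of this kind, the threshold alpha would be tuned so that the estimated false selection rate stays below the target level.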

Installation

csrnaseq can be installed from GitHub:

# install.packages("devtools")
devtools::install_github("ntyet/csrnaseq")

Example

This is a basic example that shows how to use our method:

# Load the package and the example data sets
library(csrnaseq)
data(counts)
data(FixCov)
data(VarCov)

# Settings passed to FSRAnalysisBS
option <- "OWN"
B <- 2
m <- 2
alphamax <- 5
alpha0 <- 0.05
ncores <- 1
print.progress <- FALSE
saveall <- TRUE

# Run the selection procedure and list the components of the result
FSRAnalysisBSOut <- FSRAnalysisBS(counts, FixCov, VarCov,
                                  option, B, m, alphamax, alpha0,
                                  ncores, print.progress, saveall)
names(FSRAnalysisBSOut)
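As noted in the Summary, the selected covariates are then included in a voom-limma differential expression analysis. The sketch below shows what that downstream step might look like on simulated counts, assuming the limma package is installed. The count matrix, the grouping factor `group`, and the covariate `rin` are hypothetical stand-ins, not objects produced by csrnaseq.

```r
library(limma)
set.seed(1)

# Simulated counts (hypothetical): 200 genes by 12 samples.
n_genes <- 200
n_samples <- 12
counts_sim <- matrix(
  rnbinom(n_genes * n_samples, mu = 100, size = 5),
  nrow = n_genes, ncol = n_samples,
  dimnames = list(paste0("gene", 1:n_genes), paste0("s", 1:n_samples))
)

# Primary variable of interest plus one covariate standing in for a selected one.
group <- factor(rep(c("A", "B"), each = 6))
rin <- rnorm(n_samples)
design <- model.matrix(~ group + rin)

# voom estimates precision weights; lmFit and eBayes fit the gene-wise linear models.
v <- voom(counts_sim, design)
fit <- eBayes(lmFit(v, design))

# Test the primary variable while adjusting for the covariate.
tab <- topTable(fit, coef = "groupB", number = Inf)
head(tab)
```

Adding or dropping a covariate only changes the formula passed to model.matrix; the rest of the pipeline stays the same.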

Data Analysis and Simulation Based on the RFI RNA-seq Dataset

  • Code for the RFI RNA-seq data analysis is here and here.

  • Code for generating the six simulation scenarios is here and here. The outputs are here.

  • Code for the simulation is here and here. The outputs are here.

  • Code for an additional simulation that investigates covariates orthogonal to the primary variables:

    • Code for generating the six simulation scenarios: here. The outputs are here.
    • Code for the simulation: here and here. The outputs are here.

Data Analysis and Simulation Based on the Zebrafish RNA-seq Dataset

  • The zebrafish RNA-seq dataset is available here.

  • Code for the zebrafish RNA-seq data analysis is here and here.

  • Code for generating the two simulation scenarios is here. The outputs are here.

  • Code for the simulation is here and here. The outputs are here.

References

Law, C. W., Chen, Y., Shi, W., and Smyth, G. K. (2014), “Voom: Precision weights unlock linear model analysis tools for RNA-seq read counts,” Genome Biology, 15, R29. https://doi.org/10.1186/gb-2014-15-2-r29.

Nguyen, Y., and Nettleton, D. (2024), “Identifying relevant covariates in RNA-seq analysis by pseudo-variable augmentation,” Journal of Agricultural, Biological and Environmental Statistics. https://doi.org/10.1007/s13253-024-00665-3.

Wu, Y., Boos, D. D., and Stefanski, L. A. (2007), “Controlling variable selection by the addition of pseudovariables,” Journal of the American Statistical Association, 102, 235–243. https://doi.org/10.1198/016214506000000843.
