SICS: A phylogeny-regularized sparse regression model for microbiome-based prediction

SICS is an R package for microbiome-based prediction from OTU profiles generated by 16S rRNA sequencing experiments.

1. Introduction

We develop a phylogeny-regularized sparse regression model for “sparse and clustered” microbiome signals. The proposed method uses a novel phylogeny-based smoothness penalty, which is defined through the inverse of the phylogeny-induced correlation matrix. The new penalty addresses the two major drawbacks of the Laplacian-type penalty: it encourages local smoothing, i.e., smoothing effects from close neighbors, and it allows data-driven grouping when the tree is mis-specified.
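
To make the idea of the penalty concrete, here is a minimal base-R sketch. It is not the package's internal code. It assumes the phylogeny-induced correlation matrix has the exponential form C = exp(-2 * rho * D) for a distance matrix D; that form, the value of `rho`, the toy matrix `D`, and the helper name `smooth_penalty` are all illustrative assumptions.

```r
# Toy distances among 4 OTUs: OTUs 1-2 and OTUs 3-4 are close neighbours.
D <- matrix(c(0.0, 0.1, 1.0, 1.0,
              0.1, 0.0, 1.0, 1.0,
              1.0, 1.0, 0.0, 0.1,
              1.0, 1.0, 0.1, 0.0), nrow = 4, byrow = TRUE)

rho <- 2                  # assumed decay rate; larger values make smoothing more local
C <- exp(-2 * rho * D)    # assumed form of the phylogeny-induced correlation matrix
C_inv <- solve(C)         # the penalty is built on the inverse of C

# Hypothetical helper: quadratic smoothness penalty t(beta) %*% C^{-1} %*% beta
smooth_penalty <- function(beta, C_inv) {
  as.numeric(t(beta) %*% C_inv %*% beta)
}

beta_clustered <- c(1, 1, 0, 0)   # signal shared by two neighbouring OTUs
beta_scattered <- c(1, 0, 1, 0)   # same-size signal spread over distant OTUs

pen_clustered <- smooth_penalty(beta_clustered, C_inv)
pen_scattered <- smooth_penalty(beta_scattered, C_inv)
```

Under these assumptions the clustered coefficient vector receives the smaller penalty, which is the behaviour a "sparse and clustered" signal calls for.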

2. Installation

Run the following commands in R:

# install.packages("devtools")  # uncomment if devtools is not yet installed
library(devtools)
install_github("lichen-lab/SICS")

3. Use SICS

SICS takes two inputs: an OTU abundance matrix (normalized counts), where each row corresponds to an individual and each column corresponds to an OTU, and a distance matrix among all OTUs. Below we use a simulated data set (the training set includes 100 individuals and 200 OTUs; the testing set includes 200 individuals and 200 OTUs) to illustrate the workflow of SICS. The data set is distributed with SICS as data_SICS.
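
The input layout described above can be sketched in base R as follows. This is only an illustration of the expected shapes, not data from the package: the toy sizes, the total-sum scaling used as a stand-in normalization, and the random coordinates used in place of a real phylogenetic tree are all assumptions.

```r
set.seed(1)
n <- 5; p <- 4    # toy sizes: 5 individuals, 4 OTUs

# Raw OTU counts: rows are individuals, columns are OTUs.
counts <- matrix(rpois(n * p, lambda = 20) + 1, nrow = n, ncol = p,
                 dimnames = list(paste0("id", 1:n), paste0("OTU", 1:p)))

# Total-sum scaling as a stand-in normalization (assumption for this sketch).
z <- counts / rowSums(counts)

# Symmetric p x p distance matrix among OTUs.
# Random coordinates stand in for tree-based distances here.
coords <- matrix(rnorm(p * 2), nrow = p, dimnames = list(colnames(z), NULL))
D <- as.matrix(dist(coords))

# Consistency checks worth running before fitting a model
stopifnot(ncol(z) == nrow(D), nrow(D) == ncol(D), isSymmetric(D))
```

The key requirement is that the columns of the abundance matrix and the rows and columns of the distance matrix refer to the same OTUs in the same order.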

1. Load library and example data

library(SICS)
data(data_SICS)
library(ROCR)
help(SICS)

2. Continuous outcome: train a model on the training set, and test the prediction on the testing set

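set.seed(1234)
# Fit SICS on the training data (z: OTU abundances, y: outcome, D: OTU distance matrix)
beta.sics=SICS(data_gaussian$z,data_gaussian$y,data_gaussian$D,family='gaussian',pho=c(1/4,4),lambda2=c(1/4,4))
# Predict the outcome for the testing set
yhat=predict(beta.sics,data_gaussian$z.te,family='gaussian')

# Plot observed against predicted values and report their correlation
plot(data_gaussian$y.te, yhat, main='Continuous Outcome',xlab='Observed',ylab='Predicted',col=1,lwd=3)
legend("bottomright",legend=paste('R:',cor(data_gaussian$y.te, yhat)),pch=16)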
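
Beyond the correlation shown in the plot legend, it can be useful to report error on the original scale. The helper below is a hypothetical base-R sketch (the name `eval_continuous` and the toy numbers are not part of SICS) that computes the Pearson correlation, the root mean squared error, and a predictive R-squared relative to the mean of the observed values.

```r
# Hypothetical helper: summarise agreement between observed and predicted values.
eval_continuous <- function(y_obs, y_pred) {
  stopifnot(length(y_obs) == length(y_pred))
  c(pearson_r = cor(y_obs, y_pred),
    rmse      = sqrt(mean((y_obs - y_pred)^2)),
    # predictive R^2 relative to predicting the mean of the observed values
    r2_pred   = 1 - sum((y_obs - y_pred)^2) / sum((y_obs - mean(y_obs))^2))
}

# Toy check with made-up numbers (not SICS output)
y_obs  <- c(1, 2, 3, 4, 5)
y_pred <- c(1.1, 1.9, 3.2, 3.8, 5.0)
metrics <- eval_continuous(y_obs, y_pred)
```

Assuming `yhat` can be coerced to a numeric vector, the same helper could be applied as `eval_continuous(data_gaussian$y.te, as.numeric(yhat))`.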

3. Binary outcome: train a model on the training set, and test the prediction on the testing set

set.seed(1234)
# Fit SICS with a logistic model on the training data
beta.sics=SICS(data_binary$z,data_binary$y,data_binary$D,family='binomial',pho=c(1/4,4),lambda2=c(1/4,4))
# Predict the outcome for the testing set
yhat=predict(beta.sics,data_binary$z.te,family='binomial')

# Build the ROC curve and compute the AUC with ROCR
pred=prediction(yhat,data_binary$y.te)
perf=performance(pred,"tpr","fpr")
auc=performance(pred,"auc")@y.values[[1]]
plot(perf,main="Binary Outcome",col=1,lwd=3)
abline(0,1)  # diagonal reference line for a random classifier
legend("bottomright",legend=paste('AUC:',auc),pch=16)
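
As a cross-check on the ROCR value, the AUC can also be computed directly from ranks, since it equals the normalized Mann-Whitney U statistic. The sketch below uses base R only; the helper name `auc_rank` is hypothetical, the toy scores are made up, and the code assumes labels are coded as 0 and 1, which may not match how `data_binary$y.te` is coded.

```r
# Hypothetical helper: AUC as the normalised Mann-Whitney U statistic.
auc_rank <- function(score, label) {
  stopifnot(length(score) == length(label), all(label %in% c(0, 1)))
  r <- rank(score)            # average ranks handle tied scores
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy check with made-up scores (not SICS output)
score <- c(0.1, 0.4, 0.35, 0.8)
label <- c(0, 0, 1, 1)
auc_toy <- auc_rank(score, label)
```

In the toy data, three of the four positive-negative pairs are ranked correctly, so `auc_toy` is 0.75.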

4. Real data example

We provide two real data examples that compare SICS with other prediction methods. The following packages must be installed before running the real data examples.

install.packages(c('ape','ade4','cluster','randomForest','glmnet','glmgraph','ncvreg'))
install.packages('devtools')
library(devtools)
install_github("lichen-lab/SICS")
install_github("lichen-lab/GMPR")

The tutorial for the caffeine data analysis (continuous outcome) is at https://github.com/lichen-lab/SICS/blob/master/caff.md

The tutorial for the smoking data analysis (binary outcome) is at https://github.com/lichen-lab/SICS/blob/master/smoking.md
