Home
Welcome to the cjBitSeq wiki!
cjBitSeq [1] implements a Bayesian model selection approach to simultaneously estimate transcript expression and perform Differential Expression (DE) analysis from RNA-seq data, given two (replicated) samples of biological conditions. The method has also been extended to the special case of Differential Transcript Usage [2]. The hierarchical Bayesian model builds upon the BitSeq [3, 4] framework, and the posterior distribution of transcript expression and differential expression is inferred using Markov chain Monte Carlo (MCMC).
[1] Papastamoulis P. and Rattray M. (2017a). A Bayesian model selection approach for identifying differentially expressed transcripts from RNA-Seq data. Journal of the Royal Statistical Society, Series C.
[2] Papastamoulis P. and Rattray M. (2017b). Bayesian estimation of Differential Transcript Usage from RNA-seq data. Statistical Applications in Genetics and Molecular Biology.
[3] Glaus P., Honkela A. and Rattray M. (2012). Identifying differentially expressed transcripts from RNA-Seq data with biological variation. Bioinformatics (28): 1721-1728.
[4] Papastamoulis P., Hensman J., Glaus P. and Rattray M. (2014). Improved variational Bayes inference for transcript expression estimation. Statistical Applications in Genetics and Molecular Biology (13), vol 2: 213-216.
In order to successfully install the cjBitSeq pipeline, the following software is required:
- Boost (C++ libraries: how to install). Alternatively, on Ubuntu the easiest way is to open a terminal and type: sudo apt-get install libboost-dev
- BitSeq
- R (version >= 3.0.2)
- R libraries: Matrix, foreach, doMC
- GNU parallel shell tool
Bowtie2 (or 1) is also required in order to map the RNA-seq reads to the reference transcriptome. To install cjBitSeq on your Linux system, download and extract the source code, then enter the extracted directory and run make. The GNU g++ compiler should also be available on your system.
Make sure that all cjBitSeq binaries are included in your $PATH variable by running the command:
PATH=$PATH:/full-path-to-cjBitSeq-directory/
After that, you can check whether the installation was successful; see this link for details.
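The PATH step above can be combined with a quick check that the required programs are visible to the shell. This is a minimal sketch, not an official installation test: the directory is the placeholder from the text, the helper name missing_tools is hypothetical, and the program names in the last line are the tools mentioned on this page (parallel being the usual executable name of GNU parallel).

```shell
#!/usr/bin/env bash
# Append the cjBitSeq directory to PATH for the current shell session.
# The path is the placeholder from the text; replace it with your own.
PATH=$PATH:/full-path-to-cjBitSeq-directory/
export PATH

# missing_tools (hypothetical helper): print every named program
# that cannot be found on PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Nothing is printed if all programs are found.
missing_tools cjBitSeq rjBitSeq parseAlignment R parallel bowtie2 g++
```

To make the PATH change permanent, the same PATH line can be added to your shell start-up file.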
cjBitSeq works with alignment probabilities (.prob files) of reads on a given set of transcripts. There should be at least one .prob file for each of the two compared conditions. These probabilities are used as input to the main calling function.
In order to compute the alignment probabilities (.prob files) it is required to:
- Use bowtie to map each set of reads (.fastq files) to the reference transcriptome and obtain the corresponding .sam files.
- Use the parseAlignment command of BitSeq for each .sam file.
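The two steps above can be sketched as a small shell script. This is only an illustration under stated assumptions: the sample names, the index prefix transcriptome_index (assumed to have been built beforehand with bowtie2-build) and the file transcriptome.fa are hypothetical, single-end reads are assumed, and the options shown follow general Bowtie2 and BitSeq usage rather than settings prescribed by cjBitSeq. By default the script only prints the commands so they can be inspected first.

```shell
#!/usr/bin/env bash
# Sketch: map reads with Bowtie2, then convert each .sam file to a .prob file.
# All file names and the index prefix are hypothetical placeholders.

DRY_RUN=${DRY_RUN:-1}   # 1: only print the commands; 0: execute them

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# align_sample <index prefix> <transcriptome fasta> <sample name>
# Expects <sample>.fastq and writes <sample>.sam and <sample>.prob.
align_sample() {
    local index="$1" fasta="$2" sample="$3"
    # -k 100 reports up to 100 alignments per read (an assumed setting),
    # so that multi-mapping reads are passed on to parseAlignment.
    run bowtie2 -q -k 100 -x "$index" -U "$sample.fastq" -S "$sample.sam"
    # --uniform assumes a uniform read distribution along transcripts.
    run parseAlignment "$sample.sam" -o "$sample.prob" \
        --trSeqFile "$fasta" --uniform
}

# Two replicates per condition (hypothetical sample names).
for sample in A1 A2 B1 B2; do
    align_sample transcriptome_index transcriptome.fa "$sample"
done
```

Running the script with DRY_RUN=0 executes the commands instead of printing them. Consult the Bowtie2 and BitSeq documentation before settling on options for real data.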
Assuming that cjBitSeq is compiled from source and that its binaries are included in your $PATH variable, the following command wraps all necessary functions:
cjBitSeq <outputDir> <.prob files condition A> C <.prob files condition B>
Your $PATH variable should also include R, the Boost libraries and the GNU parallel shell tool. Note that in the previous command an argument enclosed in the symbols <> denotes input that should be defined by the user.
- outputDir: output directory.
- .prob files condition A: space-separated list of files that contain the alignment probabilities of each read per replicate of condition A.
- .prob files condition B: space-separated list of files that contain the alignment probabilities of each read per replicate of condition B.
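As an illustration of the argument order, the following sketch calls cjBitSeq with two replicates per condition. The output directory and .prob file names are hypothetical, and the helper check_prob_files is not part of cjBitSeq: it only verifies that each condition has at least one existing file before the program is started. The literal C between the two lists is taken from the command synopsis.

```shell
#!/usr/bin/env bash
# Sketch: run cjBitSeq on two conditions with two replicates each.
# The directory and file names below are hypothetical placeholders.

# check_prob_files (hypothetical helper): succeed only if at least one
# file is given and every given file exists.
check_prob_files() {
    [ "$#" -ge 1 ] || return 1
    local f
    for f in "$@"; do
        [ -f "$f" ] || return 1
    done
}

if check_prob_files A1.prob A2.prob && check_prob_files B1.prob B2.prob; then
    # Order: output directory, condition A files, the literal C, condition B files.
    cjBitSeq cjBitSeq_output A1.prob A2.prob C B1.prob B2.prob
else
    echo "each condition needs at least one existing .prob file" >&2
fi
```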
The main output file, written to the output directory, is:
- estimates.txt: estimates of relative transcript expression for each condition, the posterior probability of differential expression, and the DE transcripts when controlling the FDR at the 0.01, 0.05 and 0.10 levels.
- tmp/finalClusters.txt: grouping of transcripts into clusters.
- tmp/nReadsPerCluster.txt: total number of reads per cluster for each condition.
- tmp/triplet_sparse_matrix.txt: non-zero elements of the sparse matrix. The first and second columns correspond to the row and column indices, respectively.
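After a run finishes, a quick sanity check is to confirm that the files listed above are present. The sketch below assumes they are written relative to the output directory given on the command line; the helper name list_missing_outputs and the directory name cjBitSeq_output are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which of the documented output files are absent.

# list_missing_outputs <outputDir> (hypothetical helper): print each
# expected file that does not exist, one per line.
list_missing_outputs() {
    local dir="$1" f
    for f in estimates.txt tmp/finalClusters.txt \
             tmp/nReadsPerCluster.txt tmp/triplet_sparse_matrix.txt; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Nothing is printed if all four files exist.
list_missing_outputs cjBitSeq_output
```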
A description of the output printed on the screen is given here. An example of a downstream analysis including mapping of reads, computation of alignment probabilities and running the rjMCMC algorithm is given here.
A reversible jump MCMC (rjMCMC) sampler is also available. The main call for the rj sampler is the following:
rjBitSeq <outputDir> <.prob files condition A> C <.prob files condition B>
The output is similar to that produced by the cjBitSeq command.
cjBitSeq also supports inference for Differential Transcript Usage (DTU). DTU refers to changes in the relative expression levels of transcripts within a gene. See the DTU documentation for details.