Panagiotis Papastamoulis edited this page Jun 30, 2018 · 17 revisions

Welcome to the cjBitSeq wiki!

cjBitSeq [1] implements a Bayesian model selection approach to simultaneously estimate transcript expression and perform Differential Expression (DE) analysis from RNA-seq data, given (replicated) samples from two biological conditions. The method has also been extended to the special case of Differential Transcript Usage (DTU) [2]. Its hierarchical Bayesian model builds upon the BitSeq [3, 4] framework, and the posterior distribution of transcript expression and differential expression is inferred using Markov chain Monte Carlo (MCMC).

  1. Papastamoulis P. and Rattray M. (2017a). A Bayesian model selection approach for identifying differentially expressed transcripts from RNA-Seq data. Journal of the Royal Statistical Society, Series C.

  2. Papastamoulis P. and Rattray M. (2017b). Bayesian estimation of Differential Transcript Usage from RNA-seq data. Statistical Applications in Genetics and Molecular Biology.

  3. Glaus P., Honkela A. and Rattray M. (2012). Identifying differentially expressed transcripts from RNA-Seq data with biological variation. Bioinformatics 28: 1721-1728.

  4. Papastamoulis P., Hensman J., Glaus P. and Rattray M. (2014). Improved variational Bayes inference for transcript expression estimation. Statistical Applications in Genetics and Molecular Biology 13(2): 213-216.

Required software and installation (Linux)

The following software is required in order to install and run the cjBitSeq pipeline: the GNU g++ compiler, the Boost C++ libraries, R and the GNU parallel shell tool. Bowtie2 (or Bowtie 1) is also needed to map the RNA-seq reads to the reference transcriptome.

To install cjBitSeq on your Linux system, download and extract the source code, change into the extracted directory and run make.

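Make sure that the directory containing the cjBitSeq binaries is included in your $PATH variable, for example by running:

export PATH=$PATH:/full-path-to-cjBitSeq-directory/

This only affects the current shell session. To make the change permanent, add the same line to your ~/.bashrc (or to the start-up file of your shell).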

After that, you can check whether the installation was successful; see this link for details.
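Before running anything, it can help to confirm that the executables mentioned on this page are reachable from $PATH. The Python sketch below does only that, using the standard library's shutil.which. The list of names is an assumption about your installation, in particular parseAlignment for BitSeq and parallel for GNU parallel. The Boost libraries are not checked because they are not executables. This is not the official installation check.

```python
import shutil

# Assumed executable names, taken from the commands mentioned on this page.
REQUIRED_TOOLS = ["cjBitSeq", "rjBitSeq", "bowtie2", "parseAlignment", "R", "parallel"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
    else:
        print("All listed executables were found on $PATH.")
```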

Usage

cjBitSeq takes as input the alignment probabilities (.prob files) of the reads against a given set of transcripts. At least one .prob file is required for each of the two compared conditions, and these files are passed to the main cjBitSeq command.

Pre-processing steps

To compute the alignment probabilities (.prob files):

  1. Use Bowtie2 (or Bowtie) to map each set of reads (.fastq file) to the reference transcriptome and obtain the corresponding .sam file.

  2. Run the parseAlignment command of BitSeq on each .sam file.
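The two pre-processing steps above can be scripted. The Python sketch below builds the commands for one .fastq file and, by default, only prints them. Several options are assumptions based on the Bowtie2 and BitSeq documentation:

  • Bowtie2: -k 100 keeps multiple alignments per read, and -U takes single-end input.
  • parseAlignment: -o, --trSeqFile, --trInfoFile and --uniform.

Check these options against the help output of your installed versions. All file names are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path


def preprocessing_commands(fastq, bowtie2_index, transcripts_fasta, tr_info, out_dir):
    """Return (bowtie2_cmd, parse_cmd) argument lists for one .fastq file.

    Option names are assumptions; verify them with `bowtie2 --help`
    and `parseAlignment --help` before use.
    """
    stem = Path(fastq).name.split(".")[0]  # e.g. "reads/A1.fastq" -> "A1"
    sam = str(Path(out_dir) / f"{stem}.sam")
    prob = str(Path(out_dir) / f"{stem}.prob")
    bowtie2_cmd = [
        "bowtie2",
        "-k", "100",          # assumed: report up to 100 alignments per read
        "-x", bowtie2_index,  # index built beforehand with bowtie2-build
        "-U", fastq,          # single-end reads; paired-end data would use -1/-2
        "-S", sam,
    ]
    parse_cmd = [
        "parseAlignment", sam,
        "-o", prob,
        "--trSeqFile", transcripts_fasta,
        "--trInfoFile", tr_info,
        "--uniform",          # assumed: uniform read distribution along transcripts
    ]
    return bowtie2_cmd, parse_cmd


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, run(preprocessing_commands("reads/A1.fastq", "index/transcripts", "transcripts.fa", "data.tr", "work")) prints both commands. Passing dry_run=False executes them, once the Bowtie2 index has been built.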

Main call

Assuming that cjBitSeq has been compiled from source and that its binaries are included in your $PATH variable, the following command runs all the necessary steps:

cjBitSeq <outputDir> <.prob files condition A> C <.prob files condition B>

R, the Boost libraries and the GNU parallel shell tool must also be available through your $PATH. In the command above, an argument enclosed in <> denotes input that should be defined by the user.

Input notation:

  • outputDir: output directory
  • .prob files condition A: space-separated list of .prob files, one per replicate of condition A, containing the alignment probabilities of each read.
  • C: a literal capital C that separates the files of condition A from those of condition B.
  • .prob files condition B: space-separated list of .prob files, one per replicate of condition B, containing the alignment probabilities of each read.
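When the call is generated from a script, the argument order above is easy to get wrong. The Python sketch below assembles the argument list and checks that each condition has at least one .prob file. The function name is hypothetical. Rejecting files without a .prob suffix is a choice made for this sketch, not a cjBitSeq requirement.

```python
import shlex


def cjbitseq_command(output_dir, probs_a, probs_b, program="cjBitSeq"):
    """Assemble the argument list <program> <outputDir> <A files> C <B files>."""
    if not probs_a or not probs_b:
        raise ValueError("at least one .prob file is required per condition")
    for path in [*probs_a, *probs_b]:
        # Sketch-level check only: the page describes the inputs as .prob files.
        if not path.endswith(".prob"):
            raise ValueError(f"expected a .prob file, got {path!r}")
    # The literal "C" separates the replicates of condition A from those of B.
    return [program, output_dir, *probs_a, "C", *probs_b]


if __name__ == "__main__":
    cmd = cjbitseq_command("results", ["A1.prob", "A2.prob"], ["B1.prob", "B2.prob"])
    print(shlex.join(cmd))  # cjBitSeq results A1.prob A2.prob C B1.prob B2.prob
    # To launch the analysis: import subprocess; subprocess.run(cmd, check=True)
```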

Main output file

  • estimates.txt: estimates of the relative transcript expression in each condition, the posterior probability of differential expression for each transcript, and the DE transcripts obtained when controlling the false discovery rate (FDR) at the 0.01, 0.05 and 0.10 levels.
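A standard way to turn posterior probabilities of DE into a list controlled at a Bayesian FDR level works as follows. Transcripts are ranked by decreasing posterior probability, and the largest top set is kept whose average posterior probability of not being DE stays at or below the level. The Python sketch below implements that rule. Whether estimates.txt uses exactly this rule is an assumption. The probabilities in the example are made up, and the column layout of estimates.txt is not assumed.

```python
def bayesian_fdr_de(post_probs, alpha):
    """Return indices of transcripts called DE at Bayesian FDR level alpha.

    post_probs[i] is the posterior probability that transcript i is DE.
    Transcripts are added in decreasing order of post_probs while the mean
    posterior probability of *not* being DE over the selected set is <= alpha.
    """
    order = sorted(range(len(post_probs)), key=lambda i: post_probs[i], reverse=True)
    selected = []
    cum_null = 0.0
    for i in order:
        cum_null += 1.0 - post_probs[i]
        # The running mean can only grow along this ordering, so stop at the first excess.
        if cum_null / (len(selected) + 1) > alpha:
            break
        selected.append(i)
    return sorted(selected)


if __name__ == "__main__":
    probs = [0.999, 0.20, 0.97, 0.90, 0.995]  # made-up posterior probabilities
    for alpha in (0.01, 0.05, 0.10):
        print(alpha, bayesian_fdr_de(probs, alpha))
```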

Supplementary output files

  • tmp/finalClusters.txt: grouping of transcripts into clusters
  • tmp/nReadsPerCluster.txt: total number of reads per cluster for each condition
  • tmp/triplet_sparse_matrix.txt: non-zero elements of the sparse matrix. The first and second columns correspond to the row and column indices, respectively.
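The triplet file can be loaded into a dictionary keyed by (row, column) for further inspection. This Python sketch rests on several assumptions:

  • Columns are whitespace-separated.
  • There is no header line.
  • The third column holds the value; a missing third column is read as 1.0, which is a convention chosen here.

Indices are kept exactly as written, since this page does not say whether they are 0- or 1-based.

```python
def read_triplets(lines):
    """Parse triplet_sparse_matrix.txt lines into {(row, col): value}.

    Assumes whitespace-separated columns: row index, column index and
    (assumed) the value in the third column; if the third column is absent,
    the entry is stored as 1.0. Blank lines and lines starting with '#' are skipped.
    """
    matrix = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        row, col = int(fields[0]), int(fields[1])
        value = float(fields[2]) if len(fields) > 2 else 1.0
        matrix[(row, col)] = value
    return matrix
```

A typical call would be: with open("outputDir/tmp/triplet_sparse_matrix.txt") as fh: matrix = read_triplets(fh).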

A description of the output printed on the screen is given here. An example of a complete analysis, including read mapping, computation of alignment probabilities and running the rjMCMC algorithm, is given here.

Extra features

Reversible Jump MCMC sampler

A reversible jump MCMC sampler is also available. It is called as follows:

rjBitSeq <outputDir> <.prob files condition A> C <.prob files condition B>

The output is similar to the one arising from the cjBitSeq command.
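Because both samplers take the same arguments, it is straightforward to run them on the same .prob files for comparison. The Python sketch below builds one call per sampler. Each call writes to its own output directory so that the outputs do not overwrite each other. The directory naming (<prefix>_cjBitSeq, <prefix>_rjBitSeq) is hypothetical.

```python
import shlex


def sampler_commands(probs_a, probs_b, out_prefix="results"):
    """Build matching cjBitSeq and rjBitSeq calls on the same .prob files."""
    return {
        program: [program, f"{out_prefix}_{program}", *probs_a, "C", *probs_b]
        for program in ("cjBitSeq", "rjBitSeq")
    }


if __name__ == "__main__":
    for program, cmd in sampler_commands(["A1.prob", "A2.prob"], ["B1.prob", "B2.prob"]).items():
        print(shlex.join(cmd))
```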

cjBitSeq for Differential Transcript Usage

cjBitSeq also supports inference for Differential Transcript Usage (DTU). DTU refers to changes in the relative expression of the transcripts within a gene. Link to DTU documentation.
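As a toy illustration of within-gene relative expression, the Python sketch below divides each transcript's expression by the total expression of its gene. The input dictionaries and numbers are hypothetical. The sketch computes only the quantity whose change DTU analysis targets, not the inference performed by cjBitSeq. In the example, gene g1 has the same total expression (40) in both conditions, but its transcripts are used in different proportions.

```python
from collections import defaultdict


def within_gene_proportions(expression, gene_of):
    """Relative usage of each transcript within its gene.

    expression: {transcript: expression level} for one condition (hypothetical format).
    gene_of:    {transcript: gene} mapping (hypothetical format).
    Returns {transcript: expression / total expression of its gene},
    with 0.0 for transcripts whose gene has zero total expression.
    """
    gene_totals = defaultdict(float)
    for tr, value in expression.items():
        gene_totals[gene_of[tr]] += value
    return {
        tr: value / gene_totals[gene_of[tr]] if gene_totals[gene_of[tr]] > 0 else 0.0
        for tr, value in expression.items()
    }


if __name__ == "__main__":
    genes = {"t1": "g1", "t2": "g1"}
    print(within_gene_proportions({"t1": 30.0, "t2": 10.0}, genes))  # condition A
    print(within_gene_proportions({"t1": 20.0, "t2": 20.0}, genes))  # condition B
```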