What is ShoRAH?
ShoRAH is an open source project for the analysis of next generation sequencing data. It is designed to analyse genetically heterogeneous samples. Its tools are written in different programming languages and provide error correction, haplotype reconstruction and estimation of the frequency of the different genetic variants present in a mixed sample.
The software suite ShoRAH (Short Reads Assembly into Haplotypes) consists of several programs, the most important of which are:
- amplian.py - amplicon-based analysis
- dec.py - local error correction based on diri_sampler
- diri_sampler - Gibbs sampling for error correction via Dirichlet process mixture
- contain - removal of redundant reads
- mm.py - maximum matching haplotype construction
- freqEst - EM algorithm for haplotype frequency
- snv.py - detects single nucleotide variants, taking strand bias into account
- shorah.py - wrapper for everything
If you use ShoRAH, please cite the application note by Zagordi et al. in BMC Bioinformatics.
Dependencies and installation
Please download and install:
- Biopython, following the online instructions.
- GNU scientific library GSL, installation is described in the included README and INSTALL files.
- ncurses is required by samtools. It is usually included in Linux/Mac OS X.
Type 'make' to build the C++ programs. This should be enough in most cases. If
your gsl installation is not standard, you might need to edit the relevant
lines in the
/opt/local/ is already included).
GSL on Ubuntu
The following commands install GSL on Ubuntu 12.04 LTS Server Edition 64-bit, as reported by Travis:
sudo apt-get update -qq
sudo apt-get install -y gsl-bin libgsl0-dev
GSL on CentOS
sudo yum install -y gsl gsl-devel
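Before running `make`, it can help to confirm that the prerequisites are visible from the shell. The following is a minimal sketch, not part of ShoRAH: the function name `check_dependencies` is hypothetical, and it assumes that Biopython is importable as the `Bio` package and that a standard GSL development install puts the `gsl-config` helper on the `PATH`. A non-standard GSL location may not be detected this way.

```python
import importlib.util
import shutil


def check_dependencies():
    """Report which build and runtime prerequisites can be found (hypothetical helper)."""
    return {
        # Biopython installs the top-level package "Bio".
        "biopython": importlib.util.find_spec("Bio") is not None,
        # gsl-config is the helper script shipped with the GSL development files.
        "gsl": shutil.which("gsl-config") is not None,
        # make is needed to build the C++ programs.
        "make": shutil.which("make") is not None,
    }


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A `MISSING` entry for `gsl` does not necessarily mean GSL is absent; it may only mean the library is installed outside the default search path.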
The input is a sorted BAM file. Analysis can be performed in local or global mode.
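Whether a BAM file declares itself as sorted can be checked from its header without extra libraries. The sketch below is an illustration only and assumes that "sorted" means coordinate-sorted, i.e. the `@HD` header line carries `SO:coordinate`. The function name `bam_sort_order` is hypothetical. It relies on the fact that BAM files are BGZF-compressed, a gzip-compatible format that Python's `gzip` module can read.

```python
import gzip
import struct


def bam_sort_order(path):
    """Return the SO value from the @HD header line, or None if it is absent."""
    # BGZF is a series of gzip members, so gzip.open can decompress it.
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError(f"{path} does not look like a BAM file")
        # The magic string is followed by the header text length (little-endian int32).
        (text_length,) = struct.unpack("<i", handle.read(4))
        header = handle.read(text_length).decode("utf-8", errors="replace")
    for line in header.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None
```

For a hypothetical input `sample.bam`, `bam_sort_order("sample.bam") == "coordinate"` would indicate a coordinate-sorted file. The header only records what the producing tool declared, so it is not a proof that the records are actually in order.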
The local analysis alone can be run invoking
for the amplicon mode). They work by cutting windows from the multiple sequence alignment, running diri_sampler on the windows, and calling snv.py for the SNV calling. See the
file in directory
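The window-cutting idea behind the local analysis can be sketched in a few lines. This is a simplified illustration, not the code used by ShoRAH: the function `cut_windows`, the window length, and the step are all hypothetical choices, and the alignment is assumed to be a list of equal-length strings, whereas real reads cover each window only partially.

```python
def cut_windows(alignment, window_length, step):
    """Yield (start, rows) for overlapping windows over aligned reads.

    alignment: list of equal-length strings, one row per aligned read.
    """
    if not alignment:
        return
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows must have the same length")
    # The last window starts where a full window still fits.
    last_start = max(width - window_length, 0)
    for start in range(0, last_start + 1, step):
        yield start, [row[start:start + window_length] for row in alignment]


reads = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]
for start, rows in cut_windows(reads, window_length=6, step=2):
    print(start, rows)
```

Each window would then be handed to the error-correction step independently. Overlapping windows mean that every position is covered more than once.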
The whole global reconstruction consists of the following steps:
- error correction (i.e. local haplotype reconstruction);
- SNV calling;
- removal of redundant reads;
- global haplotype reconstruction;
- frequency estimation.
These can be run one after the other, or one can invoke shorah.py, which runs the whole process from BAM file to frequency estimation and SNV calling.
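To give an intuition for the final frequency estimation step, here is a toy expectation-maximisation loop. It is a generic textbook sketch and should not be read as the algorithm implemented in freqEst: the function `em_frequencies` and the 0/1 compatibility table (read i could originate from haplotype h) are assumptions made for illustration.

```python
def em_frequencies(compatibility, iterations=100):
    """Estimate haplotype frequencies from a read/haplotype compatibility table.

    compatibility[i][h] is 1 if read i could originate from haplotype h, else 0.
    """
    n_haplotypes = len(compatibility[0])
    freqs = [1.0 / n_haplotypes] * n_haplotypes
    for _ in range(iterations):
        counts = [0.0] * n_haplotypes
        for row in compatibility:
            # E-step: share each read among its compatible haplotypes,
            # weighted by the current frequency estimates.
            weights = [f * c for f, c in zip(freqs, row)]
            total = sum(weights)
            if total == 0:
                continue  # read compatible with no haplotype: ignored
            for h, weight in enumerate(weights):
                counts[h] += weight / total
        # M-step: renormalise the expected counts into frequencies.
        assigned = sum(counts)
        if assigned == 0:
            break
        freqs = [count / assigned for count in counts]
    return freqs


# Three reads match only haplotype 0, one read matches only haplotype 1.
print(em_frequencies([[1, 0], [1, 0], [1, 0], [0, 1]]))
```

Reads compatible with several haplotypes are split in proportion to the current estimates, which is what lets the loop resolve ambiguous reads over successive iterations.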