Repo for the software suite ShoRAH (Short Reads Assembly into Haplotypes)

What is ShoRAH?

ShoRAH is an open source project for the analysis of next generation sequencing data. It is designed to analyse genetically heterogeneous samples. Its tools are written in different programming languages and provide error correction, haplotype reconstruction and estimation of the frequency of the different genetic variants present in a mixed sample.

More information here.


The software suite ShoRAH (Short Reads Assembly into Haplotypes) consists of several programs, the most important of which are:

amplian.py - amplicon based analysis

dec.py - local error correction based on diri_sampler

diri_sampler - Gibbs sampling for error correction via Dirichlet process mixture

contain - removal of redundant reads

mm.py - maximum matching haplotype construction

freqEst - EM algorithm for haplotype frequency

snv.py - detects single nucleotide variants, taking strand bias into account

shorah.py - wrapper for everything
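To make the frequency estimation step more concrete, here is a minimal sketch of an EM iteration for haplotype frequencies. It is not the freqEst implementation: the function name, the 0/1 compatibility matrix and the fixed iteration count are all assumptions chosen for illustration.

```python
def estimate_frequencies(compat, n_haplotypes, iterations=100):
    """Toy EM for haplotype frequencies (illustrative, not freqEst itself).

    compat[i][j] is 1 if read i is compatible with haplotype j, else 0.
    """
    freqs = [1.0 / n_haplotypes] * n_haplotypes
    for _ in range(iterations):
        totals = [0.0] * n_haplotypes
        used = 0
        for row in compat:
            # E-step: share each read among compatible haplotypes,
            # in proportion to the current frequency estimates.
            weights = [f * c for f, c in zip(freqs, row)]
            norm = sum(weights)
            if norm == 0:
                continue  # read matches no haplotype with nonzero frequency
            used += 1
            for j, w in enumerate(weights):
                totals[j] += w / norm
        if used == 0:
            break
        # M-step: new frequency is the average share over assigned reads.
        freqs = [t / used for t in totals]
    return freqs
```

For example, `estimate_frequencies([[1, 0], [1, 0], [0, 1], [1, 0]], 2)` assigns three of four unambiguous reads to the first haplotype.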

Citation

If you use ShoRAH, please cite the application note by Zagordi et al. in BMC Bioinformatics.

General usage

Dependencies and installation

Please download and install:

  • Biopython, following the online instructions.
  • GNU Scientific Library (GSL); installation is described in the included README and INSTALL files.
  • ncurses is required by samtools. It is usually included in Linux/Mac OS X.

Please note that these dependencies can also be installed with the package manager of many operating systems, for example Homebrew on Mac OS X or yum on several Linux distributions.

Type 'make' to build the C++ programs. This should be enough in most cases. If your gsl installation is not standard, you might need to edit the relevant lines in the Makefile (location /opt/local/ is already included).
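Before running make, it can help to check that the dependencies are visible. The following is a rough sketch, not part of ShoRAH: it assumes a Python 3 interpreter, that Biopython is importable as `Bio`, and that the GSL development files put the `gsl-config` helper on the PATH. A missing `gsl-config` does not prove that GSL is absent, for instance when it lives in a non-standard location.

```python
import importlib.util
import shutil


def missing_dependencies():
    """Return a list of dependencies that could not be found (best effort)."""
    missing = []
    # Biopython installs the top-level package "Bio".
    if importlib.util.find_spec("Bio") is None:
        missing.append("Biopython")
    # gsl-config normally ships with the GSL development files.
    if shutil.which("gsl-config") is None:
        missing.append("GSL")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("missing: " + ", ".join(problems) if problems else "all found")
```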

GSL on Ubuntu

The following commands install GSL on Ubuntu 12.04 LTS Server Edition 64 bit, as reported by Travis:

sudo apt-get update -qq
sudo apt-get install -y gsl-bin libgsl0-dev

GSL on CentOS

sudo yum install -y gsl gsl-devel

Windows users

You can install and run ShoRAH with Cygwin. Please see the relevant paragraph in the documentation.

Run

The input is a sorted bam file. Analysis can be performed in local or global mode.
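One way to check whether a BAM file declares itself as sorted is to read the `SO` tag of its `@HD` header line, which `samtools view -H` prints. This is only a sketch: the function names and the `samtools` executable path are assumptions, and some tools sort a file without updating the tag, so a missing or `unsorted` value is a hint rather than proof.

```python
import subprocess


def sort_order_from_header(header_text):
    """Return the SO value from a SAM header's @HD line, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def bam_sort_order(bam_path, samtools="samtools"):
    """Read the header of bam_path with samtools and report its sort order."""
    result = subprocess.run(
        [samtools, "view", "-H", bam_path],
        capture_output=True, text=True, check=True,
    )
    return sort_order_from_header(result.stdout)
```

A coordinate-sorted file would typically report `coordinate`.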

Local analysis

The local analysis alone can be run by invoking dec.py or amplian.py (the program for amplicon mode). They work by cutting windows from the multiple sequence alignment, invoking diri_sampler on each window, and calling snv.py for SNV calling. See the README file in the directory amplicon_test.
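The idea of cutting windows from an alignment can be illustrated with a short sketch. It is not taken from dec.py: the function name, the half-open coordinates and the rule that the step equals the window length divided by the number of shifts are assumptions made for the example.

```python
def overlapping_windows(start, end, window_length, shifts):
    """Yield half-open (begin, stop) windows covering [start, end).

    Consecutive windows are offset by window_length // shifts, so interior
    positions are covered by several windows (illustrative scheme only).
    """
    if window_length <= 0 or shifts <= 0:
        raise ValueError("window_length and shifts must be positive")
    step = max(1, window_length // shifts)
    begin = start
    while begin < end:
        # Windows at the right edge are truncated to the region end.
        yield begin, min(begin + window_length, end)
        begin += step
```

Each window would then be handed to the error correction step on its own, and the per-window results combined afterwards.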

Global analysis

The whole global reconstruction consists of the following steps:

  1. error correction (i.e. local haplotype reconstruction);
  2. SNV calling;
  3. removal of redundant reads;
  4. global haplotype reconstruction;
  5. frequency estimation.

These can be run one after the other, or one can invoke shorah.py, which runs the whole process from BAM file to frequency estimation and SNV calling.
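A wrapper call could be scripted roughly as follows. This is a sketch under stated assumptions: the `-b` (BAM file) and `-f` (reference FASTA) options, the `python` interpreter name and the helper function itself are not confirmed by this README, so check the script's own help output or documentation before relying on them.

```python
import subprocess


def build_shorah_command(bam_path, reference_path,
                         script="shorah.py", interpreter="python"):
    """Build the argument list for a global analysis run.

    The -b and -f options are assumptions; verify them against the script.
    """
    return [interpreter, script, "-b", bam_path, "-f", reference_path]


def run_global_analysis(bam_path, reference_path):
    """Run the wrapper and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_shorah_command(bam_path, reference_path), check=True)
```

Building the argument list separately from running it makes the command easy to print and inspect before anything is executed.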
