merged
ratsch committed Sep 25, 2012
2 parents 2aceb5f + be266f5 commit 24e586c
Showing 101 changed files with 17,616 additions and 5 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1 +1,2 @@
-*~.branch
+*~
+.branch
11 changes: 11 additions & 0 deletions AUTHORS
@@ -0,0 +1,11 @@
rQuant:
Regina Bohnert <regina.bohnert@tuebingen.mpg.de>
Gunnar Raetsch <gunnar.raetsch@tuebingen.mpg.de>

Gff2Anno:
Vipin T Sreedharan <vipin.ts@tuebingen.mpg.de>

BAM file processing:
Regina Bohnert <regina.bohnert@tuebingen.mpg.de>
Jonas Behr <jonas.behr@tuebingen.mpg.de>
Gunnar Raetsch <gunnar.raetsch@tuebingen.mpg.de>
3 changes: 3 additions & 0 deletions COPYRIGHT
@@ -0,0 +1,3 @@
GPL:
====
rQuant is licensed under the GNU General Public License version 3 or at your option any later version.
10 changes: 10 additions & 0 deletions INSTALL
@@ -0,0 +1,10 @@
To set up rQuant, please follow these steps:

1. Download SAMTools (version 0.1.7) from http://samtools.sourceforge.net/ and install it. You need to add the flag -fPIC to the SAMTools Makefile for compilation.
2. Add the SAMTools directory to ./mex/Makefile, go to ./mex and run make ('make octave' for Octave and 'make matlab' for Matlab).
3. Run ./setup_rquant.sh and set up paths and configuration options for rQuant.

Optional:
4. Download the example data with ./get_data.sh in ./examples.
5. Run an example by executing ./run_example.sh in the examples directory with input 'small' or 'big' to work on a small (55 examples) or big (1865 examples) C. elegans data set, respectively.
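
Steps 2 to 5 can be chained in a short script. The following is a minimal sketch, assuming SAMTools has already been built with -fPIC and its directory entered in ./mex/Makefile, and that it is started from the rQuant top-level directory. The names `run`, `rquant_install_steps` and the `DRY_RUN` switch are illustrative and not part of rQuant; `make -C ./mex` stands in for "go to ./mex and run make".

```shell
# Sketch only: chains INSTALL steps 2-5.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rquant_install_steps octave|matlab [small|big]
# The body runs in a subshell, so 'cd' and 'set -e' do not leak to the caller.
rquant_install_steps() (
    set -e
    case "$1" in
        octave|matlab) ;;
        *) echo "usage: rquant_install_steps octave|matlab [small|big]" >&2; exit 1 ;;
    esac
    run make -C ./mex "$1"          # step 2
    run ./setup_rquant.sh           # step 3
    if [ -n "$2" ]; then
        run cd ./examples
        run ./get_data.sh           # step 4 (optional)
        run ./run_example.sh "$2"   # step 5 (optional)
    fi
)

rquant_install_steps octave small
```

Setting DRY_RUN=0 executes the commands instead of printing them.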

674 changes: 674 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

66 changes: 66 additions & 0 deletions NEWS
@@ -0,0 +1,66 @@
rQuant version 2.1 (Aug 30, 2011)
----------------------------------------

New features:
- Profiles can now also be estimated empirically instead of using the
optimisation approach (CFG.learn_profiles=1: empirically estimated,
CFG.learn_profiles=2: optimised). The empirical estimation is
considerably faster.
- The usage of information from paired-end reads has been implemented
and can be used during abundance estimation (CFG.paired = 1).


rQuant version 2.0 (May 24, 2011)
----------------------------------------

New features:
- The optimisation of the transcript and profile variables has been
newly implemented. The optimisation problems are now solved via
coordinate descent and the analytical solution, making the
calculations much faster than in the old releases and making rQuant
independent of a commercial solving software.
- The profile functions are now modelled with piecewise linear
functions instead of piecewise constant functions.

Bug fixes:
- ParseGFF.py: assertion for a GFF file without 9 columns


rQuant version 1.2 (May 18, 2011)
----------------------------------------

New features:
- transcripts from overlapping loci are merged for quantitation
- additional option allowing genome annotation to be in AGS format

Other changes:
- ParseGFF.py: now also parses multiple mappings of parent IDs of GFF
features


rQuant version 1.1 (March 11, 2011)
----------------------------------------

New features:
- tool ReadStats: generates a statistic about the read alignments and
the covered genes

Other changes:
- prctiles.m: replaced function by own implementation
- ParseGFF.py: now also parses non-coding transcripts; exon
coordinates always in ascending order (for both strands)

Bug fixes:
- get_reads: fixed a memory leak and segmentation faults
- sanitise_genes.m: adapted to closed intervals in gene structure from
ParseGFF.py
- rquant_core.m: corrected initialisation of transcript length bins


rQuant version 1.0 (December 17, 2010)
----------------------------------------

This is the first release of the quantitation tool rQuant, which
determines abundances of multiple transcripts per gene locus from
RNA-Seq measurements.
Please also visit http://fml.mpg.de/raetsch/suppl/rquant for more
information about this software.
61 changes: 57 additions & 4 deletions README
@@ -1,4 +1,57 @@
-Software: <name>
-Description: <description>
-Authors: <authors>
-URL: <url>
+----------------------
+rQuant version 2.1
+----------------------
+
DESCRIPTION
rQuant is a programme to determine abundances of multiple transcripts
per gene locus from RNA-Seq measurements. It can simultaneously
estimate the effect of biases introduced by experimental settings.

REQUIREMENTS
- Octave or Matlab
- Python >=2.6.5 and Scipy >=0.7.1
- SAMTools >= 0.1.7
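
The minimum versions above can be checked with a small helper. This is a sketch, not part of rQuant: the function name `version_ge` is made up, and it relies on `sort -V` (version sort), which is a GNU coreutils extension. The version numbers in the example calls are arbitrary.

```shell
# version_ge A B: succeeds if version A is greater than or equal to version B.
version_ge() {
    # After a version sort, the smaller of the two versions comes first;
    # A >= B holds exactly when that first line is B.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

version_ge 2.7.3 2.6.5 && echo "Python version is sufficient"
version_ge 0.1.6 0.1.7 || echo "SAMTools version is too old"
```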

GETTING STARTED
To install rQuant and the required software please follow the
instructions in INSTALL in this directory.

CONTENTS
All relevant scripts for rQuant are located in the subdirectory src.
rquant.sh is the main script to start rQuant.
In the same subdirectory you find the script read_stats.sh that
generates a statistic about the read alignments and the covered genes.

GALAXY
rQuant can be used as a web service embedded in a Galaxy instance
(cf. http://galaxy.fml.tuebingen.mpg.de/tool_runner?tool_id=rquantweb).
The Galaxy tool configuration file of rQuant is located in the
subdirectory galaxy along with an XML file for loading example data and
instructions (rquant_web.xml and rquant_web_instructions.xml,
respectively). Please adapt the paths to the respective tools in
the command section of the XML files as indicated.
The subdirectory test_data contains all data for running a functional
test in Galaxy (e.g. with sh run_functional_test.sh -id rquantweb). You
may need to move these test files into the Galaxy test-data directory.

DOCUMENTATION
More information is available in doc/rquant_web_instructions.txt,
doc/rquant_web.txt, and doc/read_stats.txt. Examples for running
rQuant can be found in examples/run_example.sh.
You can also find information on rQuant.web and rQuant on
http://fml.mpg.de/raetsch/suppl/rquant/web and
http://fml.mpg.de/raetsch/suppl/rquant, respectively.

LICENSE
rQuant is licensed under the GPL version 3 or any later version
(cf. LICENSE).

CITE US
If you use rQuant in your research you are kindly asked to cite the
following publications:
* Regina Bohnert and Gunnar Raetsch: rQuant.web: A tool for
RNA-Seq-based transcript quantitation. Nucleic Acids Research,
38(Suppl 2):W348-51, July 2010.
* Regina Bohnert, Jonas Behr, and Gunnar Raetsch: Transcript quantification
with RNA-Seq data. BMC Bioinformatics, 10(S13):P5, October 2009.

1 change: 1 addition & 0 deletions VERSION
@@ -0,0 +1 @@
2.1
21 changes: 21 additions & 0 deletions bin/genarglist.sh
@@ -0,0 +1,21 @@
#!/bin/bash

#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# Written (W) 2009-2010 Regina Bohnert, Gunnar Raetsch
# Copyright (C) 2009-2010 Max Planck Society
#

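# Print all arguments as a comma-separated list of single-quoted strings,
# e.g. a b c -> 'a', 'b', 'c'
until [ -z "$1" ] ; do
    if [ $# != 1 ];
    then
        echo -n "'$1', "
    else
        echo -n "'$1'"
    fi
    shift
done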
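
The effect of this loop is easiest to see on a concrete call. The sketch below wraps the same logic in a function (the name `gen_arg_list` is illustrative) and uses `printf` instead of `echo -n`; the file names are made up.

```shell
# gen_arg_list ARG...: same logic as genarglist.sh, as a function.
gen_arg_list() {
    until [ -z "$1" ]; do
        if [ $# != 1 ]; then
            printf "'%s', " "$1"    # more arguments follow: add a separator
        else
            printf "'%s'" "$1"      # last argument: no trailing comma
        fi
        shift
    done
}

gen_arg_list anno.gff3 reads.bam out_dir
echo
# prints: 'anno.gff3', 'reads.bam', 'out_dir'
```

Note that the loop stops at the first empty argument, so an empty string in the middle of the list truncates the output.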
1 change: 1 addition & 0 deletions bin/genes_cell2struct
1 change: 1 addition & 0 deletions bin/read_stats
1 change: 1 addition & 0 deletions bin/rquant
27 changes: 27 additions & 0 deletions bin/rquant_config.sh
@@ -0,0 +1,27 @@
#!/bin/bash

#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# Written (W) 2009-2011 Regina Bohnert, Gunnar Raetsch
# Copyright (C) 2009-2011 Max Planck Society
#


export RQUANT_VERSION=2.1
export RQUANT_PATH=
export RQUANT_SRC_PATH=
export INTERPRETER=
export MATLAB_BIN_PATH=
export MATLAB_MEX_PATH=
export MATLAB_INCLUDE_DIR=
export OCTAVE_BIN_PATH=
export OCTAVE_MKOCT=
export SAMTOOLS_DIR=
export PYTHON_PATH=
export SCIPY_PATH=

if [ -z "${RQUANT_PATH}" ]; then echo Warning: variable RQUANT_PATH not set\; consider running ./setup_rquant.sh ; fi
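
The check above covers only RQUANT_PATH. A sketch of how the same test could be extended to several of the variables follows; the function name `report_unset` and the path /opt/rquant are hypothetical and not part of rQuant.

```shell
# report_unset NAME...: print each named variable that is unset or empty.
report_unset() {
    for name in "$@"; do
        # Indirect lookup via eval keeps this usable in plain POSIX sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}

# Example with made-up values: only RQUANT_PATH is set.
RQUANT_PATH=/opt/rquant
INTERPRETER=
unset SAMTOOLS_DIR
report_unset RQUANT_PATH INTERPRETER SAMTOOLS_DIR
```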
1 change: 1 addition & 0 deletions bin/rquant_gendata
20 changes: 20 additions & 0 deletions bin/rquant_wrapper.sh
@@ -0,0 +1,20 @@
#!/bin/bash

#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# Written (W) 2009-2010 Regina Bohnert, Gunnar Raetsch
# Copyright (C) 2009-2010 Max Planck Society
#

# rQuant wrapper script to start the interpreter with the correct list of arguments

set -e

PROG=$(basename "$0")
DIR=$(dirname "$0")

exec "${DIR}/start_interpreter.sh" "${PROG}" "$("${DIR}/genarglist.sh" "$@")"
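
Because PROG is taken from the name the script was invoked under, one wrapper can serve several entry points. The sketch below illustrates that mechanism in a temporary directory. It assumes that the one-line files bin/rquant, bin/read_stats etc. are symlinks to this wrapper, which this page does not show; `dispatch_demo` and `wrapper.sh` are illustrative names.

```shell
# dispatch_demo: show that a script reached through differently named
# symlinks sees a different basename of $0 each time.
dispatch_demo() {
    demo_dir="$(mktemp -d)"
    cat > "$demo_dir/wrapper.sh" <<'EOF'
echo "invoked as: $(basename "$0")"
EOF
    ln -s wrapper.sh "$demo_dir/rquant"
    ln -s wrapper.sh "$demo_dir/read_stats"
    # Run through sh so the demo does not depend on execute permissions.
    sh "$demo_dir/rquant"
    sh "$demo_dir/read_stats"
    rm -rf "$demo_dir"
}

dispatch_demo
```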
34 changes: 34 additions & 0 deletions bin/start_interpreter.sh
@@ -0,0 +1,34 @@
#!/bin/bash

#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# Written (W) 2009-2010 Regina Bohnert, Gunnar Raetsch
# Copyright (C) 2009-2010 Max Planck Society
#

set -e

. "$(dirname "$0")/rquant_config.sh"

# mktemp replaces the Debian-specific, deprecated tempfile
export MATLAB_RETURN_FILE="$(mktemp)"

if [ "$INTERPRETER" == 'octave' ];
then
echo exit | ${OCTAVE_BIN_PATH} --eval "global SHELL_INTERPRETER_INVOKE; SHELL_INTERPRETER_INVOKE=1; addpath $RQUANT_SRC_PATH; rquant_config; $1($2); exit;" || (echo starting Octave failed; rm -f $MATLAB_RETURN_FILE; exit -1) ;
fi

if [ "$INTERPRETER" == 'matlab' ];
then
echo exit | ${MATLAB_BIN_PATH} -nodisplay -r "global SHELL_INTERPRETER_INVOKE; SHELL_INTERPRETER_INVOKE=1; addpath $RQUANT_SRC_PATH; rquant_config; $1($2); exit;" || (echo starting Matlab failed; rm -f $MATLAB_RETURN_FILE; exit -1) ;
fi

test -f $MATLAB_RETURN_FILE || exit 0
ret=`cat $MATLAB_RETURN_FILE` ;
rm -f $MATLAB_RETURN_FILE
exit $ret
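
To see what the interpreter actually receives, the statement string passed to --eval (Octave) or -r (Matlab) can be printed instead of executed. This is a sketch: `build_eval` is an illustrative name, and the source path and file names in the example are made up.

```shell
# build_eval PROG ARGLIST: print the statement string that is handed to
# the interpreter, using RQUANT_SRC_PATH from the environment.
build_eval() {
    printf '%s\n' "global SHELL_INTERPRETER_INVOKE; SHELL_INTERPRETER_INVOKE=1; addpath $RQUANT_SRC_PATH; rquant_config; $1($2); exit;"
}

RQUANT_SRC_PATH=/opt/rquant/src
build_eval rquant "'anno.gff3', 'reads.bam'"
```

The second argument is the quoted, comma-separated list produced by genarglist.sh, so the program name and the list together form an ordinary function call in the interpreter's syntax.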


50 changes: 50 additions & 0 deletions doc/read_stats.txt
@@ -0,0 +1,50 @@
**What it does**

`ReadStats` generates a statistic about the read alignments (number of reads) and the covered genes (read coverage, number of covered introns, intron coverage). It can be used to perform a sanity check of the read alignments file and the annotation.

**Inputs**

`ReadStats` requires three input files to run:

1. The Genome Information Object, containing essential information about the genome (sequence, size, etc). It can be created using the `GenomeTool` from a fasta file.
2. The Genome Annotation Object, containing the necessary information about the transcripts that are to be quantified. It can be constructed using the `GFF2Anno` tool from an annotation in GFF3 format.
3. The BAM alignment file, which stores the read alignments in a compressed format. It can be generated using the `SAM-to-BAM` tool in the NGS: SAM Tools section.


**Output**

`ReadStats` writes an output file (Read Statistic) containing

1. the number of reads,
2. the read coverage of the given genes,
3. the number of covered introns, and
4. the intron coverage.

------

.. class:: infomark

**About formats**

**GFF3 format** General Feature Format is a format for describing genes
and other features associated with DNA, RNA and protein
sequences. GFF3 lines have nine tab-separated fields:

1. seqid - The name of a chromosome or scaffold.
2. source - The program that generated this feature.
3. type - The name of this type of feature. Some examples of standard feature types are "gene", "CDS", "protein", "mRNA", and "exon".
4. start - The starting position of the feature in the sequence. The first base is numbered 1.
5. stop - The ending position of the feature (inclusive).
6. score - A score between 0 and 1000. If there is no score value, enter ".".
7. strand - Valid entries include '+', '-', or '.' (for don't know/care).
8. phase - If the feature is a coding exon, phase should be a number between 0-2 that represents the reading frame of the first base. If the feature is not a coding exon, the value should be '.'.
9. attributes - A semicolon-separated list of tag=value pairs, e.g. ID and Parent, which link related features (such as the exons of a transcript) together.

For more information see http://www.sequenceontology.org/gff3.shtml
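
As a quick sanity check of an annotation file, the nine fields can be split with awk. The following is a sketch only; the function name `gff3_summary` and the sample line are made up and do not come from `ReadStats`.

```shell
# gff3_summary: read GFF3 on stdin and print "type start stop strand"
# for every feature line; lines without nine fields are reported on stderr.
gff3_summary() {
    awk -F '\t' '
        /^#/ { next }    # skip comments and directives such as ##gff-version
        NF != 9 { print "line " NR ": expected 9 fields, got " NF > "/dev/stderr"; next }
        { print $3, $4, $5, $7 }
    '
}

printf '##gff-version 3\nI\texample\texon\t100\t250\t.\t+\t.\tParent=tx1\n' | gff3_summary
# prints: exon 100 250 +
```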

**SAM/BAM format** The Sequence Alignment/Map (SAM) format is a
tab-delimited text format that stores large nucleotide sequence
alignments. BAM is the binary version of a SAM file that allows for
fast and intensive data processing. The format specification and the
description of SAMtools can be found on
http://samtools.sourceforge.net/.
68 changes: 68 additions & 0 deletions doc/rquant_web.txt
@@ -0,0 +1,68 @@
**What it does**

`rQuant` determines the abundances of a given set of transcripts based on aligned reads from an RNA-Seq experiment.

**Inputs**

`rQuant` requires two input files to run:

1. Annotation file either in GFF3 or AGS format, containing the necessary information about the transcripts that are to be quantified.
2. The BAM alignment file, which stores the read alignments in a compressed format. It can be generated using the `SAM-to-BAM` tool in the NGS: SAM Tools section.

For the feature Transcript Profiles you have three options:

1. "No profiles": This disables the estimation of the density model.
2. "Load profiles": You can load a pre-learned density model (consisting of transcript profiles).
3. "Learn profiles": This enables the estimation of the density model. You can specify the number of iterations. As an additional output, a file describing the density model (transcript profiles) is generated in your history.


**Output**

`rQuant` generates a GFF3 file with the attributes `ARC` and `RPKM` that describe the abundance of a transcript in ARC (estimated average read coverage) and RPKM (reads per kilobase of exon model per million mapped reads), respectively.
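
For orientation, the verbal definition of RPKM corresponds to the following commonly used formula. The symbols are introduced here for illustration; how `rQuant` derives RPKM from its ARC estimate internally is not described in this text.

```latex
% C: number of reads mapped to the exons of the transcript
% L: length of the exon model in bases
% N: total number of mapped reads in the experiment
\mathrm{RPKM}
  = \frac{C}{\left(L / 10^{3}\right)\left(N / 10^{6}\right)}
  = \frac{10^{9}\, C}{N \cdot L}
% Worked example: C = 500, L = 2000, N = 10^{7}
% gives RPKM = 10^{9} \cdot 500 / (10^{7} \cdot 2000) = 25.
```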

------

**Licenses**

If **rQuant.web** is used to obtain results for scientific publications it
should be cited as [1]_ or [2]_.

**References**

.. [1] Bohnert, R, and Raetsch, G (2010): `rQuant.web. A tool for RNA-Seq-based transcript quantitation`_. Nucleic Acids Research, 38(Suppl 2):W348-51.

.. [2] Bohnert, R, Behr, J, and Raetsch, G (2009): `Transcript quantification with RNA-Seq data`_. BMC Bioinformatics, 10(S13):P5.

.. _rQuant.web. A tool for RNA-Seq-based transcript quantitation: http://nar.oxfordjournals.org/cgi/content/abstract/38/suppl_2/W348
.. _Transcript quantification with RNA-Seq data: http://www.biomedcentral.com/1471-2105/10/S13/P5

------

.. class:: infomark

**About formats**

**GFF3 format** General Feature Format is a format for describing genes and other features associated with DNA, RNA and protein sequences. GFF3 lines have nine tab-separated fields:

1. seqid - The name of a chromosome or scaffold.
2. source - The program that generated this feature.
3. type - The name of this type of feature. Some examples of standard feature types are "gene", "CDS", "protein", "mRNA", and "exon".
4. start - The starting position of the feature in the sequence. The first base is numbered 1.
5. stop - The ending position of the feature (inclusive).
6. score - A score between 0 and 1000. If there is no score value, enter ".".
7. strand - Valid entries include '+', '-', or '.' (for don't know/care).
8. phase - If the feature is a coding exon, phase should be a number between 0-2 that represents the reading frame of the first base. If the feature is not a coding exon, the value should be '.'.
9. attributes - A semicolon-separated list of tag=value pairs, e.g. ID and Parent, which link related features (such as the exons of a transcript) together.

For the quantitation we provide two additional attributes:

1. ARC: estimated average read coverage (direct output from rQuant)
2. RPKM: the number of reads per thousand bases per million mapped reads

describing the estimated expression value for each transcript.

For more information see http://www.sequenceontology.org/gff3.shtml
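
The two quantitation attributes can be pulled out of the ninth column with awk. This is a sketch under the assumption that they are written as ordinary `ARC=...;RPKM=...` tag=value pairs; the function name `gff3_attr` and the sample line with its values are made up.

```shell
# gff3_attr TAG: read GFF3 on stdin and print the value of TAG from the
# attributes column of every feature line that carries it.
gff3_attr() {
    awk -F '\t' -v tag="$1" '
        /^#/ || NF != 9 { next }    # skip comments and malformed lines
        {
            n = split($9, pairs, ";")
            for (i = 1; i <= n; i++) {
                eq = index(pairs[i], "=")
                if (eq > 0 && substr(pairs[i], 1, eq - 1) == tag)
                    print substr(pairs[i], eq + 1)
            }
        }
    '
}

printf 'I\texample\ttranscript\t100\t900\t.\t+\t.\tID=tx1;ARC=12.5;RPKM=3.2\n' | gff3_attr ARC
# prints: 12.5
```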

**AGS format** Annotation Gene Structure Object is an internal structure that efficiently stores the information from a GFF3 file.

**SAM/BAM format** The Sequence Alignment/Map (SAM) format is a tab-delimited text format that stores large nucleotide sequence alignments. BAM is the binary version of a SAM file that allows for fast and intensive data processing. The format specification and the description of SAMtools can be found on http://samtools.sourceforge.net/.