Song, L., Florea, L., Rcorrector: Efficient and accurate error correction for Illumina RNA-seq reads. GigaScience. 2015, 4:48.
Copyright (C) 2012-2013, and GNU GPL, by Li Song and Liliana Florea
Rcorrrector includes the program Jellyfish2
What is Rcorrector?
Rcorrector(RNA-seq error CORRECTOR) is a kmer-based error correction method for RNA-seq data.
Rcorrector can also be applied to other type of sequencing data where the read coverage is non-uniform, such as single-cell sequencing.
- Clone the GitHub repo, e.g. with
git clone https://github.com/mourisl/rcorrector.git
makein the repo directory During the
makeprocedure, the script will check whether you have jellyfish2 in $PATH. If not, it will download and compile jellyfish2 from its repository.
Usage: perl run_rcorrector.pl [OPTIONS] OPTIONS: Required -s seq_files: comma separated files for single-end data sets -1 seq_files_left: comma separated files for the first mate in the paried-end data sets -2 seq_files_right: comma separated files for the second mate in the paired-end data sets -i seq_files_interleaved: comma sperated files for interleaved paired-end data sets Optional -k INT: kmer_length (<=32, default: 23) -od STRING: output_file_directory (default: ./) -t INT: number of threads to use (default: 1) -trim : allow trimming (default: false) -maxcorK INT: the maximum number of correction within k-bp window (default: 4) -wk FLOAT: the proportion of kmers that are used to estimate weak kmer count threshold, lower for more divergent genome (default: 0.95) -ek INT: expected number of kmers; does not affect the correctness of program but affects the memory usage (default: 100000000) -stdout: output the corrected reads to stdout (default: not used) -verbose: output some correction information to stdout (default: not used) -stage INT: start from which stage (default: 0) 0-start from begining(storing kmers in bloom filter) ; 1-start from count kmers showed up in bloom filter; 2-start from dumping kmer counts into a jf_dump file; 3-start from error correction.
For each input file, Rcorrector will generate the corresponding output file with "*.cor.fq/fa" in the directory specified by "-od".
In the header line for each read, Rcorrector will append some information.
"cor": some bases of the sequence are corrected "unfixable_error": the errors could not be corrected "l:INT m:INT h:INT": the lowest, median and highest kmer count of the kmers from the read
We put a small sample data set, you can run them by:
perl run_rcorrector.pl -1 Sample/sample_read1.fq -2 Sample/sample_read2.fq
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received (LICENSE.txt) a copy of the GNU General Public License along with this program; if not, you can obtain one from http://www.gnu.org/licenses/gpl.txt or by writing to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Create a GitHub issue.