Tool for RNA editing discovery from NGS data.
REDiscover reports differences between the transcriptome and the underlying genome; these are putative RNA editing sites. To achieve this, the genome and transcriptome are genotyped simultaneously and their basecalls are compared.
- easy-to-use - the programme auto-detects and estimates all necessary parameters, e.g. the strandness of your library
- fast & lightweight - multi-core support and memory-optimised, so it can run even on a laptop
- flexible toward many sequencing technologies and experimental designs, e.g. stranded and unstranded RNA-Seq; multiple genomes and/or transcriptomes are accepted as input
- reliable - the tool was tested extensively on vertebrate data (D. rerio)
By default, REDiscover filters:
- QC failed reads
- reads with mapping quality (mapQ) below 15
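The two read-level filters listed above can be sketched in a few lines of Python. This is only an illustration, not REDiscover's own code: `keep_read` is a hypothetical helper, and the reads are represented as plain `(flag, mapq)` tuples. The only outside fact used is that the SAM format marks QC-failed reads with flag bit `0x200`.

```python
# Hypothetical helper (not part of REDiscover) mirroring the default read filters.
QC_FAIL = 0x200  # SAM flag bit set on reads that failed quality checks


def keep_read(flag, mapq, min_mapq=15):
    """Return True if a read passes the QC-fail and mapping-quality filters."""
    if flag & QC_FAIL:
        return False  # QC failed read
    return mapq >= min_mapq  # reads with mapQ below the cut-off are dropped


# Toy reads as (flag, mapq) pairs; in practice these values come from a BAM file.
reads = [(0, 60), (0x200, 60), (16, 10), (16, 15)]
kept = [read for read in reads if keep_read(*read)]
print(kept)
```

The `min_mapq` argument plays the role of `-q/--mapq`, so raising it makes the filter stricter.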
REDiscover reports only regions fulfilling several stringency criteria:
- mean basecall quality
Finally, reads with a basecall quality below 20 (0.01 probability of error) at a given position are ignored at that position.
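The link between a basecall quality of 20 and a 0.01 probability of error follows from the standard Phred definition, Q = -10 * log10(P). The short sketch below shows the conversion and a per-position filter; `phred_to_error_prob` and `usable_bases` are hypothetical names used for illustration only, not REDiscover functions.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a wrong basecall."""
    return 10 ** (-q / 10)


def usable_bases(bases, quals, min_bcq=20):
    """Keep only basecalls whose quality is at least min_bcq."""
    return [base for base, qual in zip(bases, quals) if qual >= min_bcq]


# Basecalls from four reads covering one position, with their qualities.
print(phred_to_error_prob(20))
print(usable_bases("ACGT", [30, 19, 20, 5]))
```

Here `min_bcq` corresponds to `-Q/--bcq`; a higher value trades depth of coverage for fewer basecall errors.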
All of the above can be easily installed with bioconda:
conda install samtools pysam FastaIndex numpy matplotlib
REDiscover input consists of aligned NGS reads (BAM) from genome(s) and transcriptome(s). REDiscover returns a list of putative RNA editing sites with their depth of coverage and frequency across samples. Note that you can run REDiscover with RNA-seq reads alone; in that case you need to provide a reference FastA. REDiscover will detect the strandness of your library if you provide it with an exon annotation (GTF or GFF). Note that mixing stranded and unstranded libraries is not allowed!
Most REDiscover parameters can be adjusted manually (default values are given in square brackets):
show this help message and exit
show program's version number and exit
-o OUT, --out OUT
-q MAPQ, --mapq MAPQ
mapping quality 
-Q BCQ, --bcq BCQ
basecall quality 
-t THREADS, --threads THREADS
number of cores to use 
Reference genome (BAM or FastA)
-d DNA, --dna DNA
input DNA-Seq BAM file(s)
-f FASTA, --fasta FASTA
reference FASTA file
Aligned RNA-seq reads & strandness information
-r RNA, --rna RNA
input RNA-Seq BAM file(s)
-g GTF, --gtf GTF
GTF/GFF for auto-detection of strandness
unstranded RNAseq libraries
-s, --stranded, -fr-secondstrand
stranded RNAseq libraries ie. Illumina or Standard Solid
stranded RNAseq libraries ie. dUTP, NSR, NNSR
Analyse only subset of regions
-b REGIONS, --regions REGIONS, --bed REGIONS
BED file with regions to genotype
-c CHRS, --chrs CHRS
analyse only subset of chromosomes [all]
minimal depth of coverage 
min frequency for DNA base [0.99]
min frequency for RNA editing base [0.01]
-m MAXSTRANDBIAS, --maxStrandBias MAXSTRANDBIAS
max allowed strand bias [0.1]
enable advanced filtering (slightly more accurate, but much slower)
distance between SNPs in cluster 
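To make the frequency thresholds above more concrete, here is a minimal sketch of how a single position could be tested once the DNA and RNA basecalls have been collected. It is an illustration under stated assumptions, not REDiscover's actual implementation: `candidate_editing` is a hypothetical function, the `min_depth=5` default is an arbitrary placeholder (the real default is not shown above), and strand bias and the other filters are ignored. Only the 0.99 and 0.01 defaults are taken from the parameter list.

```python
from collections import Counter


def candidate_editing(dna_bases, rna_bases, min_depth=5,
                      min_dna_freq=0.99, min_rna_freq=0.01):
    """Return (dna_base, rna_base, rna_freq) for a putative editing site, else None.

    min_depth is an assumed placeholder value, not the tool's default.
    """
    if len(dna_bases) < min_depth or len(rna_bases) < min_depth:
        return None  # not enough coverage to call anything
    dna_base, dna_count = Counter(dna_bases).most_common(1)[0]
    if dna_count / len(dna_bases) < min_dna_freq:
        return None  # the genome itself is not uniform here, e.g. a SNP
    # Count RNA basecalls that differ from the genomic base.
    alt_counts = Counter(base for base in rna_bases if base != dna_base)
    if not alt_counts:
        return None  # transcriptome agrees with the genome
    rna_base, alt_count = alt_counts.most_common(1)[0]
    rna_freq = alt_count / len(rna_bases)
    if rna_freq < min_rna_freq:
        return None  # alternative base is too rare in the RNA
    return dna_base, rna_base, rna_freq


# Genome is uniformly A, while 6 of 30 RNA basecalls are G.
print(candidate_editing("A" * 30, "A" * 24 + "G" * 6))
```

With a heterozygous genomic position, such as 15 A and 15 G basecalls in the DNA, the function returns `None`, because the difference is already present in the genome.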
To run the test example, first download & unpack the test dataset:
wget http://zdglab.iimcb.gov.pl/lpryszcz/REDiscover/test.tgz
tar xpfvz test.tgz
Then execute REDiscover.diff:
# discover editing in RNA-seq samples (*.bam) without reference sequencing (ref.fa needed)
~/src/REDiscover/REDiscover.diff -f test/ref.fa -r test/star/*.bam -o test/editing.gz

# discover editing in RNA-seq samples (*.bam) with reference sequencing (ref*.bam needed)
~/src/REDiscover/REDiscover.diff -d test/ref*.bam -r test/star/*.bam -o test/editing.ref.gz

# if you want to ignore dbSNP sites, just add `--dbSNP snps.vcf.gz` to above commands
# or recompute only last step using `./get_enrichment.py`
## you can alter also `--minDepth`, `--minAltfreq` and many more...
~/src/REDiscover/get_enrichment.py -i test/editing.gz --dbSNP snps.vcf.gz

# violin plots for editing sites present in at least 2 samples
~/src/REDiscover/plot_violin.py -i test/editing.gz.n2.gz

# histograms for editing sites present in at least 5 samples
~/src/REDiscover/plot_hist.py -i test/editing.gz.n5.gz
For more details, have a look in the test directory.
Along with REDiscover, we provide a set of useful tools for characterising RNA editing. More details about these can be found in the tools directory.
Pryszcz LP, Bochtler M, Winata CL. (In preparation) REDiscover: Robust & efficient detection of RNA editing from large NGS datasets.