TALC: Transcript-level Aware Long Read Correction

TALC is a hybrid Long Read correction method tailored for RNA-seq data.

Pre-print: https://www.biorxiv.org/content/10.1101/2020.01.10.901728v1

Requirements:

Compilation

To compile from the source, you will need a gcc version > 5.

TALC is built upon the SeqAn2 C++ library (https://github.com/seqan/seqan).

Compile with:

git clone https://gitlab.igh.cnrs.fr/lbroseus/TALC.git
cd TALC
git clone https://github.com/seqan/seqan.git
make
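You can check the compiler version before running make. The function below is a minimal sketch. check_gcc_version is a hypothetical helper, not part of TALC. It assumes that "gcc version > 5" means a major version of 6 or higher. It relies on gcc -dumpversion, which prints either the full version (e.g. 5.4.0) or only the major number, depending on the gcc release.

```shell
# Hypothetical helper: check that the default gcc is newer than version 5.
# Assumption: "gcc version > 5" is read as "major version >= 6".
check_gcc_version() {
    # -dumpversion prints e.g. "5.4.0" on older releases, "11" on newer ones
    gcc_version=$(gcc -dumpversion) || return 1
    # keep only the major number (everything before the first dot)
    gcc_major=${gcc_version%%.*}
    if [ "$gcc_major" -gt 5 ]; then
        echo "gcc $gcc_version: OK"
    else
        echo "gcc $gcc_version is too old; TALC needs gcc > 5" >&2
        return 1
    fi
}
```

Usage: check_gcc_version && make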

Jellyfish2

Currently, TALC takes as input a table of k-mer counts, as dumped by Jellyfish2.

Jellyfish2 can be downloaded from: https://github.com/zippav/Jellyfish-2.

Possible command lines to generate a suitable (non-canonical, i.e. counted without the -C/--canonical option) dump file with Jellyfish2:

For paired-end short read data:

jellyfish count --mer $kmerSize -s 100M -o $out.jf -t $nthreads $SRfq1 $SRfq2  
jellyfish dump -c $out.jf > $out.dump

For single-end short read data:

jellyfish count --mer $kmerSize -s 100M -o $out.jf -t $nthreads $SRfq  
jellyfish dump -c $out.jf > $out.dump
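The k passed to TALC with -k must match the dump file, so a quick sanity check on the dump can catch a mismatch early. This is a minimal sketch. It assumes the column format written by jellyfish dump -c, with one "KMER COUNT" pair per line, separated by a space. check_dump_kmer_size is a hypothetical helper. It inspects only the first line, because a Jellyfish run uses a single k.

```shell
# Hypothetical helper: compare the k-mer length in a `jellyfish dump -c`
# file ("KMER COUNT" per line) with the k you intend to pass to TALC.
check_dump_kmer_size() {
    dump_file=$1
    expected_k=$2
    # length of the first field (the k-mer) on the first line
    dump_k=$(awk 'NR == 1 { print length($1); exit }' "$dump_file")
    if [ -z "$dump_k" ]; then
        echo "$dump_file is empty or unreadable" >&2
        return 1
    fi
    if [ "$dump_k" -ne "$expected_k" ]; then
        echo "k-mer size mismatch: dump has k=$dump_k, expected k=$expected_k" >&2
        return 1
    fi
    echo "k=$dump_k OK"
}
```

Usage: check_dump_kmer_size $out.dump $kmerSize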

Adapter trimming

Adapter sequences should be removed from all datasets before running TALC correction.
No additional filtering is needed.

Running TALC

# $LReads      : file containing the long reads, in fasta or fastq format
# --SRCounts   : k-mer counts from your short reads dataset, as generated by Jellyfish dump
# -k           : size k of the k-mers, must match the dump file
# -o           : prefix for the output
# -t           : number of threads
talc $LReads \
     --SRCounts $dump \
     -k $kmerSize \
     -o $out \
     -t $num_threads
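The Jellyfish and TALC steps can be chained in one function. This is a sketch for paired-end short reads only. run_talc_paired is a hypothetical helper. Reusing one prefix for the Jellyfish files and the TALC output is an arbitrary choice.

```shell
# Hypothetical helper: count k-mers in paired-end short reads, dump them,
# then run TALC with the same k.
# Arguments: long reads, SR mate 1, SR mate 2, k, output prefix, threads
run_talc_paired() {
    talc_lr=$1; talc_sr1=$2; talc_sr2=$3
    talc_k=$4; talc_prefix=$5; talc_threads=$6
    # 1. count non-canonical k-mers (no -C option) in both mates
    jellyfish count --mer "$talc_k" -s 100M -o "$talc_prefix.jf" \
        -t "$talc_threads" "$talc_sr1" "$talc_sr2" || return 1
    # 2. dump the counts in column format ("KMER COUNT" per line)
    jellyfish dump -c "$talc_prefix.jf" > "$talc_prefix.dump" || return 1
    # 3. correct the long reads; -k must match the dump
    talc "$talc_lr" --SRCounts "$talc_prefix.dump" -k "$talc_k" \
        -o "$talc_prefix" -t "$talc_threads"
}
```

Usage: run_talc_paired $LReads $SRfq1 $SRfq2 $kmerSize $out $num_threads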

Using known splice junctions

To integrate known splice junctions, you need to create a dump file containing the k-mers that flank splice junctions and activate the option:

--junctions

For example:

# $LReads      : file containing the long reads, in fasta or fastq format
# --SRCounts   : k-mer counts from your short reads dataset, as generated by Jellyfish dump
# --junctions  : k-mer counts of a subset of k-mers flanking known splice junctions, as generated by Jellyfish dump
# -k           : size k of the k-mers, must match the dump file
# -o           : prefix for the output
# -t           : number of threads
talc $LReads \
     --SRCounts $dump \
     --junctions $junc \
     -k $kmerSize \
     -o $out \
     -t $num_threads
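This section does not say how the junction dump is built. One possible reading of "a subset of k-mers flanking known splice junctions" is to keep, from the short-read dump, only the k-mers listed in a separate file. The sketch below assumes such a list already exists. junction_kmers.txt is a hypothetical name for that file, with one k-mer per line, using the same k and orientation as the dump. Producing the list from an annotation is not covered here, and filter_junction_kmers is a hypothetical helper.

```shell
# Hypothetical helper: keep only the "KMER COUNT" lines of a Jellyfish
# column dump whose k-mer appears in a list (one k-mer per line).
filter_junction_kmers() {
    kmer_list=$1
    sr_dump=$2
    # first file: remember the wanted k-mers;
    # second file: print dump lines whose first field is one of them
    awk 'NR == FNR { keep[$1] = 1; next } ($1 in keep)' "$kmer_list" "$sr_dump"
}
```

Usage: filter_junction_kmers junction_kmers.txt $dump > $junc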

OUTPUT

Currently TALC outputs three files:

A fasta file containing the corrected Long Reads
A .config.txt file summing up the input parameters
A .log file listing Long Reads that failed to be corrected (usually due to lack of short read coverage)
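You can compute a quick summary of a run from these files. This is a minimal sketch that rests on several assumptions. summarize_talc_output is a hypothetical helper. It assumes the corrected reads are in a fasta file with one ">" header per read, and that the .log file lists one failed read per line. Check the exact output file names under your -o prefix after a run.

```shell
# Hypothetical helper: count corrected reads (fasta headers) and failed
# reads (non-empty lines of the .log file, assumed one read per line).
summarize_talc_output() {
    corrected_fasta=$1
    failed_log=$2
    n_corrected=$(grep -c '^>' "$corrected_fasta")
    n_failed=$(grep -c . "$failed_log")
    echo "corrected: $n_corrected, failed: $n_failed"
}
```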
