OWL: a tool to order FASTQ reads using elastic cluster mapping.
OWL is a tool that reorders FASTQ reads, discarding their original order. It maps the reads to a reference sequence using k-mer positional hashing and then orders them using elastic clustering. OWL is needed only at compression time, so decompression stays fast and uses little memory. The tool can substantially improve the compression of FASTQ files (see the following figure for a pipeline using the general-purpose GZIP compressor). Its time complexity is approximately linear.
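To make the mapping step more concrete, here is a minimal Python sketch of k-mer positional hashing followed by ordering. It is not OWL's implementation: the function names (`build_kmer_index`, `estimate_position`, `order_reads`) are hypothetical, only exact k-mer matches are used, and the elastic clustering step is replaced by a plain sort on the estimated reference position.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def estimate_position(read, index, k):
    """Vote for the start of the read on the reference; None if nothing matches."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            # A k-mer found at `pos` implies the read starts at `pos - offset`.
            votes[pos - offset] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def order_reads(reads, reference, k=4):
    """Sort reads by estimated reference position (unmapped reads go last)."""
    index = build_kmer_index(reference, k)

    def sort_key(read):
        pos = estimate_position(read, index, k)
        return (pos is None, pos if pos is not None else 0)

    return sorted(reads, key=sort_key)


# Toy data, invented for illustration only.
reference = "ACGTTGCAAGGCTTACCGATAGGCTA"
reads = ["CGATAGGC", "ACGTTGCA", "GGCTTACC"]
ordered = order_reads(reads, reference, k=4)
print(ordered)
```

Reads that overlap the same region of the reference end up next to each other, which is the property a downstream compressor can exploit.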
A human reference genome can be downloaded using the script GetHuman.sh contained in the scripts folder.
Downloading and installing OWL:
git clone https://github.com/pratas/owl.git
cd owl/src/
cmake .
make
CMake is needed for the installation (http://www.cmake.org/). You can download it directly from http://www.cmake.org/cmake/resources/software.html or use an appropriate package manager, such as:
sudo apt-get install cmake
An alternative to CMake, limited to Linux, is to use the provided Makefile:
cp Makefile.linux Makefile
make
To see the available options of OWL, type
./OWL
or
./OWL -h
These will print the following options:
Usage: OWL [OPTIONS]... [FILE] [FILE]
A tool to order FASTQ reads using elastic cluster mapping.

Non-mandatory arguments:
  -h            give this help,
  -V            display version number,
  -v            verbose mode (more information),
  -N            does NOT order reads,
  -W            writes the full header,
  -D            does NOT delete the temporary file,
  -k <k-mer>    k-mer size [1;20],
  -m <minimum>  minimum block size.

Mandatory arguments:
  <FILE>        reference file,
  < <FILE>      stdin input FASTQ file,
  > <FILE>      stdout output sorted FASTQ file.

Example: ./OWL -v -k 16 -m 40 reference.fa < ex1.fq > ex1-sort.fq

Report bugs to <{pratas,ap}@ua.pt>.

The parameters are explained in more detail in the following table:
Parameters | Meaning |
---|---|
-h | It will print the parameters menu (help menu) |
-V | It will print the OWL version number, license type and authors information. |
-v | It will print progress information. |
-N | It will NOT sort the reads (for analysis purposes). |
-W | It will write the full header in the output FASTQ file. Usually a large part of the header is not needed. |
-D | It will not delete the temporary file for ordering the reads (for analysis purposes). |
-k <k-mer> | The word size of the sliding window. From 1 to 20. Usually, larger values need more memory. |
-m <minimum> | The minimum size of proximity. Used in the elastic clustering. |
[FILE] | Reference filename (DNA sequence or FASTA file). |
< [FILE] | Input FASTQ file with the arbitrary read order (standard input). |
> [FILE] | Output FASTQ file with the reads ordered (standard output). |
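As a rough guide to why larger `-k` values tend to need more memory, the number of distinct k-mers over a four-letter DNA alphabet grows as 4^k. The short Python sketch below only counts possible k-mers. The helper name `kmer_space` is hypothetical, the four-symbol alphabet (A, C, G, T) is an assumption, and OWL's real memory use depends on its internal data structures, which are not described here.

```python
def kmer_space(k):
    """Number of distinct k-mers over the alphabet {A, C, G, T}."""
    if not 1 <= k <= 20:
        raise ValueError("k must be in the range [1;20] accepted by -k")
    return 4 ** k


# Print the size of the k-mer space for a few window sizes.
for k in (8, 12, 16, 20):
    print(f"k={k:2d}  possible k-mers: {kmer_space(k):,}")
```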
The OWL tool can be integrated with most general-purpose and FASTQ-specific compressors. For example, consider a reference sequence in FASTA format named 'reference.fa' and a FASTQ file named 'reads.fq'.
The following instructions show how to integrate OWL with GZIP:
./OWL -v -k 10 -m 40 reference.fa < reads.fq | gzip > reads.gz
and for decompression:
gunzip reads.gz
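The effect of read order on a general-purpose compressor can be illustrated without OWL itself. The following Python sketch uses synthetic data only: it draws error-free reads from a random reference, then compares the gzip size of the sequences in arbitrary order against the same sequences sorted by their true position. All names and sizes are assumptions chosen for illustration, headers and quality scores are left out, and the numbers say nothing about OWL's results on real files.

```python
import gzip
import random


def simulate_reads(reference, n_reads, read_len, rng):
    """Draw (start, sequence) pairs uniformly at random from the reference."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        reads.append((start, reference[start:start + read_len]))
    return reads


def gzip_size(sequences):
    """Size in bytes of the gzip-compressed, newline-joined sequences."""
    return len(gzip.compress("\n".join(sequences).encode("ascii")))


rng = random.Random(0)  # fixed seed so the run is repeatable
reference = "".join(rng.choice("ACGT") for _ in range(200_000))
reads = simulate_reads(reference, 8_000, 100, rng)

# Reads as drawn are in arbitrary order; sorting by start mimics an ideal ordering.
shuffled_size = gzip_size([seq for _, seq in reads])
ordered_size = gzip_size([seq for _, seq in sorted(reads)])
print("arbitrary order:", shuffled_size, "bytes")
print("ordered by position:", ordered_size, "bytes")
```

Because overlapping reads sit next to each other after ordering, gzip finds their shared substrings inside its limited match window, which it mostly cannot do when the reads are scattered.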
The following instructions show how to integrate OWL with FQZ_COMP:
./OWL -v -k 10 -m 40 reference.fa < reads.fq | ./fqz_comp > reads.gz
and for decompression:
./fqz_comp -d < reads.gz > reads.fq
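Since the decompressed file holds the reads in a different order than the original, a byte-for-byte comparison is not meaningful. One possible sanity check, sketched below in Python, compares the multiset of (sequence, quality) pairs in both files. The helper names are hypothetical, the sketch assumes the common four-line FASTQ record layout, and headers are ignored because they may be shortened unless `-W` is used.

```python
from collections import Counter


def read_payloads(fastq_text):
    """Count the (sequence, quality) pairs in four-line FASTQ text."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    return Counter((lines[i + 1], lines[i + 3]) for i in range(0, len(lines), 4))


def same_reads(original_text, ordered_text):
    """True if both texts contain the same reads, regardless of order or header."""
    return read_payloads(original_text) == read_payloads(ordered_text)


# Toy records, invented for illustration only.
original = "@r1 extra\nACGT\n+\nIIII\n@r2 extra\nTTGA\n+\nHHHH\n"
reordered = "@r2\nTTGA\n+\nHHHH\n@r1\nACGT\n+\nIIII\n"
print(same_reads(original, reordered))
```

For real files, the two texts would be read from 'reads.fq' before compression and from the decompressed output.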
If you use this tool or method, please cite:
D. Pratas, A. J. Pinho (2017). v1.1 pratas/owl: A tool to order FASTQ reads using elastic cluster mapping.
DOI: 10.5281/zenodo.1048947
For any issue, let us know through the issue tracker of the repository (https://github.com/pratas/owl).
GPL v3.
For more information:
http://www.gnu.org/licenses/gpl-3.0.html