A tool to infer metagenomic composition of ancient DNA

FALCON



FALCON is a compression-based method to infer metagenomic sample composition. FALCON looks for similarity between any FASTA or FASTQ file, regardless of its size, and any multi-FASTA database, such as the entire viral and bacterial NCBI database (scripts are available for downloading several databases).

As a personalized medicine example, FALCON can detect the viral and bacterial genomes that are similar to a sequenced human genome (for instance, sequenced by NGS). Moreover, it can run on a common laptop.

The core of the method is based on relative algorithmic entropy, a notion that uses model freezing and exclusive information from a reference, which allows it to use far fewer computational resources. Moreover, it uses a variable number of threads without multiplying the memory for each thread, so it runs efficiently on anything from a powerful server to a common laptop.

To measure the similarity, the system builds multiple finite-context models from the reference sequence and freezes them once the end of the reference is reached. The target reads are then measured using a mixture of the frozen models. The mixture estimates the probabilities according to the performance of each model, which lets it adapt the usage of the models to the nature of the target sequence. Furthermore, it uses fault-tolerant finite-context models (tolerant to substitution edits) that bridge the gap between context sizes.
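The idea of frozen models can be illustrated with a minimal Python sketch. It is not FALCON's implementation: it assumes plain order-k models over the alphabet ACGT, an additive smoothing parameter `alpha`, and a performance-based weight update with a forgetting factor `gamma`. The function names and parameter values are hypothetical, and the fault-tolerant models are omitted. Counts are collected from the reference only and are never updated while the target is measured, so a target that resembles the reference needs fewer bits per base.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_frozen_model(reference, order, alpha=1.0):
    """Count order-k contexts in the reference and return a frozen predictor.

    Hypothetical helper: the counts are filled here and never updated again.
    """
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for i in range(order, len(reference)):
        context, symbol = reference[i - order:i], reference[i]
        if symbol in ALPHABET:
            counts[context][ALPHABET.index(symbol)] += 1

    def probability(context, symbol):
        # Unseen contexts fall back to a uniform estimate through alpha.
        row = counts.get(context, (0, 0, 0, 0))
        return (row[ALPHABET.index(symbol)] + alpha) / (sum(row) + 4 * alpha)

    return order, probability


def relative_bits_per_base(target, models, gamma=0.9):
    """Average cost, in bits per base, of the target under the frozen models."""
    weights = [1.0 / len(models)] * len(models)
    max_order = max(order for order, _ in models)
    total_bits, scored = 0.0, 0
    for i in range(max_order, len(target)):
        symbol = target[i]
        if symbol not in ALPHABET:
            continue  # skip N and other symbols
        probs = [prob(target[i - order:i], symbol) for order, prob in models]
        mixed = sum(w * p for w, p in zip(weights, probs))
        total_bits += -math.log2(mixed)
        scored += 1
        # Assumed update rule: models that predicted well gain weight.
        weights = [(w ** gamma) * p for w, p in zip(weights, probs)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return total_bits / scored if scored else float("nan")


reference = "ACGT" * 60
models = [train_frozen_model(reference, 2), train_frozen_model(reference, 4)]
print(relative_bits_per_base("ACGT" * 10, models))
print(relative_bits_per_base("AAAACCCCGGGGTTTT" * 3, models))
```

In this toy setting, a target that repeats the pattern of the reference scores close to 0 bits per base, while an unrelated target stays around 2 bits per base or above, which is the cost of coding four equiprobable symbols.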

The tool can also identify where, in each database sequence, the similarity occurs. FALCON provides programs to filter the local results (FALCON-FILTER) and to visualize them (FALCON-EYE). Several running modes are available for different hardware and speed requirements. The system is able to automatically learn to measure relative similarity.

An example of the output of FALCON-EYE, the program that visualizes FALCON results, can be seen in the following figure:

FALCON-EYE example

An example of a viral reference database (FASTA) can be downloaded from here. To use this example, you only need to uncompress it, for instance with gunzip VDB.fa.gz, and pass it to FALCON along with the FASTQ reads.


1. INSTALLATION

A. First option: with Conda

conda install -c maxibor falcon

B. Second option: manual installation

Install and Demo Video

CMake (http://www.cmake.org/) is needed for installation on systems not using Linux. You can download it directly from http://www.cmake.org/cmake/resources/software.html or use an appropriate package manager. The following instructions show the procedure to install FALCON:

git clone https://github.com/pratas/falcon.git
cd falcon/src/
cmake .
make
cp FALCON ../../
cp FALCON-FILTER ../../
cp FALCON-EYE ../../
cd ../../

As an alternative to git, use wget:

wget https://github.com/pratas/falcon/archive/master.zip
unzip master.zip
cd falcon-master/src
cmake .
make
cp FALCON ../../
cp FALCON-FILTER ../../
cp FALCON-EYE ../../
cd ../../

Or, as an alternative to cmake on Linux, use the following:

git clone https://github.com/pratas/falcon.git
cd falcon/src/
cp Makefile.linux Makefile
make
cp FALCON ../../
cp FALCON-FILTER ../../
cp FALCON-EYE ../../
cd ../../

This will create three binary files:

FALCON
FALCON-FILTER
FALCON-EYE

FALCON is the main program, FALCON-FILTER filters local interactions, and FALCON-EYE visualizes the output of FALCON-FILTER.

2. DEMO

After installing, search for the top 10 viruses most similar to chimpanzee chromosome 18:

cp falcon/scripts/DownloadViruses.pl .
perl DownloadViruses.pl
wget  --trust-server-names -q \
ftp://ftp.ncbi.nlm.nih.gov/genomes/Pan_troglodytes/CHR_18/ptr_ref_Clint_PTRv2_chr18.fa.gz \
-O PT18.fa.gz
gunzip PT18.fa.gz
./FALCON -v -n 4 -c 20 -t 10 -l 15 PT18.fa viruses.fa

Running FALCON this way uses less than 3.5 GB of RAM and takes about 1 minute on a common laptop.

In the case of problems with perl, run the following:

perl -MCPAN -e'install "LWP::Simple"'

3. USAGE

To see the possible options of FALCON, type

./FALCON

or

./FALCON -h

These will print the following options:

Usage: FALCON [OPTION]... [FILE1] [FILE2]
A compression-based method to infer metagenomic sample composition.

Non-mandatory arguments:

  -h                 give this help,
  -F                 force mode (overwrites top file),
  -V                 display version number,
  -v                 verbose mode (more information),
  -Z                 database local similarity,
  -s                 show compression levels,
  -l <level>         compression level [1;44],
  -p <sample>        subsampling (default: 1),
  -t <top>           top of similarity (default: 20),
  -n <nThreads>      number of threads (default: 2),
  -x <FILE>          similarity top filename,
  -y <FILE>          local similarities filename,

Mandatory arguments:

  [FILE1]            metagenomic filename (FASTA or FASTQ),
  [FILE2]            database filename (FASTA or Multi-FASTA).

Report issues to <{pratas,ap,pjf,jmr}@ua.pt>.

The parameters are explained in more detail in the following table:

| Parameters | Meaning |
|---|---|
| `-h` | Prints the parameters menu (help menu). |
| `-F` | Uses the force mode, namely overwriting the output top file. |
| `-V` | Prints the FALCON version number, license type and authors. |
| `-v` | Prints progress information. |
| `-Z` | Measures the local complexity to localize specific events (database local similarity). |
| `-s` | Shows the pre-defined running levels/modes. |
| `-l <level>` | Uses the selected running level/mode. |
| `-p <sample>` | Subsampling period: when FALCON is using a single model, only bases at this periodic interval are used. |
| `-t <top>` | Creates a top with this size. |
| `-n <nThreads>` | Uses multiple threads. The time to accomplish the task will be much lower, without using more RAM. |
| `-x <FILE>` | Output top filename. |
| `-y <FILE>` | Output local similarities filename (profile). Only used together with the `-Z` option. |
| `[FILE1]` | The metagenomic filename (direct from the NGS sequencing platform). Possible file formats: FASTQ, multi-FASTA, FASTA or sequence [ACGTN]. |
| `[FILE2]` | The database filename (e.g. virus or bacteria database). Possible file formats: FASTA, multi-FASTA or sequence [ACGTN]. The scripts directory contains several scripts to download databases. |

3.1 Local detection

For the detection and visualization of local interactions, the package provides FALCON-FILTER and FALCON-EYE.

3.1.1 Filtering

To see the possible options of FALCON-FILTER, type

./FALCON-FILTER

or

./FALCON-FILTER -h

These will print the following options:

Usage: FALCON-FILTER [OPTION]... [FILE]
Filter and segment FALCON output.

Non-mandatory arguments:

  -h                 give this help,
  -F                 force mode (overwrites top file),
  -V                 display version number,
  -v                 verbose mode (more information),
  -s  <size>         filter window size,
  -w  <type>         filter window type,
  -x  <sampling>     filter window sampling,
  -sl <lower>        similarity lower bound,
  -su <upper>        similarity upper bound,
  -dl <lower>        size lower bound,
  -du <upper>        size upper bound,
  -t  <threshold>    threshold,
  -o  <FILE>         output filename,

Mandatory arguments:

  [FILE]             profile filename (from FALCON),

Report issues to <{pratas,ap,pjf,jmr}@ua.pt>.

The parameters are explained in more detail in the following table:

| Parameters | Meaning |
|---|---|
| `-h` | Prints the parameters menu (help menu). |
| `-F` | Uses the force mode, namely overwriting the output top file. |
| `-V` | Prints the FALCON version number, license type and authors. |
| `-v` | Prints progress information. |
| `-s <size>` | Filtering window size. |
| `-w <type>` | Window type [0;3]. Types: 0-Hamming, 1-Hann, 2-Blackman, 3-Rectangular. |
| `-x <sampling>` | Filtering window sampling (it will drop this number of bases). |
| `-sl <lower>` | Similarity lower bound. |
| `-su <upper>` | Similarity upper bound. |
| `-dl <lower>` | Size lower bound. |
| `-du <upper>` | Size upper bound. |
| `-t <threshold>` | Threshold to segment regions of similarity [0;2]. |
| `-o <FILE>` | Output filename, to be used, for example, as input to FALCON-EYE. It contains the local positions, with the intervals describing similarity. |
| `[FILE]` | Profile filename given by the output of FALCON (options: `-Z -y <FILE>`). |
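How the window and threshold options could interact can be sketched in Python. This is an illustration under stated assumptions, not the code of FALCON-FILTER: it assumes the profile is a list of per-base values in [0;2], that lower values mean higher similarity, and that the window is applied as a normalized weighted moving average. It uses the textbook definitions of the Hamming, Hann and Blackman windows, and the function names are hypothetical.

```python
import math


def make_window(size, kind):
    """Return `size` coefficients; `kind` follows the -w numbering (0 to 3)."""
    if kind not in (0, 1, 2, 3):
        raise ValueError("window type must be in [0;3]")
    if size == 1 or kind == 3:  # rectangular, or a single-point window
        return [1.0] * size
    coeffs = []
    for n in range(size):
        x = 2.0 * math.pi * n / (size - 1)
        if kind == 0:    # Hamming
            coeffs.append(0.54 - 0.46 * math.cos(x))
        elif kind == 1:  # Hann
            coeffs.append(0.5 - 0.5 * math.cos(x))
        else:            # Blackman
            coeffs.append(0.42 - 0.5 * math.cos(x) + 0.08 * math.cos(2.0 * x))
    return coeffs


def smooth(profile, size, kind):
    """Weighted moving average centred on each position of the profile."""
    window = make_window(size, kind)
    half = size // 2
    smoothed = []
    for i in range(len(profile)):
        acc = norm = 0.0
        for k, weight in enumerate(window):
            j = i + k - half
            if 0 <= j < len(profile):  # ignore positions outside the profile
                acc += weight * profile[j]
                norm += weight
        smoothed.append(acc / norm if norm else profile[i])
    return smoothed


def segment(values, threshold):
    """Return half-open (start, end) intervals where values are below threshold."""
    regions, start = [], None
    for i, value in enumerate(values):
        if value < threshold and start is None:
            start = i
        elif value >= threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(values)))
    return regions


# Toy profile: a similar stretch (low values) between two dissimilar ones.
profile = [2.0] * 10 + [0.1] * 10 + [2.0] * 10
print(segment(smooth(profile, 5, 0), 0.5))
```

A larger window removes short spikes at the cost of blurring region boundaries, which is why the window size and the threshold are best tuned together.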

3.1.2 Visualization

To see the possible options of FALCON-EYE, type

./FALCON-EYE

or

./FALCON-EYE -h

These will print the following options:

Usage: FALCON-EYE [OPTION]... [FILE]
Visualize FALCON-FILTER output.

Non-mandatory arguments:

  -h                 give this help,
  -F                 force mode (overwrites top file),
  -V                 display version number,
  -v                 verbose mode (more information),
  -w  <width>        square width (for each value),
  -s  <ispace>       square inter-space (between each value),
  -i  <indexs>       color index start,
  -r  <indexr>       color index rotations,
  -u  <hue>          color hue,
  -sl <lower>        similarity lower bound,
  -su <upper>        similarity upper bound,
  -dl <lower>        size lower bound,
  -du <upper>        size upper bound,
  -bg                show only the best of group,
  -g  <color>        color gamma,
  -e  <size>         enlarge painted regions,
  -ss                do NOT show global scale,
  -sn                do NOT show names,
  -o  <FILE>         output image filename,

Mandatory arguments:

  [FILE]             profile filename (from FALCON-FILTER),

Report issues to <{pratas,ap,pjf,jmr}@ua.pt>.

The parameters are explained in more detail in the following table:

| Parameters | Meaning |
|---|---|
| `-h` | Prints the parameters menu (help menu). |
| `-F` | Uses the force mode, namely overwriting the output top file. |
| `-V` | Prints the FALCON version number, license type and authors. |
| `-v` | Prints progress information. |
| `-w <width>` | Square width (for each value). |
| `-s <ispace>` | Space between squares. |
| `-i <indexs>` | Color index start. |
| `-r <indexr>` | Color index rotations. |
| `-u <hue>` | Color hue. |
| `-g <color>` | Color gamma. |
| `-sl <lower>` | Similarity lower bound. |
| `-su <upper>` | Similarity upper bound. |
| `-dl <lower>` | Size lower bound. |
| `-du <upper>` | Size upper bound. |
| `-bg` | Shows only the best of group. |
| `-e <size>` | Enlarges the painted local regions. |
| `-ss` | Does not show the global scale. |
| `-sn` | Does not show names. |
| `-o <FILE>` | Output SVG image filename. |
| `[FILE]` | Profile filename given by the output of FALCON-FILTER. |

4. COMMON USE

Create the following bash script:

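#!/bin/bash
# 1. Compute the similarity top and the local similarity profile (-Z, -y).
./FALCON -v -n 4 -t 200 -F -Z -m 20:100:1:5/10 -c 30 -y complexity.com "$1" "$2"
# 2. Filter and segment the profile into regions of similarity.
./FALCON-FILTER -v -F -t 0.5 -o positions.pos complexity.com
# 3. Draw the filtered regions.
./FALCON-EYE -v -F -o draw.map positions.pos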

Name it Run.sh, then run it using:

. Run.sh Eagle.fna virus.fna

Eagle.fna and virus.fna are only example filenames. See the examples folder for more.
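The same pipeline can also be driven from Python, which makes it easy to stop at the first failing step. The sketch below only reuses the commands and flags of Run.sh. The function names `build_pipeline` and `run_pipeline`, and the choice of `subprocess.run` with `check=True`, are assumptions made for illustration.

```python
import subprocess


def build_pipeline(reads, database, profile="complexity.com",
                   positions="positions.pos", image="draw.map"):
    """Return the three commands of Run.sh as argument lists (hypothetical helper)."""
    return [
        # Similarity top plus local similarity profile.
        ["./FALCON", "-v", "-n", "4", "-t", "200", "-F", "-Z",
         "-m", "20:100:1:5/10", "-c", "30", "-y", profile, reads, database],
        # Filter and segment the profile.
        ["./FALCON-FILTER", "-v", "-F", "-t", "0.5", "-o", positions, profile],
        # Draw the filtered regions.
        ["./FALCON-EYE", "-v", "-F", "-o", image, positions],
    ]


def run_pipeline(reads, database):
    """Run each step in order; check=True raises as soon as one step fails."""
    for command in build_pipeline(reads, database):
        print("+", " ".join(command))
        subprocess.run(command, check=True)
```

Calling `run_pipeline("Eagle.fna", "virus.fna")` from the directory that holds the three binaries would then reproduce the run above.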

5. CITATION

If you use this software or method, please cite:

D. Pratas, A. J. Pinho, R. M. Silva, J. M. O. S. Rodrigues, M. Hosseini, T. Caetano, P. J. S. G. Ferreira "FALCON-meta: a method to infer metagenomic composition of ancient DNA", bioRxiv preprint, 2018.

Doi: https://doi.org/10.1101/267179

Paper preprint

6. ISSUES

For any issue, let us know through the issues link.

7. LICENSE

GPL v3.

For more information see LICENSE file or visit

http://www.gnu.org/licenses/gpl-3.0.html