SACO: a compression tool for the sequence alignments found in MAF files

SACO (Sequence Alignment COmpressor)

SACO: a compression tool for the sequence alignments found in MAF files.

This compression tool was designed to handle the DNA bases and gap symbols found in MAF files. Our method is based on a mixture of finite-context models. Contrary to a recent approach (Hanus 2010), it addresses the DNA bases and gap symbols at once, which makes better use of the correlations between them. For comparison with previous methods, our algorithm was tested on the multiz28way dataset. On average, it attained 0.94 bits per symbol, approximately 7% better than the previous best, at a similar computational complexity. We also tested the model on the most recent dataset, multiz46way. On this dataset, which contains alignments of 46 different species, our compression model achieved an average of 0.72 bits per MSA block symbol.
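To make the "bits per symbol" figure concrete, here is a minimal Python sketch that counts the alignment symbols (bases and gaps) in MAF text and divides a compressed size by that count. It is not part of SACO. The function names are hypothetical, and it assumes that every character of the text field of an "s" line counts as one symbol, which may differ from how the reported results were computed.

```python
def count_alignment_symbols(maf_text: str) -> int:
    """Count bases and gap symbols in the 's' lines of MAF text."""
    total = 0
    for line in maf_text.splitlines():
        fields = line.split()
        # A MAF sequence line has 7 fields:
        # s  src  start  size  strand  srcSize  text
        if len(fields) == 7 and fields[0] == "s":
            total += len(fields[6])
    return total


def bits_per_symbol(compressed_bytes: int, n_symbols: int) -> float:
    """Average number of bits spent per alignment symbol."""
    if n_symbols <= 0:
        raise ValueError("no alignment symbols found")
    return 8.0 * compressed_bytes / n_symbols


# Tiny made-up alignment block, for illustration only.
example = """##maf version=1
a score=100.0
s hg18.chrM    0 6 + 16571 ACGT--AC
s panTro2.chrM 0 6 + 16554 ACGTTT--

"""

n = count_alignment_symbols(example)
print(n, bits_per_symbol(2, n))  # prints: 16 1.0
```

With a real file, `compressed_bytes` would be the size of the encoder output, for example from `os.path.getsize`.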

INSTALLATION

To compile the source code, you will need a GCC compiler on a Unix platform (Linux or OS X). If you are using Windows, it is easier to use the pre-compiled binaries in the win32 and win64 folders.

Linux

Linux users should install the build-essential package, which contains GCC and the other utilities needed to compile the source code. To install the build-essential package, type:

sudo apt-get install build-essential

After that you only need to type:

make -f Makefile.linux

to create the binaries SACOe (encoder) and SACOd (decoder).

OS X

For OS X users, the steps depend on which Xcode version is installed. For the most recent versions, you will need to install the "Command Line Tools" in order to have the "make" utility. It seems that the "Command Line Tools" are no longer installed by default when you install Xcode. To install them, open Xcode and go to Preferences -> Downloads -> Components -> Command Line Tools. This should also install a GCC compiler. If you want a recent compiler, you can install one using Homebrew by typing the following command in a Terminal:

brew install gcc48

After that, make sure that the "CC" variable in the "Makefile.osx" file points to the GCC you just installed. The most recent versions of Xcode ship Clang (built on LLVM) instead of GCC. This tool was not tested with Clang, so compiling the code with it may not work. To generate the binaries, just type:

make -f Makefile.osx

to create the binaries SACOe (encoder) and SACOd (decoder).

Windows

The source code was NOT tested in a Windows environment. Nevertheless, you can compile the code in a Linux environment using the MinGW-w64 cross-compiler. After installing MinGW-w64, just type:

make -f Makefile.win32

to get the SACOe32.exe (encoder) and SACOd32.exe (decoder) executables for the 32-bit architecture. For the 64-bit architecture, just type:

make -f Makefile.win64

to get the SACOe64.exe (encoder) and SACOd64.exe (decoder) executables. The encoder seems to work just fine; however, there is a bug in the decoder that will be fixed soon...

USAGE

Encoding

The SACOe, SACOe32.exe, and SACOe64.exe programs have several parameters that can be set by the user. The most relevant ones are described below.

Usage: SACOe [options] ... [MAF File]

The most relevant options are:

-v Activates verbose mode.
-h Prints some help information.
-o [encodedFile] If present, it writes the encoded data into file "encodedFile".
-e Estimation only. Does not create the binary compressed file.
-alm Activate the ancestral line mode.
-scm Activate the static column model.
-cm1 [n/d t=threshold] Columnwise Model 1.
-cmn [n/d t=threshold] Columnwise Model N.
-u 0 [leftSize-rightSize n/d t=threshold] Ancestral context model with "leftSize" symbols on the left and "rightSize" symbols on the right.
-u template [n/d t=threshold] 2D image context template. Templates available 1-14 and 20-24.
-g [gamma] Gamma value used in the model mixture.
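
As a rough illustration of what a mixture of finite-context models with a gamma parameter can look like, the Python sketch below mixes a few order-k models over the alphabet of bases plus the gap symbol. It is not SACO's implementation: SACO uses 2D templates over the alignment block, while this sketch uses plain 1D contexts. The weight update (previous weight raised to gamma, times the probability the model gave to the symbol) is a common scheme that is only assumed here to correspond to the -g option. All names (`FiniteContextModel`, `mixture_bits_per_symbol`, `alpha`) are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT-"  # four bases plus the gap symbol


class FiniteContextModel:
    """Order-k model: symbol counts conditioned on the k previous symbols."""

    def __init__(self, order, alpha=1.0):
        self.order = order
        self.alpha = alpha  # additive smoothing, keeps probabilities > 0
        self.counts = defaultdict(lambda: [0] * len(ALPHABET))

    def _context(self, sequence, i):
        return sequence[max(0, i - self.order):i]

    def probs(self, sequence, i):
        c = self.counts[self._context(sequence, i)]
        total = sum(c) + self.alpha * len(ALPHABET)
        return [(n + self.alpha) / total for n in c]

    def update(self, sequence, i):
        ctx = self._context(sequence, i)
        self.counts[ctx][ALPHABET.index(sequence[i])] += 1


def mixture_bits_per_symbol(sequence, orders, gamma):
    """Average code length, in bits per symbol, of the adaptive mixture."""
    models = [FiniteContextModel(k) for k in orders]
    weights = [1.0 / len(models)] * len(models)
    total_bits = 0.0
    for i, symbol in enumerate(sequence):
        s = ALPHABET.index(symbol)
        model_p = [m.probs(sequence, i)[s] for m in models]
        # Mixture probability of the symbol that actually occurred.
        p = sum(w * q for w, q in zip(weights, model_p))
        total_bits -= math.log2(p)
        # Assumed update rule: w_i <- w_i**gamma * p_i(symbol), normalised.
        weights = [w ** gamma * q for w, q in zip(weights, model_p)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        for m in models:
            m.update(sequence, i)
    return total_bits / len(sequence)


bps = mixture_bits_per_symbol("ACGT-" * 40, orders=[1, 3], gamma=0.9)
print(f"{bps:.3f} bits per symbol")
```

In this sketch, a gamma close to 1 gives the weights a long memory of past model performance, while a smaller gamma lets them react faster to recent symbols.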

Decoding

The SACOd, SACOd32.exe, and SACOd64.exe programs have the following interface:

Usage: SACOd [options] ... [Encoded File]

Examples

The following examples show how to use this tool in a Linux environment.

We can encode a MAF file using, for example, two models (order-9 and order-11) and write the encoded data to "file.enc" by typing:

$ SACOe -u 9 -u 11 -o file.enc chrM.maf

To decode the encoded file, just type:

$ SACOd -o file.dec file.enc
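
When the encoder and decoder are driven from a script, the two commands above can be assembled programmatically. The Python sketch below only builds the argument lists and uses nothing beyond the flags shown in this README (-u, -g, -o). The helper names are hypothetical, and placing -g before -o is an assumption about argument order, not something the tool is known to require.

```python
import shlex


def build_encode_command(maf_path, encoded_path, templates,
                         gamma=None, encoder="SACOe"):
    """Return the argument list for an encoder run."""
    cmd = [encoder]
    for t in templates:
        cmd += ["-u", str(t)]  # one -u per context model
    if gamma is not None:
        cmd += ["-g", str(gamma)]
    cmd += ["-o", encoded_path, maf_path]
    return cmd


def build_decode_command(encoded_path, decoded_path, decoder="SACOd"):
    """Return the argument list for a decoder run."""
    return [decoder, "-o", decoded_path, encoded_path]


encode_cmd = build_encode_command("chrM.maf", "file.enc", [9, 11])
decode_cmd = build_decode_command("file.enc", "file.dec")
print(shlex.join(encode_cmd))  # prints: SACOe -u 9 -u 11 -o file.enc chrM.maf
print(shlex.join(decode_cmd))  # prints: SACOd -o file.dec file.enc
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the binaries are on the PATH.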

DATA SETS

Some data sets that can be used to evaluate this tool:

CITE

If you use this software, please cite the following publications:

ISSUES

The Windows decoders (SACOd32.exe and SACOd64.exe) have a bug that will be fixed soon... For other issues, please use the issues link at GitHub.

COPYRIGHT

Copyright (c) 2014 Luís M. O. Matos. See LICENSE.txt for further details.