HighFCM

HighFCM is a compression algorithm that relies on a pre-analysis of the data before compression, namely an algorithmic entropy filter, with the aim of filtering out regions of low complexity. This strategy enables the use of deeper context models, supported by hash tables, without requiring huge amounts of memory. For example, context depths as large as 32 are attainable for alphabets of four symbols, as is the case with genomic sequences. These deeper context models show very high compression capabilities on highly repetitive genomic sequences, yielding improvements over previous algorithms. Furthermore, the method is universal, in the sense that it can be applied to any type of textual data.
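To illustrate why hash tables make such deep contexts affordable, below is a minimal, self-contained C sketch (not the HighFCM source) of an order-32 finite-context model over {A,C,G,T}: each context is packed 2 bits per symbol into a 64-bit key, and its symbol counters are kept in a hash table, since a direct array would need 4^32 entries. The table size, hashing scheme and probability estimator are illustrative assumptions only.

/* A minimal sketch (NOT the HighFCM source): a deep finite-context model
   over the DNA alphabet {A,C,G,T}, with contexts packed 2 bits per symbol
   and counters kept in a hash table, since a plain array would need 4^32
   entries. All names, sizes and the estimator below are illustrative. */

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define CTX_ORDER 32                /* context depth k (up to 32 fits in 64 bits) */
#define HASH_SIZE (1u << 20)        /* illustrative table size (power of two)     */

typedef struct {
    uint64_t key;                   /* packed context                    */
    uint32_t counts[4];             /* per-symbol counters               */
    uint8_t  used;                  /* slot occupancy flag               */
} Entry;

static Entry table[HASH_SIZE];      /* open addressing, zero-initialized */

static uint64_t sym2bits(char c) {  /* A=0, C=1, G=2, T=3 */
    switch (c) {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;
    }
}

/* find (or create) the entry for a packed context, with linear probing;
   no table-full handling in this sketch */
static Entry *lookup(uint64_t ctx) {
    uint64_t h = (ctx * 0x9E3779B97F4A7C15ull) & (HASH_SIZE - 1);
    while (table[h].used && table[h].key != ctx)
        h = (h + 1) & (HASH_SIZE - 1);
    table[h].used = 1;
    table[h].key  = ctx;
    return &table[h];
}

int main(void) {
    const char *seq = "ACGTACGTACGTACGTACGTACGTACGTACGTACGT";
    const uint64_t mask = ~0ull >> (64 - 2 * CTX_ORDER);  /* keep last CTX_ORDER symbols */
    uint64_t ctx = 0;

    for (size_t i = 0; seq[i]; i++) {
        uint64_t s = sym2bits(seq[i]);
        if (i >= (size_t)CTX_ORDER)      /* only count once a full context exists */
            lookup(ctx)->counts[s]++;
        ctx = ((ctx << 2) | s) & mask;   /* slide the context window              */
    }

    /* Laplace-style estimate of P(symbol | current context); HighFCM's own
       estimator and its alpha parameters (-au, -ae) differ from this. */
    Entry *e = lookup(ctx);
    double total = e->counts[0] + e->counts[1] + e->counts[2] + e->counts[3];
    for (int s = 0; s < 4; s++)
        printf("P(%c | last %d symbols) = %.3f\n",
               "ACGT"[s], CTX_ORDER, (e->counts[s] + 1.0) / (total + 4.0));
    return 0;
}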

INSTALLATION

The following instructions show how to install/compile HighFCM.

Linux

git clone https://github.com/pratas/highfcm.git
cd highfcm
make

Windows

On Windows, use Cygwin (https://www.cygwin.com/) and make sure the following are included in the installation: make, unzip, wget (and any dependencies). If you install the complete Cygwin package, all of these will be included. After that, the steps are the same as in Linux.
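For example, to quickly check that the required tools are available in the Cygwin shell:

make --version
wget --version
unzip -v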

EXECUTION

Example of running HighFCM:

./HighFCM -v -cl 4 -ce 14 -cu 16 File.seq
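In this example, -v enables verbose mode, -cl 4 sets the low context order used in compression, -ce 14 the high context order used on evaluation, and -cu 16 the high context order used in compression (see PARAMETERS below).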

PARAMETERS

To see the available options, type

./HighFCM

or

./HighFCM -h

This will print the following options:

Usage: HighFCM [OPTION]... [FILE]
  -h                give this help
  -v                verbose mode
  -cl <ctxLow>      low context order used in compression
  -ml <maxCnt>      low order maximum counter
  -ce <ctxEval>     high context order on evaluation
  -cu <ctxUsed>     high context order on compression
  -mu <maxCnt>      used order maximum counter
  -au <alpha>       alpha estimator denominator for cu
  -ae <alpha>       alpha estimator denominator for ce
  -b <blockSize>    block size (default: 100)
  -ir               use inverted repeats
  -tm <tableMode>   table mode: 0|1 (0=array, 1=hash)
  -t <nThreads>     number of threads / parts
  -d <outFile>      decompression output file
  -rm               remove comp file after decomp
  <File>            input file to compress
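For instance, several of the documented options can be combined in one run; the values below are illustrative only:

./HighFCM -v -cl 4 -ce 14 -cu 16 -tm 1 -ir -t 4 File.seq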

CITATION

If you use this software/method, please cite:

Pratas, D. and Pinho, A. J., "Exploring deep Markov models in genomic data compression using sequence pre-analysis," in Proceedings of the 22nd European Signal Processing Conference (EUSIPCO), pp. 2395-2399, 1-5 Sept. 2014.

ISSUES

For any issue, please let us know via the repository's issues page.

LICENSE

GPL v2.

For more information:

http://www.gnu.org/licenses/gpl-2.0.html