HighFCM

HighFCM is a compression algorithm that performs a pre-analysis of the data, an algorithmic entropy filter, before compression, with the aim of filtering out regions of low complexity. This strategy allows the use of deeper context models, supported by hash tables, without requiring huge amounts of memory. For example, context depths as large as 32 are attainable for four-symbol alphabets, as is the case with genomic sequences. These deeper context models achieve very high compression on highly repetitive genomic sequences, improving on previous algorithms. The method is also universal, in the sense that it can be applied to any type of textual data.
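To illustrate the core idea (a sketch only, not the actual HighFCM implementation): an order-k finite-context model keeps symbol counts per context and charges each symbol its ideal code length, and a hash table (here a plain Python dict) keeps deep contexts affordable in memory. All names, the toy sequence, and the parameter values below are hypothetical:

```python
import math
import random
from collections import defaultdict

def fcm_code_length(seq, k, alpha=1.0, alphabet=4):
    """Bits needed to encode `seq` with an adaptive order-k
    finite-context model and an additive (alpha) estimator.
    A dict keyed by the context stands in for the hash tables
    that make deep contexts affordable in memory."""
    counts = defaultdict(lambda: [0] * alphabet)
    bits = 0.0
    for i in range(k, len(seq)):
        ctx = tuple(seq[i - k:i])   # the k preceding symbols
        sym = seq[i]
        c = counts[ctx]
        total = sum(c)
        p = (c[sym] + alpha) / (total + alpha * alphabet)
        bits += -math.log2(p)       # ideal code length of this symbol
        c[sym] += 1                 # update the model as we go
    return bits

# A repetitive "genomic" sequence: a random 64-symbol unit repeated 20 times.
random.seed(0)
unit = [random.randrange(4) for _ in range(64)]
rep = unit * 20

deep = fcm_code_length(rep, k=12)
shallow = fcm_code_length(rep, k=2)
print(deep < shallow)  # True: the deep model captures the repeats cheaply
```

On such repetitive data the deep model pays for each context only once and then predicts almost deterministically, which is why deeper orders pay off despite the larger state space.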

INSTALLATION

The following instructions describe how to install and compile HighFCM.

Linux

git clone https://github.com/pratas/highfcm.git
cd highfcm
make

Windows

On Windows, use Cygwin (https://www.cygwin.com/) and make sure that make, unzip, and wget (with any dependencies) are included in the installation. If you install the complete Cygwin package, all of these will be installed. Afterwards, the steps are the same as on Linux.

EXECUTION

Example on running HighFCM:

./HighFCM -v -cl 4 -ce 14 -cu 16 File.seq

PARAMETERS

To see the possible options type

./HighFCM

or

./HighFCM -h

These will print the following options:

Usage: HighFCM [OPTION]... [FILE]
  -h                give this help
  -v                verbose mode
  -cl <ctxLow>      low context order used in compression
  -ml <maxCnt>      low order maximum counter
  -ce <ctxEval>     high context order on evaluation
  -cu <ctxUsed>     high context order on compression
  -mu <maxCnt>      used order maximum counter
  -au <alpha>       alpha estimator denominator for cu
  -ae <alpha>       alpha estimator denominator for ce
  -b <blockSize>    block size (default: 100)
  -ir               use inverted repeats
  -tm <tableMode>   table mode: 0|1 (0=array, 1=hash)
  -t <nThreads>     number of threads / parts
  -d <outFile>      decompression output file
  -rm               remove comp file after decomp
  <File>            input file to compress
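The -b option sets the block size used by the pre-analysis. A minimal sketch of a block-wise entropy filter, assuming a zeroth-order Shannon entropy measure and an arbitrary 0.5 bits/symbol threshold (the filter HighFCM actually applies may differ):

```python
import math
from collections import Counter

def block_entropies(seq, block_size=100):
    """Zeroth-order Shannon entropy (bits/symbol) of each
    fixed-size block of `seq`."""
    out = []
    for i in range(0, len(seq), block_size):
        block = seq[i:i + block_size]
        n = len(block)
        out.append(-sum((c / n) * math.log2(c / n)
                        for c in Counter(block).values()))
    return out

# A low-complexity run followed by a mixed DNA-like stretch.
seq = "A" * 100 + "ACGTAGGTCCATTGCA" * 7
hs = block_entropies(seq, block_size=100)
low = [i for i, h in enumerate(hs) if h < 0.5]  # flag low-complexity blocks
print(low)  # [0]: only the all-'A' block is flagged
```

Blocks flagged this way are the low-complexity regions the pre-analysis filters out, so that the deep context models are spent on the complex ones.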

CITATION

If you use this software/method, please cite:

Pratas, D. and Pinho, A. J., "Exploring deep Markov models in genomic data compression using sequence pre-analysis," in Proceedings of the 22nd European Signal Processing Conference (EUSIPCO), pp. 2395-2399, 1-5 Sept. 2014.

ISSUES

For any issue, please let us know at https://github.com/pratas/highfcm/issues.

LICENSE

GPL v2.

For more information:

http://www.gnu.org/licenses/gpl-2.0.html
