This code implements the wavelet trie and corrected Bloom filter compressors proposed in our paper "Dynamic compression schemes for graph coloring", Bioinformatics, 2018, by Harun Mustafa, Ingo Schilken, Mikhail Karasikov, Carsten Eickhoff, Gunnar Rätsch, and André Kahles.
Other methods for representing graph annotations, implemented with a more generic API and including Multi-BRWT, Rainbowfish, and row- and column-major sparse representations, may be found here.
Prerequisites
- cmake 3.6.1
- C++14
- HTSlib
- GNU GMP
- boost
- sdsl-lite
Steps
git clone --recursive https://github.com/ratschlab/graph_annotation
- Build sdsl-lite by
pushd external-libraries/sdsl-lite; ./install.sh $(pwd); popd
- Go to the build directory
mkdir -p build && cd build
- Compile and run the unit tests by
cmake .. && make && ./unit_tests
Build types: cmake .. <arguments>
where arguments are:
-DCMAKE_BUILD_TYPE=[Debug|Release|Profile]
-- build modes (Debug by default)
-DBUILD_STATIC=ON
-- link statically (OFF by default)
- Generate graph and uncompressed annotations (.precise.dbg and optionally .wtr.dbg files)
./annograph build -o <OUTPREFIX> <FLAGS> <INPUTS>
- Compress annotation with Bloom filters
./annograph build -i <OUTPREFIX> -o <BLOOMOUTPREFIX> <FLAGS> <INPUTS>
- Compress annotation with wavelet tries (if not done in step 1)
./annograph build -i <OUTPREFIX> -o <WTROUTPREFIX> --wavelet-trie <FLAGS> <INPUTS>
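The three steps above can also be chained from a script. The following is a minimal sketch and not part of the repository: the function names `pipeline_commands` and `run_pipeline` are hypothetical, and only the options shown above (`-i`, `-o`, `--wavelet-trie`) are used, with any additional <FLAGS> left out. It assumes the script is run from the build directory, where `./annograph` is located.

```python
import subprocess


def pipeline_commands(prefix, bloom_prefix, wtr_prefix, inputs):
    """Return the three documented `annograph build` calls as argument lists."""
    inputs = list(inputs)
    return [
        # Step 1: build the graph and the uncompressed annotation
        ["./annograph", "build", "-o", prefix] + inputs,
        # Step 2: compress the annotation with Bloom filters
        ["./annograph", "build", "-i", prefix, "-o", bloom_prefix] + inputs,
        # Step 3: compress the annotation with wavelet tries
        ["./annograph", "build", "-i", prefix, "-o", wtr_prefix,
         "--wavelet-trie"] + inputs,
    ]


def run_pipeline(prefix, bloom_prefix, wtr_prefix, inputs):
    """Run the steps in order; check=True stops at the first failing step."""
    for cmd in pipeline_commands(prefix, bloom_prefix, wtr_prefix, inputs):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it possible to print or inspect the commands before running them.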
./annograph build -k 9 -o tiny_example ../tests/data/test_vcfparse.fa
./annograph build -i tiny_example --bloom-false-pos-prob 0.01 -o tiny_example ../tests/data/test_vcfparse.fa
./annograph map -i tiny_example TCGCGCGCTA TCGCGCGCTA TCGCGCGCTC TCGCGCGCTN TCGCGCGCTANA TCGCGCGCTC
./annograph build -i tiny_example --wavelet-trie -o tiny_example ../tests/data/test_vcfparse.fa
./annograph map --wavelet-trie -i tiny_example TCGCGCGCTA TCGCGCGCTA TCGCGCGCTC TCGCGCGCTN TCGCGCGCTANA TCGCGCGCTC
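The `--bloom-false-pos-prob 0.01` option used above sets a target false positive probability. As a rough illustration of what such a target costs in space, the sketch below applies the textbook sizing formulas for a standard Bloom filter: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions for n items. This is an assumption made for illustration only. The corrected Bloom filter compressor in this code may size its filters differently, and the function name `bloom_parameters` is hypothetical.

```python
import math


def bloom_parameters(num_items, false_pos_prob):
    """Textbook Bloom filter sizing: return (number of bits, number of hashes)."""
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    if not 0.0 < false_pos_prob < 1.0:
        raise ValueError("false_pos_prob must lie strictly between 0 and 1")
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    num_bits = math.ceil(-num_items * math.log(false_pos_prob) / math.log(2) ** 2)
    # k = (m / n) * ln 2, rounded to the nearest integer and at least 1
    num_hashes = max(1, round(num_bits / num_items * math.log(2)))
    return num_bits, num_hashes
```

Under these formulas, a target of 0.01 corresponds to roughly 9.6 bits per stored item and 7 hash functions, regardless of the number of items.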
Constructing wavelet trie in blocks (slower, uses less RAM)
./annograph compress -i <OUTPREFIX> -o <WTROUTPREFIX>
Annotation compressor query time
./annograph query -i <OUTPREFIX>
Wavelet trie statistics
./annograph stats -i <OUTPREFIX> --wavelet-trie
Compress wavelet tries with random column permutations
./annograph permutation -i <OUTPREFIX> --num-permutations <NUM_PERMS>
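The order of the annotation columns determines which rows share prefixes, so different column orders can produce differently shaped tries. The toy sketch below illustrates the idea of trying several random column permutations and keeping the best one. It is not the algorithm used by `annograph permutation`: all names are hypothetical, and the number of distinct non-empty row prefixes (the node count of an uncompressed binary trie, root excluded) serves only as a crude stand-in for compressed size. Rows are assumed to be non-empty and of equal length.

```python
import random


def trie_nodes(rows):
    """Count distinct non-empty prefixes over all rows."""
    rows = [tuple(row) for row in rows]
    return len({row[:i] for row in rows for i in range(1, len(row) + 1)})


def permute_columns(rows, perm):
    """Reorder the columns of every row so that new column i is old column perm[i]."""
    return [tuple(row[j] for j in perm) for row in rows]


def best_random_permutation(rows, num_permutations, seed=0):
    """Try random column orders; return the one with the fewest trie nodes."""
    rng = random.Random(seed)
    num_cols = len(rows[0])
    # Start from the identity order so the result is never worse than the input
    best_perm = list(range(num_cols))
    best_cost = trie_nodes(rows)
    for _ in range(num_permutations):
        perm = list(range(num_cols))
        rng.shuffle(perm)
        cost = trie_nodes(permute_columns(rows, perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost
```

For the two rows `(0, 1)` and `(1, 1)`, swapping the columns lets both rows share their first bit, which reduces the prefix count from 4 to 3.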
The input data for reproducing the results of the experiments in our paper is located here.