Code accompanying the publication for compressed graph annotation

ratschlab/graph_annotation

Hash-based colored de Bruijn graph with wavelet trie and Bloom filter color compression

Reference

This code implements the wavelet trie and corrected Bloom filter compressors proposed in our paper

Dynamic compression schemes for graph coloring, Bioinformatics, 2018 by Harun Mustafa, Ingo Schilken, Mikhail Karasikov, Carsten Eickhoff, Gunnar Rätsch, and André Kahles.

Other methods for representing graph annotations implemented with a more generic API, including Multi-BRWT, Rainbowfish, as well as Row- and Column-major sparse representations, may be found here.

Install

Prerequisites

  • cmake 3.6.1
  • C++14
  • HTSlib
  • GNU GMP
  • boost
  • sdsl-lite
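Before building, it can help to confirm that the installed cmake is recent enough. The sketch below assumes that the 3.6.1 listed above is a minimum version rather than an exact requirement; the helper names `version_ge` and `check_cmake` are hypothetical and not part of this repository.

```shell
# Return success (0) when dotted version $1 is >= dotted version $2.
version_ge() {
    # Sort both versions numerically, field by field; the smaller one comes first.
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$lowest" = "$2" ]
}

# Report whether cmake is installed and at least version 3.6.1
# (assumption: 3.6.1 is a lower bound).
check_cmake() {
    if ! command -v cmake >/dev/null 2>&1; then
        echo "cmake not found"
        return 1
    fi
    # The first line of `cmake --version` looks like "cmake version X.Y.Z".
    found=$(cmake --version | sed -n '1s/[^0-9]*\([0-9][0-9.]*\).*/\1/p')
    if version_ge "$found" "3.6.1"; then
        echo "cmake $found is new enough"
    else
        echo "cmake $found is older than 3.6.1"
        return 1
    fi
}
```

Calling `check_cmake` before the install steps gives an early, readable failure instead of a configuration error partway through the build.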

Steps

  1. Clone the repository with its submodules: git clone --recursive https://github.com/ratschlab/graph_annotation
  2. Build sdsl-lite: pushd external-libraries/sdsl-lite; ./install.sh $(pwd); popd
  3. Create and enter the build directory: mkdir -p build && cd build
  4. Configure, compile, and run the unit tests: cmake .. && make && ./unit_tests

Build options: run cmake .. <arguments>, where the arguments are:

  • -DCMAKE_BUILD_TYPE=[Debug|Release|Profile] -- build modes (Debug by default)
  • -DBUILD_STATIC=ON -- link statically (OFF by default)
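The two options above can be combined in a single configure call. As a minimal sketch, the hypothetical helper `cmake_args` below (not part of this repository) assembles the arguments, falls back to the documented defaults, and rejects values outside the listed choices.

```shell
# Print the cmake arguments for a build type and a static-linking choice.
# Usage: cmake_args [Debug|Release|Profile] [ON|OFF]
cmake_args() {
    build_type=${1:-Debug}   # Debug is the documented default
    static=${2:-OFF}         # static linking is OFF by default
    case "$build_type" in
        Debug|Release|Profile) ;;
        *) echo "unknown build type: $build_type" >&2; return 1 ;;
    esac
    case "$static" in
        ON|OFF) ;;
        *) echo "BUILD_STATIC must be ON or OFF" >&2; return 1 ;;
    esac
    printf '%s %s\n' "-DCMAKE_BUILD_TYPE=$build_type" "-DBUILD_STATIC=$static"
}

# Show the configure command for an optimized, statically linked build.
echo "cmake .. $(cmake_args Release ON)"
```

The last line only prints the command; to run it from the build directory, something like `cmake .. $(cmake_args Release ON) && make` should work, since the arguments contain no spaces.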

Typical workflow

  1. Generate graph and uncompressed annotations (.precise.dbg and optionally .wtr.dbg files)
    ./annograph build -o <OUTPREFIX> <FLAGS> <INPUTS>
  2. Compress annotation with Bloom filters
    ./annograph build -i <OUTPREFIX> -o <BLOOMOUTPREFIX> <FLAGS> <INPUTS>
  3. Compress annotation with wavelet tries (if not done in step 1)
    ./annograph build -i <OUTPREFIX> -o <WTROUTPREFIX> --wavelet-trie <FLAGS> <INPUTS>

Example

./annograph build -k 9 -o tiny_example ../tests/data/test_vcfparse.fa

./annograph build -i tiny_example --bloom-false-pos-prob 0.01 -o tiny_example ../tests/data/test_vcfparse.fa
./annograph map -i tiny_example TCGCGCGCTA TCGCGCGCTA TCGCGCGCTC TCGCGCGCTN TCGCGCGCTANA TCGCGCGCTC

./annograph build -i tiny_example --wavelet-trie -o tiny_example ../tests/data/test_vcfparse.fa
./annograph map --wavelet-trie -i tiny_example TCGCGCGCTA TCGCGCGCTA TCGCGCGCTC TCGCGCGCTN TCGCGCGCTANA TCGCGCGCTC
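To get a feel for what a value such as `--bloom-false-pos-prob 0.01` costs in space, the textbook relations for a classic Bloom filter can be evaluated directly. This is only a rough guide: it assumes a standard Bloom filter with an optimal number of hash functions, and the corrected Bloom filters implemented here may be sized differently. The function names are hypothetical.

```shell
# Bits per stored element for a classic Bloom filter with false-positive
# probability p and an optimal number of hash functions:
#   m/n = -ln(p) / (ln 2)^2
bloom_bits_per_element() {
    awk -v p="$1" 'BEGIN { printf "%.2f\n", -log(p) / (log(2) * log(2)) }'
}

# Optimal number of hash functions for the same filter:
#   k = (m/n) * ln 2 = -ln(p) / ln 2
bloom_num_hashes() {
    awk -v p="$1" 'BEGIN { printf "%.2f\n", -log(p) / log(2) }'
}

bloom_bits_per_element 0.01   # prints 9.59
bloom_num_hashes 0.01         # prints 6.64
```

Under these assumptions, lowering the false-positive probability by a factor of ten adds roughly 4.8 bits per stored element.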

Other use cases

Constructing wavelet trie in blocks (slower, uses less RAM)
./annograph compress -i <OUTPREFIX> -o <WTROUTPREFIX>

Annotation compressor query time
./annograph query -i <OUTPREFIX>

Wavelet trie statistics
./annograph stats -i <OUTPREFIX> --wavelet-trie

Compress wavelet tries with random column permutations
./annograph permutation -i <OUTPREFIX> --num-permutations <NUM_PERMS>

Reproducing results from the paper

The input data for reproducing the results of the experiments in our paper is located here.
