This is a code exploration project to test different implementations to extract spaced k-mers from a genomic sequence. We find that the algorithms implemented here are up to an order of magnitude faster than existing approaches to extract spaced k-mers.
Simply call
make
to build the program, and
./bin/fisk
This will produce performance measurements for all implemented benchmarks on the given hardware, measuring time per operation, such as per extracted k-mer:
bit_extract_weights.csv: Performance of bit extracting, for different weights of the mask.bit_extract_blocks.csv: Performance of bit extracting, for masks with different numbers of blocks of consecutive 1s.seq_enc.csv: Sequence encoding, from ASCII to 2-bit codes.kmer_extract.csv: Regular k-mer extraction, traversing along an input sequence.kmer_spaced_single.csv: Spaced k-mer extraction, using a single mask.kmer_spaced_multi.csv: Spaced k-mer extraction, for multiple masks at once.kmer_clark.csv: Small test to examine bottlenecks in the CLARK-S implementation.
We provide these results for all hardware architectures tested in benchmarks. We also added
the results of our DuoHash benchmark there, called DuoHash.csv.
To create the plots from the manuscript, run ./plot/plot_all_cpus.sh. This requires some standard Python packages to be installed; a conda env file with these packages is provided in conda-env.yaml.
The code separates the bit extraction functionality from the k-mer functionality. The bit extraction functions can thus be used in a general context as well.
Functions that process k-mers are mostly templated here, in order to allow us to benchmark different implemenations of, e.g., the nucleotide to two-bit encoding and the bit extraction. Thus, to use these function in your code, you might want to replace those template parameters with hard-coded versions for simplicity.
To use this in our own code, we recommend to use PEXT for bit extraction where possible, and a SIMD-accelerated Butterfly network otherwise. In the simplest case, use bit_extract_selector() as a dynamic check to test this, and a switch as shown in switch_bit_extract_mode() and run_bit_extract_mode() to run. This avoids overhead for function pointers in the hot loop per extracted value.
Similarly, for spaced k-mer extraction, use spaced_kmer_selector() to decide which extraction algorithm to use dynamically on a given hardware, and again a switch outside of the hot loop to dispatch between the algorithms.
Existing approaches for spaced k-mers:
- DuoHash, and our fork containing a simple benchmark program for their methods
- FISH
- MISSH
- CLARK-S
- MaskJelly
Information on PEXT:
- Intel Intrinsics Guide
- Wikipedia: BMI2
- Félix Cloutier's x86 reference
- Sirrida pages on Bit permutations for BMI2 and Compress and expand