Iterate over minimizers of a DNA sequence.
- iterates over minimizers in a single pass
- yields bitpacked minimizers with their position
- supports mod-minimizers, introduced by Groot Koerkamp & Pibiri
- supports canonical minimizers
- supports custom bit encoding of the nucleotides
- supports custom hasher, using wyhash by default
- can be seeded to produce a different ordering
If you'd like to use the underlying data structure manually, have a look at the minimizer-queue crate.
use minimizer_iter::MinimizerBuilder;

// Build an iterator over minimizers
// of size 3 with a window of size 4
// for the sequence "TGATTGCACAATC"
// (the 13-base sequence is too short to contain minimizers of size 21)
let min_iter = MinimizerBuilder::<u64>::new()
    .minimizer_size(3)
    .width(4)
    .iter(b"TGATTGCACAATC");

for (minimizer, position) in min_iter {
    // ...
}
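To make the (minimizer, position) pairs concrete, here is a naive, standard-library-only sketch of what a minimizer scheme computes. It is not this crate's implementation: the crate works in a single pass and uses wyhash by default, whereas this sketch rescans every window and uses a toy hash. The names `encode`, `pack`, `toy_hash` and `naive_minimizers`, the A=0, C=1, G=2, T=3 encoding, and the leftmost tie-breaking rule are all assumptions made for illustration. The small parameters (k = 3, w = 4) are chosen so that the 13-base sequence contains several windows.

```rust
/// Encode one nucleotide on 2 bits (assumed encoding: A=0, C=1, G=2, T=3).
fn encode(base: u8) -> u64 {
    match base {
        b'A' => 0,
        b'C' => 1,
        b'G' => 2,
        b'T' => 3,
        _ => panic!("unsupported nucleotide"),
    }
}

/// Pack a k-mer (k <= 32) into a u64, 2 bits per base.
fn pack(kmer: &[u8]) -> u64 {
    kmer.iter().fold(0, |acc, &b| (acc << 2) | encode(b))
}

/// Toy seeded hash (splitmix64-style mixer), standing in for a real hasher.
fn toy_hash(x: u64, seed: u64) -> u64 {
    let mut z = (x ^ seed).wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// For each window of `w` consecutive k-mers, keep the k-mer with the
/// smallest hash (leftmost on ties), skipping consecutive repeats.
fn naive_minimizers(seq: &[u8], k: usize, w: usize, seed: u64) -> Vec<(u64, usize)> {
    let mut out: Vec<(u64, usize)> = Vec::new();
    // A full window spans k + w - 1 bases; shorter sequences yield nothing here.
    if k == 0 || w == 0 || seq.len() < k + w - 1 {
        return out;
    }
    let kmers: Vec<u64> = seq.windows(k).map(pack).collect();
    for start in 0..=kmers.len() - w {
        let pos = (start..start + w)
            .min_by_key(|&i| toy_hash(kmers[i], seed))
            .unwrap();
        if out.last().map(|&(_, p)| p) != Some(pos) {
            out.push((kmers[pos], pos));
        }
    }
    out
}

fn main() {
    for (minimizer, position) in naive_minimizers(b"TGATTGCACAATC", 3, 4, 0) {
        println!("{minimizer:06b} at position {position}");
    }
}
```

Changing the `seed` argument changes the hash order and therefore which k-mers are selected, which is the idea behind seeding the builder. The quadratic rescanning in this sketch is what a dedicated queue structure avoids.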
If you'd like to use mod-minimizers instead, just change new() to new_mod():
use minimizer_iter::MinimizerBuilder;

// Build an iterator over mod-minimizers
// of size 3 with a window of size 4
// for the sequence "TGATTGCACAATC"
// (the 13-base sequence is too short to contain minimizers of size 21)
let min_iter = MinimizerBuilder::<u64, _>::new_mod()
    .minimizer_size(3)
    .width(4)
    .iter(b"TGATTGCACAATC");

for (minimizer, position) in min_iter {
    // ...
}
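For readers unfamiliar with mod-minimizers, the following standard-library-only sketch shows the selection rule inside a single window as described by Groot Koerkamp & Pibiri: pick the smallest t-mer in the window, at offset x, then select the k-mer at offset x mod w, with t = r + ((k - r) mod w) for a small r. This is a sketch of the published definition, not of this crate's internals; the function names, the toy hash, the 2-bit encoding and the value r = 4 are assumptions for illustration.

```rust
/// Pack a short DNA string on 2 bits per base (assumed: A=0, C=1, G=2, T=3).
fn pack(kmer: &[u8]) -> u64 {
    kmer.iter().fold(0, |acc, &b| {
        let code = match b {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => panic!("unsupported nucleotide"),
        };
        (acc << 2) | code
    })
}

/// Toy hash (splitmix64-style mixer), standing in for a real hasher.
fn toy_hash(x: u64) -> u64 {
    let mut z = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Length of the t-mers used for sampling: t = r + ((k - r) mod w), for k >= r.
fn mod_minimizer_t(k: usize, w: usize, r: usize) -> usize {
    assert!(k >= r && w > 0);
    r + (k - r) % w
}

/// Offset of the k-mer selected in one window of w + k - 1 bases:
/// find the smallest t-mer at offset x, then return x mod w.
fn mod_sample(window: &[u8], k: usize, w: usize, t: usize) -> usize {
    assert!(t >= 1 && t <= k && t <= 32 && window.len() == w + k - 1);
    let x = (0..=window.len() - t)
        .min_by_key(|&i| toy_hash(pack(&window[i..i + t])))
        .unwrap();
    x % w
}

fn main() {
    let (k, w, r) = (21, 11, 4);
    let t = mod_minimizer_t(k, w, r); // 4 + (17 mod 11) = 10
    let seq = b"TGATTGCACAATCGGATTACAGGCATTCAGTACGT";
    let window = &seq[..w + k - 1];
    let p = mod_sample(window, k, w, t);
    println!("t = {t}, selected k-mer starts at offset {p} in the window");
}
```

When t equals k, the rule reduces to an ordinary minimizer, since x is then already smaller than w.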
Additionally, the iterator can produce canonical minimizers so that a sequence and its reverse complement will select the same minimizers.
To do so, just add .canonical() to the builder:
MinimizerBuilder::<u64>::new()
    .canonical()
    .minimizer_size(...)
    .width(...)
    .iter(...)
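The property behind canonical minimizers can be illustrated with a small standard-library-only sketch: if each k-mer is replaced by a canonical form shared with its reverse complement, then a sequence and its reverse complement expose the same k-mers, read in opposite directions. The helper names below are hypothetical, and the "lexicographically smaller strand" rule is an assumption; the crate may choose between the two strands differently.

```rust
/// Complement of a single nucleotide.
fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        _ => panic!("unsupported nucleotide"),
    }
}

/// Reverse complement of a sequence.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter().rev().map(|&b| complement(b)).collect()
}

/// Canonical form of a k-mer: the lexicographically smaller of the k-mer
/// and its reverse complement (an assumed convention).
fn canonical(kmer: &[u8]) -> Vec<u8> {
    let rc = reverse_complement(kmer);
    if kmer <= rc.as_slice() {
        kmer.to_vec()
    } else {
        rc
    }
}

fn main() {
    let seq = b"TGATTGCACAATC";
    let rc = reverse_complement(seq);
    let k = 3;

    // Canonical k-mers read left to right on the sequence...
    let fwd: Vec<Vec<u8>> = seq.windows(k).map(canonical).collect();
    // ...match those read right to left on its reverse complement.
    let mut bwd: Vec<Vec<u8>> = rc.windows(k).map(canonical).collect();
    bwd.reverse();

    assert_eq!(fwd, bwd);
}
```

Because both strands yield the same canonical k-mers, any selection rule that only depends on the canonical form picks the same minimizers on either strand.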
If you need longer minimizers (> 32 bases), you can specify a bigger integer type such as u128:
MinimizerBuilder::<u128>::new()
    .minimizer_size(...)
    .width(...)
    .iter(...)
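The 32-base limit follows from bit packing: assuming 2 bits per nucleotide, a 64-bit integer holds 32 bases and a 128-bit integer holds 64. The sketch below checks this arithmetic with the standard library only; `max_kmer_len` and `pack_u128` are hypothetical helpers, and the A=0, C=1, G=2, T=3 encoding is an assumption.

```rust
/// Largest k-mer length that fits in an integer of `bits` bits,
/// assuming 2 bits per nucleotide.
const fn max_kmer_len(bits: u32) -> u32 {
    bits / 2
}

/// Pack a k-mer of up to 64 bases into a u128, 2 bits per base.
fn pack_u128(kmer: &[u8]) -> u128 {
    assert!(kmer.len() <= max_kmer_len(u128::BITS) as usize);
    kmer.iter().fold(0u128, |acc, &b| {
        let code = match b {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => panic!("unsupported nucleotide"),
        };
        (acc << 2) | code
    })
}

fn main() {
    assert_eq!(max_kmer_len(u64::BITS), 32);
    assert_eq!(max_kmer_len(u128::BITS), 64);

    // A 33-base k-mer needs 66 bits, more than a u64 can hold.
    let kmer = [b'T'; 33];
    let packed = pack_u128(&kmer);
    assert!(packed > u64::MAX as u128);
}
```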
See the documentation for more details.
To run benchmarks against other implementations of minimizers, clone this repository and run:
cargo bench
- Igor Martayan (main developer)
- Pierre Marijon