Rolling Hash C++ Library

Randomized rolling hash functions in C++

License: Apache 2.0

What is this?

This is a set of C++ classes implementing various recursive n-gram hashing techniques, also called rolling hashing (http://en.wikipedia.org/wiki/Rolling_hash), including:

  • Randomized Karp-Rabin (sometimes called Rabin-Karp)
  • Hashing by Cyclic Polynomials (also known as Buzhash)
  • Hashing by Irreducible Polynomials

This library is used by khmer: the in-memory nucleotide sequence k-mer engine.

These are randomized hash functions, meaning that each time you create a new hasher instance, you will get new hash values for a given input.
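
To illustrate what "rolling" and "randomized" mean here, the following is a minimal, self-contained sketch of a Karp-Rabin style rolling hash. It is not the library's implementation: `ToyKarpRabin`, `kBase`, `rolling_matches_direct`, the multiplier 37, and the seeded `std::mt19937` character table are all assumptions made for the example. Arithmetic is done modulo 2^32 through unsigned wraparound.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Assumed multiplier for this sketch; any odd constant would do.
constexpr std::uint32_t kBase = 37;

// Toy randomized Karp-Rabin hasher (hypothetical, for illustration only).
struct ToyKarpRabin {
    std::size_t n;                     // window length
    std::uint32_t top;                 // kBase^(n-1) modulo 2^32
    std::vector<std::uint32_t> table;  // one random value per byte

    ToyKarpRabin(std::size_t n_, std::uint32_t seed) : n(n_), top(1), table(256) {
        std::mt19937 rng(seed);  // a different seed gives a different hash function
        for (auto &v : table) v = static_cast<std::uint32_t>(rng());
        for (std::size_t i = 1; i < n; ++i) top *= kBase;
    }

    // Hash of the window s[pos, pos + n), computed from scratch in O(n).
    std::uint32_t direct(const std::string &s, std::size_t pos) const {
        std::uint32_t h = 0;
        for (std::size_t i = 0; i < n; ++i)
            h = h * kBase + table[static_cast<unsigned char>(s[pos + i])];
        return h;
    }

    // Slide the window by one character in O(1): drop `out`, append `in`.
    std::uint32_t roll(std::uint32_t h, char out, char in) const {
        h -= top * table[static_cast<unsigned char>(out)];
        return h * kBase + table[static_cast<unsigned char>(in)];
    }
};

// Returns true if the rolled hash equals the from-scratch hash at every position.
bool rolling_matches_direct(const std::string &s, std::size_t n, std::uint32_t seed) {
    assert(n > 0 && s.size() >= n);
    ToyKarpRabin kr(n, seed);
    std::uint32_t h = kr.direct(s, 0);
    for (std::size_t pos = 1; pos + n <= s.size(); ++pos) {
        h = kr.roll(h, s[pos - 1], s[pos + n - 1]);
        if (h != kr.direct(s, pos)) return false;
    }
    return true;
}
```

Calling `rolling_matches_direct("GATTACAGATTACA", 3, 42u)` checks that the constant-time update agrees with recomputing each window. Two `ToyKarpRabin` objects built with the same seed agree on every input, while different seeds draw different character tables and so define different hash functions.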

Code sample

    #include "cyclichash.h"

    const uint n(3); // hash all sequences of 3 characters
    const uint L(7); // hash values use 7 bits
    CyclicHash<uint32> hf(n, L); // for 64-bit values, replace uint32 with uint64
    for (uint32 k = 0; k < n; ++k) {
        chartype c = ... ; // grab some character
        hf.eat(c); // feed it to the hasher
    }
    while (...) { // go over your string
        hf.hashvalue; // at all times, this member contains the current hash value
        chartype c = ... ; // the next character
        chartype out = ... ; // the character we want to forget
        hf.update(out, c); // update the hash value
    }
    hf.reset(); // you can now hash a new string
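
The same eat/update pattern can be sketched on a concrete string without the library. The block below uses a toy cyclic-polynomial (Buzhash-style) hasher on 32-bit words. `ToyCyclicHash`, `rotl32`, and `hash_all_windows` are hypothetical names for this example; they only mimic the `eat`, `update`, `reset`, and `hashvalue` members shown above and should not be read as the library's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Rotate a 32-bit word left by r bits (r is reduced modulo 32).
static std::uint32_t rotl32(std::uint32_t x, unsigned r) {
    r &= 31u;
    return (x << r) | (x >> ((32u - r) & 31u));
}

// Toy cyclic-polynomial hasher (hypothetical, for illustration only).
class ToyCyclicHash {
public:
    std::uint32_t hashvalue = 0;

    ToyCyclicHash(unsigned n, std::uint32_t seed) : n_(n), table_(256) {
        std::mt19937 rng(seed);  // random value per byte
        for (auto &v : table_) v = static_cast<std::uint32_t>(rng());
    }
    // Feed one character while filling the first window.
    void eat(unsigned char c) { hashvalue = rotl32(hashvalue, 1) ^ table_[c]; }
    // Slide the window: forget `out`, take in `in`.
    void update(unsigned char out, unsigned char in) {
        hashvalue = rotl32(hashvalue, 1) ^ rotl32(table_[out], n_) ^ table_[in];
    }
    void reset() { hashvalue = 0; }

private:
    unsigned n_;                        // window length
    std::vector<std::uint32_t> table_;  // character hash table
};

// Hash every window of length n in s; returns one value per window.
std::vector<std::uint32_t> hash_all_windows(const std::string &s, unsigned n,
                                            std::uint32_t seed) {
    std::vector<std::uint32_t> out;
    if (n == 0 || s.size() < n) return out;
    ToyCyclicHash hf(n, seed);
    for (unsigned k = 0; k < n; ++k) hf.eat(static_cast<unsigned char>(s[k]));
    out.push_back(hf.hashvalue);  // hash of the first window
    for (std::size_t i = n; i < s.size(); ++i) {
        hf.update(static_cast<unsigned char>(s[i - n]),
                  static_cast<unsigned char>(s[i]));
        out.push_back(hf.hashvalue);  // hash of the window ending at i
    }
    return out;
}
```

For example, `hash_all_windows("abcabc", 3, 1u)` yields four values, one for each of the windows `abc`, `bca`, `cab`, `abc`, and the first and last values are equal because the two windows contain the same characters.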

Requirements

A recent C++ compiler: either GNU GCC or Clang.

What should I do after I download it?

Build everything by typing:

    make

then run the unit tests:

    ./unit

then run the speed tests:

    ./speedtesting

References

  • Daniel Lemire and Owen Kaser, Recursive n-gram hashing is pairwise independent, at best, Computer Speech & Language 24 (4), 2010, pages 698-710. http://arxiv.org/abs/0705.4676
  • Daniel Lemire, The universality of iterated hashing over variable-length strings, Discrete Applied Mathematics 160 (4-5), 2012. http://arxiv.org/abs/1008.1715
  • Owen Kaser and Daniel Lemire, Strongly universal string hashing is fast, Computer Journal 57 (11), 2014, pages 1624-1638. http://arxiv.org/abs/1202.4961

This work has been used in genomics, see