Skip to content

lemire/rollinghashjava

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 

Rolling hash functions in Java

License: Apache 2.0

What is this?

This is a set of Java classes implementing various recursive n-gram hashing techniques, also called rolling hashing (http://en.wikipedia.org/wiki/Rolling_hash), including:

  • Randomized Karp-Rabin (sometimes called Rabin-Karp)
  • Hashing by Cyclic Polynomials (also known as Buzhash)

Code sample

String s = "some string";
int n = 3; //hash all sequences of 3 characters
CyclicHash ch = new CyclicHash(n);
int k = 0;
for(; k<n;++k) {
  ch.eat(s.charAt(k));
}
int rollinghash = ch.eat(s.charAt(k)); // the first or last 32-(n-1) bits are strongly universal
// do something with the hash value
for(;k<s.length();++k) {
  rollinghash = ch.update(s.charAt(k-n), s.charAt(k));
  // do something with the hash value
}

Reference

Daniel Lemire, Owen Kaser: Recursive n-gram hashing is pairwise independent, at best, Computer Speech & Language, Volume 24, Issue 4, October 2010, Pages 698-710 http://arxiv.org/abs/0705.4676