Skip to content
A simple count-min-sketch in rust.
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
src
tests
.gitignore
.travis.yml
Cargo.toml
LICENSE
readme.markdown

readme.markdown

basic count-min sketch

It's the count-min sketch structure implemented in rust.

You can use it to count frequencies of events with some degree of uncertainty. It uses sub-linear space, but allows for possible hash collisions that could lead to over counting.

Conceptually, it's like a bloom filter, but for multi-sets rather than strict sets.

In order to use it, you need to provide an implementation of the trait basiccms::IntoSketch that describes how to turn some type into the underlying u64 used by the hash buckets.

Ostensibly we could just use an implementation like this:

impl<T: Hash> IntoSketch for T {
    fn asu64(&self) -> u64 {
        let mut hasher = SipHasher::new();
        self.hash(&mut hasher);

        hasher.finish()
    }
}

But that precludes occasions where we might not want an extra hashing pass, and there's an "obvious" way to convert to a u64 (for example a u32 might just get widened...). It's not defined in the crate itself, because so many types implement Hash it would cause annoying collisions, but no reason you can't add it yourself in another scope.

Once you've actually got a method of putting your data into the sketch by implementing the trait, you can use it heterogenuously if you feel like it (unlike, say sketchy):

extern crate basiccms;

use basiccms::Sketch;

#[test]
fn we_should_be_able_to_add_heterogenuously () {
        let mut sketch = Sketch::new(0.0001, 0.8);

        sketch.add(1); sketch.add(2); sketch.add(3);
        sketch.add("foo"); sketch.add("bar"); sketch.add("quux");

        assert_eq!(1, sketch.point("foo"));
        sketch.add("foo");

        assert_eq!(2, sketch.point("foo"));
}

At the moment all that's offered is the sketch.point point query method, which returns the class "min count" frequency estimation.

Patches welcome!

You can’t perform that action at this time.