Skip to content
stream count distinct element estimation
Elixir
Latest commit abb3a78 Apr 11, 2016 @rozap Update README.md
oops
Failed to load latest commit information.
config
lib
test
.gitignore
LICENSE
README.md
mix.exs
mix.lock

README.md

Spacesaving

Simple algorithm to estimate distinct elements in an unbounded stream using bounded space. The estimate is the upper bound on the element's actual count.

Docs on hex

Usage

Add it to you mix.exs deps

{:spacesaving, "~> 0.0.2"}

Init with 3 spaces, so we track 3 elements

import Spacesaving

state = init(3)

Push some elements

state = state
|> push(:foo) |> push(:foo) |> push(:foo) |> push(:foo)
|> push(:bar) |> push(:bar) |> push(:bar)
|> push(:baz) |> push(:baz)
|> push(:buzz)

Get the top k elements

top(state, 2) # This will be [foo: 4, bar: 3]
top(state, 3) # This will be [foo: 4, bar: 3, buzz: 3], so the inaccuracy starts to come into play when an element is kicked out, and the estimate is the upper bound

Merge two states

left  = init(4) |> push(:foo) |> push(:bar)
right = init(4) |> push(:foo) |> push(:baz)

merge(left, right)
|> top(3) # Would be [foo: 2, bar: 1, baz: 1]

References

Original Paper

Something went wrong with that request. Please try again.