An efficient native implementation of the HyperLogLog cardinality estimator for Ruby


HyperLogLog for Ruby

HyperLogLog is an algorithm for estimating the cardinality of a set. The HyperLogLog strategy has several nice properties:

  1. It is near-optimal in its estimation accuracy for the memory it uses.
  2. It allows coarse tuning of the standard error you can tolerate.
  3. Its data structures are fast, easily compressed and stored, and can be recombined to estimate both the union and the intersection of multiple sets.
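
As a rough illustration of properties 2 and 3, the sketch below shows the arithmetic behind them. It relies on two general HyperLogLog facts rather than on anything specific to this gem: the standard error with m = 2^b registers is approximately 1.04 / sqrt(m), and an intersection can be estimated from a union by inclusion-exclusion. The helper names `hll_standard_error` and `intersection_estimate` are hypothetical and are not part of this library's API.

```ruby
# Hypothetical helpers for illustration; not part of the hyperloglog gem.

# Approximate relative standard error of a HyperLogLog sketch
# that uses 2**bits registers.
def hll_standard_error(bits)
  1.04 / Math.sqrt(2**bits)
end

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
# Clamped at zero because estimation noise can push the result negative.
def intersection_estimate(card_a, card_b, card_union)
  [card_a + card_b - card_union, 0].max
end

puts hll_standard_error(10)               # about 3% with 1024 registers
puts intersection_estimate(100, 80, 150)  # => 30
```

Adding registers lowers the error only with the square root of the register count, which is why the tuning is coarse. Intersection estimates inherit the error of all three inputs, so they are least reliable when the true overlap is small relative to the sets.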

The API is split into two pieces, HyperBuilder and HyperEstimator. The split is for clarity and to leave room for future performance optimizations.

Installation

gem install hyperloglog

Example

require 'hyperloglog'

# Build a new estimator
builder = HyperBuilder.new
(0..100).each { |user_id| builder.offer(user_id.to_s) }

# Read an estimator from bytes on disk
estimator = HyperEstimator.new(File.read('bytes.txt'))

# Estimate the union of our two sources
estimate = HyperEstimator.estimate(builder.estimator, estimator)

puts estimate
# => 147 (depends on the contents of bytes.txt)
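
To show why two estimators can be combined into a union estimate, here is a toy pure-Ruby HyperLogLog. It is a minimal sketch under stated assumptions, not this gem's implementation: the class `ToyHLL` and its methods are hypothetical, it hashes with MD5 from the standard library instead of Murmur3, and it stores registers in a plain array rather than a compressed bitmap. The point it illustrates is that merging two sketches by taking the register-wise maximum yields the same registers as a sketch built from all items at once.

```ruby
require 'digest'

# Toy HyperLogLog for illustration only; not the gem's implementation.
class ToyHLL
  attr_reader :registers

  def initialize(bits = 10)
    @bits = bits
    @m = 1 << bits                   # number of registers
    @registers = Array.new(@m, 0)
  end

  def offer(item)
    # Use the first 64 bits of an MD5 digest as the hash (an assumption;
    # the gem itself lists Murmur3 as its hash library).
    hash = Digest::MD5.digest(item.to_s).unpack1('Q>')
    index = hash >> (64 - @bits)                  # top bits pick a register
    rest = hash & ((1 << (64 - @bits)) - 1)       # remaining bits
    # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
    rank = (64 - @bits) - rest.bit_length + 1
    @registers[index] = rank if rank > @registers[index]
    self
  end

  # Union: keep the larger value of each pair of registers.
  def merge(other)
    merged = ToyHLL.new(@bits)
    @registers.each_with_index do |r, i|
      merged.registers[i] = [r, other.registers[i]].max
    end
    merged
  end

  def estimate
    alpha = 0.7213 / (1 + 1.079 / @m)             # bias constant for m >= 128
    raw = alpha * @m * @m / @registers.sum { |r| 2.0**(-r) }
    zeros = @registers.count(0)
    # Small-range correction: fall back to linear counting.
    if raw <= 2.5 * @m && zeros > 0
      (@m * Math.log(@m.to_f / zeros)).round
    else
      raw.round
    end
  end
end

a = ToyHLL.new
b = ToyHLL.new
(0...600).each { |i| a.offer("user-#{i}") }
(400...1000).each { |i| b.offer("user-#{i}") }
puts a.merge(b).estimate  # close to 1000, the number of distinct users
```

Because each register only ever records a maximum, offering a duplicate item never changes the sketch, and merging is order-independent.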

External Libraries Included

Murmur3 https://github.com/PeterScott/murmur3

EWAHBoolArray https://github.com/lemire/EWAHBoolArray
