Probabilistic Aggregations for Jet

A collection of probabilistic aggregations. Currently implemented aggregations: HyperLogLog++

HyperLogLog++ Aggregation for Jet

Memory-efficient cardinality estimator

Getting Started

HyperLogLog is a probabilistic data structure for estimating cardinality. In plain English: you feed HyperLogLog with items and it can tell you approximately how many unique items it has received.

It's extremely memory efficient. The usual way to count unique elements is to store them in a set, e.g. a HashSet. However, this has space complexity O(n), where n is the number of unique elements. HyperLogLog has constant space complexity: its memory use is fixed up front and does not grow with the number of items.
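To make the constant-space claim concrete, here is a minimal sketch of plain HyperLogLog in Java. It is an illustration only: it is not the code used by this project, it implements the basic algorithm rather than the "++" variant, and the class name SimpleHyperLogLog and the hash64 helper (FNV-1a followed by the MurmurHash3 finalizer) are stand-ins chosen for the example.

```java
import java.nio.charset.StandardCharsets;

/** Minimal plain-HyperLogLog sketch; illustrative only, not this project's implementation. */
final class SimpleHyperLogLog {
    private final int precision;     // number of hash bits that select a register
    private final byte[] registers;  // fixed-size state: 2^precision one-byte registers

    SimpleHyperLogLog(int precision) {
        if (precision < 7 || precision > 16) {
            throw new IllegalArgumentException("precision must be in [7, 16]");
        }
        this.precision = precision;
        this.registers = new byte[1 << precision];
    }

    /** Stand-in 64-bit hash (an assumption): FNV-1a over UTF-8 bytes, then the MurmurHash3 finalizer. */
    static long hash64(String item) {
        long h = 0xcbf29ce484222325L;
        for (byte b : item.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    /** Records one 64-bit hash; feeding the same hash again leaves the state unchanged. */
    void addHash(long hash) {
        int index = (int) (hash >>> (64 - precision));  // top bits pick the register
        long rest = hash << precision;                   // remaining bits, left-aligned
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - precision) + 1;
        if (rank > registers[index]) {
            registers[index] = (byte) rank;              // keep the largest rank seen
        }
    }

    /** Returns the estimated number of distinct hashes seen so far. */
    long estimate() {
        int m = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r);                  // 2^(-r)
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);       // bias constant, valid for m >= 128
        double raw = alpha * m * m / sum;
        if (raw <= 2.5 * m && zeros > 0) {
            // Small cardinalities: fall back to linear counting on the empty registers.
            return Math.round(m * Math.log((double) m / zeros));
        }
        return Math.round(raw);
    }

    public static void main(String[] args) {
        SimpleHyperLogLog hll = new SimpleHyperLogLog(14);
        for (int round = 0; round < 2; round++) {        // every item is fed twice
            for (int i = 0; i < 100_000; i++) {
                hll.addHash(hash64("item-" + i));
            }
        }
        System.out.println("estimated unique items: " + hll.estimate());
    }
}
```

With precision 14 the whole state is a 16 KiB byte array, no matter whether it has seen a thousand items or a billion. A HashSet holding the same items would have to keep every distinct item in memory.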

What's the trade-off? Precision. The result is an estimate with some error rate. However, for many applications approximate data is fine if it means extremely low memory consumption.
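As a rough guide to that trade-off, the relative standard error of plain HyperLogLog is commonly approximated as 1.04 / sqrt(m), where m is the number of registers. The sketch below just evaluates that formula. The precision parameter (m = 2^precision) and the class name HllError are assumptions for illustration; the precision this project actually uses is not stated here.

```java
/** Evaluates the usual error approximation for plain HyperLogLog; illustrative only. */
final class HllError {
    /** Approximate relative standard error for 2^precision registers. */
    static double standardError(int precision) {
        int m = 1 << precision;          // number of registers
        return 1.04 / Math.sqrt(m);
    }

    public static void main(String[] args) {
        // Print how the error shrinks as more registers (memory) are used.
        for (int p = 10; p <= 16; p += 2) {
            System.out.printf("precision=%d registers=%d error=%.2f%%%n",
                    p, 1 << p, 100 * standardError(p));
        }
    }
}
```

Each extra bit of precision doubles the memory, but it only divides the error by about 1.41, so halving the error costs four times the memory.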

Imagine you are processing a web server access log and you want to tell how many unique IPs visited your site. An error rate of 1-2% is often irrelevant.
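For comparison, this is roughly what the exact approach looks like for the access log case. It is a sketch under assumptions: the log lines are made up, the client address is assumed to be the first whitespace-separated field (as in the Common Log Format), and UniqueIpBaseline is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Exact unique-IP count with a HashSet; memory grows with the number of distinct IPs. */
final class UniqueIpBaseline {
    static int countUniqueIps(List<String> logLines) {
        Set<String> seen = new HashSet<>();
        for (String line : logLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;                                // skip blank lines
            }
            // Assumption: the first whitespace-separated field is the client address.
            seen.add(trimmed.split("\\s+", 2)[0]);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Made-up sample lines using documentation-only IP ranges.
        List<String> lines = Arrays.asList(
                "192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] \"GET / HTTP/1.1\" 200 512",
                "198.51.100.7 - - [10/Oct/2023:13:55:37 +0000] \"GET /about HTTP/1.1\" 200 1024",
                "192.0.2.1 - - [10/Oct/2023:13:55:40 +0000] \"GET /contact HTTP/1.1\" 200 256");
        System.out.println("unique IPs: " + countUniqueIps(lines));
    }
}
```

The result is exact, but every distinct address stays in memory. That cost is what the HyperLogLog aggregation avoids, in exchange for a small error.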

You can read more about the data structure in the Wikipedia article on HyperLogLog.


The aggregation is typically used in two stages:

  1. A mapping step that transforms your stream entries into 64-bit hashes
  2. An aggregation step that estimates the number of unique hashes

In practice it looks like this:

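import com.hazelcast.jet.contrib.probabilistic.*;

// "stage" is a placeholder for an existing pipeline stage that carries your items
stage.mapUsingContext(HashingSupport.hashingContextFactory(), HashingSupport.hashingFn()) // hash items into 64-bit hashes
     .aggregate(ProbabilisticAggregations.hyperLogLog()) // actual aggregation: estimate the number of unique hashes
     .drainTo(Sinks.mySink()); // write the cardinality into your sink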


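The artifacts are published to Maven repositories.

Add the following lines to your pom.xml to include it as a dependency in your project:

<dependency>
    <groupId>com.hazelcast.jet.contrib</groupId>
    <artifactId>probabilistic</artifactId>
    <version>${version}</version>
</dependency>

If you are using Gradle:

compile group: 'com.hazelcast.jet.contrib', name: 'probabilistic', version: ${version}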

Running the tests

To run the tests, use the command below:

./gradlew test



This project is licensed under the Apache 2.0 license - see the LICENSE file for details
