A lightweight and dependency-free PHP implementation of the HyperLogLog probabilistic data structure for approximate cardinality estimation.
HyperLogLog allows you to estimate the number of distinct elements in very large datasets while using only a small, fixed amount of memory.
- 🚀 Fast approximate distinct counting
- 📦 Zero dependencies
- 🔧 Configurable number of registers (`counterBits`)
- 🔐 Configurable hashing algorithm (`xxh3`, `sha256`, `md5`, etc.)
- 📊 Theoretical error rate calculation
- 🧮 Small and large cardinality bias corrections
- ✅ Strict types and fully documented source code
- PHP 8.0, 8.1, 8.2, 8.3, 8.4 and 8.5
- No external dependencies
Note: The `xxh3` and `xxh128` hash algorithms were added in PHP 8.1 and are available only when supported by your PHP version and build. If they are unavailable, use any other algorithm returned by `hash_algos()`, such as `sha256`, `sha512`, or `md5`.
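
The note above can be turned into a small availability check. The following is a minimal sketch, not part of the library: `pickHashAlgorithm()` is a hypothetical helper that returns the first entry of a preference list that `hash_algos()` reports as available.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): return the first
 * preferred algorithm that this PHP build actually supports.
 */
function pickHashAlgorithm(array $preferred = ['xxh3', 'xxh128', 'sha256']): string
{
    $available = hash_algos();

    foreach ($preferred as $algorithm) {
        if (in_array($algorithm, $available, true)) {
            return $algorithm;
        }
    }

    throw new RuntimeException('None of the preferred hash algorithms is available.');
}

// Prints the name of the algorithm that was selected on this build.
echo pickHashAlgorithm(), PHP_EOL;
```

The returned name could then be passed as the `hashAlgorithm` constructor argument.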
Install via Composer:

    composer require hichxm/hyperloglog

Basic usage:

    <?php
    use Hichxm\HyperLogLog\HyperLogLog;

    $hll = new HyperLogLog();
    $hll->add('apple');
    $hll->add('banana');
    $hll->add('orange');
    $hll->add('apple'); // duplicate

    echo $hll->count();

The returned value is an approximation of the number of unique elements.

    new HyperLogLog(
        int $counterBits = 5,
        string $hashAlgorithm = 'xxh3'
    );

| Parameter | Description |
|---|---|
| `counterBits` | Number of bits used to select registers. The number of registers is 2^counterBits. |
| `hashAlgorithm` | Any hashing algorithm supported by PHP's `hash()` function. |
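
To illustrate what "bits used to select registers" means, here is a generic HyperLogLog-style sketch. It is an assumption about the general technique, not the library's actual code: `registerIndexAndRank()` is a hypothetical helper that uses only the first 32 bits of the digest and assumes a 64-bit PHP build and a `counterBits` value between 1 and 31.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the library's implementation): split a hash
 * into a register index and a rank, as HyperLogLog-style sketches do.
 *
 * @return array{0: int, 1: int} [register index, rank]
 */
function registerIndexAndRank(string $value, int $counterBits, string $algorithm = 'sha256'): array
{
    // Interpret the first 4 bytes of the raw digest as an unsigned 32-bit integer.
    $word = unpack('N', hash($algorithm, $value, true))[1];

    // The top $counterBits bits select one of 2^counterBits registers.
    $restBits = 32 - $counterBits;
    $index = $word >> $restBits;

    // The rank is the 1-based position of the first 1-bit in the remaining bits
    // (or $restBits + 1 when all remaining bits are zero).
    $rest = $word & ((1 << $restBits) - 1);
    $rank = 1;
    for ($bit = $restBits - 1; $bit >= 0 && (($rest >> $bit) & 1) === 0; $bit--) {
        $rank++;
    }

    return [$index, $rank];
}

[$index, $rank] = registerIndexAndRank('apple', 5);
printf("register %d of %d, rank %d\n", $index, 2 ** 5, $rank);
```

In such a scheme each register keeps the largest rank it has seen, which is why memory grows with the number of registers rather than with the number of elements added.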
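Increasing `counterBits` improves accuracy at the cost of memory: each additional bit doubles the number of registers and reduces the theoretical standard error by a factor of √2 (about 1.41).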
Example:

    $hll = new HyperLogLog(
        counterBits: 10,
        hashAlgorithm: 'sha256'
    );

The theoretical standard error is:

    1.04 / √m

where:

    m = 2^counterBits

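
As a worked example of the formula, the sketch below evaluates it for a few register counts. `theoreticalStandardError()` is a hypothetical standalone function written for illustration; it does not call the library.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: theoretical standard error for 2^counterBits registers. */
function theoreticalStandardError(int $counterBits): float
{
    $m = 2 ** $counterBits; // number of registers

    return 1.04 / sqrt($m);
}

foreach ([5, 10, 14] as $counterBits) {
    printf(
        "counterBits=%2d  m=%5d  error=%.2f%%\n",
        $counterBits,
        2 ** $counterBits,
        theoreticalStandardError($counterBits) * 100
    );
}
```

With the default of 5 bits (32 registers) the theoretical error is roughly 18.4%; 10 bits gives 3.25% and 14 bits about 0.81%.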
Example:

    $error = $hll->theoreticalErrorRate($hll->getM());

Any algorithm supported by PHP can be used.
Examples include:

- `xxh3` (recommended when available)
- `xxh128`
- `sha256`
- `sha512`
- `md5`
- `sha1`

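You can list available algorithms using:

    print_r(hash_algos());

The public methods are used as follows:

    // Add an element to the sketch.
    $hll->add('user-123');

    // Estimate the number of distinct elements added so far.
    $count = $hll->count();

    // Theoretical standard error for the current number of registers.
    $error = $hll->theoreticalErrorRate($hll->getM());

    // Compare an estimate with a known exact count.
    $error = $hll->measureError($estimated, $actual);

HyperLogLog is well suited for: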
- Counting unique visitors
- Counting unique IP addresses
- Analytics pipelines
- Large log processing
- Stream processing
- Database statistics
- Big data applications
It is not appropriate when an exact distinct count is required.
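
One way to judge whether an approximation is acceptable is to compare an estimate with an exact count on a sample small enough to count exactly. The sketch below assumes relative error as the metric; `relativeError()` is a hypothetical helper, the source does not state how `measureError()` is defined, and the estimate used here is an illustrative placeholder rather than output from the library.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: relative error of an estimate against the exact count. */
function relativeError(float $estimated, float $actual): float
{
    if ($actual == 0.0) {
        throw new InvalidArgumentException('The exact count must be non-zero.');
    }

    return abs($estimated - $actual) / $actual;
}

// Exact distinct count of a small sample, for comparison.
$sample = ['apple', 'banana', 'orange', 'apple'];
$actual = count(array_unique($sample));

// Placeholder estimate for illustration only (not produced by the library).
$estimated = 3.2;

printf("relative error: %.1f%%\n", relativeError($estimated, $actual) * 100);
```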
MIT