# Count-Min Sketch (CMS)

Count-Min Sketch estimates the frequency of each distinct element in a stream using a fixed amount of memory. It maintains a $d \times w$ matrix of counters: one row for each of the $d$ hash functions, with $w$ buckets per row.

* On update, the element is hashed by each of the $d$ hash functions, and the counter at the resulting bucket in each row is incremented.
* On query, the element is hashed the same way, and the minimum of the $d$ counters it maps to is returned as the estimate of its frequency.
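The update and query rules above can be sketched in pure Python. This is a minimal illustration, not river's implementation: the class name `CountMinSketch` and the use of row-prefixed BLAKE2b digests as the $d$ hash functions are assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Minimal Count-Min Sketch: d rows of w counters (illustrative only)."""

    def __init__(self, d: int, w: int):
        self.d = d
        self.w = w
        self.table = [[0] * w for _ in range(d)]

    def _bucket(self, row: int, item) -> int:
        # Prefixing the row index makes each row behave like a different hash function
        digest = hashlib.blake2b(f"{row}:{item!r}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.w

    def update(self, item, count: int = 1) -> None:
        # Increment exactly one counter in every row
        for row in range(self.d):
            self.table[row][self._bucket(row, item)] += count

    def query(self, item) -> int:
        # Collisions can only add to a counter, so the smallest one is the tightest estimate
        return min(self.table[row][self._bucket(row, item)] for row in range(self.d))


sketch_demo = CountMinSketch(d=3, w=16)
for x in [1, 2, 2, 3, 3, 3]:
    sketch_demo.update(x)
print(sketch_demo.query(3))  # at least 3; larger only if 3 collides with 1 or 2 in every row
```

Taking the minimum over rows is what keeps the estimate close: a single row may overcount because of a collision, but the element has to collide in all $d$ rows for the returned value to be inflated.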

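The estimate $\hat{c}_i$ never underestimates the true count $c_i$. With probability at least $1 - \delta$, it also overestimates by no more than $\epsilon N$, where $N$ is the total number of updates: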

$$
\hat{c}_i \ge c_i
$$

$$
\hat{c}_i \le c_i + \epsilon \cdot \lVert C \rVert_1, \quad \text{where} \ \lVert C \rVert_1 = \sum_j c_j = N
$$
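In the standard construction, this bound is obtained by choosing $w = \lceil e/\epsilon \rceil$ buckets per row and $d = \lceil \ln(1/\delta) \rceil$ rows. The helper below (`cms_dimensions` is a hypothetical name) computes both; for the parameters used in the cells below it reproduces the table shapes river prints (3 × 544 and 3 × 907), together with the additive error bound $\epsilon N$ for $N = 10{,}000$ updates.

```python
import math


def cms_dimensions(epsilon: float, delta: float) -> tuple[int, int]:
    """Return (d, w): number of hash functions (rows) and buckets per row."""
    d = math.ceil(math.log(1 / delta))  # more rows -> lower failure probability
    w = math.ceil(math.e / epsilon)     # more buckets -> smaller error width
    return d, w


N = 10_000  # total number of updates in the experiments below
for eps, delta in [(0.005, 0.05), (0.003, 0.05)]:
    d, w = cms_dimensions(eps, delta)
    print(f"eps={eps}: d={d}, w={w}, counters={d * w}, error bound eps*N={eps * N:g}")
```

Note that $\delta$ only enters logarithmically, so tightening the failure probability is cheap, whereas halving $\epsilon$ doubles the width of every row.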

```python
import random

from river import sketch

RAND = 42  # seed for the data generator and for the sketch
```

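```python
import collections

EPSILON = .005  # relative error: overestimate <= EPSILON * N with probability >= 1 - DELTA
DELTA = .05     # failure probability

rng = random.Random(RAND)

# Count-Min Sketch whose d x w table size is derived from EPSILON and DELTA
cms = sketch.Counter(epsilon=EPSILON, delta=DELTA, seed=RAND)
print("Shape of CMS table:", cms.n_tables, cms.n_slots)

# Exact reference counts to compare the sketch against
counter = collections.Counter()

vals = []
for _ in range(10000):
    v = rng.randint(-1000, 1000)  # uniform over 2001 possible values
    cms.update(v)
    counter[v] += 1
    vals.append(v)
```

```
Shape of CMS table: 3 544
```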

```python
print(f"CMS (eps={EPSILON}, delta={DELTA})", "\n")

# 7 and 532 lie in the sampled range; 1001 was never inserted
for x in [7, 532, 1001]:
    print(f"Exact count of ({x}): {counter[x]}")
    print(f"Estimated count of ({x}): {cms[x]}", "\n")

# len(cms) is the number of counters (n_tables * n_slots);
# len(counter) is the number of distinct values seen
print("Length of CMS:", len(cms))
print("Length of exact counter:", len(counter))
```

```
CMS (eps=0.005, delta=0.05) 

Exact count of (7): 6
Estimated count of (7): 16 

Exact count of (532): 5
Estimated count of (532): 11 

Exact count of (1001): 0
Estimated count of (1001): 16 

Length of CMS: 1632
Length of exact counter: 1988
```
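These results can be checked against the guarantee by hand. Every estimate is at least the exact count, and the overestimates (10, 6 and 16) stay well below $\epsilon N = 0.005 \times 10{,}000 = 50$. Since 1001 lies outside the sampled range, its estimate of 16 is entirely collision noise. The snippet below only redoes this arithmetic on the numbers copied from the output; it does not rerun the sketch.

```python
eps = 0.005
N = 10_000  # number of updates in the experiment

# (value, exact count, CMS estimate) copied from the printed output
results = [(7, 6, 16), (532, 5, 11), (1001, 0, 16)]

bound = eps * N
for value, exact, estimate in results:
    overestimate = estimate - exact
    assert estimate >= exact  # a Count-Min Sketch never underestimates
    print(f"{value}: overestimate={overestimate}, within bound {bound:g}: {overestimate <= bound}")
```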


```python
EPSILON_2 = .003
DELTA_2 = .05

# Re-seed so this sketch sees exactly the same stream as the first one
rng = random.Random(RAND)

cms_2 = sketch.Counter(epsilon=EPSILON_2, delta=DELTA_2, seed=RAND)
print("Shape of CMS table:", cms_2.n_tables, cms_2.n_slots)

counter_2 = collections.Counter()

vals = []
for _ in range(10000):
    v = rng.randint(-1000, 1000)
    cms_2.update(v)
    counter_2[v] += 1
    vals.append(v)
```

```
Shape of CMS table: 3 907
```

```python
print(f"CMS (eps={EPSILON_2}, delta={DELTA_2})", "\n")

for x in [7, 532, 1001]:
    print(f"Exact count of ({x}): {counter_2[x]}")
    print(f"Estimated count of ({x}): {cms_2[x]}", "\n")

# Number of counters in the sketch vs. number of distinct values seen
print("Length of CMS:", len(cms_2))
print("Length of exact counter:", len(counter_2))
```

```
CMS (eps=0.003, delta=0.05) 

Exact count of (7): 6
Estimated count of (7): 6 

Exact count of (532): 5
Estimated count of (532): 8 

Exact count of (1001): 0
Estimated count of (1001): 3 

Length of CMS: 2721
Length of exact counter: 1988
```
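Comparing the two runs shows the trade-off controlled by $\epsilon$. Lowering it from 0.005 to 0.003 shrinks the error bound $\epsilon N$ from 50 to 30, and the observed overestimates drop accordingly (0, 3 and 3 versus 10, 6 and 16), but the table grows from 1632 to 2721 counters. With only 1988 distinct values in this stream, the second sketch stores more counters than the exact counter has keys, so in terms of stored entries it offers no saving here; the sketch pays off when the number of distinct items greatly exceeds $d \times w$. The snippet below uses only numbers printed above.

```python
N = 10_000       # updates per run
distinct = 1988  # len(counter), identical in both runs

# (eps, d, w, overestimates for 7, 532 and 1001) taken from the outputs above
runs = [
    (0.005, 3, 544, [10, 6, 16]),
    (0.003, 3, 907, [0, 3, 3]),
]

for eps, d, w, over in runs:
    counters = d * w
    print(
        f"eps={eps}: counters={counters} "
        f"({counters / distinct:.2f}x distinct keys), "
        f"bound={eps * N:g}, max observed overestimate={max(over)}"
    )
```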
