Given a table of size $n$, I want to find all the elements that occur at least $\frac{n}{k}$ times. This is a naïve implementation that does not use universal hashing.

---

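Example with $k = 2$: the Boyer-Moore majority vote algorithm. It returns only a candidate, which is guaranteed to be the answer only when some element occurs more than $\frac{n}{2}$ times; the random table used here almost certainly has no such element.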

In [2]:
import random

T = [ random.randrange(50) for x in range(1000) ]

In [17]:
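def boyermoore(T):
    c = 0  # vote counter for the current candidate
    v = 0  # current candidate
    for i in T:
        if c == 0:
            # no candidate is ahead: adopt the current element
            v = i
            c += 1
        elif i == v:
            c += 1
        else:
            # a different element cancels one vote
            c -= 1
    return v  # a candidate only; a second pass is needed to confirm it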

boyermoore(T)

1
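The value returned above is only a candidate, so a second pass is needed before calling it a majority element. A minimal sketch of that check follows; the names `majority_candidate` and `majority` are illustrative and not part of the original notebook.

```python
def majority_candidate(items):
    """Boyer-Moore voting pass: returns a candidate, not a guaranteed majority."""
    candidate, counter = None, 0
    for item in items:
        if counter == 0:
            candidate, counter = item, 1
        elif item == candidate:
            counter += 1
        else:
            counter -= 1
    return candidate


def majority(items):
    """Return the element occurring more than len(items) / 2 times, or None."""
    items = list(items)
    candidate = majority_candidate(items)
    # Second pass: confirm that the candidate really is a majority element.
    if items and items.count(candidate) * 2 > len(items):
        return candidate
    return None
```

Calling `majority(T)` on the random table would be expected to return `None`, since no value is likely to fill more than half of the 1000 slots.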

---

First, I build the count-min sketch structure.

In [1]:
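class CMS:
    def __init__(self, l, b):
        self.b = b  # number of counters per row
        # l rows of b counters, all initialised to zero
        self.M = [ [ 0 for j in range(b) ] for i in range(l) ]

    def inc(self, x):
        # Naïve: every row uses the same hash function, so all rows stay identical
        for r in self.M:
            r[hash(x) % self.b] += 1

    def count(self, x):
        # The minimum over the rows is the estimate; it never undercounts
        return min([ r[hash(x) % self.b] for r in self.M ])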
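Because `CMS` applies the same hash function to every row, the extra rows add no information. One possible variant, sketched here under assumptions, gives each row its own function of the form $((a \cdot x + b) \bmod p) \bmod m$. The class name `HashedCMS`, the `seed` parameter, and the choice $p = 2^{61} - 1$ are illustrative and do not come from the notebook.

```python
import random


class HashedCMS:
    """Count-min sketch with one independently drawn hash function per row."""

    PRIME = 2**61 - 1  # assumed modulus: a Mersenne prime larger than the keys

    def __init__(self, rows, width, seed=None):
        rng = random.Random(seed)
        self.width = width
        self.table = [[0] * width for _ in range(rows)]
        # One (a, b) pair per row, with a != 0.
        self.params = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(rows)
        ]

    def _columns(self, x):
        # Map the key to a non-negative integer, then hash it once per row.
        key = hash(x) % self.PRIME
        for a, b in self.params:
            yield ((a * key + b) % self.PRIME) % self.width

    def inc(self, x):
        for row, col in zip(self.table, self._columns(x)):
            row[col] += 1

    def count(self, x):
        # Minimum over the rows: may overcount, never undercounts.
        return min(row[col] for row, col in zip(self.table, self._columns(x)))
```

With distinct functions per row, two keys that collide in one row are unlikely to collide in all of them, which is what makes taking the minimum useful.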

Now I call `inc` on every element of the table; `count` then gives an approximation of the number of times each element appears.

In [21]:
M = CMS(10, 50) # 10 rows of 50 counters => 500 counters (|T| = 1000)
for x in T:
    M.inc(x)
{ x: M.count(x) for x in T }

{49: 24,
 25: 29,
 9: 26,
 35: 17,
 40: 28,
 5: 15,
 16: 23,
 36: 28,
 19: 16,
 43: 19,
 7: 23,
 14: 14,
 31: 17,
 47: 17,
 10: 15,
 20: 19,
 44: 17,
 32: 18,
 22: 22,
 23: 24,
 26: 16,
 45: 23,
 13: 23,
 29: 25,
 48: 20,
 17: 24,
 3: 21,
 34: 19,
 39: 25,
 37: 22,
 41: 22,
 33: 23,
 2: 11,
 24: 14,
 8: 13,
 4: 29,
 42: 16,
 38: 17,
 27: 21,
 11: 22,
 6: 18,
 28: 21,
 1: 19,
 18: 20,
 12: 21,
 15: 21,
 21: 17,
 46: 22,
 30: 15,
 0: 9}

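I check the results against the exact counts. The estimates match exactly here because the values lie in `range(50)`, each row has 50 counters, and in CPython `hash(x) == x` for small non-negative integers, so no two values collide.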

In [16]:
v = dict()
for x in T:
    v[x] = v.setdefault(x, 0) + 1
v

{49: 24,
 25: 29,
 9: 26,
 35: 17,
 40: 28,
 5: 15,
 16: 23,
 36: 28,
 19: 16,
 43: 19,
 7: 23,
 14: 14,
 31: 17,
 47: 17,
 10: 15,
 20: 19,
 44: 17,
 32: 18,
 22: 22,
 23: 24,
 26: 16,
 45: 23,
 13: 23,
 29: 25,
 48: 20,
 17: 24,
 3: 21,
 34: 19,
 39: 25,
 37: 22,
 41: 22,
 33: 23,
 2: 11,
 24: 14,
 8: 13,
 4: 29,
 42: 16,
 38: 17,
 27: 21,
 11: 22,
 6: 18,
 28: 21,
 1: 19,
 18: 20,
 12: 21,
 15: 21,
 21: 17,
 46: 22,
 30: 15,
 0: 9}
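To get back to the original question, the counts still have to be compared with the threshold $\frac{n}{k}$. A minimal sketch of that last step is shown here; the function name `heavy_hitters` and its `estimate` parameter are illustrative. For simplicity it keeps the set of distinct elements in memory, which a streaming setting would avoid.

```python
from collections import Counter


def heavy_hitters(items, k, estimate=None):
    """Return the elements whose (estimated) count is at least len(items) / k."""
    items = list(items)
    if estimate is None:
        # Default to exact counting when no sketch is supplied.
        exact = Counter(items)
        estimate = exact.__getitem__
    threshold = len(items) / k
    return {x for x in set(items) if estimate(x) >= threshold}
```

Passing `estimate=M.count` would use the sketch instead of exact counts. Since a count-min sketch never undercounts, the result would contain every true heavy hitter, possibly along with some false positives.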