# Space-Saving (Algorithm 3.3)

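Purpose: track frequent elements using at most `k` counters at any time. When all counters are in use, the element holding the minimum counter is replaced by the new element, and its count is set to `min+1`.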

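Procedure, for each incoming element `i`:
1. If `i` is already tracked, increment its counter by 1.
2. Otherwise, if a counter is free, start tracking `i` with count 1.
3. Otherwise, find the element `j` with the minimum counter, set `c[i] = c[j] + 1`, and replace `j` with `i`.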

Properties: sharp on the most frequent elements. Often more accurate than Misra-Gries (MG).
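
To make the accuracy claim concrete, here is a minimal sketch on a toy stream. The helper name `space_saving_sketch` and the stream `toy_stream` are illustrative assumptions, not part of the notebook. It checks two standard properties of Space-Saving: the counters always sum to the stream length, and each estimate overshoots the true count by no more than the smallest counter.

```python
from collections import Counter

def space_saving_sketch(stream, k):
    """Hypothetical helper: Space-Saving with at most k counters."""
    counts = {}
    for x in stream:
        if x in counts:
            counts[x] += 1                      # step 1: already tracked
        elif len(counts) < k:
            counts[x] = 1                       # step 2: free counter
        else:
            victim = min(counts, key=counts.get)
            counts[x] = counts.pop(victim) + 1  # step 3: evict the minimum

    return counts

toy_stream = list("aababcabdaeaab")   # made-up data for illustration
est = space_saving_sketch(toy_stream, k=3)
truth = Counter(toy_stream)
min_count = min(est.values())

# Estimates never undercount, and overcount by at most the minimum counter.
for item, c in est.items():
    assert truth[item] <= c <= truth[item] + min_count

# Every stream element adds exactly 1 to the total of all counters.
assert sum(est.values()) == len(toy_stream)
```

The heavy element `a` ends up with an exact count here, while the most recently admitted element carries the inherited minimum as error. This is why the estimates tend to be most reliable at the top of the ranking.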


In [None]:

from collections import Counter

def space_saving(stream, k):
    T = {}          # element -> estimated count (may overestimate)
    n = 0
    for i in stream:
        n += 1
        if i in T:
            T[i] += 1
        elif len(T) < k:
            T[i] = 1
        else:
            # find key with minimum count
            j = min(T, key=T.get)
            minc = T[j]
            # replace
            del T[j]
            T[i] = minc + 1
    return T, n

# demo
import pandas as pd
df = pd.read_csv("/mnt/data/Brighton v Man City LIVE Watchalong!_chat_log.csv", encoding="utf-8", engine="python")
stream = df["author"].astype(str).tolist()
k = 20
T, n = space_saving(stream, k)

true = Counter(stream)
cands = {u: true[u] for u in T}
print("n=", n, "candidates=", len(T))
print(sorted([(u, T[u], cands[u]) for u in T], key=lambda t: -cands[t[0]])[:20])
print("\nTop-20 ground truth:")
print(true.most_common(20))
