## Online K-Means


Online K-Means (or Incremental K-Means) is a variant of K-Means that updates cluster centers incrementally as new data arrives, instead of reprocessing the full dataset each time.


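🔹 Key Idea

Each incoming point x is assigned to its nearest centroid μₖ, and only that centroid is moved a small step toward x: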

       μₖ ← μₖ + η (x - μₖ)

 where:
 
   • μₖ: current centroid of cluster k

   • x:  new data point assigned to cluster k

   • η:  learning rate, typically set as η = 1 / nₖ, with nₖ the number of samples assigned so far to cluster k
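
 A short derivation shows why η = 1 / nₖ is a natural choice. It assumes that nₖ already counts the new point x, and that μₖ was the mean of the previous nₖ − 1 points; the superscript "new" is notation introduced here only for illustration:

```latex
\begin{aligned}
\mu_k^{\text{new}} &= \mu_k + \frac{1}{n_k}\,(x - \mu_k) \\
                   &= \frac{n_k - 1}{n_k}\,\mu_k + \frac{1}{n_k}\,x
\end{aligned}
```

 Under those assumptions the right-hand side is the sum of all nₖ points divided by nₖ, so the updated centroid is the running mean of the points assigned to cluster k.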

 This approach allows centroids to adjust incrementally to streaming or sequential data.
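
 The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `online_kmeans_update`, the starting centroids, and the choice to treat each starting centroid as having already seen one point are all assumptions made for the example.

```python
import numpy as np

def online_kmeans_update(centroids, counts, x):
    """Assign x to its nearest centroid and move that centroid toward x (in place)."""
    # Index of the nearest centroid by squared Euclidean distance
    k = int(np.argmin(((centroids - x) ** 2).sum(axis=1)))
    counts[k] += 1
    eta = 1.0 / counts[k]                     # learning rate: eta = 1 / n_k
    centroids[k] += eta * (x - centroids[k])  # mu_k <- mu_k + eta * (x - mu_k)
    return k

# Hypothetical starting centroids; each is assumed to have seen one point already
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
counts = np.array([1, 1])

# Points arrive one at a time, as in a stream
for x in np.array([[2.0, 0.0], [4.0, 0.0], [10.0, 12.0]]):
    online_kmeans_update(centroids, counts, x)

print(centroids)
```

 With these inputs the first centroid ends at (2, 0), the mean of (0, 0), (2, 0) and (4, 0), and the second ends at (10, 11), the mean of (10, 10) and (10, 12). Only the centroids and the per-cluster counts are kept between updates.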

🔹 Benefits

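	•	Handles streaming data: each point is processed as it arrives

	•	Requires less memory: the update needs only the centroids and per-cluster counts, not the full dataset

	•	Suitable for large datasets that do not fit in memory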
	


🔹 Summary
	•	Batch K-Means: Recomputes centroids from all data on each iteration.
	•	Online K-Means: Updates centroids incrementally as points arrive, which suits real-time or streaming ML.

In [1]:

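from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data: 10,000 points drawn around 3 blob centers
X, _ = make_blobs(n_samples=10000, centers=3, n_features=2, random_state=42)

batch_size = 100
kmeans = MiniBatchKMeans(n_clusters=3, batch_size=batch_size)

# Feed the data in chunks of 100, as if it were arriving as a stream
for start in range(0, len(X), batch_size):
    kmeans.partial_fit(X[start:start + batch_size])  # incremental centroid update

print(kmeans.cluster_centers_)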

[[ 4.64247762  1.97434973]
 [-6.86356135 -6.85601522]
 [-2.51797296  9.01214406]]
