## K-Means Clustering

K-Means is an unsupervised learning algorithm used for clustering data into groups (clusters) based on similar patterns or features.

It doesn't use labeled data. Instead, it tries to find structure or patterns in the data by grouping similar data points together.

🔶 Real-Life Example

Think of K-Means like this:

Imagine you have a box of mixed fruits (apples, bananas, oranges) but no labels. K-Means would group similar fruits together based on measurable features such as shape, color, and size, without ever knowing their actual names.

🔶 How K-Means Works (Step-by-Step)

Suppose you want to divide your data into K clusters (e.g., K=3):

1. Initialize Centroids
Randomly choose K points from the dataset as the initial centroids (the centers of the clusters).

2. Assign Points to Nearest Centroid
For each data point, calculate the distance (typically Euclidean) to each centroid, then assign the point to the cluster of the closest centroid.

3. Update Centroids
After assigning all points, recalculate the centroid of each cluster by taking the mean of the points in that cluster.

4. Repeat
Repeat steps 2 and 3 until either:

- the centroids no longer change much (convergence), or
- a maximum number of iterations is reached.
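The four steps above can be sketched directly in NumPy. This is a minimal illustration, not how scikit-learn implements K-Means: the function name `kmeans_sketch`, the `tol` and `seed` parameters, and the choice to leave an empty cluster's centroid where it was are all assumptions made for this example.

```python
import numpy as np

def kmeans_sketch(X, k, max_iters=100, tol=1e-4, seed=0):
    """Illustrative K-Means loop; names and defaults are assumptions."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    # Step 1: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(max_iters):
        # Step 2: Euclidean distance from every point to every centroid,
        # then assign each point to its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)

        # Step 3: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid (an assumption here).
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)

        # Step 4: stop once the centroids barely move.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break

    return labels, centroids


X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
labels, centroids = kmeans_sketch(X, k=2)
print(labels)
print(centroids)
```

Because the initial centroids are chosen at random, a single run like this can settle on a poor grouping, and the cluster numbering itself is arbitrary. Library implementations typically reduce this risk by running several random restarts and keeping the best result.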

```python
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Sample Data
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

# KMeans clustering
kmeans = KMeans(n_clusters=2, random_state=0)
kmeans.fit(X)

print(kmeans.labels_)  # Cluster labels
print(kmeans.cluster_centers_)  # Coordinates of centroids
```

Output:

```
[1 1 1 0 0 0]
[[10.  2.]
 [ 1.  2.]]
```
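Once a model is fitted, it can also assign new points to the learned clusters and report how tight those clusters are. The sketch below refits the same sample data so it runs on its own. The `new_points` values are made up for illustration, and `n_init=10` is an assumption added here to make the number of random restarts explicit.

```python
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

# Assign previously unseen points to the nearest learned centroid.
# These two points are hypothetical examples, not part of the sample data.
new_points = [[0, 0], [12, 3]]
new_labels = kmeans.predict(new_points)
print(new_labels)

# inertia_ is the sum of squared distances from each training point
# to the centroid of its assigned cluster (lower means tighter clusters).
print(kmeans.inertia_)
```

The label numbers themselves carry no meaning: what matters is that `[0, 0]` lands in the same cluster as the points near x=1 and `[12, 3]` lands with the points near x=10.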




🔶 Advantages
✅ Simple and easy to implement
✅ Fast and efficient on large datasets
✅ Works well when clusters are spherical and equally sized

🔶 Disadvantages
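❌ K must be chosen in advance; the algorithm does not find it for you
❌ Sensitive to outliers, since each centroid is a mean and a single extreme point can pull it away
❌ Doesn't work well with non-spherical clusters or clusters of very different sizes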
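Since K has to be chosen up front, one common heuristic (often called the elbow method, which is not part of K-Means itself) is to fit the model for several values of K and look for the point where the inertia stops dropping sharply. The sketch below is only an illustration on the small sample dataset used earlier. The range of K values and `n_init=10` are assumptions chosen for this example.

```python
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

# Fit K-Means for several candidate values of K and record the inertia
# (within-cluster sum of squared distances) for each one.
inertias = {}
for k in range(1, 5):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"K={k}: inertia={value:.2f}")
```

On this data the inertia falls steeply from K=1 to K=2 and much more slowly afterwards, which is consistent with the two visibly separate groups. On real data the elbow is often less clear, so treat it as a guide rather than a rule.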

✅ Summary
| Feature | K-Means |
|---|---|
| Type | Unsupervised |
| Goal | Group similar data |
| Input | Unlabeled data |
| Output | Cluster labels |
| Key Hyperparameter | Number of clusters (K) |