# **📊 Clustering in Unsupervised Learning**

- 🔍 No labeled data — model finds patterns on its own
- 🎯 Groups similar data points into clusters
- 🧠 Used for customer segmentation, image compression, anomaly detection
- 📦 Algorithms try to minimize intra-cluster distance and maximize inter-cluster distance
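
The last point can be made concrete with a small NumPy sketch. The toy points, the labels, and the helper name `cluster_distances` are illustrative assumptions and are not part of this notebook. It measures the average distance of points to their own centroid (intra-cluster) and the smallest distance between two centroids (inter-cluster).

```python
import numpy as np

# Toy 2-D data: two well-separated groups (illustrative values only).
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                   [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels = np.array([0, 0, 0, 1, 1, 1])

def cluster_distances(points, labels):
    """Return (mean intra-cluster distance to centroid, min inter-centroid distance)."""
    ids = np.unique(labels)
    centroids = np.array([points[labels == c].mean(axis=0) for c in ids])
    # Distance of each point to the centroid of its own cluster.
    own = centroids[np.searchsorted(ids, labels)]
    intra = np.linalg.norm(points - own, axis=1).mean()
    # Pairwise distances between centroids; keep the smallest off-diagonal entry.
    diff = centroids[:, None, :] - centroids[None, :, :]
    pair = np.linalg.norm(diff, axis=2)
    inter = pair[np.triu_indices(len(ids), k=1)].min()
    return intra, inter

intra, inter = cluster_distances(points, labels)
print(f"intra: {intra:.3f}, inter: {inter:.3f}")
```

A good clustering tends to give a small first number and a large second one.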

---

## ⚙️ KMeans Clustering

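- 📍 Selects `k` initial centroids (at random, or with `k-means++` seeding, which is the scikit-learn default)
- 🔁 Assigns each point to its nearest centroid
- 📌 Recalculates each centroid as the mean of its assigned points, then repeats until the assignments stop changing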
- 🚀 Fast and scalable, but needs `k` predefined
- ⚠️ Sensitive to outliers and initial centroid positions
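
The assign-and-update loop described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not scikit-learn's implementation: the function name `kmeans_sketch`, the synthetic two-blob data, and the purely random initialization are all made up for the example.

```python
import numpy as np

def kmeans_sketch(data, k, n_iters=100, seed=0):
    """Minimal k-means loop with random initialization (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic data: two Gaussian blobs centred near (0, 0) and (5, 5).
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans_sketch(data, k=2)
print(centroids)
```

Changing `seed` changes the starting centroids, which is one way to see the sensitivity to initialization mentioned in the list.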

---

## 🧬 Agglomerative Clustering

- 🌲 Bottom-up approach (starts with each point as its own cluster)
- 🔗 Merges closest clusters step by step
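- 🧱 Forms a hierarchy (dendrogram) that can be cut at any level, so `k` can be chosen after the tree is built
- 🐢 Slower on large datasets (time and memory grow at least quadratically with the number of points), but the dendrogram is easy to interpret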
- 🎓 Good for hierarchical relationships
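
To see the merge hierarchy itself, one option is SciPy's `linkage` and `fcluster`, sketched below. The synthetic data and the choice of Ward linkage are assumptions for illustration; this notebook itself uses scikit-learn's `AgglomerativeClustering`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic data: two small groups of 5 points each (illustrative only).
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.3, (5, 2)), rng.normal(3, 0.3, (5, 2))])

# Each row of Z records one merge: the two cluster ids that were joined,
# the distance between them, and the size of the new cluster.
Z = linkage(data, method="ward")

# Cut the finished hierarchy into 2 flat clusters (fcluster labels start at 1).
flat_labels = fcluster(Z, t=2, criterion="maxclust")
print(Z.shape)
print(flat_labels)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the tree, and calling `fcluster` with a different `t` gives a different number of clusters without refitting.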

---

## 🪓 Divisive Clustering

- 🔝 Top-down approach (starts with all data in one cluster)
- ✂️ Recursively splits clusters into smaller ones
- 🧩 Less common than agglomerative
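- 🧮 Can be computationally intensive: `n` points can be split in two in `2^(n-1) - 1` ways, so practical methods rely on heuristics
- 🔍 Useful when the global structure of the data is better understood than the local structure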
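
One common heuristic for top-down splitting is bisecting k-means, sketched here as an illustration. It is a swapped-in technique and only one of several divisive strategies. The helper name `divisive_sketch`, the synthetic data, and the rule of always splitting the cluster with the largest spread are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_sketch(data, n_clusters, seed=0):
    """Top-down clustering: repeatedly split the cluster with the largest spread."""
    labels = np.zeros(len(data), dtype=int)  # start with everything in cluster 0
    for new_id in range(1, n_clusters):
        # Choose the cluster with the largest within-cluster sum of squares.
        sse = [((data[labels == c] - data[labels == c].mean(axis=0)) ** 2).sum()
               for c in range(new_id)]
        target = int(np.argmax(sse))
        idx = np.flatnonzero(labels == target)
        # Split it with 2-means; one half keeps the old id, the other gets new_id.
        halves = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(data[idx])
        labels[idx[halves == 1]] = new_id
    return labels

# Synthetic data: three blobs of 20 points each (illustrative only).
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(c, 0.3, (20, 2)) for c in (0, 4, 10)])
labels = divisive_sketch(data, n_clusters=3)
print(np.bincount(labels))
```

Recent scikit-learn releases also ship a `BisectingKMeans` estimator built on the same idea.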

---

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score, davies_bouldin_score, silhouette_score

In [None]:
# Load the Iris dataset (150 samples, 4 numeric features)

iris = load_iris()
x = iris.data
print(x)

In [None]:
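# KMeans clustering
# The result is stored under its own name so the agglomerative labels do not overwrite it.

kmeans = KMeans(n_clusters=3, random_state=75)
kmeans_labels = kmeans.fit_predict(x)
print(kmeans_labels)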

In [None]:
# Agglomerative clustering (Ward linkage by default)
# These labels are the ones used in the plot and the metrics.

model = AgglomerativeClustering(n_clusters=3)
labels = model.fit_predict(x)
print(labels)

In [None]:
# Project the 4-D features onto 2 principal components for plotting

pca = PCA(n_components=2)
x_reduced = pca.fit_transform(x)
print(x_reduced)

In [None]:
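# Visualization: PCA projection, colored by cluster label

plt.figure(figsize=(7, 5))
plt.scatter(x_reduced[:, 0], x_reduced[:, 1], c=labels)
plt.xlabel("Principal component 1")
plt.ylabel("Principal component 2")
plt.title("Iris clusters (PCA projection)")
plt.grid()
plt.show()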

In [None]:
# Model performance

silhouette = silhouette_score(x, labels)          # in [-1, 1], higher is better
davies_bouldin = davies_bouldin_score(x, labels)  # >= 0, lower is better
ari = adjusted_rand_score(iris.target, labels)    # agreement with the true species labels
print(f"Silhouette Score: {silhouette:.3f}\nDavies-Bouldin Score: {davies_bouldin:.3f}\nAdjusted Rand Score: {ari:.3f}")
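
Because KMeans needs `k` up front, the silhouette score can also be used to compare several candidate values. The sketch below is one possible way to do that. The range `2` to `6` and `n_init=10` are arbitrary assumptions, and `random_state=75` is reused from the KMeans cell.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

data = load_iris().data

# Fit KMeans for each candidate k and record the silhouette score.
scores = {}
for k in range(2, 7):
    candidate_labels = KMeans(n_clusters=k, n_init=10, random_state=75).fit_predict(data)
    scores[k] = silhouette_score(data, candidate_labels)

for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")

# The k with the highest silhouette score is one reasonable candidate.
best_k = max(scores, key=scores.get)
print(f"Best k by silhouette: {best_k}")
```

The silhouette score only measures geometric separation, so the `k` it prefers need not match the number of true classes (three species for Iris). It is worth checking against other metrics or domain knowledge.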