# Hierarchical Clustering

## Problem Type
**Hierarchical Clustering** is primarily used for:
- **Clustering** problems
- **Unsupervised** learning

### How Hierarchical Clustering Works
- **Hierarchical structure:** Builds a hierarchy of clusters by either merging smaller clusters into larger ones (agglomerative) or splitting larger clusters into smaller ones (divisive).
- **Agglomerative (Bottom-Up) Approach:**
  - Starts with each data point as its own cluster.
  - Iteratively merges the closest pair of clusters (as defined by the linkage criterion) until all points are in a single cluster or a stopping criterion is met.
- **Divisive (Top-Down) Approach:**
  - Starts with all data points in a single cluster.
  - Iteratively splits clusters into smaller clusters until each point is its own cluster or a stopping criterion is met.
- **Distance metrics:**
  - Measure the distance between individual data points, e.g., Euclidean, Manhattan, or cosine distance. The linkage criterion then turns these point-to-point distances into a distance between clusters.
- **Linkage criteria:**
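  - **Single Linkage:** The distance between two clusters is the minimum distance between any point in one and any point in the other.
  - **Complete Linkage:** The distance between two clusters is the maximum distance between any point in one and any point in the other.
  - **Average Linkage:** The distance between two clusters is the average distance over all pairs of points drawn from the two clusters.
  - **Ward's Method:** Merges the pair of clusters whose union causes the smallest increase in total within-cluster variance (sum of squared distances to the cluster centroids); it is defined for Euclidean distance only.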
- **Dendrogram:** A tree-like diagram that records the sequence of merges or splits and the distance at which each one happened. Cutting the dendrogram at a chosen height yields a flat clustering.
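To make the agglomerative loop and the linkage criteria concrete, here is a deliberately naive NumPy sketch. It assumes Euclidean point distances and supports only single, complete, and average linkage (Ward's method uses a variance-based update and is left out). The function name `naive_agglomerative` is hypothetical, and at roughly O(n³) time it is meant for reading, not for real data; a library implementation such as scikit-learn's `AgglomerativeClustering` is the practical choice.

```python
import numpy as np


def naive_agglomerative(X, n_clusters, linkage="single"):
    """Hypothetical teaching sketch of bottom-up clustering (not optimized)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all points
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

    # Linkage criterion: how point distances become a cluster distance
    linkage_fn = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]

    # Each point starts as its own cluster
    clusters = [[i] for i in range(len(X))]

    while len(clusters) > n_clusters:
        best = (np.inf, -1, -1)
        # Scan every pair of clusters for the closest one under the linkage
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage_fn(D[np.ix_(clusters[a], clusters[b])])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge b into a; since b > a, deleting b keeps index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    # Convert the list of member lists into one label per point
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]])
print(naive_agglomerative(X_toy, n_clusters=3, linkage="complete"))  # prints [0 0 1 1 2]
```

On the five toy points, the two tight pairs are merged first and the isolated point at (10, 0) remains a cluster of its own.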

### Key Tuning Parameters (scikit-learn `AgglomerativeClustering`)
- **`n_clusters`:**
  - **Description:** The number of clusters to find, i.e., where the hierarchy is cut to produce the final flat clustering.
  - **Impact:** Directly affects the final grouping of the data; choosing too few or too many clusters can lead to under- or over-segmentation.
  - **Default:** `2`. Must be set to `None` when `distance_threshold` is used.
- **`linkage`:**
  - **Description:** The linkage criterion used to compute the distance between clusters (`ward`, `complete`, `average`, or `single`).
  - **Impact:** Shapes the resulting hierarchy; `ward` tends to produce compact clusters, while `single` is prone to chaining, which produces elongated clusters. `ward` only works with Euclidean distance.
  - **Default:** `ward`.
- **`distance_threshold`:**
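  - **Description:** The linkage distance at or above which clusters are not merged. When set, the number of clusters is determined by where this threshold cuts the dendrogram.
  - **Impact:** Controls the height at which the dendrogram is cut; lower values result in more (smaller) clusters.
  - **Default:** `None` (merging stops once `n_clusters` clusters remain). If set, `n_clusters` must be `None`.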
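A minimal sketch of using `distance_threshold` instead of `n_clusters` with scikit-learn, assuming the same standardized Iris data as the notebook below. The threshold values 5, 10, and 20 are arbitrary choices for illustration; the resulting cluster counts depend on the data and the linkage.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(load_iris().data)

counts = {}
for threshold in (5.0, 10.0, 20.0):
    # n_clusters must be None when distance_threshold is given
    model = AgglomerativeClustering(
        n_clusters=None, linkage="ward", distance_threshold=threshold
    ).fit(X_std)
    # The number of clusters found is exposed as n_clusters_ after fitting
    counts[threshold] = model.n_clusters_
    print(f"distance_threshold={threshold:>5}: {model.n_clusters_} clusters")

# distances_ holds the linkage distance of every merge in the full tree
print("Last three merge distances:", model.distances_[-3:])
```

Lower thresholds stop merging earlier, so the cluster count can only stay the same or grow as the threshold decreases.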

### Pros vs Cons

| Pros                                                  | Cons                                                   |
|-------------------------------------------------------|--------------------------------------------------------|
| Provides a visual representation of the hierarchy (dendrogram) | Computationally expensive: typically O(n²) to O(n³) time, depending on the linkage and algorithm |
| The number of clusters can be chosen after fitting by cutting the dendrogram | Sensitive to noise and outliers (single linkage in particular can chain clusters together through them) |
| Can capture nested cluster structure at several scales | Standard implementations need O(n²) memory for pairwise distances, which limits dataset size |
| Useful for small to medium datasets                   | Choosing the right linkage and distance metric can be challenging |
| Easily interpretable hierarchy of clusters            | Greedy: merging or splitting decisions are irreversible |
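Because the full merge tree encodes every level of the hierarchy, it can be computed once and then cut at different numbers of clusters without refitting. The sketch below assumes SciPy's `scipy.cluster.hierarchy.linkage` and `fcluster` (SciPy is not used elsewhere in this notebook) on the standardized Iris data; the cluster counts 2 to 5 are arbitrary.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(load_iris().data)

# Build the full merge tree once. Each of the n_samples - 1 rows of Z is
# (cluster_i, cluster_j, merge_distance, size_of_new_cluster).
Z = linkage(X_std, method="ward")

# Cut the same tree at several cluster counts; "maxclust" returns at most k clusters
cuts = {k: fcluster(Z, t=k, criterion="maxclust") for k in (2, 3, 4, 5)}
for k, cut_labels in cuts.items():
    print(f"k={k}: {len(set(cut_labels))} distinct labels")
```

Since every cut comes from the same tree, the partitions are nested: each cluster at a finer cut lies entirely inside one cluster of any coarser cut.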

### Evaluation Metrics
- **Silhouette Score:**
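  - **Description:** For each point, compares the mean distance to the other points in its own cluster, `a`, with the mean distance to the points of the nearest other cluster, `b`: `s = (b - a) / max(a, b)`. The score is the mean of `s` over all points and ranges from -1 to 1.
  - **Good Value:** Closer to 1 indicates well-defined clusters.
  - **Bad Value:** Values near 0 suggest overlapping clusters, and negative values suggest that many points are closer to another cluster than to their own.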
- **Davies-Bouldin Index:**
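  - **Description:** For each cluster `i`, takes the largest ratio `(s_i + s_j) / d_ij` over all other clusters `j`, where `s_i` is the average distance of cluster `i`'s points to its centroid and `d_ij` is the distance between the two centroids. The index is the mean of these worst-case ratios over all clusters.
  - **Good Value:** Lower values (the minimum is 0) indicate better cluster separation and compactness.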
  - **Bad Value:** Higher values suggest poor cluster separation and compactness.
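Both metrics can be written out directly from the definitions above. This from-scratch NumPy sketch (the helper names are hypothetical) assumes Euclidean distance and the usual convention that a point in a singleton cluster has a silhouette of 0, and checks its results against scikit-learn's `silhouette_score` and `davies_bouldin_score` on a synthetic two-blob dataset.

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score, silhouette_score


def pairwise_euclidean(A, B):
    # Distance between every row of A and every row of B
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))


def silhouette_from_scratch(X, labels):
    D = pairwise_euclidean(X, X)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters keep s(i) = 0 by convention
        # a(i): mean distance to the other members of i's cluster (D[i, i] is 0)
        a = D[i, same].sum() / (same.sum() - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(D[i, labels == k].mean() for k in clusters if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


def davies_bouldin_from_scratch(X, labels):
    clusters = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in clusters])
    # s_k: average distance of cluster k's points to its centroid
    spread = np.array([
        np.linalg.norm(X[labels == k] - c, axis=1).mean()
        for k, c in zip(clusters, centroids)
    ])
    M = pairwise_euclidean(centroids, centroids)
    np.fill_diagonal(M, np.inf)  # makes R[i, i] = 0 so j == i never wins the max
    R = (spread[:, None] + spread[None, :]) / M
    # Worst-case ratio per cluster, averaged over clusters
    return R.max(axis=1).mean()


rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(3.0, 0.5, (20, 2))])
labels_demo = np.array([0] * 20 + [1] * 20)

print(silhouette_from_scratch(X_demo, labels_demo), silhouette_score(X_demo, labels_demo))
print(davies_bouldin_from_scratch(X_demo, labels_demo), davies_bouldin_score(X_demo, labels_demo))
```

Small differences in the last decimal places are possible because scikit-learn computes Euclidean distances with a different (dot-product based) formula.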



```python
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler
```

```python
# Load the Iris dataset
iris = load_iris()
X = iris.data

# Standardize the dataset
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
```

```python
# Perform agglomerative (bottom-up) hierarchical clustering with Ward linkage.
# distance_threshold=None means merging stops once n_clusters=3 clusters remain.
hierarchical_cluster = AgglomerativeClustering(
    n_clusters=3, linkage="ward", distance_threshold=None
)
labels = hierarchical_cluster.fit_predict(X_std)
```

```python
# Silhouette Score
silhouette_avg = silhouette_score(X_std, labels)
print(f"Silhouette Score: {silhouette_avg:.4f}")

# Davies-Bouldin Index
db_index = davies_bouldin_score(X_std, labels)
print(f"Davies-Bouldin Index: {db_index:.4f}")
```
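Since the choice of linkage can change the result, one way to compare criteria is to refit with each `linkage` option and score the labels with the same two metrics. This sketch assumes the notebook's setup (standardized Iris data, three clusters). Both metrics tend to favor compact, convex clusters, so they may understate linkages such as `single` that follow elongated shapes.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(load_iris().data)

results = {}
for method in ("ward", "complete", "average", "single"):
    # Refit with each linkage criterion, keeping n_clusters fixed
    method_labels = AgglomerativeClustering(n_clusters=3, linkage=method).fit_predict(X_std)
    sil = silhouette_score(X_std, method_labels)
    dbi = davies_bouldin_score(X_std, method_labels)
    results[method] = (sil, dbi)
    print(f"{method:>8}: silhouette={sil:.4f}  Davies-Bouldin={dbi:.4f}")
```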

```python
# Reduce dimensionality with PCA for visualization only;
# the clustering itself was done on all four standardized features
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

# Plot the clusters in the 2D PCA projection, colored by cluster label
plt.figure(figsize=(10, 7))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap='viridis', marker='o', edgecolor='k', s=100)
plt.title('Clusters Visualized after Hierarchical Clustering')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.colorbar(label='Cluster Label')
plt.show()
```
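The notebook plots the resulting clusters but not the dendrogram itself. A minimal sketch, assuming SciPy's `linkage` and `dendrogram` functions (SciPy is not otherwise used here), with Ward linkage on the same standardized data; `truncate_mode="lastp"` with `p=20` keeps only the top 20 branches so the 150 leaves do not crowd the axis.

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(load_iris().data)

# Full Ward merge tree for the standardized data
Z = linkage(X_std, method="ward")

plt.figure(figsize=(10, 5))
# Show only the last 20 clusters formed; leaf labels in parentheses are cluster sizes
tree = dendrogram(Z, truncate_mode="lastp", p=20)
plt.title("Ward dendrogram of the standardized Iris data (truncated to 20 leaves)")
plt.xlabel("Sample index or (cluster size)")
plt.ylabel("Linkage distance")
plt.show()
```

A horizontal line drawn across this plot at some height corresponds to a `distance_threshold`: the number of vertical branches it crosses is the number of clusters that cut produces.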