# 2.3 - Unsupervised Learning & Clustering: Finding Hidden Patterns

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/madeforai/madeforai/blob/main/docs/understanding-ai/module-2/2.3-unsupervised-clustering.ipynb)

---

**Discover patterns without labels: master clustering algorithms that find structure in the wild.**

## 📚 What You'll Learn

- **Unsupervised learning fundamentals**: When you don't have labels and why that's powerful
- **K-Means clustering**: The workhorse algorithm for grouping similar data
- **DBSCAN**: Density-based clustering for complex, non-spherical shapes
- **Hierarchical clustering**: Building cluster trees to understand data structure
- **Evaluation metrics**: How to measure cluster quality without ground truth
- **Real-world applications**: Customer segmentation, anomaly detection, data exploration

## ⏱️ Estimated Time
40-45 minutes

## 📋 Prerequisites
- Completed Chapter 2.2 (Classification vs Regression)
- Understanding of supervised learning concepts
- Basic familiarity with distance metrics

## 🤔 The World Without Labels

Imagine you're exploring a massive dataset from a new e-commerce platform. You have:
- Customer behavior data
- Purchase patterns
- Browsing history

**But here's the catch**: Nobody has labeled this data. No "high-value customer" tags, no "product category" classifications.

**Your mission**: Find meaningful groups of similar customers to personalize marketing.

This is **unsupervised learning**: discovering structure without guidance.

### Why Unsupervised Learning?

**Reality check**: Most real-world data is unlabeled.

| Scenario | Labeling Challenge |
|----------|--------------------|
| **Medical imaging** | Requires expert radiologists (expensive!) |
| **Customer data** | Behavior emerges over time (can't pre-label) |
| **Anomaly detection** | Anomalies haven't happened yet |
| **Data exploration** | You don't know what patterns exist |

**Unsupervised learning says**: "Let the data speak for itself."

<!-- [PLACEHOLDER IMAGE]
Prompt for image generation:
"Create a comparison illustration showing supervised vs unsupervised learning.
Style: Modern, clean infographic.
Left side (Supervised): Labeled data points with clear categories (red circles, blue squares), connected with arrows to predictions. Show 'Training Labels' box feeding into the model.
Right side (Unsupervised): Unlabeled gray points naturally clustering into groups with dotted boundaries around them. Show 'No Labels' with a question mark.
Center: Arrow showing transformation from unlabeled to discovered clusters.
Color scheme: Blue/orange gradient.
Include title: 'Supervised vs Unsupervised Learning'
Format: Horizontal comparison, 16:9 ratio." -->

**The Core Challenge**: How do we measure success when there's no "correct answer" to compare against?

**Answer**: We evaluate based on:
1. **Cohesion**: How similar are items within a cluster?
2. **Separation**: How distinct are different clusters?
3. **Business value**: Do clusters correspond to actionable insights?
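To make the first two criteria concrete, here is a minimal sketch that measures cohesion and separation on a tiny hand-made dataset. The data, the variable names, and the specific definitions used (mean distance to the own-cluster centroid for cohesion, centroid-to-centroid distance for separation) are illustrative assumptions; the formal metrics come later in this chapter.

```python
import numpy as np

# Two tiny, hand-made groups (illustrative data, not the chapter's dataset)
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])

# One centroid per cluster: the mean of the points assigned to it
centroids = np.array([points[labels == k].mean(axis=0) for k in np.unique(labels)])

# Cohesion: average distance from each point to its own centroid (smaller = tighter)
cohesion = np.mean(np.linalg.norm(points - centroids[labels], axis=1))

# Separation: distance between the two centroids (larger = more distinct)
separation = np.linalg.norm(centroids[0] - centroids[1])

print(f"cohesion={cohesion:.2f}, separation={separation:.2f}")
```

Here the clusters are tight (cohesion of 1.0) relative to how far apart they sit (separation of 10.0), which is what a good clustering looks like numerically.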

Let's dive in! 🚀

In [None]:
# Setup: Install and import libraries
# Uncomment if running in Google Colab
# !pip install numpy pandas matplotlib seaborn scikit-learn plotly scipy -q

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import cdist

from sklearn.datasets import make_blobs, make_moons, make_circles
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.metrics import (
    silhouette_score, 
    davies_bouldin_score,
    calinski_harabasz_score
)
from sklearn.decomposition import PCA

# Visualization settings
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
warnings.filterwarnings('ignore')
np.random.seed(42)

# Set up better figure defaults
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

print("✅ Libraries loaded successfully!")
print("📘 Module 2.3: Unsupervised Learning & Clustering")
print("🔍 Ready to discover hidden patterns!")

## 📋 Part 1: K-Means Clustering - The Classic Algorithm

### How K-Means Works

K-Means is like organizing a party:

1. **Choose K hosts** (cluster centers) randomly
2. **Each guest** sits with their nearest host
3. **Hosts move** to the center of their group
4. **Repeat** until hosts stop moving

**Mathematically**:
- Minimize within-cluster sum of squares (WCSS)
- Each point assigned to nearest centroid
- Centroids updated as mean of assigned points

**Key Insight**: K-Means assumes spherical clusters of similar size. When this assumption breaks, K-Means struggles!
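Written out, the objective described in the bullets above is the within-cluster sum of squares. The notation is introduced here for illustration only: $C_k$ is the set of points assigned to cluster $k$ and $\mu_k$ is that cluster's centroid.

$$
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
$$

Both the assignment step and the centroid update can only lower $J$ or leave it unchanged, which is why the procedure converges, although possibly to a local rather than a global minimum.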

### The Algorithm in 4 Steps

<!-- [PLACEHOLDER IMAGE]
Prompt for image generation:
"Create a 4-panel sequential illustration showing K-Means algorithm steps.
Style: Educational diagram with clean geometry.
Panel 1 (Step 1): Random scattered data points (gray) with 3 randomly placed centroids (red stars). Title: '1. Initialize K centroids'
Panel 2 (Step 2): Points colored by nearest centroid (3 different colors). Dotted lines connecting points to nearest centroid. Title: '2. Assign points to nearest centroid'
Panel 3 (Step 3): Centroids moved to center of their colored groups. Arrows showing movement. Title: '3. Update centroids to cluster means'
Panel 4 (Step 4): Final stable clusters with well-separated groups. Title: '4. Repeat until convergence'
Color scheme: Blue, orange, green for the 3 clusters.
Include legend showing centroid and data point symbols.
Format: 2x2 grid layout, square overall aspect ratio." -->

Let's implement this from scratch to understand it deeply!
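Below is a minimal from-scratch sketch of the four steps, written with NumPy only. The function name `kmeans_from_scratch`, its parameters, and the toy data are illustrative choices, not part of scikit-learn. Unlike `KMeans`, this sketch uses plain random initialization and a single start, so treat it as a learning aid rather than a replacement.

```python
import numpy as np

def kmeans_from_scratch(X, k, n_iters=100, seed=0):
    """Plain Lloyd's algorithm: assign points, move centroids, repeat until stable."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data: two well-separated groups of three points each
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
demo_labels, demo_centroids = kmeans_from_scratch(X_demo, k=2)
print("labels:   ", demo_labels)
print("centroids:", demo_centroids)
```

The first three points end up in one cluster and the last three in the other. Which group receives label 0 depends on the random initialization, which is one reason cluster IDs should never be read as meaningful categories.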

In [None]:
# Generate sample data: Customer segments based on purchase behavior
# Features: Average purchase value and purchase frequency
X, y_true = make_blobs(
    n_samples=300,
    centers=4,
    n_features=2,
    cluster_std=0.60,
    random_state=42
)

# Scale the data (important for distance-based algorithms!)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Visualize raw data
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1], alpha=0.6, c='gray', s=50)
plt.title('Unlabeled Customer Data', fontsize=14, fontweight='bold')
plt.xlabel('Average Purchase Value ($)', fontsize=12)
plt.ylabel('Purchase Frequency (per month)', fontsize=12)
plt.grid(alpha=0.3)

plt.subplot(1, 2, 2)
# 'husl' is a seaborn palette, not a matplotlib colormap, so use 'viridis' here
plt.scatter(X[:, 0], X[:, 1], alpha=0.6, c=y_true, cmap='viridis', s=50)
plt.title('True Groups (Unknown in Real World!)', fontsize=14, fontweight='bold')
plt.xlabel('Average Purchase Value ($)', fontsize=12)
plt.ylabel('Purchase Frequency (per month)', fontsize=12)
plt.colorbar(label='True Cluster')
plt.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("💡 Notice: We can see patterns by eye, but K-Means will find them mathematically!")

### Choosing K: The Elbow Method

**The hardest question**: How many clusters should we have?

**The Elbow Method**:
1. Try different values of K (e.g., 2-10)
2. Calculate inertia (within-cluster sum of squares) for each K
3. Plot inertia vs K
4. Look for the "elbow": the point where adding more clusters gives diminishing returns

**Think of it like**: Adding more party hosts. Initially, each new host dramatically improves the party (less crowding). Eventually, adding hosts barely helps.

The "elbow" is where the benefit plateaus.

In [None]:
# The Elbow Method: Finding optimal K
inertias = []
silhouette_scores = []
K_range = range(2, 11)

for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(X_scaled)
    
    # Inertia: sum of squared distances to nearest cluster center
    inertias.append(kmeans.inertia_)
    
    # Silhouette score: measures how similar points are to their own cluster
    # vs other clusters (range: -1 to 1, higher is better)
    silhouette_scores.append(silhouette_score(X_scaled, kmeans.labels_))

# Visualize the elbow
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Elbow curve (Inertia)
axes[0].plot(K_range, inertias, 'bo-', linewidth=2.5, markersize=8)
axes[0].set_xlabel('Number of Clusters (K)', fontsize=13, fontweight='bold')
axes[0].set_ylabel('Inertia (WCSS)', fontsize=13, fontweight='bold')
axes[0].set_title('Elbow Method: Inertia', fontsize=15, fontweight='bold', pad=15)
axes[0].grid(alpha=0.3)
axes[0].axvline(x=4, color='red', linestyle='--', alpha=0.7, label='Optimal K=4')
axes[0].legend(fontsize=11)

# Plot 2: Silhouette score
axes[1].plot(K_range, silhouette_scores, 'go-', linewidth=2.5, markersize=8)
axes[1].set_xlabel('Number of Clusters (K)', fontsize=13, fontweight='bold')
axes[1].set_ylabel('Silhouette Score', fontsize=13, fontweight='bold')
axes[1].set_title('Silhouette Analysis', fontsize=15, fontweight='bold', pad=15)
axes[1].grid(alpha=0.3)
axes[1].axvline(x=4, color='red', linestyle='--', alpha=0.7, label='Peak at K=4')
axes[1].legend(fontsize=11)

plt.tight_layout()
plt.show()

print("\n📊 Analysis:")
print(f"• Elbow appears around K=4 (inertia curve flattens)")
print(f"• Silhouette score peaks at K=4 ({silhouette_scores[2]:.3f})")
print(f"• Both methods agree: K=4 is optimal!")
print("\n💡 Tip: When methods disagree, consider business context!")

### Applying K-Means with Optimal K

In [None]:
# Apply K-Means with K=4
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
clusters = kmeans.fit_predict(X_scaled)
centroids = kmeans.cluster_centers_

# Transform centroids back to original scale for interpretation
centroids_original = scaler.inverse_transform(centroids)

# Visualize results
plt.figure(figsize=(14, 6))

plt.subplot(1, 2, 1)
scatter = plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap='viridis',  # valid matplotlib colormap
                     alpha=0.6, s=50, edgecolors='black', linewidth=0.5)
plt.scatter(centroids_original[:, 0], centroids_original[:, 1], 
           c='red', marker='X', s=300, edgecolors='black', linewidth=2,
           label='Centroids')
plt.title('K-Means Clustering Result (K=4)', fontsize=15, fontweight='bold', pad=15)
plt.xlabel('Average Purchase Value ($)', fontsize=12)
plt.ylabel('Purchase Frequency (per month)', fontsize=12)
plt.legend(fontsize=11)
plt.colorbar(scatter, label='Cluster')
plt.grid(alpha=0.3)

plt.subplot(1, 2, 2)
# Show confusion-style comparison with true labels
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_true, clusters)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=True)
plt.title('Cluster Assignments vs True Labels', fontsize=15, fontweight='bold', pad=15)
plt.xlabel('K-Means Cluster', fontsize=12)
plt.ylabel('True Label', fontsize=12)

plt.tight_layout()
plt.show()

# Interpret cluster centers
print("\n🎯 Customer Segment Insights:")
print("="*60)
for i, center in enumerate(centroids_original):
    print(f"\n📊 Cluster {i}:")
    print(f"   • Average Purchase: ${center[0]:.2f}")
    print(f"   • Purchase Frequency: {center[1]:.2f} times/month")
    print(f"   • Size: {np.sum(clusters == i)} customers")
    
    # Business interpretation
    if center[0] > X[:, 0].mean() and center[1] > X[:, 1].mean():
        print(f"   💎 **Premium Frequent Buyers** - High value, high engagement")
    elif center[0] > X[:, 0].mean() and center[1] < X[:, 1].mean():
        print(f"   🎯 **Occasional Big Spenders** - High value, low frequency")
    elif center[0] < X[:, 0].mean() and center[1] > X[:, 1].mean():
        print(f"   🔄 **Frequent Small Buyers** - Low value, high engagement")
    else:
        print(f"   😴 **Low Engagement** - Consider re-engagement campaigns")

## 📋 Part 2: DBSCAN - Density-Based Clustering

### When K-Means Fails

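K-Means has a serious limitation: it implicitly assumes roughly spherical clusters of similar size, so it splits shapes such as crescents or rings incorrectly.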

**DBSCAN (Density-Based Spatial Clustering of Applications with Noise)** solves this by:
- Finding regions of high density
- Connecting nearby dense regions
- Marking sparse points as noise/outliers

**Key Parameters**:
- `eps` (epsilon): Maximum distance between two points to be neighbors
- `min_samples`: Minimum points to form a dense region

**Advantages over K-Means**:
- ✅ Handles arbitrary shapes (moons, rings, spirals)
- ✅ Automatically detects outliers
- ✅ No need to specify number of clusters

**Disadvantages**:
- ❌ Sensitive to parameter tuning
- ❌ Struggles with varying densities
- ❌ Doesn't scale as well to very high dimensions
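To make `eps` and `min_samples` concrete, the sketch below counts neighbors by hand and compares the result with scikit-learn's `DBSCAN`. The toy data and variable names are illustrative assumptions. Note that scikit-learn counts a point as its own neighbor when checking `min_samples`.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative data: a tight group of five points plus one isolated point
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [0.1, 0.1], [0.05, 0.05], [5.0, 5.0]])
eps, min_samples = 0.3, 4

# Count how many points lie within eps of each point (the point itself included)
pairwise = np.linalg.norm(X_demo[:, None, :] - X_demo[None, :, :], axis=2)
neighbor_counts = (pairwise <= eps).sum(axis=1)

# A point is a core point when its eps-neighborhood holds at least min_samples points
is_core = neighbor_counts >= min_samples

labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_demo)
print("neighbor counts:", neighbor_counts)
print("core points:    ", is_core)
print("DBSCAN labels:  ", labels)
```

The five nearby points each see five neighbors, so they are all core points and form cluster 0. The isolated point sees only itself and is labeled `-1` (noise).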

<!-- [PLACEHOLDER IMAGE]
Prompt for image generation:
"Create a side-by-side comparison showing K-Means vs DBSCAN on non-spherical data.
Style: Educational comparison diagram.
Left panel: Two crescent moon shapes interlocked. K-Means clustering result showing incorrect vertical split (failing to capture moon shapes). Title: 'K-Means Fails on Non-Spherical Data'
Right panel: Same moon shapes correctly clustered by DBSCAN, each moon in different color. Outliers shown as black dots. Title: 'DBSCAN Handles Complex Shapes'
Both panels show cluster boundaries with dotted lines.
Color scheme: Blue and orange for clusters, black for outliers.
Include legend explaining core points, border points, and noise.
Format: Side-by-side, 16:9 ratio." -->

Let's see DBSCAN in action!

In [None]:
# Generate complex-shaped data that K-Means can't handle
X_moons, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
X_circles, _ = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=42)

# Scale data
X_moons_scaled = StandardScaler().fit_transform(X_moons)
X_circles_scaled = StandardScaler().fit_transform(X_circles)

datasets = [
    (X_moons_scaled, X_moons, 'Two Moons'),
    (X_circles_scaled, X_circles, 'Concentric Circles')
]

fig, axes = plt.subplots(2, 3, figsize=(16, 10))

for idx, (X_ds_scaled, X_orig, title) in enumerate(datasets):
    # Note: the loop variable is deliberately not named X_scaled, so the scaled
    # customer data from Part 1 stays intact for the later sections.
    # Original data
    axes[idx, 0].scatter(X_orig[:, 0], X_orig[:, 1], alpha=0.6, c='gray', s=30)
    axes[idx, 0].set_title(f'{title}\n(Unlabeled)', fontsize=13, fontweight='bold')
    axes[idx, 0].grid(alpha=0.3)
    
    # K-Means (will fail!)
    kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
    kmeans_labels = kmeans.fit_predict(X_ds_scaled)
    axes[idx, 1].scatter(X_orig[:, 0], X_orig[:, 1], c=kmeans_labels, 
                        cmap='viridis', alpha=0.6, s=30)
    axes[idx, 1].set_title(f'K-Means (K=2)\n❌ Poor Results', 
                           fontsize=13, fontweight='bold', color='red')
    axes[idx, 1].grid(alpha=0.3)
    
    # DBSCAN (will succeed!)
    dbscan = DBSCAN(eps=0.3, min_samples=5)
    dbscan_labels = dbscan.fit_predict(X_ds_scaled)
    
    # Separate core points from non-core points (border points and noise)
    core_samples_mask = np.zeros_like(dbscan_labels, dtype=bool)
    core_samples_mask[dbscan.core_sample_indices_] = True
    
    # Plot DBSCAN results ('husl' is not a matplotlib colormap, so use viridis)
    unique_labels = set(dbscan_labels)
    colors = plt.cm.viridis(np.linspace(0, 1, len(unique_labels)))
    
    for k, col in zip(unique_labels, colors):
        if k == -1:
            # Noise points (outliers)
            col = 'black'
            marker = 'x'
        else:
            marker = 'o'
        
        class_member_mask = (dbscan_labels == k)
        xy = X_orig[class_member_mask & core_samples_mask]
        axes[idx, 2].scatter(xy[:, 0], xy[:, 1], c=[col], marker=marker,
                            s=30, alpha=0.6, edgecolors='black', linewidth=0.5)
        
        xy = X_orig[class_member_mask & ~core_samples_mask]
        axes[idx, 2].scatter(xy[:, 0], xy[:, 1], c=[col], marker=marker,
                            s=15, alpha=0.3)
    
    n_clusters = len(set(dbscan_labels)) - (1 if -1 in dbscan_labels else 0)
    n_noise = list(dbscan_labels).count(-1)
    
    axes[idx, 2].set_title(f'DBSCAN\n✅ Clusters: {n_clusters}, Noise: {n_noise}', 
                          fontsize=13, fontweight='bold', color='green')
    axes[idx, 2].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n🎯 Key Observations:")
print("="*60)
print("✅ DBSCAN correctly identifies complex shapes")
print("✅ Automatically flags outliers as noise (label -1)")
print("❌ K-Means forces spherical clusters, misses the pattern")
print("\n💡 Lesson: Choose your algorithm based on data shape!")

## 📋 Part 3: Hierarchical Clustering - Building Cluster Trees

### The Family Tree of Data

Hierarchical clustering builds a tree (dendrogram) showing how data points group at different scales.

**Two Approaches**:

1. **Agglomerative (Bottom-Up)**:
   - Start: Each point is its own cluster
   - Repeat: Merge the two closest clusters
   - End: All points in one cluster

2. **Divisive (Top-Down)**:
   - Start: All points in one cluster
   - Repeat: Split clusters recursively
   - End: Each point is its own cluster

**Linkage Methods** (how to measure cluster distance):
- **Single**: Minimum distance between any two points, one from each cluster
- **Complete**: Maximum distance between any two points, one from each cluster
- **Average**: Average distance over all cross-cluster pairs of points
- **Ward**: Minimizes within-cluster variance (most common)
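The linkage definitions above can be checked by hand on two tiny clusters. This is a sketch with made-up data and variable names. The Ward quantity shown is the increase in within-cluster sum of squares caused by merging, which is the criterion Ward linkage minimizes; library implementations may report a rescaled version of it as the merge height.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two small illustrative clusters lying on a line (hypothetical data)
cluster_a = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster_b = np.array([[4.0, 0.0], [6.0, 0.0]])

# All cross-cluster point-to-point distances, shape (2, 2)
pairwise = cdist(cluster_a, cluster_b)

single = pairwise.min()     # closest cross-cluster pair
complete = pairwise.max()   # farthest cross-cluster pair
average = pairwise.mean()   # mean over all cross-cluster pairs

# Ward: increase in within-cluster sum of squares if the two clusters merge
n_a, n_b = len(cluster_a), len(cluster_b)
centroid_gap = np.linalg.norm(cluster_a.mean(axis=0) - cluster_b.mean(axis=0))
ward_increase = (n_a * n_b) / (n_a + n_b) * centroid_gap ** 2

print(f"single={single}, complete={complete}, average={average}, ward={ward_increase}")
```

Even on four points the methods disagree (3.0, 6.0, and 4.5 for single, complete, and average), which is why the choice of linkage can change the shape of the resulting tree.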

**Advantages**:
- ✅ No need to specify K upfront
- ✅ Produces hierarchy showing relationships
- ✅ Deterministic (same result every time)

**Disadvantages**:
- ❌ Computationally expensive (O(n³) or O(n²) with optimization)
- ❌ Can't undo merges (greedy)

<!-- [PLACEHOLDER IMAGE]
Prompt for image generation:
"Create an educational diagram showing hierarchical clustering dendrogram.
Style: Professional scientific illustration.
Top: Dendrogram (tree diagram) showing hierarchical merging of clusters. Vertical axis labeled 'Distance', horizontal axis showing data point labels.
Different height levels with horizontal lines indicating merge points.
Bottom: Corresponding data points at the leaf level, connected to tree above.
Show cutoff line (dashed red) at different heights resulting in 2, 3, and 4 clusters.
Color-code clusters with consistent colors from dendrogram to data points.
Include annotations explaining: 'Cutting the tree at different heights gives different number of clusters'
Color scheme: Professional blue-green gradient for tree, multicolor for clusters.
Format: Vertical layout optimized for understanding tree structure." -->

In [None]:
# Use our original customer data
# Perform hierarchical clustering
linkage_matrix = linkage(X_scaled, method='ward')

# Create the dendrogram
plt.figure(figsize=(14, 7))

plt.subplot(1, 2, 1)
dendrogram(linkage_matrix, 
          truncate_mode='lastp',  # Show only the last p merged clusters
          p=30,  # Show last 30 merges
          leaf_rotation=90,
          leaf_font_size=10)
plt.title('Hierarchical Clustering Dendrogram', fontsize=15, fontweight='bold', pad=15)
plt.xlabel('Sample Index or (Cluster Size)', fontsize=12)
plt.ylabel('Distance (Ward Linkage)', fontsize=12)
plt.axhline(y=6, color='red', linestyle='--', linewidth=2, label='Cut at 4 clusters')
plt.legend(fontsize=11)
plt.grid(alpha=0.3)

# Apply hierarchical clustering with 4 clusters
hierarchical = AgglomerativeClustering(n_clusters=4, linkage='ward')
hierarchical_labels = hierarchical.fit_predict(X_scaled)

plt.subplot(1, 2, 2)
scatter = plt.scatter(X[:, 0], X[:, 1], c=hierarchical_labels, 
                     cmap='viridis', alpha=0.6, s=50, edgecolors='black', linewidth=0.5)
plt.title('Hierarchical Clustering Result (K=4)', fontsize=15, fontweight='bold', pad=15)
plt.xlabel('Average Purchase Value ($)', fontsize=12)
plt.ylabel('Purchase Frequency (per month)', fontsize=12)
plt.colorbar(scatter, label='Cluster')
plt.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Dendrogram Interpretation:")
print("="*60)
print("• Height of merge = dissimilarity between clusters")
print("• Cut the tree at different heights → different K")
print("• We cut at height ~6 to get 4 distinct clusters")
print("\n💡 Pro Tip: Use the dendrogram to choose optimal K!")

## 📋 Part 4: Evaluating Clustering Quality

### The Challenge: No Ground Truth

Unlike supervised learning, we don't have labels to check against. So how do we know if clustering is good?

### Internal Validation Metrics

**1. Silhouette Score** (Range: -1 to 1, higher is better)
- Measures how similar a point is to its own cluster vs. other clusters
- Formula: `s = (b - a) / max(a, b)`
  - `a` = average distance to points in same cluster
  - `b` = average distance to points in nearest other cluster
- **Interpretation**:
  - Close to +1: Well clustered
  - Close to 0: On cluster boundary
  - Negative: Probably in wrong cluster
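As a worked example of the formula, the sketch below computes the silhouette value of a single point by hand and compares it with scikit-learn's `silhouette_samples`. The four-point dataset and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Illustrative 1-D data: two clusters of two points each
X_demo = np.array([[0.0], [1.0], [5.0], [6.0]])
labels = np.array([0, 0, 1, 1])

# Silhouette for the first point (x = 0)
a = np.mean([abs(0.0 - 1.0)])                  # mean distance to the rest of its own cluster
b = np.mean([abs(0.0 - 5.0), abs(0.0 - 6.0)])  # mean distance to the nearest other cluster
s_manual = (b - a) / max(a, b)

s_sklearn = silhouette_samples(X_demo, labels)[0]
print(f"a={a}, b={b}, manual={s_manual:.4f}, sklearn={s_sklearn:.4f}")
```

With `a = 1.0` and `b = 5.5`, the score is `4.5 / 5.5 ≈ 0.818`: the point is much closer to its own cluster than to the neighboring one, so it counts as well clustered.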

**2. Davies-Bouldin Index** (Lower is better)
- Ratio of within-cluster to between-cluster distances
- Simpler than silhouette, less computationally expensive

**3. Calinski-Harabasz Index** (Higher is better)
- Ratio of between-cluster to within-cluster variance
- Also called Variance Ratio Criterion

**⚠️ Important**: Metrics can disagree! Use multiple metrics + domain knowledge.
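The two ratio-based indices can also be reproduced by hand on a toy example. This sketch uses made-up data and variable names, and the Davies-Bouldin shortcut shown is valid only for exactly two clusters (with more clusters, each cluster takes its worst ratio against any other and the results are averaged).

```python
import numpy as np
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

# Illustrative data: two compact clusters of two points each
X_demo = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])

n, k = len(X_demo), len(np.unique(labels))
overall_mean = X_demo.mean(axis=0)
centroids = np.array([X_demo[labels == c].mean(axis=0) for c in range(k)])

# Calinski-Harabasz: between-cluster dispersion over within-cluster dispersion
between = sum(np.sum(labels == c) * np.sum((centroids[c] - overall_mean) ** 2)
              for c in range(k))
within = sum(np.sum((X_demo[labels == c] - centroids[c]) ** 2) for c in range(k))
ch_manual = (between / (k - 1)) / (within / (n - k))

# Davies-Bouldin for two clusters: summed scatter over centroid distance,
# where scatter is the mean distance from a cluster's points to its centroid
scatter = [np.mean(np.linalg.norm(X_demo[labels == c] - centroids[c], axis=1))
           for c in range(k)]
db_manual = (scatter[0] + scatter[1]) / np.linalg.norm(centroids[0] - centroids[1])

print(f"CH: manual={ch_manual:.2f}, sklearn={calinski_harabasz_score(X_demo, labels):.2f}")
print(f"DB: manual={db_manual:.2f}, sklearn={davies_bouldin_score(X_demo, labels):.2f}")
```

For this data the Calinski-Harabasz index is 50.0 (high, good) and the Davies-Bouldin index is 0.2 (low, good), so both agree that the clusters are compact and well separated.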

In [None]:
# Compare all three clustering methods
methods = {
    'K-Means': clusters,
    'DBSCAN (on blobs)': DBSCAN(eps=0.5, min_samples=5).fit_predict(X_scaled),
    'Hierarchical': hierarchical_labels
}

results = []

print("\n🎯 Clustering Method Comparison")
print("="*70)

for name, labels in methods.items():
    # Skip if all points are noise (DBSCAN edge case)
    if len(set(labels)) <= 1:
        continue
    
    # Calculate metrics
    silhouette = silhouette_score(X_scaled, labels)
    davies_bouldin = davies_bouldin_score(X_scaled, labels)
    calinski = calinski_harabasz_score(X_scaled, labels)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    
    results.append({
        'Method': name,
        'Clusters': n_clusters,
        'Silhouette': silhouette,
        'Davies-Bouldin': davies_bouldin,
        'Calinski-Harabasz': calinski
    })

# Create comparison DataFrame
results_df = pd.DataFrame(results)
print(results_df.to_string(index=False))

print("\n📊 Metric Interpretation:")
print("="*70)
print("Silhouette Score: Higher is better (max = 1.0)")
print("Davies-Bouldin Index: Lower is better (min = 0.0)")
print("Calinski-Harabasz Index: Higher is better")

# Visualize silhouette analysis for K-Means
from sklearn.metrics import silhouette_samples
import matplotlib.cm as cm

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Silhouette plot
silhouette_vals = silhouette_samples(X_scaled, clusters)
y_lower = 10

for i in range(4):
    cluster_silhouette_vals = silhouette_vals[clusters == i]
    cluster_silhouette_vals.sort()
    
    size_cluster = cluster_silhouette_vals.shape[0]
    y_upper = y_lower + size_cluster
    
    color = cm.viridis(float(i) / 4)  # matplotlib.cm has no 'husl' colormap
    axes[0].fill_betweenx(np.arange(y_lower, y_upper),
                         0, cluster_silhouette_vals,
                         facecolor=color, edgecolor=color, alpha=0.7)
    
    axes[0].text(-0.05, y_lower + 0.5 * size_cluster, str(i), 
                fontsize=12, fontweight='bold')
    
    y_lower = y_upper + 10

axes[0].set_xlabel('Silhouette Coefficient', fontsize=13, fontweight='bold')
axes[0].set_ylabel('Cluster', fontsize=13, fontweight='bold')
axes[0].set_title('Silhouette Plot for K-Means', fontsize=15, fontweight='bold', pad=15)
axes[0].axvline(x=silhouette_score(X_scaled, clusters), color='red', 
               linestyle='--', linewidth=2, label=f'Average: {silhouette_score(X_scaled, clusters):.3f}')
axes[0].legend(fontsize=11)
axes[0].grid(alpha=0.3)

# Metric comparison bar chart
x_pos = np.arange(len(results_df))
axes[1].bar(x_pos - 0.2, results_df['Silhouette'], 0.4, 
           label='Silhouette (↑)', alpha=0.8, color='skyblue')
axes[1].bar(x_pos + 0.2, 1 / (1 + results_df['Davies-Bouldin']), 0.4, 
           label='1/(1+DB) (↑)', alpha=0.8, color='lightcoral')
axes[1].set_xticks(x_pos)
axes[1].set_xticklabels(results_df['Method'], rotation=15, ha='right')
axes[1].set_ylabel('Normalized Score', fontsize=12)
axes[1].set_title('Method Comparison (Higher is Better)', fontsize=15, fontweight='bold', pad=15)
axes[1].legend(fontsize=11)
axes[1].grid(alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\n💡 Best Practice: Combine metrics with visual inspection!")

## 🎯 Exercise 1: Real-World Anomaly Detection

**Objective**: Use clustering to detect anomalies in credit card transactions

**Scenario**: You have transaction data with features:
- Transaction amount
- Time of day
- Merchant category

**Task**:
1. Generate synthetic transaction data with some outliers
2. Apply DBSCAN to identify normal vs. anomalous transactions
3. Visualize the results and calculate the anomaly rate
4. Bonus: Try different `eps` values and see how it affects detection

<details>
<summary>üí° Hint 1: Generating data</summary>

```python
# Normal transactions
normal_transactions = np.random.normal([50, 12], [20, 3], (200, 2))

# Anomalies (unusual amounts/times)
anomalies = np.random.uniform([200, 1], [500, 23], (20, 2))

# Combine
X_transactions = np.vstack([normal_transactions, anomalies])
```
</details>

<details>
<summary>üí° Hint 2: DBSCAN parameters</summary>

Start with `eps=0.5` and `min_samples=5`. Points labeled as -1 are anomalies!
</details>

**Expected Output**: 
- A scatter plot showing normal transactions in clusters and anomalies as black crosses
- Anomaly detection rate (should be roughly 9% if you used the hint, since 20 of the 220 points are injected anomalies)

In [None]:
# Your code here!
# Detect anomalies in transaction data






## 🎯 Exercise 2: Customer Segmentation Strategy

**Objective**: Build a complete customer segmentation pipeline

**Dataset**: Generate synthetic e-commerce data with:
- Recency: Days since last purchase
- Frequency: Number of purchases
- Monetary: Total amount spent

This is called **RFM Analysis** in marketing!

**Tasks**:
1. Generate RFM data for 500 customers
2. Scale the features (important!)
3. Use the elbow method to find optimal K
4. Apply K-Means clustering
5. Profile each cluster and name them (e.g., "VIP Customers", "At Risk", etc.)
6. Create actionable marketing recommendations for each segment

<details>
<summary>üí° Hint: Generating RFM data</summary>

```python
recency = np.random.exponential(30, 500)  # Days since last purchase
frequency = np.random.poisson(5, 500)  # Number of purchases
monetary = np.random.gamma(100, 5, 500)  # Total spent

rfm_data = np.column_stack([recency, frequency, monetary])
```
</details>

**Challenge**: 
- Profile clusters: Calculate mean RFM values for each cluster
- Business naming: Give clusters business-friendly names
- Marketing actions: Suggest specific campaigns for each segment

In [None]:
# Your code here!
# Build complete customer segmentation






## 🎓 Key Takeaways

You've mastered unsupervised learning and clustering!

- ✅ **Unsupervised Learning Fundamentals**:
  - No labels required: discover patterns autonomously
  - Most real-world data is unlabeled
  - Evaluation is trickier but doable with internal metrics

- ✅ **K-Means Clustering**:
  - Fast, simple, and widely used
  - Assumes spherical clusters of similar size
  - Use elbow method + silhouette score to choose K
  - Perfect for customer segmentation and data exploration

- ✅ **DBSCAN**:
  - Handles arbitrary shapes and automatically detects outliers
  - No need to specify K upfront
  - Requires careful parameter tuning (eps, min_samples)
  - Ideal for anomaly detection and complex patterns

- ✅ **Hierarchical Clustering**:
  - Produces interpretable dendrograms
  - Deterministic results (no randomness)
  - Computationally expensive for large datasets
  - Great for understanding data structure at multiple scales

- ✅ **Evaluation Metrics**:
  - Silhouette Score: Measures cluster cohesion and separation
  - Davies-Bouldin Index: Cluster compactness ratio
  - Calinski-Harabasz: Variance ratio
  - Always use multiple metrics + visual inspection

### 🤔 The Big Picture:

**Algorithm Selection Guide**:

| Use Case | Best Algorithm | Why? |
|----------|---------------|------|
| **Customer segmentation** | K-Means | Fast, interpretable, works with business metrics |
| **Anomaly detection** | DBSCAN | Automatically identifies outliers |
| **Complex shapes** | DBSCAN | Doesn't assume spherical clusters |
| **Understanding hierarchy** | Hierarchical | Provides multi-scale view |
| **Large datasets** | K-Means | Scales better than hierarchical |

**When to use clustering**:
1. ✅ Exploratory data analysis
2. ✅ Customer/user segmentation
3. ✅ Anomaly and outlier detection
4. ✅ Feature engineering for supervised learning
5. ✅ Image segmentation in computer vision
6. ✅ Document clustering in NLP

**Remember**: Clustering is an art as much as a science. Domain expertise is crucial for interpreting results! 🎯

## 📖 Further Learning

**Recommended Reading**:
- [Scikit-learn Clustering Guide](https://scikit-learn.org/stable/modules/clustering.html) - Comprehensive documentation
- [StatQuest: K-Means](https://www.youtube.com/watch?v=4b5d3muPQmA) - Excellent visual explanation
- [DBSCAN Visualized](https://www.naftaliharris.com/blog/visualizing-dbscan-clustering/) - Interactive demo

**Deep Dives**:
- [Clustering Validation](https://en.wikipedia.org/wiki/Cluster_analysis#Internal_evaluation) - Understanding evaluation metrics
- [Advanced Clustering](https://scikit-learn.org/stable/modules/clustering.html#overview-of-clustering-methods) - HDBSCAN, Spectral Clustering, and more
- [Curse of Dimensionality](https://www.youtube.com/watch?v=QZ0DtNFdDko) - Why clustering gets harder in high dimensions

**Practical Applications**:
- [Customer Segmentation with Python](https://www.kaggle.com/code/fabiendaniel/customer-segmentation) - Real retail data
- [Anomaly Detection Tutorial](https://scikit-learn.org/stable/auto_examples/applications/plot_outlier_detection_wine.html) - Wine quality dataset
- [Image Compression with K-Means](https://scikit-learn.org/stable/auto_examples/cluster/plot_color_quantization.html) - Color quantization

**Research Papers** (for the curious):
- [Gap Statistic Paper](https://web.stanford.edu/~hastie/Papers/gap.pdf) - Gap statistic for estimating K
- [DBSCAN Paper](https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf) - The original algorithm
- [Silhouette Method](https://www.sciencedirect.com/science/article/pii/0377042787901257) - Clustering validation

**Tools & Libraries**:
- [HDBSCAN](https://hdbscan.readthedocs.io/) - Improved DBSCAN for varying densities
- [Yellowbrick](https://www.scikit-yb.org/en/latest/api/cluster/) - Visualization for clustering
- [PyClustering](https://pyclustering.github.io/) - Additional clustering algorithms

## ➡️ What's Next?

Congratulations! You've completed Chapter 2.3 of Module 2: Machine Learning Fundamentals!

**Coming up in Chapter 2.4 - Model Evaluation & Metrics**:
- **Deep dive into metrics**: Precision, recall, F1, ROC-AUC, and when to use each
- **Cross-validation**: Properly assessing model performance
- **Overfitting & underfitting**: The bias-variance tradeoff
- **Confusion matrices**: Understanding your model's mistakes
- **Real-world case studies**: Choosing the right metric for your business problem

From patterns to performance: measuring what matters! 📊

Ready to become a model evaluation expert? Open **[Chapter 2.4 - Model Evaluation](2.4-model-evaluation.ipynb)**!

---

### 💬 Feedback & Community

**Questions?** Join our [Discord community](https://discord.gg/madeforai)

**Found a bug?** [Open an issue on GitHub](https://github.com/madeforai/madeforai/issues)

**Share your clustering projects!** Tweet with #MadeForAI

**Keep exploring!** 🚀