# Gaussian Mixture Model (GMM) from Scratch

An implementation of a Gaussian Mixture Model for clustering, fitted with the Expectation-Maximization (EM) algorithm.

**Key Concepts:**
- Probabilistic clustering (soft assignments)
- Expectation-Maximization (EM) algorithm
- Multivariate Gaussian distributions
- Maximum likelihood estimation
- BIC/AIC for model selection


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Ellipse
from scipy.stats import multivariate_normal
from sklearn.datasets import make_blobs, load_iris, make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
np.random.seed(42)

print("=" * 80)
print("GAUSSIAN MIXTURE MODEL FROM SCRATCH")
print("=" * 80)

## Mathematical Foundation

**Gaussian Mixture Model:**
$$p(x) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x | \mu_k, \Sigma_k)$$

where:
- $K$ = number of components (clusters)
- $\pi_k$ = mixing coefficient (weight) for component $k$, $\sum_{k=1}^{K} \pi_k = 1$
- $\mathcal{N}(x | \mu_k, \Sigma_k)$ = multivariate Gaussian with mean $\mu_k$ and covariance $\Sigma_k$

**Multivariate Gaussian:**
$$\mathcal{N}(x | \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$
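
As a sanity check on the density formula, the sketch below evaluates it directly with NumPy, using a Cholesky factor instead of an explicit matrix inverse. The helper name `gaussian_pdf` and the test values are illustrative assumptions and are not part of the notebook's class.

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """Evaluate N(x | mu, Sigma) for each row of X (hypothetical helper)."""
    X = np.atleast_2d(X)
    D = X.shape[1]
    L = np.linalg.cholesky(Sigma)             # Sigma = L @ L.T
    z = np.linalg.solve(L, (X - mu).T)        # whitened residuals, shape (D, N)
    mahalanobis_sq = np.sum(z ** 2, axis=0)   # (x - mu)^T Sigma^{-1} (x - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    log_pdf = -0.5 * (D * np.log(2 * np.pi) + log_det + mahalanobis_sq)
    return np.exp(log_pdf)

# Illustrative parameters
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])

density_at_mean = gaussian_pdf(mu, mu, Sigma)[0]
# At x = mu the exponent vanishes, leaving 1 / ((2*pi)^(D/2) * sqrt(det(Sigma)))
expected = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))
print(density_at_mean, expected)
```

Working with the log-density and exponentiating at the end is a common choice because the determinant and the quadratic form can each be very large or very small on their own.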

**EM Algorithm:**

**E-Step** (Expectation): Calculate responsibilities (posterior probabilities)
$$\gamma_{ik} = \frac{\pi_k \mathcal{N}(x_i | \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x_i | \mu_j, \Sigma_j)}$$
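
To make the E-step concrete, here is a small worked example with two 1-D components. All numbers are made up for illustration, and `normal_pdf` is a hypothetical helper covering only the $D = 1$ case.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Univariate Gaussian density (hypothetical helper, D = 1 case)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.array([-2.0, 0.0, 2.0])     # three observations
weights = np.array([0.5, 0.5])     # pi_k
means = np.array([-2.0, 2.0])      # mu_k
sigmas = np.array([1.0, 1.0])      # standard deviations

# Numerator pi_k * N(x_i | mu_k, sigma_k) for every (i, k) pair, shape (3, 2)
weighted = weights * normal_pdf(x[:, None], means, sigmas)
# Divide each row by its sum to obtain gamma_ik
gamma = weighted / weighted.sum(axis=1, keepdims=True)
print(np.round(gamma, 4))
```

The point at $x = 0$ is equidistant from both means, so its responsibility is split evenly. The two outer points are assigned almost entirely to their nearest component.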

**M-Step** (Maximization): Update parameters
$$N_k = \sum_{i=1}^{N} \gamma_{ik}$$
$$\pi_k = \frac{N_k}{N}$$
$$\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} x_i$$
$$\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} (x_i - \mu_k)(x_i - \mu_k)^T$$
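
The four update equations translate almost line by line into array operations. The following is a minimal sketch, assuming `X` has shape `(N, D)` and `gamma` has shape `(N, K)`; the function name `m_step` is illustrative, and no covariance regularization is applied here.

```python
import numpy as np

def m_step(X, gamma):
    """Return (weights, means, covariances) from responsibilities (illustrative)."""
    N, D = X.shape
    N_k = gamma.sum(axis=0)                        # effective counts, shape (K,)
    weights = N_k / N                              # pi_k
    means = (gamma.T @ X) / N_k[:, None]           # mu_k, shape (K, D)
    diff = X[None, :, :] - means[:, None, :]       # shape (K, N, D)
    # For each k: sum over n of gamma[n, k] * outer(diff[k, n], diff[k, n])
    covariances = np.einsum('nk,knd,kne->kde', gamma, diff, diff)
    covariances /= N_k[:, None, None]              # Sigma_k, shape (K, D, D)
    return weights, means, covariances

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
gamma = np.ones((200, 1))    # a single component that owns every point
weights, means, covariances = m_step(X, gamma)
print(means[0], X.mean(axis=0))
```

With one component and all responsibilities equal to 1, the updates reduce to the ordinary sample mean and the maximum-likelihood covariance (normalized by $N$, not $N - 1$), which gives a quick way to test the code.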

**Log-Likelihood:**
$$\log L = \sum_{i=1}^{N} \log \left(\sum_{k=1}^{K} \pi_k \mathcal{N}(x_i | \mu_k, \Sigma_k)\right)$$
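
The inner sum can underflow to zero for points that lie far from every component, which makes the logarithm unusable. A common remedy is to evaluate it in log space with the log-sum-exp trick. The sketch below does this for 1-D data; the function name and the test values are assumptions chosen for illustration.

```python
import numpy as np

def log_likelihood_1d(x, weights, means, sigmas):
    """Mixture log-likelihood for 1-D data, evaluated in log space (illustrative)."""
    # log(pi_k) + log N(x_i | mu_k, sigma_k), shape (N, K)
    log_terms = (
        np.log(weights)
        - np.log(sigmas) - 0.5 * np.log(2 * np.pi)
        - 0.5 * ((x[:, None] - means) / sigmas) ** 2
    )
    # Log-sum-exp over components: subtract the row maximum before exponentiating
    row_max = log_terms.max(axis=1, keepdims=True)
    log_mix = row_max[:, 0] + np.log(np.exp(log_terms - row_max).sum(axis=1))
    return log_mix.sum()

weights = np.array([0.5, 0.5])
means = np.array([0.0, 1.0])
sigmas = np.array([1.0, 1.0])
x_far = np.array([100.0])    # far enough that both densities underflow to 0.0

# Naive evaluation: the densities are exactly 0.0, so only the floor survives
naive_density = np.sum(
    weights * np.exp(-0.5 * ((x_far[:, None] - means) / sigmas) ** 2)
    / (sigmas * np.sqrt(2 * np.pi))
)
naive = np.log(naive_density + 1e-10)
stable = log_likelihood_1d(x_far, weights, means, sigmas)
print(naive, stable)
```

The naive value is just the logarithm of the added constant, while the log-space version still reflects how unlikely the point actually is.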

## GMM Implementation

In [None]:
class GaussianMixtureScratch:
    """
    Gaussian Mixture Model implemented from scratch using the EM algorithm.
    
    Parameters:
    -----------
    n_components : int, default=3
        Number of Gaussian components
    max_iters : int, default=100
        Maximum number of EM iterations
    tol : float, default=1e-4
        Convergence tolerance (change in log-likelihood)
    reg_covar : float, default=1e-6
        Regularization added to covariance diagonal for numerical stability
    random_state : int, default=None
        Random seed for reproducibility
    """
    
    def __init__(self, n_components=3, max_iters=100, tol=1e-4, 
                 reg_covar=1e-6, random_state=None):
        self.n_components = n_components
        self.max_iters = max_iters
        self.tol = tol
        self.reg_covar = reg_covar
        self.random_state = random_state
        
        # Model parameters
        self.weights_ = None      # Mixing coefficients (π)
        self.means_ = None        # Component means (μ)
        self.covariances_ = None  # Component covariances (Σ)
        
        # Fit results
        self.responsibilities_ = None  # Posterior probabilities (γ)
        self.n_iter_ = 0
        self.converged_ = False
        
        # History
        self.log_likelihood_history_ = []
        
    def _initialize_parameters(self, X):
        """
        Initialize means from randomly chosen samples, covariances as scaled
        identity matrices, and weights uniformly.
        """
        # Use a local generator so the global NumPy random state is left untouched
        rng = np.random.RandomState(self.random_state)
        
        n_samples, n_features = X.shape
        
        # Initialize means by randomly selecting samples
        indices = rng.choice(n_samples, self.n_components, replace=False)
        self.means_ = X[indices]
        
        # Initialize covariances as identity matrices scaled by data variance
        self.covariances_ = np.array([
            np.eye(n_features) * np.var(X)
            for _ in range(self.n_components)
        ])
        
        # Initialize weights uniformly
        self.weights_ = np.ones(self.n_components) / self.n_components
        
    def _multivariate_gaussian(self, X, mean, covariance):
        """
        Calculate multivariate Gaussian probability density.
        """
        n_features = X.shape[1]
        
        # Add regularization to covariance for numerical stability
        covariance_reg = covariance + self.reg_covar * np.eye(n_features)
        
        # Use scipy for numerical stability
        return multivariate_normal.pdf(X, mean=mean, cov=covariance_reg)
    
    def _e_step(self, X):
        """
        E-Step: Calculate responsibilities (posterior probabilities).
        
        Returns:
        --------
        responsibilities : array, shape (n_samples, n_components)
            Posterior probability of each sample belonging to each component
        """
        n_samples = X.shape[0]
        responsibilities = np.zeros((n_samples, self.n_components))
        
        # Calculate weighted probabilities for each component
        for k in range(self.n_components):
            responsibilities[:, k] = (
                self.weights_[k] * 
                self._multivariate_gaussian(X, self.means_[k], self.covariances_[k])
            )
        
        # Normalize to get probabilities (responsibilities)
        responsibilities_sum = responsibilities.sum(axis=1, keepdims=True)
        # Avoid division by zero
        responsibilities_sum[responsibilities_sum == 0] = 1e-10
        responsibilities /= responsibilities_sum
        
        return responsibilities
    
    def _m_step(self, X, responsibilities):
        """
        M-Step: Update parameters based on responsibilities.
        """
        n_samples, n_features = X.shape
        
        # Effective number of points assigned to each component
        N_k = responsibilities.sum(axis=0)
        
        # Update weights (mixing coefficients)
        self.weights_ = N_k / n_samples
        
        # Update means
        self.means_ = (responsibilities.T @ X) / N_k[:, np.newaxis]
        
        # Update covariances
        for k in range(self.n_components):
            diff = X - self.means_[k]
            # Weighted covariance
            weighted_diff = responsibilities[:, k:k+1] * diff
            self.covariances_[k] = (weighted_diff.T @ diff) / N_k[k]
            
            # Add small regularization for numerical stability
            self.covariances_[k] += self.reg_covar * np.eye(n_features)
    
    def _calculate_log_likelihood(self, X):
        """
        Calculate log-likelihood of the data.
        """
        # Weighted density of every sample under each component,
        # shape (n_samples, n_components)
        weighted_densities = np.column_stack([
            self.weights_[k] * np.atleast_1d(
                self._multivariate_gaussian(X, self.means_[k], self.covariances_[k])
            )
            for k in range(self.n_components)
        ])
        
        # Sum over components, then sum the logs over samples;
        # the small constant guards against log(0)
        return np.sum(np.log(weighted_densities.sum(axis=1) + 1e-10))
    
    def fit(self, X):
        """
        Fit GMM using EM algorithm.
        
        Parameters:
        -----------
        X : array-like, shape (n_samples, n_features)
            Training data
        """
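        X = np.asarray(X, dtype=float)
        
        # Reset fit state so that repeated calls to fit() start fresh
        self.log_likelihood_history_ = []
        self.converged_ = False
        self.n_iter_ = 0
        
        # Initialize parameters
        self._initialize_parameters(X)
        
        prev_log_likelihood = -np.inf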
        
        # EM iterations
        for iteration in range(self.max_iters):
            # E-step
            responsibilities = self._e_step(X)
            
            # M-step
            self._m_step(X, responsibilities)
            
            # Calculate log-likelihood
            log_likelihood = self._calculate_log_likelihood(X)
            self.log_likelihood_history_.append(log_likelihood)
            
            # Check convergence
            if abs(log_likelihood - prev_log_likelihood) < self.tol:
                self.converged_ = True
                self.n_iter_ = iteration + 1
                break
            
            prev_log_likelihood = log_likelihood
            self.n_iter_ = iteration + 1
        
        # Final responsibilities
        self.responsibilities_ = self._e_step(X)
        
        return self
    
    def predict(self, X):
        """
        Predict cluster labels (hard assignments).
        """
        responsibilities = self._e_step(X)
        return np.argmax(responsibilities, axis=1)
    
    def predict_proba(self, X):
        """
        Predict soft cluster assignments (responsibilities).
        """
        return self._e_step(X)
    
    def fit_predict(self, X):
        """
        Fit and return cluster labels.
        """
        self.fit(X)
        return self.predict(X)
    
    def bic(self, X):
        """
        Calculate Bayesian Information Criterion.
        BIC = -2 * log_likelihood + n_params * log(n_samples)
        Lower is better.
        """
        n_samples, n_features = X.shape
        
        # Number of parameters
        # weights: K-1, means: K*D, covariances: K*D*(D+1)/2
        n_params = (
            (self.n_components - 1) +  # weights (one is constrained)
            self.n_components * n_features +  # means
            self.n_components * n_features * (n_features + 1) / 2  # covariances
        )
        
        log_likelihood = self._calculate_log_likelihood(X)
        return -2 * log_likelihood + n_params * np.log(n_samples)
    
    def aic(self, X):
        """
        Calculate Akaike Information Criterion.
        AIC = -2 * log_likelihood + 2 * n_params
        Lower is better.
        """
        n_features = X.shape[1]
        
        # Number of parameters
        n_params = (
            (self.n_components - 1) +
            self.n_components * n_features +
            self.n_components * n_features * (n_features + 1) / 2
        )
        
        log_likelihood = self._calculate_log_likelihood(X)
        return -2 * log_likelihood + 2 * n_params

print("\n✓ GaussianMixtureScratch class defined")
print("  - EM algorithm implementation")
print("  - Soft (probabilistic) assignments")
print("  - BIC/AIC for model selection")

## Utility Functions for Visualization

In [None]:
def plot_gmm_clusters(X, gmm, title="GMM Clustering", true_labels=None):
    """
    Plot GMM clustering results with ellipses representing Gaussian components.
    """
    if X.shape[1] != 2:
        raise ValueError("This function only works for 2D data")
    
    fig, ax = plt.subplots(figsize=(10, 8))
    
    # Get cluster assignments
    labels = gmm.predict(X)
    
    # Plot points
    scatter = ax.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis',
                        s=50, alpha=0.6, edgecolors='black')
    
    # Plot Gaussian ellipses
    for k in range(gmm.n_components):
        mean = gmm.means_[k]
        covariance = gmm.covariances_[k]
        
        # Calculate eigenvalues and eigenvectors
        eigenvalues, eigenvectors = np.linalg.eigh(covariance)
        angle = np.degrees(np.arctan2(eigenvectors[1, 0], eigenvectors[0, 0]))
        
        # Full axis lengths (diameters) of the 1σ ellipse: 2 * sqrt(eigenvalue)
        width, height = 2 * np.sqrt(eigenvalues)
        
        # Plot ellipse for 1 and 2 standard deviations
        for n_std in [1, 2]:
            ellipse = Ellipse(
                mean, n_std * width, n_std * height,
                angle=angle, facecolor='none',
                edgecolor=scatter.cmap(scatter.norm(k)),
                linewidth=2, linestyle='--', alpha=0.7
            )
            ax.add_patch(ellipse)
        
        # Plot mean
        ax.scatter(mean[0], mean[1], c='red', s=200, marker='X',
                  edgecolors='black', linewidth=2, zorder=10)
    
    ax.set_xlabel('Feature 1', fontsize=12)
    ax.set_ylabel('Feature 2', fontsize=12)
    ax.set_title(title, fontsize=14, fontweight='bold')
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    return fig, ax

print("\n✓ Visualization utilities defined")
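
The ellipse geometry can be checked without drawing anything. The sketch below isolates the eigen-decomposition step in a hypothetical helper, `ellipse_params`, and applies it to a covariance whose answer is known in advance.

```python
import numpy as np

def ellipse_params(covariance, n_std=1.0):
    """Return (width, height, angle_degrees) for a 2-D covariance (illustrative)."""
    # eigh returns eigenvalues in ascending order, eigenvectors as columns
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    # Full axis lengths: 2 * n_std * sqrt(eigenvalue) along each eigenvector
    width, height = 2.0 * n_std * np.sqrt(eigenvalues)
    # Orientation of the first eigenvector, which is the axis that `width` spans
    angle = np.degrees(np.arctan2(eigenvectors[1, 0], eigenvectors[0, 0]))
    return width, height, angle

# Variance 4 along x and 1 along y: the long axis should lie along x
width, height, angle = ellipse_params(np.array([[4.0, 0.0], [0.0, 1.0]]))
print(width, height, angle)
```

Because the eigenvalues come back in ascending order, `width` is the short axis here. It is oriented along y (an angle of plus or minus 90 degrees, depending on the eigenvector sign), so the longer `height` axis ends up along x as expected.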

## Example 1: Simple 2D Synthetic Data

In [None]:
# Generate synthetic data with clear Gaussian clusters
X_blob, y_true = make_blobs(
    n_samples=300,
    n_features=2,
    centers=3,
    cluster_std=[1.0, 1.5, 0.8],
    random_state=42
)

print("Synthetic 2D Dataset:")
print(f"  Samples: {len(X_blob)}")
print(f"  Features: {X_blob.shape[1]}")
print(f"  True clusters: {len(np.unique(y_true))}")

# Visualize data
plt.figure(figsize=(10, 6))
plt.scatter(X_blob[:, 0], X_blob[:, 1], c=y_true, cmap='viridis', 
            s=50, alpha=0.6, edgecolors='black')
plt.xlabel('Feature 1', fontsize=12)
plt.ylabel('Feature 2', fontsize=12)
plt.title('True Cluster Assignments', fontsize=14, fontweight='bold')
plt.colorbar(label='Cluster')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Fit GMM
gmm = GaussianMixtureScratch(
    n_components=3,
    max_iters=100,
    random_state=42
)

gmm.fit(X_blob)
labels_gmm = gmm.predict(X_blob)
probabilities = gmm.predict_proba(X_blob)

print("\nGMM Results:")
print(f"  Converged: {gmm.converged_}")
print(f"  Iterations: {gmm.n_iter_}")
print(f"  Final log-likelihood: {gmm.log_likelihood_history_[-1]:.2f}")
print(f"  Cluster sizes: {np.bincount(labels_gmm)}")
print(f"\nMixing coefficients (weights):")
for k, weight in enumerate(gmm.weights_):
    print(f"  Component {k}: {weight:.4f}")

In [None]:
# Visualize GMM clustering with Gaussian ellipses
plot_gmm_clusters(X_blob, gmm, title="GMM Clustering with Gaussian Components")
plt.show()

print("\n✓ Ellipses represent 1σ and 2σ contours of Gaussian components")

In [None]:
# Plot convergence
fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(gmm.log_likelihood_history_, marker='o', linewidth=2, markersize=4)
ax.set_xlabel('Iteration', fontsize=12)
ax.set_ylabel('Log-Likelihood', fontsize=12)
ax.set_title('EM Algorithm Convergence', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\n✓ Log-likelihood should be non-decreasing from one EM iteration to the next")
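
The monotonicity property can be checked numerically. The following self-contained sketch runs plain EM on synthetic 1-D data (illustrative values, no covariance regularization) and inspects the change in log-likelihood between iterations.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from two Gaussians (illustrative values)
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 0.5, 100)])

weights = np.array([0.5, 0.5])
means = np.array([-1.0, 1.0])
variances = np.array([1.0, 1.0])
history = []

for _ in range(50):
    # E-step: weighted densities, log-likelihood at current parameters, responsibilities
    dens = (
        weights * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
        / np.sqrt(2 * np.pi * variances)
    )
    total = dens.sum(axis=1, keepdims=True)
    history.append(np.log(total).sum())
    gamma = dens / total
    # M-step: closed-form updates for weights, means, and variances
    N_k = gamma.sum(axis=0)
    weights = N_k / len(x)
    means = (gamma * x[:, None]).sum(axis=0) / N_k
    variances = (gamma * (x[:, None] - means) ** 2).sum(axis=0) / N_k

diffs = np.diff(history)
print("smallest change between iterations:", diffs.min())
print("estimated means:", means)
```

Any negative change should be at the level of floating-point rounding. Larger drops usually point to a bug in the E-step or M-step, or to regularization that alters the exact updates.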

## Soft vs Hard Clustering

In [None]:
# Demonstrate soft clustering (probabilistic assignments)
print("\n" + "=" * 80)
print("SOFT vs HARD CLUSTERING")
print("=" * 80)

# Show probabilities for first 10 samples
print("\nResponsibilities (soft assignments) for first 10 samples:")
print("Sample | Component 0 | Component 1 | Component 2 | Hard Label")
print("-" * 65)
for i in range(10):
    print(f"  {i:3d}  |    {probabilities[i, 0]:.4f}    |    {probabilities[i, 1]:.4f}    |    {probabilities[i, 2]:.4f}    |     {labels_gmm[i]}")

print("\n✓ Soft assignments show uncertainty in cluster membership")

In [None]:
# Visualize soft clustering by coloring points by uncertainty
# Calculate uncertainty as entropy of responsibilities
uncertainty = -np.sum(probabilities * np.log(probabilities + 1e-10), axis=1)

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Hard clustering
axes[0].scatter(X_blob[:, 0], X_blob[:, 1], c=labels_gmm, cmap='viridis',
               s=50, alpha=0.6, edgecolors='black')
axes[0].set_xlabel('Feature 1', fontsize=11)
axes[0].set_ylabel('Feature 2', fontsize=11)
axes[0].set_title('Hard Clustering (argmax)', fontsize=12, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# Uncertainty visualization
scatter = axes[1].scatter(X_blob[:, 0], X_blob[:, 1], c=uncertainty, 
                         cmap='coolwarm', s=50, alpha=0.6, edgecolors='black')
axes[1].set_xlabel('Feature 1', fontsize=11)
axes[1].set_ylabel('Feature 2', fontsize=11)
axes[1].set_title('Clustering Uncertainty (Entropy)', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3)
plt.colorbar(scatter, ax=axes[1], label='Uncertainty')

plt.tight_layout()
plt.show()

print("\n✓ Points near cluster boundaries have higher uncertainty")
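
Raw entropy depends on the number of components, since its maximum is $\log K$. One optional refinement, sketched here with a hypothetical helper `normalized_entropy`, divides by $\log K$ so that the score is comparable across models with different $K$.

```python
import numpy as np

def normalized_entropy(gamma, eps=1e-10):
    """Row-wise entropy of responsibilities divided by log(K) (illustrative)."""
    K = gamma.shape[1]
    # Clip inside the log only, so that zero responsibilities contribute exactly 0
    safe = np.clip(gamma, eps, 1.0)
    entropy = -np.sum(gamma * np.log(safe), axis=1)
    return entropy / np.log(K)

gamma = np.array([
    [1.0, 0.0, 0.0],       # fully certain
    [0.5, 0.5, 0.0],       # split between two components
    [1/3, 1/3, 1/3],       # maximally uncertain
])
scores = normalized_entropy(gamma)
print(np.round(scores, 3))
```

A score of 0 means the point belongs to one component with certainty, and a score of 1 means the responsibilities are uniform.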

## Model Selection: BIC and AIC

In [None]:
print("\n" + "=" * 80)
print("MODEL SELECTION - BIC AND AIC")
print("=" * 80)

# Test different numbers of components
n_components_range = range(2, 8)
bic_scores = []
aic_scores = []
log_likelihoods = []

for n_comp in n_components_range:
    gmm_test = GaussianMixtureScratch(
        n_components=n_comp,
        max_iters=100,
        random_state=42
    )
    gmm_test.fit(X_blob)
    
    bic = gmm_test.bic(X_blob)
    aic = gmm_test.aic(X_blob)
    log_like = gmm_test.log_likelihood_history_[-1]
    
    bic_scores.append(bic)
    aic_scores.append(aic)
    log_likelihoods.append(log_like)
    
    print(f"K={n_comp} | Log-Like: {log_like:8.2f} | BIC: {bic:8.2f} | AIC: {aic:8.2f}")

best_bic = n_components_range[np.argmin(bic_scores)]
best_aic = n_components_range[np.argmin(aic_scores)]

print(f"\nBest K by BIC: {best_bic}")
print(f"Best K by AIC: {best_aic}")

In [None]:
# Plot model selection criteria
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Log-likelihood
axes[0].plot(n_components_range, log_likelihoods, marker='o', linewidth=2, markersize=8)
axes[0].set_xlabel('Number of Components', fontsize=12)
axes[0].set_ylabel('Log-Likelihood', fontsize=12)
axes[0].set_title('Log-Likelihood (Higher is Better)', fontsize=12, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# BIC
axes[1].plot(n_components_range, bic_scores, marker='s', linewidth=2, 
            markersize=8, color='green')
axes[1].axvline(x=best_bic, color='red', linestyle='--', linewidth=2, 
               label=f'Best K={best_bic}')
axes[1].set_xlabel('Number of Components', fontsize=12)
axes[1].set_ylabel('BIC', fontsize=12)
axes[1].set_title('Bayesian Information Criterion (Lower is Better)', 
                 fontsize=12, fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# AIC
axes[2].plot(n_components_range, aic_scores, marker='^', linewidth=2, 
            markersize=8, color='orange')
axes[2].axvline(x=best_aic, color='red', linestyle='--', linewidth=2, 
               label=f'Best K={best_aic}')
axes[2].set_xlabel('Number of Components', fontsize=12)
axes[2].set_ylabel('AIC', fontsize=12)
axes[2].set_title('Akaike Information Criterion (Lower is Better)', 
                 fontsize=12, fontweight='bold')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✓ BIC penalizes model complexity more heavily than AIC when log(n) > 2 (n >= 8 samples)")
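
The two penalties can be compared directly by counting free parameters. For a full-covariance GMM in 2-D, the count is 11, 17, and 23 for K = 2, 3, and 4. The helper name `gmm_n_params` below is an assumption made for this sketch.

```python
import math

def gmm_n_params(n_components, n_features):
    """Free parameters of a full-covariance GMM (illustrative helper)."""
    weights = n_components - 1                       # one weight is fixed by the sum-to-one constraint
    means = n_components * n_features
    covariances = n_components * n_features * (n_features + 1) // 2   # symmetric matrices
    return weights + means + covariances

n_samples = 300    # same size as the make_blobs example
for K in (2, 3, 4):
    p = gmm_n_params(K, n_features=2)
    bic_penalty = p * math.log(n_samples)
    aic_penalty = 2 * p
    print(f"K={K}: params={p}, BIC penalty={bic_penalty:.1f}, AIC penalty={aic_penalty}")
```

Each extra 2-D component adds 6 parameters, which costs 12 under AIC and about 34 under BIC at 300 samples. That gap is why BIC tends to choose fewer components.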

## Example 2: Non-Spherical Clusters

In [None]:
# Generate data with elliptical clusters
from sklearn.datasets import make_classification

X_ellipse, y_ellipse = make_classification(
    n_samples=400,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    n_classes=3,
    class_sep=2.0,
    random_state=42
)

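# Rotate the data so the cluster axes are not aligned with the coordinate axes.
# A rotation changes orientation only; the elongated shapes come from
# make_classification, which applies a random linear map to each cluster.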
rotation = np.array([[0.8, -0.6], [0.6, 0.8]])
X_ellipse = X_ellipse.dot(rotation)

print("\nElliptical Clusters Dataset:")
print(f"  Samples: {len(X_ellipse)}")
print(f"  Features: {X_ellipse.shape[1]}")

# Visualize
plt.figure(figsize=(10, 6))
plt.scatter(X_ellipse[:, 0], X_ellipse[:, 1], c=y_ellipse, cmap='viridis',
           s=50, alpha=0.6, edgecolors='black')
plt.xlabel('Feature 1', fontsize=12)
plt.ylabel('Feature 2', fontsize=12)
plt.title('Elliptical Clusters', fontsize=14, fontweight='bold')
plt.colorbar(label='True Cluster')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Fit GMM
gmm_ellipse = GaussianMixtureScratch(
    n_components=3,
    max_iters=100,
    random_state=42
)

gmm_ellipse.fit(X_ellipse)

print("\nGMM on Elliptical Clusters:")
print(f"  Converged: {gmm_ellipse.converged_}")
print(f"  Iterations: {gmm_ellipse.n_iter_}")

# Visualize
plot_gmm_clusters(X_ellipse, gmm_ellipse, 
                 title="GMM Handles Non-Spherical Clusters Well")
plt.show()

print("\n✓ GMM can model elliptical clusters (unlike K-means, which favors roughly spherical ones)")
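
Whether a linear map changes a cluster's shape or only its orientation can be read off the covariance eigenvalues. This sketch compares the rotation matrix used in this example with a hypothetical anisotropic scaling (`stretch`, not part of the notebook) on a roughly spherical cloud.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))                     # roughly spherical cloud

rotation = np.array([[0.8, -0.6], [0.6, 0.8]])     # orthogonal: R @ R.T = I
stretch = np.array([[3.0, 0.0], [0.0, 1.0]])       # hypothetical scaling along x

def axis_ratio(points):
    """Ratio of the largest to the smallest covariance eigenvalue."""
    eigenvalues = np.linalg.eigvalsh(np.cov(points.T))
    return eigenvalues[-1] / eigenvalues[0]

ratio_original = axis_ratio(X)
ratio_rotated = axis_ratio(X @ rotation)
ratio_stretched = axis_ratio(X @ stretch @ rotation)
print(ratio_original, ratio_rotated, ratio_stretched)
```

The rotated cloud keeps the same eigenvalue ratio as the original, while the stretched cloud has a ratio near 9 (the square of the scaling factor). Full-covariance components can capture both effects.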

## Example 3: Iris Dataset

In [None]:
# Load iris dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target
feature_names = iris.feature_names

print("\nIris Dataset:")
print(f"  Samples: {len(X_iris)}")
print(f"  Features: {X_iris.shape[1]}")
print(f"  True classes: {len(np.unique(y_iris))}")

# Standardize
scaler = StandardScaler()
X_iris_scaled = scaler.fit_transform(X_iris)

In [None]:
# Fit GMM
gmm_iris = GaussianMixtureScratch(
    n_components=3,
    max_iters=100,
    random_state=42
)

gmm_iris.fit(X_iris_scaled)
labels_iris = gmm_iris.predict(X_iris_scaled)

print("\nGMM Results (K=3):")
print(f"  Converged: {gmm_iris.converged_}")
print(f"  Iterations: {gmm_iris.n_iter_}")
print(f"  Cluster sizes: {np.bincount(labels_iris)}")

# Evaluation
silhouette = silhouette_score(X_iris_scaled, labels_iris)
ari = adjusted_rand_score(y_iris, labels_iris)

print(f"\nEvaluation:")
print(f"  Silhouette Score: {silhouette:.4f}")
print(f"  Adjusted Rand Index: {ari:.4f}")
print(f"  BIC: {gmm_iris.bic(X_iris_scaled):.2f}")
print(f"  AIC: {gmm_iris.aic(X_iris_scaled):.2f}")
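
Cluster indices returned by a mixture model are arbitrary, so component 0 need not correspond to class 0. The Adjusted Rand Index is unaffected by relabeling, but accuracy or a confusion matrix needs the labels aligned first. For small K a brute-force search over permutations is enough. This is a sketch with made-up labels and a hypothetical helper name.

```python
import numpy as np
from itertools import permutations

def best_label_mapping(y_true, y_pred, n_clusters):
    """Brute-force the cluster-to-class relabeling that maximizes accuracy."""
    best_acc, best_perm = -1.0, None
    for perm in permutations(range(n_clusters)):
        mapped = np.array(perm)[y_pred]        # cluster k is renamed to class perm[k]
        acc = np.mean(mapped == y_true)
        if acc > best_acc:
            best_acc, best_perm = acc, perm
    return best_perm, best_acc

# Made-up labels: similar grouping, different cluster ids, one mistake
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([2, 2, 2, 0, 0, 1, 1, 1, 1])
perm, acc = best_label_mapping(y_true, y_pred, n_clusters=3)
print(perm, acc)
```

The search costs K! evaluations, which is fine for three clusters but not for large K, where an assignment solver would be a better fit.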

In [None]:
# Visualize with PCA
pca = PCA(n_components=2)
X_iris_pca = pca.fit_transform(X_iris_scaled)

# Fit GMM on 2D PCA data for visualization
gmm_iris_2d = GaussianMixtureScratch(
    n_components=3,
    max_iters=100,
    random_state=42
)
gmm_iris_2d.fit(X_iris_pca)

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# True labels
axes[0].scatter(X_iris_pca[:, 0], X_iris_pca[:, 1], c=y_iris, cmap='viridis',
               s=50, alpha=0.6, edgecolors='black')
axes[0].set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.2%} var)', fontsize=11)
axes[0].set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.2%} var)', fontsize=11)
axes[0].set_title('True Species Labels', fontsize=12, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# GMM clusters with ellipses
labels_2d = gmm_iris_2d.predict(X_iris_pca)
scatter = axes[1].scatter(X_iris_pca[:, 0], X_iris_pca[:, 1], c=labels_2d, 
                         cmap='viridis', s=50, alpha=0.6, edgecolors='black')

# Plot Gaussian ellipses
for k in range(gmm_iris_2d.n_components):
    mean = gmm_iris_2d.means_[k]
    covariance = gmm_iris_2d.covariances_[k]
    
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    angle = np.degrees(np.arctan2(eigenvectors[1, 0], eigenvectors[0, 0]))
    width, height = 2 * np.sqrt(eigenvalues)
    
    for n_std in [1, 2]:
        ellipse = Ellipse(
            mean, n_std * width, n_std * height, angle=angle,
            facecolor='none', edgecolor=scatter.cmap(scatter.norm(k)),
            linewidth=2, linestyle='--', alpha=0.7
        )
        axes[1].add_patch(ellipse)
    
    axes[1].scatter(mean[0], mean[1], c='red', s=200, marker='X',
                   edgecolors='black', linewidth=2, zorder=10)

axes[1].set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.2%} var)', fontsize=11)
axes[1].set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.2%} var)', fontsize=11)
axes[1].set_title('GMM Clusters', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## GMM vs K-Means Comparison

In [None]:
print("\n" + "=" * 80)
print("GMM vs K-MEANS COMPARISON")
print("=" * 80)

# Use scikit-learn's KMeans for comparison
from sklearn.cluster import KMeans

# Fit K-means
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
labels_kmeans = kmeans.fit_predict(X_blob)

# Fit GMM
gmm_compare = GaussianMixtureScratch(n_components=3, random_state=42)
labels_gmm_compare = gmm_compare.fit_predict(X_blob)

# Compare
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# K-means
axes[0].scatter(X_blob[:, 0], X_blob[:, 1], c=labels_kmeans, cmap='viridis',
               s=50, alpha=0.6, edgecolors='black')
axes[0].scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
               c='red', s=200, marker='X', edgecolors='black', linewidth=2)
axes[0].set_xlabel('Feature 1', fontsize=11)
axes[0].set_ylabel('Feature 2', fontsize=11)
axes[0].set_title('K-Means (Hard Clustering, Spherical)', fontsize=12, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# GMM
scatter = axes[1].scatter(X_blob[:, 0], X_blob[:, 1], c=labels_gmm_compare, 
                         cmap='viridis', s=50, alpha=0.6, edgecolors='black')

for k in range(gmm_compare.n_components):
    mean = gmm_compare.means_[k]
    covariance = gmm_compare.covariances_[k]
    
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    angle = np.degrees(np.arctan2(eigenvectors[1, 0], eigenvectors[0, 0]))
    width, height = 2 * np.sqrt(eigenvalues)
    
    for n_std in [1, 2]:
        ellipse = Ellipse(
            mean, n_std * width, n_std * height, angle=angle,
            facecolor='none', edgecolor=scatter.cmap(scatter.norm(k)),
            linewidth=2, linestyle='--', alpha=0.7
        )
        axes[1].add_patch(ellipse)
    
    axes[1].scatter(mean[0], mean[1], c='red', s=200, marker='X',
                   edgecolors='black', linewidth=2, zorder=10)

axes[1].set_xlabel('Feature 1', fontsize=11)
axes[1].set_ylabel('Feature 2', fontsize=11)
axes[1].set_title('GMM (Soft Clustering, Elliptical)', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nKey Differences:")
print("  K-Means:")
print("    - Hard assignments (each point belongs to one cluster)")
print("    - Assumes spherical clusters")
print("    - Minimizes within-cluster variance")
print("\n  GMM:")
print("    - Soft assignments (probabilistic membership)")
print("    - Can model elliptical clusters")
print("    - Maximizes likelihood")
print("    - Provides uncertainty estimates")
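
The link between the two methods can be made concrete. With equal weights and a shared spherical covariance $\sigma^2 I$, GMM responsibilities become a softmax over negative squared distances, and they harden into nearest-centroid assignments as $\sigma^2$ shrinks. This is a minimal sketch under those assumptions, with illustrative values.

```python
import numpy as np

def responsibilities(X, means, sigma2):
    """Responsibilities for equal-weight spherical components (illustrative)."""
    sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)   # shape (N, K)
    logits = -0.5 * sq_dist / sigma2       # shared normalizing constants cancel
    logits -= logits.max(axis=1, keepdims=True)   # stabilize the softmax
    unnorm = np.exp(logits)
    return unnorm / unnorm.sum(axis=1, keepdims=True)

means = np.array([[0.0, 0.0], [4.0, 0.0]])
X = np.array([[1.0, 0.0], [1.9, 0.0], [3.0, 0.0]])

soft = responsibilities(X, means, sigma2=4.0)
hard = responsibilities(X, means, sigma2=0.01)
# K-means style assignment: index of the nearest mean
nearest = np.argmin(((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2), axis=1)

print(np.round(soft, 3))
print(np.round(hard, 3))
print(nearest)
```

At the larger variance, the point near the midpoint is shared almost evenly between the two components. At the smaller variance, every point is assigned to its nearest mean with responsibility close to 1.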

## Summary

**Gaussian Mixture Model:**

1. **Probabilistic Model**:
   - Each cluster is a Gaussian distribution
   - Mixture of K Gaussian components
   - Soft assignments (probabilities)

2. **EM Algorithm**:
   - E-step: Calculate responsibilities (posterior probabilities)
   - M-step: Update parameters (weights, means, covariances)
   - Iterates until convergence
   - Never decreases the log-likelihood from one iteration to the next

3. **Model Selection**:
   - BIC: Bayesian Information Criterion (penalizes complexity more than AIC once log(n) > 2)
   - AIC: Akaike Information Criterion
   - For both, lower is better

4. **Advantages over K-Means**:
   - Models elliptical clusters (full covariance)
   - Provides uncertainty (soft assignments)
   - Probabilistic framework
   - Better for overlapping clusters

5. **Limitations**:
   - Assumes Gaussian distributions
   - Sensitive to initialization
   - Computationally more expensive
   - May converge to local optima
   - Requires choosing number of components

6. **Use Cases**:
   - Clusters with different shapes/sizes
   - Need uncertainty estimates
   - Overlapping clusters
   - Density estimation
