# Principal Component Analysis Workshop
## Dimensionality Reduction & Feature Extraction

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/your-repo/CMSC173/blob/main/11-PCA/notebooks/workshop.ipynb)

**Course:** CMSC 173 - Machine Learning  
**Instructor:** Noel Jeffrey Pinton  
**Topic:** Principal Component Analysis (PCA)  
**Duration:** 60-75 minutes  

### Learning Objectives
By the end of this workshop, you will be able to:
1. Understand the mathematical foundations of PCA
2. Implement PCA from scratch using eigenvalue decomposition
3. Apply PCA for dimensionality reduction and visualization
4. Evaluate and interpret principal components
5. Compare PCA with other dimensionality reduction techniques
6. Understand advanced topics like Kernel PCA and computational complexity

### Table of Contents
1. [Setup & Imports](#setup)
2. [Part 1: Motivation](#motivation)
3. [Part 2: Core Concepts](#core-concepts)
4. [Part 3: Implementation](#implementation)
5. [Part 4: Evaluation](#evaluation)
6. [Part 5: Advanced Topics](#advanced-topics)
7. [Student Challenge](#challenge)
8. [Solutions](#solutions)
9. [Summary & Next Steps](#summary)

<a id='setup'></a>
## Section 1: Setup & Imports

Let's import the necessary libraries and verify our environment.

In [None]:
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_digits, load_iris, make_swiss_roll
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA as SklearnPCA
from sklearn.manifold import TSNE
from mpl_toolkits.mplot3d import Axes3D
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Configure plotting
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

# Version checks
import sys
print("Environment Information:")
print(f"Python version: {sys.version.split()[0]}")
print(f"NumPy version: {np.__version__}")
print(f"Matplotlib version: {plt.matplotlib.__version__}")
print(f"Seaborn version: {sns.__version__}")
print("\n" + "="*50)
print("Setup complete! All libraries imported successfully.")
print("="*50)

<a id='motivation'></a>
## Section 2: Part 1 - Motivation

### Why Do We Need PCA?

In modern machine learning, we often face the **curse of dimensionality**:
- High-dimensional data is hard to visualize
- More features mean more computational cost
- Many features may be redundant or correlated
- Models can overfit with too many dimensions

**PCA helps us by:**
1. Reducing dimensionality while preserving information
2. Removing redundant/correlated features
3. Enabling visualization of high-dimensional data
4. Improving computational efficiency
5. Potentially improving model performance

### Real-World Example: Handwritten Digits

Let's load the digits dataset - each image is 8x8 pixels (64 dimensions).

In [None]:
# Load the digits dataset
digits = load_digits()
X_digits = digits.data
y_digits = digits.target

print(f"Dataset shape: {X_digits.shape}")
print(f"Number of samples: {X_digits.shape[0]}")
print(f"Number of features (dimensions): {X_digits.shape[1]}")
print(f"Number of classes: {len(np.unique(y_digits))}")
print(f"\nEach image is {int(np.sqrt(X_digits.shape[1]))}x{int(np.sqrt(X_digits.shape[1]))} pixels")

# Visualize some digits
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(X_digits[i].reshape(8, 8), cmap='gray')
    ax.set_title(f'Label: {y_digits[i]}')
    ax.axis('off')
plt.suptitle('Sample Handwritten Digits (64 dimensions each)', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\nChallenge: How can we visualize this 64-dimensional data in 2D?")
print("Answer: PCA!")

### The Curse of Dimensionality

Let's demonstrate why high dimensions are problematic.

In [None]:
import math
# Demonstrate volume and distance in high dimensions
dimensions = [1, 2, 5, 10, 20, 50, 100]
n_points = 1000

avg_distances = []
for d in dimensions:
    # Generate random points in d-dimensional unit hypercube
    points = np.random.rand(n_points, d)
    # Calculate average distance to origin
    distances = np.sqrt(np.sum(points**2, axis=1))
    avg_distances.append(np.mean(distances))

plt.figure(figsize=(12, 5))

# Plot 1: Average distance growth
plt.subplot(1, 2, 1)
plt.plot(dimensions, avg_distances, 'bo-', linewidth=2, markersize=8)
plt.xlabel('Number of Dimensions', fontweight='bold')
plt.ylabel('Average Distance to Origin', fontweight='bold')
plt.title('Distance Growth with Dimensionality', fontweight='bold')
plt.grid(True, alpha=0.3)

# Plot 2: Volume of unit sphere
plt.subplot(1, 2, 2)
sphere_volumes = [np.pi**(d/2) / math.gamma(d/2 + 1) for d in dimensions]
plt.semilogy(dimensions, sphere_volumes, 'ro-', linewidth=2, markersize=8)
plt.xlabel('Number of Dimensions', fontweight='bold')
plt.ylabel('Volume of Unit Sphere (log scale)', fontweight='bold')
plt.title('Sphere Volume Changes with Dimensionality', fontweight='bold')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Key Insights:")
print("1. Points spread out: average distance from the origin grows roughly like sqrt(d/3)")
print("2. The unit sphere's volume shrinks toward zero, and what remains concentrates near the surface")
print("3. Data becomes sparse - more data needed to maintain density")
print("4. This is why dimensionality reduction is crucial!")
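
The claim that volume concentrates near the surface is not visible in the plots, but it follows from a one-line calculation. Shrinking the radius of a d-dimensional ball by a factor (1 - ε) scales its volume by (1 - ε)^d, so the outer shell of thickness ε holds the remaining 1 - (1 - ε)^d. A minimal sketch (the helper name `shell_fraction` and the choice ε = 0.05 are illustrative assumptions, not part of the workshop code):

```python
def shell_fraction(d, eps=0.05):
    """Fraction of a d-dimensional ball's volume within eps of its surface.

    Shrinking the radius by a factor (1 - eps) scales volume by (1 - eps)**d,
    so the outer shell holds the remaining 1 - (1 - eps)**d.
    """
    return 1.0 - (1.0 - eps) ** d

for d in [1, 2, 5, 10, 20, 50, 100]:
    print(f"d={d:3d}: {shell_fraction(d):.1%} of the volume lies in the outer 5% shell")
```

In one dimension the shell holds exactly 5% of the volume; by d = 100 it holds almost all of it.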

<a id='core-concepts'></a>
## Section 3: Part 2 - Core Concepts

### The Mathematics Behind PCA

PCA finds orthogonal directions (principal components) that maximize variance.

**Mathematical Foundation:**

Given data matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ (n samples, d features):

1. **Center the data**: $\tilde{\mathbf{X}} = \mathbf{X} - \bar{\mathbf{X}}$, where every row of $\bar{\mathbf{X}}$ is the vector of per-feature means

2. **Compute covariance matrix**: $\mathbf{C} = \frac{1}{n-1}\tilde{\mathbf{X}}^T\tilde{\mathbf{X}}$

3. **Eigenvalue decomposition**: $\mathbf{C}\mathbf{v}_i = \lambda_i\mathbf{v}_i$

4. **Principal components**: Eigenvectors $\mathbf{v}_i$ with largest eigenvalues $\lambda_i$

5. **Project data**: $\mathbf{Z} = \tilde{\mathbf{X}}\mathbf{W}$ where $\mathbf{W} = [\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_k]$
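
Steps 3 and 4 follow directly from the variance-maximization goal. A short derivation sketch, using the same symbols as above (the Lagrange multiplier is written as $\lambda$ because it turns out to be the eigenvalue): for a unit vector $\mathbf{v}$, the variance of the data projected onto $\mathbf{v}$ is $\mathbf{v}^T\mathbf{C}\mathbf{v}$, and we maximize it subject to $\mathbf{v}^T\mathbf{v} = 1$.

$$
\begin{aligned}
\operatorname{Var}(\tilde{\mathbf{X}}\mathbf{v}) &= \frac{1}{n-1}\,\|\tilde{\mathbf{X}}\mathbf{v}\|^2 = \mathbf{v}^T\mathbf{C}\mathbf{v} \\
\mathcal{L}(\mathbf{v}, \lambda) &= \mathbf{v}^T\mathbf{C}\mathbf{v} - \lambda\,(\mathbf{v}^T\mathbf{v} - 1) \\
\nabla_{\mathbf{v}}\mathcal{L} = 0 \;&\Rightarrow\; 2\mathbf{C}\mathbf{v} - 2\lambda\mathbf{v} = 0 \;\Rightarrow\; \mathbf{C}\mathbf{v} = \lambda\mathbf{v}
\end{aligned}
$$

At any such stationary point the projected variance is $\mathbf{v}^T\mathbf{C}\mathbf{v} = \lambda$, so the direction of maximum variance is the eigenvector with the largest eigenvalue.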

### Step-by-Step Visualization

Let's understand PCA with a simple 2D example.

In [None]:
# Generate correlated 2D data
np.random.seed(42)
mean = [0, 0]
cov = [[3, 2.5], [2.5, 3]]  # Covariance matrix with correlation
n_samples = 300
X_2d = np.random.multivariate_normal(mean, cov, n_samples)

# Step 1: Center the data
X_centered = X_2d - np.mean(X_2d, axis=0)

# Step 2: Compute covariance matrix
cov_matrix = np.cov(X_centered.T)

# Step 3: Eigenvalue decomposition
# (eigh is for symmetric matrices such as covariance matrices; it always returns real values)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Sort by eigenvalue
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

print("PCA Analysis Results:")
print(f"Eigenvalues: {eigenvalues}")
print(f"Variance explained by PC1: {eigenvalues[0]/eigenvalues.sum()*100:.1f}%")
print(f"Variance explained by PC2: {eigenvalues[1]/eigenvalues.sum()*100:.1f}%")
print(f"\nEigenvector 1 (PC1): {eigenvectors[:, 0]}")
print(f"Eigenvector 2 (PC2): {eigenvectors[:, 1]}")

# Visualization
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Plot 1: Original data
axes[0].scatter(X_2d[:, 0], X_2d[:, 1], alpha=0.5, s=30)
axes[0].set_xlabel('Feature 1', fontweight='bold')
axes[0].set_ylabel('Feature 2', fontweight='bold')
axes[0].set_title('Step 1: Original Correlated Data', fontweight='bold')
axes[0].grid(True, alpha=0.3)
axes[0].axis('equal')

# Plot 2: Centered data with principal components
axes[1].scatter(X_centered[:, 0], X_centered[:, 1], alpha=0.5, s=30, label='Data')

# Draw principal components
scale = 3
for i, (eigenval, eigenvec) in enumerate(zip(eigenvalues, eigenvectors.T)):
    axes[1].arrow(0, 0, eigenvec[0]*scale*np.sqrt(eigenval), 
                  eigenvec[1]*scale*np.sqrt(eigenval),
                  head_width=0.3, head_length=0.3, fc=f'C{i+1}', ec=f'C{i+1}',
                  linewidth=3, label=f'PC{i+1} (λ={eigenval:.2f})')

axes[1].set_xlabel('Feature 1 (centered)', fontweight='bold')
axes[1].set_ylabel('Feature 2 (centered)', fontweight='bold')
axes[1].set_title('Step 2: Principal Components', fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
axes[1].axis('equal')

# Plot 3: Data projected onto principal components
X_pca = X_centered @ eigenvectors
axes[2].scatter(X_pca[:, 0], X_pca[:, 1], alpha=0.5, s=30)
axes[2].set_xlabel('PC1 (First Principal Component)', fontweight='bold')
axes[2].set_ylabel('PC2 (Second Principal Component)', fontweight='bold')
axes[2].set_title('Step 3: Transformed Data', fontweight='bold')
axes[2].grid(True, alpha=0.3)
axes[2].axis('equal')

plt.tight_layout()
plt.show()

print("\nKey Observations:")
print("1. PC1 points in the direction of maximum variance")
print("2. PC2 is orthogonal to PC1 (90 degrees)")
print("3. In the transformed space, features are uncorrelated")
print("4. We could drop PC2 with minimal information loss!")

### Understanding Variance Explained

How much information does each principal component capture?
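
For a concrete feel, take the population covariance matrix used to generate the 2D example, `[[3, 2.5], [2.5, 3]]`. Its eigenvalues can be found by hand (3 + 2.5 and 3 - 2.5), and each component's explained variance ratio is its eigenvalue divided by the sum of all eigenvalues. A small sketch (variable names are illustrative; the sample estimates printed earlier differ slightly from these population values):

```python
import numpy as np

# Population covariance from the 2D example
cov = np.array([[3.0, 2.5],
                [2.5, 3.0]])

# eigvalsh is meant for symmetric matrices; it returns eigenvalues in ascending order
eigvals = np.linalg.eigvalsh(cov)[::-1]      # reversed to descending order

ratio = eigvals / eigvals.sum()              # explained variance ratio per component
cumulative = np.cumsum(ratio)                # cumulative explained variance

# Smallest k whose cumulative ratio reaches a 90% threshold
k_90 = int(np.argmax(cumulative >= 0.90)) + 1

print("Eigenvalues:", eigvals)
print("Explained variance ratio:", np.round(ratio, 4))
print("Components needed for 90%:", k_90)
```

Here the first component carries 5.5 / 6 of the total variance, so a single component already clears the 90% threshold.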

In [None]:
# Apply PCA to digits dataset to see variance explained
X_digits_scaled = StandardScaler().fit_transform(X_digits)

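# Compute covariance matrix and eigenvalues
# (eigvalsh is for symmetric matrices and always returns real eigenvalues;
#  the digits data has constant pixels, so the matrix is singular)
cov_matrix_digits = np.cov(X_digits_scaled.T)
eigenvalues_digits = np.linalg.eigvalsh(cov_matrix_digits)
eigenvalues_digits = np.sort(eigenvalues_digits)[::-1]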

# Calculate variance explained
variance_explained = eigenvalues_digits / eigenvalues_digits.sum()
cumulative_variance = np.cumsum(variance_explained)

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Variance per component
axes[0].bar(range(1, 21), variance_explained[:20], alpha=0.7, color='steelblue')
axes[0].set_xlabel('Principal Component', fontweight='bold')
axes[0].set_ylabel('Proportion of Variance Explained', fontweight='bold')
axes[0].set_title('Variance Explained by Each Component (First 20)', fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='y')

# Plot 2: Cumulative variance
axes[1].plot(range(1, len(cumulative_variance)+1), cumulative_variance, 
             'ro-', linewidth=2, markersize=4)
axes[1].axhline(y=0.9, color='g', linestyle='--', linewidth=2, label='90% threshold')
axes[1].axhline(y=0.95, color='orange', linestyle='--', linewidth=2, label='95% threshold')
axes[1].set_xlabel('Number of Components', fontweight='bold')
axes[1].set_ylabel('Cumulative Variance Explained', fontweight='bold')
axes[1].set_title('Cumulative Variance Explained', fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Find number of components for different thresholds
n_90 = np.argmax(cumulative_variance >= 0.90) + 1
n_95 = np.argmax(cumulative_variance >= 0.95) + 1
n_99 = np.argmax(cumulative_variance >= 0.99) + 1

print(f"\nDimensionality Reduction Summary:")
print(f"Original dimensions: {X_digits.shape[1]}")
print(f"Components for 90% variance: {n_90} ({n_90/X_digits.shape[1]*100:.1f}% of original)")
print(f"Components for 95% variance: {n_95} ({n_95/X_digits.shape[1]*100:.1f}% of original)")
print(f"Components for 99% variance: {n_99} ({n_99/X_digits.shape[1]*100:.1f}% of original)")
print(f"\nWe can reduce from 64 to ~{n_95} dimensions with minimal information loss!")

<a id='implementation'></a>
## Section 4: Part 3 - Implementation from Scratch

Let's implement PCA from scratch to truly understand how it works.

In [None]:
class PCA:
    """
    Principal Component Analysis implementation from scratch.
    
    Parameters:
    -----------
    n_components : int or float, optional
        Number of components to keep.
        If int: keep exactly n_components
        If float (0 < n_components < 1): keep enough components to explain
            this proportion of variance
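        If None: keep all components

    Notes:
    ------
    Unlike sklearn, components_ has shape (n_features, n_components), and
    explained_variance_ratio_ covers ALL components, so slice it
    (e.g. [:k]) to get the ratios for the k components kept.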
    """
    
    def __init__(self, n_components=None):
        self.n_components = n_components
        self.components_ = None
        self.mean_ = None
        self.eigenvalues_ = None
        self.explained_variance_ = None
        self.explained_variance_ratio_ = None
    
    def fit(self, X):
        """
        Fit the PCA model to the data.
        
        Parameters:
        -----------
        X : ndarray of shape (n_samples, n_features)
            Training data
        
        Returns:
        --------
        self : object
        """
        # Step 1: Center the data
        self.mean_ = np.mean(X, axis=0)
        X_centered = X - self.mean_
        
        # Step 2: Compute covariance matrix
        n_samples = X.shape[0]
        cov_matrix = (X_centered.T @ X_centered) / (n_samples - 1)
        
        # Step 3: Compute eigenvalues and eigenvectors
        # (eigh exploits symmetry and guarantees real-valued results)
        eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
        
        # Step 4: Sort by eigenvalue (descending)
        idx = eigenvalues.argsort()[::-1]
        eigenvalues = eigenvalues[idx]
        eigenvectors = eigenvectors[:, idx]
        
        # Step 5: Store results
        self.eigenvalues_ = eigenvalues
        self.explained_variance_ = eigenvalues
        self.explained_variance_ratio_ = eigenvalues / eigenvalues.sum()
        
        # Step 6: Select components based on n_components
        if self.n_components is None:
            self.components_ = eigenvectors
        elif isinstance(self.n_components, (int, np.integer)):
            self.components_ = eigenvectors[:, :self.n_components]
        elif isinstance(self.n_components, float) and 0 < self.n_components < 1:
            # Select components to explain desired variance
            cumsum = np.cumsum(self.explained_variance_ratio_)
            n_comp = np.argmax(cumsum >= self.n_components) + 1
            self.components_ = eigenvectors[:, :n_comp]
        else:
            raise ValueError("n_components must be None, an int, or a float in (0, 1)")
        
        return self
    
    def transform(self, X):
        """
        Apply dimensionality reduction to X.
        
        Parameters:
        -----------
        X : ndarray of shape (n_samples, n_features)
            Data to transform
        
        Returns:
        --------
        X_transformed : ndarray of shape (n_samples, n_components)
            Transformed data
        """
        # Center the data using training mean
        X_centered = X - self.mean_
        
        # Project onto principal components
        return X_centered @ self.components_
    
    def fit_transform(self, X):
        """
        Fit the model and apply dimensionality reduction.
        
        Parameters:
        -----------
        X : ndarray of shape (n_samples, n_features)
            Training data
        
        Returns:
        --------
        X_transformed : ndarray of shape (n_samples, n_components)
            Transformed data
        """
        self.fit(X)
        return self.transform(X)
    
    def inverse_transform(self, X_transformed):
        """
        Transform data back to original space.
        
        Parameters:
        -----------
        X_transformed : ndarray of shape (n_samples, n_components)
            Transformed data
        
        Returns:
        --------
        X_original : ndarray of shape (n_samples, n_features)
            Data in original space
        """
        return (X_transformed @ self.components_.T) + self.mean_

print("PCA class implemented successfully!")
print("\nKey methods:")
print("  - fit(X): Learn principal components from data")
print("  - transform(X): Project data onto principal components")
print("  - fit_transform(X): Fit and transform in one step")
print("  - inverse_transform(X): Reconstruct original space")

### Testing Our Implementation

Let's test our PCA implementation and compare with sklearn.
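
One caveat when comparing two PCA implementations: an eigenvector multiplied by -1 is still an eigenvector, so a component (and the matching projected coordinate) may come out with the opposite sign. The scatter plots can therefore look mirrored even when both implementations are correct. A hedged sketch of one way to compare projections fairly; `align_signs` is a hypothetical helper, not part of sklearn or of the class above:

```python
import numpy as np

def align_signs(Z, Z_ref):
    """Flip columns of Z so each one correlates positively with Z_ref.

    Z, Z_ref : arrays of shape (n_samples, n_components)
    """
    # Sign of the inner product between matching columns (+1 or -1)
    signs = np.sign(np.sum(Z * Z_ref, axis=0))
    signs[signs == 0] = 1.0   # leave orthogonal columns untouched
    return Z * signs

# Toy demonstration: the same projection with the second axis flipped
rng = np.random.default_rng(0)
Z_ref = rng.normal(size=(100, 2))
Z_flipped = Z_ref * np.array([1.0, -1.0])

Z_aligned = align_signs(Z_flipped, Z_ref)
print("Match before alignment:", np.allclose(Z_flipped, Z_ref))
print("Match after alignment: ", np.allclose(Z_aligned, Z_ref))
```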

In [None]:
# Prepare data
X_test = StandardScaler().fit_transform(X_digits)

# Our implementation
pca_ours = PCA(n_components=2)
X_pca_ours = pca_ours.fit_transform(X_test)

# Sklearn implementation
pca_sklearn = SklearnPCA(n_components=2)
X_pca_sklearn = pca_sklearn.fit_transform(X_test)

# Compare results
print("Comparison: Our Implementation vs Sklearn\n" + "="*50)
print(f"\nExplained variance ratio:")
print(f"  Ours:    {pca_ours.explained_variance_ratio_[:2]}")
print(f"  Sklearn: {pca_sklearn.explained_variance_ratio_}")

print(f"\nTotal variance explained:")
print(f"  Ours:    {pca_ours.explained_variance_ratio_[:2].sum():.4f}")
print(f"  Sklearn: {pca_sklearn.explained_variance_ratio_.sum():.4f}")

# Visualize both results
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Our implementation
scatter1 = axes[0].scatter(X_pca_ours[:, 0], X_pca_ours[:, 1], 
                           c=y_digits, cmap='tab10', alpha=0.6, s=20)
axes[0].set_xlabel('First Principal Component', fontweight='bold')
axes[0].set_ylabel('Second Principal Component', fontweight='bold')
axes[0].set_title('Our PCA Implementation', fontweight='bold')
axes[0].grid(True, alpha=0.3)
plt.colorbar(scatter1, ax=axes[0], label='Digit')

# Sklearn implementation
scatter2 = axes[1].scatter(X_pca_sklearn[:, 0], X_pca_sklearn[:, 1], 
                           c=y_digits, cmap='tab10', alpha=0.6, s=20)
axes[1].set_xlabel('First Principal Component', fontweight='bold')
axes[1].set_ylabel('Second Principal Component', fontweight='bold')
axes[1].set_title('Sklearn PCA', fontweight='bold')
axes[1].grid(True, alpha=0.3)
plt.colorbar(scatter2, ax=axes[1], label='Digit')

plt.tight_layout()
plt.show()

# Check reconstruction error
X_reconstructed = pca_ours.inverse_transform(X_pca_ours)
reconstruction_error = np.mean((X_test - X_reconstructed)**2)

print(f"\nReconstruction Error (MSE): {reconstruction_error:.6f}")
print("\nThe two implementations should agree up to the sign of each component.")

### Visualizing Reconstruction

Let's see how well we can reconstruct images with different numbers of components.

In [None]:
# Select a sample digit
sample_idx = 0
sample_digit = X_test[sample_idx:sample_idx+1]

# Test different numbers of components
n_components_list = [1, 2, 5, 10, 20, 64]

fig, axes = plt.subplots(2, 3, figsize=(12, 8))
axes = axes.flatten()

for i, n_comp in enumerate(n_components_list):
    # Apply PCA with n_comp components
    pca_temp = PCA(n_components=n_comp)
    X_reduced = pca_temp.fit_transform(X_test)
    X_reconstructed = pca_temp.inverse_transform(X_reduced[sample_idx:sample_idx+1])
    
    # Calculate reconstruction error
    mse = np.mean((sample_digit - X_reconstructed)**2)
    var_explained = pca_temp.explained_variance_ratio_[:n_comp].sum()
    
    # Display
    axes[i].imshow(X_reconstructed.reshape(8, 8), cmap='gray')
    axes[i].set_title(f'{n_comp} components\nVar: {var_explained:.1%}, MSE: {mse:.4f}', 
                      fontsize=10)
    axes[i].axis('off')

plt.suptitle(f'Image Reconstruction with Different Numbers of Components\nOriginal Digit: {y_digits[sample_idx]}', 
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("Observations:")
print("1. With just 5 components, the digit is recognizable!")
print("2. 20 components gives excellent reconstruction")
print("3. We can reduce from 64 to 20 dimensions (69% reduction) with minimal quality loss")

<a id='evaluation'></a>
## Section 5: Part 4 - Evaluation & Performance

### How do we evaluate PCA?

Key metrics:
1. **Explained variance ratio**: How much information is retained?
2. **Reconstruction error**: How well can we reconstruct original data?
3. **Visualization quality**: Are classes well-separated?
4. **Downstream task performance**: Does it help our ML model?
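
The first two metrics are closely linked: for centered data, the squared reconstruction error equals the variance carried by the discarded components. With n samples, d features, and the 1/(n-1) covariance normalization used earlier, the mean squared error after keeping k components is (n-1)/(n·d) times the sum of the discarded eigenvalues. A minimal numerical check on synthetic data (the data and variable names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 6, 2

# Synthetic data with correlated features
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
Xc = X - X.mean(axis=0)

# Eigendecomposition of the covariance matrix, sorted in descending order
C = (Xc.T @ Xc) / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the top-k components and reconstruct
W = eigvecs[:, :k]
X_hat = (Xc @ W) @ W.T

mse = np.mean((Xc - X_hat) ** 2)
predicted = (n - 1) / (n * d) * eigvals[k:].sum()

print(f"Measured MSE:               {mse:.6f}")
print(f"From discarded eigenvalues: {predicted:.6f}")
```

This is why the variance-explained and reconstruction-error curves mirror each other: both are determined by the same eigenvalue spectrum.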

In [None]:
# Comprehensive evaluation
def evaluate_pca(X, y, n_components_range):
    """
    Evaluate PCA performance across different numbers of components.
    """
    results = {
        'n_components': [],
        'variance_explained': [],
        'reconstruction_error': [],
        'compression_ratio': []
    }
    
    for n_comp in n_components_range:
        pca = PCA(n_components=n_comp)
        X_reduced = pca.fit_transform(X)
        X_reconstructed = pca.inverse_transform(X_reduced)
        
        # Calculate metrics
        var_exp = pca.explained_variance_ratio_[:n_comp].sum()
        recon_error = np.mean((X - X_reconstructed)**2)
        compression = 1 - (n_comp / X.shape[1])
        
        results['n_components'].append(n_comp)
        results['variance_explained'].append(var_exp)
        results['reconstruction_error'].append(recon_error)
        results['compression_ratio'].append(compression)
    
    return results

# Evaluate
n_components_range = [1, 2, 5, 10, 15, 20, 30, 40, 50, 64]
eval_results = evaluate_pca(X_test, y_digits, n_components_range)

# Visualize results
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot 1: Variance explained
axes[0, 0].plot(eval_results['n_components'], eval_results['variance_explained'], 
                'bo-', linewidth=2, markersize=8)
axes[0, 0].axhline(y=0.9, color='r', linestyle='--', label='90% threshold')
axes[0, 0].axhline(y=0.95, color='orange', linestyle='--', label='95% threshold')
axes[0, 0].set_xlabel('Number of Components', fontweight='bold')
axes[0, 0].set_ylabel('Cumulative Variance Explained', fontweight='bold')
axes[0, 0].set_title('Variance Explained vs Components', fontweight='bold')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Reconstruction error
axes[0, 1].plot(eval_results['n_components'], eval_results['reconstruction_error'], 
                'ro-', linewidth=2, markersize=8)
axes[0, 1].set_xlabel('Number of Components', fontweight='bold')
axes[0, 1].set_ylabel('Reconstruction Error (MSE)', fontweight='bold')
axes[0, 1].set_title('Reconstruction Error vs Components', fontweight='bold')
axes[0, 1].grid(True, alpha=0.3)

# Plot 3: Compression ratio
axes[1, 0].plot(eval_results['n_components'], 
                [c*100 for c in eval_results['compression_ratio']], 
                'go-', linewidth=2, markersize=8)
axes[1, 0].set_xlabel('Number of Components', fontweight='bold')
axes[1, 0].set_ylabel('Compression Ratio (%)', fontweight='bold')
axes[1, 0].set_title('Data Compression vs Components', fontweight='bold')
axes[1, 0].grid(True, alpha=0.3)

# Plot 4: Trade-off (variance vs compression)
axes[1, 1].scatter([c*100 for c in eval_results['compression_ratio']], 
                   eval_results['variance_explained'],
                   c=eval_results['n_components'], cmap='viridis', s=100)
for i, n in enumerate(eval_results['n_components']):
    if n in [2, 10, 20, 40]:
        axes[1, 1].annotate(f'{n}', 
                           (eval_results['compression_ratio'][i]*100, 
                            eval_results['variance_explained'][i]),
                           fontsize=10, fontweight='bold')
axes[1, 1].set_xlabel('Compression Ratio (%)', fontweight='bold')
axes[1, 1].set_ylabel('Variance Explained', fontweight='bold')
axes[1, 1].set_title('Variance-Compression Trade-off', fontweight='bold')
axes[1, 1].grid(True, alpha=0.3)
cbar = plt.colorbar(axes[1, 1].collections[0], ax=axes[1, 1])
cbar.set_label('Number of Components', fontweight='bold')

plt.tight_layout()
plt.show()

# Find optimal number of components
idx_90 = next(i for i, v in enumerate(eval_results['variance_explained']) if v >= 0.90)
optimal_n = eval_results['n_components'][idx_90]

print("\nPerformance Summary:")
print("="*50)
print(f"Optimal components (90% variance): {optimal_n}")
print(f"Compression ratio: {eval_results['compression_ratio'][idx_90]*100:.1f}%")
print(f"Reconstruction error: {eval_results['reconstruction_error'][idx_90]:.6f}")

### Comparison with Other Dimensionality Reduction Methods

In [None]:
# Compare PCA with t-SNE
print("Applying dimensionality reduction methods...")
print("(t-SNE may take a minute...)\n")

# PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_test)

# t-SNE
tsne = TSNE(n_components=2, random_state=42, perplexity=30)
X_tsne = tsne.fit_transform(X_test[:500])  # Use subset for speed

# Visualize comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# PCA
scatter1 = axes[0].scatter(X_pca[:, 0], X_pca[:, 1], 
                          c=y_digits, cmap='tab10', alpha=0.6, s=20)
axes[0].set_xlabel('First Component', fontweight='bold')
axes[0].set_ylabel('Second Component', fontweight='bold')
axes[0].set_title('PCA (Linear)', fontweight='bold')
axes[0].grid(True, alpha=0.3)
plt.colorbar(scatter1, ax=axes[0], label='Digit')

# t-SNE
scatter2 = axes[1].scatter(X_tsne[:, 0], X_tsne[:, 1], 
                          c=y_digits[:500], cmap='tab10', alpha=0.6, s=20)
axes[1].set_xlabel('First Component', fontweight='bold')
axes[1].set_ylabel('Second Component', fontweight='bold')
axes[1].set_title('t-SNE (Nonlinear)', fontweight='bold')
axes[1].grid(True, alpha=0.3)
plt.colorbar(scatter2, ax=axes[1], label='Digit')

plt.tight_layout()
plt.show()

print("\nComparison:")
print("\nPCA (Principal Component Analysis):")
print("  + Fast and deterministic")
print("  + Linear, easy to interpret")
print("  + Can reconstruct original data")
print("  - May not capture nonlinear structure")
print(f"  Variance explained: {pca.explained_variance_ratio_[:2].sum():.1%}")

print("\nt-SNE (t-Distributed Stochastic Neighbor Embedding):")
print("  + Better for visualization")
print("  + Captures local structure well")
print("  + Nonlinear")
print("  - Slower and stochastic (results depend on the random seed)")
print("  - Cannot reconstruct or transform new data")
print("  - Mainly for visualization, not preprocessing")

<a id='advanced-topics'></a>
## Section 6: Part 5 - Advanced Topics

### Kernel PCA

What if data has nonlinear structure? Enter **Kernel PCA**!

Kernel PCA applies the kernel trick to perform PCA in a high-dimensional feature space.
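
To make the kernel trick concrete: instead of a d x d covariance matrix, Kernel PCA eigendecomposes an n x n kernel matrix whose entries are similarities k(x_i, x_j), after centering that matrix in feature space. The sketch below is a simplified illustration, not sklearn's implementation; the function names `rbf_kernel_matrix` and `kernel_pca` are hypothetical, and it only projects the training points.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=0.1):
    """Pairwise RBF kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def kernel_pca(K, n_components=2):
    """Project training points given a precomputed (n x n) kernel matrix K."""
    n = K.shape[0]
    # Center the kernel matrix (equivalent to centering in feature space)
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # Symmetric eigendecomposition, largest eigenvalues first
    eigvals, eigvecs = np.linalg.eigh(K_c)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Training-point projections: eigenvectors scaled by sqrt(eigenvalue)
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
Z_rbf = kernel_pca(rbf_kernel_matrix(X_demo, gamma=0.1), n_components=2)
print("Kernel PCA output shape:", Z_rbf.shape)
```

A useful sanity check: with a linear kernel, `K = X @ X.T`, the same procedure reproduces ordinary PCA projections up to sign.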

In [None]:
from sklearn.decomposition import KernelPCA

# Create nonlinear data (Swiss roll)
X_swiss, color_swiss = make_swiss_roll(n_samples=1000, random_state=42)

# Apply different methods
pca_linear = PCA(n_components=2)
X_pca_linear = pca_linear.fit_transform(X_swiss)

kpca_rbf = KernelPCA(n_components=2, kernel='rbf', gamma=0.1)
X_kpca_rbf = kpca_rbf.fit_transform(X_swiss)

kpca_poly = KernelPCA(n_components=2, kernel='poly', degree=3)
X_kpca_poly = kpca_poly.fit_transform(X_swiss)

# Visualize
fig = plt.figure(figsize=(16, 4))

# Original 3D data
ax1 = fig.add_subplot(141, projection='3d')
ax1.scatter(X_swiss[:, 0], X_swiss[:, 1], X_swiss[:, 2], 
           c=color_swiss, cmap='viridis', s=20, alpha=0.6)
ax1.set_title('Original Swiss Roll (3D)', fontweight='bold')
ax1.set_xlabel('X')
ax1.set_ylabel('Y')
ax1.set_zlabel('Z')

# Linear PCA
ax2 = fig.add_subplot(142)
scatter2 = ax2.scatter(X_pca_linear[:, 0], X_pca_linear[:, 1], 
                       c=color_swiss, cmap='viridis', s=20, alpha=0.6)
ax2.set_title('Linear PCA', fontweight='bold')
ax2.set_xlabel('PC1')
ax2.set_ylabel('PC2')
ax2.grid(True, alpha=0.3)

# Kernel PCA (RBF)
ax3 = fig.add_subplot(143)
scatter3 = ax3.scatter(X_kpca_rbf[:, 0], X_kpca_rbf[:, 1], 
                       c=color_swiss, cmap='viridis', s=20, alpha=0.6)
ax3.set_title('Kernel PCA (RBF)', fontweight='bold')
ax3.set_xlabel('KPC1')
ax3.set_ylabel('KPC2')
ax3.grid(True, alpha=0.3)

# Kernel PCA (Polynomial)
ax4 = fig.add_subplot(144)
scatter4 = ax4.scatter(X_kpca_poly[:, 0], X_kpca_poly[:, 1], 
                       c=color_swiss, cmap='viridis', s=20, alpha=0.6)
ax4.set_title('Kernel PCA (Polynomial)', fontweight='bold')
ax4.set_xlabel('KPC1')
ax4.set_ylabel('KPC2')
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Key Insights:")
print("1. Linear PCA struggles with nonlinear manifold (swiss roll)")
print("2. Kernel PCA can capture nonlinear structure, but results depend heavily on the kernel and gamma")
print("3. Different kernels capture different nonlinear patterns")
print("4. Trade-off: Kernel PCA is more expensive computationally")

### Computational Complexity

Understanding the computational cost of PCA is crucial for large datasets.

In [None]:
import time

def measure_pca_time(n_samples, n_features, n_components):
    """Measure time for PCA computation."""
    X = np.random.randn(n_samples, n_features)
    
    start = time.time()
    pca = PCA(n_components=n_components)
    pca.fit_transform(X)
    elapsed = time.time() - start
    
    return elapsed

# Test scaling with number of samples
print("Testing computational complexity...\n")

n_samples_list = [100, 200, 500, 1000, 2000]
n_features = 50
n_components = 10

times_samples = []
for n in n_samples_list:
    t = measure_pca_time(n, n_features, n_components)
    times_samples.append(t)
    print(f"n_samples={n:4d}, n_features={n_features}, time={t:.4f}s")

# Test scaling with number of features
print("\n" + "-"*50 + "\n")

n_samples = 500
n_features_list = [10, 20, 50, 100, 200]

times_features = []
for n in n_features_list:
    t = measure_pca_time(n_samples, n, min(10, n))
    times_features.append(t)
    print(f"n_samples={n_samples}, n_features={n:3d}, time={t:.4f}s")

# Visualize complexity
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Scaling with samples
axes[0].plot(n_samples_list, times_samples, 'bo-', linewidth=2, markersize=8)
axes[0].set_xlabel('Number of Samples', fontweight='bold')
axes[0].set_ylabel('Time (seconds)', fontweight='bold')
axes[0].set_title('Time Complexity vs Number of Samples', fontweight='bold')
axes[0].grid(True, alpha=0.3)

# Scaling with features
axes[1].plot(n_features_list, times_features, 'ro-', linewidth=2, markersize=8)
axes[1].set_xlabel('Number of Features', fontweight='bold')
axes[1].set_ylabel('Time (seconds)', fontweight='bold')
axes[1].set_title('Time Complexity vs Number of Features', fontweight='bold')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n" + "="*50)
print("Complexity Analysis:")
print("="*50)
print("\nTime Complexity:")
print("  Covariance + eigendecomposition (used here): O(n*d² + d³)")
print("  SVD of the centered data: O(min(n*d², d*n²))")
print("  n = number of samples, d = number of features")
print("\nSpace Complexity: O(d²) for covariance matrix")
print("\nPractical Tips:")
print("  • For n >> d: Compute covariance matrix (d x d)")
print("  • For d >> n: Use SVD or randomized PCA")
print("  • For very large datasets: Use incremental PCA")
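
The SVD route mentioned in the tips avoids forming the covariance matrix at all: if the centered data factor as X̃ = U S Vᵀ, the rows of Vᵀ are the principal components and S²/(n-1) are the eigenvalues of the covariance matrix. A minimal sketch under those assumptions (the function name `pca_svd` is hypothetical):

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA via thin SVD of the centered data (no covariance matrix formed)."""
    n_samples = X.shape[0]
    X_centered = X - X.mean(axis=0)

    # Thin SVD: X_centered = U @ diag(S) @ Vt, singular values in descending order
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

    components = Vt[:n_components].T              # shape (n_features, n_components)
    explained_variance = S**2 / (n_samples - 1)   # eigenvalues of the covariance matrix
    Z = U[:, :n_components] * S[:n_components]    # same as X_centered @ components
    return Z, components, explained_variance

# Wide data (d >> n): the thin SVD works with at most n singular vectors
rng = np.random.default_rng(0)
X_wide = rng.normal(size=(30, 200))
Z, W, var = pca_svd(X_wide, n_components=5)
print("Projected shape:", Z.shape)
print("Top 5 explained variances:", np.round(var[:5], 3))
```

For the 30 x 200 example, this never builds the 200 x 200 covariance matrix that the eigendecomposition approach would need.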

### Incremental PCA for Large Datasets

When data doesn't fit in memory, use Incremental PCA.
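
sklearn's `IncrementalPCA` processes the data in batches and keeps only a low-rank summary. When the number of features is small enough that a d x d matrix fits in memory, a simpler alternative is to accumulate running sums over batches and recover the exact covariance matrix at the end. This is a different technique from `IncrementalPCA`, shown only as an illustrative sketch with hypothetical names:

```python
import numpy as np

def covariance_from_batches(batches):
    """Exact covariance matrix from an iterable of (batch_size, d) arrays."""
    n = 0
    sum_x = None     # running sum of rows, shape (d,)
    sum_xxT = None   # running sum of outer products, shape (d, d)
    for batch in batches:
        if sum_x is None:
            d = batch.shape[1]
            sum_x = np.zeros(d)
            sum_xxT = np.zeros((d, d))
        n += batch.shape[0]
        sum_x += batch.sum(axis=0)
        sum_xxT += batch.T @ batch
    mean = sum_x / n
    # sum((x - mean)(x - mean)^T) = sum(x x^T) - n * mean mean^T
    cov = (sum_xxT - n * np.outer(mean, mean)) / (n - 1)
    return mean, cov

rng = np.random.default_rng(0)
X_stream = rng.normal(size=(1000, 8))
batches = (X_stream[i:i + 100] for i in range(0, 1000, 100))

mean, cov = covariance_from_batches(batches)
print("Max difference from np.cov:", np.abs(cov - np.cov(X_stream.T)).max())
```

The final subtraction can lose precision when feature means are large relative to their spread, so subtracting a rough estimate of the mean from each batch first is a safer variant.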

In [None]:
from sklearn.decomposition import IncrementalPCA

# Simulate large dataset in batches
n_samples = 2000
n_features = 50
batch_size = 200
n_components = 10

# Generate full dataset
X_large = np.random.randn(n_samples, n_features)

# Regular PCA (requires all data at once)
print("Standard PCA:")
start = time.time()
pca_standard = PCA(n_components=n_components)
X_pca_standard = pca_standard.fit_transform(X_large)
time_standard = time.time() - start
print(f"  Time: {time_standard:.4f}s")

# Incremental PCA (processes data in batches)
print("\nIncremental PCA:")
start = time.time()
ipca = IncrementalPCA(n_components=n_components, batch_size=batch_size)

# Fit in batches
for i in range(0, n_samples, batch_size):
    batch = X_large[i:i+batch_size]
    ipca.partial_fit(batch)

# Transform
X_ipca = ipca.transform(X_large)
time_incremental = time.time() - start
print(f"  Time: {time_incremental:.4f}s")

# Compare explained variance
print("\nExplained Variance Comparison:")
print(f"  Standard PCA:    {pca_standard.explained_variance_ratio_[:n_components].sum():.4f}")
print(f"  Incremental PCA: {ipca.explained_variance_ratio_.sum():.4f}")

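# Compare projections (absolute value because the sign of a component is arbitrary)
correlation = np.corrcoef(X_pca_standard[:, 0], X_ipca[:, 0])[0, 1]
print(f"\nFirst PC correlation: {abs(correlation):.4f}")
print("\nIncremental PCA approximates standard PCA; a correlation near 1 means a close match.")
print("Note: this random data has no dominant direction, so its first PC is poorly determined.")
print("Use Incremental PCA for datasets that don't fit in memory.")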

<a id='challenge'></a>
## Section 7: Student Challenge (15-20 minutes)

### Your Task: PCA Analysis on Iris Dataset

The famous Iris dataset has 4 features. Your mission:

1. Load the Iris dataset and standardize it
2. Apply PCA to reduce to 2 components
3. Visualize the results with proper labels
4. Analyze which features contribute most to each PC
5. Determine how many components are needed for 95% variance
6. Create a biplot showing both data points and feature vectors

**Starter code is provided below. Fill in the missing parts!**

In [None]:
# Load Iris dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target
feature_names = iris.feature_names
target_names = iris.target_names

print("Iris Dataset Information:")
print(f"Number of samples: {X_iris.shape[0]}")
print(f"Number of features: {X_iris.shape[1]}")
print(f"Features: {feature_names}")
print(f"Classes: {target_names}")
print("\nYour tasks are below!")

In [None]:
# TASK 1: Standardize the data
# Hint: Use StandardScaler or implement it yourself

# YOUR CODE HERE
X_iris_scaled = None  # Replace with your code

print("Task 1: Standardization")
print(f"Mean of scaled data: {np.mean(X_iris_scaled, axis=0)}")
print(f"Std of scaled data: {np.std(X_iris_scaled, axis=0)}")

In [None]:
# TASK 2: Apply PCA to reduce to 2 components
# Use your PCA class or sklearn's

# YOUR CODE HERE
pca_iris = None  # Create PCA instance
X_iris_pca = None  # Fit and transform the data

print("Task 2: PCA Transformation")
print(f"Explained variance ratio: {pca_iris.explained_variance_ratio_[:2]}")
print(f"Total variance explained: {pca_iris.explained_variance_ratio_[:2].sum():.3f}")

In [None]:
# TASK 3: Visualize the PCA results
# Create a scatter plot with different colors for each species

# YOUR CODE HERE
plt.figure(figsize=(10, 6))
# Create your visualization

print("Task 3: Check your visualization above!")

In [None]:
# TASK 4: Analyze feature contributions to principal components
# Which features contribute most to PC1? To PC2?

# YOUR CODE HERE
# Hint: Look at pca_iris.components_

print("Task 4: Feature Contributions")
print("Your analysis here...")

In [None]:
# TASK 5: Determine components needed for 95% variance
# Fit PCA with all 4 components and analyze cumulative variance

# YOUR CODE HERE

print("Task 5: Components for 95% Variance")
print("Your answer here...")

In [None]:
# TASK 6: Create a biplot
# Show both the transformed data points AND the original feature vectors

# YOUR CODE HERE
# Hint: Plot data points as scatter, feature vectors as arrows

print("Task 6: Check your biplot above!")

<a id='solutions'></a>
## Section 8: Solutions

<details>
<summary><b>Click to reveal solutions (try the challenge first!)</b></summary>

### Solution Code Below:

In [None]:
# SOLUTION - TASK 1: Standardize the data
print("SOLUTION - Task 1: Standardization\n" + "="*50)

scaler = StandardScaler()
X_iris_scaled = scaler.fit_transform(X_iris)

print(f"Mean of scaled data: {np.mean(X_iris_scaled, axis=0)}")
print(f"Std of scaled data: {np.std(X_iris_scaled, axis=0)}")
print("\nStandardization ensures all features have mean=0 and std=1")
print("This is crucial for PCA to avoid bias toward large-scale features")

In [None]:
# SOLUTION - TASK 2: Apply PCA
print("SOLUTION - Task 2: PCA Transformation\n" + "="*50)

pca_iris = PCA(n_components=2)
X_iris_pca = pca_iris.fit_transform(X_iris_scaled)

print(f"Explained variance ratio: {pca_iris.explained_variance_ratio_[:2]}")
print(f"PC1 explains: {pca_iris.explained_variance_ratio_[0]*100:.2f}% of variance")
print(f"PC2 explains: {pca_iris.explained_variance_ratio_[1]*100:.2f}% of variance")
print(f"Total variance explained: {pca_iris.explained_variance_ratio_[:2].sum()*100:.2f}%")
print("\nWith just 2 components, we retain most of the variance!")

In [None]:
# SOLUTION - TASK 3: Visualize PCA results
print("SOLUTION - Task 3: Visualization\n" + "="*50)

plt.figure(figsize=(10, 7))

colors = ['red', 'blue', 'green']
markers = ['o', 's', '^']

for i, (color, marker, name) in enumerate(zip(colors, markers, target_names)):
    mask = y_iris == i
    plt.scatter(X_iris_pca[mask, 0], X_iris_pca[mask, 1],
               c=color, marker=marker, label=name, 
               alpha=0.7, s=100, edgecolors='black', linewidth=0.5)

plt.xlabel(f'PC1 ({pca_iris.explained_variance_ratio_[0]*100:.1f}% variance)', 
          fontweight='bold', fontsize=12)
plt.ylabel(f'PC2 ({pca_iris.explained_variance_ratio_[1]*100:.1f}% variance)', 
          fontweight='bold', fontsize=12)
plt.title('Iris Dataset - PCA Projection', fontweight='bold', fontsize=14)
plt.legend(title='Species', fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\nObservations:")
print("• Setosa (red) is clearly separated from the others")
print("• Versicolor (blue) and Virginica (green) overlap somewhat")
print("• PC1 is the main discriminator between species")

In [None]:
# SOLUTION - TASK 4: Feature contributions
print("SOLUTION - Task 4: Feature Contributions\n" + "="*50)

# Get the components (loadings)
components = pca_iris.components_

# Create a heatmap
plt.figure(figsize=(10, 4))
sns.heatmap(components, annot=True, fmt='.3f', cmap='RdBu_r',
           xticklabels=feature_names, yticklabels=['PC1', 'PC2'],
           center=0, vmin=-1, vmax=1, cbar_kws={'label': 'Loading'})
plt.title('PCA Component Loadings (Feature Contributions)', fontweight='bold', fontsize=12)
plt.tight_layout()
plt.show()

# Detailed analysis
print("\nPC1 Contributions:")
for i, feature in enumerate(feature_names):
    print(f"  {feature:25s}: {components[0, i]:+.3f}")

print("\nPC2 Contributions:")
for i, feature in enumerate(feature_names):
    print(f"  {feature:25s}: {components[1, i]:+.3f}")

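print("\nKey Insights (the overall sign of each PC is arbitrary):")
print("• PC1: sepal length, petal length, and petal width load strongly with the same sign;")
print("  sepal width loads with the opposite sign and a smaller magnitude")
print("  → PC1 mostly tracks petal size and sepal length")
print("• PC2: dominated by sepal width, with a smaller sepal length contribution;")
print("  petal loadings are close to zero")
print("  → PC2 mostly tracks sepal size, especially sepal width")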

In [None]:
# SOLUTION - TASK 5: Components for 95% variance
print("SOLUTION - Task 5: Components for 95% Variance\n" + "="*50)

# Fit PCA with all components
pca_full = PCA(n_components=4)
pca_full.fit(X_iris_scaled)

# Calculate cumulative variance
cumulative_variance = np.cumsum(pca_full.explained_variance_ratio_)

# Find number of components for 95%
n_components_95 = np.argmax(cumulative_variance >= 0.95) + 1

# Visualize
plt.figure(figsize=(10, 6))
plt.bar(range(1, 5), pca_full.explained_variance_ratio_, 
       alpha=0.6, label='Individual', color='steelblue')
plt.step(range(1, 5), cumulative_variance, where='mid',
        label='Cumulative', color='red', linewidth=2)
plt.axhline(y=0.95, color='green', linestyle='--', 
           linewidth=2, label='95% threshold')
plt.xlabel('Principal Component', fontweight='bold')
plt.ylabel('Variance Explained', fontweight='bold')
plt.title('Variance Explained by Each Component', fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3, axis='y')
plt.xticks([1, 2, 3, 4])
plt.tight_layout()
plt.show()

print(f"\nVariance by component:")
for i in range(4):
    print(f"  PC{i+1}: {pca_full.explained_variance_ratio_[i]*100:.2f}% "
          f"(cumulative: {cumulative_variance[i]*100:.2f}%)")

print(f"\nAnswer: {n_components_95} component(s) needed for 95% variance")
print(f"With {n_components_95} component(s), we explain {cumulative_variance[n_components_95-1]*100:.2f}% of variance")

In [None]:
# SOLUTION - TASK 6: Biplot
print("SOLUTION - Task 6: Biplot\n" + "="*50)

def biplot(X_pca, pca, feature_names, y, target_names):
    """
    Create a biplot showing both data points and feature vectors.
    """
    plt.figure(figsize=(12, 8))
    
    # Plot data points
    colors = ['red', 'blue', 'green']
    markers = ['o', 's', '^']
    
    for i, (color, marker, name) in enumerate(zip(colors, markers, target_names)):
        mask = y == i
        plt.scatter(X_pca[mask, 0], X_pca[mask, 1],
                   c=color, marker=marker, label=name,
                   alpha=0.6, s=80, edgecolors='black', linewidth=0.5)
    
    # Plot feature vectors
    scale = 3.5  # Scale factor for visibility
    for i, feature in enumerate(feature_names):
        arrow_x = pca.components_[0, i] * scale
        arrow_y = pca.components_[1, i] * scale
        
        plt.arrow(0, 0, arrow_x, arrow_y,
                 head_width=0.15, head_length=0.15,
                 fc='orange', ec='darkorange', linewidth=2.5, alpha=0.8)
        
        # Add feature name at arrow tip
        plt.text(arrow_x * 1.15, arrow_y * 1.15, feature,
                fontsize=11, fontweight='bold',
                ha='center', va='center',
                bbox=dict(boxstyle='round,pad=0.3', facecolor='yellow', alpha=0.7))
    
    plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]*100:.1f}% variance)',
              fontweight='bold', fontsize=12)
    plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]*100:.1f}% variance)',
              fontweight='bold', fontsize=12)
    plt.title('PCA Biplot: Data Points and Feature Vectors',
             fontweight='bold', fontsize=14)
    plt.legend(title='Species', loc='best', fontsize=10)
    plt.grid(True, alpha=0.3)
    plt.axhline(y=0, color='k', linewidth=0.5)
    plt.axvline(x=0, color='k', linewidth=0.5)
    plt.tight_layout()
    plt.show()

# Create the biplot
biplot(X_iris_pca, pca_iris, feature_names, y_iris, target_names)

print("\nBiplot Interpretation:")
print("• Orange arrows = original features projected onto PC1-PC2 plane")
print("• Arrow length = how strongly the feature loads on PC1 and PC2")
print("• Arrow angle ≈ correlation between features (approximate in a 2D projection)")
print("  - Small angle = positive correlation")
print("  - 90 degrees = uncorrelated")
print("  - 180 degrees = negative correlation")
print("\n• Petal length and petal width arrows nearly coincide → strongly correlated,")
print("  and both load mainly on PC1")
print("• Sepal width points mostly along PC2 → it drives the second component")
print("• Arrow directions may be mirrored, since the sign of each PC is arbitrary")

</details>

<a id='summary'></a>
## Section 9: Summary & Next Steps

### Key Takeaways

#### 1. What is PCA?
- **Unsupervised** dimensionality reduction technique
- Finds **orthogonal** directions of maximum variance
- Based on **eigenvalue decomposition** of covariance matrix
- **Linear** transformation of data

#### 2. When to Use PCA?
**Use PCA when:**
- You have high-dimensional data (curse of dimensionality)
- Features are correlated/redundant
- You need to visualize high-dimensional data
- Computational efficiency is important
- You want decorrelated features for a downstream model

**Don't use PCA when:**
- Features are already uncorrelated
- You need the exact original feature values (PCA with fewer components is lossy)
- Data has strong nonlinear structure (consider Kernel PCA or manifold learning instead)
- Feature interpretability is crucial (consider feature selection instead)

#### 3. PCA Workflow
1. **Standardize** data (mean=0, std=1)
2. **Compute** covariance matrix
3. **Find** eigenvalues and eigenvectors
4. **Sort** by eigenvalue (descending)
5. **Select** top k components
6. **Project** data onto selected components
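
The six steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not the workshop's reference implementation: `pca_eig` is a hypothetical helper name, and the synthetic data exists only to make the block self-contained.

```python
import numpy as np

def pca_eig(X, k):
    """Illustrative PCA via eigendecomposition (hypothetical helper)."""
    # 1. Standardize each feature to mean 0 and std 1
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized features (d x d)
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigendecomposition; eigh suits symmetric matrices and returns ascending eigenvalues
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort eigenpairs by eigenvalue, descending
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Keep the top k eigenvectors as columns of the projection matrix
    W = eigvecs[:, :k]
    # 6. Project the standardized data onto the selected components
    return X_std @ W, eigvals[:k] / eigvals.sum()

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
X_demo[:, 1] = 2.0 * X_demo[:, 0] + 0.1 * rng.normal(size=200)  # add a correlated feature

Z, ratio = pca_eig(X_demo, k=2)
print("Projected shape:", Z.shape)
print("Explained variance ratio:", ratio)
```

`np.linalg.eigh` is used instead of `np.linalg.eig` because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.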

#### 4. Evaluation Metrics
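- **Explained variance ratio**: Fraction of the total variance each component retains
- **Reconstruction error**: Mean squared difference between the data and its reconstruction from k components
- **Scree plot**: Eigenvalues plotted by rank, used to choose the number of components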
- **Downstream performance**: Does it help your ML task?
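
As a small sketch of how the first two metrics relate, the example below measures both on standardized Iris for every choice of k. It assumes scikit-learn's PCA (imported as `SklearnPCA`, as in the setup cell). The names `X_demo`, `retained`, and `mses` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA as SklearnPCA
from sklearn.preprocessing import StandardScaler

X_demo = StandardScaler().fit_transform(load_iris().data)

retained, mses = [], []
for k in range(1, 5):
    pca_k = SklearnPCA(n_components=k).fit(X_demo)
    # Project to k dimensions, then map back to the original feature space
    X_rec = pca_k.inverse_transform(pca_k.transform(X_demo))
    mses.append(np.mean((X_demo - X_rec) ** 2))
    retained.append(pca_k.explained_variance_ratio_.sum())
    print(f"k={k}: retained variance={retained[-1]:.3f}, reconstruction MSE={mses[-1]:.4f}")
```

Because the data is standardized to unit variance per feature, the reconstruction MSE equals one minus the retained variance, so the two metrics carry the same information here.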

#### 5. Advanced Techniques
- **Kernel PCA**: For nonlinear patterns
- **Incremental PCA**: For data that doesn't fit in memory
- **Sparse PCA**: For sparse loadings
- **Randomized PCA**: For faster computation on large datasets
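
As one illustration of these variants, scikit-learn exposes randomized PCA through the `svd_solver` argument of its PCA class. The sketch below compares it with the exact solver on synthetic low-rank data. The data and variable names are assumptions chosen for the demo.

```python
import numpy as np
from sklearn.decomposition import PCA as SklearnPCA

rng = np.random.default_rng(0)
# Synthetic data: 5 latent factors mixed into 200 features, plus a little noise
latent = rng.normal(size=(1000, 5))
mixing = rng.normal(size=(5, 200))
X_lowrank = latent @ mixing + 0.01 * rng.normal(size=(1000, 200))

pca_exact = SklearnPCA(n_components=5, svd_solver="full").fit(X_lowrank)
pca_rand = SklearnPCA(n_components=5, svd_solver="randomized", random_state=0).fit(X_lowrank)

print(f"Exact solver:      {pca_exact.explained_variance_ratio_.sum():.6f}")
print(f"Randomized solver: {pca_rand.explained_variance_ratio_.sum():.6f}")
```

The randomized solver tends to pay off when only a few components are needed from a large matrix. On data without a sharp drop in the spectrum, its approximation can be looser.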

### Practical Guidelines

#### Choosing Number of Components

**Rule of thumb:**
- **Visualization**: 2-3 components
- **Preprocessing**: Keep 90-95% variance
- **Compression**: Balance variance vs. compression ratio
- **Exploratory analysis**: Use scree plot
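
For the "keep 90-95% variance" rule, scikit-learn's PCA accepts a float in (0, 1) as `n_components` and selects the smallest number of components that reaches that share of variance. The sketch below applies it to the digits dataset and cross-checks the result against a manual cumulative sum. The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA as SklearnPCA
from sklearn.preprocessing import StandardScaler

X_digits = StandardScaler().fit_transform(load_digits().data)

# A float n_components asks for the fewest components reaching that variance share
pca_95 = SklearnPCA(n_components=0.95, svd_solver="full").fit(X_digits)
print(f"Components kept: {pca_95.n_components_} of {X_digits.shape[1]}")
print(f"Variance retained: {pca_95.explained_variance_ratio_.sum():.4f}")

# Manual cross-check using the cumulative explained variance of a full fit
cum = np.cumsum(SklearnPCA(svd_solver="full").fit(X_digits).explained_variance_ratio_)
k_manual = int(np.argmax(cum >= 0.95) + 1)
print(f"Manual estimate: {k_manual}")
```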

#### Common Pitfalls to Avoid

1. **Not standardizing data first**
   - Features with larger scales will dominate
   - Always standardize unless you have good reason not to

2. **Applying PCA to categorical features**
   - PCA assumes continuous data
   - Consider (multiple) correspondence analysis; PCA on one-hot encoded columns runs, but the components are often hard to interpret

3. **Using PCA for feature selection**
   - PCA creates new features (combinations of originals)
   - For feature selection, use L1 regularization or other methods

4. **Forgetting to apply same transformation to test data**
   - Use `fit()` on training data
   - Use `transform()` on test data (don't `fit()` again!)

5. **Over-interpreting principal components**
   - PCs are mathematical constructs, not always meaningful
   - Be cautious with causal interpretations
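
One way to avoid pitfall 4 is to wrap scaling and PCA in a scikit-learn pipeline, so both are fit on the training split only and then reused on the test split. This is a minimal sketch. The classifier, split ratio, and variable names are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA as SklearnPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.3, random_state=42, stratify=y_all
)

model = make_pipeline(
    StandardScaler(),
    SklearnPCA(n_components=2),
    LogisticRegression(max_iter=1000),
)

# fit() learns the scaler statistics and the PCA axes from the training data only
model.fit(X_train, y_train)

# score() only transforms the test data with the already-fitted scaler and PCA
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy with 2 PCs: {test_accuracy:.3f}")
```

The same pipeline object can be passed to `cross_val_score`, which refits the scaler and PCA inside each fold and so avoids leaking test statistics into training.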

### Comparison with Other Techniques

| Method | Type | Preserves | Speed | Use Case |
|--------|------|-----------|-------|----------|
| **PCA** | Linear | Global structure | Fast | General dimensionality reduction |
| **t-SNE** | Nonlinear | Local structure | Slow | Visualization only |
| **UMAP** | Nonlinear | Local + global | Medium | Visualization + preprocessing |
| **Kernel PCA** | Nonlinear | Depends on kernel | Medium | Nonlinear patterns |
| **Autoencoders** | Nonlinear | Learned | Slow | Complex patterns, large data |
| **LDA** | Linear | Class separation | Fast | Supervised dim. reduction |
| **ICA** | Linear | Independence | Medium | Signal separation |
| **NMF** | Linear | Non-negativity | Medium | Parts-based representation |

### Next Steps

#### Further Learning

**Topics to Explore:**
1. **Probabilistic PCA (PPCA)**: Probabilistic framework for PCA
2. **Factor Analysis**: Related technique for latent variable modeling
3. **Independent Component Analysis (ICA)**: For signal separation
4. **Autoencoders**: Neural network-based dimensionality reduction
5. **Manifold learning**: t-SNE, UMAP, Isomap, LLE

**Resources:**
- **Books**: 
  - "Pattern Recognition and Machine Learning" by Bishop (Chapter 12)
  - "The Elements of Statistical Learning" by Hastie et al. (Chapter 14)
- **Papers**:
  - Jolliffe & Cadima (2016): "Principal component analysis: a review and recent developments"
- **Online**:
  - Scikit-learn documentation on decomposition methods
  - StatQuest videos on PCA

#### Practice Exercises

1. **Apply PCA to image data** (MNIST, CIFAR-10)
   - Visualize principal components as images
   - Reconstruct images with different numbers of components

2. **Face recognition with Eigenfaces**
   - Apply PCA to face images
   - Build a face recognition system

3. **Compare PCA with other methods**
   - Try t-SNE, UMAP on the same dataset
   - Compare visualization quality

4. **PCA for preprocessing**
   - Train classifiers with/without PCA
   - Compare accuracy and training time

5. **Kernel PCA exploration**
   - Test different kernels (RBF, polynomial, sigmoid)
   - Apply to nonlinear datasets

---

### Congratulations!

You've completed the Principal Component Analysis workshop. You now have:

- Deep understanding of PCA mathematics and intuition
- Hands-on experience implementing PCA from scratch
- Skills to apply and evaluate PCA on real datasets
- Knowledge of advanced topics and when to use them
- Practical guidelines for using PCA effectively

**Keep exploring and applying PCA to your own projects!**

---

*Workshop created for CMSC 173 - Machine Learning*  
*Instructor: Noel Jeffrey Pinton*  
*University of the Philippines - Cebu*