
Implement PCA (Principal Component Analysis) for Dimensionality Reduction #13

@noahgift

Description


Problem Statement

PCA is one of the most commonly used unsupervised learning algorithms for dimensionality reduction, but it's currently missing from aprender.

Impact: aprender users have no built-in way to do PCA-based feature extraction, data visualization, or noise reduction, all fundamental ML tasks.

Proposed Solution

Implement PCA following the sklearn API, using EXTREME TDD methodology.

Algorithm

Mathematical Foundation:

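  1. Center data: X_c = X - μ, where μ is the per-feature mean
  2. Compute covariance matrix: Σ = (X_c^T X_c) / (n-1), with n = number of samples
  3. Eigendecomposition: Σ = V Λ V^T (Σ is symmetric, so V is orthogonal)
  4. Sort eigenvectors by eigenvalue (descending)
  5. Project data: X_pca = X_c V_k (V_k holds the top k eigenvectors as columns)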
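The five steps above can be sketched end to end without external crates. This is a minimal sketch, not the planned implementation: it assumes f64 `Vec<Vec<f64>>` rows instead of aprender's `Matrix<f32>`, the function names (`center`, `covariance`, `jacobi_eigen`, `pca_fit`, `pca_transform`) are hypothetical, and it swaps in the cyclic Jacobi eigenvalue method for step 3 only because it is short and dependency-free.

```rust
// Minimal PCA sketch (assumption: Vec<Vec<f64>> rows stand in for aprender's Matrix<f32>).

/// Step 1: per-feature mean and the centered copy X_c = X - mean.
fn center(x: &[Vec<f64>]) -> (Vec<Vec<f64>>, Vec<f64>) {
    let (n, d) = (x.len() as f64, x[0].len());
    let mut mean = vec![0.0; d];
    for row in x {
        for (m, v) in mean.iter_mut().zip(row) {
            *m += v / n;
        }
    }
    let xc: Vec<Vec<f64>> = x
        .iter()
        .map(|row| row.iter().zip(&mean).map(|(v, m)| v - m).collect::<Vec<f64>>())
        .collect();
    (xc, mean)
}

/// Step 2: sample covariance Σ = X_c^T X_c / (n - 1).
fn covariance(xc: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (n, d) = (xc.len() as f64, xc[0].len());
    let mut cov = vec![vec![0.0; d]; d];
    for row in xc {
        for i in 0..d {
            for j in 0..d {
                cov[i][j] += row[i] * row[j] / (n - 1.0);
            }
        }
    }
    cov
}

/// Step 3: cyclic Jacobi eigendecomposition of a symmetric matrix.
/// Returns (eigenvalues, V) with the eigenvectors stored as the columns of V.
fn jacobi_eigen(mut a: Vec<Vec<f64>>) -> (Vec<f64>, Vec<Vec<f64>>) {
    let d = a.len();
    let mut v: Vec<Vec<f64>> = (0..d)
        .map(|i| (0..d).map(|j| if i == j { 1.0 } else { 0.0 }).collect::<Vec<f64>>())
        .collect();
    for _sweep in 0..50 {
        let mut off = 0.0;
        for p in 0..d {
            for q in (p + 1)..d {
                off += a[p][q] * a[p][q];
            }
        }
        if off < 1e-20 {
            break; // off-diagonal mass is negligible: A is (numerically) diagonal
        }
        for p in 0..d {
            for q in (p + 1)..d {
                if a[p][q] == 0.0 {
                    continue;
                }
                // Rotation angle chosen so the new a[p][q] becomes zero.
                let theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
                let t = theta.signum() / (theta.abs() + (theta * theta + 1.0).sqrt());
                let c = 1.0 / (t * t + 1.0).sqrt();
                let s = t * c;
                for k in 0..d {
                    // A <- A J (columns p and q)
                    let (akp, akq) = (a[k][p], a[k][q]);
                    a[k][p] = c * akp - s * akq;
                    a[k][q] = s * akp + c * akq;
                }
                for k in 0..d {
                    // A <- J^T A (rows p and q)
                    let (apk, aqk) = (a[p][k], a[q][k]);
                    a[p][k] = c * apk - s * aqk;
                    a[q][k] = s * apk + c * aqk;
                }
                for row in v.iter_mut() {
                    // V <- V J accumulates the eigenvectors
                    let (vp, vq) = (row[p], row[q]);
                    row[p] = c * vp - s * vq;
                    row[q] = s * vp + c * vq;
                }
            }
        }
    }
    ((0..d).map(|i| a[i][i]).collect(), v)
}

/// Steps 1-4: returns (mean, top-k components as rows, all eigenvalues sorted descending).
fn pca_fit(x: &[Vec<f64>], k: usize) -> (Vec<f64>, Vec<Vec<f64>>, Vec<f64>) {
    let (xc, mean) = center(x);
    let (vals, vecs) = jacobi_eigen(covariance(&xc));
    let mut order: Vec<usize> = (0..vals.len()).collect();
    order.sort_by(|&i, &j| vals[j].partial_cmp(&vals[i]).unwrap()); // largest first
    // Keep the top k eigenvector columns, stored as rows; requires k <= d.
    let components: Vec<Vec<f64>> = order[..k]
        .iter()
        .map(|&c| vecs.iter().map(|row| row[c]).collect::<Vec<f64>>())
        .collect();
    let sorted_vals: Vec<f64> = order.iter().map(|&c| vals[c]).collect();
    (mean, components, sorted_vals)
}

/// Step 5: X_pca = (X - mean) V_k, one output column per kept component.
fn pca_transform(x: &[Vec<f64>], mean: &[f64], components: &[Vec<f64>]) -> Vec<Vec<f64>> {
    x.iter()
        .map(|row| {
            components
                .iter()
                .map(|w| row.iter().zip(mean).zip(w).map(|((v, m), wj)| (v - m) * wj).sum::<f64>())
                .collect::<Vec<f64>>()
        })
        .collect()
}

fn main() {
    // Points on the line y = 2x: all variance lies along (1, 2) / sqrt(5).
    let x = vec![vec![1.0, 2.0], vec![2.0, 4.0], vec![3.0, 6.0], vec![4.0, 8.0]];
    let (mean, comps, vals) = pca_fit(&x, 1);
    println!("mean = {:?}", mean);
    println!("eigenvalues = {:?}", vals); // approximately [25/3, 0]
    println!("projected = {:?}", pca_transform(&x, &mean, &comps));
}
```

For this toy data the covariance matrix has eigenvalues 25/3 and 0, so one component captures all of the variance. A production version would more likely call a library eigensolver or take the SVD of X_c rather than hand-roll Jacobi.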

Key Features:

  • n_components parameter (int or variance threshold)
  • explained_variance_ - variance explained by each component (its eigenvalue)
  • explained_variance_ratio_ - fraction of total variance explained by each component
  • transform() - project data to lower dimensions
  • inverse_transform() - reconstruct original space (lossy)
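The bookkeeping behind `explained_variance_ratio_`, a variance-threshold `n_components`, and `inverse_transform()` fits in a few lines. This is a sketch under assumptions: the helper names (`explained_variance_ratio`, `n_components_for_threshold`, `inverse_transform`) are hypothetical, `Vec<f64>` stands in for aprender's types, and the threshold rule (smallest k whose cumulative ratio reaches the threshold) is one reasonable choice, not a confirmed spec.

```rust
/// ratio[i] = λ_i / Σ_j λ_j, a fraction in [0, 1] rather than a percentage.
fn explained_variance_ratio(eigenvalues: &[f64]) -> Vec<f64> {
    let total: f64 = eigenvalues.iter().sum();
    eigenvalues.iter().map(|l| l / total).collect()
}

/// Hypothetical rule for a float n_components: the smallest k whose
/// cumulative explained-variance ratio reaches `threshold` (e.g. 0.9).
fn n_components_for_threshold(ratios: &[f64], threshold: f64) -> usize {
    let mut cum = 0.0;
    for (i, r) in ratios.iter().enumerate() {
        cum += r;
        if cum >= threshold {
            return i + 1;
        }
    }
    ratios.len()
}

/// inverse_transform: X_rec = Z V_k^T + mean (lossy whenever k < d).
/// `components` holds one unit-length component per row.
fn inverse_transform(z: &[Vec<f64>], components: &[Vec<f64>], mean: &[f64]) -> Vec<Vec<f64>> {
    z.iter()
        .map(|zi| {
            let mut row = mean.to_vec();
            for (coef, comp) in zi.iter().zip(components) {
                for (r, w) in row.iter_mut().zip(comp) {
                    *r += coef * w; // add back this component's contribution
                }
            }
            row
        })
        .collect()
}

fn main() {
    // Hypothetical eigenvalues, already sorted in descending order.
    let ratios = explained_variance_ratio(&[6.0, 1.5, 0.5]);
    println!("ratios = {:?}", ratios); // [0.75, 0.1875, 0.0625]
    println!("k for 90% = {}", n_components_for_threshold(&ratios, 0.9)); // 2

    // Reconstruct one point from a single score on the unit component (1, 2)/sqrt(5).
    let s = 5f64.sqrt();
    let rec = inverse_transform(&[vec![-7.5 / s]], &[vec![1.0 / s, 2.0 / s]], &[2.5, 5.0]);
    println!("reconstructed = {:?}", rec); // approximately [[1.0, 2.0]]
}
```

The reconstruction above is exact only because the point lies on the kept component's line; on general data, keeping k < d components discards the variance of the dropped components.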

Implementation

Trait: Transformer (fit/transform/fit_transform)

API Design:

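```rust
// Matrix<f32> / Vector<f32> are aprender's dense matrix and vector types.
pub struct PCA {
    n_components: Option<usize>,                   // Number of components to keep
    components: Option<Matrix<f32>>,               // Principal components (eigenvectors)
    mean: Option<Vector<f32>>,                     // Per-feature mean of the training data
    explained_variance: Option<Vector<f32>>,       // Variance (eigenvalue) of each component
    explained_variance_ratio: Option<Vector<f32>>, // Fraction of total variance per component
}

impl Transformer for PCA {
    /// Learns the mean, components, and explained variance from training data.
    fn fit(&mut self, x: &Matrix<f32>) -> Result<(), &'static str> {
        todo!("center, covariance, eigendecomposition, sort")
    }

    /// Projects centered data onto the kept components.
    fn transform(&self, x: &Matrix<f32>) -> Result<Matrix<f32>, &'static str> {
        todo!("(x - mean) * components")
    }
}
```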
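One way to get `fit_transform` for free, and to make `transform()` fail cleanly before `fit()`, is a default trait method plus the `Option` fields already used in the PCA struct. The sketch assumes a simplified `Transformer` trait and a `Vec<Vec<f32>>` stand-in for `Matrix<f32>`; aprender's real trait may differ, and `MeanCenterer` is a hypothetical toy used only to exercise the pattern.

```rust
// Stand-in for aprender's Matrix<f32> (assumption: a row-major Vec of rows).
type Matrix = Vec<Vec<f32>>;

/// Hypothetical shape of the Transformer trait named in this issue:
/// fit_transform has a default implementation in terms of fit + transform.
pub trait Transformer {
    fn fit(&mut self, x: &Matrix) -> Result<(), &'static str>;
    fn transform(&self, x: &Matrix) -> Result<Matrix, &'static str>;
    fn fit_transform(&mut self, x: &Matrix) -> Result<Matrix, &'static str> {
        self.fit(x)?;
        self.transform(x)
    }
}

/// Toy transformer that only learns the mean: `None` means "not fitted yet",
/// the same convention PCA's `mean` and `components` fields follow.
#[derive(Default)]
pub struct MeanCenterer {
    mean: Option<Vec<f32>>,
}

impl Transformer for MeanCenterer {
    fn fit(&mut self, x: &Matrix) -> Result<(), &'static str> {
        let first = x.first().ok_or("cannot fit on empty data")?;
        let n = x.len() as f32;
        let mut mean = vec![0.0; first.len()];
        for row in x {
            if row.len() != mean.len() {
                return Err("ragged input");
            }
            for (m, v) in mean.iter_mut().zip(row) {
                *m += v / n;
            }
        }
        self.mean = Some(mean);
        Ok(())
    }

    fn transform(&self, x: &Matrix) -> Result<Matrix, &'static str> {
        let mean = self.mean.as_ref().ok_or("transformer not fitted")?;
        x.iter()
            .map(|row| {
                if row.len() != mean.len() {
                    return Err("feature count mismatch");
                }
                Ok(row.iter().zip(mean).map(|(v, m)| v - m).collect::<Vec<f32>>())
            })
            .collect()
    }
}

fn main() {
    let x: Matrix = vec![vec![1.0, 2.0], vec![3.0, 6.0]];
    let mut t = MeanCenterer::default();
    assert!(t.transform(&x).is_err()); // transform before fit is an error
    let centered = t.fit_transform(&x).unwrap();
    println!("{:?}", centered); // [[-1.0, -2.0], [1.0, 2.0]]
}
```

Returning an error instead of panicking keeps the `Result<_, &'static str>` signatures from the API design meaningful for the "not fitted" and shape-mismatch cases.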

Success Criteria

  • ✅ PCA struct implementing the Transformer trait
  • ✅ fit/transform/inverse_transform methods
  • ✅ Explained variance computation
  • ✅ 15+ tests passing
  • ✅ Zero clippy warnings
  • ✅ Example: examples/pca_iris.rs
  • ✅ Book chapter: book/src/ml-fundamentals/pca.md

Estimated Effort

Timeline: 1-2 days
Complexity: Medium (requires eigendecomposition)

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
