
Implement Isolation Forest for Anomaly Detection #17

@noahgift

Problem Statement

Isolation Forest is an efficient anomaly detection algorithm that isolates outliers by recursively partitioning the data at random. It is currently missing from aprender.

Advantages:

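  • Fast: training t trees on subsamples of size ψ (max_samples) costs O(t·ψ·log ψ), and scoring n points costs O(n·t·log ψ)
  • Works well in high dimensions
  • No distance/density calculations
  • Few parameters to tune (n_estimators, max_samples, contamination)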

Use Cases:

  • Fraud detection
  • Network intrusion detection
  • Quality control (defect detection)
  • Sensor anomaly detection

Proposed Solution

Implement Isolation Forest following the scikit-learn API, using EXTREME TDD.

Algorithm

Core Idea: Anomalies are few and different, so random splits isolate them in fewer partitions than normal points.

Steps:

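  1. Build an ensemble of isolation trees, each on a random subsample of max_samples points:
    • Randomly select a feature
    • Randomly select a split value between that feature's minimum and maximum in the current node
    • Recursively partition until each point is isolated or a height limit is reached
  2. Anomaly score: derived from the average path length across trees
    • Short paths → anomaly (easy to isolate)
    • Long paths → normal (hard to isolate)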
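The scoring step can be written out using the normalization from the original Isolation Forest formulation. Here ψ is the subsample size (max_samples), h(x) is the path length of x in one tree, and E[h(x)] is its mean over the ensemble; this notation is ours, since the issue does not fix a scoring formula. The same c(·) is also added to the path length when a leaf still holds more than one point.

```latex
c(\psi) = 2H(\psi - 1) - \frac{2(\psi - 1)}{\psi}, \qquad H(i) \approx \ln(i) + 0.5772156649

s(x, \psi) = 2^{-\mathbb{E}[h(x)] / c(\psi)}
```

As E[h(x)] approaches 0, s approaches 1 (anomaly); as E[h(x)] approaches c(ψ), s approaches 0.5; as E[h(x)] approaches ψ - 1, s approaches 0 (normal).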

Implementation

API Design:

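```rust
// Design sketch: `Matrix<f32>` is the crate's matrix type, `IsolationTree` is still to be written.
pub struct IsolationForest {
    n_estimators: usize,       // Number of isolation trees
    max_samples: usize,        // Subsample size used to build each tree
    contamination: f32,        // Expected proportion of outliers
    trees: Vec<IsolationTree>,
}

impl IsolationForest {
    pub fn fit(&mut self, x: &Matrix<f32>) -> Result<(), &'static str> {
        todo!()
    }

    pub fn predict(&self, x: &Matrix<f32>) -> Vec<i32> {  // 1=normal, -1=anomaly
        todo!()
    }

    pub fn score_samples(&self, x: &Matrix<f32>) -> Vec<f32> {  // Anomaly scores
        todo!()
    }
}
```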
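To make tree building and path-length scoring concrete, here is a minimal, self-contained sketch under stated assumptions; it is not aprender's implementation. Rows are plain `Vec<f32>` values standing in for `Matrix<f32>`, and the `XorShift` PRNG, `Node`, `Forest`, `build`, `path_length`, `c`, and `fit` names are all hypothetical. It subsamples without replacement, caps tree height at ⌈log2 ψ⌉, and adds c(size) at unresolved leaves. One deliberate deviation: it picks the split feature only among features that still vary in the node, which avoids wasted splits on constant columns.

```rust
// Hypothetical sketch: plain `Vec<f32>` rows stand in for aprender's `Matrix<f32>`,
// and `XorShift` is a tiny PRNG so that no external crates are needed.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize // integer in 0..n; slight modulo bias is fine here
    }
    fn unit(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u64 << 24) as f32 // float in [0, 1) from the top 24 bits
    }
}

enum Node {
    Leaf { size: usize },
    Split { feature: usize, value: f32, left: Box<Node>, right: Box<Node> },
}

/// c(n): average path length of an unsuccessful BST search over n points; H(n-1) ~ ln(n-1) + 0.5772.
fn c(n: usize) -> f32 {
    match n {
        0 | 1 => 0.0,
        2 => 1.0,
        _ => 2.0 * ((n as f32 - 1.0).ln() + 0.577_215_7) - 2.0 * (n as f32 - 1.0) / n as f32,
    }
}

fn build(rows: &[&[f32]], depth: usize, max_depth: usize, rng: &mut XorShift) -> Node {
    if rows.len() <= 1 || depth >= max_depth {
        return Node::Leaf { size: rows.len() };
    }
    // Only features that still vary inside this node can split it.
    let candidates: Vec<usize> = (0..rows[0].len())
        .filter(|&f| rows.iter().any(|r| r[f] != rows[0][f]))
        .collect();
    if candidates.is_empty() {
        return Node::Leaf { size: rows.len() }; // all rows identical
    }
    // Random feature, then a random split value between its min and max in this node.
    let feature = candidates[rng.below(candidates.len())];
    let lo = rows.iter().map(|r| r[feature]).fold(f32::INFINITY, f32::min);
    let hi = rows.iter().map(|r| r[feature]).fold(f32::NEG_INFINITY, f32::max);
    let value = lo + rng.unit() * (hi - lo);
    let (left, right): (Vec<&[f32]>, Vec<&[f32]>) =
        rows.iter().copied().partition(|r| r[feature] < value);
    Node::Split {
        feature,
        value,
        left: Box::new(build(&left, depth + 1, max_depth, rng)),
        right: Box::new(build(&right, depth + 1, max_depth, rng)),
    }
}

/// h(x): edges from the root to x's leaf, plus c(size) for points left unresolved there.
fn path_length(node: &Node, x: &[f32], depth: usize) -> f32 {
    match node {
        Node::Leaf { size } => depth as f32 + c(*size),
        Node::Split { feature, value, left, right } => {
            let next = if x[*feature] < *value { left } else { right };
            path_length(next, x, depth + 1)
        }
    }
}

struct Forest { trees: Vec<Node>, sample_size: usize }

fn fit(data: &[Vec<f32>], n_estimators: usize, max_samples: usize, seed: u64) -> Forest {
    let mut rng = XorShift(seed.max(1)); // xorshift state must be non-zero
    let sample_size = max_samples.min(data.len());
    let max_depth = (sample_size as f32).log2().ceil() as usize; // height limit ceil(log2 psi)
    let mut idx: Vec<usize> = (0..data.len()).collect();
    let mut trees = Vec::with_capacity(n_estimators);
    for _ in 0..n_estimators {
        // Partial Fisher-Yates shuffle: the first `sample_size` indices are drawn without replacement.
        for i in 0..sample_size {
            let j = i + rng.below(data.len() - i);
            idx.swap(i, j);
        }
        let sample: Vec<&[f32]> = idx[..sample_size].iter().map(|&i| data[i].as_slice()).collect();
        trees.push(build(&sample, 0, max_depth, &mut rng));
    }
    Forest { trees, sample_size }
}

impl Forest {
    /// s(x) = 2^(-E[h(x)] / c(psi)): close to 1 suggests an anomaly, well below 0.5 a normal point.
    fn score(&self, x: &[f32]) -> f32 {
        let total: f32 = self.trees.iter().map(|t| path_length(t, x, 0)).sum();
        2f32.powf(-(total / self.trees.len() as f32) / c(self.sample_size))
    }
}
```

A call such as `fit(&rows, 100, 256, 42)` followed by `forest.score(&row)` per row would roughly play the roles of the issue's `fit` and `score_samples`. The sketch assumes at least two rows (so c(ψ) is non-zero) and a fixed seed for reproducibility.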

Success Criteria

  • ✅ IsolationForest with ensemble of random trees
  • ✅ fit/predict/score_samples methods
  • ✅ contamination parameter for threshold
  • ✅ 15+ tests
  • ✅ Zero clippy warnings
  • ✅ Example: examples/isolation_forest_anomaly.rs
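The contamination criterion needs a rule that turns anomaly scores into the 1 / -1 labels returned by `predict`. One simple option, assumed here rather than prescribed by the issue, is to score the training rows, sort them, and place the cutoff so that about `round(contamination * n)` of them fall above it. scikit-learn also derives its offset from a percentile of the training scores, so the idea is similar, though its exact cutoff can differ. The helper names `threshold_from_contamination` and `predict_labels` are hypothetical, and scores follow the higher-is-more-anomalous convention.

```rust
// Hypothetical helpers; scores follow the convention "higher = more anomalous".

/// Cutoff such that about `round(contamination * n)` training scores lie above it
/// (fewer if several scores tie at the cutoff). `contamination` is assumed to be in [0, 0.5].
fn threshold_from_contamination(train_scores: &[f32], contamination: f32) -> f32 {
    assert!(!train_scores.is_empty(), "need at least one training score");
    assert!((0.0..=0.5).contains(&contamination), "contamination outside [0, 0.5]");
    let mut sorted = train_scores.to_vec();
    sorted.sort_by(f32::total_cmp); // total order, so NaN scores cannot cause a panic
    let n = sorted.len();
    // Number of training rows to flag, capped so that at least one row stays normal.
    let k = ((contamination * n as f32).round() as usize).min(n - 1);
    sorted[n - 1 - k] // largest score still labeled normal
}

/// 1 = normal, -1 = anomaly, matching the `predict` convention in the API design.
fn predict_labels(scores: &[f32], threshold: f32) -> Vec<i32> {
    scores.iter().map(|&s| if s > threshold { -1 } else { 1 }).collect()
}
```

The strict `>` comparison treats ties at the cutoff as normal, and capping k at n - 1 keeps at least one training row normal even for tiny inputs.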

Estimated Effort

Timeline: 3-4 days
Complexity: Medium (tree building, path length calculation)

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests