Problem Statement
Isolation Forest is an efficient anomaly detection algorithm that isolates outliers by randomly partitioning the data. It is currently missing from aprender.
Advantages:
- Fast: training costs roughly O(t · ψ log ψ) for t trees (n_estimators), each built on ψ subsampled points (max_samples)
- Works well in high dimensions
- No distance/density calculations
- Few parameters to tune
Use Cases:
- Fraud detection
- Network intrusion detection
- Quality control (defect detection)
- Sensor anomaly detection
Proposed Solution
Implement Isolation Forest following the sklearn API, using EXTREME TDD.
Algorithm
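Core Idea: Anomalies are easier to isolate than normal points. Because they are few and lie far from the bulk of the data, random splits separate them in fewer steps.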
Steps:
- Build an ensemble of isolation trees:
  - Randomly select a feature
  - Randomly select a split value
  - Recursively partition until each point is isolated
- Anomaly score: average path length across trees
  - Short paths → anomaly (easy to isolate)
  - Long paths → normal (hard to isolate)
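The steps above can be sketched in a few dozen lines of Rust. This is a minimal illustration under stated assumptions, not the proposed aprender implementation: every name in it (`XorShift64`, `Node`, `build`, `path_length`, `build_forest`, `mean_path_length`, `c_factor`) is hypothetical, rows are plain `Vec<f32>` rather than `Matrix<f32>`, a tiny xorshift generator stands in for a real RNG crate, and subsampling (`max_samples`) is omitted so each tree sees all rows.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
pub struct XorShift64(pub u64);

impl XorShift64 {
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Uniform f32 in [0, 1), built from the top 24 bits.
    pub fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u64 << 24) as f32
    }
    /// Index in 0..n (n > 0); modulo bias is ignored for brevity.
    pub fn next_index(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

pub enum Node {
    /// External node holding `size` points that were not split further.
    Leaf { size: usize },
    Split { feature: usize, value: f32, left: Box<Node>, right: Box<Node> },
}

/// c(n): average path length of an unsuccessful binary-search-tree lookup,
/// used to credit leaves that still hold several points.
pub fn c_factor(n: usize) -> f32 {
    match n {
        0 | 1 => 0.0,
        2 => 1.0, // exact value; the log approximation is poor here
        _ => {
            let n = n as f32;
            2.0 * ((n - 1.0).ln() + 0.577_215_7) - 2.0 * (n - 1.0) / n
        }
    }
}

/// Recursively partition `rows` (each with at least one feature).
pub fn build(rows: &[Vec<f32>], depth: usize, max_depth: usize, rng: &mut XorShift64) -> Node {
    if rows.len() <= 1 || depth >= max_depth {
        return Node::Leaf { size: rows.len() };
    }
    let feature = rng.next_index(rows[0].len()); // random feature
    let (lo, hi) = rows.iter().fold((f32::INFINITY, f32::NEG_INFINITY), |(lo, hi), r| {
        (lo.min(r[feature]), hi.max(r[feature]))
    });
    if lo >= hi {
        // Simplification: stop when the chosen feature is constant here.
        return Node::Leaf { size: rows.len() };
    }
    let value = lo + rng.next_f32() * (hi - lo); // random split value
    let (left, right): (Vec<Vec<f32>>, Vec<Vec<f32>>) =
        rows.iter().cloned().partition(|r| r[feature] < value);
    Node::Split {
        feature,
        value,
        left: Box::new(build(&left, depth + 1, max_depth, rng)),
        right: Box::new(build(&right, depth + 1, max_depth, rng)),
    }
}

/// Number of edges from the root to the leaf that `point` falls into,
/// plus c(size) for the points left unsplit in that leaf.
pub fn path_length(node: &Node, point: &[f32], depth: usize) -> f32 {
    match node {
        Node::Leaf { size } => depth as f32 + c_factor(*size),
        Node::Split { feature, value, left, right } => {
            let child = if point[*feature] < *value { left } else { right };
            path_length(child, point, depth + 1)
        }
    }
}

/// Build `n_trees` trees on all rows, with depth capped at ceil(log2(n)).
pub fn build_forest(rows: &[Vec<f32>], n_trees: usize, seed: u64) -> Vec<Node> {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let max_depth = (rows.len().max(2) as f32).log2().ceil() as usize;
    (0..n_trees).map(|_| build(rows, 0, max_depth, &mut rng)).collect()
}

/// Average path length of `point` across a non-empty set of trees.
pub fn mean_path_length(trees: &[Node], point: &[f32]) -> f32 {
    let total: f32 = trees.iter().map(|t| path_length(t, point, 0)).sum();
    total / trees.len() as f32
}
```

Calling `build_forest` on a tight cluster plus one distant point and comparing `mean_path_length` for the distant point against a cluster member should give the distant point the shorter average path, which is the property the anomaly score relies on.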
Implementation
API Design:
pub struct IsolationForest {
    n_estimators: usize,
    max_samples: usize,
    contamination: f32, // Expected proportion of outliers
    trees: Vec<IsolationTree>,
}

impl IsolationForest {
    pub fn fit(&mut self, x: &Matrix<f32>) -> Result<(), &'static str>;
    pub fn predict(&self, x: &Matrix<f32>) -> Vec<i32>; // 1=normal, -1=anomaly
    pub fn score_samples(&self, x: &Matrix<f32>) -> Vec<f32>; // Anomaly scores
}
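One way `score_samples` and `predict` could relate to `contamination` is sketched below. It assumes the score normalization from the original Isolation Forest paper, s = 2^(-E[h] / c(n)), where higher means more anomalous; note that sklearn's `score_samples` returns the opposite sign (lower is more abnormal), so the final convention is a design decision for this issue. The helper names (`c_factor`, `anomaly_score`, `contamination_threshold`, `label`) are hypothetical and not part of aprender.

```rust
/// c(n): average path length of an unsuccessful binary-search-tree lookup
/// over n points, used to normalize path lengths.
pub fn c_factor(n: usize) -> f32 {
    match n {
        0 | 1 => 0.0,
        2 => 1.0, // exact value; the log approximation is poor here
        _ => {
            let n = n as f32;
            2.0 * ((n - 1.0).ln() + 0.577_215_7) - 2.0 * (n - 1.0) / n
        }
    }
}

/// Paper-style score in (0, 1]: near 1 for very short average paths,
/// 0.5 when the average path equals c(n). Assumes n_samples >= 2.
pub fn anomaly_score(mean_path: f32, n_samples: usize) -> f32 {
    2f32.powf(-mean_path / c_factor(n_samples))
}

/// Pick a cut-off so that about `contamination` of the training scores
/// are at or above it. Assumes `scores` is non-empty.
pub fn contamination_threshold(scores: &[f32], contamination: f32) -> f32 {
    assert!(!scores.is_empty(), "need at least one training score");
    let mut sorted = scores.to_vec();
    sorted.sort_by(|a, b| b.total_cmp(a)); // descending: most anomalous first
    let k = ((contamination * scores.len() as f32).ceil() as usize).clamp(1, sorted.len());
    sorted[k - 1]
}

/// Map a score to the labels used by `predict`: -1 = anomaly, 1 = normal.
pub fn label(score: f32, threshold: f32) -> i32 {
    if score >= threshold { -1 } else { 1 }
}
```

Under this sketch, `fit` would compute scores for the training data and store the threshold, and `predict` would only compare new scores against it.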
Success Criteria
- ✅ IsolationForest with ensemble of random trees
- ✅ fit/predict/score_samples methods
- ✅ contamination parameter for threshold
- ✅ 15+ tests
- ✅ Zero clippy warnings
- ✅ Example: examples/isolation_forest_anomaly.rs
Estimated Effort
Timeline: 3-4 days
Complexity: Medium (tree building, path length calculation)