Skip to content

Implement DBSCAN (Density-Based Spatial Clustering) #14

@noahgift

Description

@noahgift

Problem Statement

DBSCAN is a powerful density-based clustering algorithm that can find arbitrarily-shaped clusters and identify outliers. Currently missing from aprender.

Advantages over K-Means:

  • No need to specify number of clusters
  • Finds arbitrarily-shaped clusters
  • Identifies outliers as noise points
  • Robust to different cluster densities

Proposed Solution

Implement DBSCAN following sklearn API with EXTREME TDD methodology.

Algorithm

Core Concepts:

  • eps: Maximum distance between two samples to be neighbors
  • min_samples: Minimum points to form a dense region
  • Core point: Has ≥ min_samples within eps distance
  • Border point: Within eps of core point, but not core itself
  • Noise point: Neither core nor border

Steps:

  1. For each unvisited point p:
    • Find all points within eps distance (neighbors)
    • If neighbors < min_samples: mark as noise
    • Else: create new cluster, expand from p
  2. Expand cluster: recursively add reachable core points

Implementation

Trait: UnsupervisedEstimator

API Design:

pub struct DBSCAN {
    eps: f32,
    min_samples: usize,
    labels: Option<Vec<i32>>,  // -1 for noise, 0+ for cluster ID
}

impl UnsupervisedEstimator for DBSCAN {
    type Labels = Vec<i32>;
    fn fit(&mut self, x: &Matrix<f32>) -> Result<(), &'static str>;
    fn predict(&self, x: &Matrix<f32>) -> Vec<i32>;
}

Success Criteria

  • ✅ DBSCAN struct with UnsupervisedEstimator trait
  • ✅ eps and min_samples parameters
  • ✅ Noise detection (label = -1)
  • ✅ 12+ tests passing
  • ✅ Zero clippy warnings
  • ✅ Example: examples/dbscan_clustering.rs

Estimated Effort

Timeline: 2-3 days
Complexity: Medium-High (spatial indexing for performance)

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions