test: Add comprehensive integration tests for Clustering module [P0] #617

## Overview

This issue tracks the implementation of comprehensive integration tests for the **Clustering** module. This is a P0 (Critical Priority) module for unsupervised learning.

Parent Issue: #615

## Current Status
- **Coverage:** 0% for algorithms (only metrics tested)
- **Source Files:** 90+ files
- **Test Files:** Only ClusteringMetricsIntegrationTests.cs (tests metrics, not algorithms)

## Module Location
`src/Clustering/`

## Classes to Test

### Core Clustering Algorithms

**Partitional Clustering:**
- KMeans
- MiniBatchKMeans
- OnlineKMeans
- KMedoids
- CLARANS (randomized k-medoids search)
- FuzzyCMeans
- GMeans
- XMeans
- BisectingKMeans
- SeededKMeans
- COPKMeans (constrained)

**Density-Based Clustering:**
- DBSCAN
- HDBSCAN
- OPTICS
- Denclue
- MeanShift

**Hierarchical Clustering:**
- AgglomerativeClustering
- BIRCH
- CURE
**Spectral Clustering:**
- SpectralClustering

**Subspace Clustering:**
- CLIQUE
- SUBCLU

**Probabilistic Clustering:**
- GaussianMixtureModel
- AffinityPropagation

**Other:**
- SelfOrganizingMap
- ConsensusClustering

### Distance Metrics
- EuclideanDistance
- ManhattanDistance
- CosineDistance
- MahalanobisDistance
- ChebyshevDistance
- MinkowskiDistance

### Spatial Data Structures
- KDTree
- BallTree

### Evaluation Metrics (already partially tested)
- SilhouetteScore
- DaviesBouldinIndex
- CalinskiHarabaszIndex
- DunnIndex
- AdjustedRandIndex
- NormalizedMutualInformation
- VMeasure
- FMeasure
- FowlkesMallowsIndex
- JaccardIndex
- Purity
- VariationOfInformation
- ClusteringEntropy
- ConnectivityIndex
- WCSS

### Validation Methods
- ElbowMethod
- GapStatistic
- StabilityValidation
- BootstrapValidation

### AutoML
- ClusteringAutoML
- ClusteringGridSearch
- ClusteringEvaluator

## Test Categories Required

### 1. Basic Clustering Tests
- [ ] Verify every data point receives a cluster label (or an explicit noise label for density-based algorithms such as DBSCAN)
- [ ] Test with known cluster structures (blobs, moons, circles)
- [ ] Verify number of clusters matches expectation
- [ ] Test reproducibility with same random seed
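A minimal, language-neutral sketch of these basic checks, written in Python so it can double as a reference oracle; the real tests would be C# next to `ClusteringMetricsIntegrationTests.cs`. `make_blobs` and `check_labels` are hypothetical helpers, not part of the module's API, and the `-1` noise label is an assumption borrowed from scikit-learn's convention.

```python
import random

def make_blobs(centers, n_per_cluster, spread, seed):
    """Generate 2-D Gaussian blobs around `centers`; returns (points, true_labels).

    Uses a local RNG so the same seed always yields identical data.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels


def check_labels(labels, n_points, expected_k, noise_label=-1):
    """Assert every point got a label; return the number of clusters found.

    Points labelled `noise_label` count as labelled but are excluded
    from the cluster count (hypothetical noise convention).
    """
    if len(labels) != n_points:
        raise AssertionError(f"expected {n_points} labels, got {len(labels)}")
    if any(lab is None for lab in labels):
        raise AssertionError("some points were left unassigned")
    clusters = {lab for lab in labels if lab != noise_label}
    if len(clusters) != expected_k:
        raise AssertionError(f"expected {expected_k} clusters, found {len(clusters)}")
    return len(clusters)


# Usage: same seed -> same data; ground-truth labels pass the checker.
points, truth = make_blobs([(0, 0), (10, 0), (0, 10)], 50, 0.5, seed=42)
again, _ = make_blobs([(0, 0), (10, 0), (0, 10)], 50, 0.5, seed=42)
assert points == again
check_labels(truth, len(points), expected_k=3)
```

In a real test the labels produced by the algorithm under test would be passed to `check_labels` instead of `truth`.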

### 2. Algorithm-Specific Tests

**KMeans Family:**
- [ ] Test convergence behavior
- [ ] Verify centroid computation
- [ ] Test with different initialization methods (random, k-means++)
- [ ] Test with different distance metrics
- [ ] Compare against scikit-learn KMeans
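For the initialization tests, a reference k-means++ seeding sketch (first center uniform, each subsequent center sampled with probability proportional to its squared distance to the nearest chosen center). This is a standalone Python illustration, not the module's implementation; whether the C# code draws random numbers in the same order is unknown, so tests should compare properties (k distinct data points, determinism per seed) rather than exact centers.

```python
import random

def kmeans_plus_plus(points, k, seed):
    """k-means++ seeding over 2-D tuples; deterministic for a given seed."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
              for px, py in points]
        total = sum(d2)
        if total == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
            continue
        r = rng.uniform(0, total)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            # Skip zero-weight points so existing centers are never re-picked.
            if w > 0 and acc >= r:
                centers.append(p)
                break
    return centers
```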

**DBSCAN:**
- [ ] Test eps and min_samples parameters
- [ ] Verify noise point detection
- [ ] Test with varying density clusters
- [ ] Compare against scikit-learn DBSCAN
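A small brute-force DBSCAN that could serve as an oracle for hand-built fixtures. It is intended to mirror scikit-learn's conventions (a point counts toward its own neighbourhood, distances `<= eps` are neighbours, noise is `-1`); those conventions are assumptions to confirm against the C# implementation.

```python
import math

def dbscan(points, eps, min_samples):
    """Brute-force DBSCAN; returns one label per point, -1 for noise."""
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # non-core point reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # expand only through core points
    return labels
```

Usage: two tight triples plus an isolated point should give two clusters and one noise point.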

**Hierarchical:**
- [ ] Test different linkage methods (single, complete, average, ward)
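- [ ] Verify dendrogram structure (n - 1 merges for n points; merge heights non-decreasing for single, complete, average, and ward linkage)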
- [ ] Test distance thresholds
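A naive agglomerative reference for small fixtures, returning both flat labels and the merge heights so the dendrogram can be checked (heights should never decrease for these linkages). Ward linkage is left out of this sketch, and the function name and return shape are hypothetical.

```python
import math
from itertools import combinations

def agglomerative(points, n_clusters, linkage="single"):
    """O(n^3) agglomerative clustering for test oracles.

    linkage: 'single' (min), 'complete' (max) or 'average' pairwise distance.
    Returns (labels, merge_heights).
    """
    agg = {"single": min, "complete": max,
           "average": lambda ds: sum(ds) / len(ds)}[linkage]
    clusters = [[i] for i in range(len(points))]
    merge_heights = []
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = agg([math.dist(points[i], points[j])
                     for i in clusters[a] for j in clusters[b]])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merge_heights.append(d)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels, merge_heights
```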

**Spectral:**
- [ ] Test different affinity matrices
- [ ] Verify eigenvalue computation (e.g. the smallest eigenvalue of the graph Laplacian is 0)
- [ ] Test with different numbers of components
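One way to sanity-check the spectral pipeline without a full eigensolver is to test Laplacian identities: for L = D - W, every row sums to zero (so the constant vector is an eigenvector with eigenvalue 0) and xᵀLx = ½ Σᵢⱼ wᵢⱼ(xᵢ - xⱼ)² ≥ 0. The sketch below assumes an RBF affinity with a `gamma` parameter; which affinity options the module actually exposes is not confirmed.

```python
import math

def rbf_affinity(points, gamma):
    """W[i][j] = exp(-gamma * ||x_i - x_j||^2), with a zero diagonal."""
    n = len(points)
    return [[0.0 if i == j else math.exp(-gamma * math.dist(points[i], points[j]) ** 2)
             for j in range(n)] for i in range(n)]

def unnormalized_laplacian(W):
    """L = D - W, where D is the diagonal matrix of row sums (degrees)."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def mat_vec(M, v):
    """Dense matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]
```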

### 3. Edge Cases
- [ ] Test with single data point
- [ ] Test with all identical points
- [ ] Test with high-dimensional data
- [ ] Test with outliers
- [ ] Test with very small/large values
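- [ ] Test with NaN/infinity values (define the expected behavior, e.g. a descriptive exception, and assert it instead of accepting silently invalid labels)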
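The edge-case tests need an agreed input contract. A hypothetical one is sketched below: reject empty input, ragged rows, and non-finite values with a `ValueError`, while accepting degenerate but valid input such as all-identical points. The real module's exception types and messages are not known; this only illustrates the kind of behavior the tests would assert.

```python
import math

def validate_points(points):
    """Hypothetical input contract; returns (n_points, n_features) if valid."""
    if not points:
        raise ValueError("input must contain at least one point")
    dim = len(points[0])
    for i, row in enumerate(points):
        if len(row) != dim:
            raise ValueError(f"row {i} has {len(row)} features, expected {dim}")
        for j, v in enumerate(row):
            if not math.isfinite(v):  # rejects NaN, +inf and -inf
                raise ValueError(f"non-finite value {v!r} at row {i}, column {j}")
    return len(points), dim
```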

### 4. Performance Tests
- [ ] Test with varying dataset sizes
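- [ ] Verify memory usage scales as expected (roughly linear in the number of points for KMeans-style algorithms; quadratic only where a full pairwise distance or affinity matrix is built)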
- [ ] Test convergence speed
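A rough harness for the scaling checks, sketched in Python with `time.perf_counter` and `tracemalloc` (which only sees Python-heap allocations). A C# port would use `Stopwatch` for timing; `profile` and its output format are hypothetical. Timing numbers are noisy, so assertions should compare growth across sizes rather than absolute values.

```python
import random
import time
import tracemalloc

def profile(fit, sizes, dim=2, seed=0):
    """Run `fit` on random data of each size; record wall time and peak allocation."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        data = [[rng.random() for _ in range(dim)] for _ in range(n)]
        tracemalloc.start()
        t0 = time.perf_counter()
        fit(data)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results.append({"n": n, "seconds": elapsed, "peak_bytes": peak})
    return results
```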

### 5. Serialization Tests
- [ ] Save and load trained clustering models
- [ ] Verify predictions match after reload
- [ ] Test with various configurations
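A minimal save/load round-trip sketch. `NearestCentroidModel` is a hypothetical stand-in for a trained centroid-based clusterer, and JSON is an assumed format; the module's real serialization mechanism is not specified here. Python's JSON encoder writes floats with round-trip precision, so an exact comparison of parameters and predictions is appropriate.

```python
import json
import math

class NearestCentroidModel:
    """Hypothetical trained model: predicts the index of the nearest centroid."""

    def __init__(self, centroids):
        self.centroids = [list(c) for c in centroids]

    def predict(self, points):
        return [min(range(len(self.centroids)),
                    key=lambda k: math.dist(p, self.centroids[k]))
                for p in points]

    def to_json(self):
        return json.dumps({"centroids": self.centroids})

    @classmethod
    def from_json(cls, text):
        return cls(json.loads(text)["centroids"])
```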

### 6. Clone Tests
- [ ] Clone trained models
- [ ] Verify independence (mutating or retraining the clone leaves the original unchanged)
- [ ] Verify predictions match
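A sketch of the clone checks using a deep copy. The model class is a hypothetical stand-in with mutable centroid state; the point of the test is that a shallow copy would share the centroid lists, so mutating the clone would silently change the original.

```python
import copy
import math

class NearestCentroidModel:
    """Hypothetical trained model holding mutable centroid state."""

    def __init__(self, centroids):
        self.centroids = [list(c) for c in centroids]

    def predict(self, points):
        return [min(range(len(self.centroids)),
                    key=lambda k: math.dist(p, self.centroids[k]))
                for p in points]

def check_clone(model, points):
    """Clone, check predictions match, then check the clone is independent."""
    clone = copy.deepcopy(model)
    assert clone is not model
    assert clone.predict(points) == model.predict(points)
    before = model.predict(points)
    clone.centroids[0][0] += 100.0           # mutate the clone only
    assert model.predict(points) == before   # original must be unaffected
    return clone
```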

## Mathematical Correctness Verification

### KMeans
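- Verify WCSS (Within-Cluster Sum of Squares) never increases between iterations (standard Lloyd-style KMeans guarantees a non-increasing objective)
- Compare centroids against manually computed values (each centroid is the mean of its assigned points)
- Verify that, after convergence, every point is assigned to its nearest centroid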
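A plain Lloyd's k-means reference that records WCSS after every iteration, so the monotonicity and nearest-centroid properties can be asserted directly. It uses random initial centroids and keeps the old centroid when a cluster empties; both are choices of this sketch, not necessarily what the module does. Comparisons should allow a small floating-point tolerance.

```python
import math
import random

def lloyd(points, k, seed, max_iter=100):
    """Lloyd's k-means; returns (labels, centroids, wcss_history)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    history = []
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid (ties go to the lowest index).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: mean of each cluster; keep the old centroid if empty.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        history.append(sum(math.dist(p, centroids[lab]) ** 2
                           for p, lab in zip(points, new_labels)))
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
    return labels, centroids, history
```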

### DBSCAN
- Verify the classification of core, border, and noise points
- Test density-reachability relationships (every border point lies within eps of at least one core point in its cluster)
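The point-type definitions can be checked independently of the clustering itself. This sketch classifies each point from first principles (core: at least `min_samples` points, itself included, within `eps`; border: not core but within `eps` of a core point; noise: neither). The inclusive-distance and self-counting conventions are assumptions to align with the C# implementation.

```python
import math

def point_types(points, eps, min_samples):
    """Return 'core', 'border' or 'noise' for each point (DBSCAN definitions)."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = [len(nb) >= min_samples for nb in nbrs]
    return ["core" if core[i]
            else "border" if any(core[j] for j in nbrs[i])
            else "noise"
            for i in range(n)]
```

A test can then assert that noise points carry the noise label and that each border point shares its label with at least one core neighbour.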

### GMM
- Verify log-likelihood never decreases between EM iterations
- Verify each point's responsibilities (posterior cluster probabilities) sum to 1
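A 1-D EM reference for a Gaussian mixture that exposes both properties. Initial variances of 1, equal weights, and a small variance floor are choices of this sketch; the floor keeps the M-step a constrained maximizer, so the log-likelihood should still not decrease. It works in linear rather than log space, so it is only suitable for small, well-scaled fixtures.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, means, n_iter=50, min_var=1e-6):
    """EM for a 1-D GMM; returns (responsibilities, log-likelihood per iteration)."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    loglik = []
    for _ in range(n_iter):
        # E-step: r[i][c] proportional to w_c * N(x_i | mu_c, var_c).
        resp, ll = [], 0.0
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        loglik.append(ll)
        # M-step: re-estimate weights, means and (floored) variances.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / len(xs)
            means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
            variances[c] = max(min_var, sum(r[c] * (x - means[c]) ** 2
                                            for r, x in zip(resp, xs)) / nc)
    return resp, loglik
```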

### Metrics
- Verify Silhouette scores in range [-1, 1]
- Verify the Davies-Bouldin index is non-negative and lower for better-separated clusterings
- Compare against scikit-learn metrics
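Brute-force reference versions of the two metrics, following the textbook definitions that scikit-learn's `silhouette_score` and `davies_bouldin_score` document (singleton clusters get a silhouette of 0; Davies-Bouldin uses mean distance to the centroid as the cluster spread). These are sketches for small fixtures, assume at least two clusters, and do not guard against coincident centroids.

```python
import math

def silhouette_score(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    clusters = sorted(set(labels))
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, lj) in enumerate(zip(points, labels))
                if lj == li and j != i]
        if not same:  # singleton cluster
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(sum(math.dist(p, q) for q, lj in zip(points, labels) if lj == c)
                / labels.count(c)
                for c in clusters if c != li)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def davies_bouldin(points, labels):
    """Mean over clusters i of max_{j != i} (s_i + s_j) / d(c_i, c_j)."""
    clusters = sorted(set(labels))
    cents, spread = {}, {}
    for c in clusters:
        members = [p for p, l in zip(points, labels) if l == c]
        cents[c] = [sum(col) / len(members) for col in zip(*members)]
        spread[c] = sum(math.dist(p, cents[c]) for p in members) / len(members)
    return sum(max((spread[i] + spread[j]) / math.dist(cents[i], cents[j])
                   for j in clusters if j != i)
               for i in clusters) / len(clusters)
```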

## Test Data

Create test datasets:
1. **Well-separated blobs** - 3 clusters, clearly separable
2. **Overlapping clusters** - 2 clusters with overlap
3. **Non-spherical clusters** - Moons, circles
4. **High-dimensional** - 50+ dimensions
5. **Imbalanced clusters** - Different sizes
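For the non-spherical datasets, small generators that reproduce the shapes of scikit-learn's `make_moons` and `make_circles` (same curve formulas, but a different random stream, so points are not bit-identical to scikit-learn's). The function signatures are hypothetical; fixtures could be generated once and checked in so the C# and Python sides share exactly the same data.

```python
import math
import random

def make_moons(n_per_moon, noise, seed):
    """Two interleaving half-circles (labels 0 and 1); needs n_per_moon >= 2."""
    rng = random.Random(seed)
    pts, labels = [], []
    for i in range(n_per_moon):
        t = math.pi * i / (n_per_moon - 1)
        pts.append((math.cos(t) + rng.gauss(0, noise),
                    math.sin(t) + rng.gauss(0, noise)))
        labels.append(0)
        pts.append((1 - math.cos(t) + rng.gauss(0, noise),
                    0.5 - math.sin(t) + rng.gauss(0, noise)))
        labels.append(1)
    return pts, labels

def make_circles(n_per_circle, factor, noise, seed):
    """Outer unit circle (label 0) and inner circle of radius `factor` (label 1)."""
    rng = random.Random(seed)
    pts, labels = [], []
    for i in range(n_per_circle):
        t = 2 * math.pi * i / n_per_circle
        for label, r in ((0, 1.0), (1, factor)):
            pts.append((r * math.cos(t) + rng.gauss(0, noise),
                        r * math.sin(t) + rng.gauss(0, noise)))
            labels.append(label)
    return pts, labels
```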

## Priority Order

1. **Critical (Test First):**
   - KMeans
   - DBSCAN
   - GaussianMixtureModel
   - AgglomerativeClustering

2. **High:**
   - MiniBatchKMeans
   - HDBSCAN
   - SpectralClustering
   - KMedoids
   - MeanShift

3. **Medium:**
   - All other algorithms
   - Validation methods
   - AutoML components

## Acceptance Criteria

- [ ] All major clustering algorithms have tests
- [ ] Tests cover basic functionality and edge cases
- [ ] Mathematical correctness verified against reference implementations
- [ ] Serialization works correctly
- [ ] At least 80% code coverage
- [ ] All tests pass on both net8.0 and net471

## References
- scikit-learn clustering: https://scikit-learn.org/stable/modules/clustering.html
- HDBSCAN docs: https://hdbscan.readthedocs.io/

