Test: HNSW Performance with 100K+ Vector Dataset #210

@nickna

Description

Validate HNSW algorithm performance and memory usage with enterprise-scale datasets of 100,000+ vectors to ensure it can handle real-world workloads.

Phase

Phase 2: Large-Scale Stress Testing

Epic

Related to #202

Acceptance Criteria

  • Test HNSW build time with 100K, 500K, and 1M vector datasets
  • Validate memory usage stays within reasonable bounds (< 8 GB for 1M vectors)
  • Verify search accuracy remains high with large datasets
  • Test index serialization/deserialization performance at scale
  • Benchmark against other algorithms (KD-Tree, Linear) at scale

Test Scenarios

  1. Build Performance - Time to build HNSW index for large datasets
  2. Memory Usage - Peak memory consumption during build and search
  3. Search Accuracy - Precision/recall metrics with large datasets
  4. Concurrent Operations - Multiple searches during large index builds
  5. Persistence - Save/load times for large HNSW indexes
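For scenario 3, one common way to quantify search accuracy is recall@k: the fraction of the true k nearest neighbours that the approximate search returns. The sketch below is a minimal illustration, not part of the project's API. `RecallMetrics`, `ExactNearest`, and `RecallAtK` are hypothetical names, vectors are assumed to be plain `float[]` arrays, and Euclidean distance is assumed as the metric.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for measuring approximate-search accuracy.
public static class RecallMetrics
{
    // Brute-force ground truth: indices of the k vectors closest to the query
    // under squared Euclidean distance (assumed metric).
    public static int[] ExactNearest(float[][] vectors, float[] query, int k)
    {
        return Enumerable.Range(0, vectors.Length)
            .OrderBy(i => SquaredDistance(vectors[i], query))
            .Take(k)
            .ToArray();
    }

    // recall@k = |approximate ∩ exact| / |exact|
    public static double RecallAtK(IEnumerable<int> approximate, IReadOnlyCollection<int> exact)
    {
        if (exact.Count == 0) return 1.0;
        var truth = new HashSet<int>(exact);
        int hits = approximate.Distinct().Count(id => truth.Contains(id));
        return (double)hits / truth.Count;
    }

    private static double SquaredDistance(float[] a, float[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

Brute-force ground truth costs O(n) per query, so at 100K+ vectors it is usually computed for a small sample of queries and the recall values averaged.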

Test Structure

[Test]
[Category("Stress")]
[Explicit("Large dataset test - run manually")]
public async Task HNSW_Build_100KVectors_CompletesWithinTimeLimit()
{
    // Arrange
    const int VectorCount = 100_000;
    const int Dimensions = 384; // Common embedding dimension
    const int MaxBuildTimeMinutes = 10;
    
    var database = new VectorDatabase();
    var vectors = GenerateLargeTestDataset(VectorCount, Dimensions);
    
    using var memoryMonitor = new MemoryUsageMonitor();
    var stopwatch = Stopwatch.StartNew();
    
    // Act
    foreach (var vector in vectors)
        database.Vectors.Add(vector);
        
    await database.RebuildSearchIndexAsync(SearchAlgorithm.HNSW);
    
    stopwatch.Stop();
    
    // Assert
    Assert.That(stopwatch.Elapsed, Is.LessThan(TimeSpan.FromMinutes(MaxBuildTimeMinutes)));
    Assert.That(memoryMonitor.PeakMemoryMB, Is.LessThan(4000)); // 4000 MB (~4 GB) limit
    Assert.That(database.Count, Is.EqualTo(VectorCount));
    
    // Verify search functionality; First() requires System.Linq
    var query = vectors.First();
    var results = database.Search(query, 10, SearchAlgorithm.HNSW);
    Assert.That(results.Count, Is.EqualTo(10));
}
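
The test above calls `GenerateLargeTestDataset` and `MemoryUsageMonitor`, neither of which is defined in this issue. The following is one possible sketch of such helpers, under stated assumptions: the dataset is returned as plain `float[][]` (the real test would presumably wrap each array in the library's own vector type), values are drawn uniformly from [-1, 1) with a fixed seed for repeatability, and peak memory is approximated by sampling the process working set on a timer.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: seeded random float vectors, so runs are repeatable.
public static class TestData
{
    public static float[][] GenerateLargeTestDataset(int count, int dimensions, int seed = 42)
    {
        var random = new Random(seed);
        var vectors = new float[count][];
        for (int i = 0; i < count; i++)
        {
            var values = new float[dimensions];
            for (int d = 0; d < dimensions; d++)
                values[d] = (float)(random.NextDouble() * 2.0 - 1.0); // uniform in [-1, 1)
            vectors[i] = values;
        }
        return vectors;
    }
}

// Hypothetical helper: samples the process working set periodically and keeps the maximum.
public sealed class MemoryUsageMonitor : IDisposable
{
    private readonly Timer _timer;
    private long _peakBytes;

    public MemoryUsageMonitor(int sampleIntervalMs = 100)
    {
        Sample(); // take a baseline reading immediately
        _timer = new Timer(_ => Sample(), null, sampleIntervalMs, sampleIntervalMs);
    }

    // Peak observed working set, in units of 1024 * 1024 bytes.
    public double PeakMemoryMB => Interlocked.Read(ref _peakBytes) / (1024.0 * 1024.0);

    private void Sample()
    {
        using var process = Process.GetCurrentProcess();
        long current = process.WorkingSet64;
        long observed;
        // Lock-free "store if larger" update, safe against overlapping timer callbacks.
        while (current > (observed = Interlocked.Read(ref _peakBytes)))
        {
            if (Interlocked.CompareExchange(ref _peakBytes, current, observed) == observed)
                break;
        }
    }

    public void Dispose() => _timer.Dispose();
}
```

Because the monitor samples at intervals, it can miss short allocation spikes between readings, so the reported peak is a lower bound. Working set also covers the whole process, including the test runner and the generated dataset, not only the index.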

Performance Metrics

  • Build time per vector (target: < 1ms average)
  • Memory efficiency (target: < 50 bytes per vector overhead)
  • Search latency with large indexes (target: < 100ms for k=10)
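
To relate these targets to the test above, the arithmetic can be sketched as follows. This assumes vectors are stored as 32-bit floats, which the issue does not state; `ScaleMetrics` and its members are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper for turning raw measurements into the per-vector targets.
public static class ScaleMetrics
{
    // Raw storage for float32 vectors, before any index overhead (assumes 4 bytes per component).
    public static long RawVectorBytes(long vectorCount, int dimensions) =>
        vectorCount * dimensions * sizeof(float);

    // Average build time per vector, in milliseconds.
    public static double BuildMsPerVector(TimeSpan buildTime, long vectorCount) =>
        buildTime.TotalMilliseconds / vectorCount;

    // Index overhead per vector: measured bytes minus raw vector storage, divided by count.
    public static double OverheadBytesPerVector(long measuredBytes, long vectorCount, int dimensions) =>
        (double)(measuredBytes - RawVectorBytes(vectorCount, dimensions)) / vectorCount;
}
```

Under that assumption, 100,000 vectors of 384 dimensions take 100,000 × 384 × 4 = 153,600,000 bytes of raw storage (about 154 MB), and 1M vectors take about 1.54 GB. A 1 ms per vector build target implies roughly 100 seconds for 100K vectors, whereas the 10-minute limit in the test corresponds to 6 ms per vector. A 50-byte overhead target amounts to about 5 MB of index structure for 100K vectors.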
