
feat(binary): Add native binary vector storage with BinaryFlatIndex #7

Merged
matte1782 merged 1 commit into matte1782:main from jsonMartin:feat/binary-vector-support
Feb 2, 2026


@jsonMartin (Contributor) commented Jan 30, 2026

Summary

This PR implements RFC-004: Flat Index for Binary Vectors, adding native binary vector support to EdgeVec with a specialized BinaryFlatIndex optimized for semantic caching and insert-heavy workloads.


Key Features

  • Native binary storage: StorageType::Binary(u32) - 32x memory reduction vs f32 (1 bit per dimension instead of 32)
  • BinaryFlatIndex: O(1) insert, O(n) search with SIMD-accelerated Hamming distance
  • WASM integration: Full JavaScript/TypeScript API with IndexType.binary(dimensions)
  • Automatic quantization: f32 vectors auto-converted to binary via sign-bit quantization
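As a rough illustration of the last two bullets, the sketch below quantizes an f32 vector by sign and packs 8 bits per byte, then computes Hamming distance with XOR and popcount. It assumes values > 0.0 map to 1 (zero and negatives map to 0) and least-significant-bit-first packing; `quantize_sign` and `hamming` are hypothetical names, not EdgeVec's API, and a scalar `count_ones` loop stands in for the SIMD path.

```rust
/// Hypothetical sign-bit quantizer: bit i is 1 when v[i] > 0.0, else 0.
/// Bits are packed 8 per byte, least-significant bit first (assumed layout).
fn quantize_sign(v: &[f32]) -> Vec<u8> {
    let mut packed = vec![0u8; (v.len() + 7) / 8];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            packed[i / 8] |= 1u8 << (i % 8);
        }
    }
    packed
}

/// Hamming distance on packed vectors: XOR each byte pair, count the differing bits.
/// A scalar stand-in for the SIMD-accelerated version described in the PR.
fn hamming(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same byte length");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

fn main() {
    let a = quantize_sign(&[0.5, -1.0, 2.0, 0.0, -0.3, 1.1, 0.9, -2.0, 3.0]);
    let b = quantize_sign(&[0.4, 1.0, -2.0, 0.1, -0.3, 1.1, 0.9, -2.0, -3.0]);
    // 9 dimensions pack into 2 bytes; the vectors differ at dimensions 1, 2, 3 and 8.
    println!("{:?} {:?} hamming = {}", a, b, hamming(&a, &b)); // [101, 1] [107, 0] hamming = 4
}
```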

Performance Characteristics

| Operation | Complexity | Time (10K vectors) |
|---|---|---|
| Insert | O(1) | ~1 μs |
| Search | O(n) | ~1 ms (SIMD) |
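The complexity figures follow from the flat layout: insert is an append to one contiguous buffer, and search scores every stored vector. A minimal sketch under that assumption; `FlatBinary` is a hypothetical type, not the PR's `BinaryFlatIndex`, and it uses a full sort for brevity where a bounded heap would avoid sorting all n scores.

```rust
/// Hypothetical flat index over packed binary vectors (illustrative, not EdgeVec's API).
struct FlatBinary {
    bytes_per_vec: usize,
    data: Vec<u8>, // every vector stored back to back in one buffer
}

impl FlatBinary {
    fn new(dimensions: usize) -> Self {
        assert!(dimensions > 0, "dimensions must be non-zero");
        Self { bytes_per_vec: (dimensions + 7) / 8, data: Vec::new() }
    }

    /// Amortized O(1): append the packed bytes; the id is the insertion position.
    fn insert(&mut self, packed: &[u8]) -> usize {
        assert_eq!(packed.len(), self.bytes_per_vec, "dimension mismatch");
        self.data.extend_from_slice(packed);
        self.data.len() / self.bytes_per_vec - 1
    }

    /// O(n) scan: Hamming distance to every stored vector, then keep the k closest.
    fn search(&self, query: &[u8], k: usize) -> Vec<(usize, u32)> {
        assert_eq!(query.len(), self.bytes_per_vec, "dimension mismatch");
        let mut scored: Vec<(usize, u32)> = self
            .data
            .chunks_exact(self.bytes_per_vec)
            .enumerate()
            .map(|(id, v)| {
                let d: u32 = v.iter().zip(query).map(|(a, b)| (a ^ b).count_ones()).sum();
                (id, d)
            })
            .collect();
        // Full sort for brevity (ties broken by id); a bounded heap would be O(n log k).
        scored.sort_by_key(|&(id, d)| (d, id));
        scored.truncate(k);
        scored
    }
}

fn main() {
    let mut idx = FlatBinary::new(16); // 16 dimensions -> 2 bytes per vector
    idx.insert(&[0b0000_0000, 0]);
    idx.insert(&[0b0000_0111, 0]);
    idx.insert(&[0b0000_0001, 0]);
    println!("{:?}", idx.search(&[0b0000_0011, 0], 2)); // [(1, 1), (2, 1)]
}
```

Because there is no graph to maintain, insert cost does not grow with index size; the trade-off is that every query touches all n vectors.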

Use Cases

  • Semantic caching (insert-heavy, exact recall required)
  • Datasets < 100K vectors
  • Workloads where insert latency is critical (~1 μs vs ~2 ms for HNSW)

Files Changed

  • src/flat/mod.rs - New BinaryFlatIndex implementation
  • src/storage/mod.rs - Added StorageType::Binary(u32) variant
  • src/wasm/mod.rs - WASM bindings for binary index creation and search
  • src/error.rs - Added BinaryFlatIndexError to unified error hierarchy
  • docs/rfcs/RFC_FLAT_INDEX.md - Design document

Relationship to FlatIndex (Week 40)

This PR complements the f32 FlatIndex added in Week 40 (upstream). The two serve different purposes:

| Feature | BinaryFlatIndex (this PR) | FlatIndex (Week 40) |
|---|---|---|
| Storage | Native binary (u8) | f32 with optional BQ |
| Use case | Semantic caching | General flat search |
| Memory | 32x reduction | 4x with BQ |

Both coexist via the IndexType enum.
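The 32x figure is per-vector arithmetic: one bit per dimension instead of a 4-byte f32. The hypothetical helpers below compute raw payload size only (no ids or metadata), and show that the ratio is exactly 32 only when the dimension count is a multiple of 8, since binary vectors are padded to whole bytes.

```rust
/// Raw payload bytes for one f32 vector (4 bytes per dimension).
fn f32_bytes(dims: usize) -> usize {
    dims * std::mem::size_of::<f32>()
}

/// Raw payload bytes for one packed binary vector (1 bit per dimension,
/// rounded up to whole bytes). Ignores ids and metadata.
fn binary_bytes(dims: usize) -> usize {
    (dims + 7) / 8
}

fn main() {
    for dims in [100, 768] {
        let (f, b) = (f32_bytes(dims), binary_bytes(dims));
        println!("{dims} dims: f32 = {f} B, binary = {b} B, ratio = {:.1}x", f as f64 / b as f64);
    }
    // 100 dims: f32 = 400 B, binary = 13 B, ratio = 30.8x
    // 768 dims: f32 = 3072 B, binary = 96 B, ratio = 32.0x
}
```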

Test plan

  • All 1019 library tests pass
  • cargo fmt --check passes
  • cargo clippy -- -D warnings passes
  • Hostile review completed - all findings addressed
  • Integration tests for binary vector operations
  • WASM tests for JS interop

Add comprehensive binary vector support including:
- BinaryFlatIndex with Hamming and Jaccard distance metrics
- Native packed binary storage (8 bits per byte)
- WASM SIMD-accelerated Hamming distance computation
- Binary quantization for f32 vectors
- Full persistence support (snapshot save/load)
- Soft delete and compaction support
- Result type API for better error handling
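Two of the listed features can be sketched in isolation: Jaccard distance over packed bits (1 - |A ∩ B| / |A ∪ B| on the set bits) and soft delete with later compaction. Everything here is hypothetical (`jaccard_distance`, `SoftDeleteStore`); treating two all-zero vectors as distance 0.0 and shifting surviving ids down on compaction are assumptions, not confirmed EdgeVec behavior.

```rust
/// Hypothetical helper: Jaccard distance on packed binary vectors,
/// 1 - |A ∩ B| / |A ∪ B| where A and B are the positions of set bits.
/// Assumption: two all-zero vectors are identical (distance 0.0).
fn jaccard_distance(a: &[u8], b: &[u8]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same byte length");
    let (mut inter, mut union) = (0u32, 0u32);
    for (x, y) in a.iter().zip(b) {
        inter += (x & y).count_ones(); // bits set in both
        union += (x | y).count_ones(); // bits set in either
    }
    if union == 0 {
        0.0
    } else {
        1.0 - inter as f32 / union as f32
    }
}

/// Hypothetical store: delete only flips a tombstone flag; `compact` rebuilds
/// the buffer without tombstoned rows (surviving ids shift down, an assumption).
struct SoftDeleteStore {
    bytes_per_vec: usize,
    data: Vec<u8>,      // packed vectors, stored contiguously
    deleted: Vec<bool>, // tombstone flag per stored vector
}

impl SoftDeleteStore {
    fn new(bytes_per_vec: usize) -> Self {
        assert!(bytes_per_vec > 0, "bytes_per_vec must be non-zero");
        Self { bytes_per_vec, data: Vec::new(), deleted: Vec::new() }
    }

    fn insert(&mut self, v: &[u8]) -> usize {
        assert_eq!(v.len(), self.bytes_per_vec, "dimension mismatch");
        self.data.extend_from_slice(v);
        self.deleted.push(false);
        self.deleted.len() - 1
    }

    /// O(1): mark the id as deleted. Returns false for unknown or already-deleted ids.
    fn soft_delete(&mut self, id: usize) -> bool {
        if id < self.deleted.len() && !self.deleted[id] {
            self.deleted[id] = true;
            true
        } else {
            false
        }
    }

    /// O(n): copy live vectors into a fresh buffer and return how many were dropped.
    fn compact(&mut self) -> usize {
        let mut live = Vec::with_capacity(self.data.len());
        let mut removed = 0;
        for (v, &dead) in self.data.chunks_exact(self.bytes_per_vec).zip(&self.deleted) {
            if dead {
                removed += 1;
            } else {
                live.extend_from_slice(v);
            }
        }
        self.data = live;
        self.deleted = vec![false; self.deleted.len() - removed];
        removed
    }
}

fn main() {
    // 4 set bits vs 2 set bits, 2 shared: 1 - 2/4
    println!("jaccard = {}", jaccard_distance(&[0b0000_1111], &[0b0000_0011])); // jaccard = 0.5

    let mut store = SoftDeleteStore::new(1);
    for v in [[1u8], [2], [3]] {
        store.insert(&v);
    }
    store.soft_delete(1);
    let removed = store.compact();
    println!("removed {removed}, data = {:?}", store.data); // removed 1, data = [1, 3]
}
```

Deferring the copy to `compact` keeps deletes as cheap as inserts; the cost is that tombstoned rows still occupy memory and must be skipped during search until compaction runs.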
@jsonMartin force-pushed the feat/binary-vector-support branch from 9d7cc7b to af09fda on January 30, 2026 04:45
@matte1782 (Owner)

🎖️ HOSTILE_REVIEWER Verdict: APPROVED ✅

Hey @jsonMartin! Excellent work on this PR. The implementation is solid, well-tested, and follows EdgeVec standards.


✅ CI Failures: NOT Your Fault

The two failing checks are not actual regressions. All benchmarks passed:

| Benchmark | Result | vs Baseline |
|---|---|---|
| insert_1k | 293 ms | ✅ 18% faster |
| search_10k | 0.20 ms | ✅ 20% faster |
| quantization_encode | 1.47 μs | ✅ 23% faster |
| hamming_distance | 4.09 ns | ✅ 18% faster |

The failures are a GitHub Actions permissions issue (HTTP 403): the workflow can't post comments on PRs from forks because it lacks the required permission. We'll fix this on our end.

You don't need to do anything about this.


✅ Code Quality: Excellent

| Aspect | Notes |
|---|---|
| Error handling | … with thiserror |
| Memory safety | Proper panic boundaries documented |
| Test coverage | Comprehensive unit + integration tests |
| Performance | O(1) insert, O(n) search as designed |
| WASM bindings | Proper dimension validation |
| RFC documentation | Accurately reflects implementation |

💡 Optional Improvements (Non-Blocking)

These are suggestions for follow-up PRs, not required for this merge:

  1. TypeScript return types: … returns …; it could be typed as ….
  2. Add to .d.ts: … and … methods for better IDE support.

🏆 Summary

This is a high-quality contribution that adds valuable functionality to EdgeVec:

  • 32x memory reduction with native binary storage
  • ~1μs insert latency (vs ~2ms for HNSW)
  • Clean API design with proper error handling

Ready to merge once CI permissions are fixed on our end. 🚀

Thanks for the excellent work!


Reviewed by: HOSTILE_REVIEWER protocol
Date: 2026-02-02
Verdict: APPROVED

@matte1782 merged commit ff0f940 into matte1782:main on Feb 2, 2026
7 of 11 checks passed
matte1782 added a commit that referenced this pull request Feb 2, 2026
- BinaryFlatIndex with native binary storage by @jsonMartin
- HNSW binary methods (insert_binary, search_binary)
- WASM integration with JsIndexType enum
- CI fix for fork PR comment permissions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@jsonMartin (Contributor, Author)

Thanks @matte1782! Finally got this shipped — January was unexpectedly a bit of a whirlwind 🤪

Hope your New Year’s off to a good start!
