Merged
Reimagined embedding generation using ONNX Runtime in pure Rust:
- Native ONNX inference via `ort` crate
- HuggingFace tokenizer integration
- Multiple pooling strategies (Mean, CLS, Max, etc.)
- SIMD-optimized distance calculations
- Batch processing with parallel execution
- Direct RuVector HNSW index integration
- RAG pipeline support
- GPU acceleration support (CUDA, TensorRT, CoreML)

Includes comprehensive examples for:
- Basic embedding generation
- Batch processing benchmarks
- Semantic search with RuVector
- Text clustering
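The SIMD-optimized distance calculations listed above reduce to metrics like cosine similarity between embedding vectors. A minimal scalar sketch (the function name and signature are illustrative, not the crate's actual API):

```rust
// Illustrative sketch: cosine similarity, the core metric behind
// semantic search over embeddings. Real SIMD versions vectorize this loop.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b.iter()) {
        dot += x * y; // accumulate dot product
        na += x * x;  // accumulate squared norms
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}

fn main() {
    let a = [1.0, 0.0, 0.0];
    let b = [1.0, 0.0, 0.0];
    let c = [0.0, 1.0, 0.0];
    assert!((cosine_similarity(&a, &b) - 1.0).abs() < 1e-6); // identical → 1
    assert!(cosine_similarity(&a, &c).abs() < 1e-6);         // orthogonal → 0
    println!("ok");
}
```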
…ckage setup
- Fix ort 2.0 Session API changes (`Session::builder` vs `SessionBuilder::new`)
- Fix `try_extract_tensor` tuple return type (`&Shape`, `&[f32]`)
- Add mutable self references for ONNX session run
- Make package standalone with `[workspace]` in Cargo.toml
- Replace ruvector-core integration with standalone vector database
- Fix tokenizer download (manual HTTP instead of `from_pretrained`)
- Update RuVectorEmbeddings to use `Arc<RwLock<Embedder>>` for thread safety
- Clean up unused imports and apply clippy suggestions
- Use `std::iter::repeat_n` for padding operations
- Simplify `sanitize_model_id` with a single `replace` call

All compilation errors resolved; the release binary builds successfully.
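The `Arc<RwLock<Embedder>>` change follows a standard Rust pattern: the ONNX session's run takes `&mut self`, so a shared embedder needs a lock for thread safety. A sketch with a hypothetical stand-in `Embedder` (the real type wraps an `ort` session):

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the crate's Embedder; the real embed() runs an
// ONNX session that requires &mut self, which motivates the RwLock.
struct Embedder { calls: usize }

impl Embedder {
    fn embed(&mut self, _text: &str) -> Vec<f32> {
        self.calls += 1;
        vec![0.0; 4] // dummy 4-dim embedding
    }
}

fn main() {
    // Shared, thread-safe handle as described in the commit.
    let shared = Arc::new(RwLock::new(Embedder { calls: 0 }));
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let e = Arc::clone(&shared);
            std::thread::spawn(move || {
                // Write lock because embedding mutates the session state.
                e.write().unwrap().embed(&format!("doc {i}")).len()
            })
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), 4);
    }
    assert_eq!(shared.read().unwrap().calls, 4);
    println!("ok");
}
```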
- Complete step-by-step tutorial for embedding generation
- Model comparison table with dimensions, speed, and quality ratings
- Batch processing guide with performance tables
- Semantic search engine tutorial
- RAG pipeline implementation guide
- Text clustering example
- Configuration reference with all options
- Pooling strategies explained with use cases
- Performance benchmarks for CPU/GPU configurations
- API reference documentation
- Troubleshooting guide with common issues
- Architecture diagram
- Updated benchmarks to use `RefCell` for the mutable API
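The pooling strategies the guide explains collapse a [tokens × dim] matrix of token embeddings into one sentence vector. This sketch is illustrative and not the crate's actual pooling type:

```rust
// Illustrative pooling strategies; the crate's own enum may differ.
enum Pooling { Mean, Cls, Max }

fn pool(tokens: &[Vec<f32>], strategy: Pooling) -> Vec<f32> {
    let dim = tokens[0].len();
    match strategy {
        // Mean: average every dimension across tokens (common default).
        Pooling::Mean => (0..dim)
            .map(|d| tokens.iter().map(|t| t[d]).sum::<f32>() / tokens.len() as f32)
            .collect(),
        // CLS: take the first token's embedding (BERT-style [CLS] token).
        Pooling::Cls => tokens[0].clone(),
        // Max: element-wise maximum across tokens.
        Pooling::Max => (0..dim)
            .map(|d| tokens.iter().map(|t| t[d]).fold(f32::MIN, f32::max))
            .collect(),
    }
}

fn main() {
    let toks = vec![vec![1.0, 4.0], vec![3.0, 2.0]];
    assert_eq!(pool(&toks, Pooling::Mean), vec![2.0, 3.0]);
    assert_eq!(pool(&toks, Pooling::Cls), vec![1.0, 4.0]);
    assert_eq!(pool(&toks, Pooling::Max), vec![3.0, 4.0]);
    println!("ok");
}
```

Mean pooling is usually the safe default; CLS pooling depends on the model having been trained to summarize into its first token.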
Implement comprehensive GPU acceleration module for ONNX embeddings:
- Add modular gpu/ directory with config, backend, shaders, operations
- Create `GpuBackend` trait with WebGPU (wgpu v23.0) and CPU fallback
- Implement 11 WGSL compute shaders for GPU-accelerated operations:
  - Similarity: cosine, dot product, euclidean distance
  - Pooling: mean, max, CLS token extraction
  - Vector ops: normalize, matmul, add, scale
- Add `GpuConfig` builder pattern with presets (high_performance, low_power)
- Include `HybridAccelerator` for automatic GPU/CPU dispatch
- Add optional feature flags: gpu, cuda-wasm, webgpu
- Write 35+ unit tests for GPU module (46 total tests pass)
- Create GPU benchmarks for performance comparison
- Add comprehensive GPU_ACCELERATION.md documentation
- Fix examples to use updated embedder API (mutable borrows)
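The automatic GPU/CPU dispatch behind `HybridAccelerator` can be sketched as a backend trait with graceful fallback. All names below are hypothetical stand-ins, not the crate's API:

```rust
// Hypothetical backend trait mirroring the GPU-with-CPU-fallback idea.
trait Backend {
    fn name(&self) -> &'static str;
    fn dot(&self, a: &[f32], b: &[f32]) -> Result<f32, String>;
}

struct CpuBackend;
impl Backend for CpuBackend {
    fn name(&self) -> &'static str { "cpu" }
    fn dot(&self, a: &[f32], b: &[f32]) -> Result<f32, String> {
        Ok(a.iter().zip(b).map(|(x, y)| x * y).sum())
    }
}

// Stands in for a WebGPU backend that failed to find an adapter.
struct UnavailableGpu;
impl Backend for UnavailableGpu {
    fn name(&self) -> &'static str { "gpu" }
    fn dot(&self, _: &[f32], _: &[f32]) -> Result<f32, String> {
        Err("no adapter".into())
    }
}

// Hybrid dispatcher: try GPU first, fall back to CPU on any error.
struct Hybrid { gpu: Box<dyn Backend>, cpu: CpuBackend }
impl Hybrid {
    fn dot(&self, a: &[f32], b: &[f32]) -> f32 {
        self.gpu.dot(a, b).unwrap_or_else(|_| self.cpu.dot(a, b).unwrap())
    }
}

fn main() {
    let h = Hybrid { gpu: Box::new(UnavailableGpu), cpu: CpuBackend };
    // GPU backend errors, so the CPU path answers: 1*3 + 2*4 = 11.
    assert_eq!(h.dot(&[1.0, 2.0], &[3.0, 4.0]), 11.0);
    println!("ok");
}
```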
…ation

Complete the GPU acceleration implementation:
- `WebGpuBackend` now properly manages buffers via HashMap tracking
- Implement real `write_buffer()` using `queue.write_buffer()`
- Implement real `read_buffer()` with a staging buffer and async map
- Implement `dispatch()` with proper bind groups, command encoder, and workgroup dispatch
- Add `memory_stats()` to track allocated buffer sizes
- Wire GPU operations (pooling, similarity, vector ops) to use actual shaders
- Integrate GPU into `Embedder` with automatic initialization and fallback
- Add `has_gpu()` and `gpu_info()` methods to `Embedder`
- Use GPU-accelerated similarity in `most_similar()` for large corpora
- Add shader constant aliases for operations.rs

All 46 tests pass with the GPU feature enabled; 12 tests pass without it.
- Add a 4th binding (uniform params) to the backend bind group layout
- Update all GPU operations to pass params buffers with shader parameters
- Fix shaders to use a consistent 4-binding layout (L2_NORMALIZE, CLS_POOL, VECTOR_SCALE)
- Remove dead code: `GpuCompute` struct, unused pipeline fields
- Fix clippy warnings: derive `Default` for enums, use iterators in pooling
- Remove unused imports (`bytemuck`, `GpuBuffer`, `ComputePipeline`)

All 46 tests pass with zero clippy warnings.
Add complete CUDA-WASM backend implementation providing:
- Buffer management with HashMap tracking and memory statistics
- Built-in kernels for `batch_cosine_similarity`, `dot_product`, `mean_pool`
- Parallel execution via rayon for workgroup-like parallelism
- Proper params parsing from uniform buffers
- Clean `GpuBackend` trait implementation

The CUDA-WASM backend serves as a portable compute fallback that:
- Works across all platforms via WebAssembly
- Uses SIMD when available (simd128 feature)
- Falls back to scalar operations gracefully
- Matches the WebGPU backend API exactly

All 46 tests pass with zero clippy warnings.
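The workgroup-like parallelism for a batch similarity kernel (the commit uses rayon) can be sketched with std scoped threads instead; `batch_cosine` here is a hypothetical illustration, not the crate's kernel:

```rust
// Scalar cosine similarity helper.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

// Score a query against every corpus vector, one thread per chunk,
// mimicking a GPU workgroup split. rayon's par_iter plays this role
// in the actual backend.
fn batch_cosine(query: &[f32], corpus: &[Vec<f32>]) -> Vec<f32> {
    let mut scores = vec![0.0f32; corpus.len()];
    std::thread::scope(|s| {
        for (chunk, out) in corpus.chunks(2).zip(scores.chunks_mut(2)) {
            s.spawn(move || {
                for (v, o) in chunk.iter().zip(out.iter_mut()) {
                    *o = cosine(query, v);
                }
            });
        }
    });
    scores
}

fn main() {
    let corpus = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 1.0]];
    let scores = batch_cosine(&[1.0, 0.0], &corpus);
    assert!((scores[0] - 1.0).abs() < 1e-6); // identical direction
    assert!(scores[1].abs() < 1e-6);         // orthogonal
    println!("ok");
}
```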
- Fix `GpuAccelerator` to properly initialize backends for pooler/similarity/vector_ops by wrapping the backend in `Arc` and calling `set_backend` on each component
- Add missing CUDA-WASM kernels: euclidean_distance, l2_normalize, max_pool, matmul, vector_add
- Enable GPU operations for both 'gpu' and 'cuda-wasm' features
- Update all cfg attributes to use `any(feature = "gpu", feature = "cuda-wasm")`
- All 46 tests pass with both features enabled
ruvnet added a commit that referenced this pull request on Apr 23, 2026
… discovery

#28 (null): hub_modules ∈ {0, 1, 2, 3, 4, 6, 8} at N=1024/40-modules. Peak stays at hub=3 → 0.516. hub ∈ [0, 2] clusters at 0.487–0.488; hub ≥ 4 collapses to 0.37–0.43. Narrow non-monotonic peak, not a smooth ridge. The "smaller hub wins" pattern from N=512 does NOT generalise to N=1024 — 2nd ADR-level case of "hypothesis from small N extrapolates wrong at large N" (1st was item 22 on fixed γ).

#29: fine num_modules ∈ {20, 25, 30, 35, 40, 50, 60, 80} at N=1024/hub=3. New N=1024 peak: 0.531 @ modules=30 (density 34.1), γ=3.0 (70 communities vs 30 truth). Secondary peak at modules=80/γ=2.5 scores 0.515 — multi-modal landscape confirmed.

Finding: at N=1024 the optimal density is 34.1 neurons/module, not 25.6. At N=512 it is 25.6. The 4-D landscape (N × density × γ × hub) does not factorize. AC-3a gap at N=1024 now 1.41× (down from 1.47×). Best-across-scales remains 0.599 @ (N=512, modules=20, hub=1, γ=4.0) — 1.25× gap.

- tests/leiden_cpm.rs: leiden_cpm_hub_fraction_sweep_at_n1024, leiden_cpm_module_count_sweep_at_n1024_hub3
- ADR-154 §17 rows 28, 29 + heading 27 → 29

Co-Authored-By: claude-flow <ruv@ruv.net>
This pull request introduces a new `ruvector-onnx-embeddings` example package that demonstrates ONNX-based embedding generation in Rust, along with comprehensive examples and performance benchmarks. The package is designed to be standalone and showcases embedding, batch processing, semantic search, and both CPU and GPU benchmarking for vector operations.

New ONNX Embeddings Example Package
`Cargo.toml` for the `ruvector-onnx-embeddings` example specifies dependencies for the ONNX runtime, tokenization, tensor operations, an async runtime, serialization, error handling, HTTP, progress reporting, logging, file operations, parallelism, optional GPU acceleration, and benchmarking. This configuration enables flexible and performant embedding workflows in Rust.

Examples for Embedding Workflows
- `examples/basic.rs`: Demonstrates single-text embedding and similarity computation using the ONNX embedder.
- `examples/batch.rs`: Shows batch embedding with both sequential and parallel processing, including performance measurement.
- `examples/semantic_search.rs`: Provides a semantic search example integrating the embedder with a vector index, supporting document insertion, querying, and finding similar documents.

Benchmarking and Performance Evaluation
- `benches/embedding_benchmark.rs`: Benchmarks embedding generation, pooling strategies, and similarity computations for various batch sizes and configurations using Criterion.
- `benches/gpu_benchmark.rs`: Benchmarks CPU vs GPU performance for similarity, pooling, and vector operations, including memory throughput and end-to-end search scenarios.
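As a rough sketch of what such a benchmark measures (the real benches use Criterion's harness), a minimal timing loop over batched dot products, with assumed dimensions and batch sizes:

```rust
use std::time::Instant;

// Simple dot product, the innermost operation being benchmarked.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    // 384 is a common embedding dimension (e.g. MiniLM-class models);
    // the batch sizes here are arbitrary illustrations.
    let dim = 384;
    let query = vec![0.5f32; dim];
    for batch in [8usize, 64, 256] {
        let corpus = vec![vec![0.25f32; dim]; batch];
        let t = Instant::now();
        let total: f32 = corpus.iter().map(|v| dot(&query, v)).sum();
        let elapsed = t.elapsed();
        assert!(total > 0.0); // keep the work from being optimized away
        println!("batch={batch:4} took {elapsed:?}");
    }
}
```

Criterion adds warm-up, statistical sampling, and outlier detection on top of this basic measure-around-the-work pattern.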