Merged
Reimagined embedding generation using ONNX Runtime in pure Rust:
- Native ONNX inference via `ort` crate
- HuggingFace tokenizer integration
- Multiple pooling strategies (Mean, CLS, Max, etc.)
- SIMD-optimized distance calculations
- Batch processing with parallel execution
- Direct RuVector HNSW index integration
- RAG pipeline support
- GPU acceleration support (CUDA, TensorRT, CoreML)

Includes comprehensive examples for:
- Basic embedding generation
- Batch processing benchmarks
- Semantic search with RuVector
- Text clustering
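The SIMD-optimized distance calculations listed above reduce to metrics like cosine similarity between embedding vectors. A minimal scalar sketch (the function name and signature are illustrative, not the crate's actual API):

```rust
// Illustrative sketch: cosine similarity, the core metric behind
// semantic search over embeddings. Real SIMD versions vectorize this loop.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b.iter()) {
        dot += x * y; // accumulate dot product
        na += x * x;  // accumulate squared norms
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}

fn main() {
    let a = [1.0, 0.0, 0.0];
    let b = [1.0, 0.0, 0.0];
    let c = [0.0, 1.0, 0.0];
    assert!((cosine_similarity(&a, &b) - 1.0).abs() < 1e-6); // identical → 1
    assert!(cosine_similarity(&a, &c).abs() < 1e-6);         // orthogonal → 0
    println!("ok");
}
```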
…ckage setup
- Fix ort 2.0 Session API changes (`Session::builder` vs `SessionBuilder::new`)
- Fix `try_extract_tensor` tuple return type (`&Shape`, `&[f32]`)
- Add mutable self references for ONNX session run
- Make package standalone with `[workspace]` in Cargo.toml
- Replace ruvector-core integration with standalone vector database
- Fix tokenizer download (manual HTTP instead of `from_pretrained`)
- Update RuVectorEmbeddings to use `Arc<RwLock<Embedder>>` for thread safety
- Clean up unused imports and apply clippy suggestions
- Use `std::iter::repeat_n` for padding operations
- Simplify `sanitize_model_id` with a single `replace` call

All compilation errors resolved; the release binary builds successfully.
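The `Arc<RwLock<Embedder>>` change follows a standard Rust pattern: the ONNX session's run takes `&mut self`, so a shared embedder needs a lock for thread safety. A sketch with a hypothetical stand-in `Embedder` (the real type wraps an `ort` session):

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the crate's Embedder; the real embed() runs an
// ONNX session that requires &mut self, which motivates the RwLock.
struct Embedder { calls: usize }

impl Embedder {
    fn embed(&mut self, _text: &str) -> Vec<f32> {
        self.calls += 1;
        vec![0.0; 4] // dummy 4-dim embedding
    }
}

fn main() {
    // Shared, thread-safe handle as described in the commit.
    let shared = Arc::new(RwLock::new(Embedder { calls: 0 }));
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let e = Arc::clone(&shared);
            std::thread::spawn(move || {
                // Write lock because embedding mutates the session state.
                e.write().unwrap().embed(&format!("doc {i}")).len()
            })
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), 4);
    }
    assert_eq!(shared.read().unwrap().calls, 4);
    println!("ok");
}
```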
- Complete step-by-step tutorial for embedding generation
- Model comparison table with dimensions, speed, and quality ratings
- Batch processing guide with performance tables
- Semantic search engine tutorial
- RAG pipeline implementation guide
- Text clustering example
- Configuration reference with all options
- Pooling strategies explained with use cases
- Performance benchmarks for CPU/GPU configurations
- API reference documentation
- Troubleshooting guide with common issues
- Architecture diagram
- Updated benchmarks to use `RefCell` for the mutable API
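The pooling strategies the guide explains collapse a [tokens × dim] matrix of token embeddings into one sentence vector. This sketch is illustrative and not the crate's actual pooling type:

```rust
// Illustrative pooling strategies; the crate's own enum may differ.
enum Pooling { Mean, Cls, Max }

fn pool(tokens: &[Vec<f32>], strategy: Pooling) -> Vec<f32> {
    let dim = tokens[0].len();
    match strategy {
        // Mean: average every dimension across tokens (common default).
        Pooling::Mean => (0..dim)
            .map(|d| tokens.iter().map(|t| t[d]).sum::<f32>() / tokens.len() as f32)
            .collect(),
        // CLS: take the first token's embedding (BERT-style [CLS] token).
        Pooling::Cls => tokens[0].clone(),
        // Max: element-wise maximum across tokens.
        Pooling::Max => (0..dim)
            .map(|d| tokens.iter().map(|t| t[d]).fold(f32::MIN, f32::max))
            .collect(),
    }
}

fn main() {
    let toks = vec![vec![1.0, 4.0], vec![3.0, 2.0]];
    assert_eq!(pool(&toks, Pooling::Mean), vec![2.0, 3.0]);
    assert_eq!(pool(&toks, Pooling::Cls), vec![1.0, 4.0]);
    assert_eq!(pool(&toks, Pooling::Max), vec![3.0, 4.0]);
    println!("ok");
}
```

Mean pooling is usually the safe default; CLS pooling depends on the model having been trained to summarize into its first token.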
Implement comprehensive GPU acceleration module for ONNX embeddings:
- Add modular gpu/ directory with config, backend, shaders, operations
- Create `GpuBackend` trait with WebGPU (wgpu v23.0) and CPU fallback
- Implement 11 WGSL compute shaders for GPU-accelerated operations:
  - Similarity: cosine, dot product, euclidean distance
  - Pooling: mean, max, CLS token extraction
  - Vector ops: normalize, matmul, add, scale
- Add `GpuConfig` builder pattern with presets (high_performance, low_power)
- Include `HybridAccelerator` for automatic GPU/CPU dispatch
- Add optional feature flags: gpu, cuda-wasm, webgpu
- Write 35+ unit tests for GPU module (46 total tests pass)
- Create GPU benchmarks for performance comparison
- Add comprehensive GPU_ACCELERATION.md documentation
- Fix examples to use updated embedder API (mutable borrows)
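The automatic GPU/CPU dispatch behind `HybridAccelerator` can be sketched as a backend trait with graceful fallback. All names below are hypothetical stand-ins, not the crate's API:

```rust
// Hypothetical backend trait mirroring the GPU-with-CPU-fallback idea.
trait Backend {
    fn name(&self) -> &'static str;
    fn dot(&self, a: &[f32], b: &[f32]) -> Result<f32, String>;
}

struct CpuBackend;
impl Backend for CpuBackend {
    fn name(&self) -> &'static str { "cpu" }
    fn dot(&self, a: &[f32], b: &[f32]) -> Result<f32, String> {
        Ok(a.iter().zip(b).map(|(x, y)| x * y).sum())
    }
}

// Stands in for a WebGPU backend that failed to find an adapter.
struct UnavailableGpu;
impl Backend for UnavailableGpu {
    fn name(&self) -> &'static str { "gpu" }
    fn dot(&self, _: &[f32], _: &[f32]) -> Result<f32, String> {
        Err("no adapter".into())
    }
}

// Hybrid dispatcher: try GPU first, fall back to CPU on any error.
struct Hybrid { gpu: Box<dyn Backend>, cpu: CpuBackend }
impl Hybrid {
    fn dot(&self, a: &[f32], b: &[f32]) -> f32 {
        self.gpu.dot(a, b).unwrap_or_else(|_| self.cpu.dot(a, b).unwrap())
    }
}

fn main() {
    let h = Hybrid { gpu: Box::new(UnavailableGpu), cpu: CpuBackend };
    // GPU backend errors, so the CPU path answers: 1*3 + 2*4 = 11.
    assert_eq!(h.dot(&[1.0, 2.0], &[3.0, 4.0]), 11.0);
    println!("ok");
}
```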
…ation

Complete the GPU acceleration implementation:
- `WebGpuBackend` now properly manages buffers via HashMap tracking
- Implement real `write_buffer()` using `queue.write_buffer()`
- Implement real `read_buffer()` with a staging buffer and async map
- Implement `dispatch()` with proper bind groups, command encoder, and workgroup dispatch
- Add `memory_stats()` to track allocated buffer sizes
- Wire GPU operations (pooling, similarity, vector ops) to use actual shaders
- Integrate GPU into `Embedder` with automatic initialization and fallback
- Add `has_gpu()` and `gpu_info()` methods to `Embedder`
- Use GPU-accelerated similarity in `most_similar()` for large corpora
- Add shader constant aliases for operations.rs

All 46 tests pass with the GPU feature enabled; 12 tests pass without it.
- Add a 4th binding (uniform params) to the backend bind group layout
- Update all GPU operations to pass params buffers with shader parameters
- Fix shaders to use a consistent 4-binding layout (L2_NORMALIZE, CLS_POOL, VECTOR_SCALE)
- Remove dead code: `GpuCompute` struct, unused pipeline fields
- Fix clippy warnings: derive `Default` for enums, use iterators in pooling
- Remove unused imports (`bytemuck`, `GpuBuffer`, `ComputePipeline`)

All 46 tests pass with zero clippy warnings.
Add complete CUDA-WASM backend implementation providing:
- Buffer management with HashMap tracking and memory statistics
- Built-in kernels for `batch_cosine_similarity`, `dot_product`, `mean_pool`
- Parallel execution via rayon for workgroup-like parallelism
- Proper params parsing from uniform buffers
- Clean `GpuBackend` trait implementation

The CUDA-WASM backend serves as a portable compute fallback that:
- Works across all platforms via WebAssembly
- Uses SIMD when available (simd128 feature)
- Falls back to scalar operations gracefully
- Matches the WebGPU backend API exactly

All 46 tests pass with zero clippy warnings.
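The workgroup-like parallelism for a batch similarity kernel (the commit uses rayon) can be sketched with std scoped threads instead; `batch_cosine` here is a hypothetical illustration, not the crate's kernel:

```rust
// Scalar cosine similarity helper.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

// Score a query against every corpus vector, one thread per chunk,
// mimicking a GPU workgroup split. rayon's par_iter plays this role
// in the actual backend.
fn batch_cosine(query: &[f32], corpus: &[Vec<f32>]) -> Vec<f32> {
    let mut scores = vec![0.0f32; corpus.len()];
    std::thread::scope(|s| {
        for (chunk, out) in corpus.chunks(2).zip(scores.chunks_mut(2)) {
            s.spawn(move || {
                for (v, o) in chunk.iter().zip(out.iter_mut()) {
                    *o = cosine(query, v);
                }
            });
        }
    });
    scores
}

fn main() {
    let corpus = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 1.0]];
    let scores = batch_cosine(&[1.0, 0.0], &corpus);
    assert!((scores[0] - 1.0).abs() < 1e-6); // identical direction
    assert!(scores[1].abs() < 1e-6);         // orthogonal
    println!("ok");
}
```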
- Fix `GpuAccelerator` to properly initialize backends for pooler/similarity/vector_ops by wrapping the backend in `Arc` and calling `set_backend` on each component
- Add missing CUDA-WASM kernels: euclidean_distance, l2_normalize, max_pool, matmul, vector_add
- Enable GPU operations for both 'gpu' and 'cuda-wasm' features
- Update all cfg attributes to use `any(feature = "gpu", feature = "cuda-wasm")`
- All 46 tests pass with both features enabled
ruvnet added a commit that referenced this pull request on Apr 23, 2026
… discovery

#28 (null): hub_modules ∈ {0, 1, 2, 3, 4, 6, 8} at N=1024/40-modules. Peak stays at hub=3 → 0.516. hub ∈ [0, 2] clusters at 0.487–0.488; hub ≥ 4 collapses to 0.37–0.43. Narrow non-monotonic peak, not a smooth ridge. The "smaller hub wins" pattern from N=512 does NOT generalise to N=1024 — 2nd ADR-level case of "hypothesis from small N extrapolates wrong at large N" (1st was item 22 on fixed γ).

#29: fine num_modules ∈ {20, 25, 30, 35, 40, 50, 60, 80} at N=1024/hub=3. New N=1024 peak: 0.531 @ modules=30 (density 34.1), γ=3.0 (70 communities vs 30 truth). Secondary peak at modules=80/γ=2.5 scores 0.515 — multi-modal landscape confirmed.

Finding: at N=1024 the optimal density is 34.1 neurons/module, not 25.6. At N=512 it is 25.6. The 4-D landscape (N × density × γ × hub) does not factorize. AC-3a gap at N=1024 now 1.41× (down from 1.47×). Best-across-scales remains 0.599 @ (N=512, modules=20, hub=1, γ=4.0) — 1.25× gap.

- tests/leiden_cpm.rs: leiden_cpm_hub_fraction_sweep_at_n1024, leiden_cpm_module_count_sweep_at_n1024_hub3
- ADR-154 §17 rows 28, 29 + heading 27 → 29

Co-Authored-By: claude-flow <ruv@ruv.net>
This pull request introduces a new `ruvector-onnx-embeddings` example package that demonstrates ONNX-based embedding generation in Rust, along with comprehensive examples and performance benchmarks. The package is designed to be standalone and showcases embedding, batch processing, semantic search, and both CPU and GPU benchmarking for vector operations.

New ONNX Embeddings Example Package
`Cargo.toml` for the `ruvector-onnx-embeddings` example specifies dependencies for the ONNX runtime, tokenization, tensor operations, an async runtime, serialization, error handling, HTTP, progress reporting, logging, file operations, parallelism, optional GPU acceleration, and benchmarking. This configuration enables flexible and performant embedding workflows in Rust.

Examples for Embedding Workflows
- `examples/basic.rs`: Demonstrates single-text embedding and similarity computation using the ONNX embedder.
- `examples/batch.rs`: Shows batch embedding with both sequential and parallel processing, including performance measurement.
- `examples/semantic_search.rs`: Provides a semantic search example integrating the embedder with a vector index, supporting document insertion, querying, and finding similar documents.

Benchmarking and Performance Evaluation
- `benches/embedding_benchmark.rs`: Benchmarks embedding generation, pooling strategies, and similarity computations for various batch sizes and configurations using Criterion.
- `benches/gpu_benchmark.rs`: Benchmarks CPU vs GPU performance for similarity, pooling, and vector operations, including memory throughput and end-to-end search scenarios.
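As a rough sketch of what such a benchmark measures (the real benches use Criterion's harness), a minimal timing loop over batched dot products, with assumed dimensions and batch sizes:

```rust
use std::time::Instant;

// Simple dot product, the innermost operation being benchmarked.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    // 384 is a common embedding dimension (e.g. MiniLM-class models);
    // the batch sizes here are arbitrary illustrations.
    let dim = 384;
    let query = vec![0.5f32; dim];
    for batch in [8usize, 64, 256] {
        let corpus = vec![vec![0.25f32; dim]; batch];
        let t = Instant::now();
        let total: f32 = corpus.iter().map(|v| dot(&query, v)).sum();
        let elapsed = t.elapsed();
        assert!(total > 0.0); // keep the work from being optimized away
        println!("batch={batch:4} took {elapsed:?}");
    }
}
```

Criterion adds warm-up, statistical sampling, and outlier detection on top of this basic measure-around-the-work pattern.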