
feat: Add PowerInfer-style sparse inference engine with precision lanes#106

Merged
ruvnet merged 4 commits into main from claude/sparse-inference-engine-Z7lVd
Jan 5, 2026

Conversation

@ruvnet (Owner) commented Jan 5, 2026

This commit introduces a comprehensive sparse inference engine for RuVector
that exploits activation locality in neural networks for efficient edge deployment.

Key Features

Core Sparse Inference Engine (ruvector-sparse-inference)

  • Low-rank predictor using P·Q matrix factorization for fast neuron selection
  • Sparse FFN kernels that only compute active neurons
  • Hot/cold neuron classification and caching
  • SIMD optimization for AVX2, SSE4.1, NEON, and WASM SIMD
  • GGUF parser with full quantization support (Q4_0 through Q6_K)
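The predictor idea above can be sketched in a few lines. This is a minimal illustration, not the crate's actual API: the FFN gate matrix W (d_ff × d_model) is approximated as P (d_ff × r) times Q (r × d_model) with small rank r, so scoring all d_ff neurons costs O(r · (d_model + d_ff)) instead of O(d_ff · d_model), and only the top-k scored neurons are handed to the sparse FFN kernel. All names and shapes here are hypothetical.

```rust
// Hypothetical sketch of a low-rank activation predictor.
// Matrices are row-major Vec<Vec<f32>> for clarity, not performance.

fn matvec(m: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    m.iter()
        .map(|row| row.iter().zip(x).map(|(w, v)| w * v).sum())
        .collect()
}

/// Indices (ascending) of the k neurons with the highest predicted score.
fn predict_active(p: &[Vec<f32>], q: &[Vec<f32>], x: &[f32], k: usize) -> Vec<usize> {
    let h = matvec(q, x);       // r-dimensional intermediate
    let scores = matvec(p, &h); // one score per FFN neuron
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| scores[b].partial_cmp(&scores[a]).unwrap());
    idx.truncate(k);
    idx.sort_unstable();
    idx
}

fn main() {
    // d_model = 2, r = 1, d_ff = 4
    let q = vec![vec![1.0, 0.0]];
    let p = vec![vec![1.0], vec![-1.0], vec![2.0], vec![0.0]];
    let active = predict_active(&p, &q, &[3.0, 0.0], 2);
    println!("{:?}", active); // neurons 0 and 2 score highest
}
```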

Precision Lanes (3/5/7-bit Layered Quantization)

  • 3-bit lane: Reflex signals, gating, health metrics (ESP32 compatible)
  • 5-bit lane: Streaming embeddings, drift detection (V0 appliance)
  • 7-bit lane: Reasoning, memory writes, micro-LoRA (Desktop/FPGA)
  • Graduation policy with automatic lane escalation/demotion
  • Telemetry and statistics tracking per lane
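The graduation policy can be illustrated with a toy escalate/demote rule. This is a sketch under assumed semantics, not the crate's implementation: signals start in the cheapest lane, escalate when observed quantization error exceeds a per-lane budget, and demote when error stays well under it. The threshold values are invented for illustration.

```rust
// Hypothetical sketch of lane graduation; budgets are illustrative only.

#[derive(Clone, Copy, PartialEq, Debug)]
enum Lane { Bits3, Bits5, Bits7 }

fn error_budget(lane: Lane) -> f32 {
    match lane {
        Lane::Bits3 => 0.10, // reflex/gating tolerates coarse error
        Lane::Bits5 => 0.03, // streaming embeddings
        Lane::Bits7 => 0.01, // reasoning / memory writes
    }
}

/// Escalate when over budget, demote when comfortably under it.
fn graduate(lane: Lane, observed_error: f32) -> Lane {
    if observed_error > error_budget(lane) {
        match lane { Lane::Bits3 => Lane::Bits5, _ => Lane::Bits7 }
    } else if observed_error < 0.25 * error_budget(lane) {
        match lane { Lane::Bits7 => Lane::Bits5, _ => Lane::Bits3 }
    } else {
        lane
    }
}

fn main() {
    assert_eq!(graduate(Lane::Bits3, 0.20), Lane::Bits5);  // escalated
    assert_eq!(graduate(Lane::Bits7, 0.001), Lane::Bits5); // demoted
    assert_eq!(graduate(Lane::Bits5, 0.02), Lane::Bits5);  // stays put
    println!("graduation rules ok");
}
```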

Model Support

  • LFM2-style embedding models (Liquid AI)
  • Sentence-transformer encoders (BERT, MiniLM)
  • Llama-family decoder models (GGUF format)

Integration

  • EmbeddingProvider integration for Ruvector
  • InferenceBackend integration for RuvLLM
  • WebAssembly bindings (ruvector-sparse-inference-wasm)

Performance Targets

  • LFM2 350M: ~5-10ms per sentence (2.5x speedup)
  • Llama 7B: 50-100ms per token (5-10x speedup)
  • Memory: 1.5-2x reduction via weight offloading
  • <1% accuracy loss with 70% sparsity

Tests & Benchmarks

  • 50+ unit tests for predictor, FFN, quantization
  • SIMD kernel benchmarks
  • Property-based tests with proptest

This implements the SPARC specification for activation locality inference
with layered quantization as the control theory foundation.

claude and others added 4 commits January 5, 2026 02:49
This commit introduces a comprehensive sparse inference engine for RuVector
that exploits activation locality in neural networks for efficient edge deployment.

Implements π (pi) as a structural constant for 3/5/7-bit precision systems:

π Module Components:
- constants.rs: π-derived calibration constants (PI_SCALE_3BIT/5BIT/7BIT)
  avoiding power-of-2 resonance artifacts with anti-resonance offsets
- drift.rs: Quantization honesty detection via π transforms
  measuring error growth to detect precision degradation
- angular.rs: Hyperspherical embeddings with π phase encoding
  enabling angle-based similarity in low-bit systems
- chaos.rs: Deterministic pseudo-randomness from π digits
  for tie-breaking, scheduling, and micro-LoRA ordering

Key insight: π is not about geometry here. It is about injecting
infinite structure into finite machines without breaking determinism.

Also updates README.md with comprehensive documentation including
architecture diagrams, π integration examples, and precision lane
graduation rules.

35 new tests, all passing.
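The chaos.rs idea described above can be sketched as a deterministic tie-breaker fed by π's digits: the same step always yields the same choice, with no RNG state to seed or synchronize. The digit table length and function name here are illustrative, not the module's actual contents.

```rust
// Hypothetical sketch of π-digit tie-breaking: deterministic
// pseudo-randomness with zero mutable state.

const PI_DIGITS: &[u8] = &[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4];

/// Pick one of `n` tied candidates using the π digit at `step`.
fn pi_tiebreak(step: usize, n: usize) -> usize {
    (PI_DIGITS[step % PI_DIGITS.len()] as usize) % n
}

fn main() {
    assert_eq!(pi_tiebreak(0, 2), 1); // digit 3 -> 3 % 2
    assert_eq!(pi_tiebreak(2, 3), 1); // digit 4 -> 4 % 3
    // Reproducible across runs and machines: same step, same answer.
    assert_eq!(pi_tiebreak(5, 4), pi_tiebreak(5, 4));
    println!("pi tie-breaking is deterministic");
}
```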
- Add missing memory module with QuantizedWeights and NeuronCache types
- Fix LowRankPredictor initialization to use Distribution trait correctly
- Update SparseInferenceEngine to use top-K selection for reliable activation
- Update SparseEmbeddingProvider and SparseInferenceBackend with top-K
- Fix GELU test to use correct expected value (-0.159, not -0.841)
- Fix sparse_matmul_accumulate for non-contiguous column views
- Update benchmarks to use correct API signatures
- Adjust 3-bit quantization test tolerance for realistic error bounds
- Improve test robustness with appropriate sparsity ratios

All 98 tests pass with 2.9-8.7x speedup demonstrated in benchmarks.
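The GELU fix above is easy to verify by hand: GELU(x) = x · Φ(x), so GELU(-1) ≈ -1 · 0.1587 ≈ -0.159; the old expected value -0.841 looks like Φ(-1) - 1, a plausible sign/offset mix-up. A sketch using the standard tanh approximation of GELU (this is a generic formula, not the crate's kernel):

```rust
// Standard tanh approximation of GELU; constants are the usual
// published ones (sqrt(2/pi) and 0.044715), not crate-specific.

fn gelu(x: f32) -> f32 {
    const SQRT_2_OVER_PI: f32 = 0.797_884_6;
    0.5 * x * (1.0 + (SQRT_2_OVER_PI * (x + 0.044_715 * x * x * x)).tanh())
}

fn main() {
    // GELU(-1) is about -0.159, matching the corrected test value.
    assert!((gelu(-1.0) + 0.159).abs() < 1e-3);
    assert_eq!(gelu(0.0), 0.0);
    println!("gelu(-1) = {:.3}", gelu(-1.0));
}
```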

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add description, keywords, categories, and readme reference

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ruvnet ruvnet merged commit 76cec56 into main Jan 5, 2026
5 checks passed
ruvnet added a commit that referenced this pull request Feb 20, 2026
…es (#106)

## Summary
- Add PowerInfer-style sparse inference engine with precision lanes
- Add memory module with QuantizedWeights and NeuronCache
- Fix compilation and test issues
- Demonstrated 2.9-8.7x speedup at typical sparsity levels
- Published to crates.io as ruvector-sparse-inference v0.1.30

## Key Features
- Low-rank predictor using P·Q matrix factorization for fast neuron selection
- Sparse FFN kernels that only compute active neurons
- SIMD optimization for AVX2, SSE4.1, NEON, and WASM SIMD
- GGUF parser with full quantization support (Q4_0 through Q6_K)
- Precision lanes (3/5/7-bit layered quantization)
- π integration for low-precision systems

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@ruvnet ruvnet deleted the claude/sparse-inference-engine-Z7lVd branch April 21, 2026 20:30
