
docs(research): add ultra-low-bit quantization & edge deployment research #255

Merged
ruvnet merged 2 commits into main from claude/research-quantization-edge-rDuRZ
Mar 12, 2026

Conversation


@ruvnet ruvnet commented Mar 12, 2026

Comprehensive research collection on 2-bit/3-bit quantization for ruvLLM:

- 01: Ultra-low-bit quantization survey (ICLR'26, QuIP, BitNet, I-quants)
- 02: Quantization-aware training (QAT) with reasoning preservation
- 03: QuIP 2-bit framework analysis (incoherence processing, E8 lattice)
- 04: MoE memory-aware routing for edge SRAM budgets
- 05: ruvLLM quantization architecture deep review and gap analysis
- 06: Rust implementation plan for 2-bit QAT pipeline (14-week roadmap)
- 07: Novel 3-bit pi-constant quantization using irrational scaling

Key findings: ruvLLM has strong foundations (BitNet, K-quants, GGUF, KV cache)
but still needs a QAT training loop and differentiable quantization primitives.
Pi-constant scaling provides a ~0.5-bit effective precision gain at 3-bit (see
the sketch below).

https://claude.ai/code/session_01E4pmfETYzknb1xq2dzCCaj
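
To make the pi-constant mechanism concrete, here is a minimal Rust sketch under one possible reading of doc 07: each block stores its 3-bit codes plus a single integer k, and the reconstruction step size is pi / k. Every name here (`fit_k`, `quantize_block`, `dequantize_block`) is invented for illustration and is not ruvLLM's API.

```rust
// Hypothetical sketch of pi-constant 3-bit quantization: the per-block step
// size is constrained to pi / k for an integer k, so only k needs storing
// alongside the 3-bit codes. Names are illustrative, not ruvLLM's API.
use std::f32::consts::PI;

/// Pick k so that the top of the signed 3-bit code range [-4, 3] roughly
/// covers the block's max magnitude: (pi / k) * 4 ~= max|w|.
fn fit_k(block: &[f32]) -> u32 {
    let max_abs = block.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    ((4.0 * PI / max_abs.max(1e-8)).round() as u32).max(1)
}

/// Quantize each weight to the nearest multiple of pi / k, clamped to 3 bits.
fn quantize_block(block: &[f32], k: u32) -> Vec<i8> {
    let step = PI / k as f32;
    block
        .iter()
        .map(|w| (w / step).round().clamp(-4.0, 3.0) as i8)
        .collect()
}

/// Dequantize: w ~= code * pi / k.
fn dequantize_block(codes: &[i8], k: u32) -> Vec<f32> {
    let step = PI / k as f32;
    codes.iter().map(|&c| c as f32 * step).collect()
}

fn main() {
    let block = [0.7f32, -1.2, 0.05, 2.9, -0.4];
    let k = fit_k(&block);
    let codes = quantize_block(&block, k);
    println!("k = {k}, codes = {codes:?}, back = {:?}", dequantize_block(&codes, k));
}
```

Under this reading, the per-block scale is snapped to the nearest pi / k, so a small integer replaces a full float scale in the block header.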

claude added 2 commits March 12, 2026 13:13
…arch

…ecture

Comprehensive architecture decision record for implementing 2-bit/3-bit
quantization-aware training in ruvLLM using Domain-Driven Design:

- 5 bounded contexts: Quantization Core, Training, MoE Routing, WASM Runtime, Observability
- Pi-constant quantization with irrational scaling (pi/k step sizes)
- QAT training loop with STE variants and LoRA-QAT lightweight path (see the STE sketch after this list)
- QuIP incoherence via fast Walsh-Hadamard, O(n log n) (see the transform sketch below)
- Memory-aware MoE routing with expert precision allocation (routing sketch at the end of this commit message)
- WASM SIMD128 kernels reusing existing tl1_wasm.rs LUT pattern
- Security: weight integrity, GGUF validation, WASM sandbox
- Benchmarking: criterion suite with throughput/quality targets
- 14-week timeline, maps to 18 existing files for extension
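
The STE sketch referenced in the list: the forward pass sees fake-quantized weights, while the backward pass treats quantization as identity inside the representable range (the clipped-STE variant). A minimal Rust sketch; the function names are illustrative, not ruvLLM's.

```rust
// Illustrative straight-through estimator (STE) for one QAT step, not
// ruvLLM code: forward uses quantized weights, backward passes gradients
// through unchanged except where the value was clipped.

/// Fake-quantize to a signed 3-bit grid [-4, 3] with a per-tensor scale.
fn fake_quant(w: f32, scale: f32) -> f32 {
    (w / scale).round().clamp(-4.0, 3.0) * scale
}

/// STE gradient of fake_quant w.r.t. w: pass-through (1.0) inside the
/// representable range, zero where the value was clipped.
fn ste_grad(w: f32, scale: f32) -> f32 {
    let q = (w / scale).round();
    if (-4.0..=3.0).contains(&q) { 1.0 } else { 0.0 }
}

fn main() {
    let (w, scale, upstream) = (0.62f32, 0.25f32, 0.1f32);
    let w_q = fake_quant(w, scale);        // value used in the forward pass
    let g = upstream * ste_grad(w, scale); // gradient routed back to w
    println!("forward: {w_q}, grad: {g}");
}
```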

Placed in docs/adr/ddd/ per DDD architectural pattern organization.
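
The transform sketch referenced above: the fast Walsh-Hadamard transform is the standard O(n log n) butterfly, shown here in a minimal orthonormal (self-inverse) form. This illustrates the primitive that QuIP-style incoherence processing relies on; it is not ruvLLM's actual kernel.

```rust
/// Minimal fast Walsh-Hadamard transform (n must be a power of two).
/// QuIP-style incoherence processing uses this to spread outlier weight
/// energy across a block before quantization. Sketch only.
fn fwht(x: &mut [f32]) {
    let n = x.len();
    assert!(n.is_power_of_two());
    let mut h = 1;
    while h < n {
        for chunk in x.chunks_mut(2 * h) {
            for i in 0..h {
                let (a, b) = (chunk[i], chunk[i + h]);
                chunk[i] = a + b;
                chunk[i + h] = a - b;
            }
        }
        h *= 2;
    }
    // Normalize by sqrt(n) so the transform is orthonormal (self-inverse).
    let norm = (n as f32).sqrt();
    for v in x.iter_mut() {
        *v /= norm;
    }
}

fn main() {
    let mut w = [1.0f32, 0.0, 0.0, 0.0]; // a single "outlier" weight
    fwht(&mut w);
    println!("{w:?}"); // energy spread evenly: [0.5, 0.5, 0.5, 0.5]
}
```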

https://claude.ai/code/session_01E4pmfETYzknb1xq2dzCCaj
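
The routing sketch referenced in the list, as one hypothetical greedy formulation: rank experts by gate score and admit each only if its quantized footprint still fits the SRAM budget. The types, fields, and `route` function are all invented for illustration.

```rust
// Illustrative memory-aware MoE routing, not ruvLLM's router: experts are
// taken in descending gate-score order, skipping any whose quantized
// weights would overflow the remaining SRAM budget.

struct Expert {
    score: f32,   // router gate score for this token
    bytes: usize, // memory footprint at this expert's precision
}

/// Return indices of up to `top_k` experts that fit within `sram_budget`.
fn route(experts: &[Expert], top_k: usize, sram_budget: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..experts.len()).collect();
    order.sort_by(|&a, &b| experts[b].score.total_cmp(&experts[a].score));

    let mut chosen = Vec::new();
    let mut used = 0usize;
    for idx in order {
        if chosen.len() == top_k {
            break;
        }
        if used + experts[idx].bytes <= sram_budget {
            used += experts[idx].bytes;
            chosen.push(idx);
        }
    }
    chosen
}

fn main() {
    let experts = vec![
        Expert { score: 0.9, bytes: 6 << 20 },  // 6 MiB, e.g. a 3-bit expert
        Expert { score: 0.8, bytes: 12 << 20 }, // 12 MiB, higher precision
        Expert { score: 0.4, bytes: 4 << 20 },
    ];
    // 2 experts per token within a 16 MiB SRAM budget: prints [0, 2],
    // because expert 1 would overflow the remaining budget.
    println!("{:?}", route(&experts, 2, 16 << 20));
}
```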
@ruvnet ruvnet merged commit aee77ba into main Mar 12, 2026
7 checks passed
@ruvnet ruvnet deleted the claude/research-quantization-edge-rDuRZ branch April 21, 2026 20:30