
docs(research): add ultra-low-bit quantization & edge deployment research #255

Merged
ruvnet merged 2 commits into main from claude/research-quantization-edge-rDuRZ
Mar 12, 2026

Conversation


@ruvnet ruvnet commented Mar 12, 2026

Comprehensive research collection on 2-bit/3-bit quantization for ruvLLM:

- 01: Ultra-low-bit quantization survey (ICLR'26, QuIP, BitNet, I-quants)
- 02: Quantization-aware training (QAT) with reasoning preservation
- 03: QuIP 2-bit framework analysis (incoherence processing, E8 lattice)
- 04: MoE memory-aware routing for edge SRAM budgets
- 05: ruvLLM quantization architecture deep review and gap analysis
- 06: Rust implementation plan for 2-bit QAT pipeline (14-week roadmap)
- 07: Novel 3-bit pi-constant quantization using irrational scaling

Key findings: ruvLLM has strong foundations (BitNet, K-quants, GGUF, KV cache)
but still needs a QAT training loop and differentiable quantization primitives.
Pi-constant scaling provides a ~0.5-bit effective precision gain at 3-bit (see
the sketch below).

https://claude.ai/code/session_01E4pmfETYzknb1xq2dzCCaj
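
To make the pi-constant mechanism concrete, here is a minimal Rust sketch under one possible reading of doc 07: each block stores its 3-bit codes plus a single integer k, and the reconstruction step size is pi / k. Every name here (`fit_k`, `quantize_block`, `dequantize_block`) is invented for illustration and is not ruvLLM's API.

```rust
// Hypothetical sketch of pi-constant 3-bit quantization: the per-block step
// size is constrained to pi / k for an integer k, so only k needs storing
// alongside the 3-bit codes. Names are illustrative, not ruvLLM's API.
use std::f32::consts::PI;

/// Pick k so that the top of the signed 3-bit code range [-4, 3] roughly
/// covers the block's max magnitude: (pi / k) * 4 ~= max|w|.
fn fit_k(block: &[f32]) -> u32 {
    let max_abs = block.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    ((4.0 * PI / max_abs.max(1e-8)).round() as u32).max(1)
}

/// Quantize each weight to the nearest multiple of pi / k, clamped to 3 bits.
fn quantize_block(block: &[f32], k: u32) -> Vec<i8> {
    let step = PI / k as f32;
    block
        .iter()
        .map(|w| (w / step).round().clamp(-4.0, 3.0) as i8)
        .collect()
}

/// Dequantize: w ~= code * pi / k.
fn dequantize_block(codes: &[i8], k: u32) -> Vec<f32> {
    let step = PI / k as f32;
    codes.iter().map(|&c| c as f32 * step).collect()
}

fn main() {
    let block = [0.7f32, -1.2, 0.05, 2.9, -0.4];
    let k = fit_k(&block);
    let codes = quantize_block(&block, k);
    println!("k = {k}, codes = {codes:?}, back = {:?}", dequantize_block(&codes, k));
}
```

Under this reading, the per-block scale is snapped to the nearest pi / k, so a small integer replaces a full float scale in the block header.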

claude added 2 commits March 12, 2026 13:13
…arch

…ecture

Comprehensive architecture decision record for implementing 2-bit/3-bit
quantization-aware training in ruvLLM using Domain-Driven Design:

- 5 bounded contexts: Quantization Core, Training, MoE Routing, WASM Runtime, Observability
- Pi-constant quantization with irrational scaling (pi/k step sizes)
- QAT training loop with STE variants and LoRA-QAT lightweight path (see the STE sketch after this list)
- QuIP incoherence via fast Walsh-Hadamard, O(n log n) (see the transform sketch below)
- Memory-aware MoE routing with expert precision allocation (routing sketch at the end of this commit message)
- WASM SIMD128 kernels reusing existing tl1_wasm.rs LUT pattern
- Security: weight integrity, GGUF validation, WASM sandbox
- Benchmarking: criterion suite with throughput/quality targets
- 14-week timeline, maps to 18 existing files for extension
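
The STE sketch referenced in the list: the forward pass sees fake-quantized weights, while the backward pass treats quantization as identity inside the representable range (the clipped-STE variant). A minimal Rust sketch; the function names are illustrative, not ruvLLM's.

```rust
// Illustrative straight-through estimator (STE) for one QAT step, not
// ruvLLM code: forward uses quantized weights, backward passes gradients
// through unchanged except where the value was clipped.

/// Fake-quantize to a signed 3-bit grid [-4, 3] with a per-tensor scale.
fn fake_quant(w: f32, scale: f32) -> f32 {
    (w / scale).round().clamp(-4.0, 3.0) * scale
}

/// STE gradient of fake_quant w.r.t. w: pass-through (1.0) inside the
/// representable range, zero where the value was clipped.
fn ste_grad(w: f32, scale: f32) -> f32 {
    let q = (w / scale).round();
    if (-4.0..=3.0).contains(&q) { 1.0 } else { 0.0 }
}

fn main() {
    let (w, scale, upstream) = (0.62f32, 0.25f32, 0.1f32);
    let w_q = fake_quant(w, scale);        // value used in the forward pass
    let g = upstream * ste_grad(w, scale); // gradient routed back to w
    println!("forward: {w_q}, grad: {g}");
}
```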

Placed in docs/adr/ddd/ per DDD architectural pattern organization.
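
The transform sketch referenced above: the fast Walsh-Hadamard transform is the standard O(n log n) butterfly, shown here in a minimal orthonormal (self-inverse) form. This illustrates the primitive that QuIP-style incoherence processing relies on; it is not ruvLLM's actual kernel.

```rust
/// Minimal fast Walsh-Hadamard transform (n must be a power of two).
/// QuIP-style incoherence processing uses this to spread outlier weight
/// energy across a block before quantization. Sketch only.
fn fwht(x: &mut [f32]) {
    let n = x.len();
    assert!(n.is_power_of_two());
    let mut h = 1;
    while h < n {
        for chunk in x.chunks_mut(2 * h) {
            for i in 0..h {
                let (a, b) = (chunk[i], chunk[i + h]);
                chunk[i] = a + b;
                chunk[i + h] = a - b;
            }
        }
        h *= 2;
    }
    // Normalize by sqrt(n) so the transform is orthonormal (self-inverse).
    let norm = (n as f32).sqrt();
    for v in x.iter_mut() {
        *v /= norm;
    }
}

fn main() {
    let mut w = [1.0f32, 0.0, 0.0, 0.0]; // a single "outlier" weight
    fwht(&mut w);
    println!("{w:?}"); // energy spread evenly: [0.5, 0.5, 0.5, 0.5]
}
```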

https://claude.ai/code/session_01E4pmfETYzknb1xq2dzCCaj
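
The routing sketch referenced in the list, as one hypothetical greedy formulation: rank experts by gate score and admit each only if its quantized footprint still fits the SRAM budget. The types, fields, and `route` function are all invented for illustration.

```rust
// Illustrative memory-aware MoE routing, not ruvLLM's router: experts are
// taken in descending gate-score order, skipping any whose quantized
// weights would overflow the remaining SRAM budget.

struct Expert {
    score: f32,   // router gate score for this token
    bytes: usize, // memory footprint at this expert's precision
}

/// Return indices of up to `top_k` experts that fit within `sram_budget`.
fn route(experts: &[Expert], top_k: usize, sram_budget: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..experts.len()).collect();
    order.sort_by(|&a, &b| experts[b].score.total_cmp(&experts[a].score));

    let mut chosen = Vec::new();
    let mut used = 0usize;
    for idx in order {
        if chosen.len() == top_k {
            break;
        }
        if used + experts[idx].bytes <= sram_budget {
            used += experts[idx].bytes;
            chosen.push(idx);
        }
    }
    chosen
}

fn main() {
    let experts = vec![
        Expert { score: 0.9, bytes: 6 << 20 },  // 6 MiB, e.g. a 3-bit expert
        Expert { score: 0.8, bytes: 12 << 20 }, // 12 MiB, higher precision
        Expert { score: 0.4, bytes: 4 << 20 },
    ];
    // 2 experts per token within a 16 MiB SRAM budget: prints [0, 2],
    // because expert 1 would overflow the remaining budget.
    println!("{:?}", route(&experts, 2, 16 << 20));
}
```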
@ruvnet ruvnet merged commit aee77ba into main Mar 12, 2026
7 checks passed
@ruvnet ruvnet deleted the claude/research-quantization-edge-rDuRZ branch April 21, 2026 20:30