feat: Ternary-everywhere refactor + STE shadow weight infrastructure (#59)
Merged
Conversation
CpuLinear now quantizes to ternary on creation via from_weight(). CpuMoELayer stores TernaryExpert for all expert weights. TernaryLinear::from_cpu_linear() uses raw ternary (no FP32 round-trip). TernaryMoELayer::from_cpu_moe() copies ternary values directly. Backward pass dequantizes via to_fp32() where needed. Memory: 2-expert MoE ~2.1 GB ternary vs ~17 GB FP32. [3da81652]
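For reference, a minimal sketch of the kind of FP32 → ternary quantization `from_weight()` performs, assuming BitNet-style absmean scaling; the `quantize_ternary` name, the 0.5 threshold factor, and the free-function form are illustrative, not the crate's actual API:

```rust
/// Illustrative FP32 -> {-1, 0, +1} quantization with a per-tensor scale.
/// The real `from_weight()` may use a different cutoff or per-block scales.
fn quantize_ternary(weights: &[f32]) -> (Vec<i8>, f32) {
    // alpha = mean(|w|) doubles as the dequantization scale.
    let alpha = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32;
    let threshold = 0.5 * alpha; // weights inside (-t, t) snap to 0
    let ternary = weights
        .iter()
        .map(|&w| {
            if w > threshold {
                1
            } else if w < -threshold {
                -1
            } else {
                0
            }
        })
        .collect();
    (ternary, alpha)
}

/// Dequantize for the backward pass, mirroring `.to_fp32()` in spirit.
fn dequantize(ternary: &[i8], alpha: f32) -> Vec<f32> {
    ternary.iter().map(|&t| f32::from(t) * alpha).collect()
}
```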
ShadowPrecision trait: switchable BF16/FP32 for shadow weights. BitLinear<P>: ternary base + optional shadow, STE forward path, stochastic rounding, boundary noise injection, running average α. BitMoELayer: per-block synchronized α across experts + PLE projections, ScaleSync enum for PerBlock vs PerExpert modes. Added half crate with num-traits/serde/bytemuck features. 20 new tests (13 shadow_weights + 7 bit_moe), all passing. [3da81652]
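A minimal sketch of the STE-with-stochastic-rounding idea described above, assuming per-weight updates and a caller-supplied uniform sample; the function shape is hypothetical, and boundary noise injection and the running-average α are elided:

```rust
/// One STE training step for a single weight, sketched.
/// `u` is a uniform random sample in [0, 1) supplied by the caller.
/// In the real code the shadow lives behind a ShadowPrecision-like
/// abstraction (BF16 or FP32); here it is plain f32 for brevity.
fn ste_step(shadow: &mut f32, grad: f32, lr: f32, alpha: f32, u: f32) -> i8 {
    // STE: treat the quantizer as identity in the backward pass and
    // apply the gradient straight through to the high-precision shadow.
    *shadow -= lr * grad;

    // Stochastic rounding of shadow/alpha onto {-1, 0, +1}: round up with
    // probability equal to the fractional part.
    let x = (*shadow / alpha).clamp(-1.0, 1.0);
    let lo = x.floor();
    let frac = x - lo;
    let q = if u < frac { lo + 1.0 } else { lo };
    q as i8
}
```

Stochastic rounding keeps the quantizer unbiased in expectation, so small gradient updates still move the expected ternary value even when a deterministic round would discard them.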
3 research papers added; renderdoc added to the flake for GPU kernel debugging. STE viability: shadow weight memory constraint (BF16 saves 50%). Quality ceiling: rank sweep needed to determine whether LoRA can close the gap. Expert count: 4 big vs 128 small experts with the same param budget. [3da81652]
Ternary-Everywhere Refactor + STE Infrastructure
Summary
Three major changes in this PR:
1. Ternary-everywhere refactor (all base weights as {-1, 0, +1})
- `CpuLinear::from_weight()` immediately quantizes FP32 → ternary
- `CpuMoELayer` stores `TernaryExpert` for all expert gate/up/down projections
- `TernaryLinear::from_cpu_linear()` uses raw ternary values (no FP32 round-trip), as sketched below
- `TernaryMoELayer::from_cpu_moe()` copies ternary directly
- Backward pass dequantizes via `.to_fp32()` where needed
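A sketch of what "no FP32 round-trip" means for `from_cpu_linear()`; the struct fields here are hypothetical stand-ins for the real types:

```rust
/// Hypothetical minimal shapes for illustration; field names are not
/// the actual ones from the crate.
struct CpuLinear { ternary: Vec<i8>, alpha: f32 }
struct TernaryLinear { ternary: Vec<i8>, alpha: f32 }

impl TernaryLinear {
    /// Since CpuLinear already holds ternary values, conversion is a
    /// direct copy of the {-1, 0, +1} payload and its scale, with no
    /// dequantize-to-FP32 / requantize round-trip that could flip
    /// weights sitting near the quantization threshold.
    fn from_cpu_linear(src: &CpuLinear) -> Self {
        TernaryLinear { ternary: src.ternary.clone(), alpha: src.alpha }
    }
}
```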
2. STE shadow weight infrastructure (BF16-ready)
- `ShadowPrecision` trait: switchable BF16/FP32 for shadow weights (see the sketch after this list)
- `BitLinear<P>`: ternary base + optional shadow, STE forward, stochastic rounding, boundary noise injection, running average α
- `BitMoELayer`: per-block synchronized α across experts + PLE projections, `ScaleSync` enum (PerBlock vs PerExpert)
- Added `half` crate with num-traits/serde/bytemuck features
- 20 new tests (13 `shadow_weights` + 7 `bit_moe`), all passing
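A sketch of how the `ShadowPrecision` trait and `ScaleSync` enum could fit together, using the `half` crate's `bf16` type; the trait's method names are assumptions, not the crate's real interface:

```rust
use half::bf16; // `half` crate with the features listed above

/// Hypothetical trait sketch: lets BitLinear<P> keep shadow weights in
/// either BF16 (halving shadow memory) or FP32 behind one interface.
trait ShadowPrecision: Copy {
    fn from_fp32(v: f32) -> Self;
    fn to_fp32(self) -> f32;
}

impl ShadowPrecision for f32 {
    fn from_fp32(v: f32) -> Self { v }
    fn to_fp32(self) -> f32 { self }
}

impl ShadowPrecision for bf16 {
    fn from_fp32(v: f32) -> Self { bf16::from_f32(v) }
    fn to_fp32(self) -> f32 { self.to_f32() } // inherent half::bf16 conversion
}

/// Scale-synchronization policy: one α shared by all experts in a block,
/// or an independent α per expert.
enum ScaleSync {
    PerBlock,
    PerExpert,
}
```

Making precision a type parameter keeps the BF16/FP32 choice at compile time, so the training loop is written once and the BF16 path carries no runtime branching.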
3. Research papers
- `ste_ternary_training.md` — STE viability analysis, shadow weight memory constraints
- `ternary_quality_ceiling.md` — rank sweep experiment design, LoRA vs STE ceiling
- `expert_count_tradeoff.md` — 4 big vs 128 small experts tradeoff analysis

Test Results
- 20 new tests, all passing (13 `shadow_weights` + 7 `bit_moe`), including with `--features vulkan`
- New dependency: `half` crate

Architecture Decisions
- `ShadowPrecision` as a trait, so BF16 vs FP32 shadow storage is a compile-time choice
- `ScaleSync` enum: per-block α synchronization across experts vs per-expert α
Memory Impact
- 2-expert MoE: ~2.1 GB ternary vs ~17 GB FP32, roughly an 8× reduction
Next Steps (post-merge)
- Rank sweep to determine whether LoRA can close the quality gap
- Expert-count experiment: 4 big vs 128 small experts at the same param budget
- Evaluate BF16 shadow weights (50% shadow memory savings) for STE viability
[3da81652]