Task
Exploit zero-weight sparsity in ternary matrices. Current sparsity is 25%; the target is 50%+ using a sparsity regularizer.
Skipping zero weights in the matmul should give a speedup roughly proportional to the fraction of weights skipped.
Scientific Background
SpTGEMM: Sparse Ternary GEMM (arxiv:2510.06957)
- 5.98x speedup at 50% sparsity on Apple M1
- 5.59x speedup at 25% sparsity with NEON vectorization
- Blocked interleaved TCSC format > standard CSR/CSC for ternary
- Performance stable across varying sparsity levels — critical for real networks
BitNet a4.8 Sparsification (arxiv:2411.04965)
- Hybrid: 1-bit weights + top-k activation sparsification
- 55% activated parameters through learned sparsification
- L1 regularization during training explicitly encourages zero weights
- Achieves competitive performance with 45% fewer compute ops
FATNN: Fast Ternary Neural Networks (ICCV 2021)
- Ternary inner product: eliminate multiplication entirely
- +1 → identity, -1 → negation, 0 → skip
- 1.6-2.5x speedup in pure C++ (no specialized hardware)
- 2x faster than conventional ternary across 6 convolution configs
Implementation Plan
Phase 1: Sparsity-aware matmul
// Ternary packed format: 2 bits per weight (+1=01, -1=10, 0=00)
// Process only non-zero weights; weights[m][k] is the decoded value in {-1, 0, +1}
for (0..K) |k| {
    const w = weights[m][k]; // ternary
    if (w == 0) continue; // skip zero: 25-50% of iterations
    if (w == 1) {
        acc += input[k];
    } else {
        acc -= input[k]; // w == -1
    }
}
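The loop above reads already-decoded weights. A minimal sketch of the same zero-skip dot product working directly on the 2-bit packed format could look like the following. It assumes Zig 0.11 or later, four weights per byte with the lowest bits first (the bit order is an assumption, the plan only fixes the codes), and the names `decodeTernary` and `sparseTernaryDot` are hypothetical, not existing functions in `simd_ops.zig`.

```zig
const std = @import("std");

/// Decode the k-th 2-bit code of a packed row (4 weights per byte).
/// Codes follow the plan: 01 = +1, 10 = -1, 00 = 0.
/// Assumption: weight k sits at bits (k % 4) * 2 of byte k / 4.
fn decodeTernary(packed_row: []const u8, k: usize) i8 {
    const shift: u3 = @intCast((k % 4) * 2);
    const code = (packed_row[k / 4] >> shift) & 0b11;
    return switch (code) {
        0b01 => 1,
        0b10 => -1,
        else => 0, // 00 is zero; the unused code 11 is treated as zero here
    };
}

/// Dot product of one packed ternary row with `input`, skipping zero weights.
/// `packed_row` must hold at least ceil(input.len / 4) bytes.
pub fn sparseTernaryDot(packed_row: []const u8, input: []const f32) f32 {
    var acc: f32 = 0;
    for (input, 0..) |x, k| {
        const w = decodeTernary(packed_row, k);
        if (w == 0) continue; // zero weight: no accumulation
        if (w == 1) {
            acc += x;
        } else {
            acc -= x; // w == -1
        }
    }
    return acc;
}

test "sparseTernaryDot skips zeros" {
    // Weights [+1, 0, -1, +1, 0], packed low bits first.
    const row = [_]u8{ 0b01_10_00_01, 0b00 };
    const x = [_]f32{ 1, 2, 3, 4, 5 };
    try std.testing.expectEqual(@as(f32, 2.0), sparseTernaryDot(&row, &x));
}
```

This scalar version still pays a decode and a branch per weight, so it mainly serves as a correctness reference for the blocked format in Phase 2.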
Phase 2: Blocked interleaved format (from SpTGEMM paper)
- 4x4 blocks, interleaved across matrix dimensions
- Cache-line aligned storage: 64B = 256 ternary weights (2-bit packed)
- Separate positive/negative indices for branch-free processing
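To illustrate the last point, here is a minimal sketch of a row stored as separate positive and negative index lists, assuming Zig 0.11 or later. The `SplitRow` type and `splitRowDot` function are hypothetical names for illustration. The sketch leaves out the 4x4 blocking and interleaving, so it is not the TCSC format from the paper, only the index-splitting idea.

```zig
const std = @import("std");

// Cache-line arithmetic from the plan: 64 bytes * 8 bits / 2 bits per weight.
const cache_line_bytes = 64;
const bits_per_weight = 2;
const weights_per_line = cache_line_bytes * 8 / bits_per_weight;

comptime {
    std.debug.assert(weights_per_line == 256);
}

/// One weight row stored as two column-index lists (hypothetical layout).
/// Zero weights are simply absent, so they cost nothing at runtime.
const SplitRow = struct {
    pos: []const u32, // columns where the weight is +1
    neg: []const u32, // columns where the weight is -1
};

/// Dot product without a per-weight branch on the ternary value:
/// add every input at a +1 column, subtract every input at a -1 column.
fn splitRowDot(row: SplitRow, input: []const f32) f32 {
    var acc: f32 = 0;
    for (row.pos) |k| acc += input[k];
    for (row.neg) |k| acc -= input[k];
    return acc;
}

test "splitRowDot matches the dense ternary result" {
    // Dense equivalent: weights [+1, 0, -1, +1, 0].
    const row = SplitRow{ .pos = &[_]u32{ 0, 3 }, .neg = &[_]u32{2} };
    const x = [_]f32{ 1, 2, 3, 4, 5 };
    try std.testing.expectEqual(@as(f32, 2.0), splitRowDot(row, &x));
}
```

The trade-off is that index lists turn sequential reads of `input` into gathers, which is presumably part of why the paper blocks and interleaves the storage.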
Phase 3: Sparsity regularizer (WDR)
L_total = L_ce + λ * Σ|w_i| * (1 - |w_i|)
- Weight Discretization Regularizer pushes weights toward {-1, 0, +1}
- λ controls sparsity level: λ=0.01 → ~25%, λ=0.05 → ~50%
- Progressive: start λ=0 → anneal to target over first 20K steps
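The regularizer and its schedule can be sketched as follows, assuming Zig 0.11 or later. The function names are hypothetical and not taken from `trainer.zig`; the anneal is assumed to be linear, and the latent weights are assumed to stay in [-1, 1], since the penalty goes negative for |w| > 1. The per-weight penalty is zero at -1, 0 and +1 and peaks at |w| = 0.5, so the gradient moves a weight toward whichever of those points is nearest.

```zig
const std = @import("std");

/// WDR penalty for one latent weight: |w| * (1 - |w|).
/// Zero at w in {-1, 0, +1}, maximal (0.25) at |w| = 0.5.
fn wdrPenalty(w: f32) f32 {
    const a = if (w < 0) -w else w;
    return a * (1.0 - a);
}

/// Derivative of the penalty with respect to w: sign(w) * (1 - 2|w|).
/// The penalty is not differentiable at w = 0; this sketch returns 0 there.
fn wdrGrad(w: f32) f32 {
    if (w == 0) return 0;
    const a = if (w < 0) -w else w;
    const sign: f32 = if (w < 0) -1.0 else 1.0;
    return sign * (1.0 - 2.0 * a);
}

/// Regularization term lambda * sum_i |w_i| * (1 - |w_i|), added to L_ce.
fn wdrLoss(weights: []const f32, lambda: f32) f32 {
    var sum: f32 = 0;
    for (weights) |w| sum += wdrPenalty(w);
    return lambda * sum;
}

/// Anneal lambda from 0 to `lambda_target` over `warmup_steps`
/// (linear ramp is an assumption; the plan only says "anneal").
fn annealedLambda(step: u32, warmup_steps: u32, lambda_target: f32) f32 {
    if (step >= warmup_steps) return lambda_target;
    const frac = @as(f32, @floatFromInt(step)) / @as(f32, @floatFromInt(warmup_steps));
    return lambda_target * frac;
}

test "penalty vanishes on ternary values" {
    try std.testing.expectEqual(@as(f32, 0.0), wdrPenalty(-1.0));
    try std.testing.expectEqual(@as(f32, 0.0), wdrPenalty(0.0));
    try std.testing.expectEqual(@as(f32, 0.0), wdrPenalty(1.0));
    try std.testing.expectEqual(@as(f32, 0.25), wdrPenalty(0.5));
}
```

In a training step the trainer would add `annealedLambda(step, 20_000, lambda_target) * wdrGrad(w)` to each weight's gradient before the optimizer update.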
Changes
- src/hslm/simd_ops.zig: sparse ternary matmul with zero-skip
- src/hslm/trainer.zig: add WDR regularizer with --sparsity flag
- Blocked interleaved storage format for weight matrices
- Benchmark: tri test bench-sparse (varying sparsity 25-75%)
Expected
- 25% speedup at current sparsity (25%)
- 4-6x speedup at 50% sparsity (with regularizer)
- PPL impact: minimal if sparsity < 50%, ~2-5% degradation at 60%+
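As a rough sanity check on these numbers, consider a simplified cost model (an assumption, not a measurement) in which runtime is dominated by inner-loop accumulations and decode or branch overhead is ignored. Skipping a fraction s of the weights then gives an ideal speedup of:

```latex
S_{\text{ideal}}(s) = \frac{1}{1 - s}, \qquad
S_{\text{ideal}}(0.25) = \frac{1}{0.75} \approx 1.33, \qquad
S_{\text{ideal}}(0.50) = \frac{1}{0.50} = 2.0
```

Under this model zero-skipping alone tops out near 2x at 50% sparsity, so reaching the 4-6x range would also depend on the blocked format and vectorization from Phase 2.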
References