feat(hslm): TTQ — Trained Ternary Quantization with learned thresholds per layer #320

@gHashTag

Task

Replace fixed ternary thresholding with learned per-layer scaling factors.
Current: symmetric {-1, 0, +1}. New: asymmetric {-W_n, 0, +W_p} per layer.
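
As a rough illustration of the difference, the sketch below maps a ternary code to its effective weight under the asymmetric scheme. The helper name `dequantize` and the sample values are hypothetical, not taken from the codebase; with `w_pos = w_neg = 1.0` it reduces to the current symmetric behaviour.

```zig
const std = @import("std");

// Hypothetical helper (not in the repo): maps a ternary code to the
// effective weight {-w_neg, 0, +w_pos} for one layer.
fn dequantize(q: i2, w_pos: f32, w_neg: f32) f32 {
    return switch (q) {
        1 => w_pos,
        -1 => -w_neg,
        else => 0.0, // code 0 (and the unused i2 value -2) map to zero
    };
}

pub fn main() void {
    // Illustrative values only, not trained parameters.
    const w_pos: f32 = 0.8;
    const w_neg: f32 = 1.3;
    const codes = [_]i2{ -1, 0, 1 };
    for (codes) |q| {
        std.debug.print("{d} -> {d}\n", .{ q, dequantize(q, w_pos, w_neg) });
    }
}
```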

Scientific Background

TTQ Paper (Zhu et al., 2016)

  • Learns separate positive/negative scaling: {-W_n^l, 0, +W_p^l}
  • Exceeds full-precision accuracy on CIFAR-10 (ResNet-32/44/56 by 0.04-0.36%)
  • 3% higher accuracy than fixed TWN on ImageNet
  • Dual gradient: one to the latent full-precision weights (learns the ternary assignment), one to the scaling factors W_p^l and W_n^l (learns the ternary values)
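
For reference, the quantization rule and the scaling-factor gradients can be written out as below. This is a sketch in the issue's notation, not a quote from the paper: \tilde{w}_i (latent full-precision weight), \Delta^l (layer threshold), w^t_i (ternary weight used in the forward pass) and the index sets I_p^l, I_n^l (weights assigned to the positive and negative level) are symbols introduced here. The signs follow from the chain rule under the {-W_n^l, 0, +W_p^l} convention; a formulation that stores the negative level with the opposite sign would flip the W_n^l term.

```latex
w^t_i =
\begin{cases}
+W_p^l & \tilde{w}_i > \Delta^l \\
0      & |\tilde{w}_i| \le \Delta^l \\
-W_n^l & \tilde{w}_i < -\Delta^l
\end{cases}
\qquad
\frac{\partial L}{\partial W_p^l} = \sum_{i \in I_p^l} \frac{\partial L}{\partial w^t_i},
\qquad
\frac{\partial L}{\partial W_n^l} = -\sum_{i \in I_n^l} \frac{\partial L}{\partial w^t_i}
```

The latent weights \tilde{w}_i are updated through a straight-through estimate of \partial L / \partial w^t_i, since the case split itself has zero gradient almost everywhere.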

Gradient-Corrected STE (CVPR 2019, He et al.)

  • Scale STE gradient by 1/S_l (inverse of scaling factor)
  • Faster convergence than naive unit gradient
  • Properly accounts for scaling factor × quantization interaction
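
A minimal sketch of the rescaling described above, assuming one scaling factor S_l per layer that stays strictly positive. The function name `correctedSteGrad` is hypothetical, and the exact formulation in the cited paper may differ from this one-liner.

```zig
const std = @import("std");

// Hypothetical helper: straight-through gradient for one latent weight,
// rescaled by the inverse of the layer's scaling factor S_l.
fn correctedSteGrad(grad_quantized: f32, scale: f32) f32 {
    std.debug.assert(scale > 0.0); // assumption: S_l is kept strictly positive
    return grad_quantized / scale;
}

pub fn main() void {
    const g: f32 = 0.5; // illustrative upstream gradient
    // Naive STE passes g through unchanged; the corrected form divides by S_l.
    std.debug.print("naive: {d}, corrected (S_l = 2): {d}\n", .{ g, correctedSteGrad(g, 2.0) });
}
```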

FOGZO (NeurIPS 2025)

  • First-Order-Guided Zeroth-Order gradient descent
  • Reduces STE bias while keeping cost low
  • 1-22 PPL improvement for quantized LMs vs baseline STE
  • Good for final training phases

Implementation

// Per-layer learned thresholds
const LayerQuantParams = struct {
    w_pos: f32 = 1.0,  // positive scaling
    w_neg: f32 = 1.0,  // negative scaling
    threshold: f32 = 0.5,  // ternary boundary
};

fn ternarize(w: f32, params: LayerQuantParams) i2 {
    if (w > params.threshold) return 1;   // maps to +w_pos
    if (w < -params.threshold) return -1; // maps to -w_neg
    return 0;
}

// Gradient for threshold:
// dL/d_threshold = sum of dL/dw for weights near boundary
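
Alongside the threshold heuristic, the two scale factors have an exact chain-rule gradient: each one sums the upstream gradients of the weights assigned to its level. The sketch below assumes the effective weight is +w_pos for code +1 and -w_neg for code -1; `ScaleGrads` and `scaleGrads` are hypothetical names, not existing trainer code.

```zig
const std = @import("std");

// Hypothetical container for the per-layer scale-factor gradients.
const ScaleGrads = struct {
    d_w_pos: f32 = 0.0,
    d_w_neg: f32 = 0.0,
};

// codes[i] is the ternary code of weight i, grad_w[i] is dL/d(effective weight i).
// Assumes effective weight = +w_pos for code +1 and -w_neg for code -1.
fn scaleGrads(codes: []const i2, grad_w: []const f32) ScaleGrads {
    std.debug.assert(codes.len == grad_w.len);
    var g = ScaleGrads{};
    var i: usize = 0;
    while (i < codes.len) : (i += 1) {
        switch (codes[i]) {
            1 => {
                g.d_w_pos += grad_w[i]; // d(+w_pos)/d(w_pos) = +1
            },
            -1 => {
                g.d_w_neg -= grad_w[i]; // d(-w_neg)/d(w_neg) = -1
            },
            else => {}, // zeroed weights do not depend on either scale
        }
    }
    return g;
}

pub fn main() void {
    const codes = [_]i2{ 1, -1, 0, 1 };
    const grads = [_]f32{ 0.5, 0.25, 9.0, -0.125 }; // illustrative values
    const g = scaleGrads(&codes, &grads);
    std.debug.print("d_w_pos={d} d_w_neg={d}\n", .{ g.d_w_pos, g.d_w_neg });
}
```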

Changes

  • src/hslm/trainer.zig: per-layer QuantParams struct, gradient update
  • src/hslm/quantize.zig: asymmetric ternarization with learned thresholds
  • 6 additional scale params total (w_pos and w_neg per layer × 3 layers), 9 if threshold is also learned per layer: negligible overhead
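
One possible shape for the trainer side, as a sketch only: a fixed array with one parameter struct per layer and a plain SGD step on the two scale factors. `num_layers`, `sgdStep` and the learning rate are assumptions for illustration; the real trainer may use a different optimizer or also update `threshold`.

```zig
const std = @import("std");

const LayerQuantParams = struct {
    w_pos: f32 = 1.0,
    w_neg: f32 = 1.0,
    threshold: f32 = 0.5,
};

const num_layers = 3; // layer count as stated in the issue

// Hypothetical plain-SGD update for one layer's scale factors.
fn sgdStep(p: *LayerQuantParams, d_w_pos: f32, d_w_neg: f32, lr: f32) void {
    p.w_pos -= lr * d_w_pos;
    p.w_neg -= lr * d_w_neg;
}

pub fn main() void {
    // Every layer starts from the struct defaults.
    var layers = [_]LayerQuantParams{.{}} ** num_layers;
    sgdStep(&layers[0], 0.5, -0.25, 0.5); // illustrative gradients and lr
    std.debug.print("layer 0: w_pos={d} w_neg={d}\n", .{ layers[0].w_pos, layers[0].w_neg });
}
```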

Expected

  • 2-5% PPL improvement from better quantization adaptation
  • Especially impactful in early layers (embedding projection)
  • Compounds with OHEM: better thresholds → better hard example selection
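
To put the target in loss terms: assuming perplexity is the exponential of the mean per-token cross-entropy in nats, and reading "improvement" as a relative reduction in PPL (both are assumptions, the issue does not state them), a 2-5% gain corresponds to a small drop in loss.

```latex
\mathrm{PPL} = e^{\mathcal{L}}
\;\Rightarrow\;
\Delta\mathcal{L} = \ln\frac{\mathrm{PPL}_{\text{new}}}{\mathrm{PPL}_{\text{old}}},
\qquad
\ln 0.98 \approx -0.0202,
\quad
\ln 0.95 \approx -0.0513
```

That is roughly 0.02 to 0.05 nats per token, so the evaluation set (or number of seeds) needs to be large enough to resolve a difference of that size.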
