
evidence(distill): Stage C — first real-corpus distillation on GB10 PASSES #1845

Merged
noahgift merged 3 commits into main from evidence/distill-stage-c-real-corpus-pass
May 20, 2026

@noahgift
Contributor

🎉 Stage C Phase 4 trial: real corpus, GB10, F-DISTILL-SMOKE-001 PASS

initial_loss = 15.6094
final_loss   =  6.0095   ← Δ = -9.60 (-62% reduction)
124 steps, 232.4s, 1.87 sec/step

First end-to-end Phase 4 dispatch with real corpus (.bin shards via ShardBatchSource). 0.5B Qwen2.5-Coder teacher → 0.5B student on Blackwell GB10 sm_121.
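
The headline figures can be re-derived from the four reported values. The snippet below is only an arithmetic check; the variable names are illustrative and not taken from the training code.

```python
# Values as reported in this PR.
initial_loss = 15.6094
final_loss = 6.0095
steps = 124
wall_seconds = 232.4

delta = final_loss - initial_loss      # absolute change in loss
relative = delta / initial_loss        # fractional change relative to the start
sec_per_step = wall_seconds / steps    # average wall-clock time per step

print(f"delta        = {delta:.2f}")
print(f"relative     = {relative:.0%}")
print(f"sec per step = {sec_per_step:.2f}")
```

The rounded results agree with the summary: -9.60, -62%, and 1.87 sec/step.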

What this validates

Cascade math

| Stage | Δloss | steps | per-step | data |
|---|---|---|---|---|
| Stage A | -6.80 | 62 | -0.110/step | synthetic, seq=256 |
| Stage C | -9.60 | 124 | -0.077/step | real corpus, seq=256 |

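Stage A's per-step rate is higher because synthetic data has zero variance: every batch is the same identity-mapping task. Stage C's real-corpus batches have higher variance but cover more concepts and run for twice as many steps (124 vs 62), so the absolute Δ is larger.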
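
The per-step rates are simply Δloss divided by step count. A minimal sketch of that calculation, using only the numbers reported for the two stages (the dictionary layout and function name are illustrative):

```python
# Reported totals for the two stages being compared.
stages = {
    "A": {"delta_loss": -6.80, "steps": 62},    # synthetic, seq=256
    "C": {"delta_loss": -9.60, "steps": 124},   # real corpus, seq=256
}


def per_step_rate(delta_loss: float, steps: int) -> float:
    """Average loss change per optimizer step."""
    return delta_loss / steps


rates = {name: per_step_rate(**s) for name, s in stages.items()}
for name, rate in rates.items():
    print(f"Stage {name}: {rate:.3f}/step")
```

This is an average over the whole run, so it says nothing about the shape of the loss curve within each stage.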

Phase 4 ladder

| Stage | PR | Status |
|---|---|---|
| A | #1833 | ✅ MERGED + verified |
| B-1 | #1836 | ✅ MERGED |
| B-2 | #1839 | ✅ MERGED |
| C-prep | #1840 | ✅ MERGED |
| B-1.5 tests | #1841 | 🟡 in CI |
| C trial | THIS | evidence PASSED 2026-05-20 |
| D | - | 50K dispatch (next) ⏳ awaiting user check-in (28h GB10 compute) |
| E | - | HumanEval pass@1 (Phase 5) ⏳ turnkey post-D |
| F | - | publish v2 (Phase 6) ⏳ turnkey post-E |
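
As a rough cross-check on the 28h figure for Stage D, the Stage C step time can be extrapolated to 50K steps. This assumes the 1.87 sec/step average holds for the longer run, which the PR does not state; the function name is illustrative.

```python
def estimate_hours(steps: int, sec_per_step: float) -> float:
    """Wall-clock hours if every step takes the same average time."""
    return steps * sec_per_step / 3600


stage_d_steps = 50_000
stage_c_sec_per_step = 1.87   # measured in the Stage C trial
budget_hours = 28             # compute figure quoted for Stage D

stage_d_hours = estimate_hours(stage_d_steps, stage_c_sec_per_step)
headroom_hours = budget_hours - stage_d_hours

print(f"estimated: {stage_d_hours:.1f} h, headroom: {headroom_hours:.1f} h")
```

Under that assumption the estimate comes out near 26 hours, leaving roughly two hours of headroom under the quoted 28h.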

Test plan

Evidence-only PR; this captures proof-of-success for the Phase 4 readiness checkpoint.

🤖 Generated with Claude Code

…ASSES (Phase 4 ladder)

2026-05-20 12:34 UTC — first end-to-end Phase 4 dispatch with real corpus
(.bin shards via ShardBatchSource). 0.5B Qwen2.5-Coder teacher → 0.5B
student on Blackwell GB10 (sm_121), 100-step trial (124 steps logged).

  initial_loss = 15.6094
  final_loss   =  6.0095   ← Δ = -9.60 (-62% reduction)
  124 steps, 232.4s, 1.87 sec/step

The synthetic Phase 3 victory (#1828, -0.47 over 62 steps) and the
seq_len=256 Stage A smoke (#1833, -6.80 over 62 steps) both predicted
Phase 4 readiness; Stage C confirms it with a larger absolute loss
drop (-9.60 over 124 steps) on real data (codeparrot Python tokenized
to Qwen vocab, 10 shards / 383 MB).

What this validates:
- ShardBatchSource (PR #1836, PMAT-PHASE4-STAGE-B-1) reads .bin shards
  correctly and produces non-degenerate batches
- Pipeline integration (PR #1839, PMAT-PHASE4-STAGE-B-2) swaps from
  synthetic → real source via with_batch_source() cleanly
- Dispatch script DATASET_DIR knob (PR #1840) works end-to-end through gx10
- Full Phase 4 readiness for the 50K-step Stage D run (compute-gated,
  requires user check-in per autonomous-mode rule)
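
To make the first two bullets concrete, here is a minimal sketch of a shard-backed batch source being swapped in for a synthetic one. It is not the project's implementation: the names ShardBatchSource and with_batch_source() come from this PR, but everything else is hypothetical, including the class bodies, the DistillPipeline and SyntheticBatchSource stand-ins, the assumption that .bin shards hold raw little-endian uint32 token ids, and the definition of a degenerate batch.

```python
import os
import struct
import tempfile
from typing import Iterator, List

Batch = List[List[int]]  # batch_size rows of seq_len token ids


class SyntheticBatchSource:
    """Stand-in for the synthetic source: every batch is identical."""

    def __init__(self, batch_size: int, seq_len: int, num_batches: int):
        self.batch_size, self.seq_len = batch_size, seq_len
        self.num_batches = num_batches

    def batches(self) -> Iterator[Batch]:
        row = list(range(self.seq_len))
        for _ in range(self.num_batches):
            yield [list(row) for _ in range(self.batch_size)]


class ShardBatchSource:
    """Hypothetical reader (ASSUMED format: little-endian uint32 token ids)."""

    def __init__(self, dataset_dir: str, batch_size: int, seq_len: int):
        self.paths = sorted(
            os.path.join(dataset_dir, name)
            for name in os.listdir(dataset_dir)
            if name.endswith(".bin")
        )
        self.batch_size, self.seq_len = batch_size, seq_len

    def batches(self) -> Iterator[Batch]:
        per_batch = self.batch_size * self.seq_len
        for path in self.paths:
            with open(path, "rb") as f:
                data = f.read()
            count = len(data) // 4  # ignore any trailing partial token
            tokens = struct.unpack(f"<{count}I", data[: count * 4])
            # Emit only full batches; a short tail at the end of a shard is dropped.
            for start in range(0, count - per_batch + 1, per_batch):
                flat = tokens[start : start + per_batch]
                yield [
                    list(flat[r * self.seq_len : (r + 1) * self.seq_len])
                    for r in range(self.batch_size)
                ]


class DistillPipeline:
    """Toy pipeline that defaults to synthetic data and lets the source be swapped."""

    def __init__(self, batch_size: int = 2, seq_len: int = 4):
        self.source = SyntheticBatchSource(batch_size, seq_len, num_batches=3)

    def with_batch_source(self, source) -> "DistillPipeline":
        self.source = source
        return self  # builder style, so calls can be chained


def is_degenerate(batch: Batch) -> bool:
    """ASSUMED definition: the batch contains only one distinct token id."""
    return len({tok for row in batch for tok in row}) <= 1


def write_demo_shards(dataset_dir: str) -> None:
    """Write two tiny demo shards of 16 tokens each."""
    for i in range(2):
        ids = range(i * 16, (i + 1) * 16)
        with open(os.path.join(dataset_dir, f"shard-{i:03d}.bin"), "wb") as f:
            f.write(struct.pack("<16I", *ids))


with tempfile.TemporaryDirectory() as demo_dir:
    write_demo_shards(demo_dir)
    pipeline = DistillPipeline().with_batch_source(
        ShardBatchSource(demo_dir, batch_size=2, seq_len=4)
    )
    real_batches = list(pipeline.source.batches())

print(len(real_batches), "batches; first row:", real_batches[0][0])
print("any degenerate:", any(is_degenerate(b) for b in real_batches))
```

The point of the builder-style setter is that the training loop only ever calls batches(), so switching from synthetic to real data does not touch the loop itself.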

Cascade math:
  Stage A:  Δloss = -6.80 over 62 steps  (synthetic, seq=256)
  Stage C:  Δloss = -9.60 over 124 steps (real corpus, seq=256)
  Per-step loss decrease:
    Stage A: -0.110/step
    Stage C: -0.077/step
  Stage A's per-step rate is higher because synthetic data has zero
  variance — every batch is the same identity-mapping task. Real-corpus
  Stage C has higher variance but covers more concepts, so absolute
  delta is larger.

Phase 4 ladder progress:
  Stage A (#1833)              ✅ MERGED + verified
  Stage B-1 (#1836)            ✅ MERGED
  Stage B-2 (#1839)            ✅ MERGED
  Stage C-prep (#1840)         ✅ MERGED
  Stage B-1.5 tests (#1841)    🟡 in CI
  Stage C trial (THIS evidence) ✅ PASSED 2026-05-20
  Stage D 50K dispatch          ⏳ awaiting user check-in (28h GB10 compute)
  Stage E HumanEval pass@1      ⏳ Phase 5 (turnkey post-Stage-D)
  Stage F publish v2            ⏳ Phase 6 (turnkey post-Stage-E)

Evidence:
- evidence/distill-stage-c-trial/dispatch.json — dispatch manifest
- evidence/distill-stage-c-trial/launch-victory.txt — full training log

Run dir on gx10: /home/noah/runs/distill-smoke-20260520-123259/
Trained checkpoint: student-trained.apr/model.safetensors

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 20, 2026 10:41
@noahgift noahgift merged commit d7fa25e into main May 20, 2026
10 checks passed
@noahgift noahgift deleted the evidence/distill-stage-c-real-corpus-pass branch May 20, 2026 12:19