chore(distill): Phase 5 HumanEval dispatch script (turnkey post-Stage-D) #1847
Merged
SPEC-DISTILL-001 Phase 5 evaluates the trained student from Phase 4
on the HumanEval-164 coding benchmark. This script wraps `apr eval
--dataset humaneval` for one-command dispatch on gx10 Blackwell GB10.
Defaults:
MODEL_PATH = output of Stage D 50K dispatch
(/home/noah/runs/distill-smoke-20260520-124239/student-trained.apr)
SAMPLES = 1 (pass@1 with greedy decoding)
TEMPERATURE = 0.0 (deterministic)
THRESHOLD = 30% (F-DISTILL-HUMANEVAL-001 minimum)
Auto-pulls HumanEval JSONL via `apr pull dataset openai/humaneval`
if HUMANEVAL_JSONL env not set.
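As a rough illustration of the defaults above, the following sketch shows one way a dispatch script could resolve them, letting an environment variable override each value. It is not the contents of the real script: the variable name `THRESHOLD_PCT`, the helper variable `dataset_step`, and the dry-run printing are assumptions made for this example, and nothing here invokes `apr`.

```shell
#!/bin/sh
# Hypothetical sketch: resolve the documented defaults with env overrides.
# This only prints a plan; it does not run `apr`.
set -eu

MODEL_PATH="${MODEL_PATH:-/home/noah/runs/distill-smoke-20260520-124239/student-trained.apr}"
SAMPLES="${SAMPLES:-1}"              # pass@1
TEMPERATURE="${TEMPERATURE:-0.0}"    # greedy, deterministic
THRESHOLD_PCT="${THRESHOLD_PCT:-30}" # assumed name for the 30% threshold

# Decide whether the dataset would have to be pulled first.
if [ -z "${HUMANEVAL_JSONL:-}" ]; then
  dataset_step="apr pull dataset openai/humaneval"
else
  dataset_step="use existing ${HUMANEVAL_JSONL}"
fi

printf 'model=%s\n' "$MODEL_PATH"
printf 'samples=%s temperature=%s threshold=%s%%\n' \
  "$SAMPLES" "$TEMPERATURE" "$THRESHOLD_PCT"
printf 'dataset: %s\n' "$dataset_step"
printf 'eval: apr eval --dataset humaneval\n'
```

Running it with, say, `SAMPLES=5` exported would change only that one value, which is the usual reason for the `${VAR:-default}` form.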
Estimated runtime: ~14min (164 problems × ~5s/problem inference on
Blackwell). Negligible compared to Stage D's 28h.
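The runtime estimate can be checked with integer shell arithmetic. This sketch uses only the figures quoted above (164 problems, ~5 s per problem, 28 h for Stage D); the variable names are illustrative.

```shell
#!/bin/sh
# Back-of-envelope check of the Phase 5 runtime estimate.
set -eu

problems=164
secs_per_problem=5
stage_d_hours=28

total_s=$(( problems * secs_per_problem ))   # 164 * 5 = 820 s
mins=$(( total_s / 60 ))                     # whole minutes
rem=$(( total_s % 60 ))                      # leftover seconds
stage_d_s=$(( stage_d_hours * 3600 ))        # 28 h in seconds

# Share of Stage D in hundredths of a percent (integer math, truncated).
share_bp=$(( total_s * 10000 / stage_d_s ))

printf 'eval ~ %dm%02ds, about %d.%02d%% of Stage D\n' \
  "$mins" "$rem" $(( share_bp / 100 )) $(( share_bp % 100 ))
# prints: eval ~ 13m40s, about 0.81% of Stage D
```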
Usage post-Stage-D:
bash scripts/dispatch-phase5-humaneval-gx10.sh
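Once the dispatch finishes, the 30% threshold translates into a concrete pass count on 164 problems. The helpers below are a hypothetical sketch of that gate (the names `passes_gate` and `min_passed` are invented here, and how the real script reads the pass count from `apr eval` is not shown in this PR).

```shell
#!/bin/sh
# Hypothetical pass@1 gate, using integer math only.
set -eu

# passes_gate PASSED TOTAL THRESHOLD_PCT
# Succeeds when PASSED/TOTAL >= THRESHOLD_PCT percent.
passes_gate() {
  [ $(( $1 * 100 )) -ge $(( $3 * $2 )) ]
}

# min_passed TOTAL THRESHOLD_PCT
# Smallest pass count that meets the threshold: ceil(TOTAL * PCT / 100).
min_passed() {
  echo $(( ($1 * $2 + 99) / 100 ))
}

needed=$(min_passed 164 30)
printf 'need at least %s of 164 problems\n' "$needed"

if passes_gate 49 164 30; then echo '49/164: pass'; else echo '49/164: fail'; fi
if passes_gate 50 164 30; then echo '50/164: pass'; else echo '50/164: fail'; fi
```

Comparing `passed * 100` against `threshold * total` avoids floating point, which plain `sh` arithmetic does not support.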
Phase 4 ladder progress:
Stages A → C-prep + Stage C trial ✅ DONE
Stage D 50K dispatch 🟡 RUNNING (PID 196378, gx10)
Stage E HumanEval (THIS script) ⏳ ready to fire post-D
Stage F publish v2 ⏳ Phase 6 (turnkey post-E)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request on May 20, 2026
…e-5) (#1848)
SPEC-DISTILL-001 Phase 6 publishes the trained student from Phase 4,
validated by Phase 5 HumanEval, to HuggingFace Hub per
SPEC-HF-PUBLISH-001. Wraps `apr publish` for one-command HF upload.
Defaults:
MODEL_DIR = Stage D output directory
REPO_ID = paiml/qwen2.5-coder-0.5b-distilled-v2
LICENSE = apache-2.0
LIBRARY_NAME = aprender
PIPELINE_TAG = text-generation
TAGS = distillation,qwen2.5,code,blackwell-gb10
PUBLISH_HOST = gx10 (where the model lives)
Falsifier F-DISTILL-PUBLISH-001: `apr pull <REPO_ID>` round-trips to
the same checkpoint that was uploaded.
Pre-flight:
- HF_TOKEN required (write scope for paiml/ namespace)
- MODEL_DIR must exist (checked on publish host, except in --dry-run)
Estimated runtime: ~10 min (upload of ~1 GB model + companion files
over LFS/NDJSON commits per SPEC-HF-PUBLISH-001).
Usage post-Phase-5:
bash scripts/dispatch-phase6-publish.sh
Phase 4 ladder progress:
Stages A → C + Stage C trial ✅ DONE
Stage D 50K dispatch 🟡 RUNNING (PID 196378, gx10)
Stage E HumanEval (#1847) 🟡 in CI, turnkey post-D
Stage F publish (THIS script) ⏳ ready to fire post-E
After Stage F lands, the MODEL-2 distillation lifecycle is complete:
apr pull paiml/qwen2.5-coder-0.5b-distilled-v2 → working trained
student available for end-user dogfood.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
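The F-DISTILL-PUBLISH-001 falsifier above asks that the pulled checkpoint match the uploaded one. A minimal way to express "same checkpoint" is a digest comparison; the sketch below assumes `sha256sum` is available on the host, and the function name `same_checkpoint` and the demo file names are invented for illustration. How the real falsifier compares checkpoints is not stated in this PR.

```shell
#!/bin/sh
# Hypothetical round-trip check: two files are the "same checkpoint"
# when their SHA-256 digests match.
set -eu

# same_checkpoint FILE_A FILE_B -> succeeds when the digests are equal.
same_checkpoint() {
  digest_a=$(sha256sum "$1" | cut -d' ' -f1)
  digest_b=$(sha256sum "$2" | cut -d' ' -f1)
  [ "$digest_a" = "$digest_b" ]
}

# Demo with throwaway files standing in for the uploaded and pulled models.
demo_dir=$(mktemp -d)
printf 'checkpoint-bytes' > "$demo_dir/uploaded.apr"
cp "$demo_dir/uploaded.apr" "$demo_dir/pulled.apr"

if same_checkpoint "$demo_dir/uploaded.apr" "$demo_dir/pulled.apr"; then
  echo 'round-trip OK'
else
  echo 'round-trip MISMATCH'
fi
rm -rf "$demo_dir"
```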
noahgift added a commit that referenced this pull request on May 20, 2026
… 4 RUNNING (#1851)
Captures the live state of the distillation epic as of 2026-05-20:
Phase 1: Teacher provider ✅ MERGED (#1786, #1787)
Phase 2: Student fwd/bwd + KD ✅ MERGED (#1788–#1797)
Phase 3: E2E smoke on Blackwell GB10 ✅ DISCHARGED (#1828)
Phase 3b: seq_len=256 scale verify ✅ DISCHARGED (#1833)
Phase 4: 50K training (Stage D) 🟡 RUNNING (PID 196378, gx10)
Phase 5: HumanEval pass@1 ⏳ ready (#1847)
Phase 6: Publish v2 ⏳ ready (#1848)
Inserts a new top-of-doc status table that points at:
- The 11-PR Blackwell cascade (post-mortem in blackwell-cascade-postmortem.md)
- Stage C real-corpus dispatch result (15.61 → 6.01 over 124 steps)
- Stage D running with ETA ~22h from 2026-05-20 13:43 UTC
- Phase 5/6 turnkey scripts ready post-D
This captures institutional knowledge for the team and future
sessions: the spec doc reflects what has actually shipped rather than
the original plan from 2026-05-18, when the epic was still scaffolded.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Phase 5 prep: turnkey HumanEval dispatch
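SPEC-DISTILL-001 Phase 5 evaluates the trained student from Stage D on HumanEval-164. This script wraps `apr eval --dataset humaneval` for one-command dispatch on gx10.
Defaults
MODEL_PATH = /home/noah/runs/distill-smoke-20260520-124239/student-trained.apr
SAMPLES = 1 (pass@1 with greedy decoding)
TEMPERATURE = 0.0 (deterministic)
THRESHOLD_PCT = 30 (F-DISTILL-HUMANEVAL-001 minimum)
HUMANEVAL_JSONL = auto-pulled via `apr pull dataset openai/humaneval` if unset
Runtime
~14 min (164 problems × ~5s/problem inference on Blackwell). Negligible compared to Stage D's 28h.
Usage post-Stage-D
bash scripts/dispatch-phase5-humaneval-gx10.sh
Phase 4 ladder
Stages A → C-prep + Stage C trial ✅ DONE
Stage D 50K dispatch 🟡 RUNNING (PID 196378, gx10)
Stage E HumanEval (THIS script) ⏳ ready to fire post-D
Stage F publish v2 ⏳ Phase 6 (turnkey post-E)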
🤖 Generated with Claude Code