chore(distill): Phase 5 HumanEval dispatch script (turnkey post-Stage-D) #1847

Merged: noahgift merged 4 commits into main from chore/phase5-humaneval-dispatch-gx10 on May 20, 2026

@noahgift
Contributor

Phase 5 prep: turnkey HumanEval dispatch

SPEC-DISTILL-001 Phase 5 evaluates the trained student from Stage D on HumanEval-164. This script wraps `apr eval --dataset humaneval` for one-command dispatch on gx10.

Defaults

| Knob | Default | Notes |
|---|---|---|
| MODEL_PATH | Stage D output | /home/noah/runs/distill-smoke-20260520-124239/student-trained.apr |
| SAMPLES | 1 | pass@1, greedy decoding |
| TEMPERATURE | 0.0 | deterministic |
| THRESHOLD_PCT | 30 | F-DISTILL-HUMANEVAL-001 minimum |
| HUMANEVAL_JSONL | (auto-pull) | `apr pull dataset openai/humaneval` |
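
The knobs above read as environment variables with fallbacks. A minimal sketch of how such defaults are commonly resolved in bash, assuming `${VAR:-default}` expansion; `resolve_knobs` and `print_knobs` are hypothetical helpers and the script in this PR may be structured differently:

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not taken from the PR's script: resolve each knob
# from the environment, falling back to the defaults listed in the table.
resolve_knobs() {
  MODEL_PATH="${MODEL_PATH:-/home/noah/runs/distill-smoke-20260520-124239/student-trained.apr}"
  SAMPLES="${SAMPLES:-1}"
  TEMPERATURE="${TEMPERATURE:-0.0}"
  THRESHOLD_PCT="${THRESHOLD_PCT:-30}"
  # HUMANEVAL_JSONL has no path default; empty means "auto-pull the dataset".
  HUMANEVAL_JSONL="${HUMANEVAL_JSONL:-}"
}

# Print the resolved values so a dispatch log records what was actually used.
print_knobs() {
  resolve_knobs
  printf 'MODEL_PATH=%s\nSAMPLES=%s\nTEMPERATURE=%s\nTHRESHOLD_PCT=%s\n' \
    "$MODEL_PATH" "$SAMPLES" "$TEMPERATURE" "$THRESHOLD_PCT"
}
```

Under that assumption, a single knob could be overridden for one run by exporting it before invoking the script, leaving the other defaults untouched.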

Runtime

~14 min (164 problems × ~5s/problem inference on Blackwell). Negligible compared to Stage D's 28h.
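
The estimate can be reproduced with integer arithmetic. This sketch only restates the figures quoted above (164 problems, roughly 5 s each, 28 h for Stage D); the variable names are illustrative:

```shell
#!/usr/bin/env bash
problems=164
secs_per_problem=5                       # approximate figure quoted above
total_secs=$(( problems * secs_per_problem ))
# Round to the nearest minute using integer arithmetic.
total_mins=$(( (total_secs + 30) / 60 ))
stage_d_secs=$(( 28 * 3600 ))
# Share of Stage D wall-clock time, in hundredths of a percent.
share_bp=$(( total_secs * 10000 / stage_d_secs ))
echo "eval: ${total_secs}s (~${total_mins} min), ${share_bp}/100 % of Stage D"
```

That is 820 s, or about 14 minutes, which is under 1% of the 28 h training run.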

Usage post-Stage-D

bash scripts/dispatch-phase5-humaneval-gx10.sh

Phase 4 ladder

| Stage | Status |
|---|---|
| Stages A → C-prep + Stage C trial | ✅ DONE |
| Stage D 50K dispatch | 🟡 RUNNING (PID 196378, gx10) |
| Stage E HumanEval (THIS script) | ⏳ ready to fire post-D |
| Stage F publish v2 | ⏳ Phase 6 (turnkey post-E) |

🤖 Generated with Claude Code

SPEC-DISTILL-001 Phase 5 evaluates the trained student from Phase 4
on the HumanEval-164 coding benchmark. This script wraps `apr eval
--dataset humaneval` for one-command dispatch on gx10 Blackwell GB10.

Defaults:
  MODEL_PATH = output of Stage D 50K dispatch
               (/home/noah/runs/distill-smoke-20260520-124239/student-trained.apr)
  SAMPLES    = 1 (pass@1 with greedy decoding)
  TEMPERATURE= 0.0 (deterministic)
  THRESHOLD  = 30% (F-DISTILL-HUMANEVAL-001 minimum)
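
With SAMPLES=1 and greedy decoding, pass@1 is simply the fraction of the 164 problems whose single completion passes its tests. A sketch of the threshold gate under that reading, assuming the 30% minimum is inclusive; `meets_threshold` and `min_passing` are hypothetical helpers, not part of `apr`:

```shell
#!/usr/bin/env bash
# Hypothetical gate: succeed when passed/total meets the threshold percentage.
# Integer cross-multiplication avoids floating point in bash.
meets_threshold() {
  local passed="$1" total="$2" threshold_pct="$3"
  [ $(( passed * 100 )) -ge $(( threshold_pct * total )) ]
}

# Smallest number of passing problems that clears the gate (ceiling division).
min_passing() {
  local total="$1" threshold_pct="$2"
  echo $(( (threshold_pct * total + 99) / 100 ))
}
```

For 164 problems at 30%, `min_passing 164 30` gives 50, so 49 passing problems would fall short of the gate and 50 would clear it.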

Auto-pulls HumanEval JSONL via `apr pull dataset openai/humaneval`
if HUMANEVAL_JSONL env not set.
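
The auto-pull rule amounts to a single branch on the environment variable. A sketch, assuming an empty value counts as "not set"; `dataset_step` is a hypothetical helper that only prints what it would do and downloads nothing:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the dataset step the wrapper would take.
# It echoes the command instead of running it, so it is safe to call anywhere.
dataset_step() {
  if [ -n "${HUMANEVAL_JSONL:-}" ]; then
    echo "use ${HUMANEVAL_JSONL}"
  else
    echo "apr pull dataset openai/humaneval"
  fi
}
```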

Estimated runtime: ~14min (164 problems × ~5s/problem inference on
Blackwell). Negligible compared to Stage D's 28h.

Usage post-Stage-D:
  bash scripts/dispatch-phase5-humaneval-gx10.sh

Phase 4 ladder progress:
  Stages A → C-prep + Stage C trial   ✅ DONE
  Stage D 50K dispatch                 🟡 RUNNING (PID 196378, gx10)
  Stage E HumanEval (THIS script)      ⏳ ready to fire post-D
  Stage F publish v2                   ⏳ Phase 6 (turnkey post-E)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 20, 2026
…e-5) (#1848)

SPEC-DISTILL-001 Phase 6 publishes the trained student from Phase 4 +
validated by Phase 5 HumanEval to HuggingFace Hub per SPEC-HF-PUBLISH-001.
Wraps `apr publish` for one-command HF upload.

Defaults:
  MODEL_DIR    = Stage D output directory
  REPO_ID      = paiml/qwen2.5-coder-0.5b-distilled-v2
  LICENSE      = apache-2.0
  LIBRARY_NAME = aprender
  PIPELINE_TAG = text-generation
  TAGS         = distillation,qwen2.5,code,blackwell-gb10
  PUBLISH_HOST = gx10 (where the model lives)

Falsifier: F-DISTILL-PUBLISH-001 — `apr pull <REPO_ID>` round-trips to
the same checkpoint that was uploaded.
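
One way to test such a round trip is a byte-for-byte comparison of the uploaded file against the pulled one. This is a sketch only: it assumes "same checkpoint" means byte-identical, and it takes both paths from the caller because the location `apr pull` writes to is not stated here. `same_checkpoint` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the two files are byte-identical.
# cmp -s is silent and reports the result through its exit status.
same_checkpoint() {
  local uploaded="$1" pulled="$2"
  cmp -s -- "$uploaded" "$pulled"
}
```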

Pre-flight:
  - HF_TOKEN required (write scope for paiml/ namespace)
  - MODEL_DIR must exist (checked on publish host, except in --dry-run)
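
The two pre-flight rules can be sketched as a small guard function. Assumptions: `DRY_RUN=1` stands in for the `--dry-run` flag, and the directory test runs locally rather than on the publish host. `preflight` is a hypothetical helper, not the script's actual code:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight guard. Returns non-zero with a message on stderr
# when a requirement is not met.
preflight() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "preflight: HF_TOKEN is not set" >&2
    return 1
  fi
  # The MODEL_DIR existence check is skipped in dry-run mode.
  if [ "${DRY_RUN:-0}" != "1" ] && [ ! -d "${MODEL_DIR:-}" ]; then
    echo "preflight: MODEL_DIR does not exist: ${MODEL_DIR:-<unset>}" >&2
    return 1
  fi
  return 0
}
```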

Estimated runtime: ~10 min (upload of ~1 GB model + companion files
over LFS/NDJSON commits per SPEC-HF-PUBLISH-001).

Usage post-Phase-5:
  bash scripts/dispatch-phase6-publish.sh

Phase 4 ladder progress:
  Stages A → C + Stage C trial   ✅ DONE
  Stage D 50K dispatch            🟡 RUNNING (PID 196378, gx10)
  Stage E HumanEval (#1847)       🟡 in CI, turnkey post-D
  Stage F publish (THIS script)   ⏳ ready to fire post-E

After Stage F lands, MODEL-2 distillation lifecycle is complete:
  apr pull paiml/qwen2.5-coder-0.5b-distilled-v2 → working trained
  student available for end-user dogfood.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 2af1a80 into main May 20, 2026
18 of 20 checks passed
@noahgift noahgift deleted the chore/phase5-humaneval-dispatch-gx10 branch May 20, 2026 14:46
noahgift added a commit that referenced this pull request May 20, 2026
… 4 RUNNING (#1851)

Captures the live state of the distillation epic as of 2026-05-20:

  Phase 1 — Teacher provider              ✅ MERGED (#1786, #1787)
  Phase 2 — Student fwd/bwd + KD          ✅ MERGED (#1788#1797)
  Phase 3 — E2E smoke on Blackwell GB10   ✅ DISCHARGED (#1828)
  Phase 3b — seq_len=256 scale verify     ✅ DISCHARGED (#1833)
  Phase 4 — 50K training (Stage D)        🟡 RUNNING (PID 196378, gx10)
  Phase 5 — HumanEval pass@1              ⏳ ready (#1847)
  Phase 6 — Publish v2                    ⏳ ready (#1848)

Inserts a new top-of-doc status table that points at:
- The 11-PR Blackwell cascade (post-mortem in blackwell-cascade-postmortem.md)
- Stage C real-corpus dispatch result (15.61 → 6.01 over 124 steps)
- Stage D running with ETA ~22h from 2026-05-20 13:43 UTC
- Phase 5/6 turnkey scripts ready post-D
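
The Stage D ETA above can be turned into a wall-clock finish time with integer arithmetic. This sketch only restates the quoted figures (start 13:43 UTC, roughly 22 h remaining); the variable names are illustrative:

```shell
#!/usr/bin/env bash
# Start time and remaining hours taken from the status note above.
start_h=13; start_m=43; eta_hours=22
total_min=$(( start_h * 60 + start_m + eta_hours * 60 ))
day_offset=$(( total_min / 1440 ))        # whole days past the start date
rem=$(( total_min % 1440 ))               # minutes into the finish day
finish=$(printf '%02d:%02d' $(( rem / 60 )) $(( rem % 60 )))
echo "Stage D ETA: start date +${day_offset} day(s), ${finish} UTC"
```

That places the expected finish at about 11:43 UTC on 2026-05-21, after which the Phase 5 and Phase 6 scripts can be fired.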

This captures institutional knowledge for the team and future sessions:
the spec doc reflects what's actually shipped rather than the original
plan from 2026-05-18 when the epic was still scaffolded.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>