lusoris added a commit that referenced this pull request May 5, 2026
…(ADR-0313) (#410)

* ci(policy): Required Checks Aggregator - unblock doc/Python-only PRs (ADR-0313)

The 23-named-required-check posture (ADR-0037) deadlocks doc/Python-only PRs: the C-build matrix path-filter-skips on their diffs, but branch protection counts a path-filter-skip + a never-ran-at-all as not satisfying the required-check. PR #400 hit this concretely (10/23 succeeded; 13/23 either skipped or never reported; gh pr merge returned "the base branch policy prohibits the merge").

Aggregator is one workflow with no path filter. It polls up to 8 minutes for sibling workflows to register, then verifies each named check on the head SHA reported success/skipped/neutral (or didn't appear at all, which is the documented path-filter rejection semantics). Aggregator becomes the single branch-protection required check; the 23 individual workflows continue to run unchanged.

Manual operator step at adoption (after this PR merges):

    gh api -X PUT "repos/lusoris/vmaf/branches/master/protection/required_status_checks" \
      -F 'strict=true' -F 'contexts=["Required Checks Aggregator"]'

Unblocks #400, #403, #404, #405, #406, #407 currently stuck on the deadlock. Per user popup direction 2026-05-05.

Files: .github/workflows/required-aggregator.yml (new), docs/adr/0313-*.md (new), changelog.d/added/*.md (new), docs/adr/README.md (+1 row), docs/adr/_index_fragments/_order.txt (+1 line + new fragment), docs/rebase-notes.md §0313.

* ci: retrigger after PR body cleanup

* ci: retrigger after deliverables opt-out polarity fix

---------

Co-authored-by: Lusoris <lusoris@pm.me>
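
The verification rule that the commit message describes for the aggregator can be sketched in a few lines of Python. This is only an illustration of the stated semantics (success, skipped and neutral pass; a check that never appeared is treated as path-filter-skipped), not the contents of `required-aggregator.yml`. The function name `blocking_checks` and the shape of the `reported` mapping are assumptions made for this example, and the 8-minute polling loop is left out.

```python
from typing import Mapping

# Conclusions the commit message lists as satisfying a required check.
PASSING_CONCLUSIONS = {"success", "skipped", "neutral"}


def blocking_checks(required: list[str], reported: Mapping[str, str]) -> list[str]:
    """Return the required check names that should block the merge.

    `reported` is a hypothetical mapping of check name -> conclusion for the
    head SHA. A required name that is absent from the mapping is treated as
    path-filter-skipped and therefore does not block.
    """
    blocking = []
    for name in required:
        if name not in reported:
            # Never appeared at all: path-filter rejection, counts as OK.
            continue
        if reported[name] not in PASSING_CONCLUSIONS:
            blocking.append(name)
    return blocking


if __name__ == "__main__":
    # Illustrative check names only; the real list has 23 entries.
    required = ["build", "lint", "docs"]
    reported = {"build": "success", "lint": "failure"}
    print(blocking_checks(required, reported))
```

Under this reading, the aggregator fails only when a sibling check actually ran and reported something other than the three accepted conclusions.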
…ip workflow (ADR-0309)

Follow-up to ADR-0303 / PR #399 that ships the operational harness for actually running the 5-seed x 9-fold LOSO retrain against the locally available Netflix Public Dataset and emitting a machine-checkable verdict file.

- ai/scripts/run_ensemble_v2_real_corpus_loso.sh: Bash wrapper that validates .workingdir2/netflix/, loops the seeds through the existing train_fr_regressor_v2_ensemble_loso.py, tees timestamped per-seed logs.
- ai/scripts/validate_ensemble_seeds.py: applies the ADR-0303 gate (mean PLCC >= 0.95 AND max-min <= 0.005), snapshots the corpus YUV file list as sha256 over sorted relpath+size, writes PROMOTE.json on gate-pass or HOLD.json on gate-fail.
- ai/tests/test_validate_ensemble_seeds.py: 7 tests covering both verdict paths plus exit-code coverage.
- docs/ai/ensemble-v2-real-corpus-retrain-runbook.md: prerequisites, two-command run, verdict interpretation, rollback procedure.
- docs/adr/0309-*.md (Proposed): decision matrix with 4 alternatives.
- docs/research/0081-*.md: corpus-size sufficiency, LOSO sizing, seed-diversity hyperparameters, Seeking_25fps weak-fold diagnostic.
- ai/AGENTS.md: appended ADR-0309 invariant (registry-flip is a separate PR; never flip during a rebase).

The harness deliberately does NOT run the LOSO inside the PR (6-12 h GPU work) and does NOT flip the registry; the registry flip is a separate follow-up PR gated on a passing PROMOTE.json.

Reproducer: pytest ai/tests/test_validate_ensemble_seeds.py -v

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
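
The ADR-0303 gate quoted above (mean PLCC >= 0.95 AND max-min <= 0.005) is simple enough to sketch. The code below is a minimal illustration, not the code in `validate_ensemble_seeds.py`: it assumes the spread is taken over one PLCC value per seed, and the function name `ensemble_gate`, its parameters and the returned keys are hypothetical.

```python
from statistics import mean
from typing import Mapping


def ensemble_gate(
    plcc_by_seed: Mapping[int, float],
    min_mean: float = 0.95,     # threshold quoted in the commit message
    max_spread: float = 0.005,  # threshold quoted in the commit message
) -> dict:
    """Apply a mean-and-spread gate to per-seed PLCC values.

    Assumption: each seed contributes a single PLCC figure (for example its
    mean over the LOSO folds); the source does not spell this out.
    """
    values = list(plcc_by_seed.values())
    if not values:
        raise ValueError("no per-seed PLCC values supplied")
    mean_plcc = mean(values)
    spread = max(values) - min(values)
    return {
        "mean_plcc": mean_plcc,
        "spread": spread,
        "passed": mean_plcc >= min_mean and spread <= max_spread,
    }


if __name__ == "__main__":
    # Made-up numbers purely to exercise the function.
    print(ensemble_gate({0: 0.960, 1: 0.962, 2: 0.961}))
```

Both conditions must hold: a high mean with widely scattered seeds fails, as does a tight cluster below the mean threshold.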
2875cb6 to 31e4aa2
Pull request overview
This PR introduces the operational “real-corpus LOSO retrain harness” for the fr_regressor_v2 deep-ensemble workflow described in ADR-0309: a bash wrapper to run per-seed LOSO training, a Python validator to apply the ADR-0303 gate and emit PROMOTE.json/HOLD.json, plus accompanying tests and documentation (runbook, ADR, research digest).
Changes:
- Add `ai/scripts/run_ensemble_v2_real_corpus_loso.sh` and `ai/scripts/validate_ensemble_seeds.py` (with pytest coverage) to drive and validate an out-of-band retrain.
- Add ADR-0309 + a runbook documenting the operator workflow and rollback guidance.
- Add a new research digest and index entries for the workstream.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/research/README.md | Adds Research-0081 to the research index. |
| docs/research/0081-fr-regressor-v2-ensemble-real-corpus-methodology.md | New research digest for the real-corpus retrain methodology. |
| docs/rebase-notes.md | Adds a rebase-note entry for ADR-0309. |
| docs/ai/ensemble-v2-real-corpus-retrain-runbook.md | New operator runbook for running wrapper + validator and interpreting verdicts. |
| docs/adr/README.md | Adds ADR-0309 to the ADR index table (but this file is generated). |
| docs/adr/0309-fr-regressor-v2-ensemble-real-corpus-retrain.md | New ADR documenting the harness/flip workflow decision. |
| CHANGELOG.md | Adds an Unreleased entry (but this file is generated). |
| ai/tests/test_validate_ensemble_seeds.py | New tests for validator verdict emission + exit codes. |
| ai/scripts/validate_ensemble_seeds.py | New validator script that applies the ADR-0303 gate and writes PROMOTE/HOLD verdict files. |
| ai/scripts/run_ensemble_v2_real_corpus_loso.sh | New wrapper intended to run per-seed LOSO training and collect logs/artefacts. |
| ai/AGENTS.md | Adds an invariant note that registry flips must happen in a separate PR. |
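
The verdict-file behaviour summarised in the table (write `PROMOTE.json` on gate-pass, `HOLD.json` on gate-fail, with tested exit codes) could look roughly like the sketch below. The exit codes (0 for promote, 1 for hold), the payload layout, the removal of a stale opposite verdict, and the name `write_verdict` are all assumptions for illustration; the PR text does not specify them.

```python
import json
from pathlib import Path


def write_verdict(out_dir: Path, passed: bool, details: dict) -> int:
    """Write PROMOTE.json or HOLD.json into out_dir and return an exit code.

    Assumed convention: 0 when the gate passed, 1 when it did not.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    verdict = "PROMOTE" if passed else "HOLD"
    other = "HOLD" if passed else "PROMOTE"
    # Assumption: drop a stale verdict from an earlier run so the directory
    # never contains both files at once.
    (out_dir / f"{other}.json").unlink(missing_ok=True)
    payload = {"verdict": verdict, **details}
    (out_dir / f"{verdict}.json").write_text(
        json.dumps(payload, indent=2, sort_keys=True) + "\n",
        encoding="utf-8",
    )
    return 0 if passed else 1


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        code = write_verdict(Path(tmp), True, {"mean_plcc": 0.96})
        print(code, sorted(p.name for p in Path(tmp).iterdir()))
```

Keeping exactly one verdict file per output directory makes it easy for a follow-up flip PR to check for a passing `PROMOTE.json` without inspecting timestamps.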
Comment on lines +11 to +41

    - **`fr_regressor_v2` ensemble - real-corpus retrain harness +
      flip workflow (ADR-0309).** Follow-up to
      [ADR-0303](docs/adr/0303-fr-regressor-v2-ensemble-prod-flip.md) /
      PR #399 that ships the operational harness for actually running
      the 5-seed × 9-fold LOSO retrain against the locally available
      Netflix Public Dataset (`.workingdir2/netflix/`) and emitting a
      machine-checkable verdict file. Adds
      [`ai/scripts/run_ensemble_v2_real_corpus_loso.sh`](ai/scripts/run_ensemble_v2_real_corpus_loso.sh)
      (Bash wrapper that validates the corpus, loops the seeds through
      `train_fr_regressor_v2_ensemble_loso.py`, and tees timestamped
      per-seed logs under `runs/ensemble_v2_real/logs/`),
      [`ai/scripts/validate_ensemble_seeds.py`](ai/scripts/validate_ensemble_seeds.py)
      (Python validator that calls the ADR-0303 gate, snapshots the
      corpus YUV file list as sha256 over sorted `relpath\tsize`, and
      writes `PROMOTE.json` on gate-pass with a recommendation to flip
      the five `fr_regressor_v2_ensemble_v1_seed{0..4}` rows in
      `model/tiny/registry.json` from `smoke: true` to `smoke: false`,
      or `HOLD.json` on gate-fail with the failing-seed details and a
      recommendation to keep `smoke: true` and investigate diversity /
      hyperparameters), unit tests for both verdict paths, and a
      runbook
      [`docs/ai/ensemble-v2-real-corpus-retrain-runbook.md`](docs/ai/ensemble-v2-real-corpus-retrain-runbook.md)
      covering prerequisites, the two-command run, verdict
      interpretation, and rollback. The harness deliberately does
      **not** run the LOSO inside the PR (6–12 h GPU work) and does
      **not** flip the registry - the registry flip is a separate
      follow-up PR gated on a passing `PROMOTE.json` (preserves a clean
      revert surface and honours the new `ai/AGENTS.md` invariant that
      registry-flip never happens during a rebase). Companion research
      digest:
      [Research-0081](docs/research/0081-fr-regressor-v2-ensemble-real-corpus-methodology.md).
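
The corpus snapshot mentioned in the changelog entry (sha256 over the sorted `relpath\tsize` list of YUV files) can be illustrated as follows. This is a sketch under assumptions, not the validator's code: the `*.yuv` glob, the newline separator between entries, and the name `corpus_snapshot` are choices made for this example.

```python
import hashlib
from pathlib import Path


def corpus_snapshot(root: Path, pattern: str = "*.yuv") -> str:
    """Return a sha256 hex digest over sorted 'relpath<TAB>size' entries.

    Only paths and sizes are hashed, so the snapshot is cheap to compute on
    a large corpus; it will not notice same-size content changes.
    """
    entries = []
    for path in root.rglob(pattern):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            entries.append(f"{rel}\t{path.stat().st_size}")
    entries.sort()  # make the digest independent of directory walk order
    joined = "\n".join(entries)  # assumed separator between entries
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "clip_a.yuv").write_bytes(b"\x00" * 16)
        (root / "clip_b.yuv").write_bytes(b"\x00" * 32)
        print(corpus_snapshot(root))
```

Sorting before hashing is what makes the digest reproducible across machines and filesystems, which is the property a verdict file needs if it is to pin the corpus it was computed on.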
Comment on lines +72 to +77

    python "$repo_root/ai/scripts/train_fr_regressor_v2_ensemble_loso.py" \
      --seeds "$seed" \
      --corpus-root "$corpus_root" \
      --output "$out_dir/loso_seed${seed}.json" \
      --out-dir "$out_dir" \
      2>&1 | tee "$log_file"
Comment on lines +55 to +57

    `train_fr_regressor_v2_ensemble_loso.py --seed N
    --corpus-root $CORPUS_ROOT
    --output runs/ensemble_v2_real/loso_seed{N}.json` per seed.

    2. **Verify the registry** - `python ai/scripts/validate_model_registry.py`
       should pass; the five rows must read `"smoke": true` again.
    3. **Verify the C-side ORT loader** - re-run
       `python ai/tests/test_registry.py` to confirm the smoke graphs
| @@ -312,5 +312,6 @@ ADRs may exist there for local session continuity, but the tracked | |||
| | [ADR-0255](0253-fastdvdnet-pre-real-weights.md) | T6-7b — FastDVDnet temporal pre-filter real upstream weights drop. Replaces the [ADR-0215](0215-fastdvdnet-pre-filter.md) smoke-only placeholder ONNX with the verbatim trained checkpoint from upstream `m-tassano/fastdvdnet` (commit `c8fdf61`, MIT) wrapped by a `LumaAdapter` PyTorch module that preserves the C-side luma `[1, 5, H, W]` → `[1, 1, H, W]` contract: each luma plane is `Concat`-tiled into RGB (`Y → [Y, Y, Y]`) to match upstream's 15-channel input, a constant `sigma = 25/255` noise map (upstream's reference inference level) is broadcast via `ones_like(centre) * sigma`, and the upstream RGB output is collapsed back to luma using BT.601 weights (`Y = 0.299 R + 0.587 G + 0.114 B`). Every `nn.PixelShuffle` instance in upstream's UpBlock is swapped pre-export for an allowlist-safe `Reshape`/`Transpose`/`Reshape` decomposition (zero learned params → numerically identical, verified `< 1e-6` max-abs diff between upstream PyTorch and exported ONNX); `DepthToSpace` deliberately stays off the op allowlist. Shipped graph uses only allowlisted ops. Registry row flips `smoke: false` with `license: MIT`, upstream commit pin, and refreshed `sha256`; sidecar JSON + doc `docs/ai/models/fastdvdnet_pre.md` carry full provenance. New `ai/scripts/export_fastdvdnet_pre.py` (replaces the `_placeholder.py` exporter — kept for reference). 9.5 MiB ONNX (well under the 50 MiB DNN size cap). Luma-native retrain tracked as T6-7c follow-up; INT8 PTQ tracked as T6-7d follow-up. | Accepted | ai, dnn, feature-extractor, wave-1, weights-drop, fork-local | | |||
Comment on lines +71 to +79

    echo "[ensemble-v2-real] seed=$seed -> $log_file"
    python "$repo_root/ai/scripts/train_fr_regressor_v2_ensemble_loso.py" \
      --seeds "$seed" \
      --corpus-root "$corpus_root" \
      --output "$out_dir/loso_seed${seed}.json" \
      --out-dir "$out_dir" \
      2>&1 | tee "$log_file"
    done
Comment on lines +3 to +11

    - **Status**: Active
    - **Date**: 2026-05-05
    - **ADR**: [ADR-0309](../adr/0309-fr-regressor-v2-ensemble-real-corpus-retrain.md)
    - **Related**: [Research-0075](0075-fr-regressor-v2-ensemble-prod-flip.md)
      (parent - gate theory + conformal calibration sketch),
      [Research-0067](0067-fr-regressor-v2-prod-loso.md)
      (deterministic LOSO baseline),
      [Research-0058](0058-fr-regressor-v2-feasibility.md)
      (codec-aware feasibility).
This was referenced May 6, 2026
lusoris added a commit that referenced this pull request May 6, 2026
The 2026-05-06 merge train shipped 13 ADRs whose implementing PRs landed but Status was never bumped from Proposed to Accepted. Per docs/adr/README.md and ADR-0028, ADRs flip to Accepted once the deliverable lands. The train moved faster than the per-ADR Status edits could keep up; this PR catches up.

Flipped:

- ADR-0302 (#401, ENCODER_VOCAB v3 schema expansion)
- ADR-0303 (#399, fr_regressor_v2 ensemble prod-flip gate)
- ADR-0304 (#402, vmaf-tune fast-path Optuna TPE)
- ADR-0305 (#400, knob-sweep Pareto analysis scaffold)
- ADR-0307 (#404, vmaf-tune ladder default sampler)
- ADR-0308 (#406, knob-sweep recipe-regression policy)
- ADR-0309 (#405, ensemble retrain harness)
- ADR-0311 (#408, libfuzzer harness expansion)
- ADR-0313 (#410, CI Required Checks Aggregator) [table-format Status, sed-edited inline]
- ADR-0314 (#412, vmaf-tune --score-backend=vulkan)
- ADR-0316 (#414, cli_parse long-only-option assertion fix)
- ADR-0317 (#415, CI Docker + FFmpeg-SYCL flake fix)
- ADR-0319 (#422, ensemble LOSO trainer real impl)

Already-Accepted (no change): ADR-0310 (#407), ADR-0312 (#425), ADR-0315 (skeleton, intentionally Proposed), ADR-0321 (#424).
Summary

Follow-up to ADR-0303 / PR #399 that ships the operational harness for actually running the 5-seed x 9-fold LOSO retrain against the locally available Netflix Public Dataset (`.workingdir2/netflix/`) and emitting a machine-checkable verdict file.

- `ai/scripts/run_ensemble_v2_real_corpus_loso.sh`: validates the corpus, loops the seeds through the existing `train_fr_regressor_v2_ensemble_loso.py`, tees timestamped per-seed logs under `runs/ensemble_v2_real/logs/`.
- `ai/scripts/validate_ensemble_seeds.py`: calls the ADR-0303 gate, snapshots the corpus YUV file list as sha256, writes `PROMOTE.json` on gate-pass (recommends flipping the five `fr_regressor_v2_ensemble_v1_seed{0..4}` rows in `model/tiny/registry.json` from `smoke: true` to `smoke: false`) or `HOLD.json` on gate-fail.

The harness deliberately does not run the LOSO inside the PR (6–12 h GPU work) and does not flip the registry; the registry flip is a separate follow-up PR gated on a passing `PROMOTE.json`.

Six deep-dive deliverables (ADR-0108)

- `docs/research/0081-fr-regressor-v2-ensemble-real-corpus-methodology.md`: corpus sufficiency, LOSO fold sizing, seed-diversity hyperparameters, Seeking_25fps weak-fold diagnostic.
- `ai/AGENTS.md`: registry-flip is a separate PR; never flip during a rebase.
- `pytest ai/tests/test_validate_ensemble_seeds.py -v`
- Unreleased (lusoris fork) row added.
- `docs/rebase-notes.md` entry 0309.

Test plan

- `pytest ai/tests/test_validate_ensemble_seeds.py -v` (7/7 pass)
- `python ai/scripts/validate_ensemble_seeds.py --help`
- `bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh`
- `black --check` / `ruff check` / `isort --check` clean on new Python files
- `.workingdir2/netflix/` (6–12 h, deferred to follow-up flip PR per ADR-0309)

Status: DRAFT

Leaving as draft until the user confirms direction. The follow-up flip PR is blocked on a maintainer running the wrapper out-of-band and producing a passing `PROMOTE.json`.

🤖 Generated with Claude Code