
feat(ai): fr_regressor_v2 ensemble — real-corpus retrain harness + flip workflow (ADR-0309)#405

Merged
lusoris merged 2 commits into master from feat/fr-regressor-v2-ensemble-real-corpus-retrain
May 6, 2026

lusoris (Owner) commented May 5, 2026

Summary

Follow-up to ADR-0303 / PR #399. This PR ships the operational
harness for actually running the 5-seed × 9-fold LOSO retrain against
the locally available Netflix Public Dataset (.workingdir2/netflix/)
and emitting a machine-checkable verdict file.
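To make the 5-seed × 9-fold arithmetic concrete, here is a minimal Python sketch of a leave-one-source-out (LOSO) schedule. It assumes that each clip is tagged with the reference source it was encoded from and that each of the nine folds holds out every clip of exactly one source. The `(clip_id, source)` record shape and the `loso_folds` / `loso_schedule` helpers are hypothetical and do not describe the internals of `train_fr_regressor_v2_ensemble_loso.py`.

```python
from collections import defaultdict
from itertools import product


def loso_folds(clips):
    """Yield (held_out_source, train_ids, test_ids), one fold per source.

    `clips` is an iterable of (clip_id, source) pairs (hypothetical schema).
    """
    by_source = defaultdict(list)
    for clip_id, source in clips:
        by_source[source].append(clip_id)
    for held_out in sorted(by_source):
        test_ids = list(by_source[held_out])
        # Every clip from every *other* source goes into training.
        train_ids = [
            cid
            for src, ids in sorted(by_source.items())
            if src != held_out
            for cid in ids
        ]
        yield held_out, train_ids, test_ids


def loso_schedule(clips, seeds=range(5)):
    """Cross every seed with every fold (5 seeds x 9 sources = 45 fits)."""
    folds = list(loso_folds(clips))
    return [
        (seed, held_out, train, test)
        for seed, (held_out, train, test) in product(seeds, folds)
    ]


# Toy corpus: 9 hypothetical sources with 3 encodes each.
clips = [(f"src{i}_enc{k}", f"src{i}") for i in range(9) for k in range(3)]
schedule = loso_schedule(clips)
print(len(schedule))  # 45
```

Grouping by source rather than by clip keeps every encode of a held-out source out of training, so a fold's score measures generalisation to unseen content rather than to unseen encodes of familiar content.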

  • Wrapper: ai/scripts/run_ensemble_v2_real_corpus_loso.sh
    validates the corpus, loops the seeds through the existing
    train_fr_regressor_v2_ensemble_loso.py, tees timestamped per-seed
    logs under runs/ensemble_v2_real/logs/.
  • Validator: ai/scripts/validate_ensemble_seeds.py — calls the
    ADR-0303 gate, snapshots the corpus YUV file list as sha256, writes
    PROMOTE.json on gate-pass (recommends flipping the five
    fr_regressor_v2_ensemble_v1_seed{0..4} rows in
    model/tiny/registry.json from smoke: true to smoke: false)
    or HOLD.json on gate-fail.
  • The harness deliberately does not run the LOSO inside the PR
    (6–12 h GPU work) and does not flip the registry — the registry
    flip is a separate follow-up PR gated on a passing PROMOTE.json.
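The corpus snapshot can be sketched as follows: one `relpath<TAB>size` line per YUV file, sorted, then hashed with sha256 (the CHANGELOG entry describes the validator's input as sorted `relpath\tsize`). The recursive `*.yuv` match, POSIX-style relative paths, the trailing newline per line, and the `corpus_fingerprint` name are assumptions of this sketch, not details taken from `validate_ensemble_seeds.py`.

```python
import hashlib
from pathlib import Path


def corpus_fingerprint(corpus_root):
    """Return (sha256 hex digest, file count) for the YUV files under corpus_root.

    Assumptions: recursive *.yuv match, POSIX relative paths, and one
    newline-terminated `relpath<TAB>size` line per file, hashed in sorted order.
    """
    root = Path(corpus_root)
    lines = sorted(
        f"{path.relative_to(root).as_posix()}\t{path.stat().st_size}"
        for path in root.rglob("*.yuv")
        if path.is_file()
    )
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8") + b"\n")
    return digest.hexdigest(), len(lines)
```

Hashing names and sizes instead of file contents keeps the snapshot cheap on multi-gigabyte YUV files; the trade-off is that an in-place edit that preserves a file's size goes undetected.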

Six deep-dive deliverables (ADR-0108)

  • (1) Research digest: docs/research/0081-fr-regressor-v2-ensemble-real-corpus-methodology.md — corpus sufficiency, LOSO fold sizing, seed-diversity hyperparameters, Seeking_25fps weak-fold diagnostic.
  • (2) Decision matrix: ADR-0309 §Alternatives considered (4 options).
  • (3) AGENTS.md invariant note: ai/AGENTS.md — registry-flip is a
    separate PR; never flip during a rebase.
  • (4) Reproducer / smoke-test command: pytest ai/tests/test_validate_ensemble_seeds.py -v
  • (5) CHANGELOG fragment: Unreleased — lusoris fork row added.
  • (6) Rebase note: docs/rebase-notes.md entry 0309.

Test plan

  • pytest ai/tests/test_validate_ensemble_seeds.py -v (7/7 pass)
  • python ai/scripts/validate_ensemble_seeds.py --help
  • bash -n ai/scripts/run_ensemble_v2_real_corpus_loso.sh
  • black --check / ruff check / isort --check clean on
    new Python files
  • Out-of-band: real LOSO run on .workingdir2/netflix/
    (6–12 h, deferred to follow-up flip PR per ADR-0309)

Status: DRAFT

Leaving as draft until the user confirms direction. The follow-up
flip PR is blocked on a maintainer running the wrapper out-of-band
and producing a passing PROMOTE.json.
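The gating rule for that follow-up PR can be made mechanical. The sketch below assumes a hypothetical registry layout of `{"models": [{"id": ..., "smoke": ...}]}` (the real `model/tiny/registry.json` schema may differ) and a hypothetical `flip_registry` helper. It refuses to touch the registry unless `PROMOTE.json` is present, then sets `smoke: false` on the five `fr_regressor_v2_ensemble_v1_seed{0..4}` rows only.

```python
import json
from pathlib import Path

# The five ensemble rows named in the PR description.
SEED_IDS = [f"fr_regressor_v2_ensemble_v1_seed{i}" for i in range(5)]


def flip_registry(registry_path, verdict_dir):
    """Set smoke: false on the five ensemble rows, only if PROMOTE.json exists.

    Assumes a hypothetical registry layout {"models": [{"id": ..., "smoke": ...}]}.
    """
    if not (Path(verdict_dir) / "PROMOTE.json").is_file():
        raise RuntimeError("no PROMOTE.json: registry flip is not allowed")
    registry = json.loads(Path(registry_path).read_text())
    rows = {row["id"]: row for row in registry["models"]}
    missing = [model_id for model_id in SEED_IDS if model_id not in rows]
    if missing:
        raise KeyError(f"registry rows not found: {missing}")
    for model_id in SEED_IDS:
        rows[model_id]["smoke"] = False  # rows alias the dicts inside `registry`
    Path(registry_path).write_text(json.dumps(registry, indent=2) + "\n")
    return SEED_IDS
```

Failing loudly on a missing row, rather than silently skipping it, keeps a partial flip from ever reaching review.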

🤖 Generated with Claude Code

lusoris added a commit that referenced this pull request May 5, 2026
…(ADR-0313) (#410)

* ci(policy): Required Checks Aggregator — unblock doc/Python-only PRs (ADR-0313)

Requiring 23 individually named checks (ADR-0037) deadlocks doc/Python-only
PRs: the C-build matrix is path-filter-skipped on their diffs, but branch
protection treats both a path-filter skip and a check that never ran at
all as not satisfying the required check. PR #400 hit this concretely (10/23
succeeded; 13/23 either skipped or never reported; gh pr merge returned
"the base branch policy prohibits the merge").

Aggregator is one workflow with no path filter. It polls up to 8 minutes
for sibling workflows to register, then verifies each named check on the
head SHA reported success/skipped/neutral (or didn't appear at all,
which is the documented path-filter rejection semantics). Aggregator
becomes the single branch-protection required check; the 23 individual
workflows continue to run unchanged.

Manual operator step at adoption (after this PR merges):

  gh api -X PUT "repos/lusoris/vmaf/branches/master/protection/required_status_checks" \
    -F 'strict=true' -F 'contexts=["Required Checks Aggregator"]'

Unblocks #400, #403, #404, #405, #406, #407 currently stuck on the
deadlock. Per user popup direction 2026-05-05.

Files: .github/workflows/required-aggregator.yml (new),
docs/adr/0313-*.md (new), changelog.d/added/*.md (new),
docs/adr/README.md (+1 row), docs/adr/_index_fragments/_order.txt
(+1 line + new fragment), docs/rebase-notes.md §0313.

* ci: retrigger after PR body cleanup

* ci: retrigger after deliverables opt-out polarity fix

---------

Co-authored-by: Lusoris <lusoris@pm.me>
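The aggregator's decision rule, as described in this commit message, can be sketched in Python. The real aggregator is the .github/workflows/required-aggregator.yml workflow, whose code is not shown here; the `fetch` callable (standing in for a query of check runs on the head SHA), the `"pending"` marker, and the 30-second poll interval are assumptions.

```python
import time

OK_CONCLUSIONS = {"success", "skipped", "neutral"}


def evaluate(required, conclusions):
    """Return [(name, conclusion)] for required checks that block the merge.

    `conclusions` maps check name -> conclusion for checks seen on the head SHA.
    A required check that never appears counts as satisfied (path-filtered out).
    """
    failing = []
    for name in required:
        conclusion = conclusions.get(name)
        if conclusion is None:
            continue  # never reported: treated as skipped by a path filter
        if conclusion not in OK_CONCLUSIONS:
            failing.append((name, conclusion))
    return failing


def wait_then_evaluate(required, fetch, timeout_s=8 * 60, interval_s=30,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll `fetch()` until no required check is pending or the window closes."""
    deadline = clock() + timeout_s
    while True:
        conclusions = fetch()
        pending = [n for n in required if conclusions.get(n) == "pending"]
        if not pending or clock() >= deadline:
            # A check still "pending" here is not in OK_CONCLUSIONS, so it blocks.
            return evaluate(required, conclusions)
        sleep(interval_s)
```

A check that is still pending when the eight-minute window closes is reported as blocking, so the aggregator never passes on an incomplete picture.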
lusoris marked this pull request as ready for review May 6, 2026 01:31
Copilot AI review requested due to automatic review settings May 6, 2026 01:31
…ip workflow (ADR-0309)

Follow-up to ADR-0303 / PR #399. Ships the operational harness
for actually running the 5-seed × 9-fold LOSO retrain against the
locally available Netflix Public Dataset and emitting a
machine-checkable verdict file.

- ai/scripts/run_ensemble_v2_real_corpus_loso.sh: Bash wrapper that
  validates .workingdir2/netflix/, loops the seeds through the
  existing train_fr_regressor_v2_ensemble_loso.py, tees timestamped
  per-seed logs.
- ai/scripts/validate_ensemble_seeds.py: applies the ADR-0303 gate
  (mean PLCC >= 0.95 AND max-min <= 0.005), snapshots the corpus
  YUV file list as sha256 over sorted relpath+size, writes
  PROMOTE.json on gate-pass or HOLD.json on gate-fail.
- ai/tests/test_validate_ensemble_seeds.py: 7 tests covering both
  verdict paths plus exit-code coverage.
- docs/ai/ensemble-v2-real-corpus-retrain-runbook.md: prerequisites,
  two-command run, verdict interpretation, rollback procedure.
- docs/adr/0309-*.md (Proposed): decision matrix with 4 alternatives.
- docs/research/0081-*.md: corpus-size sufficiency, LOSO sizing,
  seed-diversity hyperparameters, Seeking_25fps weak-fold diagnostic.
- ai/AGENTS.md: appended ADR-0309 invariant (registry-flip is a
  separate PR; never flip during a rebase).

The harness deliberately does NOT run the LOSO inside the PR
(6-12 h GPU work) and does NOT flip the registry — the registry
flip is a separate follow-up PR gated on a passing PROMOTE.json.

Reproducer: pytest ai/tests/test_validate_ensemble_seeds.py -v

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
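The ADR-0303 gate applied by the validator (mean PLCC >= 0.95 AND max-min <= 0.005 across seeds) reduces to a few lines. This is a minimal sketch assuming the per-seed PLCC values have already been collected into a `{seed: plcc}` mapping; the `apply_gate` / `write_verdict` names and the JSON field names are hypothetical and need not match validate_ensemble_seeds.py.

```python
import json
import statistics
from pathlib import Path

MEAN_PLCC_MIN = 0.95  # ADR-0303: mean PLCC across seeds must reach this
SPREAD_MAX = 0.005    # ADR-0303: max(PLCC) - min(PLCC) must not exceed this


def apply_gate(per_seed_plcc):
    """Return (passed, mean, spread) for a {seed: plcc} mapping."""
    values = list(per_seed_plcc.values())
    mean = statistics.fmean(values)
    spread = max(values) - min(values)
    return mean >= MEAN_PLCC_MIN and spread <= SPREAD_MAX, mean, spread


def write_verdict(per_seed_plcc, out_dir, corpus_sha256):
    """Write PROMOTE.json on gate-pass or HOLD.json on gate-fail; return its path."""
    passed, mean, spread = apply_gate(per_seed_plcc)
    verdict = {
        "verdict": "PROMOTE" if passed else "HOLD",
        "mean_plcc": mean,
        "plcc_spread": spread,
        "per_seed_plcc": {str(k): v for k, v in sorted(per_seed_plcc.items())},
        "corpus_sha256": corpus_sha256,
        "recommendation": (
            "flip fr_regressor_v2_ensemble_v1_seed{0..4} to smoke: false"
            if passed
            else "keep smoke: true; investigate seed diversity / hyperparameters"
        ),
    }
    path = Path(out_dir) / ("PROMOTE.json" if passed else "HOLD.json")
    path.write_text(json.dumps(verdict, indent=2) + "\n")
    return path
```

Both conditions are needed: a high mean alone can hide one weak seed, and a tight spread alone can hide five uniformly mediocre seeds.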
lusoris force-pushed the feat/fr-regressor-v2-ensemble-real-corpus-retrain branch from 2875cb6 to 31e4aa2 May 6, 2026 01:32

Copilot AI left a comment

Pull request overview

This PR introduces the operational “real-corpus LOSO retrain harness” for the fr_regressor_v2 deep-ensemble workflow described in ADR-0309: a bash wrapper to run per-seed LOSO training, a Python validator to apply the ADR-0303 gate and emit PROMOTE.json/HOLD.json, plus accompanying tests and documentation (runbook, ADR, research digest).

Changes:

  • Add ai/scripts/run_ensemble_v2_real_corpus_loso.sh and ai/scripts/validate_ensemble_seeds.py (with pytest coverage) to drive and validate an out-of-band retrain.
  • Add ADR-0309 + a runbook documenting the operator workflow and rollback guidance.
  • Add a new research digest and index entries for the workstream.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| docs/research/README.md | Adds Research-0081 to the research index. |
| docs/research/0081-fr-regressor-v2-ensemble-real-corpus-methodology.md | New research digest for the real-corpus retrain methodology. |
| docs/rebase-notes.md | Adds a rebase-note entry for ADR-0309. |
| docs/ai/ensemble-v2-real-corpus-retrain-runbook.md | New operator runbook for running wrapper + validator and interpreting verdicts. |
| docs/adr/README.md | Adds ADR-0309 to the ADR index table (but this file is generated). |
| docs/adr/0309-fr-regressor-v2-ensemble-real-corpus-retrain.md | New ADR documenting the harness/flip workflow decision. |
| CHANGELOG.md | Adds an Unreleased entry (but this file is generated). |
| ai/tests/test_validate_ensemble_seeds.py | New tests for validator verdict emission + exit codes. |
| ai/scripts/validate_ensemble_seeds.py | New validator script that applies the ADR-0303 gate and writes PROMOTE/HOLD verdict files. |
| ai/scripts/run_ensemble_v2_real_corpus_loso.sh | New wrapper intended to run per-seed LOSO training and collect logs/artefacts. |
| ai/AGENTS.md | Adds an invariant note that registry flips must happen in a separate PR. |


Comment thread CHANGELOG.md
Comment on lines +11 to +41
- **`fr_regressor_v2` ensemble — real-corpus retrain harness +
flip workflow (ADR-0309).** Follow-up to
[ADR-0303](docs/adr/0303-fr-regressor-v2-ensemble-prod-flip.md) /
PR #399 that ships the operational harness for actually running
the 5-seed × 9-fold LOSO retrain against the locally available
Netflix Public Dataset (`.workingdir2/netflix/`) and emitting a
machine-checkable verdict file. Adds
[`ai/scripts/run_ensemble_v2_real_corpus_loso.sh`](ai/scripts/run_ensemble_v2_real_corpus_loso.sh)
(Bash wrapper that validates the corpus, loops the seeds through
`train_fr_regressor_v2_ensemble_loso.py`, and tees timestamped
per-seed logs under `runs/ensemble_v2_real/logs/`),
[`ai/scripts/validate_ensemble_seeds.py`](ai/scripts/validate_ensemble_seeds.py)
(Python validator that calls the ADR-0303 gate, snapshots the
corpus YUV file list as sha256 over sorted `relpath\tsize`, and
writes `PROMOTE.json` on gate-pass with a recommendation to flip
the five `fr_regressor_v2_ensemble_v1_seed{0..4}` rows in
`model/tiny/registry.json` from `smoke: true` to `smoke: false`,
or `HOLD.json` on gate-fail with the failing-seed details and a
recommendation to keep `smoke: true` and investigate diversity /
hyperparameters), unit tests for both verdict paths, and a
runbook
[`docs/ai/ensemble-v2-real-corpus-retrain-runbook.md`](docs/ai/ensemble-v2-real-corpus-retrain-runbook.md)
covering prerequisites, the two-command run, verdict
interpretation, and rollback. The harness deliberately does
**not** run the LOSO inside the PR (6–12 h GPU work) and does
**not** flip the registry — the registry flip is a separate
follow-up PR gated on a passing `PROMOTE.json` (preserves a clean
revert surface and honours the new `ai/AGENTS.md` invariant that
registry-flip never happens during a rebase). Companion research
digest:
[Research-0081](docs/research/0081-fr-regressor-v2-ensemble-real-corpus-methodology.md).
Comment on lines +72 to +77
python "$repo_root/ai/scripts/train_fr_regressor_v2_ensemble_loso.py" \
--seeds "$seed" \
--corpus-root "$corpus_root" \
--output "$out_dir/loso_seed${seed}.json" \
--out-dir "$out_dir" \
2>&1 | tee "$log_file"
Comment on lines +55 to +57
`train_fr_regressor_v2_ensemble_loso.py --seed N
--corpus-root $CORPUS_ROOT
--output runs/ensemble_v2_real/loso_seed{N}.json` per seed.
2. **Verify the registry** — `python ai/scripts/validate_model_registry.py`
should pass; the five rows must read `"smoke": true` again.
3. **Verify the C-side ORT loader** — re-run
`python ai/tests/test_registry.py` to confirm the smoke graphs
Comment thread docs/adr/README.md
@@ -312,5 +312,6 @@ ADRs may exist there for local session continuity, but the tracked
| [ADR-0255](0253-fastdvdnet-pre-real-weights.md) | T6-7b — FastDVDnet temporal pre-filter real upstream weights drop. Replaces the [ADR-0215](0215-fastdvdnet-pre-filter.md) smoke-only placeholder ONNX with the verbatim trained checkpoint from upstream `m-tassano/fastdvdnet` (commit `c8fdf61`, MIT) wrapped by a `LumaAdapter` PyTorch module that preserves the C-side luma `[1, 5, H, W]` → `[1, 1, H, W]` contract: each luma plane is `Concat`-tiled into RGB (`Y → [Y, Y, Y]`) to match upstream's 15-channel input, a constant `sigma = 25/255` noise map (upstream's reference inference level) is broadcast via `ones_like(centre) * sigma`, and the upstream RGB output is collapsed back to luma using BT.601 weights (`Y = 0.299 R + 0.587 G + 0.114 B`). Every `nn.PixelShuffle` instance in upstream's UpBlock is swapped pre-export for an allowlist-safe `Reshape`/`Transpose`/`Reshape` decomposition (zero learned params → numerically identical, verified `< 1e-6` max-abs diff between upstream PyTorch and exported ONNX); `DepthToSpace` deliberately stays off the op allowlist. Shipped graph uses only allowlisted ops. Registry row flips `smoke: false` with `license: MIT`, upstream commit pin, and refreshed `sha256`; sidecar JSON + doc `docs/ai/models/fastdvdnet_pre.md` carry full provenance. New `ai/scripts/export_fastdvdnet_pre.py` (replaces the `_placeholder.py` exporter — kept for reference). 9.5 MiB ONNX (well under the 50 MiB DNN size cap). Luma-native retrain tracked as T6-7c follow-up; INT8 PTQ tracked as T6-7d follow-up. | Accepted | ai, dnn, feature-extractor, wave-1, weights-drop, fork-local |
Comment on lines +71 to +79
echo "[ensemble-v2-real] seed=$seed -> $log_file"
python "$repo_root/ai/scripts/train_fr_regressor_v2_ensemble_loso.py" \
--seeds "$seed" \
--corpus-root "$corpus_root" \
--output "$out_dir/loso_seed${seed}.json" \
--out-dir "$out_dir" \
2>&1 | tee "$log_file"
done
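A Python rendering of the same per-seed loop follows. It is a sketch, not the wrapper itself: the `build_cmd` hook and the log-file naming are hypothetical. It takes the trainer argv as a parameter because the wrapper excerpt passes `--seeds "$seed"` while the runbook excerpt spells the flag `--seed N`. It also records each child's exit status; in bash, the status of `python ... 2>&1 | tee "$log_file"` is tee's unless `set -o pipefail` is in effect (not visible in this excerpt).

```python
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path


def run_seeds(seeds, build_cmd, log_dir):
    """Run one command per seed, tee-ing merged output to a timestamped log.

    `build_cmd(seed)` returns the argv list (hypothetical hook). Returns
    {seed: returncode} of the child process, not of the tee step.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for seed in seeds:
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        log_file = log_dir / f"seed{seed}-{stamp}.log"
        print(f"[ensemble-v2-real] seed={seed} -> {log_file}")
        with subprocess.Popen(
            build_cmd(seed), stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        ) as proc, open(log_file, "w", encoding="utf-8") as log:
            for line in proc.stdout:  # stderr merged into stdout, like 2>&1
                sys.stdout.write(line)
                log.write(line)
        # Leaving the Popen context waits for the child, so returncode is set.
        results[seed] = proc.returncode
    return results
```

Returning the child's status per seed lets a caller stop before validation when any seed's training run failed.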

Comment on lines +3 to +11
- **Status**: Active
- **Date**: 2026-05-05
- **ADR**: [ADR-0309](../adr/0309-fr-regressor-v2-ensemble-real-corpus-retrain.md)
- **Related**: [Research-0075](0075-fr-regressor-v2-ensemble-prod-flip.md)
(parent — gate theory + conformal calibration sketch),
[Research-0067](0067-fr-regressor-v2-prod-loso.md)
(deterministic LOSO baseline),
[Research-0058](0058-fr-regressor-v2-feasibility.md)
(codec-aware feasibility).
lusoris merged commit e45299e into master May 6, 2026
55 checks passed
lusoris deleted the feat/fr-regressor-v2-ensemble-real-corpus-retrain branch May 6, 2026 01:55
lusoris added a commit that referenced this pull request May 6, 2026
The 2026-05-06 merge train shipped 13 ADRs whose implementing PRs
landed but Status was never bumped from Proposed to Accepted. Per
docs/adr/README.md and ADR-0028, ADRs flip to Accepted once the
deliverable lands. The train moved faster than the per-ADR Status
edits could keep up; this PR catches up.

Flipped:
- ADR-0302 (#401, ENCODER_VOCAB v3 schema expansion)
- ADR-0303 (#399, fr_regressor_v2 ensemble prod-flip gate)
- ADR-0304 (#402, vmaf-tune fast-path Optuna TPE)
- ADR-0305 (#400, knob-sweep Pareto analysis scaffold)
- ADR-0307 (#404, vmaf-tune ladder default sampler)
- ADR-0308 (#406, knob-sweep recipe-regression policy)
- ADR-0309 (#405, ensemble retrain harness)
- ADR-0311 (#408, libfuzzer harness expansion)
- ADR-0313 (#410, CI Required Checks Aggregator) [table-format Status, sed-edited inline]
- ADR-0314 (#412, vmaf-tune --score-backend=vulkan)
- ADR-0316 (#414, cli_parse long-only-option assertion fix)
- ADR-0317 (#415, CI Docker + FFmpeg-SYCL flake fix)
- ADR-0319 (#422, ensemble LOSO trainer real impl)

Already-Accepted (no change): ADR-0310 (#407), ADR-0312 (#425),
ADR-0315 (skeleton, intentionally Proposed), ADR-0321 (#424).
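The Proposed to Accepted bump itself is a one-line edit per ADR. A minimal sketch, assuming the ADRs use the same bullet-style `- **Status**:` header shown in the Research-0081 excerpt (the `accept_adr` name is hypothetical); table-format Status rows such as ADR-0313's are deliberately left for a separate edit.

```python
import re

# Matches a bullet-style "- **Status**: Proposed" header line (assumed format).
STATUS_LINE = re.compile(r"^(- \*\*Status\*\*:\s*)Proposed\b", re.MULTILINE)


def accept_adr(text):
    """Return (new_text, changed) with the first Proposed status set to Accepted."""
    new_text, count = STATUS_LINE.subn(r"\1Accepted", text, count=1)
    return new_text, count == 1
```

Reporting whether a substitution happened makes already-Accepted or table-format ADRs easy to spot for manual handling.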
lusoris mentioned this pull request May 6, 2026
9 tasks