Conversation
10 tasks
lusoris
added a commit
that referenced
this pull request
May 5, 2026
…(ADR-0313) (#410) * ci(policy): Required Checks Aggregator — unblock doc/Python-only PRs (ADR-0313) The 23-named-required-check posture (ADR-0037) deadlocks doc/Python-only PRs: the C-build matrix path-filter-skips on their diffs, but branch protection counts a path-filter-skip + a never-ran-at-all as not satisfying the required-check. PR #400 hit this concretely (10/23 succeeded; 13/23 either skipped or never reported; gh pr merge returned "the base branch policy prohibits the merge"). Aggregator is one workflow with no path filter. It polls up to 8 minutes for sibling workflows to register, then verifies each named check on the head SHA reported success/skipped/neutral (or didn't appear at all, which is the documented path-filter rejection semantics). Aggregator becomes the single branch-protection required check; the 23 individual workflows continue to run unchanged. Manual operator step at adoption (after this PR merges): gh api -X PUT "repos/lusoris/vmaf/branches/master/protection/required_status_checks" \ -F 'strict=true' -F 'contexts=["Required Checks Aggregator"]' Unblocks #400, #403, #404, #405, #406, #407 currently stuck on the deadlock. Per user popup direction 2026-05-05. Files: .github/workflows/required-aggregator.yml (new), docs/adr/0313-*.md (new), changelog.d/added/*.md (new), docs/adr/README.md (+1 row), docs/adr/_index_fragments/_order.txt (+1 line + new fragment), docs/rebase-notes.md §0313. * ci: retrigger after PR body cleanup * ci: retrigger after deliverables opt-out polarity fix --------- Co-authored-by: Lusoris <lusoris@pm.me>
… gap) `tools/vmaf-tune/src/vmaftune/ladder.py::_default_sampler` no longer raises `NotImplementedError`. It composes Phase A's `corpus.iter_rows` (encode + score) with the Phase B-equivalent `recommend.pick_target_vmaf` predicate (smallest CRF whose VMAF clears the target) over the canonical 5-point CRF sweep `DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38)` at the codec adapter's mid-range preset (`"medium"` for libx264 / libx265 / libsvtav1). The `SamplerFn` seam stays open. Callers needing a finer grid, a Bayesian bisect, or a precomputed corpus stream pass an explicit `sampler=`. Tests stub `iter_rows` via `monkeypatch.setattr`; no live ffmpeg / vmaf binaries are required. Closes the Phase B/E gap left by ADR-0295. The original raise docstring claimed PR #347 was Phase B's bisect — it was not (PR #347 shipped the `fr_regressor_v2` codec-aware scaffold). The actual Phase B-equivalent (`recommend.pick_target_vmaf` + `corpus.iter_rows`) shipped via ADR-0306, so the missing piece is a small composition. ADR-0307 + Research-0079 + AGENTS.md invariant + rebase-notes §0307 + changelog fragment land in the same PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3b899f4 to
add4c69
Compare
There was a problem hiding this comment.
Pull request overview
This PR wires vmaf-tune Phase E’s ladder default sampler so build_ladder() can run on the nominal path by composing Phase A corpus row generation (corpus.iter_rows) with the Phase B-equivalent predicate (recommend.pick_target_vmaf), and adds stubbed unit tests plus supporting docs/ADR/changelog entries.
Changes:
- Implement
_default_samplerinvmaftune/ladder.pyand introduceDEFAULT_SAMPLER_CRF_SWEEP. - Add unit tests validating default-sampler wiring and error propagation without requiring ffmpeg/libvmaf binaries.
- Add ADR/research/changelog/rebase-notes + update
tools/vmaf-tune/AGENTS.mdinvariants to document the new default behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/vmaf-tune/src/vmaftune/ladder.py | Implements the default sampler and documents its intended composition with corpus + recommend. |
| tools/vmaf-tune/tests/test_ladder.py | Adds tests for default sampler wiring and failure behavior via monkeypatched iter_rows. |
| tools/vmaf-tune/AGENTS.md | Updates Phase E sampler invariant documentation to describe the new default. |
| docs/research/0079-vmaf-tune-ladder-default-sampler.md | Research digest describing the wiring rationale and sweep choice. |
| docs/adr/0307-vmaf-tune-ladder-default-sampler.md | ADR capturing the decision to wire the default sampler and use a fixed sweep. |
| docs/adr/README.md | Adds ADR-0307 to the ADR index table. |
| docs/rebase-notes.md | Adds rebase note entry for ADR-0307 changes. |
| changelog.d/changed/vmaf-tune-ladder-default-sampler.md | Changelog fragment announcing the default sampler wiring. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+112
to
+115
| pick the (preset_default, CRF) row whose VMAF is closest to | ||
| ``target_vmaf`` over the canonical 5-point CRF sweep | ||
| ``(18, 23, 28, 33, 38)`` (ADR-0307, Research-0079). Tests inject a | ||
| stub via ``sampler=`` to avoid live encoder runs. |
Comment on lines
+128
to
+133
| # Canonical 5-point CRF sweep used by the default sampler (ADR-0307). | ||
| # Spans the perceptually-informative range for libx264; non-x264 | ||
| # adapters validate the points against their own ``quality_range`` | ||
| # inside ``corpus.iter_rows``. Callers needing a finer grid pass an | ||
| # explicit ``sampler=`` to ``build_ladder``. | ||
| DEFAULT_SAMPLER_CRF_SWEEP: tuple[int, ...] = (18, 23, 28, 33, 38) |
Comment on lines
+163
to
+166
| smallest-CRF-meeting-target predicate). The JSONL corpus is | ||
| written to a tempfile that's discarded after the call returns; the | ||
| encode-side temp dir lives under the same prefix and is cleaned up | ||
| on exit. |
Comment on lines
+171
to
+174
| :func:`vmaftune.corpus.iter_rows` and picks the row closest to | ||
| ``target_vmaf`` via :func:`vmaftune.recommend.pick_target_vmaf`. | ||
| Live encoder runs are stubbed by monkeypatching ``iter_rows`` to | ||
| yield synthetic rows. |
Comment on lines
+76
to
+78
| adapter rejects a candidate via `validate(preset, crf)`, the row's | ||
| `exit_status != 0` and the recommend filter drops it — the sampler | ||
| gracefully degrades. |
Comment on lines
+39
to
+52
| `recommend.pick_target_vmaf` over a fixed 5-point CRF sweep | ||
| `DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38)` at the codec | ||
| adapter's mid-range preset (`"medium"` for libx264 / libx265 / | ||
| libsvtav1; otherwise the midpoint of the adapter's `presets` tuple). | ||
| The corpus is written to a `tempfile.TemporaryDirectory`-scoped JSONL | ||
| that is discarded after the sampler returns; encoded outputs land in | ||
| the same temp tree so cleanup is automatic. | ||
|
|
||
| The 5-point sweep covers the perceptually-informative range for x264 | ||
| (CRF 18 is near-transparent on most content; CRF 38 is firmly | ||
| distorted) at the same number of probes as the canonical CRF coarse | ||
| pass in [ADR-0306](0306-vmaf-tune-coarse-to-fine.md). Five encodes is | ||
| the wall-time budget Phase E's downstream sizing already assumes per | ||
| `(resolution, target_vmaf)` cell. |
Comment on lines
+43
to
+45
| The corpus is written to a `tempfile.TemporaryDirectory`-scoped JSONL | ||
| that is discarded after the sampler returns; encoded outputs land in | ||
| the same temp tree so cleanup is automatic. |
Comment on lines
+9
to
+11
| adapter's mid-range preset (`"medium"` for libx264 / libx265 / | ||
| libsvtav1). The `SamplerFn` seam stays open: callers needing a | ||
| finer grid or a non-CRF-based search pass an explicit `sampler=`. |
Comment on lines
311
to
313
| | [ADR-0253](0253-speed-qa-extractor.md) | Defer SpEED-QA full-reference reduction. Closes the user's 2026-04-21 deep-research queued track. The fork keeps `speed_chroma` / `speed_temporal` (PR #213, port of upstream `d3647c73`) as research-stage extractors gated behind `-Denable_float=true`; does not add a `speed_qa` reduction; does not register a SpEED-driven model. Rationale: SpEED-QA overlaps `vif` substantially (both GSM-prior divisive-normalisation entropy estimators), the "speed" headline inverts on the fork's AVX-512 / CUDA / SYCL VIF stack, and the assumed `model/speed_4_v0.6.0.json` upstream binary does not exist — there is no Netflix artefact to mirror. Reversible on three named triggers (Netflix lands a model JSON consuming SpEED features; explicit user request; FUNQUE+ / pVMAF / tiny-AI fusion model names SpEED-QA as load-bearing input). Companion research digest: [`docs/research/0051-speed-qa-feasibility.md`](../research/0051-speed-qa-feasibility.md). | Proposed | metrics, research, feature-extractor, roadmap | | ||
| | [ADR-0255](0253-fastdvdnet-pre-real-weights.md) | T6-7b — FastDVDnet temporal pre-filter real upstream weights drop. Replaces the [ADR-0215](0215-fastdvdnet-pre-filter.md) smoke-only placeholder ONNX with the verbatim trained checkpoint from upstream `m-tassano/fastdvdnet` (commit `c8fdf61`, MIT) wrapped by a `LumaAdapter` PyTorch module that preserves the C-side luma `[1, 5, H, W]` → `[1, 1, H, W]` contract: each luma plane is `Concat`-tiled into RGB (`Y → [Y, Y, Y]`) to match upstream's 15-channel input, a constant `sigma = 25/255` noise map (upstream's reference inference level) is broadcast via `ones_like(centre) * sigma`, and the upstream RGB output is collapsed back to luma using BT.601 weights (`Y = 0.299 R + 0.587 G + 0.114 B`). Every `nn.PixelShuffle` instance in upstream's UpBlock is swapped pre-export for an allowlist-safe `Reshape`/`Transpose`/`Reshape` decomposition (zero learned params → numerically identical, verified `< 1e-6` max-abs diff between upstream PyTorch and exported ONNX); `DepthToSpace` deliberately stays off the op allowlist. Shipped graph uses only allowlisted ops. Registry row flips `smoke: false` with `license: MIT`, upstream commit pin, and refreshed `sha256`; sidecar JSON + doc `docs/ai/models/fastdvdnet_pre.md` carry full provenance. New `ai/scripts/export_fastdvdnet_pre.py` (replaces the `_placeholder.py` exporter — kept for reference). 9.5 MiB ONNX (well under the 50 MiB DNN size cap). Luma-native retrain tracked as T6-7c follow-up; INT8 PTQ tracked as T6-7d follow-up. | Accepted | ai, dnn, feature-extractor, wave-1, weights-drop, fork-local | | ||
| | [ADR-0306](0306-vmaf-tune-coarse-to-fine.md) | `vmaf-tune corpus --coarse-to-fine` and a new `vmaf-tune recommend` subcommand replace the 52-encode full-grid sweep with a 2-pass coarse-then-fine search. Defaults: `coarse_step=10` over `[10..50]` (5 points) + `fine_radius=5 step=1` around best-coarse (up to 10 points) = 15 visited encodes per (source, preset) → 3.46× wall-time speedup vs full grid. 1-pass shortcut when the highest-CRF coarse point already meets `--target-vmaf` skips refinement entirely (~10× speedup). Builds on [ADR-0237](0237-quality-aware-encode-automation.md) (Phase A harness); no JSONL schema bump (visited rows use existing `SCHEMA_VERSION=1`). Widens the libx264 adapter `quality_range` from the old `(15, 40)` informative window to the codec's nominal `(0, 51)` so the search domain matches the user's CLI. | Accepted | tooling, automation, vmaf-tune, ffmpeg, fork-local | |
Comment on lines
+179
to
+182
| adapter's mid-range preset (`"medium"` for libx264 / libx265 / | ||
| libsvtav1). The 5-point sweep is the load-bearing default; do not | ||
| widen it without an ADR-0307 follow-up — Phase E callers downstream | ||
| size their wall-time budget against five encodes per |
9 tasks
lusoris
added a commit
that referenced
this pull request
May 6, 2026
The 2026-05-06 merge train shipped 13 ADRs whose implementing PRs landed but Status was never bumped from Proposed to Accepted. Per docs/adr/README.md and ADR-0028, ADRs flip to Accepted once the deliverable lands. The train moved faster than the per-ADR Status edits could keep up; this PR catches up. Flipped: - ADR-0302 (#401, ENCODER_VOCAB v3 schema expansion) - ADR-0303 (#399, fr_regressor_v2 ensemble prod-flip gate) - ADR-0304 (#402, vmaf-tune fast-path Optuna TPE) - ADR-0305 (#400, knob-sweep Pareto analysis scaffold) - ADR-0307 (#404, vmaf-tune ladder default sampler) - ADR-0308 (#406, knob-sweep recipe-regression policy) - ADR-0309 (#405, ensemble retrain harness) - ADR-0311 (#408, libfuzzer harness expansion) - ADR-0313 (#410, CI Required Checks Aggregator) [table-format Status, sed-edited inline] - ADR-0314 (#412, vmaf-tune --score-backend=vulkan) - ADR-0316 (#414, cli_parse long-only-option assertion fix) - ADR-0317 (#415, CI Docker + FFmpeg-SYCL flake fix) - ADR-0319 (#422, ensemble LOSO trainer real impl) Already-Accepted (no change): ADR-0310 (#407), ADR-0312 (#425), ADR-0315 (skeleton, intentionally Proposed), ADR-0321 (#424).
9 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
tools/vmaf-tune/src/vmaftune/ladder.py::_default_samplerto composecorpus.iter_rows(Phase A encode + score) withrecommend.pick_target_vmaf(Phase B-equivalent smallest-CRF-clearing-target predicate).build_ladder()/build_and_emit()no longer raiseNotImplementedErroron the documented happy path.DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38)at the codec adapter's mid-range preset ("medium"for libx264 / libx265 / libsvtav1). Mirrors the ADR-0306 coarse-pass cardinality so wall-time sizing carries over.SamplerFnseam stays open — callers needing a finer grid or a non-CRF predicate pass an explicitsampler=. Tests stubiter_rowsviamonkeypatch.setattr; no live ffmpeg / vmaf binaries are required.Why now
ADR-0295 shipped Phase E with a guard claiming PR #347 was Phase B's bisect — that was wrong. PR #347 actually shipped the
fr_regressor_v2 codec-aware scaffold. The Phase B-equivalent predicate (recommend.pick_target_vmaf) shipped under ADR-0306; the missing piece was a small composition. This PR closes that gap.Six deliverables (per ADR-0108)
docs/research/0079-vmaf-tune-ladder-default-sampler.md— gap analysis (Phase B-equivalent already shipped; the raise was stale), 5-point sweep rationale, alternatives.docs/adr/0307-vmaf-tune-ladder-default-sampler.md§Alternatives considered — three rows (5-point fixed sweep / 7-point fixed sweep / adaptive bisect).tools/vmaf-tune/AGENTS.md— "Phase E sampler is pluggable; default is a 5-point CRF sweep (ADR-0307)" pinning the sweep + override semantics.changelog.d/changed/vmaf-tune-ladder-default-sampler.md.docs/rebase-notes.md§0307 —tools/vmaf-tune/is fork-introduced (ADR-0237); zero upstream interaction.Test plan
pytest tools/vmaf-tune/tests/test_ladder.py -v— 17 passed (3 new tests added: default-sampler wiring, build_ladder no-longer-raises, no-scorable-rows error path).black src tests/ruff check src tests/isort src tests— all clean insidetools/vmaf-tune/.tools/vmaf-tunetest suite pre/post diff: net +2 passing tests, 43 pre-existing failures unchanged.Constraints honoured
feat/vmaf-tune-ladder-default-sampler; no force-push.ladder.py, ~92 intest_ladder.py.🤖 Generated with Claude Code