feat(tools): vmaf-tune ladder — wire default sampler (close Phase B/E gap) by lusoris · Pull Request #404 · lusoris/vmaf

lusoris · 2026-05-05T20:31:53Z

Summary

Wires tools/vmaf-tune/src/vmaftune/ladder.py::_default_sampler to compose corpus.iter_rows (Phase A encode + score) with recommend.pick_target_vmaf (Phase B-equivalent smallest-CRF-clearing-target predicate). build_ladder() / build_and_emit() no longer raise NotImplementedError on the documented happy path.
Default sweep is the canonical 5-point CRF set DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38) at the codec adapter's mid-range preset ("medium" for libx264 / libx265 / libsvtav1). Mirrors the ADR-0306 coarse-pass cardinality so wall-time sizing carries over.
The SamplerFn seam stays open — callers needing a finer grid or a non-CRF predicate pass an explicit sampler=. Tests stub iter_rows via monkeypatch.setattr; no live ffmpeg / vmaf binaries are required.

Why now

ADR-0295 shipped Phase E with a guard claiming PR #347 was Phase B's bisect — that was wrong. PR #347 actually shipped the fr_regressor_v2 codec-aware scaffold. The Phase B-equivalent predicate (recommend.pick_target_vmaf) shipped under ADR-0306; the missing piece was a small composition. This PR closes that gap.

Six deliverables (per ADR-0108)

(1) Research digest: docs/research/0079-vmaf-tune-ladder-default-sampler.md — gap analysis (Phase B-equivalent already shipped; the raise was stale), 5-point sweep rationale, alternatives.
(2) Decision matrix: docs/adr/0307-vmaf-tune-ladder-default-sampler.md §Alternatives considered — three rows (5-point fixed sweep / 7-point fixed sweep / adaptive bisect).
(3) AGENTS.md invariant note: tools/vmaf-tune/AGENTS.md — "Phase E sampler is pluggable; default is a 5-point CRF sweep (ADR-0307)" pinning the sweep + override semantics.
(4) Reproducer / smoke-test command:
```
pytest tools/vmaf-tune/tests/test_ladder.py -v
```
(17 tests pass, all stubbed — no live encoder runs.)
(5) CHANGELOG fragment: changelog.d/changed/vmaf-tune-ladder-default-sampler.md.
(6) Rebase note: docs/rebase-notes.md §0307 — tools/vmaf-tune/ is fork-introduced (ADR-0237); zero upstream interaction.

Test plan

pytest tools/vmaf-tune/tests/test_ladder.py -v — 17 passed (3 new tests added: default-sampler wiring, build_ladder no-longer-raises, no-scorable-rows error path).
black src tests / ruff check src tests / isort src tests — all clean inside tools/vmaf-tune/.
Pre-commit hooks pass (clang-format / black / isort / ruff / commit-msg / semgrep / etc.).
Full tools/vmaf-tune test suite pre/post diff: net +2 passing tests, 43 pre-existing failures unchanged.

Constraints honoured

No Netflix golden assertions touched (CLAUDE rule 1).
Single commit on feat/vmaf-tune-ladder-default-sampler; no force-push.
Total LOC under 500 (excl. docs): ~120 added in ladder.py, ~92 in test_ladder.py.
ENCODER_VOCAB / ensemble surfaces untouched.

🤖 Generated with Claude Code

…(ADR-0313) (#410) * ci(policy): Required Checks Aggregator — unblock doc/Python-only PRs (ADR-0313) The 23-named-required-check posture (ADR-0037) deadlocks doc/Python-only PRs: the C-build matrix path-filter-skips on their diffs, but branch protection counts a path-filter-skip + a never-ran-at-all as not satisfying the required-check. PR #400 hit this concretely (10/23 succeeded; 13/23 either skipped or never reported; gh pr merge returned "the base branch policy prohibits the merge"). Aggregator is one workflow with no path filter. It polls up to 8 minutes for sibling workflows to register, then verifies each named check on the head SHA reported success/skipped/neutral (or didn't appear at all, which is the documented path-filter rejection semantics). Aggregator becomes the single branch-protection required check; the 23 individual workflows continue to run unchanged. Manual operator step at adoption (after this PR merges): gh api -X PUT "repos/lusoris/vmaf/branches/master/protection/required_status_checks" \ -F 'strict=true' -F 'contexts=["Required Checks Aggregator"]' Unblocks #400, #403, #404, #405, #406, #407 currently stuck on the deadlock. Per user popup direction 2026-05-05. Files: .github/workflows/required-aggregator.yml (new), docs/adr/0313-*.md (new), changelog.d/added/*.md (new), docs/adr/README.md (+1 row), docs/adr/_index_fragments/_order.txt (+1 line + new fragment), docs/rebase-notes.md §0313. * ci: retrigger after PR body cleanup * ci: retrigger after deliverables opt-out polarity fix --------- Co-authored-by: Lusoris <lusoris@pm.me>

… gap) `tools/vmaf-tune/src/vmaftune/ladder.py::_default_sampler` no longer raises `NotImplementedError`. It composes Phase A's `corpus.iter_rows` (encode + score) with the Phase B-equivalent `recommend.pick_target_vmaf` predicate (smallest CRF whose VMAF clears the target) over the canonical 5-point CRF sweep `DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38)` at the codec adapter's mid-range preset (`"medium"` for libx264 / libx265 / libsvtav1). The `SamplerFn` seam stays open. Callers needing a finer grid, a Bayesian bisect, or a precomputed corpus stream pass an explicit `sampler=`. Tests stub `iter_rows` via `monkeypatch.setattr`; no live ffmpeg / vmaf binaries are required. Closes the Phase B/E gap left by ADR-0295. The original raise docstring claimed PR #347 was Phase B's bisect — it was not (PR #347 shipped the `fr_regressor_v2` codec-aware scaffold). The actual Phase B-equivalent (`recommend.pick_target_vmaf` + `corpus.iter_rows`) shipped via ADR-0306, so the missing piece is a small composition. ADR-0307 + Research-0079 + AGENTS.md invariant + rebase-notes §0307 + changelog fragment land in the same PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR wires vmaf-tune Phase E’s ladder default sampler so build_ladder() can run on the nominal path by composing Phase A corpus row generation (corpus.iter_rows) with the Phase B-equivalent predicate (recommend.pick_target_vmaf), and adds stubbed unit tests plus supporting docs/ADR/changelog entries.

Changes:

Implement _default_sampler in vmaftune/ladder.py and introduce DEFAULT_SAMPLER_CRF_SWEEP.
Add unit tests validating default-sampler wiring and error propagation without requiring ffmpeg/libvmaf binaries.
Add ADR/research/changelog/rebase-notes + update tools/vmaf-tune/AGENTS.md invariants to document the new default behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
tools/vmaf-tune/src/vmaftune/ladder.py	Implements the default sampler and documents its intended composition with corpus + recommend.
tools/vmaf-tune/tests/test_ladder.py	Adds tests for default sampler wiring and failure behavior via monkeypatched `iter_rows`.
tools/vmaf-tune/AGENTS.md	Updates Phase E sampler invariant documentation to describe the new default.
docs/research/0079-vmaf-tune-ladder-default-sampler.md	Research digest describing the wiring rationale and sweep choice.
docs/adr/0307-vmaf-tune-ladder-default-sampler.md	ADR capturing the decision to wire the default sampler and use a fixed sweep.
docs/adr/README.md	Adds ADR-0307 to the ADR index table.
docs/rebase-notes.md	Adds rebase note entry for ADR-0307 changes.
changelog.d/changed/vmaf-tune-ladder-default-sampler.md	Changelog fragment announcing the default sampler wiring.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    pick the (preset_default, CRF) row whose VMAF is closest to
+    ``target_vmaf`` over the canonical 5-point CRF sweep
+    ``(18, 23, 28, 33, 38)`` (ADR-0307, Research-0079). Tests inject a
+    stub via ``sampler=`` to avoid live encoder runs.


+# Canonical 5-point CRF sweep used by the default sampler (ADR-0307).
+# Spans the perceptually-informative range for libx264; non-x264
+# adapters validate the points against their own ``quality_range``
+# inside ``corpus.iter_rows``. Callers needing a finer grid pass an
+# explicit ``sampler=`` to ``build_ladder``.
+DEFAULT_SAMPLER_CRF_SWEEP: tuple[int, ...] = (18, 23, 28, 33, 38)


+    smallest-CRF-meeting-target predicate). The JSONL corpus is
+    written to a tempfile that's discarded after the call returns; the
+    encode-side temp dir lives under the same prefix and is cleaned up
+    on exit.


+    :func:`vmaftune.corpus.iter_rows` and picks the row closest to
+    ``target_vmaf`` via :func:`vmaftune.recommend.pick_target_vmaf`.
+    Live encoder runs are stubbed by monkeypatching ``iter_rows`` to
+    yield synthetic rows.


+adapter rejects a candidate via `validate(preset, crf)`, the row's
+`exit_status != 0` and the recommend filter drops it — the sampler
+gracefully degrades.


+`recommend.pick_target_vmaf` over a fixed 5-point CRF sweep
+`DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38)` at the codec
+adapter's mid-range preset (`"medium"` for libx264 / libx265 /
+libsvtav1; otherwise the midpoint of the adapter's `presets` tuple).
+The corpus is written to a `tempfile.TemporaryDirectory`-scoped JSONL
+that is discarded after the sampler returns; encoded outputs land in
+the same temp tree so cleanup is automatic.
+
+The 5-point sweep covers the perceptually-informative range for x264
+(CRF 18 is near-transparent on most content; CRF 38 is firmly
+distorted) at the same number of probes as the canonical CRF coarse
+pass in [ADR-0306](0306-vmaf-tune-coarse-to-fine.md). Five encodes is
+the wall-time budget Phase E's downstream sizing already assumes per
+`(resolution, target_vmaf)` cell.


+The corpus is written to a `tempfile.TemporaryDirectory`-scoped JSONL
+that is discarded after the sampler returns; encoded outputs land in
+the same temp tree so cleanup is automatic.


+  adapter's mid-range preset (`"medium"` for libx264 / libx265 /
+  libsvtav1). The `SamplerFn` seam stays open: callers needing a
+  finer grid or a non-CRF-based search pass an explicit `sampler=`.


 | [ADR-0253](0253-speed-qa-extractor.md) | Defer SpEED-QA full-reference reduction. Closes the user's 2026-04-21 deep-research queued track. The fork keeps `speed_chroma` / `speed_temporal` (PR #213, port of upstream `d3647c73`) as research-stage extractors gated behind `-Denable_float=true`; does not add a `speed_qa` reduction; does not register a SpEED-driven model. Rationale: SpEED-QA overlaps `vif` substantially (both GSM-prior divisive-normalisation entropy estimators), the "speed" headline inverts on the fork's AVX-512 / CUDA / SYCL VIF stack, and the assumed `model/speed_4_v0.6.0.json` upstream binary does not exist — there is no Netflix artefact to mirror. Reversible on three named triggers (Netflix lands a model JSON consuming SpEED features; explicit user request; FUNQUE+ / pVMAF / tiny-AI fusion model names SpEED-QA as load-bearing input). Companion research digest: [`docs/research/0051-speed-qa-feasibility.md`](../research/0051-speed-qa-feasibility.md). | Proposed | metrics, research, feature-extractor, roadmap |
 | [ADR-0255](0253-fastdvdnet-pre-real-weights.md) | T6-7b — FastDVDnet temporal pre-filter real upstream weights drop. Replaces the [ADR-0215](0215-fastdvdnet-pre-filter.md) smoke-only placeholder ONNX with the verbatim trained checkpoint from upstream `m-tassano/fastdvdnet` (commit `c8fdf61`, MIT) wrapped by a `LumaAdapter` PyTorch module that preserves the C-side luma `[1, 5, H, W]` → `[1, 1, H, W]` contract: each luma plane is `Concat`-tiled into RGB (`Y → [Y, Y, Y]`) to match upstream's 15-channel input, a constant `sigma = 25/255` noise map (upstream's reference inference level) is broadcast via `ones_like(centre) * sigma`, and the upstream RGB output is collapsed back to luma using BT.601 weights (`Y = 0.299 R + 0.587 G + 0.114 B`). Every `nn.PixelShuffle` instance in upstream's UpBlock is swapped pre-export for an allowlist-safe `Reshape`/`Transpose`/`Reshape` decomposition (zero learned params → numerically identical, verified `< 1e-6` max-abs diff between upstream PyTorch and exported ONNX); `DepthToSpace` deliberately stays off the op allowlist. Shipped graph uses only allowlisted ops. Registry row flips `smoke: false` with `license: MIT`, upstream commit pin, and refreshed `sha256`; sidecar JSON + doc `docs/ai/models/fastdvdnet_pre.md` carry full provenance. New `ai/scripts/export_fastdvdnet_pre.py` (replaces the `_placeholder.py` exporter — kept for reference). 9.5 MiB ONNX (well under the 50 MiB DNN size cap). Luma-native retrain tracked as T6-7c follow-up; INT8 PTQ tracked as T6-7d follow-up. | Accepted | ai, dnn, feature-extractor, wave-1, weights-drop, fork-local |
 | [ADR-0306](0306-vmaf-tune-coarse-to-fine.md) | `vmaf-tune corpus --coarse-to-fine` and a new `vmaf-tune recommend` subcommand replace the 52-encode full-grid sweep with a 2-pass coarse-then-fine search. Defaults: `coarse_step=10` over `[10..50]` (5 points) + `fine_radius=5 step=1` around best-coarse (up to 10 points) = 15 visited encodes per (source, preset) → 3.46× wall-time speedup vs full grid. 1-pass shortcut when the highest-CRF coarse point already meets `--target-vmaf` skips refinement entirely (~10× speedup). Builds on [ADR-0237](0237-quality-aware-encode-automation.md) (Phase A harness); no JSONL schema bump (visited rows use existing `SCHEMA_VERSION=1`). Widens the libx264 adapter `quality_range` from the old `(15, 40)` informative window to the codec's nominal `(0, 51)` so the search domain matches the user's CLI. | Accepted | tooling, automation, vmaf-tune, ffmpeg, fork-local |


+  adapter's mid-range preset (`"medium"` for libx264 / libx265 /
+  libsvtav1). The 5-point sweep is the load-bearing default; do not
+  widen it without an ADR-0307 follow-up — Phase E callers downstream
+  size their wall-time budget against five encodes per


The 2026-05-06 merge train shipped 13 ADRs whose implementing PRs landed but Status was never bumped from Proposed to Accepted. Per docs/adr/README.md and ADR-0028, ADRs flip to Accepted once the deliverable lands. The train moved faster than the per-ADR Status edits could keep up; this PR catches up. Flipped: - ADR-0302 (#401, ENCODER_VOCAB v3 schema expansion) - ADR-0303 (#399, fr_regressor_v2 ensemble prod-flip gate) - ADR-0304 (#402, vmaf-tune fast-path Optuna TPE) - ADR-0305 (#400, knob-sweep Pareto analysis scaffold) - ADR-0307 (#404, vmaf-tune ladder default sampler) - ADR-0308 (#406, knob-sweep recipe-regression policy) - ADR-0309 (#405, ensemble retrain harness) - ADR-0311 (#408, libfuzzer harness expansion) - ADR-0313 (#410, CI Required Checks Aggregator) [table-format Status, sed-edited inline] - ADR-0314 (#412, vmaf-tune --score-backend=vulkan) - ADR-0316 (#414, cli_parse long-only-option assertion fix) - ADR-0317 (#415, CI Docker + FFmpeg-SYCL flake fix) - ADR-0319 (#422, ensemble LOSO trainer real impl) Already-Accepted (no change): ADR-0310 (#407), ADR-0312 (#425), ADR-0315 (skeleton, intentionally Proposed), ADR-0321 (#424).

lusoris mentioned this pull request May 5, 2026

ci(policy): Required Checks Aggregator — unblock doc/Python-only PRs (ADR-0313) #410

Merged

10 tasks

lusoris marked this pull request as ready for review May 6, 2026 01:06

Copilot AI review requested due to automatic review settings May 6, 2026 01:06

lusoris force-pushed the feat/vmaf-tune-ladder-default-sampler branch from 3b899f4 to add4c69 Compare May 6, 2026 01:07

Copilot started reviewing on behalf of lusoris May 6, 2026 01:07 View session

Copilot AI reviewed May 6, 2026

View reviewed changes

lusoris merged commit bd690d8 into master May 6, 2026
55 checks passed

lusoris deleted the feat/vmaf-tune-ladder-default-sampler branch May 6, 2026 01:31

lusoris mentioned this pull request May 6, 2026

docs(adr): bulk flip Proposed → Accepted for 13 merge-train ADRs #426

Merged

9 tasks

lusoris mentioned this pull request May 6, 2026

feat/vmaf tune score backend vulkan #436

Merged

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(tools): vmaf-tune ladder — wire default sampler (close Phase B/E gap)#404

feat(tools): vmaf-tune ladder — wire default sampler (close Phase B/E gap)#404
lusoris merged 1 commit intomasterfrom
feat/vmaf-tune-ladder-default-sampler

lusoris commented May 5, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

lusoris commented May 5, 2026

Summary

Why now

Six deliverables (per ADR-0108)

Test plan

Constraints honoured

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants