Skip to content

feat(tools): vmaf-tune ladder — wire default sampler (close Phase B/E gap)#404

Merged
lusoris merged 1 commit intomasterfrom
feat/vmaf-tune-ladder-default-sampler
May 6, 2026
Merged

feat(tools): vmaf-tune ladder — wire default sampler (close Phase B/E gap)#404
lusoris merged 1 commit intomasterfrom
feat/vmaf-tune-ladder-default-sampler

Conversation

@lusoris
Copy link
Copy Markdown
Owner

@lusoris lusoris commented May 5, 2026

Summary

  • Wires tools/vmaf-tune/src/vmaftune/ladder.py::_default_sampler to compose corpus.iter_rows (Phase A encode + score) with recommend.pick_target_vmaf (Phase B-equivalent smallest-CRF-clearing-target predicate). build_ladder() / build_and_emit() no longer raise NotImplementedError on the documented happy path.
  • Default sweep is the canonical 5-point CRF set DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38) at the codec adapter's mid-range preset ("medium" for libx264 / libx265 / libsvtav1). Mirrors the ADR-0306 coarse-pass cardinality so wall-time sizing carries over.
  • The SamplerFn seam stays open — callers needing a finer grid or a non-CRF predicate pass an explicit sampler=. Tests stub iter_rows via monkeypatch.setattr; no live ffmpeg / vmaf binaries are required.

Why now

ADR-0295 shipped Phase E with a guard claiming PR #347 was Phase B's bisect — that was wrong. PR #347 actually shipped the fr_regressor_v2 codec-aware scaffold. The Phase B-equivalent predicate (recommend.pick_target_vmaf) shipped under ADR-0306; the missing piece was a small composition. This PR closes that gap.

Six deliverables (per ADR-0108)

Test plan

  • pytest tools/vmaf-tune/tests/test_ladder.py -v — 17 passed (3 new tests added: default-sampler wiring, build_ladder no-longer-raises, no-scorable-rows error path).
  • black src tests / ruff check src tests / isort src tests — all clean inside tools/vmaf-tune/.
  • Pre-commit hooks pass (clang-format / black / isort / ruff / commit-msg / semgrep / etc.).
  • Full tools/vmaf-tune test suite pre/post diff: net +2 passing tests, 43 pre-existing failures unchanged.

Constraints honoured

  • No Netflix golden assertions touched (CLAUDE rule 1).
  • Single commit on feat/vmaf-tune-ladder-default-sampler; no force-push.
  • Total LOC under 500 (excl. docs): ~120 added in ladder.py, ~92 in test_ladder.py.
  • ENCODER_VOCAB / ensemble surfaces untouched.

🤖 Generated with Claude Code

lusoris added a commit that referenced this pull request May 5, 2026
…(ADR-0313) (#410)

* ci(policy): Required Checks Aggregator — unblock doc/Python-only PRs (ADR-0313)

The 23-named-required-check posture (ADR-0037) deadlocks doc/Python-only
PRs: the C-build matrix path-filter-skips on their diffs, but branch
protection counts a path-filter-skip + a never-ran-at-all as not
satisfying the required-check. PR #400 hit this concretely (10/23
succeeded; 13/23 either skipped or never reported; gh pr merge returned
"the base branch policy prohibits the merge").

Aggregator is one workflow with no path filter. It polls up to 8 minutes
for sibling workflows to register, then verifies each named check on the
head SHA reported success/skipped/neutral (or didn't appear at all,
which is the documented path-filter rejection semantics). Aggregator
becomes the single branch-protection required check; the 23 individual
workflows continue to run unchanged.

Manual operator step at adoption (after this PR merges):

  gh api -X PUT "repos/lusoris/vmaf/branches/master/protection/required_status_checks" \
    -F 'strict=true' -F 'contexts=["Required Checks Aggregator"]'

Unblocks #400, #403, #404, #405, #406, #407 currently stuck on the
deadlock. Per user popup direction 2026-05-05.

Files: .github/workflows/required-aggregator.yml (new),
docs/adr/0313-*.md (new), changelog.d/added/*.md (new),
docs/adr/README.md (+1 row), docs/adr/_index_fragments/_order.txt
(+1 line + new fragment), docs/rebase-notes.md §0313.

* ci: retrigger after PR body cleanup

* ci: retrigger after deliverables opt-out polarity fix

---------

Co-authored-by: Lusoris <lusoris@pm.me>
@lusoris lusoris marked this pull request as ready for review May 6, 2026 01:06
Copilot AI review requested due to automatic review settings May 6, 2026 01:06
… gap)

`tools/vmaf-tune/src/vmaftune/ladder.py::_default_sampler` no longer
raises `NotImplementedError`. It composes Phase A's
`corpus.iter_rows` (encode + score) with the Phase B-equivalent
`recommend.pick_target_vmaf` predicate (smallest CRF whose VMAF clears
the target) over the canonical 5-point CRF sweep
`DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38)` at the codec
adapter's mid-range preset (`"medium"` for libx264 / libx265 /
libsvtav1).

The `SamplerFn` seam stays open. Callers needing a finer grid, a
Bayesian bisect, or a precomputed corpus stream pass an explicit
`sampler=`. Tests stub `iter_rows` via `monkeypatch.setattr`; no live
ffmpeg / vmaf binaries are required.

Closes the Phase B/E gap left by ADR-0295. The original raise
docstring claimed PR #347 was Phase B's bisect — it was not (PR #347
shipped the `fr_regressor_v2` codec-aware scaffold). The actual
Phase B-equivalent (`recommend.pick_target_vmaf` + `corpus.iter_rows`)
shipped via ADR-0306, so the missing piece is a small composition.

ADR-0307 + Research-0079 + AGENTS.md invariant + rebase-notes §0307
+ changelog fragment land in the same PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lusoris lusoris force-pushed the feat/vmaf-tune-ladder-default-sampler branch from 3b899f4 to add4c69 Compare May 6, 2026 01:07
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR wires vmaf-tune Phase E’s ladder default sampler so build_ladder() can run on the nominal path by composing Phase A corpus row generation (corpus.iter_rows) with the Phase B-equivalent predicate (recommend.pick_target_vmaf), and adds stubbed unit tests plus supporting docs/ADR/changelog entries.

Changes:

  • Implement _default_sampler in vmaftune/ladder.py and introduce DEFAULT_SAMPLER_CRF_SWEEP.
  • Add unit tests validating default-sampler wiring and error propagation without requiring ffmpeg/libvmaf binaries.
  • Add ADR/research/changelog/rebase-notes + update tools/vmaf-tune/AGENTS.md invariants to document the new default behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
tools/vmaf-tune/src/vmaftune/ladder.py Implements the default sampler and documents its intended composition with corpus + recommend.
tools/vmaf-tune/tests/test_ladder.py Adds tests for default sampler wiring and failure behavior via monkeypatched iter_rows.
tools/vmaf-tune/AGENTS.md Updates Phase E sampler invariant documentation to describe the new default.
docs/research/0079-vmaf-tune-ladder-default-sampler.md Research digest describing the wiring rationale and sweep choice.
docs/adr/0307-vmaf-tune-ladder-default-sampler.md ADR capturing the decision to wire the default sampler and use a fixed sweep.
docs/adr/README.md Adds ADR-0307 to the ADR index table.
docs/rebase-notes.md Adds rebase note entry for ADR-0307 changes.
changelog.d/changed/vmaf-tune-ladder-default-sampler.md Changelog fragment announcing the default sampler wiring.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +112 to +115
pick the (preset_default, CRF) row whose VMAF is closest to
``target_vmaf`` over the canonical 5-point CRF sweep
``(18, 23, 28, 33, 38)`` (ADR-0307, Research-0079). Tests inject a
stub via ``sampler=`` to avoid live encoder runs.
Comment on lines +128 to +133
# Canonical 5-point CRF sweep used by the default sampler (ADR-0307).
# Spans the perceptually-informative range for libx264; non-x264
# adapters validate the points against their own ``quality_range``
# inside ``corpus.iter_rows``. Callers needing a finer grid pass an
# explicit ``sampler=`` to ``build_ladder``.
DEFAULT_SAMPLER_CRF_SWEEP: tuple[int, ...] = (18, 23, 28, 33, 38)
Comment on lines +163 to +166
smallest-CRF-meeting-target predicate). The JSONL corpus is
written to a tempfile that's discarded after the call returns; the
encode-side temp dir lives under the same prefix and is cleaned up
on exit.
Comment on lines +171 to +174
:func:`vmaftune.corpus.iter_rows` and picks the row closest to
``target_vmaf`` via :func:`vmaftune.recommend.pick_target_vmaf`.
Live encoder runs are stubbed by monkeypatching ``iter_rows`` to
yield synthetic rows.
Comment on lines +76 to +78
adapter rejects a candidate via `validate(preset, crf)`, the row's
`exit_status != 0` and the recommend filter drops it — the sampler
gracefully degrades.
Comment on lines +39 to +52
`recommend.pick_target_vmaf` over a fixed 5-point CRF sweep
`DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38)` at the codec
adapter's mid-range preset (`"medium"` for libx264 / libx265 /
libsvtav1; otherwise the midpoint of the adapter's `presets` tuple).
The corpus is written to a `tempfile.TemporaryDirectory`-scoped JSONL
that is discarded after the sampler returns; encoded outputs land in
the same temp tree so cleanup is automatic.

The 5-point sweep covers the perceptually-informative range for x264
(CRF 18 is near-transparent on most content; CRF 38 is firmly
distorted) at the same number of probes as the canonical CRF coarse
pass in [ADR-0306](0306-vmaf-tune-coarse-to-fine.md). Five encodes is
the wall-time budget Phase E's downstream sizing already assumes per
`(resolution, target_vmaf)` cell.
Comment on lines +43 to +45
The corpus is written to a `tempfile.TemporaryDirectory`-scoped JSONL
that is discarded after the sampler returns; encoded outputs land in
the same temp tree so cleanup is automatic.
Comment on lines +9 to +11
adapter's mid-range preset (`"medium"` for libx264 / libx265 /
libsvtav1). The `SamplerFn` seam stays open: callers needing a
finer grid or a non-CRF-based search pass an explicit `sampler=`.
Comment thread docs/adr/README.md
Comment on lines 311 to 313
| [ADR-0253](0253-speed-qa-extractor.md) | Defer SpEED-QA full-reference reduction. Closes the user's 2026-04-21 deep-research queued track. The fork keeps `speed_chroma` / `speed_temporal` (PR #213, port of upstream `d3647c73`) as research-stage extractors gated behind `-Denable_float=true`; does not add a `speed_qa` reduction; does not register a SpEED-driven model. Rationale: SpEED-QA overlaps `vif` substantially (both GSM-prior divisive-normalisation entropy estimators), the "speed" headline inverts on the fork's AVX-512 / CUDA / SYCL VIF stack, and the assumed `model/speed_4_v0.6.0.json` upstream binary does not exist — there is no Netflix artefact to mirror. Reversible on three named triggers (Netflix lands a model JSON consuming SpEED features; explicit user request; FUNQUE+ / pVMAF / tiny-AI fusion model names SpEED-QA as load-bearing input). Companion research digest: [`docs/research/0051-speed-qa-feasibility.md`](../research/0051-speed-qa-feasibility.md). | Proposed | metrics, research, feature-extractor, roadmap |
| [ADR-0255](0253-fastdvdnet-pre-real-weights.md) | T6-7b — FastDVDnet temporal pre-filter real upstream weights drop. Replaces the [ADR-0215](0215-fastdvdnet-pre-filter.md) smoke-only placeholder ONNX with the verbatim trained checkpoint from upstream `m-tassano/fastdvdnet` (commit `c8fdf61`, MIT) wrapped by a `LumaAdapter` PyTorch module that preserves the C-side luma `[1, 5, H, W]` → `[1, 1, H, W]` contract: each luma plane is `Concat`-tiled into RGB (`Y → [Y, Y, Y]`) to match upstream's 15-channel input, a constant `sigma = 25/255` noise map (upstream's reference inference level) is broadcast via `ones_like(centre) * sigma`, and the upstream RGB output is collapsed back to luma using BT.601 weights (`Y = 0.299 R + 0.587 G + 0.114 B`). Every `nn.PixelShuffle` instance in upstream's UpBlock is swapped pre-export for an allowlist-safe `Reshape`/`Transpose`/`Reshape` decomposition (zero learned params → numerically identical, verified `< 1e-6` max-abs diff between upstream PyTorch and exported ONNX); `DepthToSpace` deliberately stays off the op allowlist. Shipped graph uses only allowlisted ops. Registry row flips `smoke: false` with `license: MIT`, upstream commit pin, and refreshed `sha256`; sidecar JSON + doc `docs/ai/models/fastdvdnet_pre.md` carry full provenance. New `ai/scripts/export_fastdvdnet_pre.py` (replaces the `_placeholder.py` exporter — kept for reference). 9.5 MiB ONNX (well under the 50 MiB DNN size cap). Luma-native retrain tracked as T6-7c follow-up; INT8 PTQ tracked as T6-7d follow-up. | Accepted | ai, dnn, feature-extractor, wave-1, weights-drop, fork-local |
| [ADR-0306](0306-vmaf-tune-coarse-to-fine.md) | `vmaf-tune corpus --coarse-to-fine` and a new `vmaf-tune recommend` subcommand replace the 52-encode full-grid sweep with a 2-pass coarse-then-fine search. Defaults: `coarse_step=10` over `[10..50]` (5 points) + `fine_radius=5 step=1` around best-coarse (up to 10 points) = 15 visited encodes per (source, preset) → 3.46× wall-time speedup vs full grid. 1-pass shortcut when the highest-CRF coarse point already meets `--target-vmaf` skips refinement entirely (~10× speedup). Builds on [ADR-0237](0237-quality-aware-encode-automation.md) (Phase A harness); no JSONL schema bump (visited rows use existing `SCHEMA_VERSION=1`). Widens the libx264 adapter `quality_range` from the old `(15, 40)` informative window to the codec's nominal `(0, 51)` so the search domain matches the user's CLI. | Accepted | tooling, automation, vmaf-tune, ffmpeg, fork-local |
Comment thread tools/vmaf-tune/AGENTS.md
Comment on lines +179 to +182
adapter's mid-range preset (`"medium"` for libx264 / libx265 /
libsvtav1). The 5-point sweep is the load-bearing default; do not
widen it without an ADR-0307 follow-up — Phase E callers downstream
size their wall-time budget against five encodes per
@lusoris lusoris merged commit bd690d8 into master May 6, 2026
55 checks passed
@lusoris lusoris deleted the feat/vmaf-tune-ladder-default-sampler branch May 6, 2026 01:31
lusoris added a commit that referenced this pull request May 6, 2026
The 2026-05-06 merge train shipped 13 ADRs whose implementing PRs
landed but Status was never bumped from Proposed to Accepted. Per
docs/adr/README.md and ADR-0028, ADRs flip to Accepted once the
deliverable lands. The train moved faster than the per-ADR Status
edits could keep up; this PR catches up.

Flipped:
- ADR-0302 (#401, ENCODER_VOCAB v3 schema expansion)
- ADR-0303 (#399, fr_regressor_v2 ensemble prod-flip gate)
- ADR-0304 (#402, vmaf-tune fast-path Optuna TPE)
- ADR-0305 (#400, knob-sweep Pareto analysis scaffold)
- ADR-0307 (#404, vmaf-tune ladder default sampler)
- ADR-0308 (#406, knob-sweep recipe-regression policy)
- ADR-0309 (#405, ensemble retrain harness)
- ADR-0311 (#408, libfuzzer harness expansion)
- ADR-0313 (#410, CI Required Checks Aggregator) [table-format Status, sed-edited inline]
- ADR-0314 (#412, vmaf-tune --score-backend=vulkan)
- ADR-0316 (#414, cli_parse long-only-option assertion fix)
- ADR-0317 (#415, CI Docker + FFmpeg-SYCL flake fix)
- ADR-0319 (#422, ensemble LOSO trainer real impl)

Already-Accepted (no change): ADR-0310 (#407), ADR-0312 (#425),
ADR-0315 (skeleton, intentionally Proposed), ADR-0321 (#424).
@lusoris lusoris mentioned this pull request May 6, 2026
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants