research(tools): vmaf-tune capability audit — what else can it do? by lusoris · Pull Request #354 · lusoris/vmaf

lusoris · 2026-05-03T19:23:45Z

Summary

Pure-research scoping pass: audits the vmaf-tune capability surface
beyond Phase A (corpus tooling, PR #329 merged), Phase B
(fr_regressor_v2 codec-aware proxy, in flight via PR #347), and the
fast-path proposal (proxy + Bayesian + GPU verify, parallel PR being
scaffolded right now).

Surveys 18 capability buckets, ranks them by impact ÷ effort, and
recommends an execution order for Phases C–H. No code; the digest
extends Research-0044 and feeds ADR-0237's umbrella roadmap.

Top-3 next steps (impact ÷ effort): (1) bitrate-budget mode +
quality-floor mode (tied; both S effort, both reuse Phase B's
bisect with a different predicate), (2) bitrate-ladder optimisation
(Phase E, L effort, game-changer), (3) per-shot CRF tuning
(Phase D, on roadmap, M effort).

Game-changer: bitrate-ladder optimisation re-frames the fork from
"best open-source VMAF measurement" to "only open-source per-title
ladder generator with measured-PLCC proxy".

Biggest blocker: codec adapter coverage. Five buckets degrade to
x264-only until x265 / SVT-AV1 / libaom / libvvenc adapters land.
Recommend running adapter PRs in parallel with Phase D so multi-codec
capabilities don't all stack at the end.

Type

docs — documentation only

Checklist

Commits follow Conventional Commits.
No code touched (pure research digest).

Bug-status hygiene

no state delta: research scoping, no bug interaction.

Netflix golden-data gate

Did not modify any assertAlmostEqual score.

Deep-dive deliverables (ADR-0108)

(1) Research digest: docs/research/0054-vmaf-tune-capability-audit.md (this PR).
(2) Decision matrix
no alternatives: research-only scoping, no decision is being made here. Decisions land in the per-phase ADRs that will consume this digest.
(3) AGENTS.md invariant note
no rebase-sensitive invariants: docs-only.
(4) Reproducer / smoke-test command: see below under "Reproducer".
(5) CHANGELOG fragment
no CHANGELOG needed: research-only digest, not a user-visible change.
(6) Rebase note
no rebase impact: docs-only addition under docs/research/.

Reproducer

# Verify the digest renders and links resolve
mkdocs build --strict 2>&1 | grep -i "0054-vmaf-tune-capability-audit" || echo "renders clean"
# Or just:
ls docs/research/0054-vmaf-tune-capability-audit.md

Out of scope

No implementation.
No promised speedup numbers without back-of-envelope justification (every "X×" hypothesis is flagged as such in the digest).
No Netflix-internal data / services.

Adds the first sibling codec to the ADR-0237 Phase A `libx264` scaffold: a one-file `X265Adapter` mirroring the `x264.py` shape (10 presets including `placebo`, 0..51 CRF window, `profile_for(pix_fmt)` helper that maps `yuv420p10le` → `main10` for downstream HDR work).

Registered under `libx265` in `codec_adapters/__init__.py`; CLI `--encoder` now accepts `libx264 | libx265` via `choices=list(known_codecs())`. `encode.parse_versions` gains an encoder-aware regex so corpus rows record `libx265-<version>` correctly (default remains `libx264` for backward compatibility).

No `SCHEMA_VERSION` bump: the existing `encoder` row column already carries codec identity. Phase B/C consumers receive the new codec without any contract change.

14 new subprocess-mocked smoke tests under `tools/vmaf-tune/tests/test_codec_adapter_x265.py` (29 of 30 vmaf-tune tests pass green; the one skipped case is the real-binary integration test gated on `VMAF_TUNE_INTEGRATION=1`).

Unblocks ADR-0235 codec-aware FR regressor and PR #354 audit's buckets #6 (bitrate-ladder), #7 (codec-comparison), #9 (HDR), #15 (Pareto).

Six deep-dive deliverables (ADR-0108):

1. research digest: no digest needed (trivial mirror of `x264.py`); alternatives matrix is exhausted in ADR-0276.
2. decision matrix: ADR-0276 §Alternatives considered.
3. AGENTS.md invariant note: tools/vmaf-tune/AGENTS.md updated to document the per-codec banner-regex carve-out in `parse_versions` and the wired-codecs list (libx264, libx265).
4. reproducer: `python -m pytest tools/vmaf-tune/tests/`.
5. CHANGELOG: changelog.d/added/ADR-0276-vmaf-tune-x265-adapter.md.
6. rebase-notes: docs/rebase-notes.md entry 0228.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
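The adapter shape described in this commit can be sketched roughly as follows. This is an illustrative sketch only, not the fork's actual `X265Adapter`: the field names, the `validate` helper, and the fallback to `main` for pixel formats other than `yuv420p10le` are assumptions.

```python
from dataclasses import dataclass

# The ten x265 speed presets, fastest to slowest.
X265_PRESETS = (
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow", "placebo",
)


@dataclass(frozen=True)
class X265Adapter:
    """Hypothetical one-file adapter mirroring the x264 adapter shape."""

    name: str = "libx265"
    crf_min: int = 0
    crf_max: int = 51
    presets: tuple = X265_PRESETS

    def validate(self, preset: str, crf: int) -> None:
        # Reject grid cells the encoder would refuse anyway.
        if preset not in self.presets:
            raise ValueError(f"unknown x265 preset: {preset!r}")
        if not self.crf_min <= crf <= self.crf_max:
            raise ValueError(f"CRF {crf} outside {self.crf_min}..{self.crf_max}")

    def profile_for(self, pix_fmt: str) -> str:
        # The commit only states yuv420p10le -> main10; "main" for every
        # other pixel format is an assumption of this sketch.
        return "main10" if pix_fmt == "yuv420p10le" else "main"
```

Keeping the adapter a frozen dataclass with no I/O is one way to make it testable without an x265 binary, which matches the subprocess-mocked test approach the commit describes.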

Adds tools/vmaf-tune/src/vmaftune/resolution.py exposing select_vmaf_model_version (height>=2160 -> vmaf_4k_v0.6.1, else vmaf_v0.6.1), select_vmaf_model (Path), and crf_offset_for_resolution (4K -2, 1080p 0, 720p +2, sub-720p +4). corpus.iter_rows auto-picks the model per encode resolution; CLI gains --resolution-aware / --no-resolution-aware (default on). Emitted JSONL row's vmaf_model field now records the *effective* model used per row, not the global option — required for mixed-ladder corpora to be unambiguous downstream. Closes PR #354 audit Bucket #8. Decision rule mirrors Netflix's published guidance. Refs: ADR-0237 (parent), ADR-0280 (this), Research-0054. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
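The decision rule above is small enough to sketch directly. The function names come from the commit message, but the signatures (a bare `height` in pixels) and the exact bucket edges for 1080p and 720p are assumptions; the commit only states the 2160 threshold explicitly.

```python
def select_vmaf_model_version(height: int) -> str:
    """Pick the 4K model for 2160-line-and-taller encodes, else the default."""
    return "vmaf_4k_v0.6.1" if height >= 2160 else "vmaf_v0.6.1"


def crf_offset_for_resolution(height: int) -> int:
    """CRF offset per resolution bucket: 4K -2, 1080p 0, 720p +2, below 720p +4.

    Treating each bucket as "at least this many lines" is an assumption;
    intermediate heights such as 1440 fall into the 1080p bucket here.
    """
    if height >= 2160:
        return -2
    if height >= 1080:
        return 0
    if height >= 720:
        return 2
    return 4
```

For a mixed ladder, calling `select_vmaf_model_version` once per rendition height is what lets each emitted row record the model actually used rather than a single global option.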

Scaffolds the Phase E ladder generator (ADR-0277), the highest-leverage gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the Netflix per-title encoding paper: sample (resolution × target-VMAF), take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along the hull, emit an HLS / DASH / JSON manifest.

Currently scaffold-only: the production sampler that drives Phase B's target-VMAF bisect (PR #347) lands once that PR merges. Default sampler raises NotImplementedError; tests inject a synthetic stub modelled on the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py: build_ladder, convex_hull (Pareto filter + diminishing-returns envelope), select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung 1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull correctness on a synthetic Netflix-paper-shaped cloud, knee selection invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once Phase B integration PR lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper, Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
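The hull-then-rungs step described above can be illustrated with a short sketch. It is not the code in `ladder.py`: the function names `pareto_front`, `upper_convex_hull` and `select_rungs`, the `(bitrate_kbps, vmaf)` tuple layout, and the nearest-point rung picker are all assumptions made for illustration.

```python
import math


def pareto_front(points):
    """Drop points beaten by a cheaper-or-equal point with at least equal VMAF."""
    front, best = [], -math.inf
    for bitrate, vmaf in sorted(points, key=lambda p: (p[0], -p[1])):
        if vmaf > best:
            front.append((bitrate, vmaf))
            best = vmaf
    return front


def _cross(o, a, b):
    # Positive when o -> a -> b turns counter-clockwise.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_convex_hull(points):
    """Upper hull of the Pareto front: the diminishing-returns envelope."""
    hull = []
    for p in pareto_front(points):
        # Pop while the last point lies on or below the chord to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def select_rungs(hull, n):
    """Pick up to n hull points evenly spaced in log-bitrate (may return fewer)."""
    if n >= len(hull):
        return list(hull)
    if n == 1:
        return [hull[-1]]
    lo, hi = math.log(hull[0][0]), math.log(hull[-1][0])
    picked = []
    for i in range(n):
        target = lo + (hi - lo) * i / (n - 1)
        best = min(hull, key=lambda p: abs(math.log(p[0]) - target))
        if best not in picked:
            picked.append(best)
    return picked
```

Filtering to the Pareto front first keeps the hull pass a single left-to-right sweep, since every surviving point already has strictly increasing bitrate and VMAF.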

…al scaffold)

Adds a probabilistic head on top of the codec-aware fr_regressor_v2 (parent: ADR-0272 / PR #347 in flight) so producers can drive the in-flight `vmaf-tune --quality-confidence 0.95` flag (ADR-0237) off a calibrated prediction interval instead of v2's bare MOS scalar. PR #354 audit Bucket #18 (top-3 ranked).

Trainer (`ai/scripts/train_fr_regressor_v2_ensemble.py`) trains N=5 copies of the v2 architecture (`FRRegressor(num_codecs=NUM_CODECS)`) under distinct seeds, exports each as a separate two-input ONNX (`features [N, 6]` + `codec_onehot [N, NUM_CODECS]`), and writes an ensemble manifest sidecar that pins per-member sha256s, feature standardisation, codec vocab, nominal coverage, and an optional split-conformal residual quantile from a held-out calibration split. Inference rule is `mu ± q · σ` with `q = 1.96` (Gaussian) or the empirical conformal quantile (Vovk 2005, Romano 2019: distribution-free marginal coverage on exchangeable data).

Evaluator (`ai/scripts/eval_probabilistic_proxy.py`) reports empirical coverage at 50/80/95 % nominal levels, mean interval width, and the mean-prediction PLCC; reports the conformal-interval row when the manifest carries a conformal scalar.

Smoke-only ship: synthetic 100-row corpus, 1 epoch / member. Production training is gated on the multi-codec Phase A corpus (T7-FR-REGRESSOR-V2-PROBABILISTIC).

Six ADR-0108 deliverables:

1. Research digest: docs/research/0054-fr-regressor-v2-probabilistic.md.
2. Decision matrix: ADR-0279 § Alternatives considered.
3. AGENTS.md invariant note: appended to ai/AGENTS.md.
4. Reproducer: `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` followed by `python ai/scripts/eval_probabilistic_proxy.py --smoke`.
5. CHANGELOG ### Added entry under Unreleased (lusoris fork).
6. Rebase-notes entry: ### 0229 in docs/rebase-notes.md.

Test plan:

- `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` produces 5 valid two-input ONNX members + manifest sidecar (ran locally).
- `python ai/scripts/eval_probabilistic_proxy.py --smoke` aggregates the 5 ONNX outputs into (mu, sigma) and reports coverage at 50/80/95 %.
- `python ai/scripts/validate_model_registry.py` → 15 entries valid.
- `pre-commit run --files <changed>` → Passed (black / isort / ruff / json-check / secrets / semgrep).
- `markdownlint-cli2` on all new docs → 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
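The `mu ± q · σ` inference rule can be sketched as below. This is an assumption-laden illustration, not the evaluator script: the helper names are hypothetical, σ is taken as the population standard deviation across members, and the conformal scores are assumed to be σ-normalised absolute residuals `|y - mu| / σ` from the calibration split so that the resulting quantile can stand in for `q`.

```python
import math
import statistics


def ensemble_interval(member_preds, q=1.96):
    """Aggregate per-member predictions into (low, mu, high) = mu ± q·σ."""
    mu = statistics.fmean(member_preds)
    sigma = statistics.pstdev(member_preds)  # ddof=0 is an assumption here
    return mu - q * sigma, mu, mu + q * sigma


def conformal_quantile(scores, coverage=0.95):
    """Split-conformal quantile: the ceil((n+1)·coverage)-th smallest score.

    Returns +inf when the calibration split is too small to certify the
    requested coverage.
    """
    n = len(scores)
    k = math.ceil((n + 1) * coverage)
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]
```

With five members predicting 80, 82, 84, 86 and 88, `ensemble_interval` returns a mean of 84 and a Gaussian interval roughly 5.5 points either side; swapping `q` for `conformal_quantile(scores)` trades the Gaussian assumption for the finite-sample guarantee the commit cites.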

Wires the fork-trained `saliency_student_v1` ONNX model (ADR-0286 / PR #359) into vmaf-tune so a single command can produce an encode that biases bits toward salient regions.

New surfaces:

- `tools/vmaf-tune/src/vmaftune/saliency.py`: pure-NumPy signal-blend pipeline (sample frames -> ImageNet-normalised RGB -> ONNX inference -> per-pixel saliency mean -> per-MB QP-offset map clamped to ±12 -> x264 ASCII qpfile).
- `vmaf-tune recommend --saliency-aware [--saliency-offset -4] [--saliency-model PATH]` CLI subcommand. Falls back to a plain encode when onnxruntime or the model file is unavailable; the flag surface is wired so Phase B (target-VMAF bisect) can drop in without renaming flags.
- 13 unit tests under `tests/test_saliency.py` mocking both the ONNX session and the encode runner; runs without onnxruntime or ffmpeg installed.

Bucket #2 of the PR #354 audit.

Six ADR-0108 deliverables:

1. Research-0046 (digest)
2. ADR-0287 §"Alternatives considered" (decision matrix)
3. tools/vmaf-tune/AGENTS.md (saliency invariant)
4. `pytest tools/vmaf-tune/tests/test_saliency.py -v` (smoke)
5. changelog.d/added/T-VMAF-TUNE-saliency-aware.md
6. docs/rebase-notes.md §0287

User docs: docs/usage/vmaf-tune.md §"Saliency-aware encoding". Decision: ADR-0287.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
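The "per-pixel saliency mean -> per-MB QP-offset map clamped to ±12" step might look like the sketch below. The commit describes a NumPy pipeline; this version uses plain lists so it runs without dependencies. The function name, the 16-pixel macroblock default, and the linear mapping from block saliency to offset are assumptions; only the ±12 clamp and the `-4` default offset come from the commit message.

```python
def qp_offsets(saliency, mb=16, peak_offset=-4.0, clamp=12):
    """Map a 2-D saliency map in [0, 1] to integer per-macroblock QP offsets.

    Assumed mapping: a fully salient block gets peak_offset (negative means
    more bits), a fully non-salient block gets -peak_offset, linear between.
    """
    rows, cols = len(saliency), len(saliency[0])
    out = []
    for y in range(0, rows, mb):
        row = []
        for x in range(0, cols, mb):
            # Mean saliency over the block, truncated at the frame edge.
            block = [
                saliency[j][i]
                for j in range(y, min(y + mb, rows))
                for i in range(x, min(x + mb, cols))
            ]
            mean = sum(block) / len(block)
            offset = round(peak_offset * (2.0 * mean - 1.0))
            row.append(max(-clamp, min(clamp, offset)))
        out.append(row)
    return out
```

Clamping after rounding keeps the map inside the stated ±12 window even when a caller passes an aggressive `peak_offset`.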

…#9, ADR-0261)

Closes Bucket #9 of the PR #354 vmaf-tune capability audit. Adds ffprobe-driven HDR detection, codec-specific HDR encode flag dispatch, and HDR-VMAF model resolution to the Phase A corpus driver.

New module ``tools/vmaf-tune/src/vmaftune/hdr.py``:

- ``detect_hdr(path)``: runs ``ffprobe -show_streams -of json``, classifies the first video stream as PQ / HLG / SDR. Strict BT.2020-primaries gate so malformed signaling falls back to SDR.
- ``hdr_codec_args(encoder, info)``: per-encoder dispatch table covering libx264 (container ``-color_*``), libx265 (``-x265-params`` with master-display + max-cll + hdr10-opt), libsvtav1 (AV1 enums via ``-svtav1-params``), hevc_nvenc (``-pix_fmt p010le -profile:v main10``), libvvenc.
- ``select_hdr_vmaf_model()``: globs ``model/vmaf_hdr_*.json``; returns ``None`` when none shipped (current state: the fork hasn't ported Netflix's HDR model yet).

Corpus driver wiring:

- ``CorpusOptions.hdr_mode`` ∈ {``auto``, ``force-sdr``, ``force-hdr-pq``, ``force-hdr-hlg``}; CLI flags ``--auto-hdr`` / ``--force-sdr`` / ``--force-hdr-pq`` / ``--force-hdr-hlg`` (mutually exclusive). Auto is the default.
- New schema-v2 row keys ``hdr_transfer`` / ``hdr_primaries`` / ``hdr_forced``; ``SCHEMA_VERSION`` bumped 1 → 2. Phase B / C loaders treat missing keys as SDR (additive change, v1 rows remain readable).
- ``score._model_arg`` now passes pre-formatted ``path=`` / ``version=`` strings through unchanged so the HDR model path can be injected via ``vmaf --model``.
- HDR detected but no HDR model shipped → log warning, fall back to SDR model with notice that scores trend low.

Tests (``tools/vmaf-tune/tests/test_hdr.py``, 21 cases):

- detection: SDR / PQ / HLG / mismatched-primaries / missing-file / ffprobe-failure / invalid-JSON
- codec dispatch: shape per encoder (x264, x265 PQ + HLG, SVT-AV1 PQ + HLG, NVENC HEVC, unknown encoder)
- model resolution: empty dir / shipped / multi-version pick-latest / missing dir
- corpus integration: end-to-end ``force-hdr-pq`` (verify HDR fields in row + ``-color_*`` in encode argv) and ``force-sdr``

ADR-0261 (Accepted, encode-side; HDR-VMAF scoring deferred until fork-local model port). Research-0054 digest, rebase-notes 0261, AGENTS.md invariant note, docs/usage/vmaf-tune.md HDR section, changelog fragment all included.
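The classification half of ``detect_hdr`` can be sketched as a pure function over ffprobe's JSON output, which keeps it testable without running ffprobe. The function name `classify_hdr` and the choice to return ``"sdr"`` on unparseable input are assumptions; the `color_primaries` and `color_transfer` values checked are the ones ffprobe reports for BT.2020, PQ (SMPTE ST 2084) and HLG (ARIB STD-B67).

```python
import json


def classify_hdr(ffprobe_json: str) -> str:
    """Classify the first video stream as "pq", "hlg" or "sdr".

    Input is the text produced by `ffprobe -show_streams -of json`.
    """
    try:
        streams = json.loads(ffprobe_json).get("streams", [])
    except (json.JSONDecodeError, AttributeError):
        return "sdr"  # assumption: unreadable probe output is treated as SDR
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    if video is None:
        return "sdr"
    # Strict gate: an HDR transfer without BT.2020 primaries counts as
    # malformed signalling and falls back to SDR.
    if video.get("color_primaries") != "bt2020":
        return "sdr"
    transfer = video.get("color_transfer")
    if transfer == "smpte2084":
        return "pq"
    if transfer == "arib-std-b67":
        return "hlg"
    return "sdr"
```

Separating the subprocess call from the classification is one way to cover the mismatched-primaries and invalid-JSON cases listed in the test matrix with plain string fixtures.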

Survey 18 capability buckets beyond Phases A/B + the fast-path proposal. Rank by impact / effort. Recommend execution order for Phases C–H of the vmaf-tune umbrella (ADR-0237).

Top-5 by impact-÷-effort: bitrate-budget mode, quality-floor mode, bitrate-ladder optimisation (Phase E, the game-changer), per-shot CRF (Phase D, on roadmap), probabilistic proxy.

Biggest blocker called out: codec adapter coverage. Five buckets degrade to x264-only until x265/SVT-AV1/libaom/libvvenc adapters land. Recommends opening the adapter stream in parallel with Phase D so multi-codec capabilities don't all stack at the end.

Pure scoping pass: no code, no implementation. Numbers are back-of-envelope hypotheses; ADRs that consume this digest must re-validate against real corpora.

Six deliverables (ADR-0108):

- (1) digest = this file
- (2) no decision matrix needed: research-only, no decision being made
- (3) no rebase-sensitive invariants
- (4) reproducer in PR description
- (5) no CHANGELOG fragment needed: research-only
- (6) no rebase impact: docs-only

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Adds a new research digest that audits the potential capability surface of vmaf-tune beyond Phases A/B and the proposed “fast path”, ranking 18 feature buckets by impact vs effort and proposing a Phase C–H execution order.

Changes:

Introduces a capability audit digest covering 18 potential vmaf-tune feature buckets.
Provides an impact/effort ranking and a recommended implementation sequence, highlighting codec-adapter coverage as the main dependency.


+# Research-0054: `vmaf-tune` capability audit — beyond Phases A/B + the fast path
+
+- **Status**: Active
+- **Workstream**: ADR-0237 (`vmaf-tune` umbrella), ADR-0235 (codec-aware FR regressor)
+- **Last updated**: 2026-05-03
+- **Author**: research scoping pass (no code)


+### Bucket 1 — Per-shot CRF tuning (Phase C as written)
+
+- **Summary**: Use `transnet_v2` to cut the source into shots, run
+  Phase B's bisect/proxy per shot, emit `--qpfile` (x264) /
+  `--zones` (x265) / SVT-AV1 segment table.
+- **Existing primitives**: `transnet_v2.onnx`, codec adapter
+  `emit_per_shot_overrides()` hook (already declared in ADR-0237),
+  Phase B bisect.
+- **Effort**: **M** — shot-aware orchestration + per-codec override
+  emission; non-trivial because shot-boundary frames near GOP edges
+  need encoder-specific handling.
+- **Impact**: **High** — Netflix's per-shot encoding is the canonical
+  reference; same-VMAF bitrate savings of 10–30% are the public
+  numbers from their 2018 paper.
+- **Open**: do we re-train the proxy on per-shot canonical-6, or is
+  the per-title proxy "good enough" if features are computed per
+  shot? (Hypothesis: per-title proxy generalises if features are
+  recomputed; needs a held-out check on BVI-DVC.)
+- **Already in roadmap**: yes — Phase D in ADR-0237, gated on T6-3b
+  per-shot CRF predictor.


+  user-facing flag (`--target-vmaf`, exists; `--minimise bitrate`,
+  doesn't).


+2. **Phase B docs follow-up** (≤ 1 week) — ship Buckets #4 + #5
+   explicitly as `--target-bitrate` and `--target-vmaf` modes;
+   trivial flag work, big perceived feature add. *Highest impact
+   ÷ effort in the audit.*


Co-Authored-By: Claude <noreply@anthropic.com>

… flags

Implements Buckets 4 + 5 from Research-0061 (vmaf-tune capability audit). Adds a new `recommend` subcommand on `tools/vmaf-tune/` that consumes the Phase A corpus (either pre-built JSONL via `--from-corpus` or generated on the fly from `--source` + grid flags) and applies a user-supplied predicate over the existing `(crf, preset, vmaf_score, bitrate_kbps)` rows.

- `--target-vmaf T` returns the row with the smallest CRF whose `vmaf_score >= T`. Falls back to the closest miss (highest VMAF) when no row clears the bar, with the predicate annotated `(UNMET)`.
- `--target-bitrate KBPS` returns the row whose `bitrate_kbps` is closest to `KBPS` (absolute distance, ties broken by smaller CRF).

The two flags are mutually exclusive at the argparse layer (exit code 2 when both are passed). Default output is a single human-readable line on stdout; `--json` switches to the full corpus row as a JSON object.

No schema bump: `recommend` is a pure consumer of `CORPUS_ROW_KEYS`. 13-test suite under `tools/vmaf-tune/tests/test_recommend.py` covers predicate semantics, encoder/preset filtering, NaN/failed-encode rejection, and CLI exit codes; mocks all binaries so it runs in <100 ms.

Six ADR-0108 deliverables:

1. Research digest: Research-0061 (parent, in flight via PR #354).
2. Decision matrix: no alternatives (only-one-way fix); the audit's ranked-by-impact/effort table is the alternatives matrix.
3. AGENTS.md invariant note: added in tools/vmaf-tune/AGENTS.md.
4. Reproducer / smoke test: pytest tools/vmaf-tune/tests/.
5. CHANGELOG.md: Added entry.
6. docs/rebase-notes.md: entry 0229.

Parent ADR: ADR-0237 (vmaf-tune umbrella).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
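The two predicates can be sketched over plain dict rows as follows. The helper names and the `(row, met)` return shape are hypothetical; the row keys `crf`, `vmaf_score` and `bitrate_kbps` and the selection rules are taken from the commit message as written.

```python
import math


def _usable(row):
    # Reject failed encodes: missing or NaN score / bitrate.
    for key in ("vmaf_score", "bitrate_kbps"):
        value = row.get(key)
        if not isinstance(value, (int, float)) or math.isnan(value):
            return False
    return True


def pick_target_vmaf(rows, target):
    """Return (row, met): smallest-CRF row with vmaf_score >= target,
    else the highest-VMAF row with met=False (the UNMET case)."""
    rows = [r for r in rows if _usable(r)]
    if not rows:
        return None, False
    hits = [r for r in rows if r["vmaf_score"] >= target]
    if hits:
        return min(hits, key=lambda r: r["crf"]), True
    return max(rows, key=lambda r: r["vmaf_score"]), False


def pick_target_bitrate(rows, kbps):
    """Return the row closest to kbps; ties go to the smaller CRF."""
    rows = [r for r in rows if _usable(r)]
    if not rows:
        return None
    return min(rows, key=lambda r: (abs(r["bitrate_kbps"] - kbps), r["crf"]))
```

Encoding the tie-break as the second element of the sort key keeps the bitrate rule a single `min` call with no special-casing.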

… flags (#358) * feat(tools): vmaf-tune recommend — --target-vmaf and --target-bitrate flags Implements Buckets 4 + 5 from Research-0061 (vmaf-tune capability audit). Adds a new `recommend` subcommand on `tools/vmaf-tune/` that consumes the Phase A corpus (either pre-built JSONL via `--from-corpus` or generated on the fly from `--source` + grid flags) and applies a user-supplied predicate over the existing `(crf, preset, vmaf_score, bitrate_kbps)` rows. - `--target-vmaf T` returns the row with the smallest CRF whose `vmaf_score >= T`. Falls back to the closest miss (highest VMAF) when no row clears the bar, with the predicate annotated `(UNMET)`. - `--target-bitrate KBPS` returns the row whose `bitrate_kbps` is closest to `KBPS` (absolute distance, ties broken by smaller CRF). The two flags are mutually exclusive at the argparse layer (exit code 2 when both are passed). Default output is a single human-readable line on stdout; `--json` switches to the full corpus row as a JSON object. No schema bump — `recommend` is a pure consumer of `CORPUS_ROW_KEYS`. 13-test suite under `tools/vmaf-tune/tests/test_recommend.py` covers predicate semantics, encoder/preset filtering, NaN/failed-encode rejection, and CLI exit codes; mocks all binaries so it runs in <100 ms. Six ADR-0108 deliverables: 1. Research digest — Research-0061 (parent, in flight via PR #354). 2. Decision matrix — no alternatives: only-one-way fix; the audit's ranked-by-impact/effort table is the alternatives matrix. 3. AGENTS.md invariant note — added in tools/vmaf-tune/AGENTS.md. 4. Reproducer / smoke test — pytest tools/vmaf-tune/tests/. 5. CHANGELOG.md — Added entry. 6. docs/rebase-notes.md — entry 0229. Parent ADR: ADR-0237 (vmaf-tune umbrella). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(ci): trigger workflow re-run Co-Authored-By: Claude <noreply@anthropic.com> * fix(tools): close fast.add_argument before rec parser (rebase residue) The marker-strip during the #358 rebase dropped the closing ')' between fast.add_argument and rec = sub.add_parser, leaving cli.py unparseable. Black + ruff hard-failed in CI. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


* feat(tools): vmaf-tune — x265 codec adapter (ADR-0276) Adds the first sibling codec to the ADR-0237 Phase A `libx264` scaffold: a one-file `X265Adapter` mirroring the `x264.py` shape (10 presets including `placebo`, 0..51 CRF window, `profile_for(pix_fmt)` helper that maps `yuv420p10le` → `main10` for downstream HDR work). Registered under `libx265` in `codec_adapters/__init__.py`; CLI `--encoder` now accepts `libx264 | libx265` via `choices=list(known_codecs())`. `encode.parse_versions` gains an encoder-aware regex so corpus rows record `libx265-<version>` correctly (default remains `libx264` for backward compatibility). No `SCHEMA_VERSION` bump — the existing `encoder` row column already carries codec identity. Phase B/C consumers receive the new codec without any contract change. 14 new subprocess-mocked smoke tests under `tools/vmaf-tune/tests/test_codec_adapter_x265.py` (29 of 30 vmaf-tune tests pass green; the one skipped case is the real-binary integration test gated on `VMAF_TUNE_INTEGRATION=1`). Unblocks ADR-0235 codec-aware FR regressor and PR #354 audit's buckets #6 (bitrate-ladder), #7 (codec-comparison), #9 (HDR), #15 (Pareto). Six deep-dive deliverables (ADR-0108): 1. research digest: no digest needed — trivial mirror of `x264.py`; alternatives matrix is exhausted in ADR-0276. 2. decision matrix: ADR-0276 §Alternatives considered. 3. AGENTS.md invariant note: tools/vmaf-tune/AGENTS.md updated to document the per-codec banner-regex carve-out in `parse_versions` and the wired-codecs list (libx264, libx265). 4. reproducer: `python -m pytest tools/vmaf-tune/tests/`. 5. CHANGELOG: changelog.d/added/ADR-0276-vmaf-tune-x265-adapter.md. 6. rebase-notes: docs/rebase-notes.md entry 0228. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: re-trigger CI after deliverables canonicalised * chore: re-trigger CI after research-digest opt-out * chore(tools): black format test_corpus.py --------- Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


…ts (#363) Adds tools/vmaf-tune/src/vmaftune/resolution.py exposing select_vmaf_model_version (height>=2160 -> vmaf_4k_v0.6.1, else vmaf_v0.6.1), select_vmaf_model (Path), and crf_offset_for_resolution (4K -2, 1080p 0, 720p +2, sub-720p +4). corpus.iter_rows auto-picks the model per encode resolution; CLI gains --resolution-aware / --no-resolution-aware (default on). Emitted JSONL row's vmaf_model field now records the *effective* model used per row, not the global option — required for mixed-ladder corpora to be unambiguous downstream. Closes PR #354 audit Bucket #8. Decision rule mirrors Netflix's published guidance. Refs: ADR-0237 (parent), ADR-0280 (this), Research-0054. Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


…er) (#371) * feat(tools): vmaf-tune Phase E — per-title bitrate ladder (game-changer) Scaffolds the Phase E ladder generator (ADR-0277) — the highest-leverage gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the Netflix per-title encoding paper: sample (resolution × target-VMAF), take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along the hull, emit an HLS / DASH / JSON manifest. Currently scaffold-only: the production sampler that drives Phase B's target-VMAF bisect (PR #347) lands once that PR merges. Default sampler raises NotImplementedError; tests inject a synthetic stub modelled on the Netflix paper's R-D curves. - New module tools/vmaf-tune/src/vmaftune/ladder.py — build_ladder, convex_hull (Pareto filter + diminishing-returns envelope), select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH / JSON), and a build_and_emit convenience. - New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung 1080p/720p/480p/360p/240p default rendition set. - 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull correctness on a synthetic Netflix-paper-shaped cloud, knee selection invariants, and HLS / DASH / JSON manifest emit shape. - ADR-0277 (Proposed; flips to Accepted once Phase B integration PR lands and a real-corpus PLCC validation digest reports the delta). - Research-0054 surveys the algorithm space (Netflix per-title paper, Apple HLS authoring spec, JND-spaced, BO sampling). - docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section with the canonical invocation. - CHANGELOG, rebase-notes (#229), AGENTS.md invariant note. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(docs): renumber phase-e ADR 0277→0295 + research 0066→0068 (collisions) --------- Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


…al scaffold) (#372)

* feat(ai): fr_regressor_v2 probabilistic head (deep-ensemble + conformal scaffold)

Adds a probabilistic head on top of the codec-aware fr_regressor_v2 (parent: ADR-0272 / PR #347 in flight) so producers can drive the in-flight `vmaf-tune --quality-confidence 0.95` flag (ADR-0237) off a calibrated prediction interval instead of v2's bare MOS scalar. PR #354 audit Bucket #18 (top-3 ranked).

Trainer (`ai/scripts/train_fr_regressor_v2_ensemble.py`) trains N=5 copies of the v2 architecture (`FRRegressor(num_codecs=NUM_CODECS)`) under distinct seeds, exports each as a separate two-input ONNX (`features [N, 6]` + `codec_onehot [N, NUM_CODECS]`), and writes an ensemble manifest sidecar that pins per-member sha256s, feature standardisation, codec vocab, nominal coverage, and an optional split-conformal residual quantile from a held-out calibration split. Inference rule is `mu ± q · σ` with `q = 1.96` (Gaussian) or the empirical conformal quantile (Vovk 2005, Romano 2019; distribution-free marginal coverage on exchangeable data).

Evaluator (`ai/scripts/eval_probabilistic_proxy.py`) reports empirical coverage at 50/80/95 % nominal levels, mean interval width, and the mean-prediction PLCC; reports the conformal-interval row when the manifest carries a conformal scalar.

Smoke-only ship: synthetic 100-row corpus, 1 epoch / member. Production training is gated on the multi-codec Phase A corpus (T7-FR-REGRESSOR-V2-PROBABILISTIC).

Six ADR-0108 deliverables:

1. Research digest: docs/research/0054-fr-regressor-v2-probabilistic.md.
2. Decision matrix: ADR-0279 § Alternatives considered.
3. AGENTS.md invariant note: appended to ai/AGENTS.md.
4. Reproducer: `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` followed by `python ai/scripts/eval_probabilistic_proxy.py --smoke`.
5. CHANGELOG ### Added entry under Unreleased (lusoris fork).
6. Rebase-notes entry: ### 0229 in docs/rebase-notes.md.

Test plan:

- `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` produces 5 valid two-input ONNX members + manifest sidecar (ran locally).
- `python ai/scripts/eval_probabilistic_proxy.py --smoke` aggregates the 5 ONNX outputs into (mu, sigma) and reports coverage at 50/80/95 %.
- `python ai/scripts/validate_model_registry.py` → 15 entries valid.
- `pre-commit run --files <changed>` → Passed (black / isort / ruff / json-check / secrets / semgrep).
- `markdownlint-cli2` on all new docs → 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(registry): split fr_regressor_v2 + ensemble_seed0 into distinct entries

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
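
The `mu ± q · σ` inference rule described in the commit message can be sketched roughly as follows. This is not the code from `eval_probabilistic_proxy.py`: the function names are hypothetical, and it assumes the conformal scalar is a quantile of normalised residuals `|y - mu| / sigma` on the calibration split, so that it can replace the Gaussian `q = 1.96` directly. The source does not say which residual form the manifest stores.

```python
import math
from statistics import mean, pstdev


def ensemble_interval(member_preds, q=1.96):
    """Collapse one sample's per-member predictions into (mu, sigma, interval).

    member_preds: one prediction per ensemble member (N=5 in the PR).
    q: interval multiplier; 1.96 is the Gaussian 95 % value.
    """
    mu = mean(member_preds)
    sigma = pstdev(member_preds)  # population std-dev across members
    return mu, sigma, (mu - q * sigma, mu + q * sigma)


def split_conformal_quantile(scores, coverage=0.95):
    """Finite-sample split-conformal quantile of calibration scores.

    scores: non-negative conformity scores from a held-out split
    (assumed here to be |y - mu| / sigma; hypothetical choice).
    Returns the ceil((n + 1) * coverage)-th smallest score, or +inf
    when the calibration split is too small for the requested level.
    """
    n = len(scores)
    k = math.ceil((n + 1) * coverage)
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def empirical_coverage(intervals, targets):
    """Fraction of targets that fall inside their predicted interval."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, targets))
    return hits / len(targets)
```

Calling `ensemble_interval(preds, q=split_conformal_quantile(cal_scores))` would then give the conformal variant of the same interval; comparing `empirical_coverage` against the nominal level is the kind of check the evaluator is described as reporting.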

Wires the fork-trained `saliency_student_v1` ONNX model (ADR-0286 / PR #359) into vmaf-tune so a single command can produce an encode that biases bits toward salient regions.

New surfaces:

- `tools/vmaf-tune/src/vmaftune/saliency.py`: pure-NumPy signal-blend pipeline (sample frames -> ImageNet-normalised RGB -> ONNX inference -> per-pixel saliency mean -> per-MB QP-offset map clamped to ±12 -> x264 ASCII qpfile).
- `vmaf-tune recommend --saliency-aware [--saliency-offset -4] [--saliency-model PATH]` CLI subcommand. Falls back to a plain encode when onnxruntime or the model file is unavailable; the flag surface is wired so Phase B (target-VMAF bisect) can drop in without renaming flags.
- 13 unit tests under `tests/test_saliency.py` mocking both the ONNX session and the encode runner; they run without onnxruntime or ffmpeg installed.

Bucket #2 of the PR #354 audit.

Six ADR-0108 deliverables:

1. Research-0046 (digest)
2. ADR-0287 §"Alternatives considered" (decision matrix)
3. tools/vmaf-tune/AGENTS.md (saliency invariant)
4. `pytest tools/vmaf-tune/tests/test_saliency.py -v` (smoke)
5. changelog.d/added/T-VMAF-TUNE-saliency-aware.md
6. docs/rebase-notes.md §0287

User docs: docs/usage/vmaf-tune.md §"Saliency-aware encoding".
Decision: ADR-0287.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
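
The "per-pixel saliency mean -> per-MB QP-offset map clamped to ±12" step might look something like the sketch below. It is an illustration only: the function name is hypothetical, the linear mapping from block saliency to QP offset is an assumption (the source gives only the clamp and the `--saliency-offset -4` default), and it uses plain lists rather than NumPy so it runs without dependencies.

```python
def mb_qp_offsets(saliency, offset=-4.0, mb=16, clamp=12):
    """Turn a per-pixel saliency map into a per-macroblock QP-offset grid.

    saliency: 2-D list of floats in [0, 1] (rows of pixels).
    offset:   QP delta applied to a fully salient block; negative means
              lower QP, i.e. more bits. Assumed linear mapping: a block
              with mean saliency s gets offset * (2s - 1), so non-salient
              blocks receive the opposite sign.
    mb:       macroblock edge in pixels (16 for H.264).
    clamp:    absolute bound on the emitted offset (±12 in the PR).
    """
    rows, cols = len(saliency), len(saliency[0])
    grid = []
    for y0 in range(0, rows, mb):
        grid_row = []
        for x0 in range(0, cols, mb):
            # Mean saliency over the block; edge blocks may be partial.
            block = [
                saliency[y][x]
                for y in range(y0, min(y0 + mb, rows))
                for x in range(x0, min(x0 + mb, cols))
            ]
            s = sum(block) / len(block)
            qp = round(offset * (2.0 * s - 1.0))
            grid_row.append(max(-clamp, min(clamp, qp)))
        grid.append(grid_row)
    return grid
```

For a 16×32 map whose left half is fully salient and right half is not, this yields one row of two macroblocks with offsets `-4` and `4`; an exaggerated `offset=-40` saturates at the ±12 clamp.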

…#9, ADR-0261)

Closes Bucket #9 of the PR #354 vmaf-tune capability audit. Adds ffprobe-driven HDR detection, codec-specific HDR encode flag dispatch, and HDR-VMAF model resolution to the Phase A corpus driver.

New module ``tools/vmaf-tune/src/vmaftune/hdr.py``:

- ``detect_hdr(path)``: runs ``ffprobe -show_streams -of json``, classifies the first video stream as PQ / HLG / SDR. Strict BT.2020-primaries gate so malformed signaling falls back to SDR.
- ``hdr_codec_args(encoder, info)``: per-encoder dispatch table covering libx264 (container ``-color_*``), libx265 (``-x265-params`` with master-display + max-cll + hdr10-opt), libsvtav1 (AV1 enums via ``-svtav1-params``), hevc_nvenc (``-pix_fmt p010le -profile:v main10``), libvvenc.
- ``select_hdr_vmaf_model()``: globs ``model/vmaf_hdr_*.json``; returns ``None`` when none shipped (current state: fork hasn't ported Netflix's HDR model yet).

Corpus driver wiring:

- ``CorpusOptions.hdr_mode`` ∈ {``auto``, ``force-sdr``, ``force-hdr-pq``, ``force-hdr-hlg``}; CLI flags ``--auto-hdr`` / ``--force-sdr`` / ``--force-hdr-pq`` / ``--force-hdr-hlg`` (mutually exclusive). Auto is the default.
- New schema-v2 row keys ``hdr_transfer`` / ``hdr_primaries`` / ``hdr_forced``; ``SCHEMA_VERSION`` bumped 1 → 2. Phase B / C loaders treat missing keys as SDR (additive change, v1 rows remain readable).
- ``score._model_arg`` now passes pre-formatted ``path=`` / ``version=`` strings through unchanged so the HDR model path can be injected via ``vmaf --model``.
- HDR detected but no HDR model shipped → log warning, fall back to SDR model with notice that scores trend low.

Tests (``tools/vmaf-tune/tests/test_hdr.py``, 21 cases):

- detection: SDR / PQ / HLG / mismatched-primaries / missing-file / ffprobe-failure / invalid-JSON
- codec dispatch: shape per encoder (x264, x265 PQ + HLG, SVT-AV1 PQ + HLG, NVENC HEVC, unknown encoder)
- model resolution: empty dir / shipped / multi-version pick-latest / missing dir
- corpus integration: end-to-end ``force-hdr-pq`` (verify HDR fields in row + ``-color_*`` in encode argv) and ``force-sdr``

ADR-0261 (Accepted, encode-side; HDR-VMAF scoring deferred until fork-local model port). Research-0054 digest, rebase-notes 0261, AGENTS.md invariant note, docs/usage/vmaf-tune.md HDR section, changelog fragment all included.
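
As a rough illustration of the detection rule above (first video stream, strict BT.2020-primaries gate, SDR as the fallback), the classification over ffprobe's JSON output could be sketched as below. The function name `classify_hdr` is hypothetical and this is not the fork's `detect_hdr`; it assumes the standard ffprobe stream fields `color_primaries` and `color_transfer`, with `smpte2084` signalling PQ and `arib-std-b67` signalling HLG, and it takes the JSON text as input rather than invoking ffprobe.

```python
import json


def classify_hdr(ffprobe_json: str) -> str:
    """Classify the first video stream of ffprobe JSON as 'PQ', 'HLG' or 'SDR'.

    ffprobe_json: text as produced by `ffprobe -show_streams -of json <file>`.
    Anything malformed or incompletely signalled falls back to 'SDR'.
    """
    try:
        streams = json.loads(ffprobe_json).get("streams", [])
    except (json.JSONDecodeError, AttributeError):
        return "SDR"

    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    if video is None:
        return "SDR"

    # Strict gate: an HDR transfer curve without BT.2020 primaries is
    # treated as malformed signalling and classified as SDR.
    if video.get("color_primaries") != "bt2020":
        return "SDR"

    transfer_to_kind = {"smpte2084": "PQ", "arib-std-b67": "HLG"}
    return transfer_to_kind.get(video.get("color_transfer"), "SDR")
```

Keeping the classifier a pure function of the JSON text is one way to get the mocked-ffprobe test cases listed above (invalid JSON, mismatched primaries) without a real media file.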

* feat(tools): vmaf-tune — HDR-aware encoding + HDR-VMAF scoring (Bucket #9, ADR-0261)
* chore(docs): renumber hdr-aware ADR/research to dodge collisions (0295→0300, 0261→0300, 0071→0072)
* fix(tools): drop duplicate vmaf_model dict key in corpus.py

---------

Co-authored-by: Lusoris <lusoris@pm.me>

* feat(tools): vmaf-tune — saliency-aware ROI encoding (Bucket #2)

* docs(vmaf-tune): CHANGELOG fragment for recommend-saliency CLI

ADR-0108 deliverables-checklist gate on PR #432 wants the fragment to be added in this PR's diff, not just present on master. The existing fragment ``T-VMAF-TUNE-saliency-aware.md`` covers the saliency engine that's already merged; this new fragment covers the CLI subcommand specifically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Scaffolds the Phase E ladder generator (ADR-0277), the highest-leverage gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the Netflix per-title encoding paper: sample (resolution × target-VMAF), take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along the hull, emit an HLS / DASH / JSON manifest.

Currently scaffold-only: the production sampler that drives Phase B's target-VMAF bisect (PR #347) lands once that PR merges. Default sampler raises NotImplementedError; tests inject a synthetic stub modelled on the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py: build_ladder, convex_hull (Pareto filter + diminishing-returns envelope), select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung 1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull correctness on a synthetic Netflix-paper-shaped cloud, knee selection invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once Phase B integration PR lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper, Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
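
The "Pareto filter + upper-convex hull on (bitrate, vmaf)" step can be sketched as below. This is an assumption-laden illustration, not the `convex_hull` from `ladder.py`: the function name `upper_hull` is hypothetical, and it works on a linear bitrate axis (the source mentions log-bitrate only for knee spacing).

```python
def upper_hull(points):
    """Return the upper-convex hull of (bitrate, vmaf) samples.

    Step 1 (Pareto filter): drop any sample that another sample beats
    with lower-or-equal bitrate and higher-or-equal VMAF.
    Step 2 (upper hull): monotone-chain scan that drops points lying on
    or below the chord between their neighbours, leaving a concave,
    diminishing-returns envelope sorted by ascending bitrate.
    """
    # Sort by bitrate ascending; at equal bitrate, best VMAF first.
    ordered = sorted(set(points), key=lambda p: (p[0], -p[1]))

    pareto = []
    best_vmaf = float("-inf")
    for bitrate, vmaf in ordered:
        if vmaf > best_vmaf:  # strictly better quality than anything cheaper
            pareto.append((bitrate, vmaf))
            best_vmaf = vmaf

    def cross(o, a, b):
        # > 0: counter-clockwise turn o->a->b, i.e. a lies below chord o-b.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    hull = []
    for p in pareto:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

Samples pooled across all resolutions go in as one cloud; the rungs would then be picked along the returned vertices, which is where a knee-selection step such as `select_knees` comes in.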

…er) (#433)

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…#9, ADR-0261) (#434)

Co-authored-by: Lusoris <lusoris@pm.me>

lusoris force-pushed the research/vmaf-tune-capability-audit branch from 2375000 to e133d98 Compare May 3, 2026 19:24

This was referenced May 3, 2026

feat(tools): vmaf-tune recommend — --target-vmaf and --target-bitrate flags #358

Merged

feat(tools): vmaf-tune — x265 codec adapter #362

Merged

feat(tools): vmaf-tune — resolution-aware model selection + CRF offsets #363

Merged

This was referenced May 3, 2026

feat(tools): vmaf-tune Phase D — per-shot CRF tuning (transnet_v2) #369

Merged

feat(tools): vmaf-tune Phase E — per-title bitrate ladder (game-changer) #371

Merged

lusoris mentioned this pull request May 3, 2026

feat(ai): fr_regressor_v2 probabilistic head (deep-ensemble + conformal scaffold) #372

Merged

12 tasks

lusoris mentioned this pull request May 3, 2026

feat(tools): vmaf-tune — saliency-aware ROI encoding (Bucket #2) #374

Merged

12 tasks

This was referenced May 3, 2026

feat(tools): vmaf-tune compare — multi-codec ranked output #377

Merged

feat(tools): vmaf-tune — HDR-aware encoding + HDR-VMAF scoring #379

Merged

lusoris force-pushed the research/vmaf-tune-capability-audit branch from e133d98 to beff045 Compare May 4, 2026 20:53

lusoris marked this pull request as ready for review May 4, 2026 20:54

Copilot AI review requested due to automatic review settings May 4, 2026 20:54

lusoris marked this pull request as draft May 4, 2026 20:54

lusoris marked this pull request as ready for review May 4, 2026 20:54

Copilot started reviewing on behalf of lusoris May 4, 2026 20:55

Copilot AI reviewed May 4, 2026

chore(ci): trigger workflow re-run

9162f78

Co-Authored-By: Claude <noreply@anthropic.com>

lusoris merged commit 7f543ce into master May 4, 2026
54 checks passed

lusoris deleted the research/vmaf-tune-capability-audit branch May 4, 2026 21:35

lusoris merged 2 commits into master from research/vmaf-tune-capability-audit

		user-facing flag (`--target-vmaf`, exists; `--minimise bitrate`,
		doesn't).
