research(tools): vmaf-tune capability audit — what else can it do?#354
Merged
2375000 to
e133d98
This was referenced May 3, 2026
lusoris pushed a commit that referenced this pull request May 3, 2026

Adds the first sibling codec to the ADR-0237 Phase A `libx264` scaffold: a one-file `X265Adapter` mirroring the `x264.py` shape (10 presets including `placebo`, 0..51 CRF window, `profile_for(pix_fmt)` helper that maps `yuv420p10le` → `main10` for downstream HDR work). Registered under `libx265` in `codec_adapters/__init__.py`; CLI `--encoder` now accepts `libx264 | libx265` via `choices=list(known_codecs())`. `encode.parse_versions` gains an encoder-aware regex so corpus rows record `libx265-<version>` correctly (default remains `libx264` for backward compatibility). No `SCHEMA_VERSION` bump — the existing `encoder` row column already carries codec identity. Phase B/C consumers receive the new codec without any contract change.

14 new subprocess-mocked smoke tests under `tools/vmaf-tune/tests/test_codec_adapter_x265.py` (29 of 30 vmaf-tune tests pass green; the one skipped case is the real-binary integration test gated on `VMAF_TUNE_INTEGRATION=1`).

Unblocks ADR-0235 codec-aware FR regressor and PR #354 audit's buckets #6 (bitrate-ladder), #7 (codec-comparison), #9 (HDR), #15 (Pareto).

Six deep-dive deliverables (ADR-0108):

1. research digest: no digest needed — trivial mirror of `x264.py`; alternatives matrix is exhausted in ADR-0276.
2. decision matrix: ADR-0276 §Alternatives considered.
3. AGENTS.md invariant note: tools/vmaf-tune/AGENTS.md updated to document the per-codec banner-regex carve-out in `parse_versions` and the wired-codecs list (libx264, libx265).
4. reproducer: `python -m pytest tools/vmaf-tune/tests/`.
5. CHANGELOG: changelog.d/added/ADR-0276-vmaf-tune-x265-adapter.md.
6. rebase-notes: docs/rebase-notes.md entry 0228.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
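The adapter shape described in this commit message can be pictured with a minimal sketch. The class name `X265AdapterSketch`, its fields, and the `validate` helper are hypothetical stand-ins, not the fork's actual `X265Adapter`; only the ten presets, the 0..51 CRF window, and the `yuv420p10le` → `main10` mapping come from the message, and the fallback to `main` is an assumption.

```python
from dataclasses import dataclass

# x265's ten named presets, fastest to slowest.
X265_PRESETS = (
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow", "placebo",
)


@dataclass(frozen=True)
class X265AdapterSketch:
    """Hypothetical stand-in for the adapter described in the commit."""

    encoder: str = "libx265"
    presets: tuple = X265_PRESETS
    crf_min: int = 0
    crf_max: int = 51

    def validate(self, preset: str, crf: int) -> None:
        # Reject anything outside the declared preset list / CRF window.
        if preset not in self.presets:
            raise ValueError(f"unknown x265 preset: {preset!r}")
        if not self.crf_min <= crf <= self.crf_max:
            raise ValueError(f"crf {crf} outside {self.crf_min}..{self.crf_max}")

    def profile_for(self, pix_fmt: str) -> str:
        # Only yuv420p10le -> main10 is stated in the commit message;
        # returning "main" for everything else is an assumption.
        return "main10" if pix_fmt == "yuv420p10le" else "main"
```

Keeping the adapter a frozen dataclass means a registry keyed by encoder name can share one instance safely, which is one plausible reason for the "one-file" shape.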
lusoris pushed a commit that referenced this pull request May 3, 2026

Adds tools/vmaf-tune/src/vmaftune/resolution.py exposing select_vmaf_model_version (height>=2160 -> vmaf_4k_v0.6.1, else vmaf_v0.6.1), select_vmaf_model (Path), and crf_offset_for_resolution (4K -2, 1080p 0, 720p +2, sub-720p +4). corpus.iter_rows auto-picks the model per encode resolution; CLI gains --resolution-aware / --no-resolution-aware (default on). Emitted JSONL row's vmaf_model field now records the *effective* model used per row, not the global option — required for mixed-ladder corpora to be unambiguous downstream.

Closes PR #354 audit Bucket #8. Decision rule mirrors Netflix's published guidance.

Refs: ADR-0237 (parent), ADR-0280 (this), Research-0054.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
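The two decision rules named in this commit are small enough to sketch directly. The function names match the commit message, but the bodies are an illustration rather than the fork's code. The message does not say where heights such as 1440 fall, so the band edges used for the CRF offset are an assumption.

```python
def select_vmaf_model_version(height: int) -> str:
    """Pick the VMAF model by encode height (rule stated in the commit)."""
    return "vmaf_4k_v0.6.1" if height >= 2160 else "vmaf_v0.6.1"


def crf_offset_for_resolution(height: int) -> int:
    """CRF offset per resolution band: 4K -2, 1080p 0, 720p +2, sub-720p +4.

    Assumption: each band starts at its nominal height, so e.g. 1440p
    is treated like 1080p. The commit message does not specify this.
    """
    if height >= 2160:
        return -2
    if height >= 1080:
        return 0
    if height >= 720:
        return 2
    return 4
```

For a mixed ladder, calling both functions per rung is what makes the per-row `vmaf_model` field meaningful: a 2160p row and a 720p row in the same corpus record different effective models.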
This was referenced May 3, 2026
lusoris pushed a commit that referenced this pull request May 3, 2026

Scaffolds the Phase E ladder generator (ADR-0277) — the highest-leverage gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the Netflix per-title encoding paper: sample (resolution × target-VMAF), take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along the hull, emit an HLS / DASH / JSON manifest.

Currently scaffold-only: the production sampler that drives Phase B's target-VMAF bisect (PR #347) lands once that PR merges. Default sampler raises NotImplementedError; tests inject a synthetic stub modelled on the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py — build_ladder, convex_hull (Pareto filter + diminishing-returns envelope), select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung 1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull correctness on a synthetic Netflix-paper-shaped cloud, knee selection invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once Phase B integration PR lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper, Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
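The hull-then-rungs pipeline described above can be sketched in a few lines. This is an illustration under stated assumptions, not the contents of `ladder.py`: the function names `pareto_front`, `upper_hull`, and `pick_rungs` are hypothetical, points are plain `(bitrate_kbps, vmaf)` tuples, and rung selection uses only the log-bitrate spacing variant.

```python
import math


def pareto_front(points):
    """Keep points that no cheaper-or-equal point beats on VMAF."""
    front, best_vmaf = [], -math.inf
    # Sort by bitrate ascending; at equal bitrate, best VMAF first.
    for bitrate, vmaf in sorted(points, key=lambda p: (p[0], -p[1])):
        if vmaf > best_vmaf:
            front.append((bitrate, vmaf))
            best_vmaf = vmaf
    return front


def _cross(o, a, b):
    # z-component of (a - o) x (b - o); negative means a clockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper convex hull of the Pareto front (monotone-chain scan)."""
    hull = []
    for p in pareto_front(points):
        # Drop the previous point while it lies on or below the chord.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def pick_rungs(hull, n):
    """Pick up to n hull points, evenly spaced in log-bitrate."""
    if n < 2 or len(hull) < 2:
        raise ValueError("need n >= 2 and at least two hull points")
    lo, hi = math.log(hull[0][0]), math.log(hull[-1][0])
    rungs = []
    for i in range(n):
        target = lo + (hi - lo) * i / (n - 1)
        nearest = min(hull, key=lambda p: abs(math.log(p[0]) - target))
        if nearest not in rungs:  # two targets may snap to the same point
            rungs.append(nearest)
    return rungs
```

Points below the hull are encodes where some mix of neighbouring rungs would give the same quality for fewer bits, which is why the hull, not the raw Pareto front, is the set the rungs are drawn from.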
lusoris pushed a commit that referenced this pull request May 3, 2026

…al scaffold)

Adds a probabilistic head on top of the codec-aware fr_regressor_v2 (parent: ADR-0272 / PR #347 in flight) so producers can drive the in-flight `vmaf-tune --quality-confidence 0.95` flag (ADR-0237) off a calibrated prediction interval instead of v2's bare MOS scalar. PR #354 audit Bucket #18 (top-3 ranked).

Trainer (`ai/scripts/train_fr_regressor_v2_ensemble.py`) trains N=5 copies of the v2 architecture (`FRRegressor(num_codecs=NUM_CODECS)`) under distinct seeds, exports each as a separate two-input ONNX (`features [N, 6]` + `codec_onehot [N, NUM_CODECS]`), and writes an ensemble manifest sidecar that pins per-member sha256s, feature standardisation, codec vocab, nominal coverage, and an optional split-conformal residual quantile from a held-out calibration split. Inference rule is `mu ± q · σ` with `q = 1.96` (Gaussian) or the empirical conformal quantile (Vovk 2005, Romano 2019 — distribution-free marginal coverage on exchangeable data).

Evaluator (`ai/scripts/eval_probabilistic_proxy.py`) reports empirical coverage at 50/80/95 % nominal levels, mean interval width, and the mean-prediction PLCC; reports the conformal-interval row when the manifest carries a conformal scalar.

Smoke-only ship: synthetic 100-row corpus, 1 epoch / member. Production training is gated on the multi-codec Phase A corpus (T7-FR-REGRESSOR-V2-PROBABILISTIC).

Six ADR-0108 deliverables:

1. Research digest: docs/research/0054-fr-regressor-v2-probabilistic.md.
2. Decision matrix: ADR-0279 § Alternatives considered.
3. AGENTS.md invariant note: appended to ai/AGENTS.md.
4. Reproducer: `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` followed by `python ai/scripts/eval_probabilistic_proxy.py --smoke`.
5. CHANGELOG ### Added entry under Unreleased — lusoris fork.
6. Rebase-notes entry: ### 0229 in docs/rebase-notes.md.

Test plan:

- `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` produces 5 valid two-input ONNX members + manifest sidecar (ran locally).
- `python ai/scripts/eval_probabilistic_proxy.py --smoke` aggregates the 5 ONNX outputs into (mu, sigma) and reports coverage at 50/80/95 %.
- `python ai/scripts/validate_model_registry.py` → 15 entries valid.
- `pre-commit run --files <changed>` → Passed (black / isort / ruff / json-check / secrets / semgrep).
- `markdownlint-cli2` on all new docs → 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
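The inference rule `mu ± q · σ` and the split-conformal quantile can be sketched without any ML dependencies. Both function names are hypothetical. Two choices here are assumptions the commit message does not settle: σ is taken as the population standard deviation across ensemble members, and the calibration scores are taken as residuals scaled by σ so that the resulting quantile can replace `q` directly.

```python
import math
from statistics import fmean, pstdev


def ensemble_interval(member_preds, q=1.96):
    """Aggregate per-member predictions into (mu, sigma, (low, high))."""
    mu = fmean(member_preds)
    sigma = pstdev(member_preds)  # assumption: population std over members
    return mu, sigma, (mu - q * sigma, mu + q * sigma)


def conformal_quantile(scores, coverage=0.95):
    """Split-conformal quantile of calibration nonconformity scores.

    `scores` would be e.g. |y - mu| / sigma on a held-out calibration
    split (an assumed definition). Uses the finite-sample rank
    ceil((n + 1) * coverage).
    """
    n = len(scores)
    k = math.ceil((n + 1) * coverage)
    if k > n:
        raise ValueError("calibration split too small for this coverage")
    return sorted(scores)[k - 1]
```

In use, `q = conformal_quantile(calibration_scores, 0.95)` would be computed once and stored, then passed to `ensemble_interval` at inference time in place of the Gaussian 1.96.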
lusoris pushed a commit that referenced this pull request May 3, 2026

Wires the fork-trained `saliency_student_v1` ONNX model (ADR-0286 / PR #359) into vmaf-tune so a single command can produce an encode that biases bits toward salient regions.

New surfaces:

- `tools/vmaf-tune/src/vmaftune/saliency.py` — pure-NumPy signal-blend pipeline (sample frames -> ImageNet-normalised RGB -> ONNX inference -> per-pixel saliency mean -> per-MB QP-offset map clamped to ±12 -> x264 ASCII qpfile).
- `vmaf-tune recommend --saliency-aware [--saliency-offset -4] [--saliency-model PATH]` CLI subcommand. Falls back to a plain encode when onnxruntime or the model file is unavailable; the flag surface is wired so Phase B (target-VMAF bisect) can drop in without renaming flags.
- 13 unit tests under `tests/test_saliency.py` mocking both the ONNX session and the encode runner — runs without onnxruntime or ffmpeg installed.

Bucket #2 of the PR #354 audit.

Six ADR-0108 deliverables:

1. Research-0046 (digest)
2. ADR-0287 §"Alternatives considered" (decision matrix)
3. tools/vmaf-tune/AGENTS.md (saliency invariant)
4. `pytest tools/vmaf-tune/tests/test_saliency.py -v` (smoke)
5. changelog.d/added/T-VMAF-TUNE-saliency-aware.md
6. docs/rebase-notes.md §0287

User docs: docs/usage/vmaf-tune.md §"Saliency-aware encoding". Decision: ADR-0287.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
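The "per-pixel saliency mean -> per-MB QP-offset map clamped to ±12" step can be pictured roughly as follows. This is a loose sketch: the commit message does not give the mapping from saliency to offset, so the linear rule used here (fully salient blocks get `offset`, fully non-salient blocks get `-offset`) is purely an assumption, as are the function name and the use of plain nested lists instead of NumPy arrays.

```python
def qp_offset_map(saliency, mb=16, offset=-4.0, clamp=12):
    """Turn a 2-D saliency map (values in [0, 1]) into per-macroblock QP offsets.

    Assumed rule: qp = round(offset * (2 * mean - 1)), so with the default
    offset of -4 a fully salient block gets -4 (more bits) and a fully
    non-salient block gets +4. Results are clamped to [-clamp, +clamp].
    """
    rows, cols = len(saliency), len(saliency[0])
    out = []
    for y in range(0, rows, mb):
        row = []
        for x in range(0, cols, mb):
            # Mean saliency over the (possibly partial) macroblock.
            block = [
                saliency[j][i]
                for j in range(y, min(y + mb, rows))
                for i in range(x, min(x + mb, cols))
            ]
            mean = sum(block) / len(block)
            qp = round(offset * (2.0 * mean - 1.0))
            row.append(max(-clamp, min(clamp, qp)))
        out.append(row)
    return out
```

The clamp matters most when a user passes a large `--saliency-offset`: without it, extreme per-block QP swings could produce visible seams at macroblock boundaries.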
This was referenced May 3, 2026
lusoris pushed a commit that referenced this pull request May 3, 2026

…#9, ADR-0261)

Closes Bucket #9 of the PR #354 vmaf-tune capability audit. Adds ffprobe-driven HDR detection, codec-specific HDR encode flag dispatch, and HDR-VMAF model resolution to the Phase A corpus driver.

New module ``tools/vmaf-tune/src/vmaftune/hdr.py``:

- ``detect_hdr(path)`` — runs ``ffprobe -show_streams -of json``, classifies the first video stream as PQ / HLG / SDR. Strict BT.2020-primaries gate so malformed signaling falls back to SDR.
- ``hdr_codec_args(encoder, info)`` — per-encoder dispatch table covering libx264 (container ``-color_*``), libx265 (``-x265-params`` with master-display + max-cll + hdr10-opt), libsvtav1 (AV1 enums via ``-svtav1-params``), hevc_nvenc (``-pix_fmt p010le -profile:v main10``), libvvenc.
- ``select_hdr_vmaf_model()`` — globs ``model/vmaf_hdr_*.json``; returns ``None`` when none shipped (current state — fork hasn't ported Netflix's HDR model yet).

Corpus driver wiring:

- ``CorpusOptions.hdr_mode`` ∈ {``auto``, ``force-sdr``, ``force-hdr-pq``, ``force-hdr-hlg``}; CLI flags ``--auto-hdr`` / ``--force-sdr`` / ``--force-hdr-pq`` / ``--force-hdr-hlg`` (mutually exclusive). Auto is the default.
- New schema-v2 row keys ``hdr_transfer`` / ``hdr_primaries`` / ``hdr_forced``; ``SCHEMA_VERSION`` bumped 1 → 2. Phase B / C loaders treat missing keys as SDR (additive change, v1 rows remain readable).
- ``score._model_arg`` now passes pre-formatted ``path=`` / ``version=`` strings through unchanged so the HDR model path can be injected via ``vmaf --model``.
- HDR detected but no HDR model shipped → log warning, fall back to SDR model with notice that scores trend low.

Tests (``tools/vmaf-tune/tests/test_hdr.py``, 21 cases):

- detection: SDR / PQ / HLG / mismatched-primaries / missing-file / ffprobe-failure / invalid-JSON
- codec dispatch: shape per encoder (x264, x265 PQ + HLG, SVT-AV1 PQ + HLG, NVENC HEVC, unknown encoder)
- model resolution: empty dir / shipped / multi-version pick-latest / missing dir
- corpus integration: end-to-end ``force-hdr-pq`` (verify HDR fields in row + ``-color_*`` in encode argv) and ``force-sdr``

ADR-0261 (Accepted, encode-side; HDR-VMAF scoring deferred until fork-local model port). Research-0054 digest, rebase-notes 0261, AGENTS.md invariant note, docs/usage/vmaf-tune.md HDR section, changelog fragment all included.
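The classification half of ``detect_hdr`` can be sketched as a pure function over already-parsed ``ffprobe -show_streams -of json`` output. The function name `classify_hdr` and the lowercase return labels are hypothetical; the stream field names (`codec_type`, `color_primaries`, `color_transfer`) and the values `bt2020`, `smpte2084`, and `arib-std-b67` are the ones ffprobe reports for BT.2020, PQ, and HLG respectively. Running ffprobe and handling its failures is left out.

```python
def classify_hdr(ffprobe_json: dict) -> str:
    """Classify the first video stream as 'pq', 'hlg', or 'sdr'."""
    video = next(
        (s for s in ffprobe_json.get("streams", [])
         if s.get("codec_type") == "video"),
        None,
    )
    if video is None:
        return "sdr"
    # Strict gate: anything without BT.2020 primaries is treated as SDR,
    # even if the transfer characteristic claims PQ or HLG.
    if video.get("color_primaries") != "bt2020":
        return "sdr"
    transfer = video.get("color_transfer")
    if transfer == "smpte2084":
        return "pq"
    if transfer == "arib-std-b67":
        return "hlg"
    return "sdr"
```

Separating classification from the subprocess call is also what lets the mismatched-primaries and invalid-JSON cases be unit-tested without an ffprobe binary.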
Survey 18 capability buckets beyond Phases A/B + the fast-path proposal. Rank by impact / effort. Recommend execution order for Phases C–H of the vmaf-tune umbrella (ADR-0237).

Top-5 by impact-÷-effort: bitrate-budget mode, quality-floor mode, bitrate-ladder optimisation (Phase E, the game-changer), per-shot CRF (Phase D, on roadmap), probabilistic proxy.

Biggest blocker called out: codec adapter coverage — five buckets degrade to x264-only until x265/SVT-AV1/libaom/libvvenc adapters land. Recommends opening the adapter stream in parallel with Phase D so multi-codec capabilities don't all stack at the end.

Pure scoping pass — no code, no implementation. Numbers are back-of-envelope hypotheses; ADRs that consume this digest must re-validate against real corpora.

Six deliverables (ADR-0108):

- (1) digest = this file
- (2) no decision matrix needed: research-only, no decision being made
- (3) no rebase-sensitive invariants
- (4) reproducer in PR description
- (5) no CHANGELOG fragment needed: research-only
- (6) no rebase impact: docs-only

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e133d98 to
beff045
Pull request overview
Adds a new research digest that audits the potential capability surface of vmaf-tune beyond Phases A/B and the proposed “fast path”, ranking 18 feature buckets by impact vs effort and proposing a Phase C–H execution order.
Changes:
- Introduces a capability audit digest covering 18 potential `vmaf-tune` feature buckets.
- Provides an impact/effort ranking and a recommended implementation sequence, highlighting codec-adapter coverage as the main dependency.
Comment on lines +1 to +6

# Research-0054: `vmaf-tune` capability audit — beyond Phases A/B + the fast path

- **Status**: Active
- **Workstream**: ADR-0237 (`vmaf-tune` umbrella), ADR-0235 (codec-aware FR regressor)
- **Last updated**: 2026-05-03
- **Author**: research scoping pass (no code)
Comment on lines +56 to +75

### Bucket 1 — Per-shot CRF tuning (Phase C as written)

- **Summary**: Use `transnet_v2` to cut the source into shots, run
  Phase B's bisect/proxy per shot, emit `--qpfile` (x264) /
  `--zones` (x265) / SVT-AV1 segment table.
- **Existing primitives**: `transnet_v2.onnx`, codec adapter
  `emit_per_shot_overrides()` hook (already declared in ADR-0237),
  Phase B bisect.
- **Effort**: **M** — shot-aware orchestration + per-codec override
  emission; non-trivial because shot-boundary frames near GOP edges
  need encoder-specific handling.
- **Impact**: **High** — Netflix's per-shot encoding is the canonical
  reference; same-VMAF bitrate savings of 10–30% are the public
  numbers from their 2018 paper.
- **Open**: do we re-train the proxy on per-shot canonical-6, or is
  the per-title proxy "good enough" if features are computed per
  shot? (Hypothesis: per-title proxy generalises if features are
  recomputed; needs a held-out check on BVI-DVC.)
- **Already in roadmap**: yes — Phase D in ADR-0237, gated on T6-3b
  per-shot CRF predictor.
Comment on lines +141 to +142

  user-facing flag (`--target-vmaf`, exists; `--minimise bitrate`,
  doesn't).
Comment on lines +398 to +401

2. **Phase B docs follow-up** (≤ 1 week) — ship Buckets #4 + #5
   explicitly as `--target-bitrate` and `--target-vmaf` modes;
   trivial flag work, big perceived feature add. *Highest impact
   ÷ effort in the audit.*
Co-Authored-By: Claude <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 4, 2026

… flags

Implements Buckets 4 + 5 from Research-0061 (vmaf-tune capability audit). Adds a new `recommend` subcommand on `tools/vmaf-tune/` that consumes the Phase A corpus (either pre-built JSONL via `--from-corpus` or generated on the fly from `--source` + grid flags) and applies a user-supplied predicate over the existing `(crf, preset, vmaf_score, bitrate_kbps)` rows.

- `--target-vmaf T` returns the row with the smallest CRF whose `vmaf_score >= T`. Falls back to the closest miss (highest VMAF) when no row clears the bar, with the predicate annotated `(UNMET)`.
- `--target-bitrate KBPS` returns the row whose `bitrate_kbps` is closest to `KBPS` (absolute distance, ties broken by smaller CRF).

The two flags are mutually exclusive at the argparse layer (exit code 2 when both are passed). Default output is a single human-readable line on stdout; `--json` switches to the full corpus row as a JSON object. No schema bump — `recommend` is a pure consumer of `CORPUS_ROW_KEYS`.

13-test suite under `tools/vmaf-tune/tests/test_recommend.py` covers predicate semantics, encoder/preset filtering, NaN/failed-encode rejection, and CLI exit codes; mocks all binaries so it runs in <100 ms.

Six ADR-0108 deliverables:

1. Research digest — Research-0061 (parent, in flight via PR #354).
2. Decision matrix — no alternatives: only-one-way fix; the audit's ranked-by-impact/effort table is the alternatives matrix.
3. AGENTS.md invariant note — added in tools/vmaf-tune/AGENTS.md.
4. Reproducer / smoke test — pytest tools/vmaf-tune/tests/.
5. CHANGELOG.md — Added entry.
6. docs/rebase-notes.md — entry 0229.

Parent ADR: ADR-0237 (vmaf-tune umbrella).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
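The two `recommend` predicates are simple enough to sketch over a list of corpus rows. The function names and the `(row, met)` return shape are hypothetical, not the fork's API. The selection rules follow the commit message as written, and the NaN filter is a simplified stand-in for the "NaN/failed-encode rejection" the tests are said to cover.

```python
import math


def _usable(row):
    """Reject rows whose score or bitrate is missing or NaN."""
    vmaf, kbps = row.get("vmaf_score"), row.get("bitrate_kbps")
    return (
        isinstance(vmaf, (int, float)) and isinstance(kbps, (int, float))
        and not math.isnan(vmaf) and not math.isnan(kbps)
    )


def pick_target_vmaf(rows, target):
    """Smallest-CRF row with vmaf_score >= target, else the closest miss.

    Returns (row, met); met is False when no row clears the bar, which
    corresponds to the (UNMET) annotation.
    """
    rows = [r for r in rows if _usable(r)]
    if not rows:
        raise ValueError("no usable corpus rows")
    hits = [r for r in rows if r["vmaf_score"] >= target]
    if hits:
        return min(hits, key=lambda r: r["crf"]), True
    return max(rows, key=lambda r: r["vmaf_score"]), False


def pick_target_bitrate(rows, kbps):
    """Row whose bitrate is closest to kbps; ties go to the smaller CRF."""
    rows = [r for r in rows if _usable(r)]
    if not rows:
        raise ValueError("no usable corpus rows")
    return min(rows, key=lambda r: (abs(r["bitrate_kbps"] - kbps), r["crf"]))
```

Because both functions only read existing row keys, they illustrate why the commit needed no schema bump: the subcommand is a filter over data the corpus already records.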
lusoris
added a commit
that referenced
this pull request
May 5, 2026
… flags (#358) * feat(tools): vmaf-tune recommend — --target-vmaf and --target-bitrate flags Implements Buckets 4 + 5 from Research-0061 (vmaf-tune capability audit). Adds a new `recommend` subcommand on `tools/vmaf-tune/` that consumes the Phase A corpus (either pre-built JSONL via `--from-corpus` or generated on the fly from `--source` + grid flags) and applies a user-supplied predicate over the existing `(crf, preset, vmaf_score, bitrate_kbps)` rows. - `--target-vmaf T` returns the row with the smallest CRF whose `vmaf_score >= T`. Falls back to the closest miss (highest VMAF) when no row clears the bar, with the predicate annotated `(UNMET)`. - `--target-bitrate KBPS` returns the row whose `bitrate_kbps` is closest to `KBPS` (absolute distance, ties broken by smaller CRF). The two flags are mutually exclusive at the argparse layer (exit code 2 when both are passed). Default output is a single human-readable line on stdout; `--json` switches to the full corpus row as a JSON object. No schema bump — `recommend` is a pure consumer of `CORPUS_ROW_KEYS`. 13-test suite under `tools/vmaf-tune/tests/test_recommend.py` covers predicate semantics, encoder/preset filtering, NaN/failed-encode rejection, and CLI exit codes; mocks all binaries so it runs in <100 ms. Six ADR-0108 deliverables: 1. Research digest — Research-0061 (parent, in flight via PR #354). 2. Decision matrix — no alternatives: only-one-way fix; the audit's ranked-by-impact/effort table is the alternatives matrix. 3. AGENTS.md invariant note — added in tools/vmaf-tune/AGENTS.md. 4. Reproducer / smoke test — pytest tools/vmaf-tune/tests/. 5. CHANGELOG.md — Added entry. 6. docs/rebase-notes.md — entry 0229. Parent ADR: ADR-0237 (vmaf-tune umbrella). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(ci): trigger workflow re-run Co-Authored-By: Claude <noreply@anthropic.com> * fix(tools): close fast.add_argument before rec parser (rebase residue) The marker-strip during the #358 rebase dropped the closing ')' between fast.add_argument and rec = sub.add_parser, leaving cli.py unparseable. Black + ruff hard-failed in CI. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
added a commit
that referenced
this pull request
May 5, 2026
* feat(tools): vmaf-tune — x265 codec adapter (ADR-0276) Adds the first sibling codec to the ADR-0237 Phase A `libx264` scaffold: a one-file `X265Adapter` mirroring the `x264.py` shape (10 presets including `placebo`, 0..51 CRF window, `profile_for(pix_fmt)` helper that maps `yuv420p10le` → `main10` for downstream HDR work). Registered under `libx265` in `codec_adapters/__init__.py`; CLI `--encoder` now accepts `libx264 | libx265` via `choices=list(known_codecs())`. `encode.parse_versions` gains an encoder-aware regex so corpus rows record `libx265-<version>` correctly (default remains `libx264` for backward compatibility). No `SCHEMA_VERSION` bump — the existing `encoder` row column already carries codec identity. Phase B/C consumers receive the new codec without any contract change. 14 new subprocess-mocked smoke tests under `tools/vmaf-tune/tests/test_codec_adapter_x265.py` (29 of 30 vmaf-tune tests pass green; the one skipped case is the real-binary integration test gated on `VMAF_TUNE_INTEGRATION=1`). Unblocks ADR-0235 codec-aware FR regressor and PR #354 audit's buckets #6 (bitrate-ladder), #7 (codec-comparison), #9 (HDR), #15 (Pareto). Six deep-dive deliverables (ADR-0108): 1. research digest: no digest needed — trivial mirror of `x264.py`; alternatives matrix is exhausted in ADR-0276. 2. decision matrix: ADR-0276 §Alternatives considered. 3. AGENTS.md invariant note: tools/vmaf-tune/AGENTS.md updated to document the per-codec banner-regex carve-out in `parse_versions` and the wired-codecs list (libx264, libx265). 4. reproducer: `python -m pytest tools/vmaf-tune/tests/`. 5. CHANGELOG: changelog.d/added/ADR-0276-vmaf-tune-x265-adapter.md. 6. rebase-notes: docs/rebase-notes.md entry 0228. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: re-trigger CI after deliverables canonicalised * chore: re-trigger CI after research-digest opt-out * chore(tools): black format test_corpus.py --------- Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
added a commit
that referenced
this pull request
May 5, 2026
…ts (#363) Adds tools/vmaf-tune/src/vmaftune/resolution.py exposing select_vmaf_model_version (height>=2160 -> vmaf_4k_v0.6.1, else vmaf_v0.6.1), select_vmaf_model (Path), and crf_offset_for_resolution (4K -2, 1080p 0, 720p +2, sub-720p +4). corpus.iter_rows auto-picks the model per encode resolution; CLI gains --resolution-aware / --no-resolution-aware (default on). Emitted JSONL row's vmaf_model field now records the *effective* model used per row, not the global option — required for mixed-ladder corpora to be unambiguous downstream. Closes PR #354 audit Bucket #8. Decision rule mirrors Netflix's published guidance. Refs: ADR-0237 (parent), ADR-0280 (this), Research-0054. Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
added a commit
that referenced
this pull request
May 5, 2026
…er) (#371) * feat(tools): vmaf-tune Phase E — per-title bitrate ladder (game-changer) Scaffolds the Phase E ladder generator (ADR-0277) — the highest-leverage gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the Netflix per-title encoding paper: sample (resolution × target-VMAF), take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along the hull, emit an HLS / DASH / JSON manifest. Currently scaffold-only: the production sampler that drives Phase B's target-VMAF bisect (PR #347) lands once that PR merges. Default sampler raises NotImplementedError; tests inject a synthetic stub modelled on the Netflix paper's R-D curves. - New module tools/vmaf-tune/src/vmaftune/ladder.py — build_ladder, convex_hull (Pareto filter + diminishing-returns envelope), select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH / JSON), and a build_and_emit convenience. - New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung 1080p/720p/480p/360p/240p default rendition set. - 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull correctness on a synthetic Netflix-paper-shaped cloud, knee selection invariants, and HLS / DASH / JSON manifest emit shape. - ADR-0277 (Proposed; flips to Accepted once Phase B integration PR lands and a real-corpus PLCC validation digest reports the delta). - Research-0054 surveys the algorithm space (Netflix per-title paper, Apple HLS authoring spec, JND-spaced, BO sampling). - docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section with the canonical invocation. - CHANGELOG, rebase-notes (#229), AGENTS.md invariant note. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(docs): renumber phase-e ADR 0277→0295 + research 0066→0068 (collisions) --------- Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
pushed a commit
that referenced
this pull request
May 5, 2026
…al scaffold)

Adds a probabilistic head on top of the codec-aware fr_regressor_v2 (parent: ADR-0272 / PR #347 in flight) so producers can drive the in-flight `vmaf-tune --quality-confidence 0.95` flag (ADR-0237) off a calibrated prediction interval instead of v2's bare MOS scalar. PR #354 audit Bucket #18 (top-3 ranked).

Trainer (`ai/scripts/train_fr_regressor_v2_ensemble.py`) trains N=5 copies of the v2 architecture (`FRRegressor(num_codecs=NUM_CODECS)`) under distinct seeds, exports each as a separate two-input ONNX (`features [N, 6]` + `codec_onehot [N, NUM_CODECS]`), and writes an ensemble manifest sidecar that pins per-member sha256s, feature standardisation, codec vocab, nominal coverage, and an optional split-conformal residual quantile from a held-out calibration split. Inference rule is `mu ± q · σ` with `q = 1.96` (Gaussian) or the empirical conformal quantile (Vovk 2005, Romano 2019: distribution-free marginal coverage on exchangeable data).

Evaluator (`ai/scripts/eval_probabilistic_proxy.py`) reports empirical coverage at 50/80/95 % nominal levels, mean interval width, and the mean-prediction PLCC; reports the conformal-interval row when the manifest carries a conformal scalar.

Smoke-only ship: synthetic 100-row corpus, 1 epoch / member. Production training is gated on the multi-codec Phase A corpus (T7-FR-REGRESSOR-V2-PROBABILISTIC).

Six ADR-0108 deliverables:
1. Research digest: docs/research/0054-fr-regressor-v2-probabilistic.md.
2. Decision matrix: ADR-0279 § Alternatives considered.
3. AGENTS.md invariant note: appended to ai/AGENTS.md.
4. Reproducer: `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` followed by `python ai/scripts/eval_probabilistic_proxy.py --smoke`.
5. CHANGELOG `### Added` entry under Unreleased (lusoris fork).
6. Rebase-notes entry: `### 0229` in docs/rebase-notes.md.

Test plan:
- `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` produces 5 valid two-input ONNX members + manifest sidecar (ran locally).
- `python ai/scripts/eval_probabilistic_proxy.py --smoke` aggregates the 5 ONNX outputs into (mu, sigma) and reports coverage at 50/80/95 %.
- `python ai/scripts/validate_model_registry.py` → 15 entries valid.
- `pre-commit run --files <changed>` → Passed (black / isort / ruff / json-check / secrets / semgrep).
- `markdownlint-cli2` on all new docs → 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
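The `mu ± q · σ` inference rule described in this commit message can be illustrated with a minimal, standard-library-only sketch. The function names, the use of the sample standard deviation, and the σ-normalised conformal score are assumptions made for illustration; the actual scripts may aggregate members or calibrate differently.

```python
import math
from statistics import mean, stdev


def ensemble_interval(member_preds, q=1.96):
    """Per-row (mu, sigma, low, high) from ensemble member predictions.

    member_preds holds one list of per-row predictions per member
    (hypothetical layout). sigma is the sample standard deviation across
    members, which is an assumption of this sketch.
    """
    out = []
    for row in zip(*member_preds):
        mu = mean(row)
        sigma = stdev(row) if len(row) > 1 else 0.0
        out.append((mu, sigma, mu - q * sigma, mu + q * sigma))
    return out


def conformal_quantile(cal_mu, cal_sigma, cal_y, coverage=0.95, eps=1e-8):
    """Split-conformal q from a held-out calibration split.

    Scores are |y - mu| / sigma (assumed normalisation). The k-th smallest
    score with k = ceil((n + 1) * coverage) is returned; when the split is
    too small for the requested coverage the result is infinity.
    """
    scores = sorted(
        abs(y - m) / max(s, eps) for m, s, y in zip(cal_mu, cal_sigma, cal_y)
    )
    n = len(scores)
    k = math.ceil((n + 1) * coverage)
    return scores[k - 1] if k <= n else math.inf
```

Passing the value returned by `conformal_quantile` as `q` to `ensemble_interval` swaps the Gaussian 1.96 for the empirical quantile, which is the substitution the commit message describes.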
lusoris
added a commit
that referenced
this pull request
May 5, 2026
…al scaffold) (#372)

* feat(ai): fr_regressor_v2 probabilistic head (deep-ensemble + conformal scaffold)

Adds a probabilistic head on top of the codec-aware fr_regressor_v2 (parent: ADR-0272 / PR #347 in flight) so producers can drive the in-flight `vmaf-tune --quality-confidence 0.95` flag (ADR-0237) off a calibrated prediction interval instead of v2's bare MOS scalar. PR #354 audit Bucket #18 (top-3 ranked).

Trainer (`ai/scripts/train_fr_regressor_v2_ensemble.py`) trains N=5 copies of the v2 architecture (`FRRegressor(num_codecs=NUM_CODECS)`) under distinct seeds, exports each as a separate two-input ONNX (`features [N, 6]` + `codec_onehot [N, NUM_CODECS]`), and writes an ensemble manifest sidecar that pins per-member sha256s, feature standardisation, codec vocab, nominal coverage, and an optional split-conformal residual quantile from a held-out calibration split. Inference rule is `mu ± q · σ` with `q = 1.96` (Gaussian) or the empirical conformal quantile (Vovk 2005, Romano 2019: distribution-free marginal coverage on exchangeable data).

Evaluator (`ai/scripts/eval_probabilistic_proxy.py`) reports empirical coverage at 50/80/95 % nominal levels, mean interval width, and the mean-prediction PLCC; reports the conformal-interval row when the manifest carries a conformal scalar.

Smoke-only ship: synthetic 100-row corpus, 1 epoch / member. Production training is gated on the multi-codec Phase A corpus (T7-FR-REGRESSOR-V2-PROBABILISTIC).

Six ADR-0108 deliverables:
1. Research digest: docs/research/0054-fr-regressor-v2-probabilistic.md.
2. Decision matrix: ADR-0279 § Alternatives considered.
3. AGENTS.md invariant note: appended to ai/AGENTS.md.
4. Reproducer: `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` followed by `python ai/scripts/eval_probabilistic_proxy.py --smoke`.
5. CHANGELOG `### Added` entry under Unreleased (lusoris fork).
6. Rebase-notes entry: `### 0229` in docs/rebase-notes.md.

Test plan:
- `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` produces 5 valid two-input ONNX members + manifest sidecar (ran locally).
- `python ai/scripts/eval_probabilistic_proxy.py --smoke` aggregates the 5 ONNX outputs into (mu, sigma) and reports coverage at 50/80/95 %.
- `python ai/scripts/validate_model_registry.py` → 15 entries valid.
- `pre-commit run --files <changed>` → Passed (black / isort / ruff / json-check / secrets / semgrep).
- `markdownlint-cli2` on all new docs → 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(registry): split fr_regressor_v2 + ensemble_seed0 into distinct entries

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
pushed a commit
that referenced
this pull request
May 5, 2026
Wires the fork-trained `saliency_student_v1` ONNX model (ADR-0286 / PR #359) into vmaf-tune so a single command can produce an encode that biases bits toward salient regions.

New surfaces:
- `tools/vmaf-tune/src/vmaftune/saliency.py`: pure-NumPy signal-blend pipeline (sample frames -> ImageNet-normalised RGB -> ONNX inference -> per-pixel saliency mean -> per-MB QP-offset map clamped to ±12 -> x264 ASCII qpfile).
- `vmaf-tune recommend --saliency-aware [--saliency-offset -4] [--saliency-model PATH]` CLI subcommand. Falls back to a plain encode when onnxruntime or the model file is unavailable; the flag surface is wired so Phase B (target-VMAF bisect) can drop in without renaming flags.
- 13 unit tests under `tests/test_saliency.py` mocking both the ONNX session and the encode runner; they run without onnxruntime or ffmpeg installed.

Bucket #2 of the PR #354 audit.

Six ADR-0108 deliverables:
1. Research-0046 (digest)
2. ADR-0287 §"Alternatives considered" (decision matrix)
3. tools/vmaf-tune/AGENTS.md (saliency invariant)
4. `pytest tools/vmaf-tune/tests/test_saliency.py -v` (smoke)
5. changelog.d/added/T-VMAF-TUNE-saliency-aware.md
6. docs/rebase-notes.md §0287

User docs: docs/usage/vmaf-tune.md §"Saliency-aware encoding".
Decision: ADR-0287.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
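As a rough illustration of the "per-pixel saliency mean -> per-MB QP-offset map clamped to ±12" stage named above, the following pure-Python sketch averages saliency over 16×16 macroblocks and maps each mean to a QP offset. The function name and the linear mapping `offset * (2s - 1)` are assumptions for illustration only; the real `saliency.py` is NumPy-based and may blend the signal differently. Writing the qpfile is deliberately left out.

```python
def mb_qp_offsets(saliency, offset=-4.0, mb=16, clamp=12):
    """Map a per-pixel saliency grid (values in [0, 1]) to per-macroblock QP offsets.

    A fully salient block receives `offset` (negative means lower QP, so
    more bits) and a fully non-salient block receives `-offset`. This
    linear mapping is a hypothetical choice, not the tool's confirmed rule.
    """
    height, width = len(saliency), len(saliency[0])
    rows = []
    for y0 in range(0, height, mb):
        row = []
        for x0 in range(0, width, mb):
            # Collect the (possibly partial) macroblock at the frame edge.
            block = [
                saliency[y][x]
                for y in range(y0, min(y0 + mb, height))
                for x in range(x0, min(x0 + mb, width))
            ]
            s = sum(block) / len(block)
            qp = round(offset * (2.0 * s - 1.0))
            row.append(max(-clamp, min(clamp, qp)))
        rows.append(row)
    return rows
```

With the default `offset=-4.0`, the clamp only takes effect when a caller passes a magnitude above 12.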
lusoris
pushed a commit
that referenced
this pull request
May 5, 2026
…#9, ADR-0261)

Closes Bucket #9 of the PR #354 vmaf-tune capability audit. Adds ffprobe-driven HDR detection, codec-specific HDR encode flag dispatch, and HDR-VMAF model resolution to the Phase A corpus driver.

New module ``tools/vmaf-tune/src/vmaftune/hdr.py``:
- ``detect_hdr(path)``: runs ``ffprobe -show_streams -of json``, classifies the first video stream as PQ / HLG / SDR. Strict BT.2020-primaries gate so malformed signaling falls back to SDR.
- ``hdr_codec_args(encoder, info)``: per-encoder dispatch table covering libx264 (container ``-color_*``), libx265 (``-x265-params`` with master-display + max-cll + hdr10-opt), libsvtav1 (AV1 enums via ``-svtav1-params``), hevc_nvenc (``-pix_fmt p010le -profile:v main10``), libvvenc.
- ``select_hdr_vmaf_model()``: globs ``model/vmaf_hdr_*.json``; returns ``None`` when none shipped (current state: fork hasn't ported Netflix's HDR model yet).

Corpus driver wiring:
- ``CorpusOptions.hdr_mode`` ∈ {``auto``, ``force-sdr``, ``force-hdr-pq``, ``force-hdr-hlg``}; CLI flags ``--auto-hdr`` / ``--force-sdr`` / ``--force-hdr-pq`` / ``--force-hdr-hlg`` (mutually exclusive). Auto is the default.
- New schema-v2 row keys ``hdr_transfer`` / ``hdr_primaries`` / ``hdr_forced``; ``SCHEMA_VERSION`` bumped 1 → 2. Phase B / C loaders treat missing keys as SDR (additive change, v1 rows remain readable).
- ``score._model_arg`` now passes pre-formatted ``path=`` / ``version=`` strings through unchanged so the HDR model path can be injected via ``vmaf --model``.
- HDR detected but no HDR model shipped → log warning, fall back to SDR model with notice that scores trend low.

Tests (``tools/vmaf-tune/tests/test_hdr.py``, 21 cases):
- detection: SDR / PQ / HLG / mismatched-primaries / missing-file / ffprobe-failure / invalid-JSON
- codec dispatch: shape per encoder (x264, x265 PQ + HLG, SVT-AV1 PQ + HLG, NVENC HEVC, unknown encoder)
- model resolution: empty dir / shipped / multi-version pick-latest / missing dir
- corpus integration: end-to-end ``force-hdr-pq`` (verify HDR fields in row + ``-color_*`` in encode argv) and ``force-sdr``

ADR-0261 (Accepted, encode-side; HDR-VMAF scoring deferred until fork-local model port). Research-0054 digest, rebase-notes 0261, AGENTS.md invariant note, docs/usage/vmaf-tune.md HDR section, changelog fragment all included.
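The detection rule described for ``detect_hdr`` (first video stream, strict BT.2020-primaries gate, PQ / HLG / SDR classes) might look roughly like the sketch below. It works on an already-captured ``ffprobe -show_streams -of json`` payload so it runs without ffprobe. The function name ``classify_hdr`` and the choice to return SDR on unparseable input are assumptions; ``color_primaries`` and ``color_transfer`` and the values ``bt2020``, ``smpte2084`` and ``arib-std-b67`` are the field names and values ffprobe reports for these cases.

```python
import json


def classify_hdr(ffprobe_json: str) -> str:
    """Classify the first video stream as 'pq', 'hlg' or 'sdr'.

    Hypothetical helper: anything that is not cleanly signalled as
    BT.2020 primaries plus a PQ or HLG transfer is treated as SDR.
    """
    try:
        streams = json.loads(ffprobe_json).get("streams", [])
    except (json.JSONDecodeError, AttributeError):
        return "sdr"  # assumed fallback for invalid payloads
    video = next(
        (s for s in streams if isinstance(s, dict) and s.get("codec_type") == "video"),
        None,
    )
    # Strict primaries gate: a PQ/HLG transfer without bt2020 primaries is SDR.
    if video is None or video.get("color_primaries") != "bt2020":
        return "sdr"
    transfer_to_class = {"smpte2084": "pq", "arib-std-b67": "hlg"}
    return transfer_to_class.get(video.get("color_transfer"), "sdr")
```

Keeping the classifier separate from the subprocess call is one way to make the mismatched-primaries and invalid-JSON cases testable without the binary.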
lusoris
added a commit
that referenced
this pull request
May 5, 2026
* feat(tools): vmaf-tune - HDR-aware encoding + HDR-VMAF scoring (Bucket #9, ADR-0261)

Closes Bucket #9 of the PR #354 vmaf-tune capability audit. Adds ffprobe-driven HDR detection, codec-specific HDR encode flag dispatch, and HDR-VMAF model resolution to the Phase A corpus driver.

New module ``tools/vmaf-tune/src/vmaftune/hdr.py``:
- ``detect_hdr(path)``: runs ``ffprobe -show_streams -of json``, classifies the first video stream as PQ / HLG / SDR. Strict BT.2020-primaries gate so malformed signaling falls back to SDR.
- ``hdr_codec_args(encoder, info)``: per-encoder dispatch table covering libx264 (container ``-color_*``), libx265 (``-x265-params`` with master-display + max-cll + hdr10-opt), libsvtav1 (AV1 enums via ``-svtav1-params``), hevc_nvenc (``-pix_fmt p010le -profile:v main10``), libvvenc.
- ``select_hdr_vmaf_model()``: globs ``model/vmaf_hdr_*.json``; returns ``None`` when none shipped (current state: fork hasn't ported Netflix's HDR model yet).

Corpus driver wiring:
- ``CorpusOptions.hdr_mode`` ∈ {``auto``, ``force-sdr``, ``force-hdr-pq``, ``force-hdr-hlg``}; CLI flags ``--auto-hdr`` / ``--force-sdr`` / ``--force-hdr-pq`` / ``--force-hdr-hlg`` (mutually exclusive). Auto is the default.
- New schema-v2 row keys ``hdr_transfer`` / ``hdr_primaries`` / ``hdr_forced``; ``SCHEMA_VERSION`` bumped 1 → 2. Phase B / C loaders treat missing keys as SDR (additive change, v1 rows remain readable).
- ``score._model_arg`` now passes pre-formatted ``path=`` / ``version=`` strings through unchanged so the HDR model path can be injected via ``vmaf --model``.
- HDR detected but no HDR model shipped → log warning, fall back to SDR model with notice that scores trend low.

Tests (``tools/vmaf-tune/tests/test_hdr.py``, 21 cases):
- detection: SDR / PQ / HLG / mismatched-primaries / missing-file / ffprobe-failure / invalid-JSON
- codec dispatch: shape per encoder (x264, x265 PQ + HLG, SVT-AV1 PQ + HLG, NVENC HEVC, unknown encoder)
- model resolution: empty dir / shipped / multi-version pick-latest / missing dir
- corpus integration: end-to-end ``force-hdr-pq`` (verify HDR fields in row + ``-color_*`` in encode argv) and ``force-sdr``

ADR-0261 (Accepted, encode-side; HDR-VMAF scoring deferred until fork-local model port). Research-0054 digest, rebase-notes 0261, AGENTS.md invariant note, docs/usage/vmaf-tune.md HDR section, changelog fragment all included.

* chore(docs): renumber hdr-aware ADR/research to dodge collisions (0295→0300, 0261→0300, 0071→0072)

* fix(tools): drop duplicate vmaf_model dict key in corpus.py

---------

Co-authored-by: Lusoris <lusoris@pm.me>
This was referenced May 6, 2026
lusoris
added a commit
that referenced
this pull request
May 6, 2026
* feat(tools): vmaf-tune - saliency-aware ROI encoding (Bucket #2)

Wires the fork-trained `saliency_student_v1` ONNX model (ADR-0286 / PR #359) into vmaf-tune so a single command can produce an encode that biases bits toward salient regions.

New surfaces:
- `tools/vmaf-tune/src/vmaftune/saliency.py`: pure-NumPy signal-blend pipeline (sample frames -> ImageNet-normalised RGB -> ONNX inference -> per-pixel saliency mean -> per-MB QP-offset map clamped to ±12 -> x264 ASCII qpfile).
- `vmaf-tune recommend --saliency-aware [--saliency-offset -4] [--saliency-model PATH]` CLI subcommand. Falls back to a plain encode when onnxruntime or the model file is unavailable; the flag surface is wired so Phase B (target-VMAF bisect) can drop in without renaming flags.
- 13 unit tests under `tests/test_saliency.py` mocking both the ONNX session and the encode runner; they run without onnxruntime or ffmpeg installed.

Bucket #2 of the PR #354 audit.

Six ADR-0108 deliverables:
1. Research-0046 (digest)
2. ADR-0287 §"Alternatives considered" (decision matrix)
3. tools/vmaf-tune/AGENTS.md (saliency invariant)
4. `pytest tools/vmaf-tune/tests/test_saliency.py -v` (smoke)
5. changelog.d/added/T-VMAF-TUNE-saliency-aware.md
6. docs/rebase-notes.md §0287

User docs: docs/usage/vmaf-tune.md §"Saliency-aware encoding".
Decision: ADR-0287.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(vmaf-tune): CHANGELOG fragment for recommend-saliency CLI

ADR-0108 deliverables-checklist gate on PR #432 wants the fragment to be added in this PR's diff, not just present on master. The existing fragment ``T-VMAF-TUNE-saliency-aware.md`` covers the saliency engine that's already merged; this new fragment covers the CLI subcommand specifically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
pushed a commit
that referenced
this pull request
May 7, 2026
Scaffolds the Phase E ladder generator (ADR-0277), the highest-leverage gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the Netflix per-title encoding paper: sample (resolution × target-VMAF), take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along the hull, emit an HLS / DASH / JSON manifest.

Currently scaffold-only: the production sampler that drives Phase B's target-VMAF bisect (PR #347) lands once that PR merges. Default sampler raises NotImplementedError; tests inject a synthetic stub modelled on the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py: build_ladder, convex_hull (Pareto filter + diminishing-returns envelope), select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung 1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull correctness on a synthetic Netflix-paper-shaped cloud, knee selection invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once Phase B integration PR lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper, Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
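The "Pareto filter + diminishing-returns envelope" step on (bitrate, vmaf) points can be sketched as below. This is a generic Pareto filter followed by a monotone-chain upper hull, not the code in `ladder.py`; the function name `pareto_upper_hull` and the tuple layout are assumptions.

```python
def _cross(o, a, b):
    """z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def pareto_upper_hull(points):
    """Return the upper-convex hull of the Pareto-optimal (bitrate, vmaf) points.

    Step 1 drops dominated points (another point has no more bitrate and
    at least as much VMAF). Step 2 keeps only points on the concave
    envelope, so the VMAF gained per extra bit never increases along the
    result.
    """
    ordered = sorted(set(points), key=lambda p: (p[0], -p[1]))
    frontier = []
    for bitrate, vmaf in ordered:
        if not frontier or vmaf > frontier[-1][1]:
            frontier.append((bitrate, vmaf))
    hull = []
    for p in frontier:
        # Pop points that lie on or below the chord to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

Rung selection (log-bitrate or VMAF spacing) would then pick n points from the returned list; that step is not sketched here.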
lusoris
added a commit
that referenced
this pull request
May 7, 2026
…er) (#433)

Scaffolds the Phase E ladder generator (ADR-0277), the highest-leverage gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the Netflix per-title encoding paper: sample (resolution × target-VMAF), take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along the hull, emit an HLS / DASH / JSON manifest.

Currently scaffold-only: the production sampler that drives Phase B's target-VMAF bisect (PR #347) lands once that PR merges. Default sampler raises NotImplementedError; tests inject a synthetic stub modelled on the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py: build_ladder, convex_hull (Pareto filter + diminishing-returns envelope), select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung 1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull correctness on a synthetic Netflix-paper-shaped cloud, knee selection invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once Phase B integration PR lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper, Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
added a commit
that referenced
this pull request
May 7, 2026
…#9, ADR-0261) (#434)

Closes Bucket #9 of the PR #354 vmaf-tune capability audit. Adds ffprobe-driven HDR detection, codec-specific HDR encode flag dispatch, and HDR-VMAF model resolution to the Phase A corpus driver.

New module ``tools/vmaf-tune/src/vmaftune/hdr.py``:
- ``detect_hdr(path)``: runs ``ffprobe -show_streams -of json``, classifies the first video stream as PQ / HLG / SDR. Strict BT.2020-primaries gate so malformed signaling falls back to SDR.
- ``hdr_codec_args(encoder, info)``: per-encoder dispatch table covering libx264 (container ``-color_*``), libx265 (``-x265-params`` with master-display + max-cll + hdr10-opt), libsvtav1 (AV1 enums via ``-svtav1-params``), hevc_nvenc (``-pix_fmt p010le -profile:v main10``), libvvenc.
- ``select_hdr_vmaf_model()``: globs ``model/vmaf_hdr_*.json``; returns ``None`` when none shipped (current state: fork hasn't ported Netflix's HDR model yet).

Corpus driver wiring:
- ``CorpusOptions.hdr_mode`` ∈ {``auto``, ``force-sdr``, ``force-hdr-pq``, ``force-hdr-hlg``}; CLI flags ``--auto-hdr`` / ``--force-sdr`` / ``--force-hdr-pq`` / ``--force-hdr-hlg`` (mutually exclusive). Auto is the default.
- New schema-v2 row keys ``hdr_transfer`` / ``hdr_primaries`` / ``hdr_forced``; ``SCHEMA_VERSION`` bumped 1 → 2. Phase B / C loaders treat missing keys as SDR (additive change, v1 rows remain readable).
- ``score._model_arg`` now passes pre-formatted ``path=`` / ``version=`` strings through unchanged so the HDR model path can be injected via ``vmaf --model``.
- HDR detected but no HDR model shipped → log warning, fall back to SDR model with notice that scores trend low.

Tests (``tools/vmaf-tune/tests/test_hdr.py``, 21 cases):
- detection: SDR / PQ / HLG / mismatched-primaries / missing-file / ffprobe-failure / invalid-JSON
- codec dispatch: shape per encoder (x264, x265 PQ + HLG, SVT-AV1 PQ + HLG, NVENC HEVC, unknown encoder)
- model resolution: empty dir / shipped / multi-version pick-latest / missing dir
- corpus integration: end-to-end ``force-hdr-pq`` (verify HDR fields in row + ``-color_*`` in encode argv) and ``force-sdr``

ADR-0261 (Accepted, encode-side; HDR-VMAF scoring deferred until fork-local model port). Research-0054 digest, rebase-notes 0261, AGENTS.md invariant note, docs/usage/vmaf-tune.md HDR section, changelog fragment all included.

Co-authored-by: Lusoris <lusoris@pm.me>
Summary
Pure-research scoping pass: audits the `vmaf-tune` capability surface
beyond Phase A (corpus tooling, PR #329 merged), Phase B
(`fr_regressor_v2` codec-aware proxy, in flight via PR #347), and the
fast-path proposal (proxy + Bayesian + GPU verify, parallel PR being
scaffolded right now).
Surveys 18 capability buckets, ranks them by impact ÷ effort, and
recommends an execution order for Phases C–H. No code; the digest
extends Research-0044 and feeds ADR-0237's umbrella roadmap.
Top-3 next steps (impact ÷ effort): (1) bitrate-budget mode +
quality-floor mode (tied; both S effort, both reuse Phase B's
bisect with a different predicate), (2) bitrate-ladder optimisation
(Phase E, L effort, game-changer), (3) per-shot CRF tuning
(Phase D, on roadmap, M effort).
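To make "reuse Phase B's bisect with a different predicate" concrete, here is a hedged sketch of one integer bisect shared by a bitrate-budget mode and a quality-floor mode. Phase B's real bisect is not shown in this PR; the search over CRF 0..51, the helper names, and the assumption that bitrate and VMAF both fall monotonically as CRF rises are all illustrative.

```python
def first_true(lo, hi, pred):
    """Smallest integer in [lo, hi] where pred holds, or None.

    pred must be monotone over the range: False ... False True ... True.
    """
    if not pred(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def crf_for_budget(bitrate_at, budget_kbps, lo=0, hi=51):
    """Budget mode: lowest CRF (best quality) whose bitrate fits the budget."""
    return first_true(lo, hi, lambda crf: bitrate_at(crf) <= budget_kbps)


def crf_for_floor(vmaf_at, floor, lo=0, hi=51):
    """Quality-floor mode: highest CRF (fewest bits) that still meets the floor."""
    first_bad = first_true(lo, hi, lambda crf: vmaf_at(crf) < floor)
    if first_bad is None:
        return hi  # every CRF in range meets the floor
    return first_bad - 1 if first_bad > lo else None
```

In practice `bitrate_at` and `vmaf_at` would wrap an encode-and-score call (or the proxy model), so each predicate evaluation is expensive and the logarithmic number of probes is the point of bisecting.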
Game-changer: bitrate-ladder optimisation re-frames the fork from
"best open-source VMAF measurement" to "only open-source per-title
ladder generator with measured-PLCC proxy".
Biggest blocker: codec adapter coverage. Five buckets degrade to
x264-only until x265 / SVT-AV1 / libaom / libvvenc adapters land.
Recommend running adapter PRs in parallel with Phase D so multi-codec
capabilities don't all stack at the end.
Type
`docs`: documentation only
Checklist
Bug-status hygiene
Netflix golden-data gate
`assertAlmostEqual` score.
Deep-dive deliverables (ADR-0108)
`docs/research/0054-vmaf-tune-capability-audit.md` (this PR).
no alternatives: research-only scoping, no decision is being made here. Decisions land in the per-phase ADRs that will consume this digest.
no rebase-sensitive invariants: docs-only.
no CHANGELOG needed: research-only digest, not a user-visible change.
no rebase impact: docs-only addition under `docs/research/`.
Reproducer
Out of scope