
research(tools): vmaf-tune capability audit — what else can it do?#354

Merged
lusoris merged 2 commits into master from
research/vmaf-tune-capability-audit
May 4, 2026


@lusoris
Owner

@lusoris lusoris commented May 3, 2026

Summary

Pure-research scoping pass: audits the vmaf-tune capability surface
beyond Phase A (corpus tooling, PR #329 merged), Phase B
(fr_regressor_v2 codec-aware proxy, in flight via PR #347), and the
fast-path proposal (proxy + Bayesian + GPU verify, parallel PR being
scaffolded right now).

Surveys 18 capability buckets, ranks them by impact ÷ effort, and
recommends an execution order for Phases C–H. No code; the digest
extends Research-0044 and feeds ADR-0237's umbrella roadmap.

Top-3 next steps (impact ÷ effort): (1) bitrate-budget mode +
quality-floor mode (tied; both S effort, both reuse Phase B's
bisect with a different predicate), (2) bitrate-ladder optimisation
(Phase E, L effort, game-changer), (3) per-shot CRF tuning
(Phase D, on roadmap, M effort).

Game-changer: bitrate-ladder optimisation re-frames the fork from
"best open-source VMAF measurement" to "only open-source per-title
ladder generator with measured-PLCC proxy".

Biggest blocker: codec adapter coverage. Five buckets degrade to
x264-only until x265 / SVT-AV1 / libaom / libvvenc adapters land.
Recommend running adapter PRs in parallel with Phase D so multi-codec
capabilities don't all stack at the end.

Type

  • docs — documentation only

Checklist

  • Commits follow Conventional Commits.
  • No code touched (pure research digest).

Bug-status hygiene

  • no state delta: research scoping, no bug interaction.

Netflix golden-data gate

  • Did not modify any assertAlmostEqual score.

Deep-dive deliverables (ADR-0108)

  • (1) Research digest: docs/research/0054-vmaf-tune-capability-audit.md (this PR).
  • (2) Decision matrix
    no alternatives: research-only scoping, no decision is being made here. Decisions land in the per-phase ADRs that will consume this digest.
  • (3) AGENTS.md invariant note
    no rebase-sensitive invariants: docs-only.
  • (4) Reproducer / smoke-test command: see below under "Reproducer".
  • (5) CHANGELOG fragment
    no CHANGELOG needed: research-only digest, not a user-visible change.
  • (6) Rebase note
    no rebase impact: docs-only addition under docs/research/.

Reproducer

# Verify the digest renders and links resolve
mkdocs build --strict 2>&1 | grep -i "0054-vmaf-tune-capability-audit" || echo "renders clean"
# Or just:
ls docs/research/0054-vmaf-tune-capability-audit.md

Out of scope

  • No implementation.
  • No promised speedup numbers without back-of-envelope justification (every "X×" hypothesis is flagged as such in the digest).
  • No Netflix-internal data / services.

@lusoris lusoris force-pushed the research/vmaf-tune-capability-audit branch from 2375000 to e133d98 May 3, 2026 19:24
lusoris pushed a commit that referenced this pull request May 3, 2026
Adds the first sibling codec to the ADR-0237 Phase A `libx264` scaffold:
a one-file `X265Adapter` mirroring the `x264.py` shape (10 presets
including `placebo`, 0..51 CRF window, `profile_for(pix_fmt)` helper
that maps `yuv420p10le` → `main10` for downstream HDR work).
Registered under `libx265` in `codec_adapters/__init__.py`; CLI
`--encoder` now accepts `libx264 | libx265` via
`choices=list(known_codecs())`. `encode.parse_versions` gains an
encoder-aware regex so corpus rows record `libx265-<version>` correctly
(default remains `libx264` for backward compatibility).
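
The adapter shape described above can be sketched roughly as follows. This is a minimal illustration, not the code in the commit: the class name `X265AdapterSketch`, its field names, and the `validate` helper are hypothetical, and the fallback to `main` for any pixel format other than `yuv420p10le` is an assumption. Only the ten presets, the 0..51 CRF window, and the `yuv420p10le` to `main10` mapping come from the commit message.

```python
from dataclasses import dataclass

# The ten x265 presets, fastest to slowest.
X265_PRESETS = (
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow", "placebo",
)


@dataclass(frozen=True)
class X265AdapterSketch:
    """Hypothetical stand-in for the one-file adapter described above."""

    name: str = "libx265"
    crf_min: int = 0
    crf_max: int = 51
    presets: tuple = X265_PRESETS

    def validate(self, preset: str, crf: int) -> None:
        """Reject presets or CRF values outside the supported window."""
        if preset not in self.presets:
            raise ValueError(f"unknown preset: {preset!r}")
        if not self.crf_min <= crf <= self.crf_max:
            raise ValueError(f"crf {crf} outside {self.crf_min}..{self.crf_max}")

    def profile_for(self, pix_fmt: str) -> str:
        # Only yuv420p10le -> main10 is stated in the commit message;
        # returning "main" for everything else is an assumption.
        return "main10" if pix_fmt == "yuv420p10le" else "main"
```

Keeping the preset list and CRF bounds as plain data on the adapter is one way a CLI could derive its `--encoder` choices from a registry without per-codec branching.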

No `SCHEMA_VERSION` bump — the existing `encoder` row column already
carries codec identity. Phase B/C consumers receive the new codec
without any contract change.

14 new subprocess-mocked smoke tests under
`tools/vmaf-tune/tests/test_codec_adapter_x265.py` (29 of 30 vmaf-tune
tests pass green; the one skipped case is the real-binary integration
test gated on `VMAF_TUNE_INTEGRATION=1`).

Unblocks ADR-0235 codec-aware FR regressor and PR #354 audit's
buckets #6 (bitrate-ladder), #7 (codec-comparison), #9 (HDR), #15
(Pareto).

Six deep-dive deliverables (ADR-0108):
1. research digest: no digest needed — trivial mirror of `x264.py`;
   alternatives matrix is exhausted in ADR-0276.
2. decision matrix: ADR-0276 §Alternatives considered.
3. AGENTS.md invariant note: tools/vmaf-tune/AGENTS.md updated to
   document the per-codec banner-regex carve-out in `parse_versions`
   and the wired-codecs list (libx264, libx265).
4. reproducer: `python -m pytest tools/vmaf-tune/tests/`.
5. CHANGELOG: changelog.d/added/ADR-0276-vmaf-tune-x265-adapter.md.
6. rebase-notes: docs/rebase-notes.md entry 0228.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 3, 2026
Adds tools/vmaf-tune/src/vmaftune/resolution.py exposing
select_vmaf_model_version (height>=2160 -> vmaf_4k_v0.6.1, else
vmaf_v0.6.1), select_vmaf_model (Path), and crf_offset_for_resolution
(4K -2, 1080p 0, 720p +2, sub-720p +4).
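
A minimal sketch of the two selection rules named above. The function names match the commit message, but the bodies are illustrative: the commit gives the offsets per resolution label only, so the height boundaries used for the 1080p and 720p buckets (>= 1080 and >= 720) are assumptions.

```python
def select_vmaf_model_version(height: int) -> str:
    """Pick the 4K model for 2160-line (or taller) encodes, else the default."""
    return "vmaf_4k_v0.6.1" if height >= 2160 else "vmaf_v0.6.1"


def crf_offset_for_resolution(height: int) -> int:
    """Return the CRF offset for an encode height.

    Offsets (4K -2, 1080p 0, 720p +2, sub-720p +4) are from the commit
    message; the exact bucket boundaries below are assumed.
    """
    if height >= 2160:
        return -2
    if height >= 1080:
        return 0
    if height >= 720:
        return 2
    return 4
```

For example, a 720-line rung would be scored with `vmaf_v0.6.1` and have its CRF shifted by +2 under these assumptions.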

corpus.iter_rows auto-picks the model per encode resolution; CLI gains
--resolution-aware / --no-resolution-aware (default on). Emitted JSONL
row's vmaf_model field now records the *effective* model used per row,
not the global option — required for mixed-ladder corpora to be
unambiguous downstream.

Closes PR #354 audit Bucket #8. Decision rule mirrors Netflix's
published guidance.

Refs: ADR-0237 (parent), ADR-0280 (this), Research-0054.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 3, 2026
Scaffolds the Phase E ladder generator (ADR-0277) — the highest-leverage
gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the
Netflix per-title encoding paper: sample (resolution × target-VMAF),
take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along
the hull, emit an HLS / DASH / JSON manifest.
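
The hull step can be sketched as a Pareto filter followed by a monotone-chain upper envelope. This is an illustration under stated assumptions, not the `ladder.py` implementation: the function names `pareto_front` and `upper_hull` are hypothetical, points are plain `(bitrate_kbps, vmaf)` tuples with no resolution attached, and rung selection along the hull is not covered.

```python
def _cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def pareto_front(points):
    """Keep points that no cheaper-or-equal-bitrate point matches or beats on VMAF."""
    front = []
    best_vmaf = float("-inf")
    # Sort by bitrate; at equal bitrate the higher VMAF comes first.
    for bitrate, vmaf in sorted(points, key=lambda p: (p[0], -p[1])):
        if vmaf > best_vmaf:
            front.append((bitrate, vmaf))
            best_vmaf = vmaf
    return front


def upper_hull(points):
    """Upper convex envelope of the Pareto front (slopes strictly decreasing)."""
    hull = []
    for p in pareto_front(points):
        # Drop the previous point while it lies on or below the chord to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


if __name__ == "__main__":
    cloud = [(500, 60), (1000, 80), (1500, 70), (2000, 85), (3000, 95), (4000, 96)]
    print(upper_hull(cloud))
```

The envelope discards points such as `(2000, 85)` in the demo cloud: it is Pareto-optimal but sits below the chord between its neighbours, so it represents a worse bitrate-for-quality trade than interpolating between them.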

Currently scaffold-only: the production sampler that drives Phase B's
target-VMAF bisect (PR #347) lands once that PR merges. Default sampler
raises NotImplementedError; tests inject a synthetic stub modelled on
the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py — build_ladder,
  convex_hull (Pareto filter + diminishing-returns envelope),
  select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH
  / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung
  1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull
  correctness on a synthetic Netflix-paper-shaped cloud, knee selection
  invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once Phase B integration PR
  lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper,
  Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section
  with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 3, 2026
…al scaffold)

Adds a probabilistic head on top of the codec-aware fr_regressor_v2
(parent: ADR-0272 / PR #347 in flight) so producers can drive the
in-flight `vmaf-tune --quality-confidence 0.95` flag (ADR-0237) off a
calibrated prediction interval instead of v2's bare MOS scalar. PR #354
audit Bucket #18 (top-3 ranked).

Trainer (`ai/scripts/train_fr_regressor_v2_ensemble.py`) trains N=5
copies of the v2 architecture (`FRRegressor(num_codecs=NUM_CODECS)`)
under distinct seeds, exports each as a separate two-input ONNX
(`features [N, 6]` + `codec_onehot [N, NUM_CODECS]`), and writes an
ensemble manifest sidecar that pins per-member sha256s, feature
standardisation, codec vocab, nominal coverage, and an optional
split-conformal residual quantile from a held-out calibration split.
Inference rule is `mu ± q · σ` with `q = 1.96` (Gaussian) or the
empirical conformal quantile (Vovk 2005, Romano 2019 — distribution-free
marginal coverage on exchangeable data).
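
The inference rule above can be sketched in a few lines of standard-library Python. All names here are hypothetical, and two details are assumptions rather than facts from the commit: the ensemble spread is taken as the population standard deviation of the member predictions, and the conformal score is the sigma-normalised absolute residual `|y - mu| / sigma` on the calibration split.

```python
import math
from statistics import fmean, pstdev


def ensemble_stats(member_preds):
    """Collapse the N member predictions for one sample into (mu, sigma)."""
    return fmean(member_preds), pstdev(member_preds)


def conformal_quantile(cal_mu, cal_sigma, cal_truth, coverage=0.95, eps=1e-6):
    """Split-conformal quantile of normalised residuals on a held-out split.

    Uses the finite-sample rank ceil((n + 1) * coverage); returns infinity
    when the calibration split is too small to support that coverage.
    """
    scores = sorted(
        abs(y - m) / max(s, eps) for m, s, y in zip(cal_mu, cal_sigma, cal_truth)
    )
    n = len(scores)
    rank = math.ceil((n + 1) * coverage)
    if rank > n:
        return float("inf")
    return scores[rank - 1]


def prediction_interval(mu, sigma, q=1.96):
    """mu +/- q * sigma; q is 1.96 (Gaussian) or a conformal quantile."""
    return mu - q * sigma, mu + q * sigma
```

Swapping `q = 1.96` for the conformal quantile keeps the same interval shape but replaces the Gaussian assumption with a coverage guarantee that only needs exchangeable data.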

Evaluator (`ai/scripts/eval_probabilistic_proxy.py`) reports empirical
coverage at 50/80/95 % nominal levels, mean interval width, and the
mean-prediction PLCC; reports the conformal-interval row when the
manifest carries a conformal scalar.
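
The three reported metrics are simple to state in code. The sketch below is illustrative only; the function names are hypothetical and it does not reproduce `eval_probabilistic_proxy.py`.

```python
import math


def empirical_coverage(lower, upper, truth):
    """Fraction of ground-truth scores that fall inside their interval."""
    hits = sum(1 for lo, hi, y in zip(lower, upper, truth) if lo <= y <= hi)
    return hits / len(truth)


def mean_interval_width(lower, upper):
    """Average width of the prediction intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


def plcc(x, y):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

A well-calibrated model shows empirical coverage close to each nominal level (50/80/95 %); coverage far above nominal with wide intervals suggests the intervals are too conservative to be useful.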

Smoke-only ship: synthetic 100-row corpus, 1 epoch / member. Production
training is gated on the multi-codec Phase A corpus (T7-FR-REGRESSOR-V2-PROBABILISTIC).

Six ADR-0108 deliverables:
1. Research digest: docs/research/0054-fr-regressor-v2-probabilistic.md.
2. Decision matrix: ADR-0279 § Alternatives considered.
3. AGENTS.md invariant note: appended to ai/AGENTS.md.
4. Reproducer: `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke`
   followed by `python ai/scripts/eval_probabilistic_proxy.py --smoke`.
5. CHANGELOG ### Added entry under Unreleased — lusoris fork.
6. Rebase-notes entry: ### 0229 in docs/rebase-notes.md.

Test plan:
- `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` produces
  5 valid two-input ONNX members + manifest sidecar (ran locally).
- `python ai/scripts/eval_probabilistic_proxy.py --smoke` aggregates the
  5 ONNX outputs into (mu, sigma) and reports coverage at 50/80/95 %.
- `python ai/scripts/validate_model_registry.py` → 15 entries valid.
- `pre-commit run --files <changed>` → Passed (black / isort / ruff /
  json-check / secrets / semgrep).
- `markdownlint-cli2` on all new docs → 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 3, 2026
Wires the fork-trained `saliency_student_v1` ONNX model
(ADR-0286 / PR #359) into vmaf-tune so a single command can
produce an encode that biases bits toward salient regions.

New surfaces:
- `tools/vmaf-tune/src/vmaftune/saliency.py` — pure-NumPy
  signal-blend pipeline (sample frames -> ImageNet-normalised RGB
  -> ONNX inference -> per-pixel saliency mean -> per-MB QP-offset
  map clamped to ±12 -> x264 ASCII qpfile).
- `vmaf-tune recommend --saliency-aware [--saliency-offset -4]
  [--saliency-model PATH]` CLI subcommand. Falls back to a plain
  encode when onnxruntime or the model file is unavailable; the
  flag surface is wired so Phase B (target-VMAF bisect) can drop
  in without renaming flags.
- 13 unit tests under `tests/test_saliency.py` mocking both the
  ONNX session and the encode runner — runs without onnxruntime
  or ffmpeg installed.
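
The per-macroblock step of the pipeline above could look roughly like this. It is a sketch under loud assumptions: the commit message does not give the saliency-to-QP mapping, so the linear rule used here (mean saliency 1.0 maps to `saliency_offset`, 0.0 maps to its negation, 0.5 maps to zero) is invented for illustration, as are the function names and the 16-pixel block size. Only the ±12 clamp and the default offset of -4 come from the source. Plain lists stand in for NumPy arrays to keep the example dependency-free.

```python
def macroblock_means(saliency, mb=16):
    """Average a 2-D saliency map (rows of floats in [0, 1]) over mb x mb blocks."""
    height, width = len(saliency), len(saliency[0])
    means = []
    for y0 in range(0, height, mb):
        row = []
        for x0 in range(0, width, mb):
            block = [
                saliency[y][x]
                for y in range(y0, min(y0 + mb, height))
                for x in range(x0, min(x0 + mb, width))
            ]
            row.append(sum(block) / len(block))
        means.append(row)
    return means


def qp_offsets(mb_means, saliency_offset=-4.0, clamp=12):
    """Map block saliency to integer QP offsets, clamped to +/- clamp.

    Assumed linear rule: offset = saliency_offset * (2 * s - 1), so salient
    blocks get a negative offset (more bits) and flat blocks a positive one.
    """
    return [
        [max(-clamp, min(clamp, round(saliency_offset * (2.0 * s - 1.0)))) for s in row]
        for row in mb_means
    ]
```

Clamping after the mapping keeps extreme `--saliency-offset` values from producing QP swings larger than the stated ±12 bound.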

Bucket #2 of the PR #354 audit. Six ADR-0108 deliverables:
1. Research-0046 (digest)
2. ADR-0287 §"Alternatives considered" (decision matrix)
3. tools/vmaf-tune/AGENTS.md (saliency invariant)
4. `pytest tools/vmaf-tune/tests/test_saliency.py -v` (smoke)
5. changelog.d/added/T-VMAF-TUNE-saliency-aware.md
6. docs/rebase-notes.md §0287

User docs: docs/usage/vmaf-tune.md §"Saliency-aware encoding".
Decision: ADR-0287.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 3, 2026
…#9, ADR-0261)

Closes Bucket #9 of the PR #354 vmaf-tune capability audit. Adds
ffprobe-driven HDR detection, codec-specific HDR encode flag dispatch,
and HDR-VMAF model resolution to the Phase A corpus driver.

New module ``tools/vmaf-tune/src/vmaftune/hdr.py``:

- ``detect_hdr(path)`` — runs ``ffprobe -show_streams -of json``,
  classifies the first video stream as PQ / HLG / SDR. Strict
  BT.2020-primaries gate so malformed signaling falls back to SDR.
- ``hdr_codec_args(encoder, info)`` — per-encoder dispatch table
  covering libx264 (container ``-color_*``), libx265 (``-x265-params``
  with master-display + max-cll + hdr10-opt), libsvtav1 (AV1 enums
  via ``-svtav1-params``), hevc_nvenc (``-pix_fmt p010le -profile:v
  main10``), libvvenc.
- ``select_hdr_vmaf_model()`` — globs ``model/vmaf_hdr_*.json``;
  returns ``None`` when none shipped (current state — fork hasn't
  ported Netflix's HDR model yet).
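
The classification rule in `detect_hdr` can be illustrated on already-parsed `ffprobe -show_streams -of json` output. This is a sketch, not the module's code: the function name `classify_hdr` and the lowercase return labels are hypothetical. The field names `color_transfer` and `color_primaries` and the values `smpte2084` (PQ), `arib-std-b67` (HLG), and `bt2020` are the ones ffprobe reports.

```python
def classify_hdr(ffprobe_json: dict) -> str:
    """Classify the first video stream as 'pq', 'hlg', or 'sdr'.

    Mirrors the strict gate described above: without BT.2020 primaries the
    stream is treated as SDR even if the transfer function claims HDR.
    """
    for stream in ffprobe_json.get("streams", []):
        if stream.get("codec_type") != "video":
            continue
        if stream.get("color_primaries") != "bt2020":
            return "sdr"
        transfer = stream.get("color_transfer")
        if transfer == "smpte2084":
            return "pq"
        if transfer == "arib-std-b67":
            return "hlg"
        return "sdr"
    # No video stream found: fall back to SDR.
    return "sdr"
```

Operating on the parsed dictionary rather than invoking ffprobe directly is what lets tests cover the mismatched-primaries and invalid-JSON cases without a real binary.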

Corpus driver wiring:

- ``CorpusOptions.hdr_mode`` ∈ {``auto``, ``force-sdr``,
  ``force-hdr-pq``, ``force-hdr-hlg``}; CLI flags
  ``--auto-hdr`` / ``--force-sdr`` / ``--force-hdr-pq`` /
  ``--force-hdr-hlg`` (mutually exclusive). Auto is the default.
- New schema-v2 row keys ``hdr_transfer`` / ``hdr_primaries`` /
  ``hdr_forced``; ``SCHEMA_VERSION`` bumped 1 → 2. Phase B / C
  loaders treat missing keys as SDR (additive change, v1 rows
  remain readable).
- ``score._model_arg`` now passes pre-formatted ``path=`` /
  ``version=`` strings through unchanged so the HDR model path can
  be injected via ``vmaf --model``.
- HDR detected but no HDR model shipped → log warning, fall back
  to SDR model with notice that scores trend low.

Tests (``tools/vmaf-tune/tests/test_hdr.py``, 21 cases):

- detection: SDR / PQ / HLG / mismatched-primaries / missing-file /
  ffprobe-failure / invalid-JSON
- codec dispatch: shape per encoder (x264, x265 PQ + HLG, SVT-AV1
  PQ + HLG, NVENC HEVC, unknown encoder)
- model resolution: empty dir / shipped / multi-version pick-latest
  / missing dir
- corpus integration: end-to-end ``force-hdr-pq`` (verify HDR fields
  in row + ``-color_*`` in encode argv) and ``force-sdr``

ADR-0261 (Accepted, encode-side; HDR-VMAF scoring deferred until
fork-local model port). Research-0054 digest, rebase-notes 0261,
AGENTS.md invariant note, docs/usage/vmaf-tune.md HDR section,
changelog fragment all included.

Survey 18 capability buckets beyond Phases A/B + the fast-path
proposal. Rank by impact / effort. Recommend execution order
for Phases C–H of the vmaf-tune umbrella (ADR-0237).

Top-5 by impact-÷-effort: bitrate-budget mode, quality-floor mode,
bitrate-ladder optimisation (Phase E, the game-changer), per-shot
CRF (Phase D, on roadmap), probabilistic proxy.

Biggest blocker called out: codec adapter coverage — five buckets
degrade to x264-only until x265/SVT-AV1/libaom/libvvenc adapters
land. Recommends opening the adapter stream in parallel with
Phase D so multi-codec capabilities don't all stack at the end.

Pure scoping pass — no code, no implementation. Numbers are
back-of-envelope hypotheses; ADRs that consume this digest must
re-validate against real corpora.

Six deliverables (ADR-0108):
- (1) digest = this file
- (2) no decision matrix needed: research-only, no decision being made
- (3) no rebase-sensitive invariants
- (4) reproducer in PR description
- (5) no CHANGELOG fragment needed: research-only
- (6) no rebase impact: docs-only

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lusoris lusoris force-pushed the research/vmaf-tune-capability-audit branch from e133d98 to beff045 May 4, 2026 20:53
@lusoris lusoris marked this pull request as ready for review May 4, 2026 20:54
Copilot AI review requested due to automatic review settings May 4, 2026 20:54
@lusoris lusoris marked this pull request as draft May 4, 2026 20:54
@lusoris lusoris marked this pull request as ready for review May 4, 2026 20:54

Copilot AI left a comment


Pull request overview

Adds a new research digest that audits the potential capability surface of vmaf-tune beyond Phases A/B and the proposed “fast path”, ranking 18 feature buckets by impact vs effort and proposing a Phase C–H execution order.

Changes:

  • Introduces a capability audit digest covering 18 potential vmaf-tune feature buckets.
  • Provides an impact/effort ranking and a recommended implementation sequence, highlighting codec-adapter coverage as the main dependency.


Comment on lines +1 to +6
# Research-0054: `vmaf-tune` capability audit — beyond Phases A/B + the fast path

- **Status**: Active
- **Workstream**: ADR-0237 (`vmaf-tune` umbrella), ADR-0235 (codec-aware FR regressor)
- **Last updated**: 2026-05-03
- **Author**: research scoping pass (no code)
Comment on lines +56 to +75
### Bucket 1 — Per-shot CRF tuning (Phase C as written)

- **Summary**: Use `transnet_v2` to cut the source into shots, run
Phase B's bisect/proxy per shot, emit `--qpfile` (x264) /
`--zones` (x265) / SVT-AV1 segment table.
- **Existing primitives**: `transnet_v2.onnx`, codec adapter
`emit_per_shot_overrides()` hook (already declared in ADR-0237),
Phase B bisect.
- **Effort**: **M** — shot-aware orchestration + per-codec override
emission; non-trivial because shot-boundary frames near GOP edges
need encoder-specific handling.
- **Impact**: **High** — Netflix's per-shot encoding is the canonical
reference; same-VMAF bitrate savings of 10–30% are the public
numbers from their 2018 paper.
- **Open**: do we re-train the proxy on per-shot canonical-6, or is
the per-title proxy "good enough" if features are computed per
shot? (Hypothesis: per-title proxy generalises if features are
recomputed; needs a held-out check on BVI-DVC.)
- **Already in roadmap**: yes — Phase D in ADR-0237, gated on T6-3b
per-shot CRF predictor.
Comment on lines +141 to +142
user-facing flag (`--target-vmaf`, exists; `--minimise bitrate`,
doesn't).
Comment on lines +398 to +401
2. **Phase B docs follow-up** (≤ 1 week) — ship Buckets #4 + #5
explicitly as `--target-bitrate` and `--target-vmaf` modes;
trivial flag work, big perceived feature add. *Highest impact
÷ effort in the audit.*
Co-Authored-By: Claude <noreply@anthropic.com>
@lusoris lusoris merged commit 7f543ce into master May 4, 2026
54 checks passed
@lusoris lusoris deleted the research/vmaf-tune-capability-audit branch May 4, 2026 21:35
lusoris pushed a commit that referenced this pull request May 4, 2026
… flags

Implements Buckets 4 + 5 from Research-0061 (vmaf-tune capability
audit). Adds a new `recommend` subcommand on `tools/vmaf-tune/` that
consumes the Phase A corpus (either pre-built JSONL via
`--from-corpus` or generated on the fly from `--source` + grid flags)
and applies a user-supplied predicate over the existing
`(crf, preset, vmaf_score, bitrate_kbps)` rows.

- `--target-vmaf T` returns the row with the smallest CRF whose
  `vmaf_score >= T`. Falls back to the closest miss (highest VMAF)
  when no row clears the bar, with the predicate annotated `(UNMET)`.
- `--target-bitrate KBPS` returns the row whose `bitrate_kbps` is
  closest to `KBPS` (absolute distance, ties broken by smaller CRF).
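
The two predicates above can be sketched over a list of corpus rows. The row keys `crf`, `vmaf_score`, and `bitrate_kbps` come from the commit message; the function names, the `(row, met)` return shape, and the NaN filter helper are hypothetical.

```python
import math


def _usable(rows):
    """Drop rows whose score or bitrate is NaN (e.g. failed encodes)."""
    return [
        r for r in rows
        if not math.isnan(r["vmaf_score"]) and not math.isnan(r["bitrate_kbps"])
    ]


def pick_target_vmaf(rows, target):
    """Smallest-CRF row with vmaf_score >= target, else the closest miss.

    Returns (row, met); met is False when no row clears the bar, matching
    the (UNMET) annotation described above.
    """
    rows = _usable(rows)
    passing = [r for r in rows if r["vmaf_score"] >= target]
    if passing:
        return min(passing, key=lambda r: r["crf"]), True
    return max(rows, key=lambda r: r["vmaf_score"]), False


def pick_target_bitrate(rows, kbps):
    """Row closest to kbps by absolute distance; ties go to the smaller CRF."""
    return min(_usable(rows), key=lambda r: (abs(r["bitrate_kbps"] - kbps), r["crf"]))
```

Both helpers raise `ValueError` on an empty (or all-NaN) corpus, since `min` and `max` have nothing to choose from; a real CLI would presumably turn that into a user-facing error.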

The two flags are mutually exclusive at the argparse layer (exit
code 2 when both are passed). Default output is a single
human-readable line on stdout; `--json` switches to the full corpus
row as a JSON object.
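
The mutual exclusion and the exit code follow directly from standard `argparse` behaviour, as the sketch below shows. The flag names are from the commit message; the parser layout, the `prog` name, and the choice not to mark the group as required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with a `recommend` subcommand (illustrative only)."""
    parser = argparse.ArgumentParser(prog="vmaf-tune-sketch")
    sub = parser.add_subparsers(dest="command", required=True)

    rec = sub.add_parser("recommend")
    # argparse rejects any invocation that passes both flags in this group
    # and exits with status 2, its standard usage-error code.
    target = rec.add_mutually_exclusive_group()
    target.add_argument("--target-vmaf", type=float)
    target.add_argument("--target-bitrate", type=float)
    rec.add_argument("--json", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Enforcing the constraint in the parser rather than in the predicate code means the conflict is reported before any corpus is loaded or generated.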

No schema bump — `recommend` is a pure consumer of `CORPUS_ROW_KEYS`.
13-test suite under `tools/vmaf-tune/tests/test_recommend.py` covers
predicate semantics, encoder/preset filtering, NaN/failed-encode
rejection, and CLI exit codes; mocks all binaries so it runs in
<100 ms.

Six ADR-0108 deliverables:
1. Research digest — Research-0061 (parent, in flight via PR #354).
2. Decision matrix — no alternatives: only-one-way fix; the audit's
   ranked-by-impact/effort table is the alternatives matrix.
3. AGENTS.md invariant note — added in tools/vmaf-tune/AGENTS.md.
4. Reproducer / smoke test — pytest tools/vmaf-tune/tests/.
5. CHANGELOG.md — Added entry.
6. docs/rebase-notes.md — entry 0229.

Parent ADR: ADR-0237 (vmaf-tune umbrella).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 5, 2026
… flags (#358)

* feat(tools): vmaf-tune recommend — --target-vmaf and --target-bitrate flags

Implements Buckets 4 + 5 from Research-0061 (vmaf-tune capability
audit). Adds a new `recommend` subcommand on `tools/vmaf-tune/` that
consumes the Phase A corpus (either pre-built JSONL via
`--from-corpus` or generated on the fly from `--source` + grid flags)
and applies a user-supplied predicate over the existing
`(crf, preset, vmaf_score, bitrate_kbps)` rows.

- `--target-vmaf T` returns the row with the smallest CRF whose
  `vmaf_score >= T`. Falls back to the closest miss (highest VMAF)
  when no row clears the bar, with the predicate annotated `(UNMET)`.
- `--target-bitrate KBPS` returns the row whose `bitrate_kbps` is
  closest to `KBPS` (absolute distance, ties broken by smaller CRF).

The two flags are mutually exclusive at the argparse layer (exit
code 2 when both are passed). Default output is a single
human-readable line on stdout; `--json` switches to the full corpus
row as a JSON object.

No schema bump — `recommend` is a pure consumer of `CORPUS_ROW_KEYS`.
13-test suite under `tools/vmaf-tune/tests/test_recommend.py` covers
predicate semantics, encoder/preset filtering, NaN/failed-encode
rejection, and CLI exit codes; mocks all binaries so it runs in
<100 ms.

Six ADR-0108 deliverables:
1. Research digest — Research-0061 (parent, in flight via PR #354).
2. Decision matrix — no alternatives: only-one-way fix; the audit's
   ranked-by-impact/effort table is the alternatives matrix.
3. AGENTS.md invariant note — added in tools/vmaf-tune/AGENTS.md.
4. Reproducer / smoke test — pytest tools/vmaf-tune/tests/.
5. CHANGELOG.md — Added entry.
6. docs/rebase-notes.md — entry 0229.

Parent ADR: ADR-0237 (vmaf-tune umbrella).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ci): trigger workflow re-run

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(tools): close fast.add_argument before rec parser (rebase residue)

The marker-strip during the #358 rebase dropped the closing ')'
between fast.add_argument and rec = sub.add_parser, leaving cli.py
unparseable. Black + ruff hard-failed in CI.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 5, 2026
Adds the first sibling codec to the ADR-0237 Phase A `libx264` scaffold:
a one-file `X265Adapter` mirroring the `x264.py` shape (10 presets
including `placebo`, 0..51 CRF window, `profile_for(pix_fmt)` helper
that maps `yuv420p10le` → `main10` for downstream HDR work).
Registered under `libx265` in `codec_adapters/__init__.py`; CLI
`--encoder` now accepts `libx264 | libx265` via
`choices=list(known_codecs())`. `encode.parse_versions` gains an
encoder-aware regex so corpus rows record `libx265-<version>` correctly
(default remains `libx264` for backward compatibility).

No `SCHEMA_VERSION` bump — the existing `encoder` row column already
carries codec identity. Phase B/C consumers receive the new codec
without any contract change.

14 new subprocess-mocked smoke tests under
`tools/vmaf-tune/tests/test_codec_adapter_x265.py` (29 of 30 vmaf-tune
tests pass green; the one skipped case is the real-binary integration
test gated on `VMAF_TUNE_INTEGRATION=1`).

Unblocks ADR-0235 codec-aware FR regressor and PR #354 audit's
buckets #6 (bitrate-ladder), #7 (codec-comparison), #9 (HDR), #15
(Pareto).

Six deep-dive deliverables (ADR-0108):
1. research digest: no digest needed — trivial mirror of `x264.py`;
   alternatives matrix is exhausted in ADR-0276.
2. decision matrix: ADR-0276 §Alternatives considered.
3. AGENTS.md invariant note: tools/vmaf-tune/AGENTS.md updated to
   document the per-codec banner-regex carve-out in `parse_versions`
   and the wired-codecs list (libx264, libx265).
4. reproducer: `python -m pytest tools/vmaf-tune/tests/`.
5. CHANGELOG: changelog.d/added/ADR-0276-vmaf-tune-x265-adapter.md.
6. rebase-notes: docs/rebase-notes.md entry 0228.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 5, 2026
* feat(tools): vmaf-tune — x265 codec adapter (ADR-0276)

Adds the first sibling codec to the ADR-0237 Phase A `libx264` scaffold:
a one-file `X265Adapter` mirroring the `x264.py` shape (10 presets
including `placebo`, 0..51 CRF window, `profile_for(pix_fmt)` helper
that maps `yuv420p10le` → `main10` for downstream HDR work).
Registered under `libx265` in `codec_adapters/__init__.py`; CLI
`--encoder` now accepts `libx264 | libx265` via
`choices=list(known_codecs())`. `encode.parse_versions` gains an
encoder-aware regex so corpus rows record `libx265-<version>` correctly
(default remains `libx264` for backward compatibility).

No `SCHEMA_VERSION` bump — the existing `encoder` row column already
carries codec identity. Phase B/C consumers receive the new codec
without any contract change.

14 new subprocess-mocked smoke tests under
`tools/vmaf-tune/tests/test_codec_adapter_x265.py` (29 of 30 vmaf-tune
tests pass green; the one skipped case is the real-binary integration
test gated on `VMAF_TUNE_INTEGRATION=1`).

Unblocks ADR-0235 codec-aware FR regressor and PR #354 audit's
buckets #6 (bitrate-ladder), #7 (codec-comparison), #9 (HDR), #15
(Pareto).

Six deep-dive deliverables (ADR-0108):
1. research digest: no digest needed — trivial mirror of `x264.py`;
   alternatives matrix is exhausted in ADR-0276.
2. decision matrix: ADR-0276 §Alternatives considered.
3. AGENTS.md invariant note: tools/vmaf-tune/AGENTS.md updated to
   document the per-codec banner-regex carve-out in `parse_versions`
   and the wired-codecs list (libx264, libx265).
4. reproducer: `python -m pytest tools/vmaf-tune/tests/`.
5. CHANGELOG: changelog.d/added/ADR-0276-vmaf-tune-x265-adapter.md.
6. rebase-notes: docs/rebase-notes.md entry 0228.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: re-trigger CI after deliverables canonicalised

* chore: re-trigger CI after research-digest opt-out

* chore(tools): black format test_corpus.py

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 5, 2026
Adds tools/vmaf-tune/src/vmaftune/resolution.py exposing
select_vmaf_model_version (height>=2160 -> vmaf_4k_v0.6.1, else
vmaf_v0.6.1), select_vmaf_model (Path), and crf_offset_for_resolution
(4K -2, 1080p 0, 720p +2, sub-720p +4).

corpus.iter_rows auto-picks the model per encode resolution; CLI gains
--resolution-aware / --no-resolution-aware (default on). Emitted JSONL
row's vmaf_model field now records the *effective* model used per row,
not the global option — required for mixed-ladder corpora to be
unambiguous downstream.

Closes PR #354 audit Bucket #8. Decision rule mirrors Netflix's
published guidance.

Refs: ADR-0237 (parent), ADR-0280 (this), Research-0054.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 5, 2026
…ts (#363)

Adds tools/vmaf-tune/src/vmaftune/resolution.py exposing
select_vmaf_model_version (height>=2160 -> vmaf_4k_v0.6.1, else
vmaf_v0.6.1), select_vmaf_model (Path), and crf_offset_for_resolution
(4K -2, 1080p 0, 720p +2, sub-720p +4).

corpus.iter_rows auto-picks the model per encode resolution; CLI gains
--resolution-aware / --no-resolution-aware (default on). Emitted JSONL
row's vmaf_model field now records the *effective* model used per row,
not the global option — required for mixed-ladder corpora to be
unambiguous downstream.

Closes PR #354 audit Bucket #8. Decision rule mirrors Netflix's
published guidance.

Refs: ADR-0237 (parent), ADR-0280 (this), Research-0054.

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 5, 2026
Scaffolds the Phase E ladder generator (ADR-0277) — the highest-leverage
gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the
Netflix per-title encoding paper: sample (resolution × target-VMAF),
take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along
the hull, emit an HLS / DASH / JSON manifest.

Currently scaffold-only: the production sampler that drives Phase B's
target-VMAF bisect (PR #347) lands once that PR merges. Default sampler
raises NotImplementedError; tests inject a synthetic stub modelled on
the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py — build_ladder,
  convex_hull (Pareto filter + diminishing-returns envelope),
  select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH
  / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung
  1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull
  correctness on a synthetic Netflix-paper-shaped cloud, knee selection
  invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once Phase B integration PR
  lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper,
  Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section
  with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 5, 2026
…er) (#371)

* feat(tools): vmaf-tune Phase E — per-title bitrate ladder (game-changer)

* chore(docs): renumber phase-e ADR 0277→0295 + research 0066→0068 (collisions)

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 5, 2026
…al scaffold)

Adds a probabilistic head on top of the codec-aware fr_regressor_v2
(parent: ADR-0272 / PR #347 in flight) so producers can drive the
in-flight `vmaf-tune --quality-confidence 0.95` flag (ADR-0237) off a
calibrated prediction interval instead of v2's bare MOS scalar. PR #354
audit Bucket #18 (top-3 ranked).

Trainer (`ai/scripts/train_fr_regressor_v2_ensemble.py`) trains five
copies of the v2 architecture (`FRRegressor(num_codecs=NUM_CODECS)`)
under distinct seeds, exports each as a separate two-input ONNX
(`features [N, 6]` + `codec_onehot [N, NUM_CODECS]`), and writes an
ensemble manifest sidecar that pins per-member sha256s, feature
standardisation, codec vocab, nominal coverage, and an optional
split-conformal residual quantile from a held-out calibration split.
Inference rule is `mu ± q · σ` with `q = 1.96` (Gaussian) or the
empirical conformal quantile (Vovk 2005, Romano 2019 — distribution-free
marginal coverage on exchangeable data).
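
A minimal sketch of the inference rule, assuming the ensemble members' scalar predictions are already available as a list. The names `ensemble_interval` and `conformal_quantile` are hypothetical, the population standard deviation is an assumption, and so is the reading that the conformal scalar is a quantile of normalised residuals `|y - mu| / sigma` on the calibration split (the message above does not spell this out).

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple


def ensemble_interval(member_preds: List[float], q: float = 1.96) -> Tuple[float, float, float]:
    """Return (lower, mu, upper) for the rule mu +/- q * sigma."""
    mu = mean(member_preds)
    sigma = pstdev(member_preds)  # spread across ensemble members
    return mu - q * sigma, mu, mu + q * sigma


def conformal_quantile(scores: List[float], coverage: float = 0.95) -> float:
    """Split-conformal quantile: the ceil((n + 1) * coverage)-th smallest score.

    `scores` are nonconformity scores from a held-out calibration split,
    e.g. |y - mu| / sigma under the assumption stated above.
    """
    n = len(scores)
    k = math.ceil((n + 1) * coverage)
    if k > n:
        # Too few calibration rows to certify this coverage level.
        return float("inf")
    return sorted(scores)[k - 1]
```

Passing the result of `conformal_quantile` as `q` replaces the Gaussian 1.96 with the empirical scalar.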

Evaluator (`ai/scripts/eval_probabilistic_proxy.py`) reports empirical
coverage at 50/80/95 % nominal levels, mean interval width, and the
mean-prediction PLCC; reports the conformal-interval row when the
manifest carries a conformal scalar.
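
The two interval metrics named above are simple to state in code. The sketch below assumes per-row targets and interval bounds as plain lists; `empirical_coverage` and `mean_interval_width` are hypothetical names, not the evaluator's API.

```python
from typing import Sequence


def empirical_coverage(y_true: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Fraction of targets that fall inside their predicted interval."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


def mean_interval_width(lower: Sequence[float], upper: Sequence[float]) -> float:
    """Average width of the predicted intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)
```

A calibrated head should report empirical coverage close to each nominal level; a much wider interval at the same coverage indicates a less informative model.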

Smoke-only ship: synthetic 100-row corpus, 1 epoch / member. Production
training is gated on the multi-codec Phase A corpus (T7-FR-REGRESSOR-V2-PROBABILISTIC).

Six ADR-0108 deliverables:
1. Research digest: docs/research/0054-fr-regressor-v2-probabilistic.md.
2. Decision matrix: ADR-0279 § Alternatives considered.
3. AGENTS.md invariant note: appended to ai/AGENTS.md.
4. Reproducer: `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke`
   followed by `python ai/scripts/eval_probabilistic_proxy.py --smoke`.
5. CHANGELOG ### Added entry under Unreleased — lusoris fork.
6. Rebase-notes entry: ### 0229 in docs/rebase-notes.md.

Test plan:
- `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` produces
  5 valid two-input ONNX members + manifest sidecar (ran locally).
- `python ai/scripts/eval_probabilistic_proxy.py --smoke` aggregates the
  5 ONNX outputs into (mu, sigma) and reports coverage at 50/80/95 %.
- `python ai/scripts/validate_model_registry.py` → 15 entries valid.
- `pre-commit run --files <changed>` → Passed (black / isort / ruff /
  json-check / secrets / semgrep).
- `markdownlint-cli2` on all new docs → 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 5, 2026
…al scaffold) (#372)

* feat(ai): fr_regressor_v2 probabilistic head (deep-ensemble + conformal scaffold)

* fix(registry): split fr_regressor_v2 + ensemble_seed0 into distinct entries

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 5, 2026
Wires the fork-trained `saliency_student_v1` ONNX model
(ADR-0286 / PR #359) into vmaf-tune so a single command can
produce an encode that biases bits toward salient regions.

New surfaces:
- `tools/vmaf-tune/src/vmaftune/saliency.py` — pure-NumPy
  signal-blend pipeline (sample frames -> ImageNet-normalised RGB
  -> ONNX inference -> per-pixel saliency mean -> per-MB QP-offset
  map clamped to ±12 -> x264 ASCII qpfile).
- `vmaf-tune recommend --saliency-aware [--saliency-offset -4]
  [--saliency-model PATH]` CLI subcommand. Falls back to a plain
  encode when onnxruntime or the model file is unavailable; the
  flag surface is wired so Phase B (target-VMAF bisect) can drop
  in without renaming flags.
- 13 unit tests under `tests/test_saliency.py` mock both the ONNX
  session and the encode runner, so they run without onnxruntime
  or ffmpeg installed.
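
The saliency-to-QP step can be illustrated as follows. This is a sketch under stated assumptions, not the `saliency.py` code: it uses plain lists instead of NumPy to stay dependency-free, assumes saliency values in [0, 1] and 16x16 macroblocks, and invents a centred linear mapping in which a fully salient block receives `offset` and a fully non-salient block receives `-offset`. Only the ±12 clamp and the default offset of -4 come from the text above; `mb_qp_offsets` is a hypothetical name.

```python
from typing import List


def mb_qp_offsets(
    saliency: List[List[float]],
    offset: float = -4.0,
    mb: int = 16,
    clamp: int = 12,
) -> List[List[int]]:
    """Average a per-pixel saliency map per macroblock and map it to QP offsets."""
    height, width = len(saliency), len(saliency[0])
    rows: List[List[int]] = []
    for y0 in range(0, height, mb):
        row: List[int] = []
        for x0 in range(0, width, mb):
            block = [
                saliency[y][x]
                for y in range(y0, min(y0 + mb, height))
                for x in range(x0, min(x0 + mb, width))
            ]
            mean = sum(block) / len(block)
            # Assumed mapping: saliency 1.0 -> offset, 0.5 -> 0, 0.0 -> -offset.
            qp = round(offset * (2.0 * mean - 1.0))
            row.append(max(-clamp, min(clamp, qp)))
        rows.append(row)
    return rows
```

With a negative `offset`, salient blocks get a lower QP and therefore more bits, which matches the stated intent of biasing bits toward salient regions.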

Bucket #2 of the PR #354 audit. Six ADR-0108 deliverables:
1. Research-0046 (digest)
2. ADR-0287 §"Alternatives considered" (decision matrix)
3. tools/vmaf-tune/AGENTS.md (saliency invariant)
4. `pytest tools/vmaf-tune/tests/test_saliency.py -v` (smoke)
5. changelog.d/added/T-VMAF-TUNE-saliency-aware.md
6. docs/rebase-notes.md §0287

User docs: docs/usage/vmaf-tune.md §"Saliency-aware encoding".
Decision: ADR-0287.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 5, 2026
Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 5, 2026
…#9, ADR-0261)

Closes Bucket #9 of the PR #354 vmaf-tune capability audit. Adds
ffprobe-driven HDR detection, codec-specific HDR encode flag dispatch,
and HDR-VMAF model resolution to the Phase A corpus driver.

New module ``tools/vmaf-tune/src/vmaftune/hdr.py``:

- ``detect_hdr(path)`` — runs ``ffprobe -show_streams -of json``,
  classifies the first video stream as PQ / HLG / SDR. Strict
  BT.2020-primaries gate so malformed signaling falls back to SDR.
- ``hdr_codec_args(encoder, info)`` — per-encoder dispatch table
  covering libx264 (container ``-color_*``), libx265 (``-x265-params``
  with master-display + max-cll + hdr10-opt), libsvtav1 (AV1 enums
  via ``-svtav1-params``), hevc_nvenc (``-pix_fmt p010le -profile:v
  main10``), libvvenc.
- ``select_hdr_vmaf_model()`` — globs ``model/vmaf_hdr_*.json``;
  returns ``None`` when none shipped (current state — fork hasn't
  ported Netflix's HDR model yet).
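
A sketch of the classification rule described for ``detect_hdr``, operating on the JSON text that ``ffprobe -show_streams -of json`` prints rather than invoking ffprobe. `classify_hdr` is a hypothetical helper, and falling back to SDR on unparseable input is an assumption about the module's behaviour. The transfer names `smpte2084` (PQ) and `arib-std-b67` (HLG) are the values ffprobe reports in `color_transfer`.

```python
import json

PQ_TRANSFER = "smpte2084"
HLG_TRANSFER = "arib-std-b67"


def classify_hdr(ffprobe_json: str) -> str:
    """Classify the first video stream as 'pq', 'hlg', or 'sdr'."""
    try:
        streams = json.loads(ffprobe_json).get("streams", [])
    except (json.JSONDecodeError, AttributeError):
        return "sdr"  # assumed fallback for invalid or unexpected JSON
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    # Strict gate: HDR transfer without BT.2020 primaries is treated as SDR.
    if video is None or video.get("color_primaries") != "bt2020":
        return "sdr"
    transfer = video.get("color_transfer")
    if transfer == PQ_TRANSFER:
        return "pq"
    if transfer == HLG_TRANSFER:
        return "hlg"
    return "sdr"
```

Keeping the parser separate from the subprocess call is also what makes the ffprobe-failure and invalid-JSON cases easy to unit-test.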

Corpus driver wiring:

- ``CorpusOptions.hdr_mode`` ∈ {``auto``, ``force-sdr``,
  ``force-hdr-pq``, ``force-hdr-hlg``}; CLI flags
  ``--auto-hdr`` / ``--force-sdr`` / ``--force-hdr-pq`` /
  ``--force-hdr-hlg`` (mutually exclusive). Auto is the default.
- New schema-v2 row keys ``hdr_transfer`` / ``hdr_primaries`` /
  ``hdr_forced``; ``SCHEMA_VERSION`` bumped 1 → 2. Phase B / C
  loaders treat missing keys as SDR (additive change, v1 rows
  remain readable).
- ``score._model_arg`` now passes pre-formatted ``path=`` /
  ``version=`` strings through unchanged so the HDR model path can
  be injected via ``vmaf --model``.
- HDR detected but no HDR model shipped → log warning, fall back
  to SDR model with notice that scores trend low.
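
The mutually exclusive flag surface can be expressed with `argparse` roughly as below. This is an illustrative sketch: `build_parser` and the `prog` name are hypothetical, and only the four flag names, the `hdr_mode` values, and the `auto` default come from the description above.

```python
import argparse

HDR_FLAGS = [
    ("--auto-hdr", "auto"),
    ("--force-sdr", "force-sdr"),
    ("--force-hdr-pq", "force-hdr-pq"),
    ("--force-hdr-hlg", "force-hdr-hlg"),
]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose HDR flags all write to a single hdr_mode field."""
    parser = argparse.ArgumentParser(prog="vmaf-tune-sketch")
    group = parser.add_mutually_exclusive_group()
    for flag, mode in HDR_FLAGS:
        group.add_argument(flag, dest="hdr_mode", action="store_const", const=mode)
    # Auto-detection applies when no flag is given.
    parser.set_defaults(hdr_mode="auto")
    return parser
```

Passing two of the flags together makes `argparse` exit with a usage error, which gives the mutual exclusion without custom validation code.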

Tests (``tools/vmaf-tune/tests/test_hdr.py``, 21 cases):

- detection: SDR / PQ / HLG / mismatched-primaries / missing-file /
  ffprobe-failure / invalid-JSON
- codec dispatch: shape per encoder (x264, x265 PQ + HLG, SVT-AV1
  PQ + HLG, NVENC HEVC, unknown encoder)
- model resolution: empty dir / shipped / multi-version pick-latest
  / missing dir
- corpus integration: end-to-end ``force-hdr-pq`` (verify HDR fields
  in row + ``-color_*`` in encode argv) and ``force-sdr``

ADR-0261 (Accepted, encode-side; HDR-VMAF scoring deferred until
fork-local model port). Research-0054 digest, rebase-notes 0261,
AGENTS.md invariant note, docs/usage/vmaf-tune.md HDR section,
changelog fragment all included.
lusoris added a commit that referenced this pull request May 5, 2026
* feat(tools): vmaf-tune — HDR-aware encoding + HDR-VMAF scoring (Bucket #9, ADR-0261)

* chore(docs): renumber hdr-aware ADR/research to dodge collisions (0295→0300, 0261→0300, 0071→0072)

* fix(tools): drop duplicate vmaf_model dict key in corpus.py

---------

Co-authored-by: Lusoris <lusoris@pm.me>
lusoris pushed a commit that referenced this pull request May 6, 2026
lusoris added a commit that referenced this pull request May 6, 2026
* feat(tools): vmaf-tune — saliency-aware ROI encoding (Bucket #2)

* docs(vmaf-tune): CHANGELOG fragment for recommend-saliency CLI

ADR-0108 deliverables-checklist gate on PR #432 wants the fragment
to be added in this PR's diff, not just present on master. The
existing fragment ``T-VMAF-TUNE-saliency-aware.md`` covers the
saliency engine that's already merged; this new fragment covers
the CLI subcommand specifically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 7, 2026
lusoris pushed a commit that referenced this pull request May 7, 2026
…#9, ADR-0261)

lusoris added a commit that referenced this pull request May 7, 2026
…er) (#433)

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 7, 2026
…#9, ADR-0261) (#434)

Co-authored-by: Lusoris <lusoris@pm.me>