
feat(ai): fr_regressor_v2 codec-aware scaffold (Phase B prereq)#347

Merged
lusoris merged 2 commits into master from feat/ai-fr-regressor-v2-codec-aware-scaffold on May 4, 2026

Conversation

@lusoris
Owner

@lusoris commented May 3, 2026

Summary

  • Scaffold-only ship of fr_regressor_v2 — codec-aware successor to v1
    (ADR-0249) and the first downstream consumer of the vmaf-tune corpus
    JSONL emitted by Phase A (ADR-0237). Adds
    ai/scripts/train_fr_regressor_v2.py
    with --corpus PATH and --smoke modes, plus the two-input ONNX
    export plumbing that mirrors the LPIPS-Sq precedent (ADR-0040 /
    ADR-0041).
  • Two-input shape: features (N, 6) canonical-6 libvmaf features
    (adm2, vif_scale0..3, motion2, StandardScaler-normalised) +
    codec (N, 8) block — [encoder_onehot(6), preset_norm, crf_norm],
    both normalised into [0, 1]. Re-uses
    FRRegressor(num_codecs=8) plumbed by ADR-0235 rather than minting
    a new class.
  • Registry row registers fr_regressor_v2 with smoke: true. The
    shipped ONNX is the --smoke output (synthetic 100-row corpus, 1
    epoch); it is a load-path probe, not a quality model. The
    production training run is a follow-up PR gated on a multi-codec
    Phase A corpus, per-frame feature emission, and clearing v1's 0.95
    LOSO PLCC ship floor with the ≥0.005 multi-codec lift required by
    ADR-0235.
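The codec-block layout above can be sketched as follows. This is a minimal illustration, not the script's actual API: `ENCODER_VOCAB`, the `/63` CRF normaliser, and the `/9` preset normaliser are taken from the commit message later in this PR, while `codec_block` is a hypothetical helper name.

```python
# Closed, ordered encoder vocabulary (order is load-bearing per the PR text).
ENCODER_VOCAB = ("libx264", "libx265", "libsvtav1", "libvvenc", "libvpx-vp9", "unknown")
CRF_MAX = 63.0    # union CRF upper bound across encoders (per the PR text)
PRESET_MAX = 9.0  # preset ordinal upper bound (per the PR text)

def codec_block(encoder, preset_ordinal, crf):
    """Return the 8-value codec block: [encoder_onehot(6), preset_norm, crf_norm]."""
    # Unrecognised encoders fall back to the closed vocab's "unknown" slot.
    idx = ENCODER_VOCAB.index(encoder if encoder in ENCODER_VOCAB else "unknown")
    onehot = [1.0 if i == idx else 0.0 for i in range(len(ENCODER_VOCAB))]
    return onehot + [preset_ordinal / PRESET_MAX, crf / CRF_MAX]

block = codec_block("libx265", preset_ordinal=5, crf=28)
assert len(block) == 8 and all(0.0 <= v <= 1.0 for v in block)
```

The `(N, 8)` `codec` input is then just these blocks stacked row-wise alongside the `(N, 6)` `features` input.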

Six deep-dive deliverables (ADR-0108)

Test plan

  • python ai/scripts/train_fr_regressor_v2.py --smoke produces a
    valid opset-17 two-input ONNX, op-allowlist clean, torch-vs-ORT
    roundtrip within 1e-4 atol (ran locally).
  • python ai/scripts/validate_model_registry.py — 11 entries
    valid against registry.schema.json.
  • pre-commit run --files <changed> — Passed (black / isort /
    ruff / json-check / secrets / semgrep).
  • markdownlint-cli2 on all new docs — 0 errors.
  • Production training run on a real Phase A corpus — deferred,
    tracked as backlog item T7-FR-REGRESSOR-V2-PROD (per ADR-0261
    ## Consequences).

🤖 Generated with Claude Code

@lusoris force-pushed the feat/ai-fr-regressor-v2-codec-aware-scaffold branch from 2c33aa3 to 6737537 on May 3, 2026 19:02
lusoris pushed a commit that referenced this pull request May 3, 2026
Scaffolds the Phase E ladder generator (ADR-0277) — the highest-leverage
gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the
Netflix per-title encoding paper: sample (resolution × target-VMAF),
take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along
the hull, emit an HLS / DASH / JSON manifest.

Currently scaffold-only: the production sampler that drives Phase B's
target-VMAF bisect (PR #347) lands once that PR merges. Default sampler
raises NotImplementedError; tests inject a synthetic stub modelled on
the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py — build_ladder,
  convex_hull (Pareto filter + diminishing-returns envelope),
  select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH
  / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung
  1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull
  correctness on a synthetic Netflix-paper-shaped cloud, knee selection
  invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once Phase B integration PR
  lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper,
  Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section
  with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
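The hull step described above (Pareto filter on the (bitrate, vmaf) cloud before rung selection) can be sketched with a plain monotone-chain upper hull. A minimal sketch only — `upper_convex_hull` is a hypothetical name, not the signature of `vmaftune/ladder.py`'s `convex_hull`:

```python
def upper_convex_hull(points):
    """Upper convex hull of (bitrate, vmaf) samples: the R-D envelope.

    Points on or below a chord between two hull neighbours are dominated
    (a blend of the neighbours would beat them), so they never become rungs.
    """
    pts = sorted(set(points))  # sort by bitrate, then vmaf
    hull = []
    for p in pts:
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            # cross >= 0: the last hull point sits on/below the chord -> drop it
            if (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

cloud = [(500, 70), (1000, 80), (1500, 83), (2000, 88), (3000, 91), (1200, 75)]
hull = upper_convex_hull(cloud)  # dominated points (1200, 75) and (1500, 83) drop out
```

Rung selection (`select_knees` in the commit) then spaces n picks along this envelope by log-bitrate or VMAF distance.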
lusoris pushed a commit that referenced this pull request May 3, 2026
…al scaffold)

Adds a probabilistic head on top of the codec-aware fr_regressor_v2
(parent: ADR-0272 / PR #347 in flight) so producers can drive the
in-flight `vmaf-tune --quality-confidence 0.95` flag (ADR-0237) off a
calibrated prediction interval instead of v2's bare MOS scalar. PR #354
audit Bucket #18 (top-3 ranked).

Trainer (`ai/scripts/train_fr_regressor_v2_ensemble.py`) trains N=5
copies of the v2 architecture (`FRRegressor(num_codecs=NUM_CODECS)`)
under distinct seeds, exports each as a separate two-input ONNX
(`features [N, 6]` + `codec_onehot [N, NUM_CODECS]`), and writes an
ensemble manifest sidecar that pins per-member sha256s, feature
standardisation, codec vocab, nominal coverage, and an optional
split-conformal residual quantile from a held-out calibration split.
Inference rule is `mu ± q · σ` with `q = 1.96` (Gaussian) or the
empirical conformal quantile (Vovk 2005, Romano 2019 — distribution-free
marginal coverage on exchangeable data).
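The inference rule above can be sketched in a few lines. Assumptions flagged: `predict_interval` is a hypothetical helper, and the member predictions here are placeholders, not ensemble outputs from the shipped ONNX members.

```python
from statistics import mean, stdev

GAUSSIAN_Q95 = 1.96  # two-sided 95 % normal quantile

def predict_interval(member_preds, conformal_q=None):
    """Aggregate N ensemble member predictions into (mu, (lo, hi)).

    Default rule is mu +/- 1.96 * sigma (Gaussian assumption); if the
    manifest carries a split-conformal residual quantile, the interval is
    mu +/- conformal_q instead (distribution-free marginal coverage on
    exchangeable data).
    """
    mu = mean(member_preds)
    half = conformal_q if conformal_q is not None else GAUSSIAN_Q95 * stdev(member_preds)
    return mu, (mu - half, mu + half)

mu, (lo, hi) = predict_interval([90.0, 91.0, 92.0, 91.0, 90.5])
```

The evaluator's coverage numbers are then just the fraction of held-out rows whose true MOS lands inside `(lo, hi)` at each nominal level.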

Evaluator (`ai/scripts/eval_probabilistic_proxy.py`) reports empirical
coverage at 50/80/95 % nominal levels, mean interval width, and the
mean-prediction PLCC; reports the conformal-interval row when the
manifest carries a conformal scalar.

Smoke-only ship: synthetic 100-row corpus, 1 epoch / member. Production
training is gated on the multi-codec Phase A corpus (T7-FR-REGRESSOR-V2-PROBABILISTIC).

Six ADR-0108 deliverables:
1. Research digest: docs/research/0054-fr-regressor-v2-probabilistic.md.
2. Decision matrix: ADR-0279 § Alternatives considered.
3. AGENTS.md invariant note: appended to ai/AGENTS.md.
4. Reproducer: `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke`
   followed by `python ai/scripts/eval_probabilistic_proxy.py --smoke`.
5. CHANGELOG ### Added entry under Unreleased — lusoris fork.
6. Rebase-notes entry: ### 0229 in docs/rebase-notes.md.

Test plan:
- `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` produces
  5 valid two-input ONNX members + manifest sidecar (ran locally).
- `python ai/scripts/eval_probabilistic_proxy.py --smoke` aggregates the
  5 ONNX outputs into (mu, sigma) and reports coverage at 50/80/95 %.
- `python ai/scripts/validate_model_registry.py` → 15 entries valid.
- `pre-commit run --files <changed>` → Passed (black / isort / ruff /
  json-check / secrets / semgrep).
- `markdownlint-cli2` on all new docs → 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scaffold-only ship of `fr_regressor_v2` — the codec-aware successor to
`fr_regressor_v1` (ADR-0249) and the first downstream consumer of the
`vmaf-tune corpus` JSONL emitted by Phase A (ADR-0237). Adds the
training script, smoke-mode synthetic-corpus path, two-input ONNX
export plumbing (mirrors LPIPS-Sq pattern from ADR-0040 / ADR-0041),
sidecar JSON, registry row gated `smoke: true`, and the full
ADR-0042 / ADR-0108 doc surface (model card, research digest, ADR,
AGENTS.md invariant note, rebase-notes entry, CHANGELOG).

Two-input shape: `features` (N, 6) canonical-6 libvmaf features +
`codec` (N, 8) block — `[encoder_onehot(6), preset_norm, crf_norm]`,
both normalised into `[0, 1]`. ENCODER_VOCAB is closed and ordered
(libx264, libx265, libsvtav1, libvvenc, libvpx-vp9, unknown); CRF
normalised by 63 (union upper bound across encoders); preset by 9.
Re-uses `FRRegressor(num_codecs=8)` plumbed by ADR-0235 rather than
minting a new class.

`--smoke` mode synthesises 100 fake corpus rows and trains 1 epoch so
the pipeline is end-to-end exercisable (JSONL ingest → 9-D
materialisation → MLP train → ONNX export → op-allowlist check →
torch-vs-ORT roundtrip) without burning hours on a real Phase A run.
The shipped ONNX is the smoke output and is registered with
`smoke: true` so the quality-metric harness skips it. Production
training run is a follow-up PR (T7-FR-REGRESSOR-V2-PROD) gated on
(1) a multi-codec Phase A corpus with ≥50 refs / ≥5 encoders, (2)
per-frame feature emission in the Phase A schema, and (3) clearing
v1's 0.95 LOSO PLCC ship floor with the ≥0.005 multi-codec lift
required by ADR-0235.

Reproducer: `python ai/scripts/train_fr_regressor_v2.py --smoke`

Refs: ADR-0261, ADR-0235, ADR-0237, ADR-0249, Research-0054.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lusoris force-pushed the feat/ai-fr-regressor-v2-codec-aware-scaffold branch from 6737537 to 7bfdd33 on May 4, 2026 13:37
@lusoris marked this pull request as ready for review on May 4, 2026 13:37
Copilot AI review requested due to automatic review settings May 4, 2026 13:37

Copilot AI left a comment


Pull request overview

Scaffold-only addition of a codec-aware tiny-AI full-reference regressor (fr_regressor_v2) intended as the first downstream consumer of the vmaf-tune Phase A JSONL corpus, including a smoke-trained ONNX + sidecar metadata and accompanying ADR/research/model-card documentation.

Changes:

  • Add ai/scripts/train_fr_regressor_v2.py to train/export a two-input ONNX (features + codec) with a --smoke mode.
  • Register the new smoke model in model/tiny/registry.json and add the model/tiny/fr_regressor_v2.json sidecar.
  • Add/extend docs (ADR, research digest, model card, rebase notes, AGENTS note, changelog entry) describing the new contract.

Reviewed changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 12 comments.

File Description
ai/scripts/train_fr_regressor_v2.py New trainer/exporter for fr_regressor_v2 with smoke mode and registry/sidecar writing.
model/tiny/registry.json Adds a fr_regressor_v2 registry entry marked smoke: true.
model/tiny/fr_regressor_v2.json New sidecar describing the model inputs, scaling, codec vocab, and training metadata.
docs/adr/0272-fr-regressor-v2-codec-aware-scaffold.md New ADR documenting the scaffold decision and contract.
docs/adr/README.md Adds ADR index row for ADR-0272 (but file is generated).
docs/research/0058-fr-regressor-v2-feasibility.md New feasibility digest (ID/title mismatch noted).
docs/ai/models/fr_regressor_v2.md New model card describing inputs/outputs, corpus expectations, and usage.
ai/AGENTS.md Records load-bearing invariants for the codec block and vocab ordering.
docs/rebase-notes.md Adds a rebase-notes entry for the scaffold landing.
CHANGELOG.md Adds an Unreleased entry (but changelog is fragment-generated).


Comment thread CHANGELOG.md
Comment on lines +11 to +35
- **`fr_regressor_v2` codec-aware scaffold — first downstream consumer
of the vmaf-tune Phase A JSONL corpus (ADR-0272, prereq for
Phase B).** Ships
[`ai/scripts/train_fr_regressor_v2.py`](ai/scripts/train_fr_regressor_v2.py)
— a scaffold-only trainer that consumes the JSONL corpus emitted by
`vmaf-tune corpus` (ADR-0237 Phase A) and trains the codec-aware
variant of the v1 FR regressor. Two-input ONNX (`features` shape
`(N, 6)` canonical-6 + `codec` shape `(N, 8)` block —
`[encoder_onehot(6), preset_norm, crf_norm]`); reuses the existing
`FRRegressor(num_codecs=8)` class plumbed by ADR-0235. A `--smoke`
mode synthesises 100 fake corpus rows and trains 1 epoch so the
pipeline is end-to-end exercisable in CI without hours of encode
time. Registers `fr_regressor_v2` in `model/tiny/registry.json`
with `smoke: true` until a follow-up PR runs production training on
a real Phase A corpus and clears the ADR-0235 ship gate (≥0.005
multi-codec PLCC lift over v1's 0.95 LOSO floor). Doc surface:
[model card](docs/ai/models/fr_regressor_v2.md),
[research digest](docs/research/0058-fr-regressor-v2-feasibility.md),
[ADR-0272](docs/adr/0272-fr-regressor-v2-codec-aware-scaffold.md),
`ai/AGENTS.md` invariant note pinning the codec block layout and
encoder vocabulary. Smoke validated locally (`python
ai/scripts/train_fr_regressor_v2.py --smoke` produces a valid
opset-17 two-input ONNX, op-allowlist clean, torch-vs-ORT roundtrip
within 1e-4 atol). No upstream-mirror file touched; pure additive
fork-local PR.
Comment thread docs/adr/README.md
Comment on lines +274 to +276
| [ADR-0259](0259-hip-third-consumer-ciede.md) | T7-10b third-consumer PR — `ciede_hip` host scaffolding via the kernel-template mirror established by [ADR-0241](0241-hip-first-consumer-psnr.md). Ships [`libvmaf/src/feature/hip/ciede_hip.{c,h}`](../../libvmaf/src/feature/hip/ciede_hip.c) — mirrors `libvmaf/src/feature/cuda/integer_ciede_cuda.c`'s init/submit/collect/close call graph verbatim, including the **intentional bypass** of `submit_pre_launch` (ciede's kernel writes one float per block, no atomic, no memset required). Same scaffold posture as ADR-0241 / ADR-0254: registration succeeds, `init()` returns `-ENOSYS` until T7-10b flips the kernel-template helper bodies to real HIP calls. New `vmaf_fex_ciede_hip` row in `feature_extractor_list` under `#if HAVE_HIP`; `VMAF_FEATURE_EXTRACTOR_HIP` flag stays cleared. Smoke test grows by one sub-test (`test_ciede_hip_extractor_registered`). Pins the kernel-template's "no-memset bypass" path so the runtime PR can flip helper bodies without inventing a new template variant for ciede. Picks `integer_ciede_cuda` (243 LOC) over `integer_motion_cuda` (503 LOC, stateful) and `float_ansnr_cuda` (298 LOC, duplicates ADR-0254's precision posture). | Accepted | gpu, hip, rocm, amd, kernel-template, fork-local |
| [ADR-0272](0272-fr-regressor-v2-codec-aware-scaffold.md) | `fr_regressor_v2` codec-aware scaffold — first downstream consumer of the vmaf-tune Phase A JSONL corpus ([ADR-0237](0237-quality-aware-encode-automation.md)). Ships [`ai/scripts/train_fr_regressor_v2.py`](../../ai/scripts/train_fr_regressor_v2.py), a smoke ONNX (`fr_regressor_v2.onnx` registered with `smoke: true`), sidecar JSON, and full doc surface ([model card](../ai/models/fr_regressor_v2.md), [research digest](../research/0058-fr-regressor-v2-feasibility.md)). Two-input ONNX: 6 canonical libvmaf features (`adm2`, `vif_scale0..3`, `motion2`, StandardScaler-normalised) + 8-D codec block (6-way encoder one-hot + preset_norm + crf_norm, both in `[0, 1]`). MLP shape `6 -> 16 -> 16 -> 1` with codec block concatenated before the first dense layer (matches the existing `FRRegressor(num_codecs=8)` plumbing landed by [ADR-0235](0235-codec-aware-fr-regressor.md)). Registry row stays `smoke: true` until a follow-up PR (T7-FR-REGRESSOR-V2-PROD) re-runs training on a real Phase A corpus and clears v1's 0.95 LOSO PLCC ship gate with the ≥0.005 multi-codec lift required by ADR-0235. | Proposed | ai, dnn, tiny-ai, fr-regressor, codec-aware, vmaf-tune, fork-local |
| [ADR-0260](0260-hip-fourth-consumer-float-moment.md) | T7-10b fourth-consumer PR (sibling to ADR-0259) — `float_moment_hip` host scaffolding via the kernel-template mirror. Ships [`libvmaf/src/feature/hip/float_moment_hip.{c,h}`](../../libvmaf/src/feature/hip/float_moment_hip.c) — mirrors `libvmaf/src/feature/cuda/integer_moment_cuda.c`'s call graph verbatim with the four-uint64 atomic-counter readback (`MOMENT_HIP_COUNTERS = 4u`). Same scaffold posture: registration succeeds, `init()` returns `-ENOSYS` until T7-10b. New `vmaf_fex_float_moment_hip` row registers four `provided_features` (`float_moment_ref{1st,2nd}`, `float_moment_dis{1st,2nd}`); `VMAF_FEATURE_EXTRACTOR_HIP` flag stays cleared. Smoke test grows by one sub-test (`test_float_moment_hip_extractor_registered`). Pins the "memset multiple uint64 counters in one helper call" path so the runtime PR can implement `vmaf_hip_kernel_submit_pre_launch` as a single `hipMemsetAsync` of `rb.bytes`, knowing both the 1-counter (psnr_hip) and 4-counter (moment_hip) consumers exercise that code path. Picks `integer_moment_cuda` (230 LOC, smallest available CUDA twin) over `integer_motion_v2_cuda` (321 LOC, stateful) and `float_ansnr_cuda` (298 LOC, duplicates ADR-0254). | Accepted | gpu, hip, rocm, amd, kernel-template, fork-local |
@@ -0,0 +1,158 @@
# Research-0054: FR regressor v2 (codec-aware) feasibility
> in the Phase A schema, and (3) clearing v1's 0.95 LOSO PLCC ship
> threshold with a ≥0.005 multi-codec lift per
> [ADR-0235](../../adr/0235-codec-aware-fr-regressor.md). See
> [Research-0054](../../research/0058-fr-regressor-v2-feasibility.md).
Comment on lines +58 to +59
CRF up to 51; values above their per-encoder max are clipped at
read time.
Comment on lines +87 to +99
# Closed encoder vocabulary. Order is load-bearing — index baked into
# the trained ONNX. Append-only; bump SCHEMA_VERSION to retrain.
ENCODER_VOCAB: tuple[str, ...] = (
"libx264",
"libx265",
"libsvtav1",
"libvvenc",
"libvpx-vp9",
"unknown",
)
ENCODER_VOCAB_VERSION = 1
N_ENCODERS = len(ENCODER_VOCAB)
UNKNOWN_ENCODER_INDEX = ENCODER_VOCAB.index("unknown")
Comment on lines +197 to +210
pf = row.get("per_frame_features") or {}
canon = np.zeros(6, dtype=np.float32)
have_pf = False
for i, name in enumerate(CANONICAL6):
if name in pf:
canon[i] = float(pf[name])
have_pf = True
if not have_pf and warn_missing:
# Phase A's current schema does not emit per-frame features —
# the corpus stores aggregate vmaf_score only. The smoke path
# uses synthetic features; real corpora will need a Phase A
# follow-up to attach per-frame features (tracked in ADR-0272).
pass

Comment on lines +211 to +221
enc_idx = _encoder_index(row.get("encoder"))
preset_norm = _preset_ordinal(str(row.get("encoder", "unknown")), row.get("preset", "medium"))
crf = row.get("crf", 23)
crf_norm = float(crf) / CRF_MAX

codec_block = np.concatenate(
[
_encoder_onehot(enc_idx),
np.asarray([preset_norm, crf_norm], dtype=np.float32),
]
)
Comment on lines +599 to +604
print(f"[fr-v2] materialising {len(rows)} rows -> 9-D feature space", flush=True)
x_canon, x_codec, y = _materialise(rows)
print(
f"[fr-v2] shapes: canon={x_canon.shape} codec={x_codec.shape} y={y.shape} "
f"(canonical6={x_canon.shape[1]}, codec_block={x_codec.shape[1]})",
flush=True,
Comment on lines +526 to +574
def main() -> int:
ap = argparse.ArgumentParser(prog="train_fr_regressor_v2.py")
ap.add_argument(
"--corpus",
type=Path,
default=None,
help="Path to a vmaf-tune Phase A JSONL corpus. Mutually exclusive with --smoke.",
)
ap.add_argument(
"--smoke",
action="store_true",
help="Synthesise 100 fake corpus rows and train 1 epoch. Pipeline validation only.",
)
ap.add_argument("--epochs", type=int, default=30)
ap.add_argument("--batch-size", type=int, default=64)
ap.add_argument("--lr", type=float, default=1e-3)
ap.add_argument("--weight-decay", type=float, default=1e-5)
ap.add_argument(
"--hidden",
type=int,
default=16,
help="MLP hidden width. v2 default 16 (matches the user's 6->16->8->1 spec).",
)
ap.add_argument("--depth", type=int, default=2)
ap.add_argument("--seed", type=int, default=0)
ap.add_argument(
"--out-onnx",
type=Path,
default=REPO_ROOT / "model" / "tiny" / "fr_regressor_v2.onnx",
)
ap.add_argument(
"--out-sidecar",
type=Path,
default=REPO_ROOT / "model" / "tiny" / "fr_regressor_v2.json",
)
ap.add_argument(
"--registry",
type=Path,
default=REPO_ROOT / "model" / "tiny" / "registry.json",
)
ap.add_argument(
"--metrics-out",
type=Path,
default=REPO_ROOT / "runs" / "fr_regressor_v2_metrics.json",
)
ap.add_argument(
"--no-export", action="store_true", help="Skip ONNX export + registry update (dev mode)."
)
args = ap.parse_args()
Co-Authored-By: Claude <noreply@anthropic.com>
@lusoris merged commit b40d63a into master May 4, 2026
54 of 55 checks passed
@lusoris deleted the feat/ai-fr-regressor-v2-codec-aware-scaffold branch May 4, 2026 14:50
lusoris pushed a commit that referenced this pull request May 4, 2026
…old)

Phase A.5 of `tools/vmaf-tune/` (ADR-0276 Proposed, Research-0060). Adds
an opt-in `vmaf-tune fast` subcommand that combines three acceleration
levers — VMAF proxy via `fr_regressor_v2` (ADR-0272), Bayesian search
via Optuna's TPE sampler, GPU-accelerated VMAF verify (ADR-0157,
ADR-0186) — to collapse the recommendation use case from the Phase A
grid's hours-long wall-time to seconds-to-minutes (~20-50× without
NVENC, ~100-500× with NVENC follow-up).

Slow Phase A grid stays canonical as the ground-truth corpus generator
(ADR-0237 contract); fast-path is opt-in via `pip install
vmaf-tune[fast]`. This PR ships the scaffold only — Optuna search loop,
smoke-mode synthetic predictor, CLI subcommand, production-shape entry
point. Real encode + ONNX inference + GPU verify wiring is a follow-up
PR gated on Phase A corpus existence and `fr_regressor_v2` weights
training (PR #347).

Smoke test: `vmaf-tune fast --smoke --target-vmaf 92` — runs Optuna
over a synthetic x264-shaped CRF→VMAF curve without ffmpeg, ONNX
Runtime, or a GPU. 5 new tests in `tests/test_fast.py`; full
`tools/vmaf-tune/tests/` suite is 18/18 green.

ADR-0108 deliverables:
- (1) Research digest: `docs/research/0060-vmaf-tune-fast-path.md`
- (2) Decision matrix: ADR-0276 §Alternatives considered
- (3) AGENTS.md invariants: `tools/vmaf-tune/AGENTS.md` (fast-path is
      opt-in; Optuna stays lazy-imported)
- (4) Reproducer: `vmaf-tune fast --smoke --target-vmaf 92` (in PR body)
- (5) CHANGELOG fragment: `changelog.d/added/vmaf-tune-fast-path-scaffold.md`
- (6) Rebase-notes entry: 0229 (no upstream impact; entirely fork-local)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
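The recommendation loop the fast path accelerates can be illustrated with a target-VMAF bisect over a synthetic monotone CRF→VMAF curve. This is a sketch under stated assumptions: the curve shape, `synthetic_vmaf`, and `bisect_crf` are illustrative stand-ins, not the repo's smoke-mode predictor or Phase B's actual bisect.

```python
def synthetic_vmaf(crf):
    """Synthetic x264-shaped proxy: VMAF falls monotonically as CRF rises."""
    return max(0.0, min(100.0, 100.0 - 1.1 * (crf - 10)))

def bisect_crf(target_vmaf, score=synthetic_vmaf, lo=10, hi=51):
    """Highest integer CRF whose predicted VMAF still meets the target.

    Assumes a monotone score and that the target is reachable at `lo`;
    a real caller would verify the endpoint before trusting `best`.
    """
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if score(mid) >= target_vmaf:
            best = mid        # target met: try a cheaper (higher) CRF
            lo = mid + 1
        else:
            hi = mid - 1
    return best
```

Swapping `score` for an ONNX-backed `fr_regressor_v2` proxy (or an Optuna objective) is exactly the seam the scaffold leaves open.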
lusoris added a commit that referenced this pull request May 4, 2026
…old) (#355)

* feat(tools): vmaf-tune fast — proxy-based recommend (research + scaffold)


* chore(ci): trigger workflow re-run

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 5, 2026
Closes the orchestration layer for Bucket #1 of Research-0061's
`vmaf-tune` capability audit (the Netflix-style table-stakes per-shot
encoding feature). Ships `tools/vmaf-tune/src/vmaftune/per_shot.py`
plus the `vmaf-tune tune-per-shot` CLI subcommand:

* `detect_shots()` wraps the C-side `vmaf-perShot` binary
  (ADR-0222 / TransNet V2 ADR-0223) with a single-shot fallback
  when the binary is unavailable or fails.
* `tune_per_shot()` exposes a pluggable predicate seam Phase B's
  bisect (PR #347) drops into. Default predicate returns the codec
  adapter's default CRF so the scaffold round-trips before Phase B
  lands as code.
* `merge_shots()` emits one `ffmpeg` argv per shot (`-ss` +
  `-frames:v`) plus a final concat-demuxer command.

Scaffold-only — does not run encodes, does not yet emit native
per-codec mechanisms (`--qpfile` for x264, `--zones` for x265,
SVT-AV1 segment tables); per-segment + concat is the portable
fallback. Per-codec native emission lands per-codec alongside each
new adapter. 16 new tests pass with mocked `vmaf-perShot` + mocked
encoder; total `vmaf-tune` suite is 29 tests, zero binaries
required.

First per-phase split off ADR-0237. Updates ADR index, CHANGELOG,
docs/usage/vmaf-tune.md (new "Phase D" section + flag table + plan
JSON schema), tools/vmaf-tune/AGENTS.md (per-shot rebase invariants),
and docs/rebase-notes.md (entry 0228).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
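The `merge_shots()` behaviour described above — one `ffmpeg` argv per shot plus a final concat-demuxer command — can be sketched as below. Hedged heavily: `shot_commands`, the output filenames, and the exact flag layout are illustrative assumptions, not the module's real API.

```python
def shot_commands(src, shots, crf_per_shot, fps=24.0):
    """One ffmpeg argv per shot (-ss + -frames:v), then a concat command.

    `shots` is a list of (start_frame, n_frames) pairs; `crf_per_shot`
    pairs each shot with the CRF the per-shot tuner picked for it.
    """
    cmds, parts = [], []
    for i, ((start, n), crf) in enumerate(zip(shots, crf_per_shot)):
        out = f"shot{i:04d}.mkv"
        cmds.append([
            "ffmpeg", "-ss", f"{start / fps:.3f}", "-i", src,
            "-frames:v", str(n), "-c:v", "libx264", "-crf", str(crf), out,
        ])
        parts.append(out)
    # concat-demuxer list file contents, one "file '...'" line per segment
    concat_list = "".join(f"file '{p}'\n" for p in parts)
    concat = ["ffmpeg", "-f", "concat", "-safe", "0",
              "-i", "shots.txt", "-c", "copy", "out.mkv"]
    return cmds, concat_list, concat
```

Native per-codec mechanisms (x264 `--qpfile`, x265 `--zones`, SVT-AV1 segment tables) would replace the per-segment + concat fallback shot by shot, as the commit notes.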
lusoris added a commit that referenced this pull request May 5, 2026
)

* feat(tools): vmaf-tune Phase D scaffold — per-shot CRF tuning (ADR-0276)


* chore: re-trigger CI after research-digest opt-out

* fix(tools): close rec.add_argument paren before per_shot subparser block

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 5, 2026
…er) (#371)

* feat(tools): vmaf-tune Phase E — per-title bitrate ladder (game-changer)

Scaffolds the Phase E ladder generator (ADR-0277) — the highest-leverage
gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the
Netflix per-title encoding paper: sample (resolution × target-VMAF),
take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along
the hull, emit an HLS / DASH / JSON manifest.

Currently scaffold-only: the production sampler that drives Phase B's
target-VMAF bisect (PR #347) lands once that PR merges. Default sampler
raises NotImplementedError; tests inject a synthetic stub modelled on
the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py — build_ladder,
  convex_hull (Pareto filter + diminishing-returns envelope),
  select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH
  / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung
  1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull
  correctness on a synthetic Netflix-paper-shaped cloud, knee selection
  invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once Phase B integration PR
  lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper,
  Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section
  with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(docs): renumber phase-e ADR 0277→0295 + research 0066→0068 (collisions)

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
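The hull-and-knee pipeline the ladder commit describes is small enough to sketch. The following is an illustrative Python sketch, not the `vmaftune.ladder` implementation: the function names mirror the commit message, but the signatures and the `(bitrate_kbps, vmaf)` tuple shape are assumptions.

```python
import math

def convex_hull(points):
    """Upper convex (concave-envelope) hull of (bitrate_kbps, vmaf) points.

    Pareto-filters dominated points, then removes convex dents so the
    surviving rungs show strictly diminishing returns in bitrate.
    """
    pts = sorted(set(points))
    # Pareto filter: a point survives only if it beats every cheaper point's VMAF.
    pareto, best = [], float("-inf")
    for br, v in pts:
        if v > best:
            pareto.append((br, v))
            best = v
    # Monotone-chain scan: pop the middle point when it sits on or below the chord.
    hull = []
    for p in pareto:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def select_knees(hull, n):
    """Pick n rungs along the hull at roughly even log-bitrate spacing."""
    lo, hi = math.log(hull[0][0]), math.log(hull[-1][0])
    targets = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    rungs = []
    for t in targets:
        pick = min(hull, key=lambda p: abs(math.log(p[0]) - t))
        if pick not in rungs:
            rungs.append(pick)
    return rungs
```

Log-bitrate spacing is one of the two knee-selection modes the commit names; VMAF spacing would swap the `math.log(p[0])` key for `p[1]`.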
lusoris added a commit that referenced this pull request May 5, 2026
…al scaffold) (#372)

* feat(ai): fr_regressor_v2 probabilistic head (deep-ensemble + conformal scaffold)

Adds a probabilistic head on top of the codec-aware fr_regressor_v2
(parent: ADR-0272 / PR #347 in flight) so producers can drive the
in-flight `vmaf-tune --quality-confidence 0.95` flag (ADR-0237) off a
calibrated prediction interval instead of v2's bare MOS scalar. PR #354
audit Bucket #18 (top-3 ranked).

Trainer (`ai/scripts/train_fr_regressor_v2_ensemble.py`) trains N=5
copies of the v2 architecture (`FRRegressor(num_codecs=NUM_CODECS)`)
under distinct seeds, exports each as a separate two-input ONNX
(`features [N, 6]` + `codec_onehot [N, NUM_CODECS]`), and writes an
ensemble manifest sidecar that pins per-member sha256s, feature
standardisation, codec vocab, nominal coverage, and an optional
split-conformal residual quantile from a held-out calibration split.
Inference rule is `mu ± q · σ` with `q = 1.96` (Gaussian) or the
empirical conformal quantile (Vovk 2005, Romano 2019 — distribution-free
marginal coverage on exchangeable data).

Evaluator (`ai/scripts/eval_probabilistic_proxy.py`) reports empirical
coverage at 50/80/95 % nominal levels, mean interval width, and the
mean-prediction PLCC; reports the conformal-interval row when the
manifest carries a conformal scalar.

Smoke-only ship: synthetic 100-row corpus, 1 epoch / member. Production
training is gated on the multi-codec Phase A corpus (T7-FR-REGRESSOR-V2-PROBABILISTIC).

Six ADR-0108 deliverables:
1. Research digest: docs/research/0054-fr-regressor-v2-probabilistic.md.
2. Decision matrix: ADR-0279 § Alternatives considered.
3. AGENTS.md invariant note: appended to ai/AGENTS.md.
4. Reproducer: `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke`
   followed by `python ai/scripts/eval_probabilistic_proxy.py --smoke`.
5. CHANGELOG ### Added entry under Unreleased — lusoris fork.
6. Rebase-notes entry: ### 0229 in docs/rebase-notes.md.

Test plan:
- `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` produces
  5 valid two-input ONNX members + manifest sidecar (ran locally).
- `python ai/scripts/eval_probabilistic_proxy.py --smoke` aggregates the
  5 ONNX outputs into (mu, sigma) and reports coverage at 50/80/95 %.
- `python ai/scripts/validate_model_registry.py` → 15 entries valid.
- `pre-commit run --files <changed>` → Passed (black / isort / ruff /
  json-check / secrets / semgrep).
- `markdownlint-cli2` on all new docs → 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
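The interval rule described above (`mu ± q · σ` with a Gaussian or split-conformal `q`) is compact enough to sketch. This is an illustrative Python sketch under assumed signatures; the real logic lives in `train_fr_regressor_v2_ensemble.py` and `eval_probabilistic_proxy.py`.

```python
import math

def conformal_quantile(abs_residuals, coverage=0.95):
    """Split-conformal residual quantile from a held-out calibration split.

    With n exchangeable calibration residuals, the ceil((n + 1) * coverage)-th
    order statistic gives distribution-free marginal coverage >= coverage
    (Vovk 2005; Romano 2019).
    """
    n = len(abs_residuals)
    k = min(math.ceil((n + 1) * coverage), n)
    return sorted(abs_residuals)[k - 1]

def predict_interval(member_preds, q=None):
    """Deep-ensemble interval: mu +/- width.

    width is 1.96 * sigma (Gaussian rule) unless a conformal quantile q
    from the ensemble manifest overrides it.
    """
    n = len(member_preds)
    mu = sum(member_preds) / n
    sigma = math.sqrt(sum((p - mu) ** 2 for p in member_preds) / n)
    width = q if q is not None else 1.96 * sigma
    return mu - width, mu + width
```

A coverage evaluator then just counts how often the true MOS lands inside the interval at each nominal level.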

* fix(registry): split fr_regressor_v2 + ensemble_seed0 into distinct entries

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 6, 2026
… gap) (#404)

`tools/vmaf-tune/src/vmaftune/ladder.py::_default_sampler` no longer
raises `NotImplementedError`. It composes Phase A's
`corpus.iter_rows` (encode + score) with the Phase B-equivalent
`recommend.pick_target_vmaf` predicate (smallest CRF whose VMAF clears
the target) over the canonical 5-point CRF sweep
`DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38)` at the codec
adapter's mid-range preset (`"medium"` for libx264 / libx265 /
libsvtav1).

The `SamplerFn` seam stays open. Callers needing a finer grid, a
Bayesian bisect, or a precomputed corpus stream pass an explicit
`sampler=`. Tests stub `iter_rows` via `monkeypatch.setattr`; no live
ffmpeg / vmaf binaries are required.

Closes the Phase B/E gap left by ADR-0295. The original raise
docstring claimed PR #347 was Phase B's bisect — it was not (PR #347
shipped the `fr_regressor_v2` codec-aware scaffold). The actual
Phase B-equivalent (`recommend.pick_target_vmaf` + `corpus.iter_rows`)
shipped via ADR-0306, so the missing piece is a small composition.

ADR-0307 + Research-0079 + AGENTS.md invariant + rebase-notes §0307
+ changelog fragment land in the same PR.

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
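The composition the commit describes — sweep the canonical CRFs, score each, apply the smallest-passing-CRF predicate — can be sketched as below. This is illustrative Python: the row shape and the per-CRF calling convention are assumptions, not the `vmaftune.corpus` / `vmaftune.recommend` API (the sketch takes any `score_one` callable where the real module composes `corpus.iter_rows`).

```python
DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38)

def pick_target_vmaf(rows, target):
    """Smallest CRF whose measured VMAF clears the target, else None.

    Mirrors the predicate wording in the commit message.
    """
    hits = [r for r in rows if r["vmaf"] >= target]
    return min(hits, key=lambda r: r["crf"]) if hits else None

def default_sampler(score_one, source, resolution, target, preset="medium"):
    """Encode + score the canonical 5-point CRF sweep, then apply the predicate.

    score_one stands in for the encode-and-score step; here it is any
    callable returning a {"crf": ..., "vmaf": ...} row.
    """
    rows = [
        score_one(source, resolution=resolution, crf=crf, preset=preset)
        for crf in DEFAULT_SAMPLER_CRF_SWEEP
    ]
    return pick_target_vmaf(rows, target)
```

A test stub in the spirit of the commit's `monkeypatch.setattr` approach can fake the scorer with a closed-form VMAF curve, so no ffmpeg / vmaf binaries are needed.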
lusoris added a commit that referenced this pull request May 6, 2026
* feat(tools): vmaf-tune Phase D scaffold — per-shot CRF tuning (ADR-0276)

Closes the orchestration layer for Bucket #1 of Research-0061's
`vmaf-tune` capability audit (the Netflix-style table-stakes per-shot
encoding feature). Ships `tools/vmaf-tune/src/vmaftune/per_shot.py`
plus the `vmaf-tune tune-per-shot` CLI subcommand:

* `detect_shots()` wraps the C-side `vmaf-perShot` binary
  (ADR-0222 / TransNet V2 ADR-0223) with a single-shot fallback
  when the binary is unavailable or fails.
* `tune_per_shot()` exposes a pluggable predicate seam Phase B's
  bisect (PR #347) drops into. Default predicate returns the codec
  adapter's default CRF so the scaffold round-trips before Phase B
  lands as code.
* `merge_shots()` emits one `ffmpeg` argv per shot (`-ss` +
  `-frames:v`) plus a final concat-demuxer command.

Scaffold-only — does not run encodes, does not yet emit native
per-codec mechanisms (`--qpfile` for x264, `--zones` for x265,
SVT-AV1 segment tables); per-segment + concat is the portable
fallback. Native per-codec emission lands alongside each new
adapter. 16 new tests pass with mocked `vmaf-perShot` + mocked
encoder; total `vmaf-tune` suite is 29 tests, zero binaries
required.

First per-phase split off ADR-0237. Updates ADR index, CHANGELOG,
docs/usage/vmaf-tune.md (new "Phase D" section + flag table + plan
JSON schema), tools/vmaf-tune/AGENTS.md (per-shot rebase invariants),
and docs/rebase-notes.md (entry 0228).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: re-trigger CI after research-digest opt-out

* fix(vmaf-tune): import json + adapter-aware quality_range test (post-rebase)

Two fixes the post-rebase Phase-D branch needs to pass against the
new master:

* Master's cli.py uses `json.dumps` in `_run_predict` but the
  module was missing `import json`. The pre-rebase Phase D branch's
  `_run_tune_per_shot` also uses it. Add the top-level
  `import json` so both functions work and master's pre-existing
  bug clears at the same time.
* `test_tune_per_shot_clamps_to_codec_quality_range` hard-coded
  the clamp window `[15, 40]` from the pre-rebase x264 adapter.
  Master changed libx264's `quality_range` to `(0, 51)` (the
  full encoder range, ADR-0306 coarse-to-fine domain). Read the
  range from `get_adapter("libx264").quality_range` so the test
  tracks the adapter's source of truth instead of duplicating the
  literal — also future-proofs against further range tweaks.

All 16 per-shot tests now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(vmaf-tune): CHANGELOG fragment for tune-per-shot CLI

ADR-0108 deliverables-checklist gate on PR #431 flagged the missing
changelog fragment. Add it under `changelog.d/added/` per the
ADR-0221 fragment pattern so the next release-please rendering picks
up the entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
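The per-segment + concat fallback is concrete enough to sketch. An illustrative Python version follows, assuming a `(start_seconds, n_frames, crf)` shot tuple and an `.mkv` container — these are illustration choices, not the `per_shot.py` signatures.

```python
def merge_shots(src, shots, out_prefix):
    """One ffmpeg argv per shot (-ss + -frames:v), plus a concat-demuxer command.

    shots: list of (start_seconds, n_frames, crf) tuples.
    Returns (per_shot_cmds, concat_cmd, concat_list_lines); the list lines
    are written to the concat-demuxer text file before running concat_cmd.
    """
    cmds, list_lines = [], []
    for i, (start, frames, crf) in enumerate(shots):
        seg = f"{out_prefix}_shot{i:04d}.mkv"
        # Seek to the shot start, encode exactly `frames` frames at the tuned CRF.
        cmds.append([
            "ffmpeg", "-ss", str(start), "-i", src,
            "-frames:v", str(frames), "-crf", str(crf), seg,
        ])
        list_lines.append(f"file '{seg}'")
    # Stream-copy concat of the per-shot segments (no re-encode).
    concat = ["ffmpeg", "-f", "concat", "-safe", "0",
              "-i", f"{out_prefix}_concat.txt", "-c", "copy", f"{out_prefix}.mkv"]
    return cmds, concat, list_lines
```

Native per-codec mechanisms (`--qpfile`, `--zones`, segment tables) would replace the per-segment loop entirely, which is why the commit keeps this path as the portable fallback.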
lusoris added a commit that referenced this pull request May 7, 2026
…er) (#433)

Scaffolds the Phase E ladder generator (ADR-0277) — the highest-leverage
gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the
Netflix per-title encoding paper: sample (resolution × target-VMAF),
take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along
the hull, emit an HLS / DASH / JSON manifest.

Currently scaffold-only: the production sampler that drives Phase B's
target-VMAF bisect (PR #347) lands once that PR merges. Default sampler
raises NotImplementedError; tests inject a synthetic stub modelled on
the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py — build_ladder,
  convex_hull (Pareto filter + diminishing-returns envelope),
  select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH
  / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung
  1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull
  correctness on a synthetic Netflix-paper-shaped cloud, knee selection
  invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once Phase B integration PR
  lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper,
  Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section
  with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>