feat(ai): fr_regressor_v2 codec-aware scaffold (Phase B prereq) #347
Merged
Conversation
Force-pushed 2c33aa3 to 6737537
This was referenced May 3, 2026
lusoris pushed a commit that referenced this pull request May 3, 2026
Scaffolds the Phase E ladder generator (ADR-0277) — the highest-leverage gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the Netflix per-title encoding paper: sample (resolution × target-VMAF), take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along the hull, emit an HLS / DASH / JSON manifest.

Currently scaffold-only: the production sampler that drives Phase B's target-VMAF bisect (PR #347) lands once that PR merges. Default sampler raises NotImplementedError; tests inject a synthetic stub modelled on the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py — build_ladder, convex_hull (Pareto filter + diminishing-returns envelope), select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung 1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull correctness on a synthetic Netflix-paper-shaped cloud, knee-selection invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once the Phase B integration PR lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper, Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
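The hull-and-knees recipe in the commit message can be sketched in a few lines. `upper_convex_hull` and `select_rungs` below are illustrative stand-ins for the module's `convex_hull` / `select_knees` (names and exact spacing rule assumed, not taken from the shipped code): a monotone-chain upper hull over (bitrate, vmaf) samples, then rungs picked at roughly even log-bitrate spacing along the hull.

```python
import math


def upper_convex_hull(points):
    """Upper convex hull of (bitrate, vmaf) samples: the Pareto frontier
    with diminishing-returns (concave-violating) points removed."""
    hull = []
    for p in sorted(set(points)):
        # Pop while the last turn is not strictly concave-down.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def select_rungs(hull, n):
    """Pick n rungs along the hull at roughly even log-bitrate spacing."""
    if n >= len(hull):
        return list(hull)
    lo, hi = math.log(hull[0][0]), math.log(hull[-1][0])
    targets = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    rungs = []
    for t in targets:
        best = min(hull, key=lambda p: abs(math.log(p[0]) - t))
        if best not in rungs:
            rungs.append(best)
    return rungs
```

On a synthetic cloud shaped like the paper's R-D curves, dominated points (lower VMAF at higher bitrate than a neighbour) fall off the hull, and the rung set always includes both endpoints.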
lusoris pushed a commit that referenced this pull request May 3, 2026
…al scaffold) Adds a probabilistic head on top of the codec-aware fr_regressor_v2 (parent: ADR-0272 / PR #347 in flight) so producers can drive the in-flight `vmaf-tune --quality-confidence 0.95` flag (ADR-0237) off a calibrated prediction interval instead of v2's bare MOS scalar. PR #354 audit Bucket #18 (top-3 ranked).

Trainer (`ai/scripts/train_fr_regressor_v2_ensemble.py`) trains N=5 copies of the v2 architecture (`FRRegressor(num_codecs=NUM_CODECS)`) under distinct seeds, exports each as a separate two-input ONNX (`features [N, 6]` + `codec_onehot [N, NUM_CODECS]`), and writes an ensemble manifest sidecar that pins per-member sha256s, feature standardisation, codec vocab, nominal coverage, and an optional split-conformal residual quantile from a held-out calibration split. Inference rule is `mu ± q · σ` with `q = 1.96` (Gaussian) or the empirical conformal quantile (Vovk 2005, Romano 2019 — distribution-free marginal coverage on exchangeable data).

Evaluator (`ai/scripts/eval_probabilistic_proxy.py`) reports empirical coverage at 50/80/95 % nominal levels, mean interval width, and the mean-prediction PLCC; reports the conformal-interval row when the manifest carries a conformal scalar.

Smoke-only ship: synthetic 100-row corpus, 1 epoch / member. Production training is gated on the multi-codec Phase A corpus (T7-FR-REGRESSOR-V2-PROBABILISTIC).

Six ADR-0108 deliverables:
1. Research digest: docs/research/0054-fr-regressor-v2-probabilistic.md.
2. Decision matrix: ADR-0279 § Alternatives considered.
3. AGENTS.md invariant note: appended to ai/AGENTS.md.
4. Reproducer: `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` followed by `python ai/scripts/eval_probabilistic_proxy.py --smoke`.
5. CHANGELOG ### Added entry under Unreleased — lusoris fork.
6. Rebase-notes entry: ### 0229 in docs/rebase-notes.md.

Test plan:
- `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` produces 5 valid two-input ONNX members + manifest sidecar (ran locally).
- `python ai/scripts/eval_probabilistic_proxy.py --smoke` aggregates the 5 ONNX outputs into (mu, sigma) and reports coverage at 50/80/95 %.
- `python ai/scripts/validate_model_registry.py` → 15 entries valid.
- `pre-commit run --files <changed>` → Passed (black / isort / ruff / json-check / secrets / semgrep).
- `markdownlint-cli2` on all new docs → 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
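The interval rule the commit states (`mu ± q · σ`, with `q = 1.96` or a split-conformal residual quantile) is simple enough to sketch standalone. This is a minimal illustration of the stated rule, not the shipped evaluator; function names and the exact quantile convention (`ceil((n+1)(1-alpha))`-th order statistic, the usual split-conformal choice) are assumptions.

```python
import math
import statistics


def ensemble_interval(member_preds, q=1.96):
    """mu ± q * sigma from N seed-varied ensemble members' predictions
    for one input. q = 1.96 gives the nominal Gaussian 95 % interval;
    pass the conformal quantile instead for distribution-free coverage."""
    mu = statistics.fmean(member_preds)
    sigma = statistics.stdev(member_preds)
    return mu - q * sigma, mu + q * sigma


def conformal_quantile(residuals, alpha=0.05):
    """Split-conformal scalar: the ceil((n+1)(1-alpha))-th order statistic
    of held-out |y - mu| calibration residuals (Vovk 2005 / Romano 2019
    style marginal coverage on exchangeable data)."""
    r = sorted(residuals)
    k = math.ceil((len(r) + 1) * (1 - alpha))
    return r[min(k, len(r)) - 1]
```

The manifest's "optional split-conformal residual quantile" would then simply replace `q * sigma`'s Gaussian scaling with a fixed empirical half-width.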
Scaffold-only ship of `fr_regressor_v2` — the codec-aware successor to `fr_regressor_v1` (ADR-0249) and the first downstream consumer of the `vmaf-tune corpus` JSONL emitted by Phase A (ADR-0237). Adds the training script, smoke-mode synthetic-corpus path, two-input ONNX export plumbing (mirrors the LPIPS-Sq pattern from ADR-0040 / ADR-0041), sidecar JSON, registry row gated `smoke: true`, and the full ADR-0042 / ADR-0108 doc surface (model card, research digest, ADR, AGENTS.md invariant note, rebase-notes entry, CHANGELOG).

Two-input shape: `features` (N, 6) canonical-6 libvmaf features + `codec` (N, 8) block — `[encoder_onehot(6), preset_norm, crf_norm]`, both normalised into `[0, 1]`. ENCODER_VOCAB is closed and ordered (libx264, libx265, libsvtav1, libvvenc, libvpx-vp9, unknown); CRF is normalised by 63 (union upper bound across encoders); preset by 9. Re-uses `FRRegressor(num_codecs=8)` plumbed by ADR-0235 rather than minting a new class.

`--smoke` mode synthesises 100 fake corpus rows and trains 1 epoch so the pipeline is end-to-end exercisable (JSONL ingest → 9-D materialisation → MLP train → ONNX export → op-allowlist check → torch-vs-ORT roundtrip) without burning hours on a real Phase A run. The shipped ONNX is the smoke output and is registered with `smoke: true` so the quality-metric harness skips it.

The production training run is a follow-up PR (T7-FR-REGRESSOR-V2-PROD) gated on (1) a multi-codec Phase A corpus with ≥50 refs / ≥5 encoders, (2) per-frame feature emission in the Phase A schema, and (3) clearing v1's 0.95 LOSO PLCC ship floor with the ≥0.005 multi-codec lift required by ADR-0235.

Reproducer: `python ai/scripts/train_fr_regressor_v2.py --smoke`

Refs: ADR-0261, ADR-0235, ADR-0237, ADR-0249, Research-0054.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
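The codec-block contract described above (6-way one-hot over the closed ENCODER_VOCAB plus `preset_norm` and `crf_norm`, everything in `[0, 1]`) can be sketched without the trainer's helpers. A minimal pure-Python sketch — `codec_block` is a hypothetical name, and the preset ordinal is taken directly as a number here, whereas the real script derives it per-encoder via `_preset_ordinal`:

```python
# Closed, ordered vocabulary per the PR body; 'unknown' is the catch-all slot.
ENCODER_VOCAB = ("libx264", "libx265", "libsvtav1", "libvvenc", "libvpx-vp9", "unknown")
CRF_MAX = 63.0    # union upper bound across encoders
PRESET_MAX = 9.0  # preset ordinal upper bound


def codec_block(encoder, preset_ordinal, crf):
    """8-D codec block: encoder one-hot (6) + preset_norm + crf_norm.
    Encoders outside the closed vocab map to the 'unknown' slot."""
    idx = ENCODER_VOCAB.index(encoder) if encoder in ENCODER_VOCAB else ENCODER_VOCAB.index("unknown")
    onehot = [1.0 if i == idx else 0.0 for i in range(len(ENCODER_VOCAB))]
    preset_norm = min(float(preset_ordinal), PRESET_MAX) / PRESET_MAX
    crf_norm = min(float(crf), CRF_MAX) / CRF_MAX
    return onehot + [preset_norm, crf_norm]
```

Because the vocab order is baked into the trained ONNX, the AGENTS.md invariant (append-only vocab, bump SCHEMA_VERSION to retrain) is what keeps a block like this stable across versions.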
Force-pushed 6737537 to 7bfdd33
Pull request overview
Scaffold-only addition of a codec-aware tiny-AI full-reference regressor (fr_regressor_v2) intended as the first downstream consumer of the vmaf-tune Phase A JSONL corpus, including a smoke-trained ONNX + sidecar metadata and accompanying ADR/research/model-card documentation.
Changes:
- Add ai/scripts/train_fr_regressor_v2.py to train/export a two-input ONNX (features + codec) with a --smoke mode.
- Register the new smoke model in model/tiny/registry.json and add the model/tiny/fr_regressor_v2.json sidecar.
- Add/extend docs (ADR, research digest, model card, rebase notes, AGENTS note, changelog entry) describing the new contract.
Reviewed changes
Copilot reviewed 10 out of 12 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| ai/scripts/train_fr_regressor_v2.py | New trainer/exporter for fr_regressor_v2 with smoke mode and registry/sidecar writing. |
| model/tiny/registry.json | Adds a fr_regressor_v2 registry entry marked smoke: true. |
| model/tiny/fr_regressor_v2.json | New sidecar describing the model inputs, scaling, codec vocab, and training metadata. |
| docs/adr/0272-fr-regressor-v2-codec-aware-scaffold.md | New ADR documenting the scaffold decision and contract. |
| docs/adr/README.md | Adds ADR index row for ADR-0272 (but file is generated). |
| docs/research/0058-fr-regressor-v2-feasibility.md | New feasibility digest (ID/title mismatch noted). |
| docs/ai/models/fr_regressor_v2.md | New model card describing inputs/outputs, corpus expectations, and usage. |
| ai/AGENTS.md | Records load-bearing invariants for the codec block and vocab ordering. |
| docs/rebase-notes.md | Adds a rebase-notes entry for the scaffold landing. |
| CHANGELOG.md | Adds an Unreleased entry (but changelog is fragment-generated). |
Comment on lines +11 to +35

- **`fr_regressor_v2` codec-aware scaffold — first downstream consumer
  of the vmaf-tune Phase A JSONL corpus (ADR-0272, prereq for
  Phase B).** Ships
  [`ai/scripts/train_fr_regressor_v2.py`](ai/scripts/train_fr_regressor_v2.py)
  — a scaffold-only trainer that consumes the JSONL corpus emitted by
  `vmaf-tune corpus` (ADR-0237 Phase A) and trains the codec-aware
  variant of the v1 FR regressor. Two-input ONNX (`features` shape
  `(N, 6)` canonical-6 + `codec` shape `(N, 8)` block —
  `[encoder_onehot(6), preset_norm, crf_norm]`); reuses the existing
  `FRRegressor(num_codecs=8)` class plumbed by ADR-0235. A `--smoke`
  mode synthesises 100 fake corpus rows and trains 1 epoch so the
  pipeline is end-to-end exercisable in CI without hours of encode
  time. Registers `fr_regressor_v2` in `model/tiny/registry.json`
  with `smoke: true` until a follow-up PR runs production training on
  a real Phase A corpus and clears the ADR-0235 ship gate (≥0.005
  multi-codec PLCC lift over v1's 0.95 LOSO floor). Doc surface:
  [model card](docs/ai/models/fr_regressor_v2.md),
  [research digest](docs/research/0058-fr-regressor-v2-feasibility.md),
  [ADR-0272](docs/adr/0272-fr-regressor-v2-codec-aware-scaffold.md),
  `ai/AGENTS.md` invariant note pinning the codec block layout and
  encoder vocabulary. Smoke validated locally (`python
  ai/scripts/train_fr_regressor_v2.py --smoke` produces a valid
  opset-17 two-input ONNX, op-allowlist clean, torch-vs-ORT roundtrip
  within 1e-4 atol). No upstream-mirror file touched; pure additive
  fork-local PR.
Comment on lines +274 to +276

| [ADR-0259](0259-hip-third-consumer-ciede.md) | T7-10b third-consumer PR — `ciede_hip` host scaffolding via the kernel-template mirror established by [ADR-0241](0241-hip-first-consumer-psnr.md). Ships [`libvmaf/src/feature/hip/ciede_hip.{c,h}`](../../libvmaf/src/feature/hip/ciede_hip.c) — mirrors `libvmaf/src/feature/cuda/integer_ciede_cuda.c`'s init/submit/collect/close call graph verbatim, including the **intentional bypass** of `submit_pre_launch` (ciede's kernel writes one float per block, no atomic, no memset required). Same scaffold posture as ADR-0241 / ADR-0254: registration succeeds, `init()` returns `-ENOSYS` until T7-10b flips the kernel-template helper bodies to real HIP calls. New `vmaf_fex_ciede_hip` row in `feature_extractor_list` under `#if HAVE_HIP`; `VMAF_FEATURE_EXTRACTOR_HIP` flag stays cleared. Smoke test grows by one sub-test (`test_ciede_hip_extractor_registered`). Pins the kernel-template's "no-memset bypass" path so the runtime PR can flip helper bodies without inventing a new template variant for ciede. Picks `integer_ciede_cuda` (243 LOC) over `integer_motion_cuda` (503 LOC, stateful) and `float_ansnr_cuda` (298 LOC, duplicates ADR-0254's precision posture). | Accepted | gpu, hip, rocm, amd, kernel-template, fork-local |
| [ADR-0272](0272-fr-regressor-v2-codec-aware-scaffold.md) | `fr_regressor_v2` codec-aware scaffold — first downstream consumer of the vmaf-tune Phase A JSONL corpus ([ADR-0237](0237-quality-aware-encode-automation.md)). Ships [`ai/scripts/train_fr_regressor_v2.py`](../../ai/scripts/train_fr_regressor_v2.py), a smoke ONNX (`fr_regressor_v2.onnx` registered with `smoke: true`), sidecar JSON, and full doc surface ([model card](../ai/models/fr_regressor_v2.md), [research digest](../research/0058-fr-regressor-v2-feasibility.md)). Two-input ONNX: 6 canonical libvmaf features (`adm2`, `vif_scale0..3`, `motion2`, StandardScaler-normalised) + 8-D codec block (6-way encoder one-hot + preset_norm + crf_norm, both in `[0, 1]`). MLP shape `6 -> 16 -> 16 -> 1` with codec block concatenated before the first dense layer (matches the existing `FRRegressor(num_codecs=8)` plumbing landed by [ADR-0235](0235-codec-aware-fr-regressor.md)). Registry row stays `smoke: true` until a follow-up PR (T7-FR-REGRESSOR-V2-PROD) re-runs training on a real Phase A corpus and clears v1's 0.95 LOSO PLCC ship gate with the ≥0.005 multi-codec lift required by ADR-0235. | Proposed | ai, dnn, tiny-ai, fr-regressor, codec-aware, vmaf-tune, fork-local |
| [ADR-0260](0260-hip-fourth-consumer-float-moment.md) | T7-10b fourth-consumer PR (sibling to ADR-0259) — `float_moment_hip` host scaffolding via the kernel-template mirror. Ships [`libvmaf/src/feature/hip/float_moment_hip.{c,h}`](../../libvmaf/src/feature/hip/float_moment_hip.c) — mirrors `libvmaf/src/feature/cuda/integer_moment_cuda.c`'s call graph verbatim with the four-uint64 atomic-counter readback (`MOMENT_HIP_COUNTERS = 4u`). Same scaffold posture: registration succeeds, `init()` returns `-ENOSYS` until T7-10b. New `vmaf_fex_float_moment_hip` row registers four `provided_features` (`float_moment_ref{1st,2nd}`, `float_moment_dis{1st,2nd}`); `VMAF_FEATURE_EXTRACTOR_HIP` flag stays cleared. Smoke test grows by one sub-test (`test_float_moment_hip_extractor_registered`). Pins the "memset multiple uint64 counters in one helper call" path so the runtime PR can implement `vmaf_hip_kernel_submit_pre_launch` as a single `hipMemsetAsync` of `rb.bytes`, knowing both the 1-counter (psnr_hip) and 4-counter (moment_hip) consumers exercise that code path. Picks `integer_moment_cuda` (230 LOC, smallest available CUDA twin) over `integer_motion_v2_cuda` (321 LOC, stateful) and `float_ansnr_cuda` (298 LOC, duplicates ADR-0254). | Accepted | gpu, hip, rocm, amd, kernel-template, fork-local |

@@ -0,0 +1,158 @@
# Research-0054: FR regressor v2 (codec-aware) feasibility

> in the Phase A schema, and (3) clearing v1's 0.95 LOSO PLCC ship
> threshold with a ≥0.005 multi-codec lift per
> [ADR-0235](../../adr/0235-codec-aware-fr-regressor.md). See
> [Research-0054](../../research/0058-fr-regressor-v2-feasibility.md).
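The ADR-0272 row pins the architecture contract: 6 canonical features, an 8-D codec block concatenated before the first dense layer, then `16 -> 16 -> 1`. A shape-only sketch with random weights — the activation and initialisation here are illustrative assumptions, not the shipped `FRRegressor`:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4  # a small batch of corpus rows

features = rng.random((N, 6)).astype(np.float32)  # canonical-6 libvmaf features
codec = rng.random((N, 8)).astype(np.float32)     # onehot(6) + preset_norm + crf_norm

# Codec block is concatenated BEFORE the first dense layer: 6 + 8 = 14-D input.
x = np.concatenate([features, codec], axis=1)

w1 = rng.standard_normal((14, 16)).astype(np.float32)
w2 = rng.standard_normal((16, 16)).astype(np.float32)
w3 = rng.standard_normal((16, 1)).astype(np.float32)

h = np.maximum(x @ w1, 0.0)  # ReLU assumed for illustration
h = np.maximum(h @ w2, 0.0)
mos = h @ w3                 # one MOS-like scalar per row
```

The two-input ONNX export preserves exactly this split: `features (N, 6)` and `codec (N, 8)` stay separate graph inputs and are only fused at the concat node.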
Comment on lines +58 to +59

CRF up to 51; values above their per-encoder max are clipped at
read time.
Comment on lines +87 to +99

# Closed encoder vocabulary. Order is load-bearing — index baked into
# the trained ONNX. Append-only; bump SCHEMA_VERSION to retrain.
ENCODER_VOCAB: tuple[str, ...] = (
    "libx264",
    "libx265",
    "libsvtav1",
    "libvvenc",
    "libvpx-vp9",
    "unknown",
)
ENCODER_VOCAB_VERSION = 1
N_ENCODERS = len(ENCODER_VOCAB)
UNKNOWN_ENCODER_INDEX = ENCODER_VOCAB.index("unknown")
Comment on lines +197 to +210

pf = row.get("per_frame_features") or {}
canon = np.zeros(6, dtype=np.float32)
have_pf = False
for i, name in enumerate(CANONICAL6):
    if name in pf:
        canon[i] = float(pf[name])
        have_pf = True
if not have_pf and warn_missing:
    # Phase A's current schema does not emit per-frame features —
    # the corpus stores aggregate vmaf_score only. The smoke path
    # uses synthetic features; real corpora will need a Phase A
    # follow-up to attach per-frame features (tracked in ADR-0272).
    pass
Comment on lines +211 to +221

enc_idx = _encoder_index(row.get("encoder"))
preset_norm = _preset_ordinal(str(row.get("encoder", "unknown")), row.get("preset", "medium"))
crf = row.get("crf", 23)
crf_norm = float(crf) / CRF_MAX

codec_block = np.concatenate(
    [
        _encoder_onehot(enc_idx),
        np.asarray([preset_norm, crf_norm], dtype=np.float32),
    ]
)
Comment on lines +599 to +604

print(f"[fr-v2] materialising {len(rows)} rows -> 9-D feature space", flush=True)
x_canon, x_codec, y = _materialise(rows)
print(
    f"[fr-v2] shapes: canon={x_canon.shape} codec={x_codec.shape} y={y.shape} "
    f"(canonical6={x_canon.shape[1]}, codec_block={x_codec.shape[1]})",
    flush=True,
Comment on lines +526 to +574

def main() -> int:
    ap = argparse.ArgumentParser(prog="train_fr_regressor_v2.py")
    ap.add_argument(
        "--corpus",
        type=Path,
        default=None,
        help="Path to a vmaf-tune Phase A JSONL corpus. Mutually exclusive with --smoke.",
    )
    ap.add_argument(
        "--smoke",
        action="store_true",
        help="Synthesise 100 fake corpus rows and train 1 epoch. Pipeline validation only.",
    )
    ap.add_argument("--epochs", type=int, default=30)
    ap.add_argument("--batch-size", type=int, default=64)
    ap.add_argument("--lr", type=float, default=1e-3)
    ap.add_argument("--weight-decay", type=float, default=1e-5)
    ap.add_argument(
        "--hidden",
        type=int,
        default=16,
        help="MLP hidden width. v2 default 16 (matches the user's 6->16->8->1 spec).",
    )
    ap.add_argument("--depth", type=int, default=2)
    ap.add_argument("--seed", type=int, default=0)
    ap.add_argument(
        "--out-onnx",
        type=Path,
        default=REPO_ROOT / "model" / "tiny" / "fr_regressor_v2.onnx",
    )
    ap.add_argument(
        "--out-sidecar",
        type=Path,
        default=REPO_ROOT / "model" / "tiny" / "fr_regressor_v2.json",
    )
    ap.add_argument(
        "--registry",
        type=Path,
        default=REPO_ROOT / "model" / "tiny" / "registry.json",
    )
    ap.add_argument(
        "--metrics-out",
        type=Path,
        default=REPO_ROOT / "runs" / "fr_regressor_v2_metrics.json",
    )
    ap.add_argument(
        "--no-export", action="store_true", help="Skip ONNX export + registry update (dev mode)."
    )
    args = ap.parse_args()
Co-Authored-By: Claude <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 4, 2026
…old) Phase A.5 of `tools/vmaf-tune/` (ADR-0276 Proposed, Research-0060). Adds an opt-in `vmaf-tune fast` subcommand that combines three acceleration levers — VMAF proxy via `fr_regressor_v2` (ADR-0272), Bayesian search via Optuna's TPE sampler, and GPU-accelerated VMAF verify (ADR-0157, ADR-0186) — to collapse the recommendation use case from the Phase A grid's hours-long wall-time to seconds-to-minutes (~20-50× without NVENC, ~100-500× with the NVENC follow-up). The slow Phase A grid stays canonical as the ground-truth corpus generator (ADR-0237 contract); the fast path is opt-in via `pip install vmaf-tune[fast]`.

This PR ships the scaffold only — Optuna search loop, smoke-mode synthetic predictor, CLI subcommand, production-shape entry point. Real encode + ONNX inference + GPU verify wiring is a follow-up PR gated on Phase A corpus existence and `fr_regressor_v2` weights training (PR #347).

Smoke test: `vmaf-tune fast --smoke --target-vmaf 92` — runs Optuna over a synthetic x264-shaped CRF→VMAF curve without ffmpeg, ONNX Runtime, or a GPU. 5 new tests in `tests/test_fast.py`; the full `tools/vmaf-tune/tests/` suite is 18/18 green.

ADR-0108 deliverables:
- (1) Research digest: `docs/research/0060-vmaf-tune-fast-path.md`
- (2) Decision matrix: ADR-0276 §Alternatives considered
- (3) AGENTS.md invariants: `tools/vmaf-tune/AGENTS.md` (fast-path is opt-in; Optuna stays lazy-imported)
- (4) Reproducer: `vmaf-tune fast --smoke --target-vmaf 92` (in PR body)
- (5) CHANGELOG fragment: `changelog.d/added/vmaf-tune-fast-path-scaffold.md`
- (6) Rebase-notes entry: 0229 (no upstream impact; entirely fork-local)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
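Several commits in this thread lean on Phase B's target-VMAF bisect over a CRF→VMAF curve. Under the usual assumption that predicted VMAF is monotonically non-increasing in CRF, the search reduces to a "last CRF that still meets the target" binary search. A sketch of that idea — the linear `curve` stub is made up for illustration (loosely in the spirit of the smoke test's synthetic x264-shaped curve), not the shipped predictor, and the real fast path uses Optuna's TPE sampler rather than this bisect:

```python
def bisect_crf(predict_vmaf, target, lo=0, hi=51):
    """Highest integer CRF whose predicted VMAF still meets `target`,
    assuming predict_vmaf is monotonically non-increasing in CRF.
    Higher CRF means smaller files, so the highest passing CRF wins."""
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if predict_vmaf(mid) >= target:
            best, lo = mid, mid + 1  # target met: try a higher (cheaper) CRF
        else:
            hi = mid - 1             # target missed: back off to lower CRF
    return best


# Hypothetical monotone proxy curve, purely for demonstration.
def curve(crf):
    return 100.0 - 0.9 * crf
```

With this stub, a 92-VMAF target resolves in at most ~6 proxy evaluations instead of a full CRF grid, which is the wall-time lever the fast path exploits.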
lusoris added a commit that referenced this pull request May 4, 2026
…old) (#355)

* feat(tools): vmaf-tune fast — proxy-based recommend (research + scaffold)

Phase A.5 of `tools/vmaf-tune/` (ADR-0276 Proposed, Research-0060). Adds an opt-in `vmaf-tune fast` subcommand that combines three acceleration levers — VMAF proxy via `fr_regressor_v2` (ADR-0272), Bayesian search via Optuna's TPE sampler, and GPU-accelerated VMAF verify (ADR-0157, ADR-0186) — to collapse the recommendation use case from the Phase A grid's hours-long wall-time to seconds-to-minutes (~20-50× without NVENC, ~100-500× with the NVENC follow-up). The slow Phase A grid stays canonical as the ground-truth corpus generator (ADR-0237 contract); the fast path is opt-in via `pip install vmaf-tune[fast]`.

This PR ships the scaffold only — Optuna search loop, smoke-mode synthetic predictor, CLI subcommand, production-shape entry point. Real encode + ONNX inference + GPU verify wiring is a follow-up PR gated on Phase A corpus existence and `fr_regressor_v2` weights training (PR #347).

Smoke test: `vmaf-tune fast --smoke --target-vmaf 92` — runs Optuna over a synthetic x264-shaped CRF→VMAF curve without ffmpeg, ONNX Runtime, or a GPU. 5 new tests in `tests/test_fast.py`; the full `tools/vmaf-tune/tests/` suite is 18/18 green.

ADR-0108 deliverables:
- (1) Research digest: `docs/research/0060-vmaf-tune-fast-path.md`
- (2) Decision matrix: ADR-0276 §Alternatives considered
- (3) AGENTS.md invariants: `tools/vmaf-tune/AGENTS.md` (fast-path is opt-in; Optuna stays lazy-imported)
- (4) Reproducer: `vmaf-tune fast --smoke --target-vmaf 92` (in PR body)
- (5) CHANGELOG fragment: `changelog.d/added/vmaf-tune-fast-path-scaffold.md`
- (6) Rebase-notes entry: 0229 (no upstream impact; entirely fork-local)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ci): trigger workflow re-run

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 5, 2026
Closes the orchestration layer for Bucket #1 of Research-0061's `vmaf-tune` capability audit (the Netflix-style table-stakes per-shot encoding feature). Ships `tools/vmaf-tune/src/vmaftune/per_shot.py` plus the `vmaf-tune tune-per-shot` CLI subcommand:

* `detect_shots()` wraps the C-side `vmaf-perShot` binary (ADR-0222 / TransNet V2 ADR-0223) with a single-shot fallback when the binary is unavailable or fails.
* `tune_per_shot()` exposes a pluggable predicate seam that Phase B's bisect (PR #347) drops into. The default predicate returns the codec adapter's default CRF so the scaffold round-trips before Phase B lands as code.
* `merge_shots()` emits one `ffmpeg` argv per shot (`-ss` + `-frames:v`) plus a final concat-demuxer command.

Scaffold-only — does not run encodes and does not yet emit native per-codec mechanisms (`--qpfile` for x264, `--zones` for x265, SVT-AV1 segment tables); per-segment + concat is the portable fallback. Native emission lands per-codec alongside each new adapter.

16 new tests pass with mocked `vmaf-perShot` + mocked encoder; the total `vmaf-tune` suite is 29 tests, zero binaries required. First per-phase split off ADR-0237.

Updates the ADR index, CHANGELOG, docs/usage/vmaf-tune.md (new "Phase D" section + flag table + plan JSON schema), tools/vmaf-tune/AGENTS.md (per-shot rebase invariants), and docs/rebase-notes.md (entry 0228).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
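The per-segment + concat fallback the commit describes is easy to picture as data: one `-ss` + `-frames:v` ffmpeg argv per shot, a concat-demuxer listing, and a final stitch command. `shot_commands` below is a hypothetical helper illustrating that shape, not the shipped `merge_shots()` (segment naming, container, and the `shots.txt` listing path are all assumptions):

```python
def shot_commands(src, shots, encoder="libx264"):
    """Build per-shot encode argvs plus a concat-demuxer stitch command.

    `shots` is a list of (start_seconds, n_frames, crf) triples, e.g. as
    produced by a per-shot CRF tuner. Returns (encode_cmds, concat_listing,
    concat_cmd); nothing is executed here.
    """
    cmds, listing = [], []
    for i, (start_s, n_frames, crf) in enumerate(shots):
        out = f"shot{i:04d}.mkv"
        cmds.append([
            "ffmpeg", "-ss", f"{start_s:.3f}", "-i", src,   # seek to shot start
            "-frames:v", str(n_frames),                      # encode exactly this many frames
            "-c:v", encoder, "-crf", str(crf),               # per-shot CRF
            out,
        ])
        listing.append(f"file '{out}'")                      # concat-demuxer directive
    concat = ["ffmpeg", "-f", "concat", "-safe", "0",
              "-i", "shots.txt", "-c", "copy", "out.mkv"]
    return cmds, listing, concat
```

The portability win is that this needs no encoder-specific zoning support; the cost is one encoder process per shot plus a lossless `-c copy` remux at the end.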
lusoris added a commit that referenced this pull request May 5, 2026
)

* feat(tools): vmaf-tune Phase D scaffold — per-shot CRF tuning (ADR-0276)

Closes the orchestration layer for Bucket #1 of Research-0061's `vmaf-tune` capability audit (the Netflix-style table-stakes per-shot encoding feature). Ships `tools/vmaf-tune/src/vmaftune/per_shot.py` plus the `vmaf-tune tune-per-shot` CLI subcommand:

* `detect_shots()` wraps the C-side `vmaf-perShot` binary (ADR-0222 / TransNet V2 ADR-0223) with a single-shot fallback when the binary is unavailable or fails.
* `tune_per_shot()` exposes a pluggable predicate seam that Phase B's bisect (PR #347) drops into. The default predicate returns the codec adapter's default CRF so the scaffold round-trips before Phase B lands as code.
* `merge_shots()` emits one `ffmpeg` argv per shot (`-ss` + `-frames:v`) plus a final concat-demuxer command.

Scaffold-only — does not run encodes and does not yet emit native per-codec mechanisms (`--qpfile` for x264, `--zones` for x265, SVT-AV1 segment tables); per-segment + concat is the portable fallback. Native emission lands per-codec alongside each new adapter.

16 new tests pass with mocked `vmaf-perShot` + mocked encoder; the total `vmaf-tune` suite is 29 tests, zero binaries required. First per-phase split off ADR-0237.

Updates the ADR index, CHANGELOG, docs/usage/vmaf-tune.md (new "Phase D" section + flag table + plan JSON schema), tools/vmaf-tune/AGENTS.md (per-shot rebase invariants), and docs/rebase-notes.md (entry 0228).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: re-trigger CI after research-digest opt-out

* fix(tools): close rec.add_argument paren before per_shot subparser block

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 5, 2026
…er) (#371)

* feat(tools): vmaf-tune Phase E — per-title bitrate ladder (game-changer)

Scaffolds the Phase E ladder generator (ADR-0277) — the highest-leverage gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the Netflix per-title encoding paper: sample (resolution × target-VMAF), take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along the hull, emit an HLS / DASH / JSON manifest.

Currently scaffold-only: the production sampler that drives Phase B's target-VMAF bisect (PR #347) lands once that PR merges. Default sampler raises NotImplementedError; tests inject a synthetic stub modelled on the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py — build_ladder, convex_hull (Pareto filter + diminishing-returns envelope), select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung 1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull correctness on a synthetic Netflix-paper-shaped cloud, knee-selection invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once the Phase B integration PR lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper, Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(docs): renumber phase-e ADR 0277→0295 + research 0066→0068 (collisions)

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
pushed a commit
that referenced
this pull request
May 5, 2026
…al scaffold)

Adds a probabilistic head on top of the codec-aware fr_regressor_v2 (parent: ADR-0272 / PR #347 in flight) so producers can drive the in-flight `vmaf-tune --quality-confidence 0.95` flag (ADR-0237) off a calibrated prediction interval instead of v2's bare MOS scalar. PR #354 audit Bucket #18 (top-3 ranked).

Trainer (`ai/scripts/train_fr_regressor_v2_ensemble.py`) trains N=5 copies of the v2 architecture (`FRRegressor(num_codecs=NUM_CODECS)`) under distinct seeds, exports each as a separate two-input ONNX (`features [N, 6]` + `codec_onehot [N, NUM_CODECS]`), and writes an ensemble manifest sidecar that pins per-member sha256s, feature standardisation, codec vocab, nominal coverage, and an optional split-conformal residual quantile from a held-out calibration split. The inference rule is `mu ± q · σ` with `q = 1.96` (Gaussian) or the empirical conformal quantile (Vovk 2005, Romano 2019 — distribution-free marginal coverage on exchangeable data).

Evaluator (`ai/scripts/eval_probabilistic_proxy.py`) reports empirical coverage at 50/80/95 % nominal levels, mean interval width, and the mean-prediction PLCC; it reports the conformal-interval row when the manifest carries a conformal scalar.

Smoke-only ship: synthetic 100-row corpus, 1 epoch / member. Production training is gated on the multi-codec Phase A corpus (T7-FR-REGRESSOR-V2-PROBABILISTIC).

Six ADR-0108 deliverables:
1. Research digest: docs/research/0054-fr-regressor-v2-probabilistic.md.
2. Decision matrix: ADR-0279 § Alternatives considered.
3. AGENTS.md invariant note: appended to ai/AGENTS.md.
4. Reproducer: `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` followed by `python ai/scripts/eval_probabilistic_proxy.py --smoke`.
5. CHANGELOG `### Added` entry under Unreleased — lusoris fork.
6. Rebase-notes entry: `### 0229` in docs/rebase-notes.md.

Test plan:
- `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` produces 5 valid two-input ONNX members + manifest sidecar (ran locally).
- `python ai/scripts/eval_probabilistic_proxy.py --smoke` aggregates the 5 ONNX outputs into (mu, sigma) and reports coverage at 50/80/95 %.
- `python ai/scripts/validate_model_registry.py` → 15 entries valid.
- `pre-commit run --files <changed>` → Passed (black / isort / ruff / json-check / secrets / semgrep).
- `markdownlint-cli2` on all new docs → 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
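The `mu ± q · σ` rule above can be sketched roughly as below. This is an illustrative stand-in, not the trainer's or evaluator's real code: function names, array shapes, and the sigma-normalised nonconformity score are assumptions consistent with the commit's description of the split-conformal quantile.

```python
import numpy as np

def conformal_q(mu_cal, sigma_cal, y_cal, alpha=0.05):
    """Empirical quantile of sigma-normalised residuals on a held-out calibration
    split. Split-conformal guarantee (Vovk-style): marginal coverage >= 1 - alpha
    on exchangeable data, with the finite-sample corrected rank."""
    scores = np.abs(y_cal - mu_cal) / sigma_cal          # normalised nonconformity
    n = len(scores)
    rank = min(int(np.ceil((n + 1) * (1 - alpha))), n)   # finite-sample correction
    return float(np.sort(scores)[rank - 1])

def predict_interval(member_preds, q):
    """Deep-ensemble aggregate: mu = mean over the N member predictions,
    sigma = their sample std; the interval is mu ± q·sigma, where q is either
    1.96 (Gaussian) or the conformal quantile pinned in the manifest."""
    mu = member_preds.mean(axis=0)
    sigma = member_preds.std(axis=0, ddof=1)
    return mu - q * sigma, mu + q * sigma
```

With well-calibrated residuals the conformal `q` lands near the Gaussian 1.96, which is what the 50/80/95 % coverage rows in the evaluator are meant to confirm empirically.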
lusoris
added a commit
that referenced
this pull request
May 5, 2026
…al scaffold) (#372)

* feat(ai): fr_regressor_v2 probabilistic head (deep-ensemble + conformal scaffold)

Adds a probabilistic head on top of the codec-aware fr_regressor_v2 (parent: ADR-0272 / PR #347 in flight) so producers can drive the in-flight `vmaf-tune --quality-confidence 0.95` flag (ADR-0237) off a calibrated prediction interval instead of v2's bare MOS scalar. PR #354 audit Bucket #18 (top-3 ranked).

Trainer (`ai/scripts/train_fr_regressor_v2_ensemble.py`) trains N=5 copies of the v2 architecture (`FRRegressor(num_codecs=NUM_CODECS)`) under distinct seeds, exports each as a separate two-input ONNX (`features [N, 6]` + `codec_onehot [N, NUM_CODECS]`), and writes an ensemble manifest sidecar that pins per-member sha256s, feature standardisation, codec vocab, nominal coverage, and an optional split-conformal residual quantile from a held-out calibration split. The inference rule is `mu ± q · σ` with `q = 1.96` (Gaussian) or the empirical conformal quantile (Vovk 2005, Romano 2019 — distribution-free marginal coverage on exchangeable data).

Evaluator (`ai/scripts/eval_probabilistic_proxy.py`) reports empirical coverage at 50/80/95 % nominal levels, mean interval width, and the mean-prediction PLCC; it reports the conformal-interval row when the manifest carries a conformal scalar.

Smoke-only ship: synthetic 100-row corpus, 1 epoch / member. Production training is gated on the multi-codec Phase A corpus (T7-FR-REGRESSOR-V2-PROBABILISTIC).

Six ADR-0108 deliverables:
1. Research digest: docs/research/0054-fr-regressor-v2-probabilistic.md.
2. Decision matrix: ADR-0279 § Alternatives considered.
3. AGENTS.md invariant note: appended to ai/AGENTS.md.
4. Reproducer: `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` followed by `python ai/scripts/eval_probabilistic_proxy.py --smoke`.
5. CHANGELOG `### Added` entry under Unreleased — lusoris fork.
6. Rebase-notes entry: `### 0229` in docs/rebase-notes.md.

Test plan:
- `python ai/scripts/train_fr_regressor_v2_ensemble.py --smoke` produces 5 valid two-input ONNX members + manifest sidecar (ran locally).
- `python ai/scripts/eval_probabilistic_proxy.py --smoke` aggregates the 5 ONNX outputs into (mu, sigma) and reports coverage at 50/80/95 %.
- `python ai/scripts/validate_model_registry.py` → 15 entries valid.
- `pre-commit run --files <changed>` → Passed (black / isort / ruff / json-check / secrets / semgrep).
- `markdownlint-cli2` on all new docs → 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(registry): split fr_regressor_v2 + ensemble_seed0 into distinct entries

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
10 tasks
lusoris
pushed a commit
that referenced
this pull request
May 6, 2026
… gap)

`tools/vmaf-tune/src/vmaftune/ladder.py::_default_sampler` no longer raises `NotImplementedError`. It composes Phase A's `corpus.iter_rows` (encode + score) with the Phase B-equivalent `recommend.pick_target_vmaf` predicate (smallest CRF whose VMAF clears the target) over the canonical 5-point CRF sweep `DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38)` at the codec adapter's mid-range preset (`"medium"` for libx264 / libx265 / libsvtav1).

The `SamplerFn` seam stays open. Callers needing a finer grid, a Bayesian bisect, or a precomputed corpus stream pass an explicit `sampler=`. Tests stub `iter_rows` via `monkeypatch.setattr`; no live ffmpeg / vmaf binaries are required.

Closes the Phase B/E gap left by ADR-0295. The original raise docstring claimed PR #347 was Phase B's bisect — it was not (PR #347 shipped the `fr_regressor_v2` codec-aware scaffold). The actual Phase B-equivalent (`recommend.pick_target_vmaf` + `corpus.iter_rows`) shipped via ADR-0306, so the missing piece is a small composition.

ADR-0307 + Research-0079 + AGENTS.md invariant + rebase-notes §0307 + changelog fragment land in the same PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
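The clears-the-target predicate composed above can be sketched as below. The row shape and function name here are hypothetical stand-ins, not `recommend.pick_target_vmaf`'s real signature; the sketch assumes VMAF decreases monotonically as CRF rises, so the sweep point that clears the target at the lowest bitrate is the highest clearing CRF.

```python
# Canonical 5-point sweep named in the commit.
DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38)

def pick_cheapest_clearing(rows, target_vmaf):
    """rows: iterable of (crf, vmaf, bitrate_kbps) from an encode+score sweep.

    Return the sweep point whose VMAF still clears the target at the least
    quality cost (highest CRF), or None when no sweep point reaches the target
    (a finer grid or a bisect via an explicit sampler= would be needed then)."""
    cleared = [r for r in rows if r[1] >= target_vmaf]
    if not cleared:
        return None
    return max(cleared, key=lambda r: r[0])  # highest CRF = cheapest clearing encode
```

A caller wanting a finer grid or a Bayesian bisect would bypass this entirely via the `SamplerFn` seam, as the commit notes.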
lusoris
added a commit
that referenced
this pull request
May 6, 2026
… gap) (#404)

`tools/vmaf-tune/src/vmaftune/ladder.py::_default_sampler` no longer raises `NotImplementedError`. It composes Phase A's `corpus.iter_rows` (encode + score) with the Phase B-equivalent `recommend.pick_target_vmaf` predicate (smallest CRF whose VMAF clears the target) over the canonical 5-point CRF sweep `DEFAULT_SAMPLER_CRF_SWEEP = (18, 23, 28, 33, 38)` at the codec adapter's mid-range preset (`"medium"` for libx264 / libx265 / libsvtav1).

The `SamplerFn` seam stays open. Callers needing a finer grid, a Bayesian bisect, or a precomputed corpus stream pass an explicit `sampler=`. Tests stub `iter_rows` via `monkeypatch.setattr`; no live ffmpeg / vmaf binaries are required.

Closes the Phase B/E gap left by ADR-0295. The original raise docstring claimed PR #347 was Phase B's bisect — it was not (PR #347 shipped the `fr_regressor_v2` codec-aware scaffold). The actual Phase B-equivalent (`recommend.pick_target_vmaf` + `corpus.iter_rows`) shipped via ADR-0306, so the missing piece is a small composition.

ADR-0307 + Research-0079 + AGENTS.md invariant + rebase-notes §0307 + changelog fragment land in the same PR.

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 6, 2026
lusoris
pushed a commit
that referenced
this pull request
May 6, 2026
Closes the orchestration layer for Bucket #1 of Research-0061's `vmaf-tune` capability audit (the Netflix-style table-stakes per-shot encoding feature). Ships `tools/vmaf-tune/src/vmaftune/per_shot.py` plus the `vmaf-tune tune-per-shot` CLI subcommand:

* `detect_shots()` wraps the C-side `vmaf-perShot` binary (ADR-0222 / TransNet V2 ADR-0223) with a single-shot fallback when the binary is unavailable or fails.
* `tune_per_shot()` exposes a pluggable predicate seam Phase B's bisect (PR #347) drops into. The default predicate returns the codec adapter's default CRF so the scaffold round-trips before Phase B lands as code.
* `merge_shots()` emits one `ffmpeg` argv per shot (`-ss` + `-frames:v`) plus a final concat-demuxer command.

Scaffold-only — it does not run encodes and does not yet emit native per-codec mechanisms (`--qpfile` for x264, `--zones` for x265, SVT-AV1 segment tables); per-segment encode + concat is the portable fallback. Native emission lands alongside each new codec adapter.

16 new tests pass with mocked `vmaf-perShot` + mocked encoder; the total `vmaf-tune` suite is 29 tests, zero binaries required. First per-phase split off ADR-0237.

Updates the ADR index, CHANGELOG, docs/usage/vmaf-tune.md (new "Phase D" section + flag table + plan JSON schema), tools/vmaf-tune/AGENTS.md (per-shot rebase invariants), and docs/rebase-notes.md (entry 0228).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
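The per-segment + concat fallback described above can be sketched as follows. The argv layout and the shot/plan shapes are assumptions for illustration, not `per_shot.py`'s real API; like the scaffold, nothing here executes ffmpeg.

```python
from pathlib import Path

def merge_commands(src, shots, out_dir, codec="libx264"):
    """shots: list of (start_sec, n_frames, crf) tuned per shot.

    Returns one ffmpeg argv per shot (seek with -ss, bound with -frames:v,
    encode at that shot's CRF) plus a final concat-demuxer argv that stream-
    copies the segments back together. Commands are emitted, never run."""
    out_dir = Path(out_dir)
    cmds, seg_paths = [], []
    for i, (start, frames, crf) in enumerate(shots):
        seg = out_dir / f"shot_{i:04d}.mkv"
        seg_paths.append(seg)
        cmds.append([
            "ffmpeg", "-ss", f"{start:.3f}", "-i", str(src),
            "-frames:v", str(frames), "-c:v", codec, "-crf", str(crf),
            str(seg),
        ])
    # concat demuxer reads a text file of one "file '<path>'" line per segment
    concat_list = out_dir / "segments.txt"
    cmds.append(["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(concat_list),
                 "-c", "copy", str(out_dir / "merged.mkv")])
    return cmds, concat_list, seg_paths
```

A native per-codec mechanism (e.g. an x264 `--qpfile` or x265 `--zones` emission) would replace the per-segment encode step while keeping the same tuned-CRF-per-shot plan.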
lusoris
added a commit
that referenced
this pull request
May 6, 2026
* feat(tools): vmaf-tune Phase D scaffold — per-shot CRF tuning (ADR-0276)

Closes the orchestration layer for Bucket #1 of Research-0061's `vmaf-tune` capability audit (the Netflix-style table-stakes per-shot encoding feature). Ships `tools/vmaf-tune/src/vmaftune/per_shot.py` plus the `vmaf-tune tune-per-shot` CLI subcommand:

* `detect_shots()` wraps the C-side `vmaf-perShot` binary (ADR-0222 / TransNet V2 ADR-0223) with a single-shot fallback when the binary is unavailable or fails.
* `tune_per_shot()` exposes a pluggable predicate seam Phase B's bisect (PR #347) drops into. The default predicate returns the codec adapter's default CRF so the scaffold round-trips before Phase B lands as code.
* `merge_shots()` emits one `ffmpeg` argv per shot (`-ss` + `-frames:v`) plus a final concat-demuxer command.

Scaffold-only — it does not run encodes and does not yet emit native per-codec mechanisms (`--qpfile` for x264, `--zones` for x265, SVT-AV1 segment tables); per-segment encode + concat is the portable fallback. Native emission lands alongside each new codec adapter.

16 new tests pass with mocked `vmaf-perShot` + mocked encoder; the total `vmaf-tune` suite is 29 tests, zero binaries required. First per-phase split off ADR-0237.

Updates the ADR index, CHANGELOG, docs/usage/vmaf-tune.md (new "Phase D" section + flag table + plan JSON schema), tools/vmaf-tune/AGENTS.md (per-shot rebase invariants), and docs/rebase-notes.md (entry 0228).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: re-trigger CI after research-digest opt-out

* fix(vmaf-tune): import json + adapter-aware quality_range test (post-rebase)

Two fixes the post-rebase Phase-D branch needs to pass against the new master:

* Master's cli.py uses ``json.dumps`` in ``_run_predict`` but the module was missing ``import json``. The pre-rebase Phase D branch's ``_run_tune_per_shot`` also uses it. Add the top-level ``import json`` so both functions work and master's pre-existing bug clears at the same time.
* ``test_tune_per_shot_clamps_to_codec_quality_range`` hard-coded the clamp window ``[15, 40]`` from the pre-rebase x264 adapter. Master changed libx264's ``quality_range`` to ``(0, 51)`` (the full encoder range, ADR-0306 coarse-to-fine domain). Read the range from ``get_adapter("libx264").quality_range`` so the test tracks the adapter's source of truth instead of duplicating the literal — this also future-proofs against further range tweaks.

All 16 per-shot tests now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(vmaf-tune): CHANGELOG fragment for tune-per-shot CLI

The ADR-0108 deliverables-checklist gate on PR #431 flagged the missing changelog fragment. Add it under `changelog.d/added/` per the ADR-0221 fragment pattern so the next release-please rendering picks up the entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
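The source-of-truth fix described above can be sketched as a minimal test shape. `get_adapter(...).quality_range` is named in the commit; the `clamp` helper and test body here are hypothetical stand-ins, not the repo's actual test.

```python
def clamp(crf, quality_range):
    """Clamp a tuned CRF into the codec adapter's advertised quality range."""
    lo, hi = quality_range
    return max(lo, min(hi, crf))

def test_tune_per_shot_clamps_to_codec_quality_range():
    # Stand-in for get_adapter("libx264").quality_range — read from the adapter,
    # never hard-coded, so the test survives future range tweaks.
    quality_range = (0, 51)
    lo, hi = quality_range
    assert clamp(lo - 10, quality_range) == lo   # below range clamps up
    assert clamp(hi + 10, quality_range) == hi   # above range clamps down
    assert clamp((lo + hi) // 2, quality_range) == (lo + hi) // 2  # in-range passes through
```

Deriving the bounds from the adapter keeps a single source of truth, which is exactly why the hard-coded ``[15, 40]`` window broke on rebase.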
lusoris
pushed a commit
that referenced
this pull request
May 7, 2026
Scaffolds the Phase E ladder generator (ADR-0277) — the highest-leverage gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the Netflix per-title encoding paper: sample (resolution × target-VMAF), take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along the hull, emit an HLS / DASH / JSON manifest.

Currently scaffold-only: the production sampler that drives Phase B's target-VMAF bisect (PR #347) lands once that PR merges. Default sampler raises NotImplementedError; tests inject a synthetic stub modelled on the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py — build_ladder, convex_hull (Pareto filter + diminishing-returns envelope), select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung 1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull correctness on a synthetic Netflix-paper-shaped cloud, knee-selection invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once the Phase B integration PR lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper, Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
added a commit
that referenced
this pull request
May 7, 2026
…er) (#433)

Scaffolds the Phase E ladder generator (ADR-0277) — the highest-leverage gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the Netflix per-title encoding paper: sample (resolution × target-VMAF), take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along the hull, emit an HLS / DASH / JSON manifest.

Currently scaffold-only: the production sampler that drives Phase B's target-VMAF bisect (PR #347) lands once that PR merges. Default sampler raises NotImplementedError; tests inject a synthetic stub modelled on the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py — build_ladder, convex_hull (Pareto filter + diminishing-returns envelope), select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung 1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull correctness on a synthetic Netflix-paper-shaped cloud, knee-selection invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once the Phase B integration PR lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper, Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary

`fr_regressor_v2` — codec-aware successor to v1 (ADR-0249) and the first downstream consumer of the vmaf-tune corpus JSONL emitted by Phase A (ADR-0237). Adds `ai/scripts/train_fr_regressor_v2.py` with `--corpus PATH` and `--smoke` modes, plus the two-input ONNX export plumbing that mirrors the LPIPS-Sq precedent (ADR-0040 / ADR-0041).
- `features` (N, 6) — canonical-6 libvmaf features (`adm2`, `vif_scale0..3`, `motion2`, StandardScaler-normalised).
- `codec` (N, 8) block — `[encoder_onehot(6), preset_norm, crf_norm]`, both normalised into [0, 1]. Re-uses `FRRegressor(num_codecs=8)` plumbed by ADR-0235 rather than minting a new class.
- `fr_regressor_v2` with `smoke: true`. The shipped ONNX is the `--smoke` output (synthetic 100-row corpus, 1 epoch); it is a load-path probe, not a quality model. The production training run is a follow-up PR gated on a multi-codec Phase A corpus + per-frame feature emission + clearing v1's 0.95 LOSO PLCC ship floor with the ≥0.005 multi-codec lift required by ADR-0235.
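The two-input layout above can be sketched as a small feature-assembly helper. Everything here beyond the (6 + 8) shapes is an assumption for illustration: the real vocab order and normaliser constants are pinned in ai/AGENTS.md, not reproduced here, so the entries and ceilings below are hypothetical placeholders.

```python
import numpy as np

# Hypothetical stand-ins — the real ENCODER_VOCAB order and normalisers are
# pinned in ai/AGENTS.md; only the 6-entry onehot + 2 scalars = 8-D shape is
# taken from the PR description.
ENCODER_VOCAB = ("enc0", "enc1", "enc2", "enc3", "enc4", "enc5")
CRF_CEIL = 63.0       # assumed CRF normaliser ceiling
PRESET_LEVELS = 9.0   # assumed preset scale

def codec_block(encoder, preset_idx, crf):
    """(8,) codec block: encoder_onehot(6) + preset_norm + crf_norm, all in [0, 1]."""
    onehot = np.zeros(len(ENCODER_VOCAB), dtype=np.float32)
    onehot[ENCODER_VOCAB.index(encoder)] = 1.0
    return np.concatenate(
        [onehot, [preset_idx / PRESET_LEVELS], [crf / CRF_CEIL]]
    ).astype(np.float32)
```

At inference the model then takes `features` (N, 6) and N stacked copies of this block as its two ONNX inputs; keeping both inputs in [0, 1] is what makes the exported contract checkable by the registry validator.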
Six deep-dive deliverables (ADR-0108)

1. Research digest: docs/research/0054-fr-regressor-v2-feasibility.md.
2. Decision matrix: "## Alternatives considered" in docs/adr/0261-fr-regressor-v2-codec-aware-scaffold.md.
3. Invariant note in ai/AGENTS.md — pins the ENCODER_VOCAB order, 8-D codec block layout, CRF/preset normalisers, and the two-input ONNX contract.
4. Reproducer: `python ai/scripts/train_fr_regressor_v2.py --smoke` (also documented in the model card and the research digest).
5. CHANGELOG `### Added` entry under "Unreleased — lusoris fork".
6. Rebase-notes entry: `### 0229` in docs/rebase-notes.md.

Test plan
- `python ai/scripts/train_fr_regressor_v2.py --smoke` produces a valid opset-17 two-input ONNX, op-allowlist clean, torch-vs-ORT roundtrip within 1e-4 atol (ran locally).
- `python ai/scripts/validate_model_registry.py` — 11 entries valid against registry.schema.json.
- `pre-commit run --files <changed>` — Passed (black / isort / ruff / json-check / secrets / semgrep).
- `markdownlint-cli2` on all new docs — 0 errors.
- Production training is tracked as backlog item T7-FR-REGRESSOR-V2-PROD (per ADR-0261 "## Consequences").

🤖 Generated with Claude Code