feat(tools+ai): VideoToolbox adapters + 16-slot codec schema expansion#373
…s 17 adapters)

Refactors `tools/vmaf-tune/src/vmaftune/encode.py` away from the Phase A hard-coded `libx264` `-c:v / -preset / -crf` argv. `run_encode` now looks up the codec adapter via `codec_adapters.get_adapter(req.encoder)` and asks it for the FFmpeg argv slice via `adapter.ffmpeg_codec_args(preset, quality)` plus an optional `adapter.extra_params()`. Adapters that don't yet expose `ffmpeg_codec_args` fall back silently to the legacy x264-CRF shape so partial in-flight adapter PRs stay drivable end-to-end.

`parse_versions(stderr, encoder=...)` selects a per-codec version probe (libx264, libx265, libsvtav1, libvpx-vp9, libaom-av1, libvvenc, NVENC, QSV, AMF, VideoToolbox); unknown encoders return "unknown" rather than raising. The `EncodeRequest.crf` field is preserved unchanged for the SCHEMA_VERSION=1 row contract; a `quality` property mirrors it for adapter-side codec-agnostic vocabulary.

Existing 13-test x264 suite still green; new 19-test multi-codec suite covers 9 representative codec shapes plus the unknown-codec / missing-method fallback paths.

Unblocks 17 in-flight codec adapter PRs (#360 libaom, #362 libx265, #364 NVENC, #366 AMF, #367 QSV, #368 libvvenc, #370 libsvtav1, #373 VideoToolbox, plus follow-on waves) which can now drive end-to-end encodes without copying or mutating the harness.

Ships ADR-0294 + research digest 0054, vmaf-tune.md "Codec adapter contract" section, rebase-notes #228 invariant, CHANGELOG entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
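The dispatch-with-fallback behaviour described in this commit message can be sketched roughly as follows. This is an illustration only, not the code in `encode.py`: the trimmed-down `EncodeRequest`, the two adapter classes, the `_ADAPTERS` dict and the `codec_argv` helper are all hypothetical stand-ins, and the exact argv produced by the fallback path is an assumption based on the phrase "legacy x264-CRF shape".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodeRequest:
    """Hypothetical, trimmed-down request; the real class carries more fields."""

    encoder: str
    preset: str
    crf: int

    @property
    def quality(self) -> int:
        # Codec-agnostic alias that mirrors the crf field.
        return self.crf


class X265Adapter:
    """Illustrative adapter that already exposes ffmpeg_codec_args."""

    def ffmpeg_codec_args(self, preset: str, quality: int) -> list[str]:
        return ["-c:v", "libx265", "-preset", preset, "-crf", str(quality)]

    def extra_params(self) -> list[str]:
        return []


class LegacyAdapter:
    """Illustrative adapter that predates ffmpeg_codec_args."""


# Hypothetical registry standing in for codec_adapters.get_adapter().
_ADAPTERS = {"libx265": X265Adapter(), "libx264": LegacyAdapter()}


def codec_argv(req: EncodeRequest) -> list[str]:
    """Return the codec-specific slice of the FFmpeg argv for a request."""
    adapter = _ADAPTERS[req.encoder]
    build = getattr(adapter, "ffmpeg_codec_args", None)
    if build is None:
        # Assumed fallback: the pre-refactor -c:v / -preset / -crf triple.
        return ["-c:v", req.encoder, "-preset", req.preset, "-crf", str(req.quality)]
    # extra_params is optional, so default to an empty list when absent.
    extra = getattr(adapter, "extra_params", lambda: [])()
    return [*build(req.preset, req.quality), *extra]
```

Looking the method up with `getattr` keeps adapters that have not been migrated yet usable, which is the property the commit message relies on for the in-flight adapter PRs.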
Skipping in merge train — needs rework against shipped v2. Master now has Specific conflicts on rebase:
Recommended path forward: split into two PRs:
Pull request overview
This PR adds Apple VideoToolbox codec adapters to tools/vmaf-tune and expands the codec-conditioning vocabulary for the codec-aware FR regressor from 6 to 16 one-hot slots (including new hardware encoder buckets), plus a smoke-trained fr_regressor_v2_hw ONNX and associated documentation/ADRs.
Changes:
- Add `H264VideoToolboxAdapter` / `HEVCVideoToolboxAdapter` plus shared VideoToolbox preset/quality helpers, and extend the codec adapter registry + tests.
- Expand `ai/src/vmaf_train/codec.py` to v2 `CODEC_VOCAB` (16 slots) with `CODEC_VOCAB_VERSION = 2`, update AI tests accordingly, and add a smoke training/export script.
- Register the new smoke ONNX in `model/tiny/registry.json` and add user-facing/docs updates (usage doc, research digest, ADRs, rebase notes, changelog entry).
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/vmaf-tune/tests/test_corpus.py | Updates codec registry expectations to include VideoToolbox adapters. |
| tools/vmaf-tune/tests/test_codec_one_hot.py | Adds regression gate pinning v2 codec vocab ordering and adapter→slot mapping. |
| tools/vmaf-tune/tests/test_codec_adapter_videotoolbox.py | Adds mocked smoke tests for VideoToolbox adapter contract + encode argv shape. |
| tools/vmaf-tune/src/vmaftune/codec_adapters/h264_videotoolbox.py | Introduces H.264 VideoToolbox adapter. |
| tools/vmaf-tune/src/vmaftune/codec_adapters/hevc_videotoolbox.py | Introduces HEVC VideoToolbox adapter. |
| tools/vmaf-tune/src/vmaftune/codec_adapters/_videotoolbox_common.py | Shared preset→-realtime mapping and -q:v validation/constants. |
| tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py | Registers VideoToolbox adapters and exports them via __all__. |
| tools/vmaf-tune/AGENTS.md | Documents new invariants (codec one-hot ordering; VT quality scale). |
| model/tiny/registry.json | Adds fr_regressor_v2_hw model registry entry. |
| model/tiny/fr_regressor_v2_hw.json | Adds sidecar describing vocab + feature layout for the 24-D wide input. |
| docs/usage/vmaf-tune.md | Documents VideoToolbox usage and the 16-slot codec vocabulary layout. |
| docs/research/0068-videotoolbox-and-codec-schema-v2.md | Adds supporting research digest for VT adapters + vocab sizing. |
| docs/rebase-notes.md | Adds fork-local rebase note entry for this workstream. |
| docs/adr/README.md | Appends ADR-0283/0284 rows to the ADR index table. |
| docs/adr/0283-vmaf-tune-videotoolbox-adapters.md | New ADR for VideoToolbox adapters. |
| docs/adr/0284-fr-regressor-v2-codec-schema-expansion.md | New ADR for 6→16 codec vocab expansion. |
| CHANGELOG.md | Adds an Unreleased entry describing the new adapters + vocab expansion + smoke model. |
| ai/tests/test_codec_aware_fr.py | Updates AI tests to the v2 vocab contract and reserved-bucket semantics. |
| ai/src/vmaf_train/codec.py | Implements v2 codec vocab + version bump; keeps v1 vocab as CODEC_VOCAB_V1. |
| ai/scripts/train_fr_regressor_v2.py | Adds smoke-only training + ONNX export + registry/sidecar writer for fr_regressor_v2_hw. |
| ai/AGENTS.md | Documents the v2 vocab invariant and the new regression gate test. |
    req = EncodeRequest(
        source=src,
        width=1920,
        height=1080,
        pix_fmt="yuv420p",
        framerate=24.0,
        encoder="h264_videotoolbox",
        preset="medium",
        crf=50,
        output=out,
    )
    # Direct command builder check.
    cmd = build_ffmpeg_command(req)
    assert "h264_videotoolbox" in cmd
    assert "-c:v" in cmd
    # Driver path with the mock.
    res = run_encode(req, runner=fake_run)
    assert res.exit_status == 0

    "kind": "fr",
    "notes": "Codec-aware FR regressor (v2_hw, ADR-0284) \u2014 wide-input 24-D vector (canonical-6 features + 16-slot codec one-hot + preset_norm + crf_norm) \u2192 MOS scalar. SMOKE export \u2014 trained on synthetic deterministic data; real multi-codec corpus training tracked under T7-CODEC-AWARE-V2. Vocabulary version 2; v1 single-input fr_regressor_v1.onnx remains shipped and unaffected.",
    "onnx": "fr_regressor_v2_hw.onnx",
    "opset": 17,
    "sha256": "c3ec697ae42b596354167edec041ee4d43b920ce82e5757e244be3223cbca57e",

    - **`vmaf-tune` Apple VideoToolbox adapters + 16-slot codec one-hot
      schema expansion (ADR-0283 + ADR-0284).** Adds
      `H264VideoToolboxAdapter` and `HEVCVideoToolboxAdapter` under
      [`tools/vmaf-tune/src/vmaftune/codec_adapters/`](tools/vmaf-tune/src/vmaftune/codec_adapters/),
      sharing `_videotoolbox_common.py` for the `-q:v` (0..100, higher =
      better) quality knob and the nine-name preset → `-realtime` boolean
      mapping. AV1 hardware encoding intentionally omitted (unsupported
      on Apple Silicon as of 2026). The codec-aware FR regressor
      vocabulary expands from 6 → 16 slots in
      [`ai/src/vmaf_train/codec.py`](ai/src/vmaf_train/codec.py)
      (`CODEC_VOCAB_VERSION` 1 → 2): software (x264, x265, libsvtav1,
      libaom) + NVENC ×3 + QSV ×3 + AMF ×3 + VideoToolbox ×2 + reserved.
      Ships a SMOKE `fr_regressor_v2_hw.onnx` (24-D wide-input vector =
      6 features + 16 codec one-hot + 1 preset_norm + 1 crf_norm) trained
      on synthetic deterministic data; T7-CODEC-AWARE-V2 follow-up
      retrains against a real multi-codec corpus. v1 ONNX
      (`fr_regressor_v1.onnx`) and the `CODEC_VOCAB_V1` tuple stay shipped
      and unaffected. New tests under
      [`tools/vmaf-tune/tests/`](tools/vmaf-tune/tests/):
      `test_codec_adapter_videotoolbox.py` and `test_codec_one_hot.py`.

    | [ADR-0283](0283-vmaf-tune-videotoolbox-adapters.md) | `vmaf-tune` Apple VideoToolbox codec adapters. Adds `H264VideoToolboxAdapter` + `HEVCVideoToolboxAdapter` under `tools/vmaf-tune/src/vmaftune/codec_adapters/`, sharing `_videotoolbox_common.py` for the `-q:v` (0..100, higher = better) quality knob and the nine-name preset → `-realtime` boolean mapping. AV1 hardware encoding intentionally omitted (not available on Apple Silicon as of 2026). Tests mock `subprocess.run` so the suite stays Linux-CI-runnable; the adapters exercise the codec-adapter contract from [ADR-0237](0237-quality-aware-encode-automation.md). Companion: [ADR-0284](0284-fr-regressor-v2-codec-schema-expansion.md). | Accepted | tooling, ai, ffmpeg, codec, hardware-encoder, apple, fork-local |
    | [ADR-0284](0284-fr-regressor-v2-codec-schema-expansion.md) | `fr_regressor_v2` codec one-hot expansion from 6 → 16 slots. Bumps `CODEC_VOCAB_VERSION` from 1 to 2; vocabulary becomes `(x264, x265, libsvtav1, libaom, h264_nvenc, hevc_nvenc, av1_nvenc, h264_qsv, hevc_qsv, av1_qsv, h264_amf, hevc_amf, av1_amf, h264_videotoolbox, hevc_videotoolbox, reserved)`. The shipped `fr_regressor_v2_hw` model concatenates a 24-D wide-input vector (`6 features + 16 codec one-hot + 1 preset_norm + 1 crf_norm`); SMOKE-trained on synthetic data, T7-CODEC-AWARE-V2 follow-up retrains against a real multi-codec corpus. v1 `CODEC_VOCAB_V1` tuple + `fr_regressor_v1.onnx` stay shipped and unaffected. | Accepted | ai, fr-regressor, codec, schema, hardware-encoder, fork-local |

    (Apple-Silicon hardware) are wired today; `libx265` / `libsvtav1` /
    `libaom` / NVENC / QSV / AMF adapters land alongside `fr_regressor_v2_hw`
    (see [ADR-0283](../adr/0283-vmaf-tune-videotoolbox-adapters.md) and
    [ADR-0284](../adr/0284-fr-regressor-v2-codec-schema-expansion.md)).
    All adapters live under `tools/vmaf-tune/src/vmaftune/codec_adapters/`.

    # Research-0054: Apple VideoToolbox + 16-slot codec one-hot expansion

    - **Date**: 2026-05-03
    - **Companion ADRs**: [ADR-0283](../adr/0283-vmaf-tune-videotoolbox-adapters.md), [ADR-0284](../adr/0284-fr-regressor-v2-codec-schema-expansion.md)
    - **Status**: Snapshot at proposal time.

    ## Question

    Two coupled questions:

    1. How should `vmaf-tune` drive Apple's VideoToolbox (the only
       hardware-encode path on Apple Silicon and T2 Macs)?
    2. How wide does the codec one-hot vocabulary need to be to cover
       the software + hardware codec adapter set the parallel agents are
       landing for `fr_regressor_v2_hw`?
Two coupled changes for vmaf-tune + fr_regressor_v2_hw:

A. Apple VideoToolbox codec adapters (ADR-0283)

Adds H264VideoToolboxAdapter and HEVCVideoToolboxAdapter under tools/vmaf-tune/src/vmaftune/codec_adapters/, sharing _videotoolbox_common.py for the -q:v (0..100, higher = better) quality knob and the nine-name preset to -realtime boolean mapping. AV1 hardware encoding intentionally omitted (unsupported on Apple Silicon as of 2026).

B. fr_regressor_v2 codec one-hot schema expansion 6 to 16 slots (ADR-0284)

Bumps CODEC_VOCAB_VERSION from 1 to 2 in ai/src/vmaf_train/codec.py. New vocabulary covers: software (x264, x265, libsvtav1, libaom) + NVENC x3 + QSV x3 + AMF x3 + VideoToolbox x2 + reserved. Ships a SMOKE fr_regressor_v2_hw.onnx (24-D wide-input vector = 6 features + 16 codec one-hot + 1 preset_norm + 1 crf_norm) trained on synthetic deterministic data; T7-CODEC-AWARE-V2 follow-up retrains against a real multi-codec corpus. v1 ONNX (fr_regressor_v1.onnx) and the CODEC_VOCAB_V1 tuple stay shipped and unaffected.

New tests: test_codec_adapter_videotoolbox.py (9 cases) + test_codec_one_hot.py (6 cases) under tools/vmaf-tune/tests/.

Six deep-dive deliverables: Research-0054, ADR-0283 + ADR-0284 alternatives matrices, AGENTS.md invariant notes (tools/vmaf-tune + ai/), reproducer in PR description, CHANGELOG entry, docs/rebase-notes.md row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Skipping for now — architectural conflict with merged ADR-0291 (12-slot ENCODER_VOCAB v2). The 16-slot schema expansion needs to land as a successor ADR (vocab v3) with a renamed script ( |
…s 17 adapters) (#376) * feat(tools): vmaf-tune encode.py — codec-agnostic dispatcher (unblocks 17 adapters) Refactors `tools/vmaf-tune/src/vmaftune/encode.py` away from the Phase A hard-coded `libx264` `-c:v / -preset / -crf` argv. `run_encode` now looks up the codec adapter via `codec_adapters.get_adapter(req.encoder)` and asks it for the FFmpeg argv slice via `adapter.ffmpeg_codec_args(preset, quality)` plus an optional `adapter.extra_params()`. Adapters that don't yet expose `ffmpeg_codec_args` fall back silently to the legacy x264-CRF shape so partial in-flight adapter PRs stay drivable end-to-end. `parse_versions(stderr, encoder=...)` selects a per-codec version probe (libx264, libx265, libsvtav1, libvpx-vp9, libaom-av1, libvvenc, NVENC, QSV, AMF, VideoToolbox); unknown encoders return "unknown" rather than raising. The `EncodeRequest.crf` field is preserved unchanged for the SCHEMA_VERSION=1 row contract; a `quality` property mirrors it for adapter-side codec-agnostic vocabulary. Existing 13-test x264 suite still green; new 19-test multi-codec suite covers 9 representative codec shapes plus the unknown-codec / missing-method fallback paths. Unblocks 17 in-flight codec adapter PRs (#360 libaom, #362 libx265, #364 NVENC, #366 AMF, #367 QSV, #368 libvvenc, #370 libsvtav1, #373 VideoToolbox, plus follow-on waves) which can now drive end-to-end encodes without copying or mutating the harness. Ships ADR-0294 + research digest 0054, vmaf-tune.md "Codec adapter contract" section, rebase-notes #228 invariant, CHANGELOG entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(docs): renumber encode-multi-codec ADR 0294→0297 + research 0069→0070 --------- Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds H264VideoToolboxAdapter and HEVCVideoToolboxAdapter under tools/vmaf-tune/src/vmaftune/codec_adapters/, sharing a single _videotoolbox_common.py for the -q:v (0..100, higher = better) quality knob and the nine-name preset to -realtime boolean mapping.

Both adapters carry invert_quality=False and a [0, 100] quality range -- downstream consumers interpret the knob via the adapter registry.

Preset taxonomy maps onto VT's coarser -realtime flag:
- ultrafast/superfast/veryfast/faster/fast → realtime=1
- medium/slow/slower/veryslow → realtime=0

AV1 hardware encoding intentionally omitted -- Apple Silicon has no AV1 hardware encoder block as of 2026 and FFmpeg exposes no av1_videotoolbox.

Tests mock subprocess.run so Linux CI stays green; macOS end-to-end is left to contributors with VideoToolbox available locally.

Split from PR #373 -- the originally-coupled 16-slot codec-vocab schema expansion is deferred to a follow-up PR awaiting a fresh fr_regressor_v2 production retrain (ship-gate per ADR-0235 + ADR-0291).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
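As a rough illustration of the preset taxonomy above, the mapping onto `-realtime` and the `[0, 100]` check on `-q:v` could look like the sketch below. The constant names, the `videotoolbox_args` helper, the argv ordering and the choice to raise `ValueError` on bad input are assumptions made for the example; they are not taken from `_videotoolbox_common.py`.

```python
# The nine preset names from the commit message, split by the -realtime value
# they map to.
REALTIME_PRESETS = frozenset({"ultrafast", "superfast", "veryfast", "faster", "fast"})
OFFLINE_PRESETS = frozenset({"medium", "slow", "slower", "veryslow"})

# -q:v range for VideoToolbox as stated in the commit: 0..100, higher = better.
QUALITY_MIN, QUALITY_MAX = 0, 100


def videotoolbox_args(encoder: str, preset: str, quality: int) -> list[str]:
    """Build a hypothetical FFmpeg argv slice for a VideoToolbox encoder."""
    if preset in REALTIME_PRESETS:
        realtime = "1"
    elif preset in OFFLINE_PRESETS:
        realtime = "0"
    else:
        # Assumed behaviour: reject names outside the nine-name taxonomy.
        raise ValueError(f"unknown preset: {preset!r}")
    if not QUALITY_MIN <= quality <= QUALITY_MAX:
        raise ValueError(f"quality {quality} outside [{QUALITY_MIN}, {QUALITY_MAX}]")
    return ["-c:v", encoder, "-realtime", realtime, "-q:v", str(quality)]
```

For example, `videotoolbox_args("hevc_videotoolbox", "fast", 70)` yields an argv slice with `-realtime 1`, while any of the four slower presets yields `-realtime 0`.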
…#398) Adds H264VideoToolboxAdapter and HEVCVideoToolboxAdapter under tools/vmaf-tune/src/vmaftune/codec_adapters/, sharing a single _videotoolbox_common.py for the -q:v (0..100, higher = better) quality knob and the nine-name preset to -realtime boolean mapping. Both adapters carry invert_quality=False and a [0, 100] quality range — downstream consumers interpret the knob via the adapter registry. Preset taxonomy maps onto VT's coarser -realtime flag: - ultrafast/superfast/veryfast/faster/fast → realtime=1 - medium/slow/slower/veryslow → realtime=0 AV1 hardware encoding intentionally omitted — Apple Silicon has no AV1 hardware encoder block as of 2026 and FFmpeg exposes no av1_videotoolbox. Tests mock subprocess.run so Linux CI stays green; macOS end-to-end is left to contributors with VideoToolbox available locally. Split from PR #373 — the originally-coupled 16-slot codec-vocab schema expansion is deferred to a follow-up PR awaiting a fresh fr_regressor_v2 production retrain (ship-gate per ADR-0235 + ADR-0291). Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closing — re-scoped by #401 (feat(ai): ENCODER_VOCAB v3 (16-slot) schema expansion + retrain plan, ADR-0302). The VideoToolbox adapter half landed separately (master already has h264_videotoolbox.py + hevc_videotoolbox.py + ADR-0283); only the schema-expansion half remained, and that's what #401 ships as a clean scaffold against current master. |
…(ADR-0302)

Re-scope of PR #373: drop the VideoToolbox adapters (already on master via ADR-0283) and keep only the 13 -> 16 vocab expansion + retrain plan.

This is the schema scaffold only -- the live `ENCODER_VOCAB` and `ENCODER_VOCAB_VERSION = 2` stay as the source of truth. A parallel `ENCODER_VOCAB_V3` constant in `ai/scripts/train_fr_regressor_v2.py` documents the target 16-slot vocab (slots 0..12 mirror v2 verbatim; slots 13/14/15 append `libsvtav1`, `h264_videotoolbox`, `hevc_videotoolbox`). Append-only ordering preserved per ADR-0235.

The follow-up retrain PR is gated on clearing the same mean LOSO PLCC >= 0.95 ship gate ADR-0291 cleared on v2, plus the ADR-0235 multi-codec lift floor (>= +0.005 PLCC over the v1 single-input regressor). Production ONNX swap deferred until that retrain clears.

Six deep-dive deliverables (ADR-0108):

1. Research digest: docs/research/0075-encoder-vocab-v3-schema-expansion.md
2. Decision matrix: ADR-0302 Alternatives considered (4-row table)
3. AGENTS.md invariant note: ai/AGENTS.md "v3 retrain invariant" section
4. Reproducer: python -m pytest ai/tests/ -k encoder_vocab (no-op until vocab tests are added by the retrain PR)
5. CHANGELOG fragment: changelog.d/added/encoder-vocab-v3-schema-expansion.md
6. Rebase note: docs/rebase-notes.md section "0302 -- ENCODER_VOCAB v3"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
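The two-part ship gate named in this commit message amounts to a simple predicate. The sketch below is only one reading of it: the function name, the argument shapes, and the assumption that the lift is measured as the candidate's mean LOSO PLCC minus a single v1 PLCC figure are all hypothetical and not confirmed by the source.

```python
from statistics import mean

# Thresholds quoted in the commit message.
MIN_MEAN_LOSO_PLCC = 0.95   # ship gate attributed to ADR-0291
MIN_LIFT_OVER_V1 = 0.005    # multi-codec lift floor attributed to ADR-0235


def clears_ship_gate(fold_plcc: list[float], v1_plcc: float) -> bool:
    """Return True when a retrained model passes both gates.

    fold_plcc: per-fold PLCC values from leave-one-source-out evaluation.
    v1_plcc:   PLCC of the v1 single-input regressor (assumed comparable).
    """
    mean_plcc = mean(fold_plcc)
    lift = mean_plcc - v1_plcc
    return mean_plcc >= MIN_MEAN_LOSO_PLCC and lift >= MIN_LIFT_OVER_V1
```

Keeping both thresholds as named constants makes it harder for a retrain script to satisfy one gate while silently skipping the other.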
…(ADR-0302) (#401) * feat(ai): ENCODER_VOCAB v3 (16-slot) schema expansion + retrain plan (ADR-0302) Re-scope of PR #373: drop the VideoToolbox adapters (already on master via ADR-0283) and keep only the 13 -> 16 vocab expansion + retrain plan. This is the schema scaffold only -- the live `ENCODER_VOCAB` and `ENCODER_VOCAB_VERSION = 2` stay as the source of truth. A parallel `ENCODER_VOCAB_V3` constant in `ai/scripts/train_fr_regressor_v2.py` documents the target 16-slot vocab (slots 0..12 mirror v2 verbatim; slots 13/14/15 append `libsvtav1`, `h264_videotoolbox`, `hevc_videotoolbox`). Append-only ordering preserved per ADR-0235. The follow-up retrain PR is gated on clearing the same mean LOSO PLCC >= 0.95 ship gate ADR-0291 cleared on v2, plus the ADR-0235 multi-codec lift floor (>= +0.005 PLCC over the v1 single-input regressor). Production ONNX swap deferred until that retrain clears. Six deep-dive deliverables (ADR-0108): 1. Research digest: docs/research/0075-encoder-vocab-v3-schema-expansion.md 2. Decision matrix: ADR-0302 Alternatives considered (4-row table) 3. AGENTS.md invariant note: ai/AGENTS.md "v3 retrain invariant" section 4. Reproducer: python -m pytest ai/tests/ -k encoder_vocab (no-op until vocab tests are added by the retrain PR) 5. CHANGELOG fragment: changelog.d/added/encoder-vocab-v3-schema-expansion.md 6. Rebase note: docs/rebase-notes.md section "0302 -- ENCODER_VOCAB v3" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(docs): renumber encoder-vocab-v3 research 0075→0078 (collision with #399 ensemble) --------- Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Two coupled changes for `tools/vmaf-tune/` + the codec-aware FR regressor:
- `H264VideoToolboxAdapter` + `HEVCVideoToolboxAdapter` under `tools/vmaf-tune/src/vmaftune/codec_adapters/`, sharing `_videotoolbox_common.py` for the `-q:v` (0..100, higher = better) quality knob and the nine-name preset → `-realtime` boolean mapping. AV1 hardware encoding intentionally omitted (not available on Apple Silicon as of 2026).
- `fr_regressor_v2` codec one-hot 6 → 16 slots (ADR-0284). Bumps `CODEC_VOCAB_VERSION` 1 → 2 in `ai/src/vmaf_train/codec.py`. New vocabulary fits today's 13 software + hardware adapters with one column of headroom: x264, x265, libsvtav1, libaom, h264_nvenc, hevc_nvenc, av1_nvenc, h264_qsv, hevc_qsv, av1_qsv, h264_amf, hevc_amf, av1_amf, h264_videotoolbox, hevc_videotoolbox, reserved.
- `fr_regressor_v2_hw.onnx` (24-D wide-input vector = 6 features + 16 codec one-hot + 1 preset_norm + 1 crf_norm) trained on synthetic deterministic data. v1 ONNX + `CODEC_VOCAB_V1` stay unaffected. Real-corpus retrain tracked under T7-CODEC-AWARE-V2.
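To make the 24-D layout concrete, here is a minimal sketch of how the wide-input vector could be assembled from the 16-slot vocabulary listed above. The vocabulary order is copied from this description; the helper names, the concatenation order (features, then one-hot, then `preset_norm`, then `crf_norm`) and the routing of unknown codecs to the `reserved` slot are assumptions for illustration, not the contents of `ai/src/vmaf_train/codec.py`.

```python
CODEC_VOCAB = (
    "x264", "x265", "libsvtav1", "libaom",
    "h264_nvenc", "hevc_nvenc", "av1_nvenc",
    "h264_qsv", "hevc_qsv", "av1_qsv",
    "h264_amf", "hevc_amf", "av1_amf",
    "h264_videotoolbox", "hevc_videotoolbox",
    "reserved",
)


def codec_one_hot(codec: str) -> list[float]:
    """One-hot encode a codec name over the 16-slot vocabulary."""
    # Assumption: names outside the vocabulary land in the "reserved" slot.
    name = codec if codec in CODEC_VOCAB else "reserved"
    index = CODEC_VOCAB.index(name)
    return [1.0 if i == index else 0.0 for i in range(len(CODEC_VOCAB))]


def wide_input(features: list[float], codec: str,
               preset_norm: float, crf_norm: float) -> list[float]:
    """Concatenate 6 features + 16 one-hot + preset_norm + crf_norm = 24 values."""
    if len(features) != 6:
        raise ValueError("expected the 6 canonical features")
    return [*features, *codec_one_hot(codec), preset_norm, crf_norm]
```

Because each slot index is fixed by position in the tuple, reordering `CODEC_VOCAB` would silently change what a trained model sees, which is why the PR pins the ordering with a regression test.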
Schema before / after
Smoke result
Test plan
- `PYTHONPATH=ai/src python -m pytest ai/tests/test_codec_aware_fr.py tools/vmaf-tune/tests/`: 37 pass
- `python ai/scripts/train_fr_regressor_v2.py --smoke --epochs 3`: exports a valid 24-D ONNX
- `python -c "import onnxruntime as ort; ort.InferenceSession('model/tiny/fr_regressor_v2_hw.onnx').run(None, {'features': ...})"`: ORT inference works
- `pre-commit run --files <all>`: clean
- `model/tiny/registry.schema.json`
Six deep-dive deliverables (ADR-0108)
- `docs/research/0054-videotoolbox-and-codec-schema-v2.md`
- `tools/vmaf-tune/AGENTS.md`, `ai/AGENTS.md`
- `CHANGELOG.md` under "Unreleased / lusoris fork"
- `docs/rebase-notes.md` entry "0283 / 0284"
🤖 Generated with Claude Code