feat(tools+ai): VideoToolbox adapters + 16-slot codec schema expansion#373

Closed
lusoris wants to merge 1 commit into master from
feat/vmaf-tune-vt-and-codec-schema

Conversation

@lusoris lusoris commented May 3, 2026

Summary

Two coupled changes for tools/vmaf-tune/ + the codec-aware FR regressor:

  • (A) Apple VideoToolbox adapters (ADR-0283). Adds
    H264VideoToolboxAdapter + HEVCVideoToolboxAdapter under
    tools/vmaf-tune/src/vmaftune/codec_adapters/, sharing
    _videotoolbox_common.py for the -q:v (0..100, higher = better)
    quality knob and the nine-name preset → -realtime boolean mapping.
    AV1 hardware encoding intentionally omitted (not available on Apple
    Silicon as of 2026).
  • (B) fr_regressor_v2 codec one-hot 6 → 16 slots (ADR-0284).
    Bumps CODEC_VOCAB_VERSION 1 → 2 in ai/src/vmaf_train/codec.py.
    New vocabulary fits today's 15 software + hardware adapters with
    one column of headroom: x264, x265, libsvtav1, libaom, h264_nvenc,
    hevc_nvenc, av1_nvenc, h264_qsv, hevc_qsv, av1_qsv, h264_amf,
    hevc_amf, av1_amf, h264_videotoolbox, hevc_videotoolbox, reserved.
  • Ships a SMOKE fr_regressor_v2_hw.onnx (24-D wide-input vector =
    6 features + 16 codec one-hot + 1 preset_norm + 1 crf_norm) trained
    on synthetic deterministic data. v1 ONNX + CODEC_VOCAB_V1 stay
    unaffected. Real-corpus retrain tracked under T7-CODEC-AWARE-V2.

Schema before / after

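| Index | v1 (6 slots) | v2 (16 slots) |
|---|---|---|
| 0 | x264 | x264 |
| 1 | x265 | x265 |
| 2 | libsvtav1 | libsvtav1 |
| 3 | libvvenc | libaom |
| 4 | libvpx-vp9 | h264_nvenc |
| 5 | unknown | hevc_nvenc |
| 6 | | av1_nvenc |
| 7 | | h264_qsv |
| 8 | | hevc_qsv |
| 9 | | av1_qsv |
| 10 | | h264_amf |
| 11 | | hevc_amf |
| 12 | | av1_amf |
| 13 | | h264_videotoolbox |
| 14 | | hevc_videotoolbox |
| 15 | | reserved |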
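
The before/after listing shows that only the first three indices keep their meaning between v1 and v2, which is why the change needs a version bump rather than a silent extension. A small sketch that makes the comparison explicit; the two helper functions are hypothetical and written only for this illustration.

```python
from typing import Dict, List, Sequence

CODEC_VOCAB_V1 = ("x264", "x265", "libsvtav1", "libvvenc", "libvpx-vp9", "unknown")
CODEC_VOCAB_V2 = (
    "x264", "x265", "libsvtav1", "libaom",
    "h264_nvenc", "hevc_nvenc", "av1_nvenc",
    "h264_qsv", "hevc_qsv", "av1_qsv",
    "h264_amf", "hevc_amf", "av1_amf",
    "h264_videotoolbox", "hevc_videotoolbox",
    "reserved",
)


def stable_slots(old: Sequence[str], new: Sequence[str]) -> List[str]:
    """Names whose index is identical in both vocabularies."""
    return [n for i, n in enumerate(old) if i < len(new) and new[i] == n]


def reassigned_slots(old: Sequence[str], new: Sequence[str]) -> Dict[str, str]:
    """Map each old name to the new name now occupying its index."""
    return {
        n: new[i]
        for i, n in enumerate(old)
        if i < len(new) and new[i] != n
    }
```

A v1 one-hot row fed to a v2 model would therefore be misread for `libvvenc`, `libvpx-vp9`, and `unknown`, so inputs encoded under one version should not be mixed with a model trained under the other.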

Smoke result

$ python ai/scripts/train_fr_regressor_v2.py --smoke --epochs 3
[fr-v2-hw] building synthetic 24-D smoke training set ...
[fr-v2-hw] X=(320, 24) y=(320,) (in_features=24)
[fr-v2-hw] training 3 epochs lr=0.001 ...
[fr-v2-hw] exporting ONNX → model/tiny/fr_regressor_v2_hw.onnx ...
[fr-v2-hw] shipped: model/tiny/fr_regressor_v2_hw.onnx
  (sha256=c3ec697ae42b596354167edec041ee4d43b920ce82e5757e244be3223cbca57e)

$ PYTHONPATH=ai/src python -m pytest \
    ai/tests/test_codec_aware_fr.py tools/vmaf-tune/tests/
======================== 37 passed in 1.43s ========================

Test plan

  • PYTHONPATH=ai/src python -m pytest ai/tests/test_codec_aware_fr.py tools/vmaf-tune/tests/ — 37 pass
  • python ai/scripts/train_fr_regressor_v2.py --smoke --epochs 3 — exports a valid 24-D ONNX
  • python -c "import onnxruntime as ort; ort.InferenceSession('model/tiny/fr_regressor_v2_hw.onnx').run(None, {'features': ...})" — ORT inference works
  • pre-commit run --files <all> — clean
  • Registry validates against model/tiny/registry.schema.json

Six deep-dive deliverables (ADR-0108)

  1. Research digest: docs/research/0054-videotoolbox-and-codec-schema-v2.md
  2. Decision matrices: ADR-0283 §Alternatives, ADR-0284 §Alternatives
  3. AGENTS.md invariant notes: tools/vmaf-tune/AGENTS.md, ai/AGENTS.md
  4. Reproducer: smoke result block above
  5. CHANGELOG entry: CHANGELOG.md under "Unreleased / lusoris fork"
  6. Rebase-notes row: docs/rebase-notes.md entry "0283 / 0284"

🤖 Generated with Claude Code

@lusoris lusoris force-pushed the feat/vmaf-tune-vt-and-codec-schema branch 2 times, most recently from cc00fa5 to e393bfe Compare May 3, 2026 19:46
lusoris pushed a commit that referenced this pull request May 3, 2026
…s 17 adapters)

Refactors `tools/vmaf-tune/src/vmaftune/encode.py` away from the Phase A
hard-coded `libx264` `-c:v / -preset / -crf` argv. `run_encode` now
looks up the codec adapter via `codec_adapters.get_adapter(req.encoder)`
and asks it for the FFmpeg argv slice via
`adapter.ffmpeg_codec_args(preset, quality)` plus an optional
`adapter.extra_params()`. Adapters that don't yet expose
`ffmpeg_codec_args` fall back silently to the legacy x264-CRF shape so
partial in-flight adapter PRs stay drivable end-to-end.
`parse_versions(stderr, encoder=...)` selects a per-codec version probe
(libx264, libx265, libsvtav1, libvpx-vp9, libaom-av1, libvvenc, NVENC,
QSV, AMF, VideoToolbox); unknown encoders return "unknown" rather than
raising. The `EncodeRequest.crf` field is preserved unchanged for the
SCHEMA_VERSION=1 row contract; a `quality` property mirrors it for
adapter-side codec-agnostic vocabulary.
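
A rough sketch of the dispatch-with-fallback behaviour described above. `MiniEncodeRequest`, `DemoAdapter`, `LegacyAdapter`, the registry contents, and `codec_argv` are hypothetical stand-ins; only the method names `get_adapter`, `ffmpeg_codec_args`, and `extra_params` come from the commit message, and the exact fallback argv is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MiniEncodeRequest:
    """Trimmed stand-in for EncodeRequest (the real one has more fields)."""
    encoder: str
    preset: str
    crf: int

    @property
    def quality(self) -> int:
        # Mirrors `crf` so adapters can use codec-agnostic vocabulary.
        return self.crf


class DemoAdapter:
    """Hypothetical adapter exposing the full contract."""

    def ffmpeg_codec_args(self, preset: str, quality: int) -> List[str]:
        return ["-c:v", "libx265", "-preset", preset, "-crf", str(quality)]

    def extra_params(self) -> List[str]:
        return ["-pix_fmt", "yuv420p"]


class LegacyAdapter:
    """Hypothetical in-flight adapter without ffmpeg_codec_args."""


_REGISTRY: Dict[str, object] = {
    "libx265": DemoAdapter(),
    "in_flight_codec": LegacyAdapter(),
}


def get_adapter(name: str) -> object:
    return _REGISTRY[name]


def codec_argv(req: MiniEncodeRequest) -> List[str]:
    adapter = get_adapter(req.encoder)
    build = getattr(adapter, "ffmpeg_codec_args", None)
    if build is None:
        # ASSUMPTION: the legacy shape is -c:v / -preset / -crf.
        return ["-c:v", req.encoder, "-preset", req.preset, "-crf", str(req.crf)]
    argv = list(build(req.preset, req.quality))
    extra = getattr(adapter, "extra_params", None)
    if extra is not None:
        argv.extend(extra())
    return argv
```

Using `getattr` with a default keeps partially implemented adapters drivable, at the cost of hiding a missing method until the argv is inspected.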

Existing 13-test x264 suite still green; new 19-test multi-codec suite
covers 9 representative codec shapes plus the unknown-codec /
missing-method fallback paths. Unblocks 17 in-flight codec adapter PRs
(#360 libaom, #362 libx265, #364 NVENC, #366 AMF, #367 QSV, #368
libvvenc, #370 libsvtav1, #373 VideoToolbox, plus follow-on waves)
which can now drive end-to-end encodes without copying or mutating the
harness.

Ships ADR-0294 + research digest 0054, vmaf-tune.md "Codec adapter
contract" section, rebase-notes #228 invariant, CHANGELOG entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lusoris lusoris marked this pull request as ready for review May 5, 2026 12:35
Copilot AI review requested due to automatic review settings May 5, 2026 12:35
lusoris commented May 5, 2026

Skipping in merge train — needs rework against shipped v2.

Master now has fr_regressor_v2 shipped to production via PR #397 with the 13-slot ENCODER_VOCAB v2 schema (PR #394). This PR expands the schema to 16 slots, which would invalidate the production ONNX checkpoint and the registry entry that just landed.

Specific conflicts on rebase:

  • ai/scripts/train_fr_regressor_v2.py: ENCODER_VOCAB definition (this PR's 16-slot version vs master's 13-slot production version)
  • model/tiny/registry.json: production entry vs this PR's smoke version
  • 6 other files (CHANGELOG / docs / codec_adapters / vmaf-tune AGENTS / usage doc / tests)

Recommended path forward: split into two PRs:

  1. VideoToolbox adapters only: add the 2 *_videotoolbox codec adapters following the same pattern as nvenc/amf/qsv. No vocab change. Mergeable today.
  2. Schema expansion 13→16 + retrain — landed as a single coherent PR including a fresh production retrain that clears v2's 0.95 LOSO PLCC ship gate (matches the criterion ADR-0235 + ADR-0291 set for v2 retrain). Follow-up to T7-FR-REGRESSOR-V2-PROD.

@lusoris lusoris marked this pull request as draft May 5, 2026 12:39
Copilot AI left a comment

Pull request overview

This PR adds Apple VideoToolbox codec adapters to tools/vmaf-tune and expands the codec-conditioning vocabulary for the codec-aware FR regressor from 6 to 16 one-hot slots (including new hardware encoder buckets), plus a smoke-trained fr_regressor_v2_hw ONNX and associated documentation/ADRs.

Changes:

  • Add H264VideoToolboxAdapter / HEVCVideoToolboxAdapter plus shared VideoToolbox preset/quality helpers, and extend the codec adapter registry + tests.
  • Expand ai/src/vmaf_train/codec.py to v2 CODEC_VOCAB (16 slots) with CODEC_VOCAB_VERSION = 2, update AI tests accordingly, and add a smoke training/export script.
  • Register the new smoke ONNX in model/tiny/registry.json and add user-facing/docs updates (usage doc, research digest, ADRs, rebase notes, changelog entry).

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| tools/vmaf-tune/tests/test_corpus.py | Updates codec registry expectations to include VideoToolbox adapters. |
| tools/vmaf-tune/tests/test_codec_one_hot.py | Adds regression gate pinning v2 codec vocab ordering and adapter→slot mapping. |
| tools/vmaf-tune/tests/test_codec_adapter_videotoolbox.py | Adds mocked smoke tests for VideoToolbox adapter contract + encode argv shape. |
| tools/vmaf-tune/src/vmaftune/codec_adapters/h264_videotoolbox.py | Introduces H.264 VideoToolbox adapter. |
| tools/vmaf-tune/src/vmaftune/codec_adapters/hevc_videotoolbox.py | Introduces HEVC VideoToolbox adapter. |
| tools/vmaf-tune/src/vmaftune/codec_adapters/_videotoolbox_common.py | Shared preset→-realtime mapping and -q:v validation/constants. |
| tools/vmaf-tune/src/vmaftune/codec_adapters/__init__.py | Registers VideoToolbox adapters and exports them via __all__. |
| tools/vmaf-tune/AGENTS.md | Documents new invariants (codec one-hot ordering; VT quality scale). |
| model/tiny/registry.json | Adds fr_regressor_v2_hw model registry entry. |
| model/tiny/fr_regressor_v2_hw.json | Adds sidecar describing vocab + feature layout for the 24-D wide input. |
| docs/usage/vmaf-tune.md | Documents VideoToolbox usage and the 16-slot codec vocabulary layout. |
| docs/research/0068-videotoolbox-and-codec-schema-v2.md | Adds supporting research digest for VT adapters + vocab sizing. |
| docs/rebase-notes.md | Adds fork-local rebase note entry for this workstream. |
| docs/adr/README.md | Appends ADR-0283/0284 rows to the ADR index table. |
| docs/adr/0283-vmaf-tune-videotoolbox-adapters.md | New ADR for VideoToolbox adapters. |
| docs/adr/0284-fr-regressor-v2-codec-schema-expansion.md | New ADR for 6→16 codec vocab expansion. |
| CHANGELOG.md | Adds an Unreleased entry describing the new adapters + vocab expansion + smoke model. |
| ai/tests/test_codec_aware_fr.py | Updates AI tests to the v2 vocab contract and reserved-bucket semantics. |
| ai/src/vmaf_train/codec.py | Implements v2 codec vocab + version bump; keeps v1 vocab as CODEC_VOCAB_V1. |
| ai/scripts/train_fr_regressor_v2.py | Adds smoke-only training + ONNX export + registry/sidecar writer for fr_regressor_v2_hw. |
| ai/AGENTS.md | Documents the v2 vocab invariant and the new regression gate test. |

Comment on lines +116 to +133
req = EncodeRequest(
    source=src,
    width=1920,
    height=1080,
    pix_fmt="yuv420p",
    framerate=24.0,
    encoder="h264_videotoolbox",
    preset="medium",
    crf=50,
    output=out,
)
# Direct command builder check.
cmd = build_ffmpeg_command(req)
assert "h264_videotoolbox" in cmd
assert "-c:v" in cmd
# Driver path with the mock.
res = run_encode(req, runner=fake_run)
assert res.exit_status == 0
Comment thread model/tiny/registry.json
Comment on lines +26 to +30
"kind": "fr",
"notes": "Codec-aware FR regressor (v2_hw, ADR-0284) \u2014 wide-input 24-D vector (canonical-6 features + 16-slot codec one-hot + preset_norm + crf_norm) \u2192 MOS scalar. SMOKE export \u2014 trained on synthetic deterministic data; real multi-codec corpus training tracked under T7-CODEC-AWARE-V2. Vocabulary version 2; v1 single-input fr_regressor_v1.onnx remains shipped and unaffected.",
"onnx": "fr_regressor_v2_hw.onnx",
"opset": 17,
"sha256": "c3ec697ae42b596354167edec041ee4d43b920ce82e5757e244be3223cbca57e",
Comment thread CHANGELOG.md
Comment on lines +11 to +30
- **`vmaf-tune` Apple VideoToolbox adapters + 16-slot codec one-hot
schema expansion (ADR-0283 + ADR-0284).** Adds
`H264VideoToolboxAdapter` and `HEVCVideoToolboxAdapter` under
[`tools/vmaf-tune/src/vmaftune/codec_adapters/`](tools/vmaf-tune/src/vmaftune/codec_adapters/),
sharing `_videotoolbox_common.py` for the `-q:v` (0..100, higher =
better) quality knob and the nine-name preset → `-realtime` boolean
mapping. AV1 hardware encoding intentionally omitted (unsupported
on Apple Silicon as of 2026). The codec-aware FR regressor
vocabulary expands from 6 → 16 slots in
[`ai/src/vmaf_train/codec.py`](ai/src/vmaf_train/codec.py)
(`CODEC_VOCAB_VERSION` 1 → 2): software (x264, x265, libsvtav1,
libaom) + NVENC ×3 + QSV ×3 + AMF ×3 + VideoToolbox ×2 + reserved.
Ships a SMOKE `fr_regressor_v2_hw.onnx` (24-D wide-input vector =
6 features + 16 codec one-hot + 1 preset_norm + 1 crf_norm) trained
on synthetic deterministic data; T7-CODEC-AWARE-V2 follow-up
retrains against a real multi-codec corpus. v1 ONNX
(`fr_regressor_v1.onnx`) and the `CODEC_VOCAB_V1` tuple stay shipped
and unaffected. New tests under
[`tools/vmaf-tune/tests/`](tools/vmaf-tune/tests/):
`test_codec_adapter_videotoolbox.py` and `test_codec_one_hot.py`.
Comment thread docs/adr/README.md
Comment on lines +265 to +266
| [ADR-0283](0283-vmaf-tune-videotoolbox-adapters.md) | `vmaf-tune` Apple VideoToolbox codec adapters. Adds `H264VideoToolboxAdapter` + `HEVCVideoToolboxAdapter` under `tools/vmaf-tune/src/vmaftune/codec_adapters/`, sharing `_videotoolbox_common.py` for the `-q:v` (0..100, higher = better) quality knob and the nine-name preset → `-realtime` boolean mapping. AV1 hardware encoding intentionally omitted (not available on Apple Silicon as of 2026). Tests mock `subprocess.run` so the suite stays Linux-CI-runnable; the adapters exercise the codec-adapter contract from [ADR-0237](0237-quality-aware-encode-automation.md). Companion: [ADR-0284](0284-fr-regressor-v2-codec-schema-expansion.md). | Accepted | tooling, ai, ffmpeg, codec, hardware-encoder, apple, fork-local |
| [ADR-0284](0284-fr-regressor-v2-codec-schema-expansion.md) | `fr_regressor_v2` codec one-hot expansion from 6 → 16 slots. Bumps `CODEC_VOCAB_VERSION` from 1 to 2; vocabulary becomes `(x264, x265, libsvtav1, libaom, h264_nvenc, hevc_nvenc, av1_nvenc, h264_qsv, hevc_qsv, av1_qsv, h264_amf, hevc_amf, av1_amf, h264_videotoolbox, hevc_videotoolbox, reserved)`. The shipped `fr_regressor_v2_hw` model concatenates a 24-D wide-input vector (`6 features + 16 codec one-hot + 1 preset_norm + 1 crf_norm`); SMOKE-trained on synthetic data, T7-CODEC-AWARE-V2 follow-up retrains against a real multi-codec corpus. v1 `CODEC_VOCAB_V1` tuple + `fr_regressor_v1.onnx` stay shipped and unaffected. | Accepted | ai, fr-regressor, codec, schema, hardware-encoder, fork-local |
Comment thread docs/usage/vmaf-tune.md
Comment on lines +147 to +151
(Apple-Silicon hardware) are wired today — `libx265` / `libsvtav1` /
`libaom` / NVENC / QSV / AMF adapters land alongside `fr_regressor_v2_hw`
(see [ADR-0283](../adr/0283-vmaf-tune-videotoolbox-adapters.md) and
[ADR-0284](../adr/0284-fr-regressor-v2-codec-schema-expansion.md)).
All adapters live under `tools/vmaf-tune/src/vmaftune/codec_adapters/`.
Comment on lines +1 to +16
# Research-0054: Apple VideoToolbox + 16-slot codec one-hot expansion

- **Date**: 2026-05-03
- **Companion ADRs**: [ADR-0283](../adr/0283-vmaf-tune-videotoolbox-adapters.md), [ADR-0284](../adr/0284-fr-regressor-v2-codec-schema-expansion.md)
- **Status**: Snapshot at proposal time.

## Question

Two coupled questions:

1. How should `vmaf-tune` drive Apple's VideoToolbox (the only
hardware-encode path on Apple Silicon and T2 Macs)?
2. How wide does the codec one-hot vocabulary need to be to cover
the software + hardware codec adapter set the parallel agents are
landing for `fr_regressor_v2_hw`?

lusoris pushed a commit that referenced this pull request May 5, 2026
…s 17 adapters)

@lusoris lusoris marked this pull request as ready for review May 5, 2026 13:35
Two coupled changes for vmaf-tune + fr_regressor_v2_hw:

A. Apple VideoToolbox codec adapters (ADR-0283)

   Adds H264VideoToolboxAdapter and HEVCVideoToolboxAdapter under
   tools/vmaf-tune/src/vmaftune/codec_adapters/, sharing
   _videotoolbox_common.py for the -q:v (0..100, higher = better)
   quality knob and the nine-name preset to -realtime boolean
   mapping. AV1 hardware encoding intentionally omitted
   (unsupported on Apple Silicon as of 2026).

B. fr_regressor_v2 codec one-hot schema expansion 6 to 16 slots
   (ADR-0284)

   Bumps CODEC_VOCAB_VERSION from 1 to 2 in
   ai/src/vmaf_train/codec.py. New vocabulary covers:
   software (x264, x265, libsvtav1, libaom) + NVENC x3 +
   QSV x3 + AMF x3 + VideoToolbox x2 + reserved.

   Ships a SMOKE fr_regressor_v2_hw.onnx (24-D wide-input vector =
   6 features + 16 codec one-hot + 1 preset_norm + 1 crf_norm)
   trained on synthetic deterministic data; T7-CODEC-AWARE-V2
   follow-up retrains against a real multi-codec corpus. v1 ONNX
   (fr_regressor_v1.onnx) and the CODEC_VOCAB_V1 tuple stay
   shipped and unaffected.

New tests: test_codec_adapter_videotoolbox.py (9 cases) +
test_codec_one_hot.py (6 cases) under tools/vmaf-tune/tests/.

Six deep-dive deliverables: Research-0054, ADR-0283 + ADR-0284
alternatives matrices, AGENTS.md invariant notes (tools/vmaf-tune
+ ai/), reproducer in PR description, CHANGELOG entry,
docs/rebase-notes.md row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lusoris lusoris force-pushed the feat/vmaf-tune-vt-and-codec-schema branch from e393bfe to 73f4c24 Compare May 5, 2026 13:36
@lusoris lusoris marked this pull request as draft May 5, 2026 13:37
lusoris commented May 5, 2026

Skipping for now: architectural conflict with merged ADR-0291 (13-slot ENCODER_VOCAB v2). The 16-slot schema expansion needs to land as a successor ADR (vocab v3) with a renamed script (train_fr_regressor_v2_hw.py) to avoid clobbering the production v2 ONNX. Reopen when scoped.

lusoris added a commit that referenced this pull request May 5, 2026
…s 17 adapters) (#376)

* feat(tools): vmaf-tune encode.py — codec-agnostic dispatcher (unblocks 17 adapters)

* chore(docs): renumber encode-multi-codec ADR 0294→0297 + research 0069→0070

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris commented May 5, 2026

VT-only split landed as #398 (mergeable today, currently parked draft until merge-train cursor lands #380). The schema-expansion side stays here on this PR as draft pending the fresh fr_regressor_v2 production retrain.

lusoris pushed a commit that referenced this pull request May 5, 2026
Adds H264VideoToolboxAdapter and HEVCVideoToolboxAdapter under
tools/vmaf-tune/src/vmaftune/codec_adapters/, sharing a single
_videotoolbox_common.py for the -q:v (0..100, higher = better)
quality knob and the nine-name preset to -realtime boolean
mapping. Both adapters carry invert_quality=False and a [0, 100]
quality range — downstream consumers interpret the knob via the
adapter registry.

Preset taxonomy maps onto VT's coarser -realtime flag:
- ultrafast/superfast/veryfast/faster/fast → realtime=1
- medium/slow/slower/veryslow → realtime=0

AV1 hardware encoding intentionally omitted — Apple Silicon has
no AV1 hardware encoder block as of 2026 and FFmpeg exposes no
av1_videotoolbox.

Tests mock subprocess.run so Linux CI stays green; macOS
end-to-end is left to contributors with VideoToolbox available
locally.

Split from PR #373 — the originally-coupled 16-slot codec-vocab
schema expansion is deferred to a follow-up PR awaiting a fresh
fr_regressor_v2 production retrain (ship-gate per ADR-0235 +
ADR-0291).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 5, 2026
…#398)

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris commented May 5, 2026

Closing — re-scoped by #401 (feat(ai): ENCODER_VOCAB v3 (16-slot) schema expansion + retrain plan, ADR-0302). The VideoToolbox adapter half landed separately (master already has h264_videotoolbox.py + hevc_videotoolbox.py + ADR-0283); only the schema-expansion half remained, and that's what #401 ships as a clean scaffold against current master.

@lusoris lusoris closed this May 5, 2026
lusoris pushed a commit that referenced this pull request May 5, 2026
…(ADR-0302)

Re-scope of PR #373: drop the VideoToolbox adapters (already on master
via ADR-0283) and keep only the 13 -> 16 vocab expansion + retrain plan.

This is the schema scaffold only -- the live `ENCODER_VOCAB` and
`ENCODER_VOCAB_VERSION = 2` stay as the source of truth. A parallel
`ENCODER_VOCAB_V3` constant in `ai/scripts/train_fr_regressor_v2.py`
documents the target 16-slot vocab (slots 0..12 mirror v2 verbatim;
slots 13/14/15 append `libsvtav1`, `h264_videotoolbox`,
`hevc_videotoolbox`). Append-only ordering preserved per ADR-0235.
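
The append-only property can be checked mechanically. In the sketch below the 13 v2 slot names are placeholders, because this thread does not list them; only the three appended names come from the commit message, and `is_append_only` is a hypothetical helper.

```python
from typing import Sequence


def is_append_only(old: Sequence[str], new: Sequence[str]) -> bool:
    """True when `new` keeps every `old` entry at its original index."""
    return len(new) >= len(old) and tuple(new[: len(old)]) == tuple(old)


# PLACEHOLDER names: the real 13-slot v2 vocabulary is not shown in this thread.
ENCODER_VOCAB_V2_PLACEHOLDER = tuple(f"v2_slot_{i}" for i in range(13))

# Slots 13/14/15 as named in the commit message.
ENCODER_VOCAB_V3_SKETCH = ENCODER_VOCAB_V2_PLACEHOLDER + (
    "libsvtav1",
    "h264_videotoolbox",
    "hevc_videotoolbox",
)
```

A check of this kind fails as soon as an existing slot is renamed or reordered, which is the failure mode that would invalidate a shipped ONNX checkpoint.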

The follow-up retrain PR is gated on clearing the same mean LOSO
PLCC >= 0.95 ship gate ADR-0291 cleared on v2, plus the ADR-0235
multi-codec lift floor (>= +0.005 PLCC over the v1 single-input
regressor). Production ONNX swap deferred until that retrain clears.
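
A sketch of the two-part gate described above, under stated assumptions: the lift is taken here as the candidate's mean per-fold PLCC minus a single v1 PLCC figure, which may not match how ADR-0235 measures it, and both function names are hypothetical.

```python
import math
from typing import Sequence


def plcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("PLCC is undefined for a constant series")
    return cov / math.sqrt(vx * vy)


def passes_ship_gate(
    fold_plcc: Sequence[float],
    v1_plcc: float,
    gate: float = 0.95,
    lift_floor: float = 0.005,
) -> bool:
    """Mean LOSO PLCC >= gate AND lift over v1 >= lift_floor."""
    mean_plcc = sum(fold_plcc) / len(fold_plcc)
    # ASSUMPTION: lift = candidate mean PLCC minus the v1 regressor's PLCC.
    return mean_plcc >= gate and (mean_plcc - v1_plcc) >= lift_floor
```

With folds of 0.96, 0.97, and 0.95 against a v1 figure of 0.94, both conditions hold; a candidate that clears 0.95 but improves on v1 by only 0.002 would still be rejected.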

Six deep-dive deliverables (ADR-0108):
1. Research digest: docs/research/0075-encoder-vocab-v3-schema-expansion.md
2. Decision matrix: ADR-0302 Alternatives considered (4-row table)
3. AGENTS.md invariant note: ai/AGENTS.md "v3 retrain invariant" section
4. Reproducer: python -m pytest ai/tests/ -k encoder_vocab (no-op
   until vocab tests are added by the retrain PR)
5. CHANGELOG fragment: changelog.d/added/encoder-vocab-v3-schema-expansion.md
6. Rebase note: docs/rebase-notes.md section "0302 -- ENCODER_VOCAB v3"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 6, 2026
…(ADR-0302) (#401)

* feat(ai): ENCODER_VOCAB v3 (16-slot) schema expansion + retrain plan (ADR-0302)

* chore(docs): renumber encoder-vocab-v3 research 0075→0078 (collision with #399 ensemble)

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lusoris lusoris deleted the feat/vmaf-tune-vt-and-codec-schema branch May 6, 2026 09:34