feat(ai): fr_regressor_v2 ENCODER_VOCAB v2 (hw codec extension) by lusoris · Pull Request #394 · lusoris/vmaf

lusoris · 2026-05-04T21:45:56Z

Summary

Extends the closed encoder vocabulary used by codec-aware training (ai/scripts/train_fr_regressor_v2.py) with 6 hardware encoder entries (h264/hevc/av1 × NVENC + QSV) and bumps ENCODER_VOCAB_VERSION from 1 to 2. PRESET_ORDINAL gains matching sub-tables for NVENC's p1..p7 preset family and Intel QSV's libx264-aligned preset names.
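A minimal sketch of what a closed vocabulary with an "unknown" fallback means for the one-hot input. The v1 entries below are hypothetical placeholders (the real v1 list is not shown in this PR), the order of the six hardware names within the appended block is assumed, and `one_hot` is an illustrative helper rather than a function from train_fr_regressor_v2.py:

```python
# Hypothetical v1 entries; the real v1 list is not shown in this PR.
ENCODER_VOCAB_V1 = ("libx264", "libx265", "unknown")

# The six hardware encoder names from this PR, appended after the v1 entries.
# Their relative order here is an assumption.
HW_ENCODERS = (
    "h264_nvenc", "hevc_nvenc", "av1_nvenc",
    "h264_qsv", "hevc_qsv", "av1_qsv",
)
ENCODER_VOCAB_V2 = ENCODER_VOCAB_V1 + HW_ENCODERS


def one_hot(encoder, vocab):
    """Return a one-hot list; names outside the vocab fall back to 'unknown'."""
    index = vocab.index(encoder) if encoder in vocab else vocab.index("unknown")
    return [1.0 if i == index else 0.0 for i in range(len(vocab))]


# Under v1 every hardware encoder shares the 'unknown' slot;
# under v2 each one gets its own slot.
v1_vec = one_hot("hevc_nvenc", ENCODER_VOCAB_V1)
v2_vec = one_hot("hevc_nvenc", ENCODER_VOCAB_V2)
```

Because the new names are appended, every v1 index keeps its position in v2.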

Validation

216-row real corpus aggregated from 33,840 per-frame rows produced by scripts/dev/hw_encoder_corpus.py (PR #392):

vocab	PLCC	SROCC	RMSE
v1 (hw codecs → "unknown")	0.92	0.93	6.41
v2 (proper one-hot)	0.96	0.95	4.15

The vocab extension carries real signal — codec one-hot lets the model learn per-codec quality calibration that the "unknown" collapse discarded.
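For readers unfamiliar with the three columns, here is a standard-library sketch of PLCC (Pearson correlation), SROCC (Spearman rank correlation) and RMSE. It is not the evaluation code used in this PR; the function names are illustrative, and the training script may well rely on a numerical library instead:

```python
import math


def pearson(x, y):
    """PLCC: linear correlation between predictions and targets."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """SROCC: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(pred, target):
    """Root-mean-square error, in the same units as the score."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))
```

PLCC and SROCC are better when higher; RMSE is better when lower, which matches the direction of every column in the table.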

Scope

6 vocab entries appended (load-bearing — appended-only, never inserted)
ENCODER_VOCAB_VERSION 1 → 2
6 PRESET_ORDINAL sub-tables added (3 NVENC using p1..p7 + 3 QSV using libx264-aligned preset names)
No model graph regenerated in this PR — the registry's fr_regressor_v2.onnx remains the smoke ONNX. A follow-up training run will produce the real v2-vocab graph.
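The append-only rule in the first bullet can be checked mechanically: the new vocabulary must start with the old one, entry for entry. A hedged sketch, where `is_append_only` and the example entries are hypothetical and not part of the PR:

```python
def is_append_only(old_vocab, new_vocab):
    """True if new_vocab begins with old_vocab in the same order."""
    return tuple(new_vocab[: len(old_vocab)]) == tuple(old_vocab)


# Hypothetical v1 entries, for illustration only.
old = ("libx264", "libx265", "unknown")

appended = old + ("h264_nvenc", "hevc_nvenc")          # keeps v1 indices stable
inserted = ("libx264", "h264_nvenc", "libx265", "unknown")  # shifts v1 indices

ok = is_append_only(old, appended)
broken = is_append_only(old, inserted)
```

An insertion in the middle would silently change which one-hot slot an existing encoder maps to, which is why the version bump is paired with this invariant.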

ADR-0108 deliverables

(1) Research digest: no digest needed: trivial vocab-table extension; rationale is in the PLCC validation table above
(2) Decision matrix: no alternatives: only-one-way fix; closed-vocab schema requires ENCODER_VOCAB extension to recognise hw codecs
(3) AGENTS.md invariant note: docs/rebase-notes.md entry 0235 documents the append-only invariant + version bump semantic
(4) Reproducer / smoke-test command: python3 ai/scripts/train_fr_regressor_v2.py --corpus <jsonl> --epochs 200 --no-export
(5) CHANGELOG fragment: changelog.d/changed/fr-regressor-v2-hw-codec-vocab.md
(6) Rebase note: docs/rebase-notes.md entry 0235

🤖 Generated with Claude Code

Extends the closed encoder vocabulary used by codec-aware training with 6 hardware encoder entries (h264/hevc/av1 across NVENC + QSV) and bumps ENCODER_VOCAB_VERSION 1 -> 2. PRESET_ORDINAL gains matching sub-tables for NVENC's p1..p7 preset family and Intel QSV's libx264-aligned preset names.

Validated on a 216-row real corpus aggregated from 33,840 per-frame rows produced by scripts/dev/hw_encoder_corpus.py:

vocab v1 (all hw codecs -> 'unknown'): PLCC 0.92 RMSE 6.41
vocab v2 (proper one-hot): PLCC 0.96 RMSE 4.15

The vocab extension carries real signal: codec one-hot lets the model learn per-codec quality calibration that 'unknown' collapses discard.

No model graph regenerated in this PR (the registry's fr_regressor_v2.onnx remains the smoke ONNX); the next training run that produces a real ONNX will use vocab v2.

Co-Authored-By: Claude <noreply@anthropic.com>

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Extends fr_regressor_v2’s closed encoder vocabulary and preset ordinal mappings to recognize NVIDIA NVENC and Intel QSV hardware encoders, and records the append-only invariant + version bump semantics in docs/changelog.

Changes:

Append 6 hardware encoder strings to ENCODER_VOCAB and bump ENCODER_VOCAB_VERSION to 2
Add PRESET_ORDINAL sub-tables for NVENC p1..p7 and QSV’s libx264-aligned presets
Document the change in docs/rebase-notes.md and a changelog fragment

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
docs/rebase-notes.md	Adds rebase note documenting append-only invariant and v2 bump semantics
changelog.d/changed/fr-regressor-v2-hw-codec-vocab.md	Adds changelog entry describing the encoder vocab + preset table extension
ai/scripts/train_fr_regressor_v2.py	Appends hw encoder tokens, bumps vocab version, and introduces preset ordinal maps for NVENC/QSV


+    # ENCODER_VOCAB_VERSION to 2 — the previous vocab v1 model graphs
+    # are forward-compatible (extra one-hot bits are zero-padded for
+    # legacy callers) but new training runs target v2.


+    "h264_nvenc": {"p1": 0, "p2": 2, "p3": 3, "p4": 5, "p5": 6, "p6": 7, "p7": 9},
+    "hevc_nvenc": {"p1": 0, "p2": 2, "p3": 3, "p4": 5, "p5": 6, "p6": 7, "p7": 9},
+    "av1_nvenc": {"p1": 0, "p2": 2, "p3": 3, "p4": 5, "p5": 6, "p6": 7, "p7": 9},
+    # Intel QSV presets share the libx264 vocabulary but only ship
+    # veryfast/faster/fast/medium/slow/slower/veryslow on Arc.
+    "h264_qsv": {
+        "veryfast": 2,
+        "faster": 3,
+        "fast": 4,
+        "medium": 5,
+        "slow": 6,
+        "slower": 7,
+        "veryslow": 8,
+    },
+    "hevc_qsv": {
+        "veryfast": 2,
+        "faster": 3,
+        "fast": 4,
+        "medium": 5,
+        "slow": 6,
+        "slower": 7,
+        "veryslow": 8,
+    },
+    "av1_qsv": {
+        "veryfast": 2,
+        "faster": 3,
+        "fast": 4,
+        "medium": 5,
+        "slow": 6,
+        "slower": 7,
+        "veryslow": 8,
+    },
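A sketch of how these sub-tables might be consulted. The ordinal values are copied from the diff above; `preset_ordinal`, `DEFAULT_ORDINAL` and the fallback behaviour are assumptions for illustration, since the PR does not show the lookup code:

```python
# Values copied from the diff; the three NVENC tables are identical,
# as are the three QSV tables.
NVENC_PRESETS = {"p1": 0, "p2": 2, "p3": 3, "p4": 5, "p5": 6, "p6": 7, "p7": 9}
QSV_PRESETS = {
    "veryfast": 2, "faster": 3, "fast": 4, "medium": 5,
    "slow": 6, "slower": 7, "veryslow": 8,
}

PRESET_ORDINAL = {
    "h264_nvenc": NVENC_PRESETS,
    "hevc_nvenc": NVENC_PRESETS,
    "av1_nvenc": NVENC_PRESETS,
    "h264_qsv": QSV_PRESETS,
    "hevc_qsv": QSV_PRESETS,
    "av1_qsv": QSV_PRESETS,
}

# Hypothetical fallback for unrecognised encoder/preset pairs; 5 is the
# value the tables assign to "medium" and "p4".
DEFAULT_ORDINAL = 5


def preset_ordinal(encoder, preset):
    """Look up the ordinal for (encoder, preset), falling back to the default."""
    return PRESET_ORDINAL.get(encoder, {}).get(preset, DEFAULT_ORDINAL)
```

Both families land on the same 0..9 scale, so NVENC's p4 and QSV's medium share the ordinal 5 and NVENC's seven presets span the full range.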


lusoris marked this pull request as ready for review May 5, 2026 04:20

Copilot AI review requested due to automatic review settings May 5, 2026 04:20

chore: wake required checks (post-promotion empty trigger)

3d674df

Copilot AI reviewed May 5, 2026

Copilot started reviewing on behalf of lusoris May 5, 2026 04:28

lusoris merged commit a5ca35d into master May 5, 2026
54 checks passed

lusoris deleted the feat/fr-regressor-v2-hw-codec-vocab branch May 5, 2026 04:43

lusoris merged 2 commits into master from feat/fr-regressor-v2-hw-codec-vocab
