feat(ai,tools): hw_encoder_corpus.py — Phase A real-corpus runner#392
Merged
Encodes a raw YUV with NVENC / QSV / VAAPI / libx264 at a CRF/CQ grid, decodes back to raw YUV, scores with libvmaf (CUDA backend), and emits one JSONL row per (source, encoder, cq, frame) carrying canonical-6 features + per-frame VMAF + encode metadata.
Local smoke produced 33,840 per-frame rows in ~5 min wall time across 9 Netflix refs x 6 hardware codecs (h264/hevc/av1 on both NVIDIA NVENC and Intel Arc QSV) x 4 CQ values.
Companion doc docs/development/intel-arc-vaapi-driver-priority.md captures the LIBVA_DRIVER_NAME=iHD gotcha for multi-card hosts.
Output landing in runs/phase_a/ is gitignored; rerun to reproduce.
Co-Authored-By: Claude <noreply@anthropic.com>
Pull request overview
Adds a fork-local “Phase A real-corpus runner” that generates per-frame canonical-6 + VMAF + encoder metadata rows (JSONL) by encoding/decoding via ffmpeg (NVENC/QSV/VAAPI/libx264) and scoring via libvmaf’s CUDA path, plus accompanying documentation and changelog notes.
Changes:
- Introduces `scripts/dev/hw_encoder_corpus.py` to produce per-frame training rows for `fr_regressor_v2`.
- Documents the Intel Arc multi-card VAAPI/QSV driver-selection pitfall (`LIBVA_DRIVER_NAME=iHD`).
- Records the change in rebase notes and adds a changelog fragment.
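As a rough illustration of how the per-frame JSONL output might be consumed downstream, the sketch below averages VMAF per `(encoder, cq)` cell. The field names `encoder`, `cq`, and `vmaf` come from the row schema described in this PR; the helper name `mean_vmaf_by_cell` and the sample rows are hypothetical and not part of the PR.

```python
import io
import json
from collections import defaultdict
from statistics import mean


def mean_vmaf_by_cell(lines):
    """Group per-frame rows by (encoder, cq) and average their VMAF.

    Hypothetical helper for illustration; it is not part of the runner.
    """
    cells = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in an appended file
        row = json.loads(line)
        if row.get("vmaf") is None:
            continue  # skip rows where no VMAF score was recorded
        cells[(row["encoder"], row["cq"])].append(row["vmaf"])
    return {cell: mean(scores) for cell, scores in cells.items()}


# Made-up rows for illustration only; real rows also carry the canonical-6 fields.
sample = io.StringIO(
    '{"src": "clip", "encoder": "h264_nvenc", "cq": 23, "frame_index": 0, "vmaf": 90.0}\n'
    '{"src": "clip", "encoder": "h264_nvenc", "cq": 23, "frame_index": 1, "vmaf": 92.0}\n'
    '{"src": "clip", "encoder": "hevc_qsv", "cq": 23, "frame_index": 0, "vmaf": 88.0}\n'
)
result = mean_vmaf_by_cell(sample)
print(result)
```

Passing an open file handle from `runs/phase_a/` instead of the in-memory sample should work the same way, since the function only iterates over lines.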
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| scripts/dev/hw_encoder_corpus.py | New dev script to run hw-encode → decode → CUDA-VMAF scoring and emit per-frame JSONL rows. |
| docs/rebase-notes.md | Adds a new rebase-note entry for the runner (but includes apparent paste/formatting issues). |
| docs/development/intel-arc-vaapi-driver-priority.md | New doc explaining the LIBVA_DRIVER_NAME=iHD workaround on multi-GPU systems. |
| changelog.d/added/hw-encoder-corpus-runner.md | Changelog fragment announcing the new runner + companion doc. |
Comment on lines +63 to +108
pre_args: list[str] = ["ffmpeg", "-y", "-loglevel", "error"]
if encoder.endswith("_qsv") and qsv_device is not None:
    pre_args += [
        "-init_hw_device",
        f"qsv:hw,child_device={qsv_device}",
    ]
if encoder.endswith("_vaapi") and vaapi_device is not None:
    pre_args += [
        "-init_hw_device",
        f"vaapi=va:{vaapi_device}",
    ]
pre_args += [
    "-f",
    "rawvideo",
    "-pix_fmt",
    pix_fmt,
    "-s",
    f"{width}x{height}",
    "-r",
    str(framerate),
    "-i",
    str(source),
]
if encoder.endswith("_nvenc"):
    post = ["-c:v", encoder, "-cq", str(cq), "-preset", "p4"]
elif encoder.endswith("_qsv"):
    post = ["-c:v", encoder, "-global_quality", str(cq), "-preset", "medium"]
elif encoder.endswith("_vaapi"):
    post = [
        "-vf",
        "format=nv12,hwupload=extra_hw_frames=16",
        "-c:v",
        encoder,
        "-qp",
        str(cq),
    ]
else:
    # CPU fallback (libx264) — the corpus may want a CPU baseline row.
    post = ["-c:v", encoder, "-crf", str(cq), "-preset", "medium"]
if extra:
    post += extra
cmd = pre_args + post + [str(out_mp4)]

t0 = time.monotonic()
p = subprocess.run(cmd, capture_output=True, text=True, check=False)  # noqa: S603
elapsed_ms = (time.monotonic() - t0) * 1000.0
Comment on lines +184 to +202
for fr in payload.get("frames", []):
    m = fr.get("metrics", {})
    if not all(k in m for k in CANONICAL_6):
        continue
    row = {
        "src": src,
        "encoder": encoder,
        "cq": cq,
        "enc_bytes": enc_bytes,
        "enc_time_ms": enc_time_ms,
        "frame_index": fr.get("frameNum", len(rows)),
        "vmaf": m.get("vmaf"),
        "adm2": m["integer_adm2"],
        "vif_scale0": m["integer_vif_scale0"],
        "vif_scale1": m["integer_vif_scale1"],
        "vif_scale2": m["integer_vif_scale2"],
        "vif_scale3": m["integer_vif_scale3"],
        "motion2": m["integer_motion2"],
    }
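To make the row-flattening step in the excerpt above easier to follow in isolation, here is a minimal standalone sketch. It assumes the payload shape implied by the excerpt (a `frames` list whose entries carry `frameNum` and a `metrics` dict); the `CANONICAL_6` tuple is inferred from the six keys the excerpt reads, the function name `flatten_frames` is hypothetical, and the sample payload is made up. The encode metadata fields (`enc_bytes`, `enc_time_ms`) are left out for brevity.

```python
# Inferred from the six metric keys read in the excerpt; an assumption, not
# a copy of the script's own constant.
CANONICAL_6 = (
    "integer_adm2",
    "integer_vif_scale0",
    "integer_vif_scale1",
    "integer_vif_scale2",
    "integer_vif_scale3",
    "integer_motion2",
)


def flatten_frames(payload, src, encoder, cq):
    """Turn a per-frame metrics payload into flat row dicts (hypothetical helper)."""
    rows = []
    for fr in payload.get("frames", []):
        m = fr.get("metrics", {})
        if not all(k in m for k in CANONICAL_6):
            continue  # drop frames that lack any canonical-6 feature
        row = {
            "src": src,
            "encoder": encoder,
            "cq": cq,
            "frame_index": fr.get("frameNum", len(rows)),
            "vmaf": m.get("vmaf"),
        }
        # Store each feature under its short name, e.g. integer_adm2 -> adm2.
        row.update({k.removeprefix("integer_"): m[k] for k in CANONICAL_6})
        rows.append(row)
    return rows


# Made-up payload: frame 0 is complete, frame 1 is missing integer_motion2.
complete = {k: 0.5 for k in CANONICAL_6}
complete["integer_adm2"] = 0.98
complete["vmaf"] = 91.2
partial = {k: v for k, v in complete.items() if k != "integer_motion2"}
payload = {
    "frames": [
        {"frameNum": 0, "metrics": complete},
        {"frameNum": 1, "metrics": partial},
    ]
}
rows = flatten_frames(payload, "clip", "h264_nvenc", 23)
print(rows)
```

One property worth noting in this sketch: the `len(rows)` fallback counts emitted rows, so when `frameNum` is absent it stops matching the true frame position as soon as any frame has been skipped.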
Comment on lines +242 to +245
src_stem = args.source.stem
written = 0
with args.out.open("a", encoding="utf-8") as fh:
    for cq in args.cq:
Comment on lines +261 to +287
if rc != 0 or sz == 0:
    print(
        f"[skip] {src_stem} {args.encoder} cq{cq}: encode rc={rc}", file=sys.stderr
    )
    continue
yuv = td_path / f"{src_stem}_{args.encoder}_cq{cq}.yuv"
if decode_to_raw(mp4, yuv, args.pix_fmt) != 0 or not yuv.exists():
    print(
        f"[skip] {src_stem} {args.encoder} cq{cq}: decode failed", file=sys.stderr
    )
    continue
json_out = td_path / "vmaf.json"
if (
    score_cuda(
        args.vmaf_bin,
        args.source,
        yuv,
        args.width,
        args.height,
        args.pix_fmt,
        json_out,
    )
    != 0
    or not json_out.exists()
):
    print(f"[skip] {src_stem} {args.encoder} cq{cq}: score failed", file=sys.stderr)
    continue
- **Touches**:
- `libvmaf/src/vulkan/kernel_template.h` — fork-local; new
- `libvmaf/src/vulkan/kernel_template.h` — fork-local. Output landing in `runs/phase_a/` is gitignored — rerun the script to reproduce.
Comment on lines +7172 to +7175
- **Touches**: new `scripts/dev/hw_encoder_corpus.py` (no existing
caller; opt-in tooling). Output landing in `runs/phase_a/` is gitignored — rerun the script to reproduce.
`docs/development/intel-arc-vaapi-driver-priority.md`. Output landing in `runs/phase_a/` is gitignored — rerun the script to reproduce.
stratified sample, 58 KiB).
Comment on lines +7176 to +7182
- **Invariant**: the script's QSV path forces
`env['LIBVA_DRIVER_NAME']='iHD'` (set by the calling shell, not
inside the script) when targeting `/dev/dri/renderD129` on a
multi-card host that has NVIDIA's libva-driver-nvidia shim
installed. Without that, libva picks up NVIDIA's NVDEC-VAAPI
translation and the MFX session handshake fails with -9. See the
companion doc for the failure mode + fix.
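Because the note above says the variable is set by the caller rather than inside the script, one way a Python wrapper could pass it to a child process is sketched here. The helper name `with_ihd_driver` is hypothetical, and the child process is a stand-in that only echoes the variable; a real caller would launch the runner or ffmpeg instead.

```python
import os
import subprocess
import sys


def with_ihd_driver(base_env=None):
    """Return a copy of the environment with LIBVA_DRIVER_NAME forced to iHD.

    Hypothetical helper; the PR itself relies on the calling shell to do this.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LIBVA_DRIVER_NAME"] = "iHD"
    return env


env = with_ihd_driver()
# Stand-in child process that prints the variable it inherited.
probe = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['LIBVA_DRIVER_NAME'])"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
seen = probe.stdout.strip()
print(seen)
```

Copying the environment instead of mutating `os.environ` keeps the override scoped to the one child process, which may matter on a host where other tools should keep using the default libva driver.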
Comment on lines +30 to +32
VAAPI encoders against the Arc card. The fork's
[`scripts/dev/hw_encoder_corpus.py`](../../scripts/dev/hw_encoder_corpus.py)
forces this in its QSV invocation path.
This was referenced May 6, 2026
Summary
`fr_regressor_v2` needs per-frame canonical-6 + VMAF + codec metadata to actually train. The existing `vmaf-tune corpus` CLI (ADR-0237, Phase A) emits only pooled VMAF, a scope cut.
This PR ships `scripts/dev/hw_encoder_corpus.py`: it encodes a raw YUV with NVENC / QSV / VAAPI / libx264 at a CRF/CQ grid, decodes back to raw YUV, scores with libvmaf (CUDA backend), and emits one JSONL row per `(source, encoder, cq, frame)` carrying `integer_adm2`, `integer_vif_scale0..3`, `integer_motion2` (canonical-6) + per-frame VMAF + `encoder` / `cq` / `enc_bytes` / `enc_time_ms`.
Smoke evidence
Local run on this fork (RTX 4090 + Arc A380):
33,840 rows in ~5 min wall; both pipelines run concurrently because NVENC and Arc QSV use different hardware engines on different cards. Output lands in `runs/phase_a/` (gitignored); rerun to reproduce.
Companion doc
`docs/development/intel-arc-vaapi-driver-priority.md` captures the `LIBVA_DRIVER_NAME=iHD` gotcha for multi-card hosts. NVIDIA's `libva-driver-nvidia` shim shadows Intel's iHD by default; QSV calls against `/dev/dri/renderD129` (Arc) silently route through NVIDIA's translator and the MFX session fails with `-9`. Forcing `LIBVA_DRIVER_NAME=iHD` fixes it.
ADR-0108 deliverables
- no digest needed: tooling, not a research result; ADR-0237 + future fr_regressor_v2 docs cover the why
- no alternatives: only-one-way fix; `vmaf-tune corpus`'s pooled-only schema can't be extended in-place without breaking its row contract
- `docs/rebase-notes.md` entry 0234 documents the LIBVA_DRIVER_NAME=iHD invariant
- `docs/rebase-notes.md` 0234
- `changelog.d/added/hw-encoder-corpus-runner.md`
- `docs/rebase-notes.md` entry 0234
Test plan
🤖 Generated with Claude Code