feat(tools): vmaf-tune compare — multi-codec ranked output by lusoris · Pull Request #377 · lusoris/vmaf

lusoris · 2026-05-03T20:01:59Z

Summary

Implements Bucket #7 of the vmaf-tune capability audit (PR #354 / research-0061): codec-comparison mode. Given a single source and a target VMAF, vmaf-tune compare runs each codec's recommend predicate in a thread pool and emits a ranked (codec, best_crf, bitrate_kbps, encode_time_ms, vmaf_score) table sorted by smallest file — answering the perennial "should I migrate from x264 to SVT-AV1 yet?" question per source instead of per marketing deck.

New module tools/vmaf-tune/src/vmaftune/compare.py: predicate-driven orchestration + markdown / JSON / CSV renderers.
New CLI subcommand vmaf-tune compare --src REF.yuv --target-vmaf 92 --encoders libx264,libx265,libsvtav1,libaom,libvvenc --format markdown.
Pluggable predicate via --predicate-module MODULE:CALLABLE so the comparison ranking is exercised today; the real recommend backend lands with Phase B (target-VMAF bisect, ADR-0237).
13 mocked smoke tests cover ranking, parallel-vs-sequential parity, error capture, all three output formats, and the CLI smoke through --predicate-module. No ffmpeg / vmaf binaries required.

This PR ships only the orchestration layer — per the digest's effort note ("orchestration is trivial; effort lives in the per-codec adapters"). Default --encoders resolves to every adapter currently registered in codec_adapters/. Phase A wires libx264 only, so the canonical four / five codec invocation only ranks codecs whose adapter PRs have already merged. No new ADR — cites ADR-0237 (parent) + research-0061 Bucket #7.
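The ranking step described above can be sketched roughly as follows. This is a minimal illustration, not the code in `compare.py`: the `RecommendResult` stand-in copies the field names visible in the diff excerpt, `rank_by_smallest_file` is a hypothetical helper name, and it assumes that "smallest file" means lowest `bitrate_kbps` for a single fixed-length source. The numbers are placeholder values, not measurements.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RecommendResult:
    """Stand-in for the PR's result row; defaults for ok/error are assumed."""

    codec: str
    best_crf: int
    bitrate_kbps: float
    encode_time_ms: float
    vmaf_score: float
    ok: bool = True
    error: str = ""


def rank_by_smallest_file(results: list[RecommendResult]) -> list[RecommendResult]:
    """Sort successful rows by ascending bitrate; failed rows sink to the end."""

    def key(row: RecommendResult) -> tuple[bool, float, str]:
        failed = (not row.ok) or math.isnan(row.bitrate_kbps)
        # Never put NaN in the sort key: comparisons against NaN are always False.
        return (failed, 0.0 if failed else row.bitrate_kbps, row.codec)

    return sorted(results, key=key)


# Placeholder rows (illustrative values only).
nan = float("nan")
rows = [
    RecommendResult("libx264", 23, 4200.0, 8100.0, 92.1),
    RecommendResult("libx265", -1, nan, nan, nan, ok=False,
                    error="NotImplementedError: Phase B pending"),
    RecommendResult("libsvtav1", 35, 2900.0, 21500.0, 92.3),
]
ranked = rank_by_smallest_file(rows)
```

Keeping failed rows in the output, rather than dropping them, matches the error-capture behaviour the test list mentions; where they sort relative to successful rows is a choice made for this sketch.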

Type

feat — new user-discoverable surface (vmaf-tune compare subcommand).

Checklist

Commits follow Conventional Commits.
Touched files lint-clean (ruff, isort, black, semgrep, markdownlint on the file's added section).
All 26 tests in tools/vmaf-tune/tests/ pass.

Bug-status hygiene

no state delta: new feature, no bug interaction.

Netflix golden-data gate

Did not modify any assertAlmostEqual score.

Deep-dive deliverables (ADR-0108)

Research digest: no digest needed, covered by Research-0061 Bucket #7 (this PR is the implementation).
Decision matrix — no alternatives: predicate-driven orchestration is the only seam consistent with the codec-adapter discipline laid out in tools/vmaf-tune/AGENTS.md ("the codec-adapter contract is multi-codec from day one"). Branching on codec name inside compare.py was rejected on principle, not after weighing trade-offs.
AGENTS.md invariant note — tools/vmaf-tune/AGENTS.md updated with the predicate seam and COMPARE_ROW_KEYS contract.
Reproducer / smoke-test command — see below.
CHANGELOG fragment — changelog.d/added/T-VMAF-TUNE-compare-codecs.md.
Rebase note — docs/rebase-notes.md entry 0228.

Reproducer

# Unit tests (mocked predicate, no binaries required)
pytest tools/vmaf-tune/tests/test_compare.py -v

# CLI smoke against the placeholder predicate (every codec reports
# "Phase B pending" until the recommend backend lands).
PYTHONPATH=tools/vmaf-tune/src python -m vmaftune.cli compare \
    --src /tmp/ref.yuv --target-vmaf 92 --format markdown

# CLI smoke with a shim predicate (`--predicate-module`):
PYTHONPATH=tools/vmaf-tune/src python -m vmaftune.cli compare \
    --src /tmp/ref.yuv --target-vmaf 92 \
    --encoders libx264,libx265,libsvtav1,libaom \
    --format markdown \
    --predicate-module mypkg.shim:predicate

Out of scope

No real recommend backend — the per-codec target-VMAF bisect lands with Phase B (ADR-0237). Until then, the placeholder predicate reports "Phase B pending" for every codec and --predicate-module is the seam for downstream consumers and tests.
No new codec adapters — only libx264 is registered today. The four-codec / five-codec canonical invocation ranks the subset of codecs whose adapter PRs have already merged.
No encode-time normalisation across machines — the doc surfaces the caveat (cross-codec time comparisons require a fixed reference machine).

lusoris · 2026-05-05T13:39:06Z

Skipping for now — extensive cli.py conflicts after #355/#358/#369/#371 reshuffle. Needs rework on top of current master before re-promotion.

Copilot

Pull request overview

Adds a new vmaf-tune compare workflow to the fork’s tools/vmaf-tune/ harness, enabling multi-codec comparison at a target VMAF via a pluggable “recommend predicate” seam and emitting ranked results in multiple formats.

Changes:

Introduces vmaftune.compare orchestration (threaded execution + ranking) and report renderers (Markdown/JSON/CSV).
Extends vmaf-tune CLI with a compare subcommand, including --predicate-module to inject a recommend implementation.
Adds a mocked, subprocess-free test suite plus updates to docs / AGENTS invariants / changelog + rebase notes.
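As a rough picture of what the three renderers might do, the sketch below emits Markdown, CSV, and JSON from a list of row dicts. It is an assumption-laden illustration: the real contents of `COMPARE_ROW_KEYS` are not shown on this page, so the tuple here simply reuses the five columns named in the PR summary, and `to_markdown`, `to_csv`, and `to_json` are hypothetical names.

```python
import csv
import io
import json

# Assumed column order; the real COMPARE_ROW_KEYS may differ or carry more keys.
COMPARE_ROW_KEYS = ("codec", "best_crf", "bitrate_kbps", "encode_time_ms", "vmaf_score")


def to_markdown(rows):
    """Render rows as a pipe table with one header row and one rule row."""
    header = "| " + " | ".join(COMPARE_ROW_KEYS) + " |"
    rule = "| " + " | ".join("---" for _ in COMPARE_ROW_KEYS) + " |"
    body = ["| " + " | ".join(str(row[k]) for k in COMPARE_ROW_KEYS) + " |" for row in rows]
    return "\n".join([header, rule, *body])


def to_csv(rows):
    """Render rows as CSV with a header line, using '\\n' line endings."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COMPARE_ROW_KEYS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Render rows as a JSON array, keeping only the schema keys in schema order."""
    return json.dumps([{k: row[k] for k in COMPARE_ROW_KEYS} for row in rows], indent=2)


# Placeholder row (illustrative values only).
rows = [{"codec": "libx264", "best_crf": 23, "bitrate_kbps": 4200.0,
         "encode_time_ms": 8100.0, "vmaf_score": 92.1}]
```

Driving all three emitters from one key tuple is what makes an exported schema constant useful: a column added to the tuple shows up in every format at once.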

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tools/vmaf-tune/src/vmaftune/compare.py | New comparison orchestration layer, ranking logic, and emitters for markdown/json/csv. |
| tools/vmaf-tune/src/vmaftune/cli.py | Adds `compare` subcommand and predicate-module resolver + execution path. |
| tools/vmaf-tune/tests/test_compare.py | Adds mocked smoke tests for ranking, formatting, error capture, and CLI integration via injected predicate. |
| tools/vmaf-tune/AGENTS.md | Documents the compare predicate seam + `COMPARE_ROW_KEYS` contract. |
| docs/usage/vmaf-tune.md | Documents `vmaf-tune compare`, flags, and output schema. |
| changelog.d/added/T-VMAF-TUNE-compare-codecs.md | Adds a changelog fragment for the new subcommand. |
| CHANGELOG.md | Regenerated Unreleased section to include the new fragment. |
| docs/rebase-notes.md | Adds rebase note entry for the new feature and its invariants. |

+The predicate signature is::
+
+    predicate(codec: str, src: Path, target_vmaf: float) -> RecommendResult
+
+The shipped default predicate raises ``NotImplementedError``: the
+real recommend backend lands in Phase B (target-VMAF bisect, ADR-0237).
+Until then the CLI accepts a ``--predicate-module`` hook that points
+at any importable callable matching the signature above; this lets
+downstream consumers (and the test suite) drive ``compare`` against
+a shim today.
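A `MODULE:CALLABLE` string like the one `--predicate-module` accepts can be resolved with `importlib`. The sketch below is one plausible way to do it, not the resolver added to `cli.py`; `resolve_predicate` is a hypothetical name, and the demonstration targets the standard library so it runs without `vmaftune` installed.

```python
from __future__ import annotations

import importlib
from typing import Any, Callable


def resolve_predicate(spec: str) -> Callable[..., Any]:
    """Resolve a 'MODULE:CALLABLE' string to the callable it names."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected MODULE:CALLABLE, got {spec!r}")
    module = importlib.import_module(module_name)
    try:
        target = getattr(module, attr)
    except AttributeError as exc:
        raise ValueError(f"module {module_name!r} has no attribute {attr!r}") from exc
    if not callable(target):
        raise TypeError(f"{spec!r} does not name a callable")
    return target


# Demonstration against the standard library rather than a real predicate.
sqrt = resolve_predicate("math:sqrt")
```

Validating the spec before importing gives a clear message for a malformed flag value instead of an opaque import error.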


+    results: list[RecommendResult] = []
+    if parallel and len(encoders) > 1:
+        workers = max_workers if max_workers is not None else len(encoders)
+        with ThreadPoolExecutor(max_workers=workers) as pool:
+            futures = {pool.submit(pred, codec, src_path, target_vmaf): codec for codec in encoders}
+            for fut in as_completed(futures):
+                codec = futures[fut]
+                try:
+                    results.append(fut.result())
+                except Exception as exc:  # noqa: BLE001 — surface the error verbatim
+                    results.append(
+                        RecommendResult(
+                            codec=codec,
+                            best_crf=-1,
+                            bitrate_kbps=float("nan"),
+                            encode_time_ms=float("nan"),
+                            vmaf_score=float("nan"),
+                            ok=False,
+                            error=f"{type(exc).__name__}: {exc}",
+                        )
+                    )
+    else:
+        for codec in encoders:
+            try:
+                results.append(pred(codec, src_path, target_vmaf))
+            except Exception as exc:  # noqa: BLE001
+                results.append(
+                    RecommendResult(
+                        codec=codec,
+                        best_crf=-1,
+                        bitrate_kbps=float("nan"),
+                        encode_time_ms=float("nan"),
+                        vmaf_score=float("nan"),
+                        ok=False,
+                        error=f"{type(exc).__name__}: {exc}",
+                    )
+                )
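The two branches above build the same failure row twice. One possible way to fold that into a single helper is sketched below; it is an alternative shape, not what the PR ships. `_safe_call` and `run_all` are hypothetical names, `RecommendResult` is a stand-in with the fields from the excerpt, and `pool.map` is swapped in for `as_completed` so that results come back in input order.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class RecommendResult:
    """Stand-in for the PR's result row; defaults for ok/error are assumed."""

    codec: str
    best_crf: int
    bitrate_kbps: float
    encode_time_ms: float
    vmaf_score: float
    ok: bool = True
    error: str = ""


Predicate = Callable[[str, Path, float], RecommendResult]


def _safe_call(pred: Predicate, codec: str, src: Path, target_vmaf: float) -> RecommendResult:
    """Run one predicate and convert any exception into a failed row."""
    try:
        return pred(codec, src, target_vmaf)
    except Exception as exc:  # deliberate catch-all: one codec must not abort the rest
        nan = float("nan")
        return RecommendResult(codec, -1, nan, nan, nan, ok=False,
                               error=f"{type(exc).__name__}: {exc}")


def run_all(pred: Predicate, encoders: list[str], src: Path,
            target_vmaf: float, parallel: bool = True) -> list[RecommendResult]:
    """Evaluate every encoder, in a thread pool when more than one is requested."""
    if parallel and len(encoders) > 1:
        with ThreadPoolExecutor(max_workers=len(encoders)) as pool:
            # map() yields results in input order, unlike as_completed().
            return list(pool.map(lambda c: _safe_call(pred, c, src, target_vmaf), encoders))
    return [_safe_call(pred, c, src, target_vmaf) for c in encoders]


def fake(codec: str, src: Path, target_vmaf: float) -> RecommendResult:
    """Toy predicate: one codec fails, the rest return placeholder values."""
    if codec == "libx265":
        raise NotImplementedError("Phase B pending")
    return RecommendResult(codec, 23, 1000.0, 1.0, target_vmaf)


par = run_all(fake, ["libx264", "libx265"], Path("/tmp/ref.yuv"), 92.0, parallel=True)
seq = run_all(fake, ["libx264", "libx265"], Path("/tmp/ref.yuv"), 92.0, parallel=False)
```

Because failed rows carry NaN fields, comparing two result lists with `==` is unreliable (NaN never equals NaN); a parity check is safer on `(codec, ok, error)` tuples.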


+        return _fake_predicate(codec, src, target_vmaf)
+
+    shim.predicate = predicate  # type: ignore[attr-defined]
+    sys.modules["_compare_shim"] = shim
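The test excerpt above registers an in-memory module so that a `MODULE:CALLABLE` spec can be resolved without a file on disk. A self-contained version of that trick might look like the sketch below; `install_shim`, `fake_predicate`, and the module name `_compare_shim_demo` are made up for illustration, and the returned dict is a placeholder rather than a real `RecommendResult`.

```python
import importlib
import sys
import types


def install_shim(name, predicate):
    """Register an in-memory module that exposes `predicate` under `name`."""
    shim = types.ModuleType(name)
    shim.predicate = predicate
    sys.modules[name] = shim
    return shim


def fake_predicate(codec, src, target_vmaf):
    """Placeholder predicate: echoes its inputs instead of encoding anything."""
    return {"codec": codec, "target_vmaf": target_vmaf}


install_shim("_compare_shim_demo", fake_predicate)
try:
    # import_module() consults sys.modules first, so no file lookup happens.
    module = importlib.import_module("_compare_shim_demo")
    result = module.predicate("libx264", "/tmp/ref.yuv", 92.0)
finally:
    # Remove the entry so the fake module cannot leak into other tests.
    sys.modules.pop("_compare_shim_demo", None)
```

Under pytest, `monkeypatch.setitem(sys.modules, name, shim)` gives the same cleanup automatically at the end of the test.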


+  tests under `tools/vmaf-tune/tests/test_compare.py` (no `ffmpeg`,
+  no built `vmaf` required). Schema exported as
+  `vmaftune.compare.COMPARE_ROW_KEYS`. User docs:
+  [`docs/usage/vmaf-tune.md`](../docs/usage/vmaf-tune.md) §"Codec
+  comparison".


Adds the `compare` subcommand to `vmaf-tune` (research-0061 Bucket #7, ADR-0237 Phase A follow-up). Given a single source and a target VMAF, runs each codec's recommend predicate in a thread pool and emits a ranked (codec, best_crf, bitrate_kbps, encode_time_ms, vmaf_score) table sorted by smallest file. Supports markdown / JSON / CSV output and `--output PATH`.

The orchestration is pluggable via `--predicate-module MODULE:CALLABLE` so the comparison ranking is exercised today (the real recommend backend lands with Phase B's target-VMAF bisect).

Default `--encoders` resolves to every adapter currently registered in `codec_adapters/`. Phase A wires libx264 only, so the canonical four / five codec invocation only ranks codecs whose adapters have already merged.

13 mocked smoke tests under `tools/vmaf-tune/tests/test_compare.py` cover ranking, parallel-vs-sequential parity, error capture, all three output formats, and the CLI smoke through `--predicate-module`.

Six ADR-0108 deliverables:

- Research digest: existing research-0061 Bucket #7 (no new digest; this PR is the implementation).
- Decision matrix: no alternatives; predicate-driven orchestration is the only seam consistent with the codec-adapter discipline in tools/vmaf-tune/AGENTS.md.
- AGENTS.md invariant note: tools/vmaf-tune/AGENTS.md updated.
- Reproducer: pytest tools/vmaf-tune/tests/test_compare.py + the CLI smoke documented in docs/rebase-notes.md entry 0228.
- CHANGELOG fragment: changelog.d/added/T-VMAF-TUNE-compare-codecs.md.
- Rebase notes: docs/rebase-notes.md entry 0228.

User docs: docs/usage/vmaf-tune.md §"Codec comparison".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lusoris marked this pull request as ready for review May 5, 2026 13:38

Copilot AI review requested due to automatic review settings May 5, 2026 13:38

lusoris marked this pull request as draft May 5, 2026 13:39

Copilot started reviewing on behalf of lusoris May 5, 2026 13:44

Copilot AI reviewed May 5, 2026

lusoris marked this pull request as ready for review May 5, 2026 13:56

lusoris force-pushed the feat/vmaf-tune-compare-codecs branch 2 times, most recently from a5bea68 to 74a7e32 May 5, 2026 13:58

lusoris force-pushed the feat/vmaf-tune-compare-codecs branch from 74a7e32 to 8a486d2 May 5, 2026 13:59

Lusoris added 2 commits May 5, 2026 16:07

chore: re-trigger CI after research-digest opt-out

2670f64

chore: re-trigger CI after research-digest opt-out

5fb1e6e

lusoris merged commit 8f41e08 into master May 5, 2026
54 checks passed

lusoris deleted the feat/vmaf-tune-compare-codecs branch May 5, 2026 14:32

lusoris restored the feat/vmaf-tune-compare-codecs branch May 6, 2026 17:42

This was referenced May 6, 2026

feat/vmaf tune hdr aware #434

Merged

feat/vmaf tune score backend vulkan #436

Merged

lusoris merged 3 commits into master from feat/vmaf-tune-compare-codecs