feat(tools): vmaf-tune compare — multi-codec ranked output#377
Merged
Pull request overview
Adds a new vmaf-tune compare workflow to the fork’s tools/vmaf-tune/ harness, enabling multi-codec comparison at a target VMAF via a pluggable “recommend predicate” seam and emitting ranked results in multiple formats.
Changes:
- Introduces `vmaftune.compare` orchestration (threaded execution + ranking) and report renderers (Markdown/JSON/CSV).
- Extends the `vmaf-tune` CLI with a `compare` subcommand, including `--predicate-module` to inject a recommend implementation.
- Adds a mocked, subprocess-free test suite plus updates to docs / AGENTS invariants / changelog + rebase notes.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/vmaf-tune/src/vmaftune/compare.py | New comparison orchestration layer, ranking logic, and emitters for markdown/json/csv. |
| tools/vmaf-tune/src/vmaftune/cli.py | Adds compare subcommand and predicate-module resolver + execution path. |
| tools/vmaf-tune/tests/test_compare.py | Adds mocked smoke tests for ranking, formatting, error capture, and CLI integration via injected predicate. |
| tools/vmaf-tune/AGENTS.md | Documents the compare predicate seam + COMPARE_ROW_KEYS contract. |
| docs/usage/vmaf-tune.md | Documents vmaf-tune compare, flags, and output schema. |
| changelog.d/added/T-VMAF-TUNE-compare-codecs.md | Adds a changelog fragment for the new subcommand. |
| CHANGELOG.md | Regenerated Unreleased section to include the new fragment. |
| docs/rebase-notes.md | Adds rebase note entry for the new feature and its invariants. |
Comment on lines +16 to +25

    The predicate signature is::

        predicate(codec: str, src: Path, target_vmaf: float) -> RecommendResult

    The shipped default predicate raises ``NotImplementedError``: the
    real recommend backend lands in Phase B (target-VMAF bisect, ADR-0237).
    Until then the CLI accepts a ``--predicate-module`` hook that points
    at any importable callable matching the signature above; this lets
    downstream consumers (and the test suite) drive ``compare`` against
    a shim today.
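
As a rough illustration of the signature described above, a shim that `--predicate-module` could point at might look like the sketch below. `RecommendResult` here is a stand-in dataclass whose fields are inferred from the constructor calls in this PR's diff; the real class lives in `vmaftune.compare` and may differ. The `_CANNED` table and its numbers are invented for the example.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RecommendResult:
    """Stand-in for vmaftune.compare.RecommendResult (fields assumed)."""

    codec: str
    best_crf: int
    bitrate_kbps: float
    encode_time_ms: float
    vmaf_score: float
    ok: bool = True
    error: str = ""


# Hypothetical canned numbers: (crf, bitrate_kbps, encode_time_ms).
# A real predicate would run encodes and score them instead.
_CANNED = {
    "libx264": (23, 4200.0, 1800.0),
    "libx265": (26, 3100.0, 5200.0),
}


def predicate(codec: str, src: Path, target_vmaf: float) -> RecommendResult:
    """Match the documented signature and return a canned result."""
    if codec not in _CANNED:
        raise ValueError(f"no canned result for {codec}")
    crf, kbps, ms = _CANNED[codec]
    return RecommendResult(codec, crf, kbps, ms, vmaf_score=target_vmaf)
```

Raising for an unknown codec, rather than returning a placeholder, leaves error capture to the orchestration layer, which is where the diff below handles it.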
Comment on lines +173 to +209

    results: list[RecommendResult] = []
    if parallel and len(encoders) > 1:
        workers = max_workers if max_workers is not None else len(encoders)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {pool.submit(pred, codec, src_path, target_vmaf): codec for codec in encoders}
            for fut in as_completed(futures):
                codec = futures[fut]
                try:
                    results.append(fut.result())
                except Exception as exc:  # noqa: BLE001 - surface the error verbatim
                    results.append(
                        RecommendResult(
                            codec=codec,
                            best_crf=-1,
                            bitrate_kbps=float("nan"),
                            encode_time_ms=float("nan"),
                            vmaf_score=float("nan"),
                            ok=False,
                            error=f"{type(exc).__name__}: {exc}",
                        )
                    )
    else:
        for codec in encoders:
            try:
                results.append(pred(codec, src_path, target_vmaf))
            except Exception as exc:  # noqa: BLE001
                results.append(
                    RecommendResult(
                        codec=codec,
                        best_crf=-1,
                        bitrate_kbps=float("nan"),
                        encode_time_ms=float("nan"),
                        vmaf_score=float("nan"),
                        ok=False,
                        error=f"{type(exc).__name__}: {exc}",
                    )
                )

        return _fake_predicate(codec, src, target_vmaf)

    shim.predicate = predicate  # type: ignore[attr-defined]
    sys.modules["_compare_shim"] = shim
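
The test lines above register an in-memory module so that it can be imported by name. A minimal sketch of that pattern, paired with a resolver for a `MODULE:CALLABLE` spec, follows. `resolve_predicate` and the `_demo_shim` module name are hypothetical and written only for illustration; the PR's actual resolver in `cli.py` is not shown on this page and may behave differently.

```python
import importlib
import sys
import types


def resolve_predicate(spec: str):
    """Resolve 'package.module:callable' to the callable (hypothetical helper)."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected MODULE:CALLABLE, got {spec!r}")
    # import_module consults sys.modules first, so a pre-registered
    # in-memory module is returned without touching the filesystem.
    module = importlib.import_module(module_name)
    fn = getattr(module, attr)
    if not callable(fn):
        raise TypeError(f"{spec!r} does not name a callable")
    return fn


# Register a throwaway module, as a test might, then resolve it by name.
shim = types.ModuleType("_demo_shim")
shim.predicate = lambda codec, src, target_vmaf: (codec, target_vmaf)
sys.modules["_demo_shim"] = shim

pred = resolve_predicate("_demo_shim:predicate")
```

Because the lookup goes through the normal import machinery, the same spec string works for a real installed module and for a test shim.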
Comment on lines +3162 to +3166

    tests under `tools/vmaf-tune/tests/test_compare.py` (no `ffmpeg`,
    no built `vmaf` required). Schema exported as
    `vmaftune.compare.COMPARE_ROW_KEYS`. User docs:
    [`docs/usage/vmaf-tune.md`](../docs/usage/vmaf-tune.md) §"Codec
    comparison".
a5bea68 to 74a7e32
Adds the `compare` subcommand to `vmaf-tune` (research-0061 Bucket #7, ADR-0237 Phase A follow-up). Given a single source and a target VMAF, runs each codec's recommend predicate in a thread pool and emits a ranked (codec, best_crf, bitrate_kbps, encode_time_ms, vmaf_score) table sorted by smallest file. Supports markdown / JSON / CSV output and `--output PATH`.

The orchestration is pluggable via `--predicate-module MODULE:CALLABLE` so the comparison ranking is exercised today (the real recommend backend lands with Phase B's target-VMAF bisect). Default `--encoders` resolves to every adapter currently registered in `codec_adapters/`. Phase A wires libx264 only, so the canonical four / five codec invocation only ranks codecs whose adapters have already merged.

13 mocked smoke tests under `tools/vmaf-tune/tests/test_compare.py` cover ranking, parallel-vs-sequential parity, error capture, all three output formats, and the CLI smoke through `--predicate-module`.

Six ADR-0108 deliverables:
- Research digest: existing research-0061 Bucket #7 (no new digest; this PR is the implementation).
- Decision matrix: no alternatives; predicate-driven orchestration is the only seam consistent with the codec-adapter discipline in tools/vmaf-tune/AGENTS.md.
- AGENTS.md invariant note: tools/vmaf-tune/AGENTS.md updated.
- Reproducer: pytest tools/vmaf-tune/tests/test_compare.py + the CLI smoke documented in docs/rebase-notes.md entry 0228.
- CHANGELOG fragment: changelog.d/added/T-VMAF-TUNE-compare-codecs.md.
- Rebase notes: docs/rebase-notes.md entry 0228.

User docs: docs/usage/vmaf-tune.md §"Codec comparison".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
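
To make the three output formats concrete, here is a small sketch that renders the same ranked rows as Markdown, JSON, and CSV using only the standard library. The `ROW_KEYS` tuple mirrors the five columns named in the commit message; whether it matches the real `COMPARE_ROW_KEYS` is an assumption, the function names are hypothetical, and the sample rows are made up.

```python
import csv
import io
import json

# Assumed column order, taken from the ranked tuple in the commit message.
ROW_KEYS = ("codec", "best_crf", "bitrate_kbps", "encode_time_ms", "vmaf_score")


def to_markdown(rows):
    """Render rows as a pipe table with one header and one rule line."""
    header = "| " + " | ".join(ROW_KEYS) + " |"
    rule = "|" + "---|" * len(ROW_KEYS)
    body = ["| " + " | ".join(str(r[k]) for k in ROW_KEYS) + " |" for r in rows]
    return "\n".join([header, rule, *body])


def to_json(rows):
    """Render rows as a JSON array, restricted to the shared keys."""
    return json.dumps([{k: r[k] for k in ROW_KEYS} for r in rows], indent=2)


def to_csv(rows):
    """Render rows as CSV with a header line, ignoring any extra keys."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=ROW_KEYS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [  # illustrative values only, already in ranked order
    {"codec": "libsvtav1", "best_crf": 34, "bitrate_kbps": 2900.0,
     "encode_time_ms": 9100.0, "vmaf_score": 92.1},
    {"codec": "libx264", "best_crf": 23, "bitrate_kbps": 4200.0,
     "encode_time_ms": 1800.0, "vmaf_score": 92.0},
]
```

Driving all three renderers from one key tuple keeps the column order identical across formats, which appears to be the role the exported schema plays in the PR.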
74a7e32 to 8a486d2
This was referenced May 6, 2026
Summary
Implements Bucket #7 of the `vmaf-tune` capability audit (PR #354 / research-0061): codec-comparison mode. Given a single source and a target VMAF, `vmaf-tune compare` runs each codec's recommend predicate in a thread pool and emits a ranked `(codec, best_crf, bitrate_kbps, encode_time_ms, vmaf_score)` table sorted by smallest file, answering the perennial "should I migrate from x264 to SVT-AV1 yet?" question per source instead of per marketing deck.

- `tools/vmaf-tune/src/vmaftune/compare.py`: predicate-driven orchestration + markdown / JSON / CSV renderers.
- `vmaf-tune compare --src REF.yuv --target-vmaf 92 --encoders libx264,libx265,libsvtav1,libaom,libvvenc --format markdown`.
- Pluggable via `--predicate-module MODULE:CALLABLE` so the comparison ranking is exercised today; the real recommend backend lands with Phase B (target-VMAF bisect, ADR-0237).
- 13 mocked smoke tests under `tools/vmaf-tune/tests/test_compare.py` cover ranking, parallel-vs-sequential parity, error capture, all three output formats, and the CLI smoke through `--predicate-module`. No `ffmpeg` / `vmaf` binaries required.

This PR ships only the orchestration layer, per the digest's effort note ("orchestration is trivial; effort lives in the per-codec adapters"). Default `--encoders` resolves to every adapter currently registered in `codec_adapters/`. Phase A wires `libx264` only, so the canonical four / five codec invocation only ranks codecs whose adapter PRs have already merged. No new ADR: cites ADR-0237 (parent) + research-0061 Bucket #7.

Type
`feat`: new user-discoverable surface (`vmaf-tune compare` subcommand).

Checklist
- `tools/vmaf-tune/tests/` pass.

Bug-status hygiene

Netflix golden-data gate
- `assertAlmostEqual` score.

Deep-dive deliverables (ADR-0108)
- Decision matrix: no alternatives; predicate-driven orchestration is the only seam consistent with the codec-adapter discipline in `tools/vmaf-tune/AGENTS.md` ("the codec-adapter contract is multi-codec from day one"). Branching on codec name inside `compare.py` was rejected on principle, not after weighing trade-offs.
- `AGENTS.md` invariant note: `tools/vmaf-tune/AGENTS.md` updated with the predicate seam and `COMPARE_ROW_KEYS` contract.
- CHANGELOG fragment: `changelog.d/added/T-VMAF-TUNE-compare-codecs.md`.
- Rebase notes: `docs/rebase-notes.md` entry 0228.

Reproducer

Out of scope
- `--predicate-module` is the seam for downstream consumers and tests.
- `libx264` is registered today. The four-codec / five-codec canonical invocation ranks the subset of codecs whose adapter PRs have already merged.