
feat(tools): vmaf-tune compare — multi-codec ranked output #377

Merged
lusoris merged 3 commits into master from feat/vmaf-tune-compare-codecs
May 5, 2026

Conversation

lusoris (Owner) commented May 3, 2026

Summary

Implements Bucket #7 of the vmaf-tune capability audit (PR #354 / research-0061): codec-comparison mode. Given a single source and a target VMAF, vmaf-tune compare runs each codec's recommend predicate in a thread pool and emits a ranked (codec, best_crf, bitrate_kbps, encode_time_ms, vmaf_score) table sorted by smallest file — answering the perennial "should I migrate from x264 to SVT-AV1 yet?" question per source instead of per marketing deck.

  • New module tools/vmaf-tune/src/vmaftune/compare.py: predicate-driven orchestration + markdown / JSON / CSV renderers.
  • New CLI subcommand vmaf-tune compare --src REF.yuv --target-vmaf 92 --encoders libx264,libx265,libsvtav1,libaom,libvvenc --format markdown.
  • Pluggable predicate via --predicate-module MODULE:CALLABLE so the comparison ranking is exercised today; the real recommend backend lands with Phase B (target-VMAF bisect, ADR-0237).
  • 13 mocked smoke tests cover ranking, parallel-vs-sequential parity, error capture, all three output formats, and the CLI smoke through --predicate-module. No ffmpeg / vmaf binaries required.

This PR ships only the orchestration layer — per the digest's effort note ("orchestration is trivial; effort lives in the per-codec adapters"). Default --encoders resolves to every adapter currently registered in codec_adapters/. Phase A wires libx264 only, so the canonical four / five codec invocation only ranks codecs whose adapter PRs have already merged. No new ADR — cites ADR-0237 (parent) + research-0061 Bucket #7.
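
The ranking rule described above ("sorted by smallest file") can be sketched as follows. This is a minimal illustration, not the code in compare.py: the `RecommendResult` stand-in copies the field names from the snippet quoted in the review below, the `ok` / `error` defaults and the `rank` helper name are assumptions, and bitrate is used as the file-size proxy because every codec encodes the same source.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RecommendResult:
    """Stand-in row type; field names follow the snippet quoted in this PR."""

    codec: str
    best_crf: int
    bitrate_kbps: float
    encode_time_ms: float
    vmaf_score: float
    ok: bool = True  # default is an assumption
    error: str = ""  # default is an assumption


def rank(results: list[RecommendResult]) -> list[RecommendResult]:
    """Order successful rows by ascending bitrate; push failed rows to the end."""

    def sort_key(row: RecommendResult) -> tuple[bool, float, str]:
        failed = (not row.ok) or math.isnan(row.bitrate_kbps)
        # NaN has no usable ordering, so failed rows sort on +inf instead.
        return (failed, math.inf if failed else row.bitrate_kbps, row.codec)

    return sorted(results, key=sort_key)
```

Because the key does not depend on input order, rows collected in completion order from a thread pool and rows collected sequentially rank identically under this sketch.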

Type

  • feat — new user-discoverable surface (vmaf-tune compare subcommand).

Checklist

  • Commits follow Conventional Commits.
  • Touched files lint-clean (ruff, isort, black, semgrep, markdownlint on the file's added section).
  • All 26 tests in tools/vmaf-tune/tests/ pass.

Bug-status hygiene

  • no state delta: new feature, no bug interaction.

Netflix golden-data gate

  • Did not modify any assertAlmostEqual score.

Deep-dive deliverables (ADR-0108)

  • Research digest — no digest needed: covered by research-0061 Bucket #7 (this PR is the implementation).
  • Decision matrix — no alternatives: predicate-driven orchestration is the only seam consistent with the codec-adapter discipline laid out in tools/vmaf-tune/AGENTS.md ("the codec-adapter contract is multi-codec from day one"). Branching on codec name inside compare.py was rejected on principle, not after weighing trade-offs.
  • AGENTS.md invariant note: tools/vmaf-tune/AGENTS.md updated with the predicate seam and COMPARE_ROW_KEYS contract.
  • Reproducer / smoke-test command — see below.
  • CHANGELOG fragment: changelog.d/added/T-VMAF-TUNE-compare-codecs.md.
  • Rebase note: docs/rebase-notes.md entry 0228.

Reproducer

# Unit tests (mocked predicate, no binaries required)
pytest tools/vmaf-tune/tests/test_compare.py -v

# CLI smoke against the placeholder predicate (every codec reports
# "Phase B pending" until the recommend backend lands).
PYTHONPATH=tools/vmaf-tune/src python -m vmaftune.cli compare \
    --src /tmp/ref.yuv --target-vmaf 92 --format markdown

# CLI smoke with a shim predicate (`--predicate-module`):
PYTHONPATH=tools/vmaf-tune/src python -m vmaftune.cli compare \
    --src /tmp/ref.yuv --target-vmaf 92 \
    --encoders libx264,libx265,libsvtav1,libaom \
    --format markdown \
    --predicate-module mypkg.shim:predicate
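
For the last command, a shim module might look like the sketch below. Everything here is hypothetical: `mypkg/shim.py` is only the placeholder name used in the command, the canned numbers are made up for illustration, and the import of `RecommendResult` from `vmaftune.compare` is an assumption (a stand-in dataclass is used when that import is unavailable).

```python
"""Hypothetical mypkg/shim.py, loaded via --predicate-module mypkg.shim:predicate."""
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

try:
    # Assumption: the row type is importable from the new compare module.
    from vmaftune.compare import RecommendResult
except ImportError:

    @dataclass(frozen=True)
    class RecommendResult:  # stand-in with the field names quoted in this PR
        codec: str
        best_crf: int
        bitrate_kbps: float
        encode_time_ms: float
        vmaf_score: float
        ok: bool = True
        error: str = ""


# Made-up values: codec -> (best_crf, bitrate_kbps, encode_time_ms).
_CANNED = {
    "libx264": (23, 2400.0, 4000.0),
    "libx265": (26, 1700.0, 11000.0),
}


def predicate(codec: str, src: Path, target_vmaf: float) -> RecommendResult:
    """Return a canned row; raise for codecs the shim does not know."""
    if codec not in _CANNED:
        raise ValueError(f"no canned result for {codec}")
    crf, kbps, ms = _CANNED[codec]
    return RecommendResult(
        codec=codec,
        best_crf=crf,
        bitrate_kbps=kbps,
        encode_time_ms=ms,
        vmaf_score=target_vmaf,  # pretend the target was hit exactly
        ok=True,
        error="",
    )
```

With this shim, the `libsvtav1` and `libaom` entries in the command above would raise, which is one way to exercise the error-capture rows.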

Out of scope

  • No real recommend backend — the per-codec target-VMAF bisect lands with Phase B (ADR-0237). Until then, the placeholder predicate reports "Phase B pending" for every codec and --predicate-module is the seam for downstream consumers and tests.
  • No new codec adapters — only libx264 is registered today. The four-codec / five-codec canonical invocation ranks the subset of codecs whose adapter PRs have already merged.
  • No encode-time normalisation across machines — the doc surfaces the caveat (cross-codec time comparisons require a fixed reference machine).

@lusoris lusoris marked this pull request as ready for review May 5, 2026 13:38
Copilot AI review requested due to automatic review settings May 5, 2026 13:38
@lusoris lusoris marked this pull request as draft May 5, 2026 13:39
lusoris (Owner, Author) commented May 5, 2026

Skipping for now — extensive cli.py conflicts after #355/#358/#369/#371 reshuffle. Needs rework on top of current master before re-promotion.

Copilot AI left a comment

Pull request overview

Adds a new vmaf-tune compare workflow to the fork’s tools/vmaf-tune/ harness, enabling multi-codec comparison at a target VMAF via a pluggable “recommend predicate” seam and emitting ranked results in multiple formats.

Changes:

  • Introduces vmaftune.compare orchestration (threaded execution + ranking) and report renderers (Markdown/JSON/CSV).
  • Extends vmaf-tune CLI with a compare subcommand, including --predicate-module to inject a recommend implementation.
  • Adds a mocked, subprocess-free test suite plus updates to docs / AGENTS invariants / changelog + rebase notes.
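
A `MODULE:CALLABLE` spec such as the one `--predicate-module` takes can be resolved with the standard library alone. The sketch below is an assumption about how such a resolver could work, not the code added to cli.py; `resolve_predicate` is a hypothetical name.

```python
import importlib
from typing import Any, Callable


def resolve_predicate(spec: str) -> Callable[..., Any]:
    """Turn 'package.module:callable_name' into the callable it names."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected MODULE:CALLABLE, got {spec!r}")
    module = importlib.import_module(module_name)  # ImportError if missing
    obj = getattr(module, attr)  # AttributeError if the name is absent
    if not callable(obj):
        raise TypeError(f"{spec!r} does not name a callable")
    return obj
```

Since `importlib.import_module` consults `sys.modules` first, a test can register an in-memory module there and pass its name in the spec without touching the filesystem.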

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tools/vmaf-tune/src/vmaftune/compare.py | New comparison orchestration layer, ranking logic, and emitters for markdown/json/csv. |
| tools/vmaf-tune/src/vmaftune/cli.py | Adds compare subcommand and predicate-module resolver + execution path. |
| tools/vmaf-tune/tests/test_compare.py | Adds mocked smoke tests for ranking, formatting, error capture, and CLI integration via injected predicate. |
| tools/vmaf-tune/AGENTS.md | Documents the compare predicate seam + COMPARE_ROW_KEYS contract. |
| docs/usage/vmaf-tune.md | Documents vmaf-tune compare, flags, and output schema. |
| changelog.d/added/T-VMAF-TUNE-compare-codecs.md | Adds a changelog fragment for the new subcommand. |
| CHANGELOG.md | Regenerated Unreleased section to include the new fragment. |
| docs/rebase-notes.md | Adds rebase note entry for the new feature and its invariants. |

Comment on lines +16 to +25
The predicate signature is::

    predicate(codec: str, src: Path, target_vmaf: float) -> RecommendResult

The shipped default predicate raises ``NotImplementedError``: the
real recommend backend lands in Phase B (target-VMAF bisect, ADR-0237).
Until then the CLI accepts a ``--predicate-module`` hook that points
at any importable callable matching the signature above; this lets
downstream consumers (and the test suite) drive ``compare`` against
a shim today.
Comment on lines +173 to +209
results: list[RecommendResult] = []
if parallel and len(encoders) > 1:
    workers = max_workers if max_workers is not None else len(encoders)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(pred, codec, src_path, target_vmaf): codec for codec in encoders}
        for fut in as_completed(futures):
            codec = futures[fut]
            try:
                results.append(fut.result())
            except Exception as exc:  # noqa: BLE001 — surface the error verbatim
                results.append(
                    RecommendResult(
                        codec=codec,
                        best_crf=-1,
                        bitrate_kbps=float("nan"),
                        encode_time_ms=float("nan"),
                        vmaf_score=float("nan"),
                        ok=False,
                        error=f"{type(exc).__name__}: {exc}",
                    )
                )
else:
    for codec in encoders:
        try:
            results.append(pred(codec, src_path, target_vmaf))
        except Exception as exc:  # noqa: BLE001
            results.append(
                RecommendResult(
                    codec=codec,
                    best_crf=-1,
                    bitrate_kbps=float("nan"),
                    encode_time_ms=float("nan"),
                    vmaf_score=float("nan"),
                    ok=False,
                    error=f"{type(exc).__name__}: {exc}",
                )
            )
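
Both `except` branches in the snippet above build the same failure row. As a self-contained illustration of that error-capture pattern, the sketch below factors the row into a helper and keeps the two execution paths. `RecommendResult` is a stand-in with the fields shown in the snippet; `_failed_row` and `run_all` are hypothetical names, not functions from this PR.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class RecommendResult:  # stand-in mirroring the fields used in the snippet
    codec: str
    best_crf: int
    bitrate_kbps: float
    encode_time_ms: float
    vmaf_score: float
    ok: bool = True
    error: str = ""


def _failed_row(codec: str, exc: Exception) -> RecommendResult:
    """Build the sentinel row recorded when a predicate raises."""
    nan = float("nan")
    return RecommendResult(codec, -1, nan, nan, nan, ok=False,
                           error=f"{type(exc).__name__}: {exc}")


def run_all(pred: Callable[[str, Path, float], RecommendResult],
            encoders: list[str], src: Path, target_vmaf: float,
            parallel: bool = True, max_workers: int | None = None,
            ) -> list[RecommendResult]:
    """Call pred once per codec; exceptions become failure rows, never propagate."""
    results: list[RecommendResult] = []
    if parallel and len(encoders) > 1:
        workers = max_workers if max_workers is not None else len(encoders)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {pool.submit(pred, c, src, target_vmaf): c for c in encoders}
            for fut in as_completed(futures):  # completion order, not input order
                try:
                    results.append(fut.result())
                except Exception as exc:  # noqa: BLE001
                    results.append(_failed_row(futures[fut], exc))
    else:
        for codec in encoders:
            try:
                results.append(pred(codec, src, target_vmaf))
            except Exception as exc:  # noqa: BLE001
                results.append(_failed_row(codec, exc))
    return results
```

The parallel path returns rows in completion order, so any parity check between the two paths has to compare them after sorting or ranking. Failure rows also carry NaN, which never equals itself, so comparing on `codec`, `ok`, and `error` is safer than whole-row equality.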
return _fake_predicate(codec, src, target_vmaf)

shim.predicate = predicate # type: ignore[attr-defined]
sys.modules["_compare_shim"] = shim
Comment thread CHANGELOG.md
Comment on lines +3162 to +3166
tests under `tools/vmaf-tune/tests/test_compare.py` (no `ffmpeg`,
no built `vmaf` required). Schema exported as
`vmaftune.compare.COMPARE_ROW_KEYS`. User docs:
[`docs/usage/vmaf-tune.md`](../docs/usage/vmaf-tune.md) §"Codec
comparison".
@lusoris lusoris marked this pull request as ready for review May 5, 2026 13:56
@lusoris lusoris force-pushed the feat/vmaf-tune-compare-codecs branch 2 times, most recently from a5bea68 to 74a7e32 Compare May 5, 2026 13:58
Adds the `compare` subcommand to `vmaf-tune` (research-0061 Bucket #7,
ADR-0237 Phase A follow-up). Given a single source and a target VMAF,
runs each codec's recommend predicate in a thread pool and emits a
ranked (codec, best_crf, bitrate_kbps, encode_time_ms, vmaf_score)
table sorted by smallest file. Supports markdown / JSON / CSV output
and `--output PATH`.

The orchestration is pluggable via `--predicate-module MODULE:CALLABLE`
so the comparison ranking is exercised today (the real recommend
backend lands with Phase B's target-VMAF bisect). Default `--encoders`
resolves to every adapter currently registered in `codec_adapters/` —
Phase A wires libx264 only, so the canonical four / five codec
invocation only ranks codecs whose adapters have already merged.

13 mocked smoke tests under `tools/vmaf-tune/tests/test_compare.py`
cover ranking, parallel-vs-sequential parity, error capture, all three
output formats, and the CLI smoke through `--predicate-module`.

Six ADR-0108 deliverables:
- Research digest: existing research-0061 Bucket #7 (no new digest;
  this PR is the implementation).
- Decision matrix: no alternatives — predicate-driven orchestration is
  the only seam consistent with the codec-adapter discipline in
  tools/vmaf-tune/AGENTS.md.
- AGENTS.md invariant note: tools/vmaf-tune/AGENTS.md updated.
- Reproducer: pytest tools/vmaf-tune/tests/test_compare.py + the CLI
  smoke documented in docs/rebase-notes.md entry 0228.
- CHANGELOG fragment: changelog.d/added/T-VMAF-TUNE-compare-codecs.md.
- Rebase notes: docs/rebase-notes.md entry 0228.

User docs: docs/usage/vmaf-tune.md §"Codec comparison".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lusoris lusoris force-pushed the feat/vmaf-tune-compare-codecs branch from 74a7e32 to 8a486d2 Compare May 5, 2026 13:59
@lusoris lusoris merged commit 8f41e08 into master May 5, 2026
54 checks passed
@lusoris lusoris deleted the feat/vmaf-tune-compare-codecs branch May 5, 2026 14:32
@lusoris lusoris restored the feat/vmaf-tune-compare-codecs branch May 6, 2026 17:42