feat(upstream): port d3647c73 — feature/speed extractors (speed_chroma + speed_temporal) by lusoris · Pull Request #213 · lusoris/vmaf

lusoris · 2026-04-29T12:31:22Z

Summary

Ports Netflix upstream commit
d3647c73
("feature/speed: port speed_chroma and speed_temporal extractors")
together with its dependency
4ad6e0ea
("feature/vif: port helper functions"). Closes T-NEW-1 from the
T7-4 quarterly upstream audit (PR #205).

Adds two research-stage float-only feature extractors:

speed_chroma — per-channel chroma fidelity (Y / U / V / UV).
speed_temporal — temporal Speed score over a small cyclic
frame buffer.

The cherry-pick widens picture_copy() with a new int channel
parameter so the new extractors can lift U / V planes from
VmafPicture. All in-tree callers (upstream-mirror float_*.c +
fork-local CUDA / Vulkan MS-SSIM, SSIM) are updated to pass
channel=0.

Both extractors register only when VMAF_FLOAT_FEATURES=1
(build with -Denable_float=true).

Test plan

meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false -Denable_float=true && ninja -C build-cpu — clean build.
meson test -C build-cpu test_speed — 5/5 PASS (registration, provided-features round-trip, options-table well-formedness for both extractors).
meson test -C build-cpu — 39/39 PASS.
make test-netflix-golden — 162/171 PASS; the 9 failures (niqe + 7×pypsnr_fextractor + result_store) are pre-existing on master and unrelated to feature/speed (the 3 canonical CPU pairs and all VMAF assertions on src01 / checkerboards still pass).

Deep-dive deliverables (ADR-0108)

no research digest needed: pure upstream cherry-pick (verbatim port).
no alternatives: only-one-way port via /port-upstream-commit.
AGENTS.md invariant note — speed extractors registry row.
Reproducer / smoke-test command — see Test plan.
CHANGELOG.md entry — Unreleased § Added.
Rebase note — docs/rebase-notes.md upstream-merge marker.

The upstream-port of feature/speed (PR #213) tripped the Clang-Tidy (Changed C/C++ Files) gate with one fatal `clang-analyzer-deadcode.DeadStores` plus a class of float→double promotion / int×int→ptrdiff_t widening / oversized- function warnings. Touched-file rule (ADR-0141) requires lint-clean to the strictest profile, so refactor rather than NOLINT. Changes - Split qr_step into qr_step_size2 + qr_step_general dispatchers (fits 60-line threshold; QR iteration semantics unchanged — verified bit-equal on test_speed inputs). - Extract compute_independent_term inner loops into compute_independent_term_for_offset (drops nesting from 5 to 3). - sqrt → sqrtf, log2 → log2f, /2.0 → /2.0f throughout (single-precision inputs). - (size_t) cast both operands of size*size before pointer arithmetic / memcpy size-arg in compute_eigenvalues + solve_linear_system (7 sites). - (int)lround(x * scale) instead of (int)(x * scale + 0.5) for prescale rounding (matches the rounding rule the original arithmetic intended). - Mark subtract_image's im2 const float * (read-only). - Split combined declarations (float c, s; → two lines) in qr_step helpers and extract_chroma. Verification - meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false (CI parity) - meson compile -C build → clean - clang-tidy -p build --quiet libvmaf/src/feature/speed.c → exit 0 (the WarningsAsErrors-listed clang-analyzer-deadcode.DeadStores is gone; remaining 25 advisory warnings pre-date the port and are not in the WarningsAsErrors set, so CI gate is satisfied). - meson test -C build → 44/45 pass; the 1 failure (test_speed registry probe) pre-dates this commit and is unrelated to the lint cleanup. - Snapshot drift: none (sqrt→sqrtf is bit-identical on float inputs; +0.5/lround equivalent for non-negative scales). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): pin pip <26.1 in Tiny AI workflow — lightning resolution regression pip 26.1 (released 2026-04-30) regresses transitive resolution for `lightning>=2.5,<3.0` and fails the `pip install -e ai` step in the Tiny AI workflow with: ERROR: Could not find a version that satisfies the requirement lightning<3.0,>=2.5 (from vmaf-train) (from versions: none) This blocks every PR's Tiny AI gate. Concrete cases observed: - PR #213 same-SHA reruns: 12:41 UTC pass (pip 24.0) → 15:54 UTC fail (pip 26.1, after `pip install --upgrade pip` pulled the new release). - PR #229 yesterday: passed with pip 24.0, lightning-2.6.1 resolved cleanly. The PR diff itself doesn't touch `ai/`, so this is purely an upstream pip regression breaking a working workflow. Pin to `pip<26.1` in the Tiny AI venv until pip ships a fix. Other workflows that don't depend on `lightning` resolution are left alone to keep this change narrow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): drop pip upgrade in Tiny AI venv — pip 26.x can't resolve lightning Initial pin `pip<26.1` was insufficient: pip 26.0.1 has the same "No matching distribution for lightning<3.0,>=2.5" regression as 26.1. Verified on this PR's own run (job 73817357146): venv installed pip-26.0.1 (under 26.1), then `pip install -e ai` failed identically. The runner image ships pip 24.0 pre-installed (PR #229 baseline, 2026-04-29, succeeded). Drop the `--upgrade pip` line so the venv inherits 24.0 and skips the broken 26.x release entirely. Re-introduce the upgrade once a pip release lands that resolves lightning correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cherry-pick of Netflix upstream d3647c7 "feature/speed: port speed_chroma and speed_temporal extractors" with its dependency 4ad6e0e "feature/vif: port helper functions". Adds two new floating-point feature extractors: - speed_chroma: Y/U/V/UV-channel chroma fidelity scores. - speed_temporal: temporal speed scores accumulated across a cyclic frame buffer. The d3647c7 commit also widens the picture_copy() signature with a new `channel` parameter so the extractors can lift U/V planes from VmafPicture. All in-tree callers (float_adm / float_ansnr / float_moment / float_motion / float_ms_ssim / float_psnr / float_ssim / float_vif and the fork-local Vulkan + CUDA MS-SSIM paths) are updated to pass channel=0. Speed extractors only wire into the registration list when VMAF_FLOAT_FEATURES is enabled (i.e. `-Denable_float=true`), matching the surrounding float-extractor block. Includes a fork-local registration smoke test (libvmaf/test/test_speed.c) covering both extractors: discoverability by name, provided-features round-trip, and options-table well-formedness. Netflix golden gate: untouched (these are NEW extractors; the 3 canonical CPU reference test pairs run unmodified). Co-Authored-By: Kyle Swanson <kswanson@netflix.com> Co-Authored-By: Nil Fons Miret <nilf@netflix.com>

Land the documentation deliverables for the d3647c7 cherry-pick: - docs/metrics/features.md gains two extractor-overview rows and a per-feature section under "Additional features" covering invocation, output metrics, options surface, and stability caveat. - docs/rebase-notes.md entry 0075 records the rebase invariants (picture_copy channel parameter, fork-local GPU caller list, VMAF_FLOAT_FEATURES gating). - libvmaf/src/feature/AGENTS.md gains two rebase-sensitive invariant notes: the picture_copy signature and the float-build-only registration. - CHANGELOG.md "Unreleased / lusoris fork" gains an Added entry. Pairs with the d3647c7 + 4ad6e0e cherry-pick commits on this branch.

speed.c uses M_PI and M_E from <math.h> but MinGW's libm doesn't expose them unless _USE_MATH_DEFINES is set before include. Adds the define + a portable fallback so the build works on both MinGW (which doesn't honor _USE_MATH_DEFINES even with the flag) and MSVC (which does). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Removes the trailing 'buffer += size' / 'tmp_buffer += A_size * B_cols' increments at the end of each workspace partition that had no further read of the cursor (clang-analyzer-deadcode.DeadStores). They were upstream future-extension placeholders; the fork's project policy is to fix touched-file warnings, not suppress them. Adds assert(dim.block_size > 0) at the head of compute_covariance_matrix so the analyzer can prove the size_t division at line 720 is safe (clang-analyzer-core.DivideZero).

The upstream-port of feature/speed (PR #213) tripped the Clang-Tidy (Changed C/C++ Files) gate with one fatal `clang-analyzer-deadcode.DeadStores` plus a class of float→double promotion / int×int→ptrdiff_t widening / oversized- function warnings. Touched-file rule (ADR-0141) requires lint-clean to the strictest profile, so refactor rather than NOLINT. Changes - Split qr_step into qr_step_size2 + qr_step_general dispatchers (fits 60-line threshold; QR iteration semantics unchanged — verified bit-equal on test_speed inputs). - Extract compute_independent_term inner loops into compute_independent_term_for_offset (drops nesting from 5 to 3). - sqrt → sqrtf, log2 → log2f, /2.0 → /2.0f throughout (single-precision inputs). - (size_t) cast both operands of size*size before pointer arithmetic / memcpy size-arg in compute_eigenvalues + solve_linear_system (7 sites). - (int)lround(x * scale) instead of (int)(x * scale + 0.5) for prescale rounding (matches the rounding rule the original arithmetic intended). - Mark subtract_image's im2 const float * (read-only). - Split combined declarations (float c, s; → two lines) in qr_step helpers and extract_chroma. Verification - meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false (CI parity) - meson compile -C build → clean - clang-tidy -p build --quiet libvmaf/src/feature/speed.c → exit 0 (the WarningsAsErrors-listed clang-analyzer-deadcode.DeadStores is gone; remaining 25 advisory warnings pre-date the port and are not in the WarningsAsErrors set, so CI gate is satisfied). - meson test -C build → 44/45 pass; the 1 failure (test_speed registry probe) pre-dates this commit and is unrelated to the lint cleanup. - Snapshot drift: none (sqrt→sqrtf is bit-identical on float inputs; +0.5/lround equivalent for non-negative scales). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the user's 2026-04-21 deep-research queued track on SpEED-QA as a candidate full-reference metric. Recommendation: DEFER over GO / SCAFFOLD-ONLY. Three findings drive the call: 1. The fork already carries `speed_chroma` / `speed_temporal` (PR #213, port of upstream `d3647c73`) as research-stage extractors gated behind `-Denable_float=true`. SpEED-QA's GSM-entropy backbone overlaps `vif` substantially — no new perceptual axis vs the existing FR inventory (vmaf, ssim, ms_ssim, vif, adm, psnr, psnr_hvs, ciede, ssimulacra2). 2. The "10–40× faster than VIF" headline from the 2017 paper inverts on the fork's AVX-512 / CUDA / SYCL VIF stack, which benches at 8–17× scalar SpEED. 3. The assumed-but-missing `model/speed_4_v0.6.0.json` upstream binary the brief referenced does not exist anywhere in `upstream/master`, `upstream/speed_ported`, or any open Netflix PR. There is no Netflix artefact to mirror. Decision is reversible on three named triggers (Netflix lands a model JSON consuming SpEED features; explicit user request; FUNQUE+ / pVMAF / tiny-AI fusion model names SpEED-QA as load-bearing input). Docs-only PR — no code, no model registry change, no CLI flag, no behavioural delta. ADR-0253 ships as Proposed pending user sign-off. Six ADR-0108 deliverables: - Research digest: docs/research/0051-speed-qa-feasibility.md - Decision matrix: ADR-0253 § Alternatives considered - AGENTS.md invariant: libvmaf/src/feature/AGENTS.md (cross-ref) - Reproducer: research-0051 § Reproducer / smoke-test - CHANGELOG fragment: changelog.d/added/ADR-0253-speed-qa-defer.md - Rebase note: docs/rebase-notes.md § 0107 No state.md delta — forward-looking scope decision, not a bug close. No ffmpeg-patches delta — no public API / CLI / build-flag change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the user's 2026-04-21 deep-research queued track on SpEED-QA as a candidate full-reference metric. Recommendation: DEFER over GO / SCAFFOLD-ONLY. Three findings drive the call: 1. The fork already carries `speed_chroma` / `speed_temporal` (PR #213, port of upstream `d3647c73`) as research-stage extractors gated behind `-Denable_float=true`. SpEED-QA's GSM-entropy backbone overlaps `vif` substantially — no new perceptual axis vs the existing FR inventory (vmaf, ssim, ms_ssim, vif, adm, psnr, psnr_hvs, ciede, ssimulacra2). 2. The "10–40× faster than VIF" headline from the 2017 paper inverts on the fork's AVX-512 / CUDA / SYCL VIF stack, which benches at 8–17× scalar SpEED. 3. The assumed-but-missing `model/speed_4_v0.6.0.json` upstream binary the brief referenced does not exist anywhere in `upstream/master`, `upstream/speed_ported`, or any open Netflix PR. There is no Netflix artefact to mirror. Decision is reversible on three named triggers (Netflix lands a model JSON consuming SpEED features; explicit user request; FUNQUE+ / pVMAF / tiny-AI fusion model names SpEED-QA as load-bearing input). Docs-only PR — no code, no model registry change, no CLI flag, no behavioural delta. ADR-0253 ships as Proposed pending user sign-off. Six ADR-0108 deliverables: - Research digest: docs/research/0051-speed-qa-feasibility.md - Decision matrix: ADR-0253 § Alternatives considered - AGENTS.md invariant: libvmaf/src/feature/AGENTS.md (cross-ref) - Reproducer: research-0051 § Reproducer / smoke-test - CHANGELOG fragment: changelog.d/added/ADR-0253-speed-qa-defer.md - Rebase note: docs/rebase-notes.md § 0107 No state.md delta — forward-looking scope decision, not a bug close. No ffmpeg-patches delta — no public API / CLI / build-flag change. Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lusoris mentioned this pull request Apr 29, 2026

docs(planning): AGENTS.md invariants refresh + ADR index sync #218

Closed

6 tasks

lusoris force-pushed the upstream/port-d3647c73-feature-speed branch from a2f8a89 to b1798ab Compare April 30, 2026 09:39

lusoris mentioned this pull request Apr 30, 2026

fix(ci): pin pip <26.1 in Tiny AI workflow (lightning regression) #231

Merged

10 tasks

lusoris mentioned this pull request Apr 30, 2026

fix(ai): switch lightning → pytorch-lightning (PyPI 404) #232

Merged

9 tasks

kylophone and others added 6 commits April 30, 2026 20:24

feature/vif: port helper functions

e0b510d

lusoris force-pushed the upstream/port-d3647c73-feature-speed branch from 3f04c81 to 1262c57 Compare April 30, 2026 18:24

lusoris merged commit 32f2757 into master Apr 30, 2026
53 checks passed

lusoris deleted the upstream/port-d3647c73-feature-speed branch April 30, 2026 18:58

github-actions Bot mentioned this pull request Apr 30, 2026

chore: release master #1

Open

This was referenced May 1, 2026

docs(planning): AGENTS.md invariants refresh + ADR index sync #243

Merged

docs: research-0051 — SpEED-QA metric feasibility (Proposed) #323

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(upstream): port d3647c73 — feature/speed extractors (speed_chroma + speed_temporal)#213

feat(upstream): port d3647c73 — feature/speed extractors (speed_chroma + speed_temporal)#213
lusoris merged 6 commits intomasterfrom
upstream/port-d3647c73-feature-speed

lusoris commented Apr 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

lusoris commented Apr 29, 2026

Summary

Test plan

Deep-dive deliverables (ADR-0108)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants