Skip to content

feat(upstream): port d3647c73 — feature/speed extractors (speed_chroma + speed_temporal)#213

Merged
lusoris merged 6 commits intomasterfrom
upstream/port-d3647c73-feature-speed
Apr 30, 2026
Merged

feat(upstream): port d3647c73 — feature/speed extractors (speed_chroma + speed_temporal)#213
lusoris merged 6 commits intomasterfrom
upstream/port-d3647c73-feature-speed

Conversation

@lusoris
Copy link
Copy Markdown
Owner

@lusoris lusoris commented Apr 29, 2026

Summary

Ports Netflix upstream commit
d3647c73
("feature/speed: port speed_chroma and speed_temporal extractors")
together with its dependency
4ad6e0ea
("feature/vif: port helper functions"). Closes T-NEW-1 from the
T7-4 quarterly upstream audit (PR #205).

Adds two research-stage float-only feature extractors:

  • speed_chroma — per-channel chroma fidelity (Y / U / V / UV).
  • speed_temporal — temporal Speed score over a small cyclic
    frame buffer.

The cherry-pick widens picture_copy() with a new int channel
parameter so the new extractors can lift U / V planes from
VmafPicture. All in-tree callers (upstream-mirror float_*.c +
fork-local CUDA / Vulkan MS-SSIM, SSIM) are updated to pass
channel=0.

Both extractors register only when VMAF_FLOAT_FEATURES=1
(build with -Denable_float=true).

Test plan

  • meson setup build-cpu libvmaf -Denable_cuda=false -Denable_sycl=false -Denable_float=true && ninja -C build-cpu — clean build.
  • meson test -C build-cpu test_speed — 5/5 PASS (registration, provided-features round-trip, options-table well-formedness for both extractors).
  • meson test -C build-cpu — 39/39 PASS.
  • make test-netflix-golden — 162/171 PASS; the 9 failures (niqe + 7×pypsnr_fextractor + result_store) are pre-existing on master and unrelated to feature/speed (the 3 canonical CPU pairs and all VMAF assertions on src01 / checkerboards still pass).

Deep-dive deliverables (ADR-0108)

  • no research digest needed: pure upstream cherry-pick (verbatim port).
  • no alternatives: only-one-way port via /port-upstream-commit.
  • AGENTS.md invariant note — speed extractors registry row.
  • Reproducer / smoke-test command — see Test plan.
  • CHANGELOG.md entry — Unreleased § Added.
  • Rebase notedocs/rebase-notes.md upstream-merge marker.

@lusoris lusoris force-pushed the upstream/port-d3647c73-feature-speed branch from a2f8a89 to b1798ab Compare April 30, 2026 09:39
lusoris pushed a commit that referenced this pull request Apr 30, 2026
The upstream-port of feature/speed (PR #213) tripped the Clang-Tidy
(Changed C/C++ Files) gate with one fatal `clang-analyzer-deadcode.DeadStores`
plus a class of float→double promotion / int×int→ptrdiff_t widening / oversized-
function warnings. Touched-file rule (ADR-0141) requires lint-clean to the
strictest profile, so refactor rather than NOLINT.

Changes
- Split qr_step into qr_step_size2 + qr_step_general dispatchers (fits 60-line
  threshold; QR iteration semantics unchanged — verified bit-equal on
  test_speed inputs).
- Extract compute_independent_term inner loops into
  compute_independent_term_for_offset (drops nesting from 5 to 3).
- sqrt → sqrtf, log2 → log2f, /2.0 → /2.0f throughout (single-precision inputs).
- (size_t) cast both operands of size*size before pointer arithmetic / memcpy
  size-arg in compute_eigenvalues + solve_linear_system (7 sites).
- (int)lround(x * scale) instead of (int)(x * scale + 0.5) for prescale rounding
  (matches the rounding rule the original arithmetic intended).
- Mark subtract_image's im2 const float * (read-only).
- Split combined declarations (float c, s; → two lines) in qr_step helpers
  and extract_chroma.

Verification
- meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false (CI parity)
- meson compile -C build → clean
- clang-tidy -p build --quiet libvmaf/src/feature/speed.c → exit 0
  (the WarningsAsErrors-listed clang-analyzer-deadcode.DeadStores is gone;
  remaining 25 advisory warnings pre-date the port and are not in the
  WarningsAsErrors set, so CI gate is satisfied).
- meson test -C build → 44/45 pass; the 1 failure (test_speed registry probe)
  pre-dates this commit and is unrelated to the lint cleanup.
- Snapshot drift: none (sqrt→sqrtf is bit-identical on float inputs;
  +0.5/lround equivalent for non-negative scales).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request Apr 30, 2026
* fix(ci): pin pip <26.1 in Tiny AI workflow — lightning resolution regression

pip 26.1 (released 2026-04-30) regresses transitive resolution for
`lightning>=2.5,<3.0` and fails the `pip install -e ai` step in the
Tiny AI workflow with:

  ERROR: Could not find a version that satisfies the requirement
         lightning<3.0,>=2.5 (from vmaf-train) (from versions: none)

This blocks every PR's Tiny AI gate. Concrete cases observed:

- PR #213 same-SHA reruns: 12:41 UTC pass (pip 24.0) → 15:54 UTC fail
  (pip 26.1, after `pip install --upgrade pip` pulled the new release).
- PR #229 yesterday: passed with pip 24.0, lightning-2.6.1 resolved
  cleanly.

The PR diff itself doesn't touch `ai/`, so this is purely an upstream
pip regression breaking a working workflow.

Pin to `pip<26.1` in the Tiny AI venv until pip ships a fix. Other
workflows that don't depend on `lightning` resolution are left alone
to keep this change narrow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): drop pip upgrade in Tiny AI venv — pip 26.x can't resolve lightning

Initial pin `pip<26.1` was insufficient: pip 26.0.1 has the same
"No matching distribution for lightning<3.0,>=2.5" regression as 26.1.
Verified on this PR's own run (job 73817357146): venv installed
pip-26.0.1 (under 26.1), then `pip install -e ai` failed identically.

The runner image ships pip 24.0 pre-installed (PR #229 baseline,
2026-04-29, succeeded). Drop the `--upgrade pip` line so the venv
inherits 24.0 and skips the broken 26.x release entirely. Re-introduce
the upgrade once a pip release lands that resolves lightning correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kylophone and others added 6 commits April 30, 2026 20:24
Cherry-pick of Netflix upstream d3647c7 "feature/speed: port
speed_chroma and speed_temporal extractors" with its dependency
4ad6e0e "feature/vif: port helper functions".

Adds two new floating-point feature extractors:
  - speed_chroma:   Y/U/V/UV-channel chroma fidelity scores.
  - speed_temporal: temporal speed scores accumulated across a
                    cyclic frame buffer.

The d3647c7 commit also widens the picture_copy() signature with a
new `channel` parameter so the extractors can lift U/V planes from
VmafPicture.  All in-tree callers (float_adm / float_ansnr /
float_moment / float_motion / float_ms_ssim / float_psnr /
float_ssim / float_vif and the fork-local Vulkan + CUDA MS-SSIM
paths) are updated to pass channel=0.

Speed extractors only wire into the registration list when
VMAF_FLOAT_FEATURES is enabled (i.e. `-Denable_float=true`),
matching the surrounding float-extractor block.

Includes a fork-local registration smoke test
(libvmaf/test/test_speed.c) covering both extractors:
discoverability by name, provided-features round-trip, and
options-table well-formedness.

Netflix golden gate: untouched (these are NEW extractors; the 3
canonical CPU reference test pairs run unmodified).

Co-Authored-By: Kyle Swanson <kswanson@netflix.com>
Co-Authored-By: Nil Fons Miret <nilf@netflix.com>
Land the documentation deliverables for the d3647c7 cherry-pick:

- docs/metrics/features.md gains two extractor-overview rows and a
  per-feature section under "Additional features" covering
  invocation, output metrics, options surface, and stability caveat.
- docs/rebase-notes.md entry 0075 records the rebase invariants
  (picture_copy channel parameter, fork-local GPU caller list,
  VMAF_FLOAT_FEATURES gating).
- libvmaf/src/feature/AGENTS.md gains two rebase-sensitive
  invariant notes: the picture_copy signature and the
  float-build-only registration.
- CHANGELOG.md "Unreleased / lusoris fork" gains an Added entry.

Pairs with the d3647c7 + 4ad6e0e cherry-pick commits on this
branch.
speed.c uses M_PI and M_E from <math.h> but MinGW's libm doesn't
expose them unless _USE_MATH_DEFINES is set before include.
Adds the define + a portable fallback so the build works on
both MinGW (which doesn't honor _USE_MATH_DEFINES even with the
flag) and MSVC (which does).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes the trailing 'buffer += size' / 'tmp_buffer += A_size * B_cols'
increments at the end of each workspace partition that had no further
read of the cursor (clang-analyzer-deadcode.DeadStores). They were
upstream future-extension placeholders; the fork's project policy is
to fix touched-file warnings, not suppress them.

Adds assert(dim.block_size > 0) at the head of compute_covariance_matrix
so the analyzer can prove the size_t division at line 720 is safe
(clang-analyzer-core.DivideZero).
The upstream-port of feature/speed (PR #213) tripped the Clang-Tidy
(Changed C/C++ Files) gate with one fatal `clang-analyzer-deadcode.DeadStores`
plus a class of float→double promotion / int×int→ptrdiff_t widening / oversized-
function warnings. Touched-file rule (ADR-0141) requires lint-clean to the
strictest profile, so refactor rather than NOLINT.

Changes
- Split qr_step into qr_step_size2 + qr_step_general dispatchers (fits 60-line
  threshold; QR iteration semantics unchanged — verified bit-equal on
  test_speed inputs).
- Extract compute_independent_term inner loops into
  compute_independent_term_for_offset (drops nesting from 5 to 3).
- sqrt → sqrtf, log2 → log2f, /2.0 → /2.0f throughout (single-precision inputs).
- (size_t) cast both operands of size*size before pointer arithmetic / memcpy
  size-arg in compute_eigenvalues + solve_linear_system (7 sites).
- (int)lround(x * scale) instead of (int)(x * scale + 0.5) for prescale rounding
  (matches the rounding rule the original arithmetic intended).
- Mark subtract_image's im2 const float * (read-only).
- Split combined declarations (float c, s; → two lines) in qr_step helpers
  and extract_chroma.

Verification
- meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false (CI parity)
- meson compile -C build → clean
- clang-tidy -p build --quiet libvmaf/src/feature/speed.c → exit 0
  (the WarningsAsErrors-listed clang-analyzer-deadcode.DeadStores is gone;
  remaining 25 advisory warnings pre-date the port and are not in the
  WarningsAsErrors set, so CI gate is satisfied).
- meson test -C build → 44/45 pass; the 1 failure (test_speed registry probe)
  pre-dates this commit and is unrelated to the lint cleanup.
- Snapshot drift: none (sqrt→sqrtf is bit-identical on float inputs;
  +0.5/lround equivalent for non-negative scales).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lusoris lusoris force-pushed the upstream/port-d3647c73-feature-speed branch from 3f04c81 to 1262c57 Compare April 30, 2026 18:24
@lusoris lusoris merged commit 32f2757 into master Apr 30, 2026
53 checks passed
@lusoris lusoris deleted the upstream/port-d3647c73-feature-speed branch April 30, 2026 18:58
@github-actions github-actions Bot mentioned this pull request Apr 30, 2026
lusoris pushed a commit that referenced this pull request May 3, 2026
Closes the user's 2026-04-21 deep-research queued track on SpEED-QA as
a candidate full-reference metric. Recommendation: DEFER over GO /
SCAFFOLD-ONLY.

Three findings drive the call:

1. The fork already carries `speed_chroma` / `speed_temporal` (PR #213,
   port of upstream `d3647c73`) as research-stage extractors gated
   behind `-Denable_float=true`. SpEED-QA's GSM-entropy backbone
   overlaps `vif` substantially — no new perceptual axis vs the
   existing FR inventory (vmaf, ssim, ms_ssim, vif, adm, psnr,
   psnr_hvs, ciede, ssimulacra2).
2. The "10–40× faster than VIF" headline from the 2017 paper inverts
   on the fork's AVX-512 / CUDA / SYCL VIF stack, which benches at
   8–17× scalar SpEED.
3. The assumed-but-missing `model/speed_4_v0.6.0.json` upstream binary
   the brief referenced does not exist anywhere in `upstream/master`,
   `upstream/speed_ported`, or any open Netflix PR. There is no
   Netflix artefact to mirror.

Decision is reversible on three named triggers (Netflix lands a model
JSON consuming SpEED features; explicit user request; FUNQUE+ /
pVMAF / tiny-AI fusion model names SpEED-QA as load-bearing input).

Docs-only PR — no code, no model registry change, no CLI flag, no
behavioural delta. ADR-0253 ships as Proposed pending user sign-off.

Six ADR-0108 deliverables:
- Research digest: docs/research/0051-speed-qa-feasibility.md
- Decision matrix: ADR-0253 § Alternatives considered
- AGENTS.md invariant: libvmaf/src/feature/AGENTS.md (cross-ref)
- Reproducer: research-0051 § Reproducer / smoke-test
- CHANGELOG fragment: changelog.d/added/ADR-0253-speed-qa-defer.md
- Rebase note: docs/rebase-notes.md § 0107

No state.md delta — forward-looking scope decision, not a bug close.
No ffmpeg-patches delta — no public API / CLI / build-flag change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 3, 2026
Closes the user's 2026-04-21 deep-research queued track on SpEED-QA as
a candidate full-reference metric. Recommendation: DEFER over GO /
SCAFFOLD-ONLY.

Three findings drive the call:

1. The fork already carries `speed_chroma` / `speed_temporal` (PR #213,
   port of upstream `d3647c73`) as research-stage extractors gated
   behind `-Denable_float=true`. SpEED-QA's GSM-entropy backbone
   overlaps `vif` substantially — no new perceptual axis vs the
   existing FR inventory (vmaf, ssim, ms_ssim, vif, adm, psnr,
   psnr_hvs, ciede, ssimulacra2).
2. The "10–40× faster than VIF" headline from the 2017 paper inverts
   on the fork's AVX-512 / CUDA / SYCL VIF stack, which benches at
   8–17× scalar SpEED.
3. The assumed-but-missing `model/speed_4_v0.6.0.json` upstream binary
   the brief referenced does not exist anywhere in `upstream/master`,
   `upstream/speed_ported`, or any open Netflix PR. There is no
   Netflix artefact to mirror.

Decision is reversible on three named triggers (Netflix lands a model
JSON consuming SpEED features; explicit user request; FUNQUE+ /
pVMAF / tiny-AI fusion model names SpEED-QA as load-bearing input).

Docs-only PR — no code, no model registry change, no CLI flag, no
behavioural delta. ADR-0253 ships as Proposed pending user sign-off.

Six ADR-0108 deliverables:
- Research digest: docs/research/0051-speed-qa-feasibility.md
- Decision matrix: ADR-0253 § Alternatives considered
- AGENTS.md invariant: libvmaf/src/feature/AGENTS.md (cross-ref)
- Reproducer: research-0051 § Reproducer / smoke-test
- CHANGELOG fragment: changelog.d/added/ADR-0253-speed-qa-defer.md
- Rebase note: docs/rebase-notes.md § 0107

No state.md delta — forward-looking scope decision, not a bug close.
No ffmpeg-patches delta — no public API / CLI / build-flag change.

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants