port(common): generalized AVX convolve from Netflix upstream f3a628b4 (ADR-0143) by lusoris · Pull Request #79 · lusoris/vmaf

lusoris · 2026-04-23T15:18:12Z

Summary

Port of Netflix upstream f3a628b4 "feature/common: generalize avx convolution for arbitrary filter widths" (Kyle Swanson, 2026-04-21), second of three clean upstream ports from the 2026-04-22 sync audit.

libvmaf/src/feature/common/convolution_avx.c drops from 2,747 LoC of branch-unrolled kernels specialised for fwidth ∈ {3, 5, 9, 17} to 247 LoC of a single parametric 1-D scanline pair.
New MAX_FWIDTH_AVX_CONV = 17 ceiling in convolution.h lets the VIF AVX2 dispatch in vif_tools.c drop its hard-coded whitelist.
Adopts Netflix's own paired python-golden loosening on two full-VMAF score assertions (places=2 → places=1) per the ADR-0142 Netflix-authority precedent.
ADR-0141 cleanup: 4 scanline helpers static, strides widened to ptrdiff_t, #include <stddef.h>. Zero clang-tidy warnings on the touched file.

See ADR-0143 for full decision matrix + invariants.

Depends on #78 (ADR-0142 precedent). If #78 merges first, this PR rebases cleanly.

Reproducer

meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build            # 32/32 pass
clang-tidy -p build libvmaf/src/feature/common/convolution_avx.c   # zero warnings

Test plan

ninja -C build — clean on x86.
meson test -C build — 32/32 pass.
clang-format — all touched C files clean.
clang-tidy -p build convolution_avx.c — zero warnings.
bash scripts/ci/assertion-density.sh — pass.
bash scripts/ci/check-copyright.sh — pass.
CI status checks including Netflix CPU Golden (D24) — should pass with upstream's places=1 loosening.

Deep-dive deliverables (ADR-0108)

Research digest — no digest needed: Netflix-authored single-commit port with a well-scoped surface (one file rewrite, two call-site updates, two paired golden loosenings); decision matrix captured in the ADR.
Decision matrix — docs/adr/0143-port-netflix-f3a628b4-generalized-avx-convolve.md §Alternatives considered.
AGENTS.md invariant note — libvmaf/src/feature/AGENTS.md gains the static-linkage + ptrdiff_t invariant on rebase.
Reproducer / smoke-test command — see block above.
CHANGELOG.md "lusoris fork" entry — under Unreleased → Changed.
Rebase note — docs/rebase-notes.md entry 0036.

🤖 Generated with Claude Code

Netflix's new python tests for non-default vif_sigma_nsq (`test_run_vmaf_fextractor_with_vif_sigma_nsq`, `test_run_float_vif_fextractor_with_vif_sigma_nsq`) assert on scores produced by upstream 18e8f1c's double-promoted per-pixel log2f arguments. The fork's initial port kept float-discipline throughout, producing scores ~2e-4 below Netflix's expected values — above places=4 tolerance. Project rule #1 (CLAUDE.md §8) protects Netflix golden assertions in python/test/; per the ADR-0142 Netflix-authority carve-out adopted for the paired port set (#78/#79/#80), upstream-authored test values are adopted verbatim. vif_tools.c: - vif_stat_one_pixel now takes `double vif_sigma_nsq` + `double sigma_max_inv` so the log2f args get upstream's double-promotion pattern. - sigma_max_inv computation stays at double for consistency. vif_statistic_s AVX2 dispatch: - Gate the AVX2 fast path on `vif_sigma_nsq == 2.0` (the default). For non-default values, fall through to the scalar reference so Netflix's new tests hit the upstream-matching double-promoted path. A future refactor could port ADR-0139's per-lane scalar-double reduction into vif_statistic_s_avx2 to unlock the AVX2 fast path for all sigma_nsq values. ADR-0142's decision matrix + §Decision update to follow separately if the fork-discipline tradeoff we originally chose needs re-spelling.

…-0142) (#78) * port(vif): vif_sigma_nsq parameter from Netflix upstream 18e8f1c (ADR-0142) Upstream commit promotes VIF's hard-coded neural-noise variance `static const float sigma_nsq = 2` into a runtime-configurable double parameter `vif_sigma_nsq` (alias `snsq`, default 2.0, range [0.0, 5.0]). Threaded through compute_vif → vif_statistic_s and the fork-local AVX2 variant `vif_statistic_s_avx2` (which upstream does not ship; its signature is extended in lockstep so both paths agree on the new 14-argument contract). Fork-specific discipline (ADR-0142 §Decision): - Default-path bit-identity: `vif_sigma_nsq = 2.0` produces scores bit-identical to pre-port master on both scalar + AVX2. `powf(2.0f, 2.0f) / (255² f) == 4.0f / 65025.0f` exactly. - Float arithmetic: compute sites use `const float sigma_nsq = (float)vif_sigma_nsq;` and float-cast `sigma_max_inv`, preserving the ADR-0138 / ADR-0139 float-arithmetic invariant. Upstream's scalar body double-promotes via `sv_sq + vif_sigma_nsq`; the fork deliberately stays float-only. ADR-0141 applied to touched files: - Fork-local `x86/vif_statistic_avx2.c`: stride→ptrdiff_t widening warnings fixed (5 sites); `readability-function-size` NOLINT cites ADR-0141 §Historical debt + backlog T7-5. Verified: 32/32 meson tests pass. Pre-port master has an unrelated CLI `float_vif` loader error; test_feature_extractor covers the dispatch + option parsing path exercised by this port. Deliverables: - ADR-0142 + index row. - rebase-notes.md entry 0035. - CHANGELOG Unreleased → Added. - libvmaf/src/feature/AGENTS.md rebase invariant (float discipline + signature lockstep with upstream). * refactor(vif): touched-file cleanup for ADR-0141 (fn-size + dead code + goldens) CI for PR #78 surfaced a number of clang-tidy errors + warnings on the VIF files this port touches. Per ADR-0141 and explicit user direction ("no NOLINT shortcuts"), these are fixed via real refactors rather than suppressions. vif.c: - Delete the entire `vifdiff` standalone-tool function + its only caller `apply_frame_differencing`. Both had no callers outside the TU and existed purely as legacy debugging infrastructure. - Split `compute_vif` into 5 single-responsibility helpers: `resolve_kernelscale_index`, `slice_vif_buffers`, `decimate_to_next_scale`, `compute_vif_at_scale`, `finalize_vif_score`. Two new structs `VifBuffers` and `VifScaleCtx` package the shared state cleanly. - Drop the VIF_OPT_DEBUG_DUMP block (writes intermediate float images to disk during extraction — developer-only debug flag not enabled in any release or CI build). - Fix `fflush(stdout)` discard with explicit `(void)` casts. - Replace `if (!(data_buf = aligned_malloc(...)))` with a separate assignment + null check. - Add ptrdiff_t cast on `scores[2 * scale]` pointer-offset multiplication. - Drop stale `#include "offset.h"` and add `#include "vif.h"` so the `compute_vif` declaration is visible in this TU (no more misc-use-internal-linkage false positive). vif_tools.c: - Extract `vif_stat_one_pixel` per-pixel helper from `vif_statistic_s`; the outer function is now the AVX2 dispatch + accumulation loop. - Extract `vif_mirror_tap` as a shared edge-mirror helper used by all three scalar convolve fallbacks. - Extract `vif_vpass_row_s`, `vif_vpass_row_sq_s`, `vif_vpass_row_xy_s`, `vif_hpass_row_s` helpers covering the scalar vertical/horizontal pass kernels. Both `vif_filter1d_s`, `vif_filter1d_sq_s`, and `vif_filter1d_xy_s` are now thin wrappers over these. - Widen all stride→pointer-offset multiplications to ptrdiff_t. - Split every multi-declaration. float_vif.c: - Extract `publish_debug_vif_features` helper; `extract` itself is now a slim dispatcher. - Add a redundant `extern VmafFeatureExtractor vmaf_fex_float_vif;` declaration so single-TU analysis sees the registry symbol's external linkage. x86/vif_statistic_avx2.{c,h}: - Introduce a proper header for the fork-local `vif_statistic_s_avx2` symbol. `vif_tools.c` includes it under ARCH_X86 instead of forward-declaring inline. all.c: - Drop the stale forward-declaration of `compute_vif` (pre-ADR-0142 signature, unused in this TU — the caller uses `vif.h`). Verified: 32/32 meson tests pass; zero clang-tidy warnings or errors across vif.c, vif_tools.c, float_vif.c, x86/vif_statistic_avx2.c. * chore(python): black-reformat python tests Apply black to the two python test files the port touched. The upstream content is verbatim-copied from Netflix; the fork runs black so the files need reformatting to match fork style. Values / assertions are unchanged. * fix(vif): match upstream double-promotion for vif_sigma_nsq Netflix's new python tests for non-default vif_sigma_nsq (`test_run_vmaf_fextractor_with_vif_sigma_nsq`, `test_run_float_vif_fextractor_with_vif_sigma_nsq`) assert on scores produced by upstream 18e8f1c's double-promoted per-pixel log2f arguments. The fork's initial port kept float-discipline throughout, producing scores ~2e-4 below Netflix's expected values — above places=4 tolerance. Project rule #1 (CLAUDE.md §8) protects Netflix golden assertions in python/test/; per the ADR-0142 Netflix-authority carve-out adopted for the paired port set (#78/#79/#80), upstream-authored test values are adopted verbatim. vif_tools.c: - vif_stat_one_pixel now takes `double vif_sigma_nsq` + `double sigma_max_inv` so the log2f args get upstream's double-promotion pattern. - sigma_max_inv computation stays at double for consistency. vif_statistic_s AVX2 dispatch: - Gate the AVX2 fast path on `vif_sigma_nsq == 2.0` (the default). For non-default values, fall through to the scalar reference so Netflix's new tests hit the upstream-matching double-promoted path. A future refactor could port ADR-0139's per-lane scalar-double reduction into vif_statistic_s_avx2 to unlock the AVX2 fast path for all sigma_nsq values. ADR-0142's decision matrix + §Decision update to follow separately if the fork-discipline tradeoff we originally chose needs re-spelling. * chore: drop user-machine absolute path from .vscode/settings.json bff261a's `git add -u` accidentally committed two lines targeting a workstation-specific /home/... path. Revert to master's version. * feature/vif: port helper functions, bugfix for edge mirroring * loosen assertion precision for vif mirror bugfix --------- Co-authored-by: Lusoris <lusoris@pm.me>

…(ADR-0143) Netflix upstream replaces four large specialised AVX convolve kernels in libvmaf/src/feature/common/convolution_avx.c (hard-coded to fwidth ∈ {3, 5, 9, 17}) with a single parametric 1-D scanline pair. File shrinks from 2,747 LoC to 247 LoC. New MAX_FWIDTH_AVX_CONV = 17 ceiling in convolution.h lets the VIF dispatch in vif_tools.c drop its fwidth whitelist in favour of `fwidth <= MAX_FWIDTH_AVX_CONV`. Netflix ships paired python-golden loosening: VMAF_score + VMAFEXEC_score assertions move from places=2 (±0.005) to places=1 (±0.05). The generalised kernel's accumulation order differs at ULP scale from the specialised paths, producing small final-VMAF drift well below perceptual discriminability. Adopted per the ADR-0142 Netflix-authority precedent: project rule #1 addresses fork drift, not upstream-authored test updates the fork must track. ADR-0141 touched-file cleanup on convolution_avx.c: - Four convolution_f32_avx_s_1d_*_scanline helpers changed from external linkage to `static` (no other TU uses them after the specialised-path removal; upstream leaves them extern out of habit). Silences misc-use-internal-linkage warnings. - Stride parameters widened from `int` to `ptrdiff_t` in the static helpers, with `(ptrdiff_t)` casts at public-wrapper multiplication sites. Silences 20+ bugprone-implicit-widening-of-multiplication-result warnings. - `#include <stddef.h>` added for ptrdiff_t. - Zero clang-tidy warnings on the touched file after cleanup. Verified: 32/32 meson tests pass. Netflix CPU golden leg exercises both loosened assertions; they're on upstream's own authority. Deliverables: - ADR-0143 + index row. - rebase-notes.md entry 0036 (fork-delta catalogue for rebase). - CHANGELOG Unreleased → Changed. - libvmaf/src/feature/AGENTS.md rebase invariant (static linkage + ptrdiff_t helper signatures on rebase).

lusoris mentioned this pull request Apr 23, 2026

port(vif): edge-mirror bugfix from Netflix upstream 41d42c9e+bc744aa3 (ADR-0144) #80

Closed

13 tasks

lusoris force-pushed the port/upstream-f3a628b4-generalized-avx-convolve branch from a0c7a18 to a74538c Compare April 23, 2026 20:39

lusoris merged commit 025a754 into master Apr 23, 2026
45 checks passed

lusoris deleted the port/upstream-f3a628b4-generalized-avx-convolve branch April 23, 2026 20:55

github-actions Bot mentioned this pull request Apr 23, 2026

chore: release master #1

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

port(common): generalized AVX convolve from Netflix upstream f3a628b4 (ADR-0143)#79

port(common): generalized AVX convolve from Netflix upstream f3a628b4 (ADR-0143)#79
lusoris merged 1 commit intomasterfrom
port/upstream-f3a628b4-generalized-avx-convolve

lusoris commented Apr 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

lusoris commented Apr 23, 2026

Summary

Reproducer

Test plan

Deep-dive deliverables (ADR-0108)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant