Skip to content

port(common): generalized AVX convolve from Netflix upstream f3a628b4 (ADR-0143)#79

Merged
lusoris merged 1 commit intomasterfrom
port/upstream-f3a628b4-generalized-avx-convolve
Apr 23, 2026
Merged

port(common): generalized AVX convolve from Netflix upstream f3a628b4 (ADR-0143)#79
lusoris merged 1 commit intomasterfrom
port/upstream-f3a628b4-generalized-avx-convolve

Conversation

@lusoris
Copy link
Copy Markdown
Owner

@lusoris lusoris commented Apr 23, 2026

Summary

Port of Netflix upstream f3a628b4 "feature/common: generalize avx convolution for arbitrary filter widths" (Kyle Swanson, 2026-04-21), second of three clean upstream ports from the 2026-04-22 sync audit.

  • libvmaf/src/feature/common/convolution_avx.c drops from 2,747 LoC of branch-unrolled kernels specialised for fwidth ∈ {3, 5, 9, 17} to 247 LoC of a single parametric 1-D scanline pair.
  • New MAX_FWIDTH_AVX_CONV = 17 ceiling in convolution.h lets the VIF AVX2 dispatch in vif_tools.c drop its hard-coded whitelist.
  • Adopts Netflix's own paired python-golden loosening on two full-VMAF score assertions (places=2places=1) per the ADR-0142 Netflix-authority precedent.
  • ADR-0141 cleanup: 4 scanline helpers static, strides widened to ptrdiff_t, #include <stddef.h>. Zero clang-tidy warnings on the touched file.

See ADR-0143 for full decision matrix + invariants.

Depends on #78 (ADR-0142 precedent). If #78 merges first, this PR rebases cleanly.

Reproducer

meson setup build -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build            # 32/32 pass
clang-tidy -p build libvmaf/src/feature/common/convolution_avx.c   # zero warnings

Test plan

  • ninja -C build — clean on x86.
  • meson test -C build — 32/32 pass.
  • clang-format — all touched C files clean.
  • clang-tidy -p build convolution_avx.c — zero warnings.
  • bash scripts/ci/assertion-density.sh — pass.
  • bash scripts/ci/check-copyright.sh — pass.
  • CI status checks including Netflix CPU Golden (D24) — should pass with upstream's places=1 loosening.

Deep-dive deliverables (ADR-0108)

  • Research digest — no digest needed: Netflix-authored single-commit port with a well-scoped surface (one file rewrite, two call-site updates, two paired golden loosenings); decision matrix captured in the ADR.
  • Decision matrixdocs/adr/0143-port-netflix-f3a628b4-generalized-avx-convolve.md §Alternatives considered.
  • AGENTS.md invariant notelibvmaf/src/feature/AGENTS.md gains the static-linkage + ptrdiff_t invariant on rebase.
  • Reproducer / smoke-test command — see block above.
  • CHANGELOG.md "lusoris fork" entry — under Unreleased → Changed.
  • Rebase notedocs/rebase-notes.md entry 0036.

🤖 Generated with Claude Code

lusoris pushed a commit that referenced this pull request Apr 23, 2026
Netflix's new python tests for non-default vif_sigma_nsq
(`test_run_vmaf_fextractor_with_vif_sigma_nsq`,
`test_run_float_vif_fextractor_with_vif_sigma_nsq`) assert on scores
produced by upstream 18e8f1c's double-promoted per-pixel log2f
arguments. The fork's initial port kept float-discipline throughout,
producing scores ~2e-4 below Netflix's expected values — above
places=4 tolerance.

Project rule #1 (CLAUDE.md §8) protects Netflix golden assertions in
python/test/; per the ADR-0142 Netflix-authority carve-out adopted
for the paired port set (#78/#79/#80), upstream-authored test values
are adopted verbatim.

vif_tools.c:
  - vif_stat_one_pixel now takes `double vif_sigma_nsq` +
    `double sigma_max_inv` so the log2f args get upstream's
    double-promotion pattern.
  - sigma_max_inv computation stays at double for consistency.

vif_statistic_s AVX2 dispatch:
  - Gate the AVX2 fast path on `vif_sigma_nsq == 2.0` (the default).
    For non-default values, fall through to the scalar reference so
    Netflix's new tests hit the upstream-matching double-promoted
    path. A future refactor could port ADR-0139's per-lane
    scalar-double reduction into vif_statistic_s_avx2 to unlock the
    AVX2 fast path for all sigma_nsq values.

ADR-0142's decision matrix + §Decision update to follow separately
if the fork-discipline tradeoff we originally chose needs re-spelling.
lusoris added a commit that referenced this pull request Apr 23, 2026
…-0142) (#78)

* port(vif): vif_sigma_nsq parameter from Netflix upstream 18e8f1c (ADR-0142)

Upstream commit promotes VIF's hard-coded neural-noise variance
`static const float sigma_nsq = 2` into a runtime-configurable double
parameter `vif_sigma_nsq` (alias `snsq`, default 2.0, range [0.0, 5.0]).
Threaded through compute_vif → vif_statistic_s and the fork-local AVX2
variant `vif_statistic_s_avx2` (which upstream does not ship; its
signature is extended in lockstep so both paths agree on the new
14-argument contract).

Fork-specific discipline (ADR-0142 §Decision):
- Default-path bit-identity: `vif_sigma_nsq = 2.0` produces scores
  bit-identical to pre-port master on both scalar + AVX2. `powf(2.0f,
  2.0f) / (255² f) == 4.0f / 65025.0f` exactly.
- Float arithmetic: compute sites use `const float sigma_nsq =
  (float)vif_sigma_nsq;` and float-cast `sigma_max_inv`, preserving
  the ADR-0138 / ADR-0139 float-arithmetic invariant. Upstream's
  scalar body double-promotes via `sv_sq + vif_sigma_nsq`; the fork
  deliberately stays float-only.

ADR-0141 applied to touched files:
- Fork-local `x86/vif_statistic_avx2.c`: stride→ptrdiff_t widening
  warnings fixed (5 sites); `readability-function-size` NOLINT cites
  ADR-0141 §Historical debt + backlog T7-5.

Verified: 32/32 meson tests pass. Pre-port master has an unrelated
CLI `float_vif` loader error; test_feature_extractor covers the
dispatch + option parsing path exercised by this port.

Deliverables:
- ADR-0142 + index row.
- rebase-notes.md entry 0035.
- CHANGELOG Unreleased → Added.
- libvmaf/src/feature/AGENTS.md rebase invariant (float discipline
  + signature lockstep with upstream).

* refactor(vif): touched-file cleanup for ADR-0141 (fn-size + dead code + goldens)

CI for PR #78 surfaced a number of clang-tidy errors + warnings on the
VIF files this port touches. Per ADR-0141 and explicit user direction
("no NOLINT shortcuts"), these are fixed via real refactors rather than
suppressions.

vif.c:
  - Delete the entire `vifdiff` standalone-tool function + its only
    caller `apply_frame_differencing`. Both had no callers outside the
    TU and existed purely as legacy debugging infrastructure.
  - Split `compute_vif` into 5 single-responsibility helpers:
    `resolve_kernelscale_index`, `slice_vif_buffers`,
    `decimate_to_next_scale`, `compute_vif_at_scale`,
    `finalize_vif_score`. Two new structs `VifBuffers` and
    `VifScaleCtx` package the shared state cleanly.
  - Drop the VIF_OPT_DEBUG_DUMP block (writes intermediate float
    images to disk during extraction — developer-only debug flag not
    enabled in any release or CI build).
  - Fix `fflush(stdout)` discard with explicit `(void)` casts.
  - Replace `if (!(data_buf = aligned_malloc(...)))` with a separate
    assignment + null check.
  - Add ptrdiff_t cast on `scores[2 * scale]` pointer-offset
    multiplication.
  - Drop stale `#include "offset.h"` and add `#include "vif.h"` so the
    `compute_vif` declaration is visible in this TU (no more
    misc-use-internal-linkage false positive).

vif_tools.c:
  - Extract `vif_stat_one_pixel` per-pixel helper from
    `vif_statistic_s`; the outer function is now the AVX2 dispatch +
    accumulation loop.
  - Extract `vif_mirror_tap` as a shared edge-mirror helper used by
    all three scalar convolve fallbacks.
  - Extract `vif_vpass_row_s`, `vif_vpass_row_sq_s`,
    `vif_vpass_row_xy_s`, `vif_hpass_row_s` helpers covering the
    scalar vertical/horizontal pass kernels. Both `vif_filter1d_s`,
    `vif_filter1d_sq_s`, and `vif_filter1d_xy_s` are now thin
    wrappers over these.
  - Widen all stride→pointer-offset multiplications to ptrdiff_t.
  - Split every multi-declaration.

float_vif.c:
  - Extract `publish_debug_vif_features` helper; `extract` itself is
    now a slim dispatcher.
  - Add a redundant `extern VmafFeatureExtractor vmaf_fex_float_vif;`
    declaration so single-TU analysis sees the registry symbol's
    external linkage.

x86/vif_statistic_avx2.{c,h}:
  - Introduce a proper header for the fork-local
    `vif_statistic_s_avx2` symbol. `vif_tools.c` includes it under
    ARCH_X86 instead of forward-declaring inline.

all.c:
  - Drop the stale forward-declaration of `compute_vif` (pre-ADR-0142
    signature, unused in this TU — the caller uses `vif.h`).

Verified: 32/32 meson tests pass; zero clang-tidy warnings or errors
across vif.c, vif_tools.c, float_vif.c, x86/vif_statistic_avx2.c.

* chore(python): black-reformat python tests

Apply black to the two python test files the port touched. The upstream
content is verbatim-copied from Netflix; the fork runs black so the
files need reformatting to match fork style. Values / assertions are
unchanged.

* fix(vif): match upstream double-promotion for vif_sigma_nsq

Netflix's new python tests for non-default vif_sigma_nsq
(`test_run_vmaf_fextractor_with_vif_sigma_nsq`,
`test_run_float_vif_fextractor_with_vif_sigma_nsq`) assert on scores
produced by upstream 18e8f1c's double-promoted per-pixel log2f
arguments. The fork's initial port kept float-discipline throughout,
producing scores ~2e-4 below Netflix's expected values — above
places=4 tolerance.

Project rule #1 (CLAUDE.md §8) protects Netflix golden assertions in
python/test/; per the ADR-0142 Netflix-authority carve-out adopted
for the paired port set (#78/#79/#80), upstream-authored test values
are adopted verbatim.

vif_tools.c:
  - vif_stat_one_pixel now takes `double vif_sigma_nsq` +
    `double sigma_max_inv` so the log2f args get upstream's
    double-promotion pattern.
  - sigma_max_inv computation stays at double for consistency.

vif_statistic_s AVX2 dispatch:
  - Gate the AVX2 fast path on `vif_sigma_nsq == 2.0` (the default).
    For non-default values, fall through to the scalar reference so
    Netflix's new tests hit the upstream-matching double-promoted
    path. A future refactor could port ADR-0139's per-lane
    scalar-double reduction into vif_statistic_s_avx2 to unlock the
    AVX2 fast path for all sigma_nsq values.

ADR-0142's decision matrix + §Decision update to follow separately
if the fork-discipline tradeoff we originally chose needs re-spelling.

* chore: drop user-machine absolute path from .vscode/settings.json

bff261a's `git add -u` accidentally committed two lines targeting a
workstation-specific /home/... path. Revert to master's version.

* feature/vif: port helper functions, bugfix for edge mirroring

* loosen assertion precision for vif mirror bugfix

---------

Co-authored-by: Lusoris <lusoris@pm.me>
…(ADR-0143)

Netflix upstream replaces four large specialised AVX convolve kernels
in libvmaf/src/feature/common/convolution_avx.c (hard-coded to
fwidth ∈ {3, 5, 9, 17}) with a single parametric 1-D scanline pair.
File shrinks from 2,747 LoC to 247 LoC. New MAX_FWIDTH_AVX_CONV = 17
ceiling in convolution.h lets the VIF dispatch in vif_tools.c drop
its fwidth whitelist in favour of `fwidth <= MAX_FWIDTH_AVX_CONV`.

Netflix ships paired python-golden loosening: VMAF_score +
VMAFEXEC_score assertions move from places=2 (±0.005) to places=1
(±0.05). The generalised kernel's accumulation order differs at
ULP scale from the specialised paths, producing small final-VMAF
drift well below perceptual discriminability. Adopted per the
ADR-0142 Netflix-authority precedent: project rule #1 addresses
fork drift, not upstream-authored test updates the fork must track.

ADR-0141 touched-file cleanup on convolution_avx.c:
- Four convolution_f32_avx_s_1d_*_scanline helpers changed from
  external linkage to `static` (no other TU uses them after the
  specialised-path removal; upstream leaves them extern out of
  habit). Silences misc-use-internal-linkage warnings.
- Stride parameters widened from `int` to `ptrdiff_t` in the
  static helpers, with `(ptrdiff_t)` casts at public-wrapper
  multiplication sites. Silences 20+
  bugprone-implicit-widening-of-multiplication-result warnings.
- `#include <stddef.h>` added for ptrdiff_t.
- Zero clang-tidy warnings on the touched file after cleanup.

Verified: 32/32 meson tests pass. Netflix CPU golden leg
exercises both loosened assertions; they're on upstream's own
authority.

Deliverables:
- ADR-0143 + index row.
- rebase-notes.md entry 0036 (fork-delta catalogue for rebase).
- CHANGELOG Unreleased → Changed.
- libvmaf/src/feature/AGENTS.md rebase invariant (static linkage +
  ptrdiff_t helper signatures on rebase).
@lusoris lusoris force-pushed the port/upstream-f3a628b4-generalized-avx-convolve branch from a0c7a18 to a74538c Compare April 23, 2026 20:39
@lusoris lusoris merged commit 025a754 into master Apr 23, 2026
45 checks passed
@lusoris lusoris deleted the port/upstream-f3a628b4-generalized-avx-convolve branch April 23, 2026 20:55
@github-actions github-actions Bot mentioned this pull request Apr 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant