feat(batch-a): port four OPEN Netflix upstream PRs (ADR-0131/0132/0134/0135) by lusoris · Pull Request #72 · lusoris/vmaf

lusoris · 2026-04-20T20:52:28Z

Summary

Batch-A of the upstream small-fix sweep: ports substance from four open Netflix/vmaf PRs — each corrected during port where upstream had defects. All four upstream PRs are still unmerged as of 2026-04-20 (oldest is 21 months old, none has a merge signal), so the fork carries the fixes explicitly with ADRs documenting deviation.

T0-1 (CUDA) — [#1381] Change cuMemFreeAsync to cuMemFree in vmaf_cuda_picture_free() Netflix/vmaf#1382 cuMemFreeAsync → cuMemFree in vmaf_cuda_picture_free. Fixes the Assertion 0 failed crash reported in VMAF CUDA vmaf_cuda_picture_free() Assertion 0 failed Netflix/vmaf#1381 when two or more VMAF CUDA sessions free pictures concurrently. This unblocks running multiple VMAF instances in the same ffmpeg pipeline on nvidia, which has been failing with the multi-session data hazard until now. ADR-0131.
T4-4 (feature_collector) — Bugfix for adding/removing models. Netflix/vmaf#1406 mount/unmount list-corruption with ≥3 mounted models. Fork adds a correct local-cursor traversal and splits -ENOENT from -EINVAL. Test body refactored into shared 3-model helpers so the upstream inline 60-LoC duplication doesn't trip clang-tidy readability-function-size (JPL-P10 rule 4). ADR-0132.
T4-5 (meson) — build: declare dependency to use it as subproject Netflix/vmaf#1451 declare_dependency + override_dependency('libvmaf', ...) so the fork is consumable as a meson subproject via dependency('libvmaf'). ADR-0134.
T4-6 (public API) — Expose builtin model versions Netflix/vmaf#1424 vmaf_model_version_next iterator. Ported with three correctness corrections upstream omits: (1) NULL-pointer arithmetic UB via missing else, (2) off-by-one returning the {0} sentinel via idx < CNT, (3) const-qualifier mismatches in the test blocking -Werror. ADR-0135 documents each defect with C11 §§ references.

Six commits on the branch — four ports + one docs bundle + one clang-format fixup. All 30/30 meson tests pass locally; assertion-density + copyright headers green.

Type

fix — bug fix (T0-1, T4-4)
feat — new feature (T4-6 iterator)
build / ci — tooling / infra (T4-5)
port — cherry-pick from upstream Netflix/vmaf (all four)

Checklist

Commits follow Conventional Commits.
make format && make lint equivalent (pre-commit + scripts/ci/*) is green locally.
Unit tests pass: meson test -C build → 30/30 OK.
No SIMD/GPU numerical-path changes — T0-1 is a driver-API swap, not a kernel.
No new .c/.cpp/.cu files; license headers on touched upstream files preserved per ADR-0025.
Not a breaking change (iterator is purely additive).

Netflix golden-data gate (ADR-0024)

I did not modify any assertAlmostEqual(...) score in the Netflix golden Python tests.
No golden-value change claimed or needed.

Cross-backend numerical results

Not applicable — no kernel or scoring-path changes.

Deep-dive deliverables (ADR-0108)

Research digest — docs/research/0009-batch-a-upstream-port-strategy.md (why port OPEN PRs now, per-PR defect analysis, port-from-PR-tip rationale, batch-shape justification).
Decision matrix — each of ADR-0131 / 0132 / 0134 / 0135 carries its own ## Alternatives considered table with the wait-for-upstream / port-verbatim / port-with-corrections options weighed.
AGENTS.md invariant note — added new Rebase-sensitive invariants entry to libvmaf/src/cuda/AGENTS.md (picture_cuda.c sync-free contract) and a bullet under the existing section in libvmaf/src/feature/AGENTS.md (feature_collector mount/unmount traversal + test helpers).
Reproducer / smoke-test command — pasted below under "Reproducer".
CHANGELOG.md "lusoris fork" entry — two Added bullets (T4-5, T4-6) and two Fixed bullets (T0-1, T4-4) under the fork's Unreleased section.
Rebase note — docs/rebase-notes.md §0031 — full Touches / Invariant / Re-test with per-file conflict predictions and "keep fork version" resolution policy for when upstream merges any of the four PRs.

Reproducer

# Build without GPU to exercise T4-4, T4-5, T4-6 on any machine.
meson setup build libvmaf -Denable_cuda=false -Denable_sycl=false
ninja -C build
meson test -C build  # expect 30/30 OK

# T4-6 sanity: iterate the built-in model versions via the new API.
cat > /tmp/iter.c <<'C'
#include <stdio.h>
#include <libvmaf/model.h>
int main(void) {
    const void *h = NULL; const char *v = NULL; unsigned n = 0;
    while ((h = vmaf_model_version_next(h, &v)) != NULL) { printf("%s\n", v); n++; }
    printf("total: %u\n", n);
    return 0;
}
C
cc /tmp/iter.c -Ibuild/include -Lbuild/src -lvmaf -o /tmp/iter
LD_LIBRARY_PATH=build/src /tmp/iter

# T0-1 sanity (requires CUDA): concurrent-session tear-down no longer asserts.
# Reproduce Netflix#1381 scenario — two VMAF sessions on one GPU.
meson setup build-cuda libvmaf -Denable_cuda=true -Denable_sycl=false
ninja -C build-cuda
build-cuda/test/test_pic_preallocation  # exercises picture lifecycle

Known follow-ups

Ensure the parsing works properly with different locales Netflix/vmaf#1430 (thread-local RNG port, T4-3) is ~600 LoC and was held out of this batch; tracked as T4-3-standalone in the backlog audit.
When Netflix merges any of [#1381] Change cuMemFreeAsync to cuMemFree in vmaf_cuda_picture_free() Netflix/vmaf#1382 / Bugfix for adding/removing models. Netflix/vmaf#1406 / Expose builtin model versions Netflix/vmaf#1424 / build: declare dependency to use it as subproject Netflix/vmaf#1451, /sync-upstream will see conflicts in the four files carrying fork corrections — resolution is always "keep fork version" (rationale in rebase-notes §0031).

🤖 Generated with Claude Code

…p (ADR-0136) (#73) The ADR-0108 deep-dive checklist parser matched item labels literally (e.g. `AGENTS.md invariant note`), but the PR template wraps some of those labels in backticks/asterisks (e.g. `` - [ ] **`AGENTS.md` invariant note** —``). A PR author who ticked the template verbatim produced a body line where `.md` and `invariant note` were separated by a backtick + space, so the literal grep rejected conforming PRs. PR #72 tripped the exact failure on first CI — fixing the PR body "by hand" is the wrong layer. Fix: `tr -d '\`*_'` the body once before grepping, in both the parse and the diff-verification steps. Stripped chars do not appear inside any of the six deliverable labels, so the normalisation is loss-free for the matching path. Case-insensitive matching was already enabled. Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Replace cuMemFreeAsync(ptr, priv->cuda.str) with synchronous cuMemFree(ptr) at libvmaf/src/cuda/picture_cuda.c:247 to fix the assertion-0 crash in vmaf_cuda_picture_free() reported upstream as Netflix#1381 and addressed by the open upstream PR Netflix#1382. The fork already issues cuStreamSynchronize(priv->cuda.str) two lines earlier, so the async variant offered no overlap benefit - cuMemFree is both correct and non-regressive for performance. Fixes T0-1 in .workingdir2/BACKLOG.md. First commit of Batch-A upstream small-fix sweep. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…0132) Fix vmaf_feature_collector_mount_model list corruption on >=3 mounts (the previous **head traversal mutated the list instead of walking a local cursor) and align vmaf_feature_collector_unmount_model to return -ENOENT for not-found instead of -EINVAL. Test coverage extended to a 3-element mount/unmount sequence with insertion-order verification. Upstream duplicated the setup across both tests; refactored into a shared load_three_test_models / destroy_three_test_models helper to keep each test body under the JPL Power-of-10 rule-4 size threshold. T4-4 in .workingdir2/BACKLOG.md. Second commit of Batch-A. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ncy (ADR-0134) Appends declare_dependency(link_with: libvmaf, include_directories: [libvmaf_inc]) + meson.override_dependency('libvmaf', libvmaf_dep) to libvmaf/src/meson.build so consumers can use the fork as a meson subproject with the standard dependency('libvmaf') idiom. Fork uses trailing-comma style to match fork build-file conventions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Add public vmaf_model_version_next(prev, &version) iterator. Ports upstream PR#1424's API surface verbatim while correcting three latent defects during port: - NULL-pointer arithmetic UB on first call (upstream missed an else between the two if-branches, so NULL+1 on the second). - Off-by-one returning the {0} sentinel at end of iteration — condition must be idx+1 < CNT, not idx < CNT. - const-qualifier mismatches in the test (upstream used char* / void* against a const-qualified API, not allowed in C11). Early-returns NULL when BUILT_IN_MODEL_CNT == 0 so zero-models build configurations link cleanly. Test asserts the iterator both hands out the stored version pointer and visits every model exactly once. Doxygen header-doc replaces upstream's one-line comment. docs/api/index.md gains a programmatic-discovery example per the CLAUDE §12 r10 per-surface doc rule. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Bundles the five in-tree deep-dive deliverables for the Batch-A PR (ADR-0108 §Per-surface minimum bars; sixth deliverable — the reproducer/smoke-test — lives in the PR description). - docs/research/0009-batch-a-upstream-port-strategy.md: research digest covering why port OPEN upstream PRs now, per-PR defect analysis (three latent bugs corrected in Netflix#1424, test refactor in Netflix#1406), port-from-PR-tip rationale, and batch-shape justification. - CHANGELOG.md: Added entries for T4-5 (meson declare_dependency) and T4-6 (vmaf_model_version_next iterator); Fixed entries for T0-1 (CUDA cuMemFree) and T4-4 (feature_collector list corruption). - docs/rebase-notes.md §0031: Touches / Invariant / Re-test for all four ports with explicit "keep fork version on upstream conflict" resolution policy and file-by-file conflict predictions. - libvmaf/src/feature/AGENTS.md: new rebase-sensitive invariant for feature_collector mount/unmount traversal + shared test helpers. - libvmaf/src/cuda/AGENTS.md: new Rebase-sensitive invariants section for picture_cuda.c synchronous cuMemFree contract. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… port The `load_three_test_models` signature and one `mu_assert` split arguments across two lines where .clang-format prefers a single line — caught by pre-commit on the Batch-A deep-dive pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

lusoris mentioned this pull request Apr 20, 2026

ci(rule-enforcement): strip markdown emphasis before deliverables grep (ADR-0136) #73

Merged

6 tasks

Lusoris and others added 6 commits April 20, 2026 23:44

lusoris force-pushed the feat/batch-a-upstream-small-fix-sweep branch from 387c8f0 to 31559ba Compare April 20, 2026 21:46

lusoris merged commit df622ea into master Apr 20, 2026
41 of 42 checks passed

lusoris deleted the feat/batch-a-upstream-small-fix-sweep branch April 20, 2026 21:46

github-actions Bot mentioned this pull request Apr 20, 2026

chore: release master #1

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(batch-a): port four OPEN Netflix upstream PRs (ADR-0131/0132/0134/0135)#72

feat(batch-a): port four OPEN Netflix upstream PRs (ADR-0131/0132/0134/0135)#72
lusoris merged 6 commits intomasterfrom
feat/batch-a-upstream-small-fix-sweep

lusoris commented Apr 20, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

lusoris commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Type

Checklist

Netflix golden-data gate (ADR-0024)

Cross-backend numerical results

Deep-dive deliverables (ADR-0108)

Reproducer

Known follow-ups

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

lusoris commented Apr 20, 2026 •

edited

Loading