
refactor(gpu): land per-backend kernel scaffolding templates (CUDA + Vulkan, no migrations)#229

Closed
lusoris wants to merge 0 commits into master from refactor/gpu-kernel-templates

Conversation


@lusoris lusoris commented Apr 29, 2026

Summary

  • Introduces per-backend GPU kernel scaffolding templates as header-only inline helpers — libvmaf/src/cuda/kernel_template.h (296 LOC) and libvmaf/src/vulkan/kernel_template.h (410 LOC) — that absorb the lifecycle boilerplate every fork-added GPU feature kernel currently re-implements by hand (CUDA: private non-blocking stream + 2 events + device-accumulator + pinned-readback; Vulkan: descriptor-set layout + pipeline + descriptor pool + per-frame command-buffer + fence).
  • Templates only — no kernel migrations. Each future kernel migration ships in its own PR gated by the places=4 cross-backend-diff lane (per ADR-0214) plus the Netflix CPU golden gate. Three deferred-migration follow-up T-rows added to CHANGELOG.
  • Per-backend (not cross-backend) because CUDA's async-stream + event model and Vulkan's command-buffer + fence + descriptor-pool model share no concrete shape. Helper functions (not macros) for cuda-gdb / Nsight / RenderDoc step-debugging.

Test plan

  • CUDA full build: meson setup libvmaf/build-cuda libvmaf -Denable_cuda=true -Denable_nvcc=true -Denable_vulkan=disabled -Denable_sycl=false && ninja -C libvmaf/build-cuda — green.
  • CUDA tests: meson test -C libvmaf/build-cuda — 45/45 passing.
  • Vulkan full build: meson setup libvmaf/build-vulkan libvmaf -Denable_vulkan=enabled -Denable_cuda=false -Denable_sycl=false && ninja -C libvmaf/build-vulkan — green.
  • Vulkan tests: meson test -C libvmaf/build-vulkan — 41/41 passing.
  • Header smoke compile (templates instantiated in isolated TU) — both compile cleanly.
  • pre-commit run --files on every touched file — all checks pass (trailing whitespace, EOF, clang-format, copyright headers, conventional commit).
  • No Netflix golden assertions touched.
  • No existing kernel implementations touched.

Deep-dive deliverables (ADR-0108)

  • Research digest — not needed: refactor of an established pattern; cites the sister GPU-template scope-analysis report.
  • Decision matrix — ADR-0221 § Alternatives considered (per-backend vs cross-backend; templates-only vs templates+migrations; macros vs helpers).
  • AGENTS.md invariant note — kernel-template contract row in libvmaf/src/cuda/AGENTS.md and a new libvmaf/src/vulkan/AGENTS.md.
  • Reproducer / smoke-test command — see Test plan.
  • CHANGELOG.md entry — Unreleased § Added.
  • Rebase note — docs/rebase-notes.md entry 0095.

🤖 Generated with Claude Code

lusoris pushed a commit that referenced this pull request Apr 30, 2026
…ghtning

Initial pin `pip<26.1` was insufficient: pip 26.0.1 has the same
"No matching distribution for lightning<3.0,>=2.5" regression as 26.1.
Verified on this PR's own run (job 73817357146): venv installed
pip-26.0.1 (under 26.1), then `pip install -e ai` failed identically.

The runner image ships pip 24.0 pre-installed (PR #229 baseline,
2026-04-29, succeeded). Drop the `--upgrade pip` line so the venv
inherits 24.0 and skips the broken 26.x release entirely. Re-introduce
the upgrade once a pip release lands that resolves lightning correctly.
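
As an illustration only (the workflow path and step name here are assumptions, not the repo's actual file), the resulting venv step would look roughly like:

```yaml
# Hypothetical excerpt of the Tiny AI workflow -- step and file names
# are illustrative, not copied from this repo.
- name: Set up Tiny AI venv
  run: |
    python -m venv .venv
    source .venv/bin/activate
    # Dropped: python -m pip install --upgrade pip
    # Keeping the runner image's pre-installed pip 24.0 means the venv
    # never picks up the 26.x releases that fail to resolve
    # lightning<3.0,>=2.5.
    pip install -e ai
```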

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request Apr 30, 2026
* fix(ci): pin pip <26.1 in Tiny AI workflow — lightning resolution regression

pip 26.1 (released 2026-04-30) regresses transitive resolution for
`lightning>=2.5,<3.0` and fails the `pip install -e ai` step in the
Tiny AI workflow with:

  ERROR: Could not find a version that satisfies the requirement
         lightning<3.0,>=2.5 (from vmaf-train) (from versions: none)

This blocks every PR's Tiny AI gate. Concrete cases observed:

- PR #213 same-SHA reruns: 12:41 UTC pass (pip 24.0) → 15:54 UTC fail
  (pip 26.1, after `pip install --upgrade pip` pulled the new release).
- PR #229 yesterday: passed with pip 24.0, lightning-2.6.1 resolved
  cleanly.

The PR diff itself doesn't touch `ai/`, so this is purely an upstream
pip regression breaking a working workflow.

Pin to `pip<26.1` in the Tiny AI venv until pip ships a fix. Other
workflows that don't depend on `lightning` resolution are left alone
to keep this change narrow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): drop pip upgrade in Tiny AI venv — pip 26.x can't resolve lightning


---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request Apr 30, 2026
* fix(ai): switch lightning → pytorch-lightning (PyPI 404)

Lightning AI un-published the `lightning` distribution from PyPI on
2026-04-30. Verified:

  $ curl -sSI https://pypi.org/pypi/lightning/json
  HTTP/2 404
  $ curl -sSI https://pypi.org/pypi/pytorch-lightning/json
  HTTP/2 200    # version 2.6.1 — same wheel that pip resolved yesterday

Yesterday's PR #229 successfully downloaded `lightning-2.6.1-py3-none-any.whl`;
today every PR's Tiny AI gate fails with "No matching distribution found
for lightning<3.0,>=2.5 (from versions: none)" — the resolver is correctly
reporting the 404. PR #231's earlier `pip<26.1` pin rested on a
misdiagnosis (a presumed pip resolver bug); the pin is harmless but
unnecessary now and will be dropped in a follow-up.

`pytorch-lightning >= 2.0` ships both `import lightning` and
`import pytorch_lightning` entry points, so `import lightning as L` in
ai/src/vmaf_train/{datamodule,train,models/*}.py and the train scripts
continues to work without source changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ai): migrate `import lightning` → `import pytorch_lightning`

The previous commit on this branch swapped the PyPI distribution name
(`lightning>=2.5,<3.0` → `pytorch-lightning>=2.5,<3.0`) under the
incorrect assumption that pytorch-lightning ships both the
`pytorch_lightning` and `lightning` import namespaces. CI proved
otherwise: pytest collection errors on this branch with
`ModuleNotFoundError: No module named 'lightning'` from
ai/tests/test_export_roundtrip.py and ai/tests/test_registry.py
(import-side test discovery; these import from vmaf_train).

The pytorch-lightning distribution only exposes `pytorch_lightning`
as a top-level import. The `lightning` shim was exclusive to the
now-404 `lightning` distribution.

Migrated all six call sites:
  ai/src/vmaf_train/datamodule.py
  ai/src/vmaf_train/train.py            (also `lightning.pytorch.callbacks`
                                         → `pytorch_lightning.callbacks`)
  ai/src/vmaf_train/models/fr_regressor.py
  ai/src/vmaf_train/models/nr_metric.py
  ai/src/vmaf_train/models/learned_filter.py
  ai/scripts/train_konvid.py            (same callbacks rename)

API surface is unchanged — pytorch_lightning >= 2.0 has the same
LightningModule / LightningDataModule / Trainer / callbacks API as
the lightning distribution did. Updated the pyproject comment to
reflect the namespace reality.
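
The rename is mechanical enough that it could be scripted; a minimal
sketch (hypothetical helper, not a tool from this repo) of the rewrite
rules, applying the specific `lightning.pytorch.callbacks` rename first:

```python
import re

# Order matters: rewrite the specific submodule path before the bare
# module name, so `lightning.pytorch.callbacks` is not half-renamed.
_RULES = [
    (re.compile(r"\blightning\.pytorch\.callbacks\b"), "pytorch_lightning.callbacks"),
    (re.compile(r"\bimport lightning\b"), "import pytorch_lightning"),
    (re.compile(r"\bfrom lightning\b"), "from pytorch_lightning"),
]

def migrate(source: str) -> str:
    """Rewrite `import lightning` usages to `pytorch_lightning`."""
    for pattern, replacement in _RULES:
        source = pattern.sub(replacement, source)
    return source
```

For example, `migrate("import lightning as L")` yields
`"import pytorch_lightning as L"`, so aliased call sites need no other
source changes.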

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 1, 2026 18:13
@lusoris lusoris closed this May 1, 2026
@lusoris lusoris force-pushed the refactor/gpu-kernel-templates branch from ad0c29a to 3130ca4 on May 1, 2026 18:13

Copilot AI left a comment


Copilot wasn't able to review any files in this pull request.



lusoris pushed a commit that referenced this pull request May 3, 2026
Scaffolds the Phase E ladder generator (ADR-0277) — the highest-leverage
gap surfaced by PR #354's capability audit (Bucket #6). Mirrors the
Netflix per-title encoding paper: sample (resolution × target-VMAF),
take the Pareto upper-convex hull on (bitrate, vmaf), pick n rungs along
the hull, emit an HLS / DASH / JSON manifest.

Currently scaffold-only: the production sampler that drives Phase B's
target-VMAF bisect (PR #347) lands once that PR merges. Default sampler
raises NotImplementedError; tests inject a synthetic stub modelled on
the Netflix paper's R-D curves.

- New module tools/vmaf-tune/src/vmaftune/ladder.py — build_ladder,
  convex_hull (Pareto filter + diminishing-returns envelope),
  select_knees (log-bitrate or VMAF spacing), emit_manifest (HLS / DASH
  / JSON), and a build_and_emit convenience.
- New `vmaf-tune ladder` CLI subcommand with the canonical 5-rung
  1080p/720p/480p/360p/240p default rendition set.
- 15 new ladder tests (28 total in tools/vmaf-tune/tests/) covering hull
  correctness on a synthetic Netflix-paper-shaped cloud, knee selection
  invariants, and HLS / DASH / JSON manifest emit shape.
- ADR-0277 (Proposed; flips to Accepted once Phase B integration PR
  lands and a real-corpus PLCC validation digest reports the delta).
- Research-0054 surveys the algorithm space (Netflix per-title paper,
  Apple HLS authoring spec, JND-spaced, BO sampling).
- docs/usage/vmaf-tune.md gains a "Per-title ladder (Phase E)" section
  with the canonical invocation.
- CHANGELOG, rebase-notes (#229), AGENTS.md invariant note.
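
As a sketch of the hull-and-rung idea (an assumption-laden illustration;
the actual `convex_hull` / `select_knees` in vmaftune/ladder.py may
differ in signature and detail):

```python
import math

def _cross(o, a, b):
    # > 0: left turn, < 0: right turn, 0: collinear.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Upper convex hull of (bitrate, vmaf) samples, in bitrate order.
    Points on or below the envelope (dominated or diminishing-returns
    interior points) are dropped."""
    hull = []
    for p in sorted(set(points)):
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

def select_knees(hull, n):
    """Pick n rungs spaced evenly in log-bitrate along the hull,
    snapping each target to the nearest hull point."""
    if n >= len(hull):
        return list(hull)
    if n == 1:
        return [hull[-1]]
    lo, hi = math.log(hull[0][0]), math.log(hull[-1][0])
    rungs = []
    for i in range(n):
        target = lo + i * (hi - lo) / (n - 1)
        best = min(hull, key=lambda p: abs(math.log(p[0]) - target))
        if best not in rungs:
            rungs.append(best)
    return rungs
```

Endpoint rungs are always the cheapest and richest hull points; interior
rungs snap to sampled hull points rather than interpolating between them.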

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 5, 2026
lusoris pushed a commit that referenced this pull request May 5, 2026
lusoris added a commit that referenced this pull request May 5, 2026
…er) (#371)

* feat(tools): vmaf-tune Phase E — per-title bitrate ladder (game-changer)

* chore(docs): renumber phase-e ADR 0277→0295 + research 0066→0068 (collisions)

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 7, 2026
lusoris added a commit that referenced this pull request May 7, 2026
…er) (#433)

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
