Skip to content

refactor(vulkan): T-GPU-DEDUP-23 — ms_ssim_vulkan migrated to kernel_template (2-bundle)#289

Merged
lusoris merged 1 commit intomasterfrom
refactor/ms-ssim-vulkan-template
May 3, 2026
Merged

refactor(vulkan): T-GPU-DEDUP-23 — ms_ssim_vulkan migrated to kernel_template (2-bundle)#289
lusoris merged 1 commit intomasterfrom
refactor/ms-ssim-vulkan-template

Conversation

@lusoris
Copy link
Copy Markdown
Owner

@lusoris lusoris commented May 2, 2026

Summary

ms_ssim_vulkan becomes a 2-bundle consumer of
vulkan/kernel_template.h. Two distinct pipeline shapes
(decimate = 2 SSBO bindings, ssim = 10 bindings) prevent
collapsing to a single bundle — _add_variant() only siblings
pipelines under the same layout. State drops 7 long-lived
pipeline-object fields for two VmafVulkanKernelPipeline
bundles (pl_decimate + pl_ssim) each owning their own
descriptor pool.

decimate_pipelines[0] aliases pl_decimate.pipeline (base =
scale 0). The other 3 decimate slots are siblings via
_add_variant(). ssim_pipeline_horiz[0] aliases
pl_ssim.pipeline (scale 0, pass 0); the other 9 ssim slots are
variants.

Test plan

  • ninja -C build-vulkan clean
  • meson test test_vulkan_smoke test_vulkan_async_pending_fence test_vulkan_pic_preallocation — all green
  • pre-commit run --files <touched> clean
  • Numerical: float_ms_ssim Netflix-pair smoke (576×324×48f) mean 0.963241, same as pre-migration

Base branch

This PR targets refactor/vulkan-template-add-variant (#272) — uses _add_variant() which is not yet on master.

Six deep-dive deliverables (ADR-0108)

  • (1) Research digest: no digest needed: routine refactor
  • (2) Decision matrix: no alternatives: only-one-way fix (distinct DSLs force two bundles; same shape as ssimulacra2 batch follow-up)
  • (3) AGENTS.md invariant note: libvmaf/src/vulkan/AGENTS.md — multi-bundle kernels: per-bundle pool + alias-skip on destroy
  • (4) Reproducer / smoke-test command: see Test plan above
  • (5) CHANGELOG fragment: entry under ### Changed
  • (6) Rebase note: 0123

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 2, 2026 14:22
@lusoris lusoris marked this pull request as draft May 2, 2026 14:29
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Refactors ms_ssim_vulkan to consume vulkan/kernel_template.h using two VmafVulkanKernelPipeline bundles (decimate + ssim) due to differing descriptor-set layouts, and updates project docs/changelog to document the multi-bundle + aliasing destroy rules.

Changes:

  • Migrate ms_ssim_vulkan.c pipeline setup/teardown to kernel-template bundles (pl_decimate, pl_ssim) and per-bundle descriptor pools.
  • Document the “multi-bundle + alias slot skip on destroy” invariant in Vulkan agent notes and rebase notes.
  • Add a CHANGELOG.md entry describing the migration and invariants.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
libvmaf/src/vulkan/AGENTS.md Documents the multi-bundle kernel pattern and alias-slot destroy-skip invariant.
libvmaf/src/feature/vulkan/ms_ssim_vulkan.c Refactors pipeline creation/destruction and descriptor allocation to use kernel-template bundles.
docs/rebase-notes.md Adds rebase entry detailing the 2-bundle migration invariants.
CHANGELOG.md Adds release note describing the migration and behavioral invariants.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 222 to +229
VkComputePipelineCreateInfo cpci = {
.sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO,
.stage =
{
.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
.stage = VK_SHADER_STAGE_COMPUTE_BIT,
.module = s->decimate_shader,
.pName = "main",
.pSpecializationInfo = &spec_info,
},
.layout = s->decimate_pl,
};
if (vkCreateComputePipelines(s->ctx->device, VK_NULL_HANDLE, 1, &cpci, NULL, out_pipeline) !=
VK_SUCCESS)
return -ENOMEM;
return 0;
return vmaf_vulkan_kernel_pipeline_add_variant(s->ctx, &s->pl_decimate, &cpci, out_pipeline);
Comment on lines 283 to +290
VkComputePipelineCreateInfo cpci = {
.sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO,
.stage =
{
.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
.stage = VK_SHADER_STAGE_COMPUTE_BIT,
.module = s->ssim_shader,
.pName = "main",
.pSpecializationInfo = &spec_info,
},
.layout = s->ssim_pl,
};
if (vkCreateComputePipelines(s->ctx->device, VK_NULL_HANDLE, 1, &cpci, NULL, out_pipeline) !=
VK_SUCCESS)
return -ENOMEM;
return 0;
return vmaf_vulkan_kernel_pipeline_add_variant(s->ctx, &s->pl_ssim, &cpci, out_pipeline);
Comment on lines +309 to +316
.pipeline_create_info =
{
.stage =
{
.pName = "main",
.pSpecializationInfo = &spec_info,
},
},
Comment on lines +346 to +353
.pipeline_create_info =
{
.stage =
{
.pName = "main",
.pSpecializationInfo = &spec_info,
},
},
Comment thread CHANGELOG.md
Comment on lines +108 to +110
~75 LOC of two `_pipeline_create()` + variant loops.
`close_fex()`'s 8×`vkDestroy*` sweep across two pipeline shapes
collapses to two `_pipeline_destroy()` calls plus the
@lusoris lusoris force-pushed the refactor/vulkan-template-add-variant branch from af40030 to 2ed2ec8 Compare May 2, 2026 15:00
Base automatically changed from refactor/vulkan-template-add-variant to master May 2, 2026 15:21
@lusoris lusoris marked this pull request as ready for review May 2, 2026 20:29
@lusoris lusoris force-pushed the refactor/ms-ssim-vulkan-template branch 2 times, most recently from 1f66be9 to c03d90a Compare May 2, 2026 21:17
@lusoris lusoris marked this pull request as draft May 2, 2026 21:26
@lusoris lusoris marked this pull request as ready for review May 2, 2026 23:16
…template (2-bundle)

State drops 7 long-lived pipeline-object fields (decimate_dsl,
decimate_pl, decimate_shader, ssim_dsl, ssim_pl, ssim_shader,
shared desc_pool) for two VmafVulkanKernelPipeline bundles
(pl_decimate + pl_ssim) — distinct DSLs (2 vs 10 SSBOs) force two
bundles since _add_variant only siblings under one layout.

decimate_pipelines[0] aliases pl_decimate.pipeline (template's
base = scale 0); the other 3 decimate slots are siblings via
_add_variant. ssim_pipeline_horiz[0] aliases pl_ssim.pipeline
(scale 0, pass 0); the other 9 ssim slots (4× horiz scales 1..4 +
5× vert scales 0..4) are variants.

create_pipelines shrinks ~115 LOC of vkCreateDescriptorSetLayout/
vkCreatePipelineLayout/vkCreateShaderModule/vkCreateDescriptorPool
boilerplate to ~75 LOC of two _pipeline_create + variant loops.
close_fex's 8×vkDestroy sweep across two pipeline shapes collapses
to two _pipeline_destroy calls plus the per-variant destroys.
alloc_descriptor_set takes the bundle pointer (per-bundle pool +
DSL) instead of the shared s->desc_pool. Net -18 LOC in the .c.

Numerical: float_ms_ssim Netflix-pair smoke (576×324×48f) reports
mean 0.963241, identical to pre-migration. Same shaders, spec
constants, push constants, dispatch order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lusoris lusoris force-pushed the refactor/ms-ssim-vulkan-template branch from c03d90a to cdf78ba Compare May 2, 2026 23:16
@lusoris lusoris merged commit e204d47 into master May 3, 2026
54 checks passed
@lusoris lusoris deleted the refactor/ms-ssim-vulkan-template branch May 3, 2026 00:34
lusoris pushed a commit that referenced this pull request May 3, 2026
…it-side template + fence pool + descriptor pre-alloc

Bundles three follow-up optimizations from the 2026-05-02 Vulkan profile
(docs/development/vulkan-dedup-profile-2026-05-02.md) into one PR against
libvmaf/src/vulkan/kernel_template.h plus the four kernel TUs that have
already adopted the template on master:

- T-GPU-DEDUP-25: migrate the four kernels onto new
  vmaf_vulkan_kernel_submit_acquire / _end_and_wait / _free helpers.
- T-GPU-OPT-VK-1: VmafVulkanKernelSubmitPool pre-allocates VkFence +
  VkCommandBuffer pairs in init() and recycles them per frame via
  vkResetFences + vkResetCommandBuffer (the context-level command pool
  already carries VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT).
- T-GPU-OPT-VK-4: vmaf_vulkan_kernel_descriptor_sets_alloc allocates
  descriptor sets once in init(); the per-frame path collapses to a
  single vkUpdateDescriptorSets at init() (buffers are init-time-stable
  for all four migrated kernels).

Migrated kernels:
- psnr_hvs_vulkan
- vif_vulkan
- float_vif_vulkan (worst offender for descriptor churn — 7 sets/frame)
- float_adm_vulkan

Out of scope (rationale in ADR-0252 §Alternatives considered):
- ssimulacra2_vulkan: profile shows it is host-side CPU-bound (1 s/frame
  scalar XYB conversion to preserve the places=2 cross-backend gate),
  not Vulkan-API-bound. SIMD vectorisation (T-GPU-OPT-VK-2) is the right
  follow-up for that kernel.
- adm_vulkan, ms_ssim_vulkan: their template-adoption PRs (#287, #289)
  have not yet merged to master. Migration follows once they land.

Bit-exactness preserved:
- 48-frame vmaf_float_v0.6.1 max abs diff vs prior Vulkan output: 0.0
- cross_backend_vif_diff places=4 vif: 0/48 mismatches across 4 scales
- cross_backend_vif_diff places=4 adm: 0/48 mismatches across 4 scales
- All 50 meson tests pass under NVIDIA proprietary ICD.

Pre-existing CPU-vs-Vulkan psnr_hvs drift (max abs diff 8.3e-5) is
unchanged by this PR.

ADR-0252 + research-0051 + AGENTS.md update + CHANGELOG entry +
rebase-notes entry 0125 ship in this PR.

Closes T-GPU-DEDUP-25, T-GPU-OPT-VK-1, T-GPU-OPT-VK-4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris pushed a commit that referenced this pull request May 3, 2026
…it-side template + fence pool + descriptor pre-alloc

Bundles three follow-up optimizations from the 2026-05-02 Vulkan profile
(docs/development/vulkan-dedup-profile-2026-05-02.md) into one PR against
libvmaf/src/vulkan/kernel_template.h plus the four kernel TUs that have
already adopted the template on master:

- T-GPU-DEDUP-25: migrate the four kernels onto new
  vmaf_vulkan_kernel_submit_acquire / _end_and_wait / _free helpers.
- T-GPU-OPT-VK-1: VmafVulkanKernelSubmitPool pre-allocates VkFence +
  VkCommandBuffer pairs in init() and recycles them per frame via
  vkResetFences + vkResetCommandBuffer (the context-level command pool
  already carries VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT).
- T-GPU-OPT-VK-4: vmaf_vulkan_kernel_descriptor_sets_alloc allocates
  descriptor sets once in init(); the per-frame path collapses to a
  single vkUpdateDescriptorSets at init() (buffers are init-time-stable
  for all four migrated kernels).

Migrated kernels:
- psnr_hvs_vulkan
- vif_vulkan
- float_vif_vulkan (worst offender for descriptor churn — 7 sets/frame)
- float_adm_vulkan

Out of scope (rationale in ADR-0252 §Alternatives considered):
- ssimulacra2_vulkan: profile shows it is host-side CPU-bound (1 s/frame
  scalar XYB conversion to preserve the places=2 cross-backend gate),
  not Vulkan-API-bound. SIMD vectorisation (T-GPU-OPT-VK-2) is the right
  follow-up for that kernel.
- adm_vulkan, ms_ssim_vulkan: their template-adoption PRs (#287, #289)
  have not yet merged to master. Migration follows once they land.

Bit-exactness preserved:
- 48-frame vmaf_float_v0.6.1 max abs diff vs prior Vulkan output: 0.0
- cross_backend_vif_diff places=4 vif: 0/48 mismatches across 4 scales
- cross_backend_vif_diff places=4 adm: 0/48 mismatches across 4 scales
- All 50 meson tests pass under NVIDIA proprietary ICD.

Pre-existing CPU-vs-Vulkan psnr_hvs drift (max abs diff 8.3e-5) is
unchanged by this PR.

ADR-0252 + research-0051 + AGENTS.md update + CHANGELOG entry +
rebase-notes entry 0125 ship in this PR.

Closes T-GPU-DEDUP-25, T-GPU-OPT-VK-1, T-GPU-OPT-VK-4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris added a commit that referenced this pull request May 3, 2026
…it-side template + fence pool + descriptor pre-alloc (#319)

* feat(vulkan): T-GPU-DEDUP-25 + T-GPU-OPT-VK-1 + T-GPU-OPT-VK-4 — submit-side template + fence pool + descriptor pre-alloc

Bundles three follow-up optimizations from the 2026-05-02 Vulkan profile
(docs/development/vulkan-dedup-profile-2026-05-02.md) into one PR against
libvmaf/src/vulkan/kernel_template.h plus the four kernel TUs that have
already adopted the template on master:

- T-GPU-DEDUP-25: migrate the four kernels onto new
  vmaf_vulkan_kernel_submit_acquire / _end_and_wait / _free helpers.
- T-GPU-OPT-VK-1: VmafVulkanKernelSubmitPool pre-allocates VkFence +
  VkCommandBuffer pairs in init() and recycles them per frame via
  vkResetFences + vkResetCommandBuffer (the context-level command pool
  already carries VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT).
- T-GPU-OPT-VK-4: vmaf_vulkan_kernel_descriptor_sets_alloc allocates
  descriptor sets once in init(); the per-frame path collapses to a
  single vkUpdateDescriptorSets at init() (buffers are init-time-stable
  for all four migrated kernels).

Migrated kernels:
- psnr_hvs_vulkan
- vif_vulkan
- float_vif_vulkan (worst offender for descriptor churn — 7 sets/frame)
- float_adm_vulkan

Out of scope (rationale in ADR-0252 §Alternatives considered):
- ssimulacra2_vulkan: profile shows it is host-side CPU-bound (1 s/frame
  scalar XYB conversion to preserve the places=2 cross-backend gate),
  not Vulkan-API-bound. SIMD vectorisation (T-GPU-OPT-VK-2) is the right
  follow-up for that kernel.
- adm_vulkan, ms_ssim_vulkan: their template-adoption PRs (#287, #289)
  have not yet merged to master. Migration follows once they land.

Bit-exactness preserved:
- 48-frame vmaf_float_v0.6.1 max abs diff vs prior Vulkan output: 0.0
- cross_backend_vif_diff places=4 vif: 0/48 mismatches across 4 scales
- cross_backend_vif_diff places=4 adm: 0/48 mismatches across 4 scales
- All 50 meson tests pass under NVIDIA proprietary ICD.

Pre-existing CPU-vs-Vulkan psnr_hvs drift (max abs diff 8.3e-5) is
unchanged by this PR.

ADR-0252 + research-0051 + AGENTS.md update + CHANGELOG entry +
rebase-notes entry 0125 ship in this PR.

Closes T-GPU-DEDUP-25, T-GPU-OPT-VK-1, T-GPU-OPT-VK-4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: re-trigger after canonical (N) deliverable labels

---------

Co-authored-by: Lusoris <lusoris@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants