Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Refactors ms_ssim_vulkan to consume vulkan/kernel_template.h using two VmafVulkanKernelPipeline bundles (decimate + ssim) due to differing descriptor-set layouts, and updates project docs/changelog to document the multi-bundle + aliasing destroy rules.
Changes:
- Migrate
ms_ssim_vulkan.cpipeline setup/teardown to kernel-template bundles (pl_decimate,pl_ssim) and per-bundle descriptor pools. - Document the “multi-bundle + alias slot skip on destroy” invariant in Vulkan agent notes and rebase notes.
- Add a
CHANGELOG.mdentry describing the migration and invariants.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| libvmaf/src/vulkan/AGENTS.md | Documents the multi-bundle kernel pattern and alias-slot destroy-skip invariant. |
| libvmaf/src/feature/vulkan/ms_ssim_vulkan.c | Refactors pipeline creation/destruction and descriptor allocation to use kernel-template bundles. |
| docs/rebase-notes.md | Adds rebase entry detailing the 2-bundle migration invariants. |
| CHANGELOG.md | Adds release note describing the migration and behavioral invariants. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
222
to
+229
| VkComputePipelineCreateInfo cpci = { | ||
| .sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO, | ||
| .stage = | ||
| { | ||
| .sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO, | ||
| .stage = VK_SHADER_STAGE_COMPUTE_BIT, | ||
| .module = s->decimate_shader, | ||
| .pName = "main", | ||
| .pSpecializationInfo = &spec_info, | ||
| }, | ||
| .layout = s->decimate_pl, | ||
| }; | ||
| if (vkCreateComputePipelines(s->ctx->device, VK_NULL_HANDLE, 1, &cpci, NULL, out_pipeline) != | ||
| VK_SUCCESS) | ||
| return -ENOMEM; | ||
| return 0; | ||
| return vmaf_vulkan_kernel_pipeline_add_variant(s->ctx, &s->pl_decimate, &cpci, out_pipeline); |
Comment on lines
283
to
+290
| VkComputePipelineCreateInfo cpci = { | ||
| .sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO, | ||
| .stage = | ||
| { | ||
| .sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO, | ||
| .stage = VK_SHADER_STAGE_COMPUTE_BIT, | ||
| .module = s->ssim_shader, | ||
| .pName = "main", | ||
| .pSpecializationInfo = &spec_info, | ||
| }, | ||
| .layout = s->ssim_pl, | ||
| }; | ||
| if (vkCreateComputePipelines(s->ctx->device, VK_NULL_HANDLE, 1, &cpci, NULL, out_pipeline) != | ||
| VK_SUCCESS) | ||
| return -ENOMEM; | ||
| return 0; | ||
| return vmaf_vulkan_kernel_pipeline_add_variant(s->ctx, &s->pl_ssim, &cpci, out_pipeline); |
Comment on lines
+309
to
+316
| .pipeline_create_info = | ||
| { | ||
| .stage = | ||
| { | ||
| .pName = "main", | ||
| .pSpecializationInfo = &spec_info, | ||
| }, | ||
| }, |
Comment on lines
+346
to
+353
| .pipeline_create_info = | ||
| { | ||
| .stage = | ||
| { | ||
| .pName = "main", | ||
| .pSpecializationInfo = &spec_info, | ||
| }, | ||
| }, |
Comment on lines
+108
to
+110
| ~75 LOC of two `_pipeline_create()` + variant loops. | ||
| `close_fex()`'s 8×`vkDestroy*` sweep across two pipeline shapes | ||
| collapses to two `_pipeline_destroy()` calls plus the |
af40030 to
2ed2ec8
Compare
1f66be9 to
c03d90a
Compare
…template (2-bundle) State drops 7 long-lived pipeline-object fields (decimate_dsl, decimate_pl, decimate_shader, ssim_dsl, ssim_pl, ssim_shader, shared desc_pool) for two VmafVulkanKernelPipeline bundles (pl_decimate + pl_ssim) — distinct DSLs (2 vs 10 SSBOs) force two bundles since _add_variant only siblings under one layout. decimate_pipelines[0] aliases pl_decimate.pipeline (template's base = scale 0); the other 3 decimate slots are siblings via _add_variant. ssim_pipeline_horiz[0] aliases pl_ssim.pipeline (scale 0, pass 0); the other 9 ssim slots (4× horiz scales 1..4 + 5× vert scales 0..4) are variants. create_pipelines shrinks ~115 LOC of vkCreateDescriptorSetLayout/ vkCreatePipelineLayout/vkCreateShaderModule/vkCreateDescriptorPool boilerplate to ~75 LOC of two _pipeline_create + variant loops. close_fex's 8×vkDestroy sweep across two pipeline shapes collapses to two _pipeline_destroy calls plus the per-variant destroys. alloc_descriptor_set takes the bundle pointer (per-bundle pool + DSL) instead of the shared s->desc_pool. Net -18 LOC in the .c. Numerical: float_ms_ssim Netflix-pair smoke (576×324×48f) reports mean 0.963241, identical to pre-migration. Same shaders, spec constants, push constants, dispatch order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c03d90a to
cdf78ba
Compare
13 tasks
lusoris
pushed a commit
that referenced
this pull request
May 3, 2026
…it-side template + fence pool + descriptor pre-alloc Bundles three follow-up optimizations from the 2026-05-02 Vulkan profile (docs/development/vulkan-dedup-profile-2026-05-02.md) into one PR against libvmaf/src/vulkan/kernel_template.h plus the four kernel TUs that have already adopted the template on master: - T-GPU-DEDUP-25: migrate the four kernels onto new vmaf_vulkan_kernel_submit_acquire / _end_and_wait / _free helpers. - T-GPU-OPT-VK-1: VmafVulkanKernelSubmitPool pre-allocates VkFence + VkCommandBuffer pairs in init() and recycles them per frame via vkResetFences + vkResetCommandBuffer (the context-level command pool already carries VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT). - T-GPU-OPT-VK-4: vmaf_vulkan_kernel_descriptor_sets_alloc allocates descriptor sets once in init(); the per-frame path collapses to a single vkUpdateDescriptorSets at init() (buffers are init-time-stable for all four migrated kernels). Migrated kernels: - psnr_hvs_vulkan - vif_vulkan - float_vif_vulkan (worst offender for descriptor churn — 7 sets/frame) - float_adm_vulkan Out of scope (rationale in ADR-0252 §Alternatives considered): - ssimulacra2_vulkan: profile shows it is host-side CPU-bound (1 s/frame scalar XYB conversion to preserve the places=2 cross-backend gate), not Vulkan-API-bound. SIMD vectorisation (T-GPU-OPT-VK-2) is the right follow-up for that kernel. - adm_vulkan, ms_ssim_vulkan: their template-adoption PRs (#287, #289) have not yet merged to master. Migration follows once they land. Bit-exactness preserved: - 48-frame vmaf_float_v0.6.1 max abs diff vs prior Vulkan output: 0.0 - cross_backend_vif_diff places=4 vif: 0/48 mismatches across 4 scales - cross_backend_vif_diff places=4 adm: 0/48 mismatches across 4 scales - All 50 meson tests pass under NVIDIA proprietary ICD. Pre-existing CPU-vs-Vulkan psnr_hvs drift (max abs diff 8.3e-5) is unchanged by this PR. ADR-0252 + research-0051 + AGENTS.md update + CHANGELOG entry + rebase-notes entry 0125 ship in this PR. Closes T-GPU-DEDUP-25, T-GPU-OPT-VK-1, T-GPU-OPT-VK-4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
pushed a commit
that referenced
this pull request
May 3, 2026
…it-side template + fence pool + descriptor pre-alloc Bundles three follow-up optimizations from the 2026-05-02 Vulkan profile (docs/development/vulkan-dedup-profile-2026-05-02.md) into one PR against libvmaf/src/vulkan/kernel_template.h plus the four kernel TUs that have already adopted the template on master: - T-GPU-DEDUP-25: migrate the four kernels onto new vmaf_vulkan_kernel_submit_acquire / _end_and_wait / _free helpers. - T-GPU-OPT-VK-1: VmafVulkanKernelSubmitPool pre-allocates VkFence + VkCommandBuffer pairs in init() and recycles them per frame via vkResetFences + vkResetCommandBuffer (the context-level command pool already carries VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT). - T-GPU-OPT-VK-4: vmaf_vulkan_kernel_descriptor_sets_alloc allocates descriptor sets once in init(); the per-frame path collapses to a single vkUpdateDescriptorSets at init() (buffers are init-time-stable for all four migrated kernels). Migrated kernels: - psnr_hvs_vulkan - vif_vulkan - float_vif_vulkan (worst offender for descriptor churn — 7 sets/frame) - float_adm_vulkan Out of scope (rationale in ADR-0252 §Alternatives considered): - ssimulacra2_vulkan: profile shows it is host-side CPU-bound (1 s/frame scalar XYB conversion to preserve the places=2 cross-backend gate), not Vulkan-API-bound. SIMD vectorisation (T-GPU-OPT-VK-2) is the right follow-up for that kernel. - adm_vulkan, ms_ssim_vulkan: their template-adoption PRs (#287, #289) have not yet merged to master. Migration follows once they land. Bit-exactness preserved: - 48-frame vmaf_float_v0.6.1 max abs diff vs prior Vulkan output: 0.0 - cross_backend_vif_diff places=4 vif: 0/48 mismatches across 4 scales - cross_backend_vif_diff places=4 adm: 0/48 mismatches across 4 scales - All 50 meson tests pass under NVIDIA proprietary ICD. Pre-existing CPU-vs-Vulkan psnr_hvs drift (max abs diff 8.3e-5) is unchanged by this PR. ADR-0252 + research-0051 + AGENTS.md update + CHANGELOG entry + rebase-notes entry 0125 ship in this PR. Closes T-GPU-DEDUP-25, T-GPU-OPT-VK-1, T-GPU-OPT-VK-4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lusoris
added a commit
that referenced
this pull request
May 3, 2026
…it-side template + fence pool + descriptor pre-alloc (#319) * feat(vulkan): T-GPU-DEDUP-25 + T-GPU-OPT-VK-1 + T-GPU-OPT-VK-4 — submit-side template + fence pool + descriptor pre-alloc Bundles three follow-up optimizations from the 2026-05-02 Vulkan profile (docs/development/vulkan-dedup-profile-2026-05-02.md) into one PR against libvmaf/src/vulkan/kernel_template.h plus the four kernel TUs that have already adopted the template on master: - T-GPU-DEDUP-25: migrate the four kernels onto new vmaf_vulkan_kernel_submit_acquire / _end_and_wait / _free helpers. - T-GPU-OPT-VK-1: VmafVulkanKernelSubmitPool pre-allocates VkFence + VkCommandBuffer pairs in init() and recycles them per frame via vkResetFences + vkResetCommandBuffer (the context-level command pool already carries VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT). - T-GPU-OPT-VK-4: vmaf_vulkan_kernel_descriptor_sets_alloc allocates descriptor sets once in init(); the per-frame path collapses to a single vkUpdateDescriptorSets at init() (buffers are init-time-stable for all four migrated kernels). Migrated kernels: - psnr_hvs_vulkan - vif_vulkan - float_vif_vulkan (worst offender for descriptor churn — 7 sets/frame) - float_adm_vulkan Out of scope (rationale in ADR-0252 §Alternatives considered): - ssimulacra2_vulkan: profile shows it is host-side CPU-bound (1 s/frame scalar XYB conversion to preserve the places=2 cross-backend gate), not Vulkan-API-bound. SIMD vectorisation (T-GPU-OPT-VK-2) is the right follow-up for that kernel. - adm_vulkan, ms_ssim_vulkan: their template-adoption PRs (#287, #289) have not yet merged to master. Migration follows once they land. Bit-exactness preserved: - 48-frame vmaf_float_v0.6.1 max abs diff vs prior Vulkan output: 0.0 - cross_backend_vif_diff places=4 vif: 0/48 mismatches across 4 scales - cross_backend_vif_diff places=4 adm: 0/48 mismatches across 4 scales - All 50 meson tests pass under NVIDIA proprietary ICD. Pre-existing CPU-vs-Vulkan psnr_hvs drift (max abs diff 8.3e-5) is unchanged by this PR. ADR-0252 + research-0051 + AGENTS.md update + CHANGELOG entry + rebase-notes entry 0125 ship in this PR. Closes T-GPU-DEDUP-25, T-GPU-OPT-VK-1, T-GPU-OPT-VK-4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: re-trigger after canonical (N) deliverable labels --------- Co-authored-by: Lusoris <lusoris@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ms_ssim_vulkanbecomes a 2-bundle consumer ofvulkan/kernel_template.h. Two distinct pipeline shapes(decimate = 2 SSBO bindings, ssim = 10 bindings) prevent
collapsing to a single bundle —
_add_variant()only siblingspipelines under the same layout. State drops 7 long-lived
pipeline-object fields for two
VmafVulkanKernelPipelinebundles (
pl_decimate+pl_ssim) each owning their owndescriptor pool.
decimate_pipelines[0]aliasespl_decimate.pipeline(base =scale 0). The other 3 decimate slots are siblings via
_add_variant().ssim_pipeline_horiz[0]aliasespl_ssim.pipeline(scale 0, pass 0); the other 9 ssim slots arevariants.
Test plan
ninja -C build-vulkancleanmeson test test_vulkan_smoke test_vulkan_async_pending_fence test_vulkan_pic_preallocation— all greenpre-commit run --files <touched>cleanfloat_ms_ssimNetflix-pair smoke (576×324×48f) mean 0.963241, same as pre-migrationBase branch
This PR targets
refactor/vulkan-template-add-variant(#272) — uses_add_variant()which is not yet on master.Six deep-dive deliverables (ADR-0108)
no digest needed: routine refactorno alternatives: only-one-way fix(distinct DSLs force two bundles; same shape as ssimulacra2 batch follow-up)AGENTS.mdinvariant note:libvmaf/src/vulkan/AGENTS.md— multi-bundle kernels: per-bundle pool + alias-skip on destroy### Changed🤖 Generated with Claude Code