[ET-VK] Fix pack_fp_linear_weight for devices without VK_KHR_16bit_storage #18642
meta-codesync[bot] merged 1 commit into gh/SS-JIA/515/base from gh/SS-JIA/515/head
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18642
Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Pending, 2 Unrelated Failures as of commit 44b193b with merge base ad235f8.

NEW FAILURE - The following job has failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
meta-codesync[bot] merged commit c979227 into gh/SS-JIA/515/base
[ET-VK] Fix pack_fp_linear_weight for devices without VK_KHR_16bit_storage (#18653)

This PR was created by the merge bot to help merge the original PR into the main branch.

ghstack PR number: #18642 by @SS-JIA (use this as the source of truth for the PR details, comments, and reviews)
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/515/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/515/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/515/orig

Differential Revision: [D99133993](https://our.internmc.facebook.com/intern/diff/D99133993/)

@diff-train-skip-merge

Co-authored-by: ssjia <ssjia@devvm26340.ftw0.facebook.com>
Stack from ghstack (oldest at bottom):
The `pack_fp_linear_weight` prepack shader crashes on devices that lack `VK_KHR_16bit_storage` support because the half-precision variant reads from a `float16_t[]` staging buffer, which requires that extension.

This applies the same two-dtype pattern used by `nchw_to_image` and `conv2d_dw_prepack_weights`: a new `BUF_DTYPE` shader parameter allows the staging buffer to use float32 (the `[half, float]` combo) while the packed output remains half-precision. The runtime selects the correct variant via `get_staging_dtype_for()`, which returns `kFloat` when the device lacks fp16 buffer support.
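As a rough illustration of that selection logic, here is a minimal C++ sketch, not the actual implementation: the `get_staging_dtype_for()` name comes from this PR, but the `ScalarType` enum and the `supports_fp16_buffers` capability flag are hypothetical stand-ins for however the backend actually queries `VK_KHR_16bit_storage`.

```cpp
#include <cassert>

// Hypothetical dtype enum; the real backend defines its own scalar types.
enum class ScalarType { kFloat, kHalf };

// Hypothetical capability flag standing in for the real adapter query for
// VK_KHR_16bit_storage support.
struct DeviceCaps {
  bool supports_fp16_buffers;
};

// Mirrors the behavior described above: half-precision staging buffers fall
// back to float32 when the device cannot read float16_t[] storage buffers.
ScalarType get_staging_dtype_for(ScalarType tensor_dtype, const DeviceCaps& caps) {
  if (tensor_dtype == ScalarType::kHalf && !caps.supports_fp16_buffers) {
    return ScalarType::kFloat;
  }
  return tensor_dtype;
}

int main() {
  DeviceCaps no_fp16{/*supports_fp16_buffers=*/false};
  // Without 16-bit storage, a half tensor stages through a float buffer.
  assert(get_staging_dtype_for(ScalarType::kHalf, no_fp16) == ScalarType::kFloat);
  return 0;
}
```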
All three call sites that construct the `pack_fp_linear_weight` shader name (Linear.cpp, Conv1dPW.cpp, Conv2dPW.cpp) are updated to append the staging dtype suffix.
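For illustration, a sketch of what such a call site might look like after the change; the `add_dtype_suffix` helper and the exact suffix format are assumptions modeled on the shader-name convention described above, not copied from the actual sources.

```cpp
#include <iostream>
#include <string>

enum class ScalarType { kFloat, kHalf };

// Assumed suffix helper: appends a dtype tag to a shader variant name.
void add_dtype_suffix(std::string& kernel_name, ScalarType dtype) {
  kernel_name += (dtype == ScalarType::kHalf) ? "_half" : "_float";
}

// Sketch of a call site (e.g. in Linear.cpp): the packed-output dtype suffix
// is now followed by the staging-buffer dtype suffix chosen at runtime.
std::string make_pack_fp_linear_weight_name(
    ScalarType weight_dtype, ScalarType staging_dtype) {
  std::string kernel_name = "pack_fp_linear_weight";
  add_dtype_suffix(kernel_name, weight_dtype);   // packed output dtype
  add_dtype_suffix(kernel_name, staging_dtype);  // staging buffer (BUF_DTYPE)
  return kernel_name;
}

int main() {
  // On a device without VK_KHR_16bit_storage, the staging dtype resolves to
  // kFloat, selecting the [half, float] shader variant.
  std::cout << make_pack_fp_linear_weight_name(ScalarType::kHalf, ScalarType::kFloat)
            << "\n";  // prints: pack_fp_linear_weight_half_float
  return 0;
}
```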
Authored with Claude.
Differential Revision: D99133993