-
Notifications
You must be signed in to change notification settings - Fork 722
[ET-VK] Better separation of quantized vs non-quantized memory layouts #15794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
As title. The current implementation of split_with_sizes uses functions from the `Copy.[h|cpp]` file in particular `add_copy_channel_offset_node`. However, the shaders dispatched by this function have a critical bug where the output tensor is passed in separately with difference access types, i.e.
```cpp
graph.execute_nodes().emplace_back(new DispatchNode(
graph,
VK_KERNEL_FROM_STR(kernel_name),
global_size,
local_size,
// Inputs and Outputs
{
{out, vkapi::kWrite},
{out, vkapi::kRead},
{in, vkapi::kRead},
},
```
This creates many validation layer errors because the memory barriers for the resource cannot be formed properly. The shader essentially relies on undefined behaviour to work correctly
To fix, this diff re-implements the operator from scratch with a dedicated compute shader.
Differential Revision: [D86910642](https://our.internmc.facebook.com/intern/diff/D86910642/)
[ghstack-poisoned]
As title. Make sure that ops that do not support quantized tensors do not get assigned memory layouts that are intended for quantized tensors. Differential Revision: [D86910639](https://our.internmc.facebook.com/intern/diff/D86910639/) [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15794
Note: Links to docs will display an error until the docs builds have been completed. ❌ 10 New Failures, 4 Unrelated FailuresAs of commit 4babf9c with merge base 7600df8 ( NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
…non-quantized memory layouts" As title. Make sure that ops that do not support quantized tensors do not get assigned memory layouts that are intended for quantized tensors. Differential Revision: [D86910639](https://our.internmc.facebook.com/intern/diff/D86910639/) [ghstack-poisoned]
…mory layouts" As title. Make sure that ops that do not support quantized tensors do not get assigned memory layouts that are intended for quantized tensors. Differential Revision: [D86910639](https://our.internmc.facebook.com/intern/diff/D86910639/) [ghstack-poisoned]
…non-quantized memory layouts" As title. Make sure that ops that do not support quantized tensors do not get assigned memory layouts that are intended for quantized tensors. Differential Revision: [D86910639](https://our.internmc.facebook.com/intern/diff/D86910639/) [ghstack-poisoned]
…mory layouts" As title. Make sure that ops that do not support quantized tensors do not get assigned memory layouts that are intended for quantized tensors. Differential Revision: [D86910639](https://our.internmc.facebook.com/intern/diff/D86910639/) [ghstack-poisoned]
…non-quantized memory layouts" As title. Make sure that ops that do not support quantized tensors do not get assigned memory layouts that are intended for quantized tensors. Differential Revision: [D86910639](https://our.internmc.facebook.com/intern/diff/D86910639/) [ghstack-poisoned]
…mory layouts" As title. Make sure that ops that do not support quantized tensors do not get assigned memory layouts that are intended for quantized tensors. Differential Revision: [D86910639](https://our.internmc.facebook.com/intern/diff/D86910639/) [ghstack-poisoned]
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #15829 * #15796 * #15795 * #15794 * __->__ #15793 As title. The current implementation of split_with_sizes uses functions from the `Copy.[h|cpp]` file in particular `add_copy_channel_offset_node`. However, the shaders dispatched by this function have a critical bug where the output tensor is passed in separately with difference access types, i.e. ```cpp graph.execute_nodes().emplace_back(new DispatchNode( graph, VK_KERNEL_FROM_STR(kernel_name), global_size, local_size, // Inputs and Outputs { {out, vkapi::kWrite}, {out, vkapi::kRead}, {in, vkapi::kRead}, }, ``` This creates many validation layer errors because the memory barriers for the resource cannot be formed properly. The shader essentially relies on undefined behaviour to work correctly To fix, this diff re-implements the operator from scratch with a dedicated compute shader. Differential Revision: [D86910642](https://our.internmc.facebook.com/intern/diff/D86910642/) --------- Co-authored-by: ssjia <ssjia@devvm26340.ftw0.facebook.com>
) Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #15829 * #15796 * __->__ #15795 * #15794 * #15793 Title says it all! Add two additional export options: 1. `skip_memory_planning` - skips the memory planning pass which can be useful for debugging. 2. `small_texture_limits` - sets the default texture limit to be (2048, 2048, 2048) which is compatible with more devices (i.e. desktop/laptop GPUs) compared to the default (16384, 16384, 2048) which is more targeted for mobile GPUs Also adds some improvements to the export script that were made while debugging the `YOLO_NAS` model (#15700) Differential Revision: [D86910640](https://our.internmc.facebook.com/intern/diff/D86910640/) --------- Co-authored-by: ssjia <ssjia@devvm26340.ftw0.facebook.com>
…eMetadata (#15796) Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #15829 * __->__ #15796 * #15795 * #15794 * #15793 Title says it all! Motivation: code simplification and allows these ops to handle high dim tensors. Differential Revision: [D86910641](https://our.internmc.facebook.com/intern/diff/D86910641/) --------- Co-authored-by: ssjia <ssjia@devvm26340.ftw0.facebook.com>
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #15829 * #15796 * #15795 * #15794 * #15793 Title says it all! Adds `int32` and `uint8` shader variants to a bunch of operators that don't currently have variants for these dtypes, but should. This should prevent folks from running into dtype crashes at runtime when using the Vulkan delegate. Differential Revision: [D87082724](https://our.internmc.facebook.com/intern/diff/D87082724/) Co-authored-by: ssjia <ssjia@devvm1479.ncg0.facebook.com>
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #15829 * #15796 * #15795 * #15794 * #15793 Title says it all! Adds `int32` and `uint8` shader variants to a bunch of operators that don't currently have variants for these dtypes, but should. This should prevent folks from running into dtype crashes at runtime when using the Vulkan delegate. Differential Revision: [D87082724](https://our.internmc.facebook.com/intern/diff/D87082724/) Co-authored-by: ssjia <ssjia@devvm1479.ncg0.facebook.com> (cherry picked from commit a6c5921)
Stack from ghstack (oldest at bottom):
As title. Make sure that ops that do not support quantized tensors do not get assigned memory layouts that are intended for quantized tensors.
Differential Revision: D86910639