-
Notifications
You must be signed in to change notification settings - Fork 722
[ET-VK][ez] Migrate slice/select shaders to use BufferMetadata/TextureMetadata #15796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
As title. The current implementation of split_with_sizes uses functions from the `Copy.[h|cpp]` file in particular `add_copy_channel_offset_node`. However, the shaders dispatched by this function have a critical bug where the output tensor is passed in separately with difference access types, i.e.
```cpp
graph.execute_nodes().emplace_back(new DispatchNode(
graph,
VK_KERNEL_FROM_STR(kernel_name),
global_size,
local_size,
// Inputs and Outputs
{
{out, vkapi::kWrite},
{out, vkapi::kRead},
{in, vkapi::kRead},
},
```
This creates many validation layer errors because the memory barriers for the resource cannot be formed properly. The shader essentially relies on undefined behaviour to work correctly
To fix, this diff re-implements the operator from scratch with a dedicated compute shader.
Differential Revision: [D86910642](https://our.internmc.facebook.com/intern/diff/D86910642/)
[ghstack-poisoned]
As title. Make sure that ops that do not support quantized tensors do not get assigned memory layouts that are intended for quantized tensors. Differential Revision: [D86910639](https://our.internmc.facebook.com/intern/diff/D86910639/) [ghstack-poisoned]
Title says it all! Add two additional export options: 1. `skip_memory_planning` - skips the memory planning pass which can be useful for debugging. 2. `small_texture_limits` - sets the default texture limit to be (2048, 2048, 2048) which is compatible with more devices (i.e. desktop/laptop GPUs) compared to the default (16384, 16384, 2048) which is more targeted for mobile GPUs Also adds some improvements to the export script that were made while debugging the `YOLO_NAS` model (#15700) Differential Revision: [D86910640](https://our.internmc.facebook.com/intern/diff/D86910640/) [ghstack-poisoned]
…eMetadata Title says it all! Motivation: code simplification and allows these ops to handle high dim tensors. Differential Revision: [D86910641](https://our.internmc.facebook.com/intern/diff/D86910641/) [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15796
Note: Links to docs will display an error until the docs builds have been completed. ❌ 9 New Failures, 4 Unrelated FailuresAs of commit b6868b9 with merge base 7600df8 ( NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
…eMetadata Title says it all! Motivation: code simplification and allows these ops to handle high dim tensors. Differential Revision: [D86910641](https://our.internmc.facebook.com/intern/diff/D86910641/) ghstack-source-id: 322864453 Pull Request resolved: #15796
This PR needs a
|
…o use BufferMetadata/TextureMetadata" Title says it all! Motivation: code simplification and allows these ops to handle high dim tensors. Differential Revision: [D86910641](https://our.internmc.facebook.com/intern/diff/D86910641/) [ghstack-poisoned]
…data/TextureMetadata" Title says it all! Motivation: code simplification and allows these ops to handle high dim tensors. Differential Revision: [D86910641](https://our.internmc.facebook.com/intern/diff/D86910641/) [ghstack-poisoned]
…eMetadata Pull Request resolved: #15796 Title says it all! Motivation: code simplification and allows these ops to handle high dim tensors. ghstack-source-id: 323042847 @exported-using-ghexport Differential Revision: [D86910641](https://our.internmc.facebook.com/intern/diff/D86910641/)
…o use BufferMetadata/TextureMetadata" Title says it all! Motivation: code simplification and allows these ops to handle high dim tensors. Differential Revision: [D86910641](https://our.internmc.facebook.com/intern/diff/D86910641/) [ghstack-poisoned]
…data/TextureMetadata" Title says it all! Motivation: code simplification and allows these ops to handle high dim tensors. Differential Revision: [D86910641](https://our.internmc.facebook.com/intern/diff/D86910641/) [ghstack-poisoned]
…eMetadata Pull Request resolved: #15796 Title says it all! Motivation: code simplification and allows these ops to handle high dim tensors. ghstack-source-id: 323216101 @exported-using-ghexport Differential Revision: [D86910641](https://our.internmc.facebook.com/intern/diff/D86910641/)
…o use BufferMetadata/TextureMetadata" Title says it all! Motivation: code simplification and allows these ops to handle high dim tensors. Differential Revision: [D86910641](https://our.internmc.facebook.com/intern/diff/D86910641/) [ghstack-poisoned]
…data/TextureMetadata" Title says it all! Motivation: code simplification and allows these ops to handle high dim tensors. Differential Revision: [D86910641](https://our.internmc.facebook.com/intern/diff/D86910641/) [ghstack-poisoned]
…eMetadata Pull Request resolved: #15796 Title says it all! Motivation: code simplification and allows these ops to handle high dim tensors. ghstack-source-id: 323317727 @exported-using-ghexport Differential Revision: [D86910641](https://our.internmc.facebook.com/intern/diff/D86910641/)
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #15829 * #15796 * #15795 * #15794 * __->__ #15793 As title. The current implementation of split_with_sizes uses functions from the `Copy.[h|cpp]` file in particular `add_copy_channel_offset_node`. However, the shaders dispatched by this function have a critical bug where the output tensor is passed in separately with difference access types, i.e. ```cpp graph.execute_nodes().emplace_back(new DispatchNode( graph, VK_KERNEL_FROM_STR(kernel_name), global_size, local_size, // Inputs and Outputs { {out, vkapi::kWrite}, {out, vkapi::kRead}, {in, vkapi::kRead}, }, ``` This creates many validation layer errors because the memory barriers for the resource cannot be formed properly. The shader essentially relies on undefined behaviour to work correctly To fix, this diff re-implements the operator from scratch with a dedicated compute shader. Differential Revision: [D86910642](https://our.internmc.facebook.com/intern/diff/D86910642/) --------- Co-authored-by: ssjia <ssjia@devvm26340.ftw0.facebook.com>
#15794) Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #15829 * #15796 * #15795 * __->__ #15794 * #15793 As title. Make sure that ops that do not support quantized tensors do not get assigned memory layouts that are intended for quantized tensors. Differential Revision: [D86910639](https://our.internmc.facebook.com/intern/diff/D86910639/) --------- Co-authored-by: ssjia <ssjia@devvm26340.ftw0.facebook.com>
) Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #15829 * #15796 * __->__ #15795 * #15794 * #15793 Title says it all! Add two additional export options: 1. `skip_memory_planning` - skips the memory planning pass which can be useful for debugging. 2. `small_texture_limits` - sets the default texture limit to be (2048, 2048, 2048) which is compatible with more devices (i.e. desktop/laptop GPUs) compared to the default (16384, 16384, 2048) which is more targeted for mobile GPUs Also adds some improvements to the export script that were made while debugging the `YOLO_NAS` model (#15700) Differential Revision: [D86910640](https://our.internmc.facebook.com/intern/diff/D86910640/) --------- Co-authored-by: ssjia <ssjia@devvm26340.ftw0.facebook.com>
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #15829 * #15796 * #15795 * #15794 * #15793 Title says it all! Adds `int32` and `uint8` shader variants to a bunch of operators that don't currently have variants for these dtypes, but should. This should prevent folks from running into dtype crashes at runtime when using the Vulkan delegate. Differential Revision: [D87082724](https://our.internmc.facebook.com/intern/diff/D87082724/) Co-authored-by: ssjia <ssjia@devvm1479.ncg0.facebook.com>
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #15829 * #15796 * #15795 * #15794 * #15793 Title says it all! Adds `int32` and `uint8` shader variants to a bunch of operators that don't currently have variants for these dtypes, but should. This should prevent folks from running into dtype crashes at runtime when using the Vulkan delegate. Differential Revision: [D87082724](https://our.internmc.facebook.com/intern/diff/D87082724/) Co-authored-by: ssjia <ssjia@devvm1479.ncg0.facebook.com> (cherry picked from commit a6c5921)
Stack from ghstack (oldest at bottom):
Title says it all!
Motivation: code simplification and allows these ops to handle high dim tensors.
Differential Revision: D86910641