[ET-VK] Implement slice as a view #4848
This pull request was exported from Phabricator. Differential Revision: D61666462
## Context

TSIA. Implement slice as a view operator. This is only valid under the following conditions:

* All dims preceding the sliced dim in the dim order have a size of 1
* start is 0
* step is 1

These restrictions ensure that the slice view has an offset of 0 with respect to the source buffer. More details are in the comments.

To test the operator effectively, this diff also extends the test codegen to handle multiple test suites for one operator, each with a different configuration.

Differential Revision: [D61666462](https://our.internmc.facebook.com/intern/diff/D61666462/)
ghstack-source-id: 239353693
Pull Request resolved: #4848
* Qualcomm AI Engine Direct - Apply spin quant R1 and R2 Summary: - Add an argument `optimized_rotation_path` to specify the optimized rotation file - Follow https://github.com/facebookresearch/SpinQuant?tab=readme-ov-file to apply R1 and R2 * remove unused code * address review * rename the rotation file to apply_spin_quant_r1_r2 * fix name in TARGETS --------- Co-authored-by: Sheng Feng Wu <shewu@qti.qualcomm.com>
Differential Revision: D62278416 Pull Request resolved: #5141
Differential Revision: D62417216 Pull Request resolved: #5213
Summary: - Utility to skip operator annotation: unskipped nodes are gathered into submodules and lowered with quantization annotation, while skipped nodes can either fall back to the CPU or be delegated to HTP in fp16. - Fix uplevel breakage. - Refactor and retire some outdated implementations.
Differential Revision: D62428363 Pull Request resolved: #5220
Differential Revision: D62420000 Pull Request resolved: #5147
Differential Revision: D62402292 Pull Request resolved: #5200
…ma (#5221) * Qualcomm AI Engine Direct - Fixed the order of the transforms for llama * fixed ci --------- Co-authored-by: Sheng Feng Wu <shewu@qti.qualcomm.com>
Differential Revision: D62408596 Pull Request resolved: #5204
Differential Revision: D62411342 Pull Request resolved: #5224
Differential Revision: D62329462 Pull Request resolved: #5158
Differential Revision: D60601742 Pull Request resolved: #5121
Differential Revision: D62459696 Pull Request resolved: #5234
Summary: This logs the metrics from the size command when building with run.sh Pull Request resolved: #5342 Reviewed By: manuelcandales Differential Revision: D62874679 Pulled By: digantdesai fbshipit-source-id: f69bfa12c48101e540e684a590f78b546903cb42
Summary: Add the "px" unit so that the PyTorch site (i.e. https://pytorch.org/executorch/main/llm/llama-demo-android.html) has the same image widths as the README on GitHub. Pull Request resolved: #5540 Reviewed By: Riandy, kirklandsign Differential Revision: D63226892 Pulled By: cmodi-meta fbshipit-source-id: 5cfa30ee0ab156c1e004405cdc7dd99a0f61d2c2
Summary: The code under examples/... is a proxy for user code, and users should never declare code under the `torch::` or `executorch::` namespaces. Move this code under the `example::` namespace to make it more clear that users should use their own namespaces when writing code like this. Pull Request resolved: #5478 Test Plan: - Built using the instructions at https://github.com/pytorch/executorch/blob/main/examples/mediatek/README.md Reviewed By: JacobSzwejbka, cccclai Differential Revision: D62992974 Pulled By: dbort fbshipit-source-id: b01f1b33d2853a0555ae19d79769a5bb6d0ba853
Summary: examples/ code should use the new `executorch::` namespaces. Pull Request resolved: #5516 Test Plan: Built the app using the instructions at https://github.com/pytorch/executorch/blob/main/examples/demo-apps/apple_ios/LLaMA/README.md Reviewed By: larryliu0820 Differential Revision: D63138639 Pulled By: dbort fbshipit-source-id: fffb6d35d425dd733eead1b24ee8b9f2831e65c0
Summary: Pull Request resolved: #5546 The last prompt sent was included in `getConversationHistory()` and was also added before being sent with `generate()`. It looks like this got moved during rebasing. To fix this, we now call `getConversationHistory()` before adding the rawPrompt to a Message. Regarding the model response, I noticed that it did not really change the quality of the response (tested with Llama 3.1). Reviewed By: Riandy Differential Revision: D62761977 fbshipit-source-id: 2f975983965fe837147f1ffb8b5dcfa8f2061895
Summary: Example code should use the new `executorch::` namespace wherever possible, and should not define code under the `torch::` namespace. Pull Request resolved: #5512 Test Plan: - Built llava changes with `bash .ci/scripts/test_llava.sh` Reviewed By: JacobSzwejbka, larryliu0820 Differential Revision: D63133181 Pulled By: dbort fbshipit-source-id: 5796b85eef053f3b3e4ba0e27a3a26ae48747b5a
Summary: Use the names in the new `executorch::` namespace. Pull Request resolved: #5495 Test Plan: ``` ./examples/devtools/build_example_runner.sh ``` Reviewed By: larryliu0820 Differential Revision: D63047148 Pulled By: dbort fbshipit-source-id: e0e3af1c130aaf409ecc142c28d75f0a44d88fa3
Summary: Pull Request resolved: #5548 Converting the input to and from float32 is faster than not using the op. h/t to torchchat, which does this already (though it had a bug, which I sent a patch for). Reviewed By: kimishpatel Differential Revision: D63158951 fbshipit-source-id: 58c90d141ee403536c03a3b731f8547790fc9440
Summary: This PR adds a CI job for phi-3-mini Pull Request resolved: #5532 Test Plan: The CI Job is green: https://github.com/pytorch/executorch/actions/runs/10967809307/job/30458161933?pr=5532 Reviewed By: iseeyuan Differential Revision: D63157703 Pulled By: helunwencser fbshipit-source-id: fc7f54e166062443f396e7a304712f7b60e5db90
Summary: Preview in GitHub for consistency: https://github.com/pytorch/executorch/blob/fix-images/examples/demo-apps/android/LlamaDemo/README.md Pull Request resolved: #5550 Test Plan: Doc preview: https://docs-preview.pytorch.org/pytorch/executorch/5550/llm/llama-demo-android.html Rendered GitHub preview: https://github.com/pytorch/executorch/blob/fix-images/examples/demo-apps/android/LlamaDemo/README.md?rgh-link-date=2024-09-23T20%3A12%3A32Z Reviewed By: cmodi-meta, kirklandsign Differential Revision: D63281004 Pulled By: svekars fbshipit-source-id: cc5710624cb9bdab9056558c94f127b3bc12b96c
…5515) Summary: Pull Request resolved: #5515 Storing QMat2 in a texture leads to two main problems: - Indexing is a mess, and additional computation is required to account for the fact that we read ivec4's and only use half of the values. - There is no texel fetching in int8: the texel is read as int32 and needs to be cast. Keeping QMat2 in a buffer performs better because, although reading from buffers is slower, removing the extra computation compensates for this. {F1863459327} This diff also moves the `scales_and_zeros` tensor to Channels Packed in texture implementations because it just makes more sense; I had done some terrible indexing shenanigans before. ghstack-source-id: 244258611 exported-using-ghexport Reviewed By: yipjustin Differential Revision: D62504978 fbshipit-source-id: df2fdf87f75140be0a316576c8ffad67feefd6d7
Summary: Pull Request resolved: #5499 Seems to block bfloat16 stories110M as exported by torchchat (and we should have op coverage for bfloat16 anyway). ghstack-source-id: 243857968 Reviewed By: larryliu0820 Differential Revision: D63054001 fbshipit-source-id: 530b479872643f878912592c7b260d71e6e05804
Summary: Pull Request resolved: #5500 ghstack-source-id: 243857969 Reviewed By: digantdesai, larryliu0820 Differential Revision: D63057744 fbshipit-source-id: 9e1fb6f6479adb1575c5aed61b9da3c774586ba3
Summary: Pull Request resolved: #5519 Discovered these were missing while trying to use the following diff. ghstack-source-id: 243867517 exported-using-ghexport Reviewed By: digantdesai Differential Revision: D63147276 fbshipit-source-id: bf75fb0fe452e2e68a34271ca1250cdb90657e5a
Summary: Pull Request resolved: #5553 Add support for loading the Llama Guard model and running the prompt classification task. Reviewed By: cmodi-meta, kirklandsign Differential Revision: D63148252 fbshipit-source-id: 482559e694da05bdec75b9a2dbd76163c686e47d
Summary: Pull Request resolved: #5560 ## Context Refactor the operator test code generation scripts so that their components can be reused to generate operator benchmarks. In broad strokes, this diff implements the following refactors: * Improve the granularity of the Python modules * Replace `test` with `correctness_test`, to make it clear that we are generating correctness tests. **Note:** I haven't changed the top-level target name `compute_graph_op_tests_bin`, since I believe the renamed target would be too verbose. ghstack-source-id: 244283559 exported-using-ghexport Reviewed By: nathanaelsee Differential Revision: D63286131 fbshipit-source-id: 1177ea381e6381045f1c97491dd7ec006690f574
Summary: Pull Request resolved: #5561 ## Context Use the automatic test generation infrastructure to generate operator benchmarks. The overall concept is the same as the test generation; we just structure the generated code in the style of the Google Benchmark library instead of GTest. ghstack-source-id: 244287193 Reviewed By: derekxu, nathanaelsee Differential Revision: D63286132 fbshipit-source-id: 25c379accf6664dfca8232db81772b638b41c758
Summary: Add separate tests for Ethos-U85 to all backend operator tests. Updated the ethos-u-vela version to support more operators. Signed-off-by: Per Åstrand <per.astrand@arm.com> Signed-off-by: Tom Allsop <tom.allsop@arm.com> Pull Request resolved: #5346 Reviewed By: manuelcandales Differential Revision: D62875027 Pulled By: digantdesai fbshipit-source-id: 3bf238d81957258ee93ae235d575beff8a575191
Summary: Pull Request resolved: #5565 Swap to using method meta so we can be finer grained about this check Reviewed By: dbort Differential Revision: D62983475 fbshipit-source-id: c4599c5ecad0409cd8b2670464c4e9e8809b49ad
Stack from ghstack (oldest at bottom):
* … `MemoryAccessFlags` instead of `MemoryAccessType` when binding #5583

## Context

TSIA. Implement slice as a view operator. This is only valid under the following conditions:

* All dims preceding the sliced dim in the dim order have a size of 1
* start is 0
* step is 1

These restrictions ensure that the slice view has an offset of 0 with respect to the source buffer. More details are in the comments.
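To make the zero-offset argument concrete, here is a minimal Python sketch, not the ET-VK implementation. It assumes the dim order lists dims from outermost to innermost and that strides are contiguous with respect to that order; `strides_from_dim_order`, `can_slice_as_view` and `slice_buffer_indices` are hypothetical helpers written only for illustration. The last helper enumerates, by brute force, which buffer elements a slice reads, so you can check that a slice meeting the three conditions (every dim before the sliced dim in the dim order has size 1, start is 0, step is 1) covers exactly a prefix of the source buffer, i.e. it starts at offset 0.

```python
import itertools
from typing import List, Sequence


def strides_from_dim_order(sizes: Sequence[int], dim_order: Sequence[int]) -> List[int]:
    """Contiguous strides for `sizes` when dims are laid out in memory in
    `dim_order` (outermost dim first, innermost dim last)."""
    strides = [0] * len(sizes)
    running = 1
    # Walk from the innermost dim outwards, accumulating the element count.
    for d in reversed(dim_order):
        strides[d] = running
        running *= sizes[d]
    return strides


def can_slice_as_view(
    sizes: Sequence[int], dim_order: Sequence[int], dim: int, start: int, step: int
) -> bool:
    """Hypothetical check mirroring the PR's conditions: every dim before
    `dim` in `dim_order` has size 1, start == 0 and step == 1."""
    outer_dims = list(dim_order)[: list(dim_order).index(dim)]
    return all(sizes[d] == 1 for d in outer_dims) and start == 0 and step == 1


def slice_buffer_indices(
    sizes: Sequence[int], dim_order: Sequence[int], dim: int, start: int, end: int, step: int
) -> List[int]:
    """Brute force: sorted buffer indices read by the slice (illustration only)."""
    strides = strides_from_dim_order(sizes, dim_order)
    ranges = [range(s) for s in sizes]
    ranges[dim] = range(start, min(end, sizes[dim]), step)
    return sorted(
        sum(i * st for i, st in zip(idx, strides))
        for idx in itertools.product(*ranges)
    )


if __name__ == "__main__":
    # Outer dim of size 1: the slice reads a prefix of the buffer.
    print(can_slice_as_view([1, 4, 3], [0, 1, 2], dim=1, start=0, step=1))  # True
    print(slice_buffer_indices([1, 4, 3], [0, 1, 2], 1, 0, 2, 1))  # [0, 1, 2, 3, 4, 5]
    # Outer dim of size 2: the slice reads two disjoint chunks instead.
    print(can_slice_as_view([2, 4, 3], [0, 1, 2], dim=1, start=0, step=1))  # False
```

Under these assumptions, a channels-last layout such as sizes `[1, 2, 4, 4]` with dim order `[0, 2, 3, 1]` lets dim 2 be sliced as a view (only dim 0, of size 1, precedes it), while slicing dim 1 does not qualify because dims 2 and 3 precede it in the dim order. This illustrates why the condition is stated in terms of the dim order rather than the logical dim index.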
To test the operator effectively, this diff also extends the test codegen to handle multiple test suites for one operator, each with a different configuration.
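The description does not show how the codegen represents multiple suites, so the following is only a hypothetical Python sketch of the idea: an operator registers a factory that may return either a single suite or a list of suites, and each suite carries its own configuration (here, illustrative storage-type strings and a name suffix). `OpTestSuite`, `register_op_suites`, `SUITES` and `generated_test_names` are invented names, not the ExecuTorch codegen API, and the operator key is illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class OpTestSuite:
    """Hypothetical: one configuration of test cases for an operator."""
    input_cases: List[Tuple]
    storage_types: List[str] = field(default_factory=lambda: ["texture3d"])
    name_suffix: str = ""


SuiteFn = Callable[[], Union[OpTestSuite, List[OpTestSuite]]]
SUITES: Dict[str, SuiteFn] = {}  # hypothetical registry: op name -> suite factory


def register_op_suites(op_name: str) -> Callable[[SuiteFn], SuiteFn]:
    """Decorator that records a suite factory under `op_name`."""
    def wrap(fn: SuiteFn) -> SuiteFn:
        SUITES[op_name] = fn
        return fn
    return wrap


@register_op_suites("aten.slice_copy.Tensor")  # illustrative key
def slice_suites() -> List[OpTestSuite]:
    # Each case is (sizes, dim, start, end, step).
    view_suite = OpTestSuite(
        input_cases=[((1, 10, 10), 1, 0, 5, 1)],  # meets the view conditions
        storage_types=["buffer"],
        name_suffix="view",
    )
    general_suite = OpTestSuite(
        input_cases=[((3, 10, 10), 1, 2, 8, 2), ((3, 10, 10), 2, 0, 4, 1)],
        storage_types=["buffer", "texture3d"],
        name_suffix="copy",
    )
    return [view_suite, general_suite]


def generated_test_names(op_name: str) -> List[str]:
    """Expand every suite of `op_name` into one test name per (storage, case)."""
    suites = SUITES[op_name]()
    if isinstance(suites, OpTestSuite):  # a single suite is still accepted
        suites = [suites]
    base = op_name.replace(".", "_")
    names = []
    for suite in suites:
        for storage in suite.storage_types:
            for i, _case in enumerate(suite.input_cases):
                names.append(f"{base}_{suite.name_suffix}_{storage}_{i}")
    return names


if __name__ == "__main__":
    for name in generated_test_names("aten.slice_copy.Tensor"):
        print(name)
```

In this sketch, view-specific cases, which must satisfy the restrictions, live in their own suite with their own settings, while general slice cases run under a separate configuration; accepting a bare suite keeps single-suite operators working unchanged.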
Differential Revision: D61666462