[ET-VK] Implement slice as a view #4848

SS-JIA · 2024-08-22T18:51:15Z

Stack from ghstack (oldest at bottom):

Context

TSIA. Implement slice as a view operator. This is only valid under the following conditions:

* All dims preceding the sliced dim in the dim order have a size of 1
* start is 0
* step is 1

These restrictions ensure that the offset of the slice view with respect to the source buffer is 0. More details are in the comments.
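The three conditions can be checked against a simple model of a contiguous buffer. The sketch below is illustrative only: the function names (`contiguous_strides`, `can_slice_as_view`, `slice_source_offsets`) are hypothetical and are not part of the ExecuTorch Vulkan API, and it assumes a densely packed buffer whose dims are laid out in `dim_order`, outermost first. It lists the source-buffer offsets touched by a slice and shows that they form the prefix `0..n-1` when the conditions hold.

```python
from itertools import product


def contiguous_strides(sizes, dim_order):
    """Element strides for a dense buffer laid out in dim_order (outermost first)."""
    strides = [0] * len(sizes)
    running = 1
    for d in reversed(dim_order):
        strides[d] = running
        running *= sizes[d]
    return strides


def can_slice_as_view(sizes, dim_order, dim, start, step):
    """Check the three conditions stated in the PR description."""
    if start != 0 or step != 1:
        return False
    pos = dim_order.index(dim)
    # Every dim that precedes the sliced dim in the dim order must have size 1.
    return all(sizes[d] == 1 for d in dim_order[:pos])


def slice_source_offsets(sizes, dim_order, dim, start, end, step):
    """Source-buffer offsets of the slice's elements, in the slice's own layout order."""
    strides = contiguous_strides(sizes, dim_order)
    ranges = [
        range(start, end, step) if d == dim else range(sizes[d])
        for d in dim_order
    ]
    return [
        sum(i * strides[d] for i, d in zip(idx, dim_order))
        for idx in product(*ranges)
    ]


# Valid case: leading dims have size 1, start is 0, step is 1.
ok = slice_source_offsets([1, 1, 4, 3], [0, 1, 2, 3], dim=2, start=0, end=2, step=1)
# Invalid case: a preceding dim has size 2, so the slice is not a buffer prefix.
bad = slice_source_offsets([2, 4, 3], [0, 1, 2], dim=1, start=0, end=2, step=1)
```

In the valid case the slice occupies the first six elements of the source buffer, so a view with the new sizes and an offset of 0 describes it exactly. In the invalid case the offsets skip from 5 to 12, which an offset-0 view with dense strides cannot represent.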

To test the operator effectively, this diff also extends the test codegen to handle multiple test suites for one operator, each with a different configuration.

Differential Revision: D61666462

pytorch-bot · 2024-08-22T18:51:18Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4848

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2024-08-22T18:51:59Z

This pull request was exported from Phabricator. Differential Revision: D61666462

## Context

TSIA. Implement slice as a view operator. This is only valid under the following conditions:

* All dims preceding the sliced dim in the dim order have a size of 1
* start is 0
* step is 1

The reasoning for these restrictions is so that the offset of the slice view with respect to the source buffer is 0. More details are in the comments.

To test the operator effectively, this diff also extends the test codegen to handle multiple test suites for one operator, each with a different configuration.

Differential Revision: [D61666462](https://our.internmc.facebook.com/intern/diff/D61666462/)

ghstack-source-id: 239353693
Pull Request resolved: #4848

facebook-github-bot · 2024-08-22T22:52:18Z

This pull request was exported from Phabricator. Differential Revision: D61666462

Pull Request resolved: #4848

## Context

TSIA. Implement slice as a view operator. This is only valid under the following conditions:

* All dims preceding the sliced dim in the dim order have a size of 1
* start is 0
* step is 1

The reasoning for these restrictions is so that the offset of the slice view with respect to the source buffer is 0. More details are in the comments.

To test the operator effectively, this diff also extends the test codegen to handle multiple test suites for one operator, each with a different configuration.

ghstack-source-id: 239403250
@exported-using-ghexport

Differential Revision: [D61666462](https://our.internmc.facebook.com/intern/diff/D61666462/)

facebook-github-bot · 2024-08-23T16:23:54Z

This pull request was exported from Phabricator. Differential Revision: D61666462

Pull Request resolved: #4848

## Context

TSIA. Implement slice as a view operator. This is only valid under the following conditions:

* All dims preceding the sliced dim in the dim order have a size of 1
* start is 0
* step is 1

The reasoning for these restrictions is so that the offset of the slice view with respect to the source buffer is 0. More details are in the comments.

To test the operator effectively, this diff also extends the test codegen to handle multiple test suites for one operator, each with a different configuration.

ghstack-source-id: 239495547
@exported-using-ghexport

Differential Revision: [D61666462](https://our.internmc.facebook.com/intern/diff/D61666462/)

* Qualcomm AI Engine Direct - Apply spin quant R1 and R2

Summary:
- Add an argument optimized_rotation_path to specify the optimized rotation file
- Refer to https://github.com/facebookresearch/SpinQuant?tab=readme-ov-file to apply R1 R2

* remove not used
* address review
* rename the rotation file to apply_spin_quant_r1_r2
* fix name in TARGETS

---------

Co-authored-by: Sheng Feng Wu <shewu@qti.qualcomm.com>

Differential Revision: D62278416 Pull Request resolved: #5141

Differential Revision: D62417216 Pull Request resolved: #5213

Summary:
- Utility to skip operator annotation. Unskipped nodes are gathered into submodules and lowered with quantization annotation. Skipped nodes can either fall back to CPU or be delegated with HTP fp16.
- Fix uplevel breakage.
- Refactor & retire some outdated implementation.

Differential Revision: D62428363 Pull Request resolved: #5220

Differential Revision: D62420000 Pull Request resolved: #5147

Differential Revision: D62402292 Pull Request resolved: #5200

…ma (#5221) * Qualcomm AI Engine Direct - Fixed the order of the transforms for llama * fixed ci --------- Co-authored-by: Sheng Feng Wu <shewu@qti.qualcomm.com>

Differential Revision: D62408596 Pull Request resolved: #5204

Differential Revision: D62411342 Pull Request resolved: #5224

Differential Revision: D62329462 Pull Request resolved: #5158

Differential Revision: D60601742 Pull Request resolved: #5121

Differential Revision: D62459696 Pull Request resolved: #5234

Summary: This logs the metrics from the size command when building with run.sh Pull Request resolved: #5342 Reviewed By: manuelcandales Differential Revision: D62874679 Pulled By: digantdesai fbshipit-source-id: f69bfa12c48101e540e684a590f78b546903cb42

Summary: Adding "px" unit for PyTorch site (i.e. https://pytorch.org/executorch/main/llm/llama-demo-android.html) will have same image widths as readme in github Pull Request resolved: #5540 Reviewed By: Riandy, kirklandsign Differential Revision: D63226892 Pulled By: cmodi-meta fbshipit-source-id: 5cfa30ee0ab156c1e004405cdc7dd99a0f61d2c2

Summary: The code under examples/... is a proxy for user code, and users should never declare code under the `torch::` or `executorch::` namespaces. Move this code under the `example::` namespace to make it more clear that users should use their own namespaces when writing code like this. Pull Request resolved: #5478 Test Plan: - Built using the instructions at https://github.com/pytorch/executorch/blob/main/examples/mediatek/README.md Reviewed By: JacobSzwejbka, cccclai Differential Revision: D62992974 Pulled By: dbort fbshipit-source-id: b01f1b33d2853a0555ae19d79769a5bb6d0ba853

Summary: examples/ code should use the new `executorch::` namespaces. Pull Request resolved: #5516 Test Plan: Built the app using the instructions at https://github.com/pytorch/executorch/blob/main/examples/demo-apps/apple_ios/LLaMA/README.md Reviewed By: larryliu0820 Differential Revision: D63138639 Pulled By: dbort fbshipit-source-id: fffb6d35d425dd733eead1b24ee8b9f2831e65c0

Summary: Pull Request resolved: #5546 The last prompt sent would be included in `getConversationHistory()` + adding it prior to sending it with the generate(). It looks like this got moved during the rebasing. To fix this we now call `getConversationHistory()` prior to adding the rawPrompt to a Message. In regards to model response, I noticed that it did not really change the quality of the response. (tested with Llama 3.1) Reviewed By: Riandy Differential Revision: D62761977 fbshipit-source-id: 2f975983965fe837147f1ffb8b5dcfa8f2061895

Summary: Example code should use the new `executorch::` namespace wherever possible, and should not define code under the `torch::` namespace. Pull Request resolved: #5512 Test Plan: - Built llava changes with `bash .ci/scripts/test_llava.sh` Reviewed By: JacobSzwejbka, larryliu0820 Differential Revision: D63133181 Pulled By: dbort fbshipit-source-id: 5796b85eef053f3b3e4ba0e27a3a26ae48747b5a

Summary: Use the names in the new `executorch::` namespace. Pull Request resolved: #5495 Test Plan: ``` ./examples/devtools/build_example_runner.sh ``` Reviewed By: larryliu0820 Differential Revision: D63047148 Pulled By: dbort fbshipit-source-id: e0e3af1c130aaf409ecc142c28d75f0a44d88fa3

Summary: Pull Request resolved: #5548 Converting the input to and from float32 is faster than not using the op. h/t to torchchat, which does this already (though it had a bug, which I sent a patch for). Reviewed By: kimishpatel Differential Revision: D63158951 fbshipit-source-id: 58c90d141ee403536c03a3b731f8547790fc9440

Summary: This PR adds a CI job for phi-3-mini Pull Request resolved: #5532 Test Plan: The CI Job is green: https://github.com/pytorch/executorch/actions/runs/10967809307/job/30458161933?pr=5532 Reviewed By: iseeyuan Differential Revision: D63157703 Pulled By: helunwencser fbshipit-source-id: fc7f54e166062443f396e7a304712f7b60e5db90

Summary: Preview in GitHub for consistency: https://github.com/pytorch/executorch/blob/fix-images/examples/demo-apps/android/LlamaDemo/README.md Pull Request resolved: #5550 Test Plan: Doc preview: https://docs-preview.pytorch.org/pytorch/executorch/5550/llm/llama-demo-android.html Rendered GitHub preview: https://github.com/pytorch/executorch/blob/fix-images/examples/demo-apps/android/LlamaDemo/README.md?rgh-link-date=2024-09-23T20%3A12%3A32Z Reviewed By: cmodi-meta, kirklandsign Differential Revision: D63281004 Pulled By: svekars fbshipit-source-id: cc5710624cb9bdab9056558c94f127b3bc12b96c

…5515) Summary: Pull Request resolved: #5515 Storing QMat2 in a texture gives way to two main problems: - Indexing is a mess and additional computation is required to take into account the fact that we are reading ivec4's and only using half of the values - There is no texel fetching in int8. The texel is read in int32 and needs to be cast. Keeping QMat2 in a buffer performs better because, although reading from buffers is slower, removing the extra computation compensates for this. {F1863459327} This diff also moves the scales_and_zeros tensor to Channels Packed in texture implementations because it just makes more sense, I had done some terrible indexing shenanigans before. ghstack-source-id: 244258611 exported-using-ghexport Reviewed By: yipjustin Differential Revision: D62504978 fbshipit-source-id: df2fdf87f75140be0a316576c8ffad67feefd6d7

Summary: Pull Request resolved: #5499 Seems to block bfloat16 stories110M as exported by torchchat (and we should have op coverage for bfloat16 anyway). ghstack-source-id: 243857968 Reviewed By: larryliu0820 Differential Revision: D63054001 fbshipit-source-id: 530b479872643f878912592c7b260d71e6e05804

Summary: Pull Request resolved: #5500 ghstack-source-id: 243857969 Reviewed By: digantdesai, larryliu0820 Differential Revision: D63057744 fbshipit-source-id: 9e1fb6f6479adb1575c5aed61b9da3c774586ba3

Summary: Pull Request resolved: #5519 Discovered these were missing while trying to use the following diff. ghstack-source-id: 243867517 exported-using-ghexport Reviewed By: digantdesai Differential Revision: D63147276 fbshipit-source-id: bf75fb0fe452e2e68a34271ca1250cdb90657e5a

#5520) Summary: Pull Request resolved: #5520 reserved ghstack-source-id: 243867516 exported-using-ghexport Reviewed By: larryliu0820 Differential Revision: D63147278 fbshipit-source-id: d5aefbf2509a1eca4c32bbbe7224e7b996fa1e57

Summary: Pull Request resolved: #5553 Adding support to load Llama Guard model and run prompt classification task Reviewed By: cmodi-meta, kirklandsign Differential Revision: D63148252 fbshipit-source-id: 482559e694da05bdec75b9a2dbd76163c686e47d

Summary: Pull Request resolved: #5560 ## Context Refactor operator test code generation scripts, such that components can be re-used to generate operator benchmarks. In broad strokes, the refactors implemented by this diff are as follows: * Improve granularity of Python modules * Replace `test` with `correctness_test`, to make it clear that we are generating correctness tests. **Note that I haven't changed the top level target name `compute_graph_op_tests_bin` since I believe it would be too verbose. ghstack-source-id: 244283559 exported-using-ghexport Reviewed By: nathanaelsee Differential Revision: D63286131 fbshipit-source-id: 1177ea381e6381045f1c97491dd7ec006690f574

Summary: Pull Request resolved: #5561 ## Context Use the automatic test generation infrastructure to generate operator benchmarks. The overall concept is the same as the test generation; we just structure the generated code in the style of the google benchmark library instead of GTEST. ghstack-source-id: 244287193 Reviewed By: derekxu, nathanaelsee Differential Revision: D63286132 fbshipit-source-id: 25c379accf6664dfca8232db81772b638b41c758

Summary: Add separate tests for Ethos-U85 to all backend operator tests. Updated ethos-u-vela version to support more operators. Signed-off-by: Per Åstrand <[per.astrand@arm.com](mailto:per.astrand@arm.com)> Signed-off-by: Tom Allsop <[tom.allsop@arm.com](mailto:tom.allsop@arm.com)> Pull Request resolved: #5346 Reviewed By: manuelcandales Differential Revision: D62875027 Pulled By: digantdesai fbshipit-source-id: 3bf238d81957258ee93ae235d575beff8a575191

Summary: Pull Request resolved: #5565 Swap to using method meta so we can be finer grained about this check Reviewed By: dbort Differential Revision: D62983475 fbshipit-source-id: c4599c5ecad0409cd8b2670464c4e9e8809b49ad

## Context

TSIA. Implement slice as a view operator. This is only valid under the following conditions:

* All dims preceding the sliced dim in the dim order have a size of 1
* start is 0
* step is 1

The reasoning for these restrictions is so that the offset of the slice view with respect to the source buffer is 0. More details are in the comments.

To test the operator effectively, this diff also extends the test codegen to handle multiple test suites for one operator, each with a different configuration.

Differential Revision: [D61666462](https://our.internmc.facebook.com/intern/diff/D61666462/)

[ghstack-poisoned]

facebook-github-bot · 2024-09-24T16:12:59Z

This pull request was exported from Phabricator. Differential Revision: D61666462


facebook-github-bot · 2024-09-24T16:30:46Z

This pull request was exported from Phabricator. Differential Revision: D61666462

facebook-github-bot added the CLA Signed label Aug 22, 2024

This was referenced Aug 22, 2024

[ET-VK] Add buffer implementation for matrix multiplication #4845

Merged

[ET-VK][Ez] Add utilities to check if one vTensor is a view of another #4846

Merged

facebook-github-bot added the fb-exported label Aug 22, 2024

SS-JIA mentioned this pull request Aug 22, 2024

[ET-VK] Add transpose op as view operator #4847

Closed

SS-JIA changed the base branch from gh/SS-JIA/64/base to gh/SS-JIA/63/head August 22, 2024 18:54

shewu-quic and others added 13 commits September 10, 2024 02:00

Restore constant segment

549f14b

Differential Revision: D62278416 Pull Request resolved: #5141

Add Half/BFloat16 tests for op_mul

e826de3

Differential Revision: D62417216 Pull Request resolved: #5213

Switch over backend tests to export_for_training

30acae5

Differential Revision: D62428363 Pull Request resolved: #5220

[LLava] Fix stats for C++ runner

db34239

Differential Revision: D62420000 Pull Request resolved: #5147

Update bundled_program to use new namespace

02304d7

Differential Revision: D62402292 Pull Request resolved: #5200

Qualcomm AI Engine Direct - Fixed the order of the transforms for lla…

c76b22f

…ma (#5221) * Qualcomm AI Engine Direct - Fixed the order of the transforms for llama * fixed ci --------- Co-authored-by: Sheng Feng Wu <shewu@qti.qualcomm.com>

Android refactor cmake build

d38ca81

Differential Revision: D62408596 Pull Request resolved: #5204

Android: Leverage prefillPrompt and prefillImage on Llava

a4d67e2

Differential Revision: D62411342 Pull Request resolved: #5224

Update the minimum C++ version to C++17

b54206d

Differential Revision: D62329462 Pull Request resolved: #5158

Introduce PlatformMemoryAllocator

4ce0f9d

Differential Revision: D60601742 Pull Request resolved: #5121

Use dynamic bound by default.

2b50c76

Differential Revision: D62459696 Pull Request resolved: #5234

zingo and others added 22 commits September 23, 2024 09:24

Support bfloat16 in op_index_put (#5500)

badd76e

Summary: Pull Request resolved: #5500 ghstack-source-id: 243857969 Reviewed By: digantdesai, larryliu0820 Differential Revision: D63057744 fbshipit-source-id: 9e1fb6f6479adb1575c5aed61b9da3c774586ba3

SS-JIA mentioned this pull request Sep 24, 2024

[ET-VK][ez] Use MemoryAccessFlags instead of MemoryAccessType when binding #5583

Closed

SS-JIA added 2 commits September 24, 2024 09:30

SS-JIA mentioned this pull request Sep 24, 2024

[ET-VK][ez] Introduce convenience constexpr for memory access types #5584

Closed

SS-JIA closed this Sep 24, 2024

SS-JIA deleted the gh/SS-JIA/64/head branch January 24, 2025 19:43

45 participants
