@SS-JIA SS-JIA commented Aug 22, 2024

Stack from ghstack (oldest at bottom):

Context

TSIA. Implement slice as a view operator. This is only valid under the following conditions:

  • All dims preceding the sliced dim in the dim order have a size of 1
  • start is 0
  • step is 1

These restrictions ensure that the offset of the slice view with respect to the source buffer is 0. More details are in the comments.
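
As a rough illustration of why the three conditions give a zero offset, here is a minimal Python sketch. It is not the code from this diff: the helper names (`strides_for`, `can_slice_as_view`, `slice_offsets`) are hypothetical, and it assumes a densely packed buffer whose `dim_order` lists dims from outermost to innermost.

```python
from itertools import product
from typing import List, Sequence


def strides_for(sizes: Sequence[int], dim_order: Sequence[int]) -> List[int]:
    """Element strides of a densely packed buffer (dim_order: outermost first)."""
    strides = [0] * len(sizes)
    running = 1
    # Walk from the innermost dim outwards, accumulating the element count.
    for d in reversed(dim_order):
        strides[d] = running
        running *= sizes[d]
    return strides


def can_slice_as_view(
    sizes: Sequence[int], dim_order: Sequence[int], dim: int, start: int, step: int
) -> bool:
    """Check the three conditions listed in the PR description."""
    if start != 0 or step != 1:
        return False
    for d in dim_order:
        if d == dim:
            return True  # every dim before the sliced dim had size 1
        if sizes[d] != 1:
            return False
    raise ValueError("dim does not appear in dim_order")


def slice_offsets(
    sizes: Sequence[int],
    dim_order: Sequence[int],
    dim: int,
    start: int,
    end: int,
    step: int,
) -> List[int]:
    """Buffer offsets touched by the slice, visited in memory order."""
    strides = strides_for(sizes, dim_order)
    ranges = [
        range(start, end, step) if d == dim else range(sizes[d]) for d in dim_order
    ]
    # product() varies the last range fastest, i.e. the innermost dim.
    return [
        sum(i * strides[d] for i, d in zip(idx, dim_order))
        for idx in product(*ranges)
    ]
```

With `sizes=(1, 4, 3)` and `dim_order=(0, 1, 2)`, slicing dim 1 with `start=0, step=1` touches a contiguous prefix of the buffer, so the view can share the source buffer at offset 0. With `sizes=(2, 4, 3)` the same slice touches two separate runs of elements, and with `start=1` the first touched element is no longer at offset 0.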

To test the operator effectively, this diff also extends the test codegen to handle multiple test suites for one operator, each with a different configuration.
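
The idea of several test suites per operator can be pictured with the following sketch. Everything here is an assumption made for illustration: `OpTestSuite`, `register_suite`, `enumerate_cases`, and the configuration fields are hypothetical names, not the actual test codegen API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class OpTestSuite:
    """One configuration for an operator (all fields are hypothetical)."""
    name: str
    storage_types: List[str]
    layouts: List[str]
    input_cases: List[tuple]


# Map each operator name to a list of suites instead of a single suite.
_registry: Dict[str, List[OpTestSuite]] = defaultdict(list)


def register_suite(op_name: str, suite: OpTestSuite) -> None:
    """Attach another suite to an operator without replacing earlier ones."""
    _registry[op_name].append(suite)


def enumerate_cases(op_name: str) -> Iterator[Tuple[str, str, str, tuple]]:
    """Yield one test case per (suite, storage type, layout, input case)."""
    for suite in _registry[op_name]:
        for storage in suite.storage_types:
            for layout in suite.layouts:
                for case in suite.input_cases:
                    yield (suite.name, storage, layout, case)
```

Registering one suite for the general slice path and another for the view path would then let each run only under the storage types, layouts, and inputs where it is valid.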

Differential Revision: D61666462

pytorch-bot bot commented Aug 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4848

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D61666462

SS-JIA added a commit that referenced this pull request Aug 22, 2024
ghstack-source-id: 239353693
Pull Request resolved: #4848
@SS-JIA SS-JIA changed the base branch from gh/SS-JIA/64/base to gh/SS-JIA/63/head August 22, 2024 18:54
SS-JIA added a commit that referenced this pull request Aug 22, 2024
Pull Request resolved: #4848

ghstack-source-id: 239403250
@exported-using-ghexport

Differential Revision: [D61666462](https://our.internmc.facebook.com/intern/diff/D61666462/)
SS-JIA added a commit that referenced this pull request Aug 23, 2024
Pull Request resolved: #4848

ghstack-source-id: 239495547
@exported-using-ghexport

Differential Revision: [D61666462](https://our.internmc.facebook.com/intern/diff/D61666462/)
shewu-quic and others added 13 commits September 10, 2024 02:00
* Qualcomm AI Engine Direct - Apply spin quant R1 and R2

Summary:
- Add an argument, optimized_rotation_path, to specify the optimized rotation file
- Refer to https://github.com/facebookresearch/SpinQuant?tab=readme-ov-file to apply R1 R2

* remove not used

* address review

* rename the rotation file to apply_spin_quant_r1_r2

* fix name in TARGETS

---------

Co-authored-by: Sheng Feng Wu <shewu@qti.qualcomm.com>
Differential Revision: D62278416

Pull Request resolved: #5141
Differential Revision: D62417216

Pull Request resolved: #5213
Summary:
- Utility to skip operator annotation. Unskipped nodes will be
  gathered into submodules and lowered with quantization annotation.
  Skipped nodes can either fall back to CPU or be delegated with HTP fp16.
- Fix uplevel breakage.
- Refactor & retire some outdated implementation.
Differential Revision: D62428363

Pull Request resolved: #5220
Differential Revision: D62420000

Pull Request resolved: #5147
Differential Revision: D62402292

Pull Request resolved: #5200
…ma (#5221)

* Qualcomm AI Engine Direct - Fixed the order of the transforms for llama

* fixed ci

---------

Co-authored-by: Sheng Feng Wu <shewu@qti.qualcomm.com>
Differential Revision: D62408596

Pull Request resolved: #5204
Differential Revision: D62411342

Pull Request resolved: #5224
Differential Revision: D62329462

Pull Request resolved: #5158
Differential Revision: D60601742

Pull Request resolved: #5121
Differential Revision: D62459696

Pull Request resolved: #5234
zingo and others added 22 commits September 23, 2024 09:24
Summary:
This logs the metrics from the size command when building with run.sh

Pull Request resolved: #5342

Reviewed By: manuelcandales

Differential Revision: D62874679

Pulled By: digantdesai

fbshipit-source-id: f69bfa12c48101e540e684a590f78b546903cb42
Summary:
Adding the "px" unit makes images on the PyTorch site (i.e. https://pytorch.org/executorch/main/llm/llama-demo-android.html) render at the same widths as in the README on GitHub.

Pull Request resolved: #5540

Reviewed By: Riandy, kirklandsign

Differential Revision: D63226892

Pulled By: cmodi-meta

fbshipit-source-id: 5cfa30ee0ab156c1e004405cdc7dd99a0f61d2c2
Summary:
The code under examples/... is a proxy for user code, and users should never declare code under the `torch::` or `executorch::` namespaces.

Move this code under the `example::` namespace to make it more clear that users should use their own namespaces when writing code like this.

Pull Request resolved: #5478

Test Plan: - Built using the instructions at https://github.com/pytorch/executorch/blob/main/examples/mediatek/README.md

Reviewed By: JacobSzwejbka, cccclai

Differential Revision: D62992974

Pulled By: dbort

fbshipit-source-id: b01f1b33d2853a0555ae19d79769a5bb6d0ba853
Summary:
examples/ code should use the new `executorch::` namespaces.

Pull Request resolved: #5516

Test Plan: Built the app using the instructions at https://github.com/pytorch/executorch/blob/main/examples/demo-apps/apple_ios/LLaMA/README.md

Reviewed By: larryliu0820

Differential Revision: D63138639

Pulled By: dbort

fbshipit-source-id: fffb6d35d425dd733eead1b24ee8b9f2831e65c0
Summary:
Pull Request resolved: #5546

The last prompt sent would be included in `getConversationHistory()` in addition to being added prior to sending it with `generate()`. It looks like this got moved during rebasing.

To fix this, we now call `getConversationHistory()` prior to adding the rawPrompt to a Message.

With regard to the model response, I noticed that this did not really change the quality of the response (tested with Llama 3.1).

Reviewed By: Riandy

Differential Revision: D62761977

fbshipit-source-id: 2f975983965fe837147f1ffb8b5dcfa8f2061895
Summary:
Example code should use the new `executorch::` namespace wherever possible, and should not define code under the `torch::` namespace.

Pull Request resolved: #5512

Test Plan: - Built llava changes with `bash .ci/scripts/test_llava.sh`

Reviewed By: JacobSzwejbka, larryliu0820

Differential Revision: D63133181

Pulled By: dbort

fbshipit-source-id: 5796b85eef053f3b3e4ba0e27a3a26ae48747b5a
Summary:
Use the names in the new `executorch::` namespace.

Pull Request resolved: #5495

Test Plan:
```
./examples/devtools/build_example_runner.sh
```

Reviewed By: larryliu0820

Differential Revision: D63047148

Pulled By: dbort

fbshipit-source-id: e0e3af1c130aaf409ecc142c28d75f0a44d88fa3
Summary:
Pull Request resolved: #5548

Converting the input to and from float32 is faster than not using the op. h/t to torchchat, which does this already (though it had a bug, which I sent a patch for).

Reviewed By: kimishpatel

Differential Revision: D63158951

fbshipit-source-id: 58c90d141ee403536c03a3b731f8547790fc9440
Summary:
This PR adds a CI job for phi-3-mini

Pull Request resolved: #5532

Test Plan: The CI Job is green: https://github.com/pytorch/executorch/actions/runs/10967809307/job/30458161933?pr=5532

Reviewed By: iseeyuan

Differential Revision: D63157703

Pulled By: helunwencser

fbshipit-source-id: fc7f54e166062443f396e7a304712f7b60e5db90
Summary:
Preview in GitHub for consistency: https://github.com/pytorch/executorch/blob/fix-images/examples/demo-apps/android/LlamaDemo/README.md

Pull Request resolved: #5550

Test Plan:
Doc preview: https://docs-preview.pytorch.org/pytorch/executorch/5550/llm/llama-demo-android.html
Rendered GitHub preview: https://github.com/pytorch/executorch/blob/fix-images/examples/demo-apps/android/LlamaDemo/README.md?rgh-link-date=2024-09-23T20%3A12%3A32Z

Reviewed By: cmodi-meta, kirklandsign

Differential Revision: D63281004

Pulled By: svekars

fbshipit-source-id: cc5710624cb9bdab9056558c94f127b3bc12b96c
…5515)

Summary:
Pull Request resolved: #5515

Storing QMat2 in a texture gives rise to two main problems:

 - Indexing is a mess, and additional computation is required to account for the fact that we are reading ivec4's and only using half of the values
 - There is no texel fetching in int8. The texel is read in int32 and needs to be cast

Keeping QMat2 in a buffer performs better because, although reading from buffers is slower, removing the extra computation compensates for this.

This diff also moves the scales_and_zeros tensor to Channels Packed in texture implementations because it just makes more sense; I had done some terrible indexing shenanigans before.

ghstack-source-id: 244258611
exported-using-ghexport

Reviewed By: yipjustin

Differential Revision: D62504978

fbshipit-source-id: df2fdf87f75140be0a316576c8ffad67feefd6d7
Summary:
Pull Request resolved: #5499

Seems to block bfloat16 stories110M as exported by torchchat (and we should have op coverage for bfloat16 anyway).
ghstack-source-id: 243857968

Reviewed By: larryliu0820

Differential Revision: D63054001

fbshipit-source-id: 530b479872643f878912592c7b260d71e6e05804
Summary:
Pull Request resolved: #5500

ghstack-source-id: 243857969

Reviewed By: digantdesai, larryliu0820

Differential Revision: D63057744

fbshipit-source-id: 9e1fb6f6479adb1575c5aed61b9da3c774586ba3
Summary:
Pull Request resolved: #5519

Discovered these were missing while trying to use the following diff.
ghstack-source-id: 243867517
exported-using-ghexport

Reviewed By: digantdesai

Differential Revision: D63147276

fbshipit-source-id: bf75fb0fe452e2e68a34271ca1250cdb90657e5a
#5520)

Summary:
Pull Request resolved: #5520

reserved
ghstack-source-id: 243867516
exported-using-ghexport

Reviewed By: larryliu0820

Differential Revision: D63147278

fbshipit-source-id: d5aefbf2509a1eca4c32bbbe7224e7b996fa1e57
Summary:
Pull Request resolved: #5553

Adding support to load Llama Guard model and run prompt classification task

Reviewed By: cmodi-meta, kirklandsign

Differential Revision: D63148252

fbshipit-source-id: 482559e694da05bdec75b9a2dbd76163c686e47d
Summary:
Pull Request resolved: #5560

## Context

Refactor operator test code generation scripts, such that components can be re-used to generate operator benchmarks.

In broad strokes, the refactors implemented by this diff are as follows:

* Improve granularity of Python modules
* Replace `test` with `correctness_test`, to make it clear that we are generating correctness tests. Note that I haven't changed the top level target name `compute_graph_op_tests_bin` since I believe it would be too verbose.
ghstack-source-id: 244283559
exported-using-ghexport

Reviewed By: nathanaelsee

Differential Revision: D63286131

fbshipit-source-id: 1177ea381e6381045f1c97491dd7ec006690f574
Summary:
Pull Request resolved: #5561

## Context

Use the automatic test generation infrastructure to generate operator benchmarks. The overall concept is the same as the test generation; we just structure the generated code in the style of the google benchmark library instead of GTEST.
ghstack-source-id: 244287193

Reviewed By: derekxu, nathanaelsee

Differential Revision: D63286132

fbshipit-source-id: 25c379accf6664dfca8232db81772b638b41c758
Summary:
Add separate tests for Ethos-U85 to all backend operator tests.
Updated ethos-u-vela version to support more operators.

Signed-off-by: Per Åstrand <per.astrand@arm.com>
Signed-off-by: Tom Allsop <tom.allsop@arm.com>

Pull Request resolved: #5346

Reviewed By: manuelcandales

Differential Revision: D62875027

Pulled By: digantdesai

fbshipit-source-id: 3bf238d81957258ee93ae235d575beff8a575191
Summary:
Pull Request resolved: #5565

Swap to using method meta so we can be finer grained about this check

Reviewed By: dbort

Differential Revision: D62983475

fbshipit-source-id: c4599c5ecad0409cd8b2670464c4e9e8809b49ad

Labels

CLA Signed, fb-exported
