[ET-VK][INT4-MM] Move QMat2 to buffer storage and scales_and_zeros to Channels Packed #5515

SS-JIA · 2024-09-20T18:45:21Z

Stack from ghstack (oldest at bottom):

-> [ET-VK][INT4-MM] Move QMat2 to buffer storage and scales_and_zeros to Channels Packed #5515

Storing QMat2 in a texture leads to two main problems:

- Indexing is a mess, and additional computation is required to account for the fact that we are reading ivec4's and only using half of the values.
- There is no texel fetching in int8. The texel is read as int32 and needs to be cast.

Keeping QMat2 in a buffer performs better because, although reading from buffers is slower, removing the extra computation compensates for this.
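To illustrate the kind of extra work that packed 4-bit weights involve, here is a minimal Python sketch. It is not the shader code from this diff: the function names and the high-nibble-first packing order are assumptions made for illustration only. It shows that every logical weight lookup needs an index division plus a shift and mask, which is the per-element computation that becomes more convoluted when the data must first be pulled out of a wider texel.

```python
def pack_int4(values):
    """Pack unsigned 4-bit values (0..15) two per byte.

    Assumption for this sketch: the even-indexed value goes in the high nibble.
    """
    if len(values) % 2 != 0:
        raise ValueError("need an even number of 4-bit values")
    packed = bytearray()
    for i in range(0, len(values), 2):
        hi, lo = values[i], values[i + 1]
        if not (0 <= hi <= 15 and 0 <= lo <= 15):
            raise ValueError("values must fit in 4 bits")
        packed.append((hi << 4) | lo)
    return bytes(packed)


def read_int4(packed, index):
    """Return the 4-bit value at logical position `index`.

    One byte holds two values, so the byte offset is index // 2 and the
    nibble is selected with a shift and a mask.
    """
    byte = packed[index // 2]
    shift = 4 if index % 2 == 0 else 0
    return (byte >> shift) & 0xF


def unpack_int4(packed):
    """Unpack every 4-bit value from the packed bytes, in logical order."""
    return [read_int4(packed, i) for i in range(2 * len(packed))]
```

With a flat byte buffer, the lookup is just the division, shift, and mask shown in `read_int4`; a texture-backed layout would add texel coordinate and component selection on top of that.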

{F1863459327}

This diff also moves the scales_and_zeros tensor to Channels Packed in texture implementations because it just makes more sense; I had done some terrible indexing shenanigans before.

Differential Revision: D62504978


pytorch-bot · 2024-09-20T18:45:25Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5515

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2144171 with merge base 0ec003b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

… Channels Packed

ghstack-source-id: 243833256
Pull Request resolved: #5515

facebook-github-bot · 2024-09-20T18:45:35Z

This pull request was exported from Phabricator. Differential Revision: D62504978


facebook-github-bot · 2024-09-23T21:37:25Z

This pull request was exported from Phabricator. Differential Revision: D62504978

… Channels Packed

Pull Request resolved: #5515
ghstack-source-id: 244258611
@exported-using-ghexport

facebook-github-bot · 2024-09-23T22:40:00Z

This pull request has been merged in 2eae7a9.

facebook-github-bot added the CLA Signed label Sep 20, 2024

facebook-github-bot added the fb-exported label Sep 20, 2024

yipjustin approved these changes Sep 23, 2024


facebook-github-bot closed this in 2eae7a9 Sep 23, 2024

facebook-github-bot added the Merged label Sep 23, 2024

SS-JIA deleted the gh/SS-JIA/87/head branch January 24, 2025 19:43
