[ET-VK][INT4-MM] Move QMat2 to buffer storage and scales_and_zeros to Channels Packed #5515
Storing QMat2 in a texture leads to two main problems:
- Indexing is complicated: extra computation is needed to account for the fact that we read ivec4s but use only half of the values.
- There is no int8 texel fetch. Each texel is read as int32 and has to be cast.
Keeping QMat2 in a buffer performs better: although buffer reads are slower, removing the extra computation more than compensates.
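To make the trade-off concrete, here is a small CPU-side sketch in Python (not the actual shader code). It assumes a hypothetical layout in which two 4-bit weights are packed per byte, low nibble first, and it emulates a texture as a list of 4-lane texels whose lanes are plain integers. The function names and the packing order are illustrative assumptions, not taken from this diff.

```python
# Illustrative emulation only: the real kernels are Vulkan compute shaders.
# Assumption: two unsigned 4-bit weights per byte, low nibble first.

def pack_int4(values):
    """Pack an even-length list of 4-bit values (0..15) two per byte."""
    assert len(values) % 2 == 0
    return bytes(
        (values[i] & 0xF) | ((values[i + 1] & 0xF) << 4)
        for i in range(0, len(values), 2)
    )

def read_from_buffer(buf, k):
    """Buffer-style access: one byte load, then shift and mask."""
    byte = buf[k // 2]
    return (byte >> (4 * (k % 2))) & 0xF

def to_texels(buf):
    """Emulate a texture: group bytes into 4-lane texels of wide integers."""
    padded = buf + bytes(-len(buf) % 4)  # pad to a multiple of 4 lanes
    return [tuple(padded[i:i + 4]) for i in range(0, len(padded), 4)]

def read_from_texels(texels, k):
    """Texture-style access: locate the texel, pick a lane, narrow it,
    and only then extract the nibble."""
    byte_index = k // 2
    texel = texels[byte_index // 4]   # extra index math: which texel
    lane = texel[byte_index % 4]      # extra index math: which lane
    byte = lane & 0xFF                # stands in for the int32 -> int8 cast
    return (byte >> (4 * (k % 2))) & 0xF

weights = [1, 15, 0, 7, 8, 3, 12, 5, 9, 2]
packed = pack_int4(weights)
texels = to_texels(packed)
from_buffer = [read_from_buffer(packed, k) for k in range(len(weights))]
from_texels = [read_from_texels(texels, k) for k in range(len(weights))]
```

Both paths return the same weights; the texel path simply needs more index arithmetic and a narrowing step per element, which is the overhead the buffer layout avoids.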
{F1863459327}
This diff also moves the scales_and_zeros tensor to Channels Packed in the texture implementations because that layout makes more sense. I had done some terrible indexing shenanigans before.
Differential Revision: [D62504978](https://our.internmc.facebook.com/intern/diff/D62504978/)
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5515
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 2144171 with merge base 0ec003b.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D62504978
This pull request has been merged in 2eae7a9.