[SYCL][Docs][Joint matrix] Add overloads and restrictions for the offset load store #15499
| void joint_matrix_store(Group g, | ||
| const joint_matrix<Group, T, use::a, Rows, Cols, Layout> &res, | ||
| multi_ptr<T, Space, IsDecorated> dest, size_t stride); | ||
| multi_ptr<T, Space, IsDecorated> dest, size_t Stride); |
Why capitalize this parameter name? All the other parameter names start with a lower case letter. Our style is that function parameter names are lower case (snake_case) while template parameter names are upper case (CamelCase).
I see below that you have added parameter names RowIndex and ColIndex. These should be row_index and col_index to be consistent.
|
|
||
| - The `Stride` argument to `joint_matrix_load` and | ||
| `joint_matrix_store` must be a multiple of 8 bytes. Also, `Stride` | ||
| should not exceed `2^24^` bytes. |
The stride parameter is the number of elements, not the number of bytes. It would be better to reword this like:
The `stride` parameter to `joint_matrix_load` and `joint_matrix_store` has the following restrictions:
- The value `stride * sizeof(T1)` must be a multiple of 8, and
- The value of `stride * sizeof(T1)` must not exceed `2^24^`.
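For illustration, the reworded restriction could be checked on the host with something like the sketch below. The helper name `stride_is_supported` is hypothetical and is not part of the joint matrix extension; it only assumes, as stated above, that `stride` counts elements and that the limits apply to `stride * sizeof(T)`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the extension): returns true when a
// stride given in elements satisfies both byte-based restrictions.
template <typename T>
constexpr bool stride_is_supported(std::size_t stride) {
  constexpr std::uint64_t max_bytes = std::uint64_t{1} << 24; // 2^24 bytes
  const std::uint64_t bytes =
      static_cast<std::uint64_t>(stride) * sizeof(T); // elements -> bytes
  return bytes % 8 == 0 && bytes <= max_bytes;
}

// 64 floats are 256 bytes: a multiple of 8 and below the limit.
static_assert(stride_is_supported<float>(64));
// 12 one-byte elements are 12 bytes: not a multiple of 8.
static_assert(!stride_is_supported<std::int8_t>(12));
```

A caller could use such a check to decide whether the fast path applies before launching a kernel, since the PR states that no runtime checks are added inside the load/store implementation.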
| these checked APIs: | ||
|
|
||
| - The `Stride` argument must be a multiple of 8 bytes. Also, `Stride` | ||
| should not exceed `2^24^` bytes. |
See my comment in the other file about the wording of this restriction.
@intel/llvm-gatekeepers, please help merge this.
|
|
||
| - If these restrictions are not satisfied, users can switch to slower | ||
| implementations of `joint_matrix_load` and `joint_matrix_store` by | ||
| setting the driver flag `IGC_JointMatrixLoadStoreOpt=1`. |
@dkhaldi, it is better to use `IGC_JointMatrixLoadStoreOpt=2`, as more optimizations may kick in, especially for big shapes.
- Overloads of `joint_matrix_load` and `joint_matrix_store` where the offsets are separated from the base pointer and added as separate arguments. I kept the same name, as the expectation is to remove the regular variants once the new ones are used instead.
- Restrictions for `joint_matrix_load/store` on PVC since, in the current implementation, no runtime checks are added as they are expensive. The fallback to 1d load/store is done using a flag instead.
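As a rough sketch of what separating the offsets from the base pointer means, the plain C++ function below computes the address an implementation would start from. The function `tile_origin` and the row-major layout are assumptions made for illustration only. The parameter names `row_index` and `col_index` follow the naming suggested in the review. This is not the extension's API.

```cpp
#include <cstddef>

// Hypothetical illustration (not the extension's API): with separate
// offsets, the caller passes the unmodified base pointer plus row and
// column indices, and the tile origin is derived from them.
// Assumes a row-major matrix whose stride is given in elements.
template <typename T>
const T *tile_origin(const T *base, std::size_t row_index,
                     std::size_t col_index, std::size_t stride) {
  return base + row_index * stride + col_index;
}
```

With the regular variants, the caller performs this pointer arithmetic before the call, so the implementation sees only the already-offset pointer.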