[SYCL][PI] New device information descriptors: max_global_work_groups and max_work_groups #4064
SYCL currently does not provide a way to query a device for the maximum number of work-groups that can be submitted in each dimension. This query does not exist in OpenCL, but now that GPUs are exposed through the PI, it becomes more relevant because different vendors and devices have different limits. This commit implements the feature for the host device, Level Zero, OpenCL, ROCm, and CUDA. If the query is not applicable, the maximum acceptable value is returned.
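As a rough illustration of the "not applicable" fallback described above, here is a minimal plain C++ sketch. It does not use the SYCL or PI APIs. `BackendLimit` and `max_work_groups_or_unbounded` are hypothetical names, and reading "maximum acceptable value" as the largest value of the return type is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for what a backend reports. `has_limit == false`
// models a backend where the query is not applicable.
struct BackendLimit {
  bool has_limit;
  std::size_t value;
};

// Assumption: "maximum acceptable value" means the largest value that the
// return type can represent.
inline std::size_t max_work_groups_or_unbounded(const BackendLimit& limit) {
  if (limit.has_limit) {
    return limit.value;
  }
  return std::numeric_limits<std::size_t>::max();
}
```

With this shape, callers can compare a launch size against the result without special-casing backends that have no limit.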
Hello. Thanks for adding this! A few questions/comments:
If you agree with changes falling out from the above but want me to propose the wording for anything, I'm happy to help. For reference, the wording of the |
Hello, thanks for your comments.
At least if that value could be accessible in a header for info queries it would prevent future errors.
Do you think there could be a way to specialise
And
|
There are already some queries that are tied to a specific kernel. Backends seem to have kernel-independent queries for max number of work-groups, but to make sure that you're aware of the possibility, check Table 133 at https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_kernel_information_descriptors. These are queries from the
SYCL already has something like this for the number of work-items in a work-group. For individual dimensions one can query
There has been talk about this before, but I don't think it exists in any spec yet. This capability probably should exist, though. @Pennycook @gmlueck do either of you know of any existing precedent for this? I suspect that we'd want to pass the dimensionality information as part of the |
Yes, that's exactly why I was proposing that, maybe something like |
…ro) and added bound check. The bound check is probably not useful yet for CUDA and ROCm.
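For readers wondering what such a bound check might look like, here is a minimal plain C++ sketch. It is not the code from this commit. `WorkGroupLimits` and `fits` are hypothetical names, and the sketch assumes a launch is valid when each dimension is within its per-dimension limit and the total number of work-groups is within the global limit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical limits record; the field names are illustrative only.
struct WorkGroupLimits {
  std::array<std::size_t, 3> per_dim;  // max work-groups in each dimension
  std::size_t total;                   // max work-groups across all dimensions
};

// Returns true when launching `groups` work-groups stays within `limits`.
inline bool fits(const std::array<std::size_t, 3>& groups,
                 const WorkGroupLimits& limits) {
  std::size_t product = 1;
  for (std::size_t d = 0; d < 3; ++d) {
    // Reject empty dimensions and dimensions above the per-dimension limit.
    if (groups[d] == 0 || groups[d] > limits.per_dim[d]) {
      return false;
    }
    // Overflow-safe form of: product * groups[d] > limits.total
    if (product > limits.total / groups[d]) {
      return false;
    }
    product *= groups[d];
  }
  return true;
}
```

The division-based comparison avoids overflowing `std::size_t` when a backend reports very large limits.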
By looking at the current spec I realize that there is some lack of uniformity. |
It would make the naming shorter and more consistent, for sure. But the name then becomes (almost) a substring of |
Maybe I don't understand the question, but it seems like
|
Hello, |
Agreed, that following the
It's a little unfortunate, though, to add a temporary extension like this that will end up changing once DPC++ implements the SYCL 2020 info descriptors. |
sycl/doc/extensions/MaxWorkGroupQueries/max_work_group_query.md
…m into max_global_work_sizes
Good stuff! It is unfortunate that it can't use the template variants of info descriptors yet. Maybe it would be worth considering having only the 3D variant of When the info descriptors are made SYCL 2020 compliant in the future we can make a template variant of |
Why is this better than adding the 3D, 2D, and 1D variations now, and then adding the template version later when the DPC++ info descriptors are made conformant with SYCL 2020? I was thinking that we can deprecate the 3D, 2D, and 1D variations once we have the templated one, and then eventually remove them. Doing it this way avoids the need to document (or support) the 3D version as a way to get info about 2D or 1D loops. |
"Better" is such a strong word. W.r.t. ABI it isn't better, but it comes with the benefit that users do not have to change their code once the descriptor is changed. Say a user wants the 2D variant: they can write their own converter from 3D right now. When templated descriptors are introduced,
This means that any code using |
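To make the "converter from 3D" idea concrete, here is a minimal plain C++ sketch of a user-side helper. `narrow_limits` is a hypothetical name. Keeping the trailing entries of the 3D result is an assumption, and which entries a 2D or 1D launch should actually keep depends on how the backend maps dimensions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical converter: derives a 1D or 2D limit from a 3D limit by
// keeping the last `Dims` entries. The choice of trailing entries is an
// assumption made for illustration.
template <std::size_t Dims>
std::array<std::size_t, Dims> narrow_limits(
    const std::array<std::size_t, 3>& limits3d) {
  static_assert(Dims >= 1 && Dims <= 3, "Dims must be 1, 2, or 3");
  std::array<std::size_t, Dims> out{};
  for (std::size_t i = 0; i < Dims; ++i) {
    out[i] = limits3d[3 - Dims + i];
  }
  return out;
}
```

A helper like this is the kind of user code that would need revisiting if dimension-specific or templated descriptors later replace the 3D-only query.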
I agree that approach allows some user code to continue working even after we move to the template version of the info descriptors. However, I see two downsides:
Since this is an experimental API, I thought it would not be problematic if we eventually deprecate and remove the non-templated versions of the queries. (Our definition of "experimental API" means we can change the API even without going through a deprecation process.) I guess another option is to proceed as you propose, but document the default template parameter as deprecated, and also deprecate the language about using the 3D query for 2D and 1D loops. We would then remove those from the spec at some point after deprecation. |
I completely agree, it definitely comes with its own set of drawbacks. I am not sure which of the solutions I think is the best, but I just wanted to throw the spanner in the works before a final conclusion was made. I apologize that it was a bit late in the process. |
In all cases the API will be broken, but if we go ahead with the 1D/2D/3D versions, at least the API and query semantics will remain unchanged, and changing the code later will be easier. If we go with a single query version, programmers will have to do two index flips: one today, and one when the ABI freeze is lifted.
I don't think it will be difficult either way. In the hard-coded dimensionality option, however, you would have two descriptors doing the same job until the deprecated version is removed.
Should hopefully only be at most one flip. If you have to flip from 3D, then that logic can just be scrapped when moving to <3D. Granted it might be confusing to the user when that happens, but we'll have the same problem with If consensus is that the |
Folks, what is the status here? I see that #4563 is pending on these changes, so I'd like to make sure it moves forward. It looks like we need to resolve merge conflicts at least. |
Hello, |
There are quite a lot of comments here already and I'm trying to understand what is the blocker here. |
Pulldown
Done! |
@againull, could you take a look, please? |
Approving to trigger CI system.
@Michoumichmich, it looks like we need to update tests checking ABI consistency. |
Sure, I will do that! I wasn't sure whether I had the "right to" because of the ABI freeze.
https://github.com/intel/llvm/blob/sycl/CONTRIBUTING.md#development states that "breaking changes are not allowed".
The log says that adding new APIs does not break ABI.
According to my understanding the test validates that all symbols are covered by the test to check for "ABI breaking changes". |
@againull, ping.
SYCL currently does not provide a way to query a device for the maximum number of work-groups that can be submitted in each dimension, or for the number of work-groups that can be submitted across all dimensions.
This query does not exist in OpenCL, but now that GPUs are exposed through the PI, it becomes more relevant because different vendors and devices have their own limits.
This commit implements the feature for the host device, Level Zero, OpenCL, ROCm, and CUDA. If the query is not applicable, the maximum acceptable value is returned.
Descriptors added:
Feature test macro:
Signed-off-by: Michel Migdal <michel.migdal@codeplay.com>