Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix device::max_work_item_sizes #832

Merged
merged 2 commits into from
Sep 28, 2022
Merged

Conversation

nmnobre
Copy link
Member

@nmnobre nmnobre commented Sep 22, 2022

Hi @illuhad,

I've noticed that sycl::device::get_info<sycl::info::device::max_work_item_sizes>() returns the allowed per-dimension number of work items, i.e. threads, not on a block, i.e. a work group, but on a grid of blocks. It seems the standard is asking for the former? I don't know if you can test it easily, but our friends at intel seem to have the same opinion.

Currently, I've only changed the behaviour for the cuda backend to illustrate the idea, but we should change the others too before eventually merging.

Cheers,
-Nuno

@illuhad
Copy link
Collaborator

illuhad commented Sep 22, 2022

Thanks for bringing this up. I've had a look at the spec, and I think you are right.

However, I think the max_global_size property of the runtime should remain the same, since it correctly returns the max global size as the name suggests. We should probably introduce new properties for the max group size in each dimension, max_group_size0, max_group_size1 etc.

@nmnobre
Copy link
Member Author

nmnobre commented Sep 22, 2022

However, I think the max_global_size property of the runtime should remain the same, since it correctly returns the max global size as the name suggests. We should probably introduce new properties for the max group size in each dimension, max_group_size0, max_group_size1 etc.

Sounds good, I think it looks better now.

@@ -219,6 +219,15 @@ cuda_hardware_context::get_property(device_uint_property prop) const {
case device_uint_property::max_global_size2:
return _properties->maxThreadsPerBlock * _properties->maxGridSize[2];
Copy link
Contributor

@al42and al42and Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that device_uint_property::max_global_sizeX are not correct (should be _properties->maxThreadsDim[X] * _properties->maxGridSize[X]). However, as far as I can tell, they are not used now, so we can safely remove them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, we should fix them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

std::size_t size2 = static_cast<std::size_t>(get_rt_device()->get_property(
rt::device_uint_property::max_global_size2));
rt::device_uint_property::max_group_size2));
return id<3>{size0, size1, size2};
Copy link
Contributor

@al42and al42and Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say, no. Index flipping is a problem of the kernel headers and the glue code, not of the runtime library which is currently completely unaware that something like this happens. Also, even if we wanted to, we couldn't expose the right flipping here because the correct flip depends on the dimensionality of the query. I.e. max_size[0] would depend on whether we are asking for a 1D, 2D, or 3D kernel.

The reason why SYCL 2020 changed the device descriptors to structs instead of enums was precisely to carry the information about the dimensionality. Because I don't want a templated query interface for the runtime, this is another reason why it probably should be handled by the headers. We will have to expose though whether a backend has vectorization across first index as a runtime property so that the device queries in the headers can do the right thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't change anything as per @illuhad's request but I should add that the intel runtime does also flip the dimensions on a request to sycl::device::get_info<sycl::info::device::max_work_item_sizes>().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nmnobre To clarify: Eventually those queries should also include the flip in hipSYCL. I was just talking about where it would eventually need to go. This requires distinguishing between the SYCL headers, where get_info<>() lives and the actual hipSYCL runtime library libhipSYCL-rt and its backend plugins. I was just saying that in the hipSYCL architecture, that flip should probably be implemented inside get_info<>, not the runtime library itself.

For this we probably need to move to the SYCL 2020 notion where structs are used instead of enums for those queries (#774). In SYCL 2020, max_work_item_sizes<Dim> is templated with the dimensionality so that the flip can be included correctly.

If DPC++ also flips without knowing the dimensionality, I would consider it confusing and not part of SYCL 2020 since this query does not exist anymore without template parameter for the dimension.
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_device_information_descriptors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@illuhad, yes, you're right; no way to fix it currently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If DPC++ also flips without knowing the dimensionality, I would consider it confusing and not part of SYCL 2020 since this query does not exist anymore without template parameter for the dimension. https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_device_information_descriptors

I've checked and, my bad, they have in fact transitioned to use structs for the device descriptors a couple months back and they do require the template parameter as per SYCL 2020. I was using an old version which still used the enums and no dimension specification. For clarity, the flip happened then and it still does now. :-)

@illuhad illuhad merged commit c3156fa into AdaptiveCpp:develop Sep 28, 2022
nmnobre added a commit to nmnobre/OpenSYCL that referenced this pull request Mar 28, 2023
nmnobre added a commit to nmnobre/OpenSYCL that referenced this pull request Mar 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants