Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Make local accessor argument allocation 1-byte minimum #6032

Merged

Conversation

steffenlarsen
Copy link
Contributor

Some DPC++ backends, such as OpenCL, does not allow setting local memory kernel arguments with a size of zero bytes. As such, the runtime library can fail with a backend error if a local accessor has a zero-size. These changes make a special case for local accessors with a zero byte size making it allocate a single byte of memory.

Some DPC++ backends, such as OpenCL, does not allow setting local memory
kernel arguments with a size of zero bytes. As such, the runtime library
can fail with a backend error if a local accessor has a zero-size. These
changes make a special case for local accessors with a zero byte size
making it allocate a single byte of memory.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
@steffenlarsen steffenlarsen requested a review from a team as a code owner April 20, 2022 19:25
steffenlarsen added a commit to steffenlarsen/llvm-test-suite that referenced this pull request Apr 20, 2022
This tests intel/llvm#6032.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
@steffenlarsen
Copy link
Contributor Author

/verify with intel/llvm-test-suite#1001

@cperkinsintel
Copy link
Contributor

code-wise this is fine. But do we really need to special case 0 size? Why isn't failure an option? Are zero sized local accessors a thing?

@steffenlarsen
Copy link
Contributor Author

code-wise this is fine. But do we really need to special case 0 size? Why isn't failure an option? Are zero sized local accessors a thing?

0-sized local accessors are not disallowed by the SYCL 2020 specification so we should allow it through although it is an edge-case. We cannot simply skip setting the kernel argument as OpenCL will start complaining about unset kernel arguments, so using a minimal-sized allocation seems to be the best solution.

@AerialMantis
Copy link
Contributor

@steffenlarsen I think you're right that the SYCL specification doesn't specify whether zero-sized local accessors, a clarification was added recently which allows zero-sized buffers so there is a precedent to allow it, though perhaps we should raise a ticket for the specification to clarify this?

On a separate note, would I be correct in thinking this change would make #5682 redundant as the backend would never receive a zero-size local memory argument?

@steffenlarsen
Copy link
Contributor Author

@steffenlarsen I think you're right that the SYCL specification doesn't specify whether zero-sized local accessors, a clarification was added recently which allows zero-sized buffers so there is a precedent to allow it, though perhaps we should raise a ticket for the specification to clarify this?

I am not opposed to having a clarification for it, though I'd argue that since we don't specify an error for it, it should be assumed to work. The minimum allocation here is just an implementation detail and the user should still not access the byte as it is undefined behavior to do so (from the spec's perspective).

On a separate note, would I be correct in thinking this change would make #5682 redundant as the backend would never receive a zero-size local memory argument?

Thank you for pointing that one out! Yes, I would expect so. I assume the CUDA backend is fine without this change, so if you think it would be better off with simply a zero-sized allocation, we can make this strategy backend-dependent, though in that case #5682 would still be an issue.

@AerialMantis
Copy link
Contributor

I am not opposed to having a clarification for it, though I'd argue that since we don't specify an error for it, it should be assumed to work. The minimum allocation here is just an implementation detail and the user should still not access the byte as it is undefined behavior to do so (from the spec's perspective).

I agree, since it's not specified explicitly it's really an implementation detail, and I think this is a fair interpretation.

Thank you for pointing that one out! Yes, I would expect so. I assume the CUDA backend is fine without this change, so if you think it would be better off with simply a zero-sized allocation, we can make this strategy backend-dependent, though in that case #5682 would still be an issue.

That's a good point, the CUDA backend would be fine with this change, we could potentially make this backend-dependent which would allow us to optimize for the zero-size case, though without eliding the argument entirely I suspect this wouldn't have much of an impact, so I'm tempted to say this is better done in the runtime, this also enforces consistent behaviour and avoids errors across all backends.

@keryell
Copy link
Contributor

keryell commented Apr 22, 2022

code-wise this is fine. But do we really need to special case 0 size? Why isn't failure an option?

@cperkinsintel Because this is good quality software and you might want to keep your job. :-) But AMD is hiring... ;-)

@cperkinsintel
Copy link
Contributor

cperkinsintel commented Apr 22, 2022

@keryell - it may not be relevant here, but my question about failure being an option is that if a user is making a 0 sized accessor that seems like it is most likely a mistake in their code. By us making it pretend to work, we are hiding their mistake which may lead them to spend a lot of time hunting down some obscure bug in the future. But if it fails the first time they run it, then they'll find their mistake and fix it. That's why I was asking if 0 sized accessors are a real thing. Also, I like my job. ;-)

@keryell
Copy link
Contributor

keryell commented Apr 23, 2022

@cperkinsintel joking aside, when writing some generic code it makes sense to accept a 0-sized accessor instead of asking the programmer to write an exponential number of functions specialized for all the cases.
Of course, it is UB to access to non-existent memory in the case of an accessor of size 0, but in the same way of any out-of-bound access to any accessor.
In another domain, in C++ it is allowed to have a std::array of size 0 https://en.cppreference.com/w/cpp/container/array

Copy link
Contributor

@cperkinsintel cperkinsintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@steffenlarsen steffenlarsen merged commit 1292532 into intel:sycl Apr 29, 2022
steffenlarsen added a commit to intel/llvm-test-suite that referenced this pull request Apr 29, 2022
This tests intel/llvm#6032.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
aelovikov-intel pushed a commit that referenced this pull request Feb 17, 2023
This tests #6032.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023
…t-suite#1001)

This tests intel#6032.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants