Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][PI][CUDA] Update queries for atomic order and scope for CUDA #4853

Merged
merged 10 commits into from Jan 26, 2022

Conversation

t4c1
Copy link
Contributor

@t4c1 t4c1 commented Oct 29, 2021

Updates returns for atomics memory order and scope capabilities queries to make them in line with changes in #4820.

This includes adding the previously not existing option to query for atomic scope capabilities.

@t4c1 t4c1 requested review from againull, smaslov-intel and a team as code owners October 29, 2021 12:37
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, though I have some small comments while this is blocked anyway.

Could you please add a case for the new descriptors in the other PI plugins, like with PI_DEVICE_INFO_ATOMIC_MEMORY_ORDER_CAPABILITIES? I don't expect full implementations, but for consistency it helps keep track of which ones aren't implemented in the corresponding plugins.

Also, would you mind changing this to a draft PR and adding "Draft: " or "[WIP]" (I think the former is more visible) to the title? This is just to prevent it from being prematurely merged by mistake.

sycl/include/CL/sycl/detail/pi.h Outdated Show resolved Hide resolved
@steffenlarsen
Copy link
Contributor

After reviewing #4820 I do not think having it merged is enough to unblock this PR. The reason is that, even though it introduces atomic operations with additional memory scopes (acq_rel, acquire, and release), these are still not supported by atomic load/store in LLVM's NVPTX implementation.

For more context see llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp#L859 (and similar for store). This previously caused libclc to fail to build kernels that would even remotely consider using atomic load/store with anything stricter than "unordered" memory order (see 4876443.)

@t4c1 t4c1 changed the title [SYCL][PI][CUDA] Update queries for atomic order and scope for CUDA Draft: [SYCL][PI][CUDA] Update queries for atomic order and scope for CUDA Nov 8, 2021
@dm-vodopyanov dm-vodopyanov added the cuda CUDA back-end label Nov 8, 2021
@dm-vodopyanov dm-vodopyanov added this to In review in oneAPI DPC++ via automation Nov 8, 2021
@t4c1
Copy link
Contributor Author

t4c1 commented Nov 9, 2021

Could you please add a case for the new descriptors in the other PI plugins, like with PI_DEVICE_INFO_ATOMIC_MEMORY_ORDER_CAPABILITIES? I don't expect full implementations, but for consistency it helps keep track of which ones aren't implemented in the corresponding plugins.

That should be done now.

@bader
Copy link
Contributor

bader commented Nov 17, 2021

@t4c1, please, update ABI tests.

@t4c1
Copy link
Contributor Author

t4c1 commented Nov 17, 2021

The ABI tests should be fixed now.

bader
bader previously approved these changes Nov 17, 2021
Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to start testing.

@bader bader dismissed their stale review November 17, 2021 14:44

Dismiss approve to avoid unintentional merge as this PR is labeled as a draft.

@bader bader requested review from steffenlarsen and removed request for againull November 20, 2021 23:36
steffenlarsen
steffenlarsen previously approved these changes Nov 22, 2021
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for adding this!

@bader
Copy link
Contributor

bader commented Nov 23, 2021

@t4c1, please, resolve merge conflicts.

# Conflicts:
#	sycl/include/CL/sycl/info/info_desc.hpp
@bader
Copy link
Contributor

bader commented Nov 30, 2021

Blocked by: #4820

#4820 is merged. Is this PR unblocked now?

@t4c1
Copy link
Contributor Author

t4c1 commented Nov 30, 2021

Given #4853 (comment), I would say no. In the meantime I also figured there are some atomics without direct ptx equivalents missing and will add them soon.

@bader bader marked this pull request as draft November 30, 2021 13:34
@t4c1 t4c1 marked this pull request as ready for review January 18, 2022 10:25
@t4c1 t4c1 requested a review from a team as a code owner January 18, 2022 10:25
@t4c1 t4c1 changed the title Draft: [SYCL][PI][CUDA] Update queries for atomic order and scope for CUDA [SYCL][PI][CUDA] Update queries for atomic order and scope for CUDA Jan 18, 2022
@t4c1
Copy link
Contributor Author

t4c1 commented Jan 18, 2022

All the PRs blocking this have now been merged.

@bader bader requested review from a team January 18, 2022 10:28
steffenlarsen
steffenlarsen previously approved these changes Jan 18, 2022
@t4c1
Copy link
Contributor Author

t4c1 commented Jan 26, 2022

Does this need to wait for something else or can it be merged?

@bader
Copy link
Contributor

bader commented Jan 26, 2022

@againull, @s-kanaev or @smaslov-intel are expected to approve Level Zero plug-in changes. Folks, could you take a look, please?

Copy link
Contributor

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plugin changes look good to me.

@bader bader merged commit 43a4192 into intel:sycl Jan 26, 2022
oneAPI DPC++ automation moved this from In review to Closed Jan 26, 2022
@bader
Copy link
Contributor

bader commented Jan 28, 2022

@t4c1, this change broke two tests in llvm-test-suite.

SYCL :: AtomicRef/atomic_memory_order_acq_rel.cpp
SYCL :: AtomicRef/atomic_memory_order_acq_rel_atomic64.cpp

Error message:

PI CUDA ERROR:
	Value:           719
	Name:            CUDA_ERROR_LAUNCH_FAILED
	Description:     unspecified launch failure
	Function:        cuda_piEnqueueMemBufferRead
	Source Location: llvm.src/sycl/plugins/cuda/pi_cuda.cpp:2430

atomic_memory_order_acq_rel_atomic64.cpp.tmp.out: llvm-test-suite/SYCL/AtomicRef/atomic_memory_order_acq_rel.h:39: void acq_rel_test(sycl::queue, size_t) [AtomicRef = sycl::ext::oneapi::atomic_ref, address_space = sycl::access::address_space::global_space, T = double]: Assertion `a == T(N)' failed.

error: command failed with exit status: -6

Could you take a look please?

@t4c1
Copy link
Contributor Author

t4c1 commented Jan 28, 2022

This change just let these tests run for CUDA. What is broken is all the atomic_memory_order* tests. They are using the pattern:

            auto ld = aar.load();
            ld += 1;
            aar.store(ld);

and checking if the whole sequence of operations is atomic. Which it is not - only each of the operations (load/store) on its own is atomic. I am not sure what these tests are supposed to check ... maybe we can just remove them?

@t4c1
Copy link
Contributor Author

t4c1 commented Jan 28, 2022

Also, as far as I know, no backend supported acquire release (or sequentially consistent) order before this PR was merged, so these tests were never actually run.

@bader
Copy link
Contributor

bader commented Jan 28, 2022

Could you create a patch to llvm-test-suite with removing illegal checks and add @steffenlarsen to discuss this change, please?

bader pushed a commit to intel/llvm-test-suite that referenced this pull request Feb 1, 2022
Removes `atomic_memory_order*` tests, which are broken. They are using the pattern:
```
            auto ld = aar.load();
            ld += 1;
            aar.store(ld);
```
and checking if the whole sequence of operations is atomic. Which it is not - only each of the operations (load/store) on its own is atomic.

Before intel/llvm#4853 was merged no backend supported acquire release or sequentially consistent memory orders, so these tests were never run before.

This issue was first discussed here: intel/llvm#4853 (comment)
@t4c1 t4c1 deleted the atomic_querries branch March 15, 2022 08:51
aelovikov-intel pushed a commit that referenced this pull request Feb 17, 2023
Removes `atomic_memory_order*` tests, which are broken. They are using the pattern:
```
            auto ld = aar.load();
            ld += 1;
            aar.store(ld);
```
and checking if the whole sequence of operations is atomic. Which it is not - only each of the operations (load/store) on its own is atomic.

Before #4853 was merged no backend supported acquire release or sequentially consistent memory orders, so these tests were never run before.

This issue was first discussed here: #4853 (comment)
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023
…e#783)

Removes `atomic_memory_order*` tests, which are broken. They are using the pattern:
```
            auto ld = aar.load();
            ld += 1;
            aar.store(ld);
```
and checking if the whole sequence of operations is atomic. Which it is not - only each of the operations (load/store) on its own is atomic.

Before intel#4853 was merged no backend supported acquire release or sequentially consistent memory orders, so these tests were never run before.

This issue was first discussed here: intel#4853 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda CUDA back-end
Projects
No open projects
oneAPI DPC++
  
Closed
Development

Successfully merging this pull request may close these issues.

None yet

5 participants