[libomptarget][nextgen-plugin] Use SCRELEASE/SCACQUIRE in packet headers #85678
@llvm/pr-subscribers-libc @llvm/pr-subscribers-backend-amdgpu
Author: Gheorghe-Teodor Bercea (doru1004)
Changes
This patch updates the construction of packet headers to replace the usage of ACQUIRE/RELEASE with SCACQUIRE/SCRELEASE, which is now recommended.
Full diff: https://github.com/llvm/llvm-project/pull/85678.diff
1 Files Affected:
diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
index fce7454bf2800d..886d8732d088fc 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -826,15 +826,14 @@ struct AMDGPUQueueTy {
/// Assumes the queue lock is acquired.
void publishKernelPacket(uint64_t PacketId, uint16_t Setup,
hsa_kernel_dispatch_packet_t *Packet) {
- uint32_t *PacketPtr = reinterpret_cast<uint32_t *>(Packet);
-
- uint16_t Header = HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE;
- Header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE;
- Header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE;
+ uint16_t Header =
+ (HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE) |
+ (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE) |
+ (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE);
// Publish the packet. Do not modify the package after this point.
uint32_t HeaderWord = Header | (Setup << 16u);
- __atomic_store_n(PacketPtr, HeaderWord, __ATOMIC_RELEASE);
+ __atomic_store_n((uint32_t *)&Packet->header, HeaderWord, __ATOMIC_RELEASE);
// Signal the doorbell about the published packet.
hsa_signal_store_relaxed(Queue->doorbell_signal, PacketId);
@@ -845,15 +844,15 @@ struct AMDGPUQueueTy {
/// barrier dependencies (signals) are satisfied. Assumes the queue is locked
void publishBarrierPacket(uint64_t PacketId,
hsa_barrier_and_packet_t *Packet) {
- uint32_t *PacketPtr = reinterpret_cast<uint32_t *>(Packet);
uint16_t Setup = 0;
- uint16_t Header = HSA_PACKET_TYPE_BARRIER_AND << HSA_PACKET_HEADER_TYPE;
- Header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE;
- Header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE;
+ uint16_t Header =
+ (HSA_PACKET_TYPE_BARRIER_AND << HSA_PACKET_HEADER_TYPE) |
+ (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE) |
+ (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE);
// Publish the packet. Do not modify the package after this point.
uint32_t HeaderWord = Header | (Setup << 16u);
- __atomic_store_n(PacketPtr, HeaderWord, __ATOMIC_RELEASE);
+ __atomic_store_n((uint32_t *)&Packet->header, HeaderWord, __ATOMIC_RELEASE);
// Signal the doorbell about the published packet.
hsa_signal_store_relaxed(Queue->doorbell_signal, PacketId);
Can you cite the source of this recommendation? I know that https://hsafoundation.com/wp-content/uploads/2021/02/HSA-Runtime-1.2.pdf recommends SCACQUIRE/SCRELEASE, but I also see ROCm docs using ACQUIRE/RELEASE: https://doc-sep4.readthedocs.io/en/latest/Tutorial/Optimizing-Dispatches.html
I updated the comment. Please let me know if this clarifies it. The HSA Runtime document mentions the ACQUIRE/RELEASE versions as being deprecated.
@@ -826,15 +826,14 @@ struct AMDGPUQueueTy {
/// Assumes the queue lock is acquired.
void publishKernelPacket(uint64_t PacketId, uint16_t Setup,
hsa_kernel_dispatch_packet_t *Packet) {
uint32_t *PacketPtr = reinterpret_cast<uint32_t *>(Packet);
Why was this removed and replaced with a C-style cast on a 16-bit value?
I don't think we need to worry about the header moving, honestly; that would be an incredibly ABI-breaking change, and it's called header for a reason. We already rely implicitly on this storing the two 16-bit values that are right next to each other. Right now we have "store 32 bits at the beginning of the struct," and this turns that into "store 32 bits through a 16-bit pointer, assuming that the out-of-bounds access is correct." We also do not want to make this two 16-bit stores, because this should happen in a single operation. So I think it's fine the way it is.
We should probably change the name though; the documentation says it's deprecated and just makes them the same value anyway, so it's NFC. From the HSA headers:
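To illustrate the two points in this comment (the renamed fields select the same header bits, and the header and setup fields are published with one 32-bit store), here is a minimal sketch. It is not the HSA header or the plugin code: `MockPacket` and every constant below are hypothetical stand-ins, and the numeric values are assumptions about the HSA runtime enums that should be checked against the real `hsa.h`. The `__atomic_*` builtins are the GCC/Clang ones already used in the diff.

```cpp
#include <cstdint>

// Stand-in values, ASSUMED to mirror the HSA runtime enums (verify in hsa.h).
constexpr uint16_t PacketTypeKernelDispatch = 2; // HSA_PACKET_TYPE_KERNEL_DISPATCH
constexpr uint16_t FenceScopeSystem = 2;         // HSA_FENCE_SCOPE_SYSTEM
constexpr unsigned HeaderTypeBit = 0;            // HSA_PACKET_HEADER_TYPE
constexpr unsigned HeaderScacquireBit = 9;  // ..._SCACQUIRE_FENCE_SCOPE
constexpr unsigned HeaderScreleaseBit = 11; // ..._SCRELEASE_FENCE_SCOPE
// Deprecated names, assumed to alias the same bit positions.
constexpr unsigned HeaderAcquireBit = HeaderScacquireBit;
constexpr unsigned HeaderReleaseBit = HeaderScreleaseBit;

// Hypothetical mock of a 64-byte packet: two adjacent 16-bit fields first.
struct MockPacket {
  uint16_t header;
  uint16_t setup;
  uint32_t rest[15];
};

// Build the 16-bit header from a packet type and two fence-scope bit offsets.
constexpr uint16_t makeHeader(unsigned AcquireBit, unsigned ReleaseBit) {
  return static_cast<uint16_t>((PacketTypeKernelDispatch << HeaderTypeBit) |
                               (FenceScopeSystem << AcquireBit) |
                               (FenceScopeSystem << ReleaseBit));
}

// Combine header (low 16 bits) and setup (high 16 bits) into one word.
constexpr uint32_t makeHeaderWord(uint16_t Header, uint16_t Setup) {
  return Header | (static_cast<uint32_t>(Setup) << 16u);
}

// Publish both fields with a single 32-bit release store at the start of the
// packet, mirroring the pre-patch reinterpret_cast on the packet pointer.
inline void publishHeaderWord(MockPacket *Packet, uint32_t HeaderWord) {
  auto *WordPtr = reinterpret_cast<uint32_t *>(Packet);
  __atomic_store_n(WordPtr, HeaderWord, __ATOMIC_RELEASE);
}

// Read the same 32-bit word back with acquire ordering.
inline uint32_t loadHeaderWord(MockPacket *Packet) {
  return __atomic_load_n(reinterpret_cast<uint32_t *>(Packet),
                         __ATOMIC_ACQUIRE);
}
```

Under these assumed values, `makeHeader(HeaderAcquireBit, HeaderReleaseBit)` and `makeHeader(HeaderScacquireBit, HeaderScreleaseBit)` produce the same header, which is why the rename would not change behavior.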
This is your code, no?
It is, I should probably update it as well. I think I originally copied part of that from the old plugins somewhere.
That's ok, no problem, I can make it consistent.
Is what I'd do.
e2a0e03 to 14252f5
…ers (llvm#85678) This patch updates the construction of packet headers to replace the usage of ACQUIRE/RELEASE with SCACQUIRE/SCRELEASE which is now recommended. The patch also ensures consistency across kernel dispatches.
…ket headers" (llvm#85950) Reverts llvm#85678
This patch updates the construction of packet headers to replace the usage of ACQUIRE/RELEASE with SCACQUIRE/SCRELEASE which is now recommended.
The patch also fixes a potential source of problems by ensuring the atomic store operation refers to the header field directly and doesn't rely on it being the first field in the packet struct.
Documentation:
Source: https://hsafoundation.com/wp-content/uploads/2021/02/HSA-Runtime-1.2.pdf
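One possible way to address the layout concern raised in the description without changing the store itself is to check the assumption at compile time. The sketch below is only an illustration under stated assumptions: `SketchPacket` is a hypothetical mock of a 64-byte packet, not the real `hsa_kernel_dispatch_packet_t`, and the check function is not part of the plugin.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mock of a 64-byte packet; the real definitions live in the
// HSA runtime headers and may differ.
struct alignas(64) SketchPacket {
  uint16_t header;
  uint16_t setup;
  uint8_t rest[60];
};

// True when a 32-bit store at the start of the packet covers exactly the
// header and setup fields and is suitably aligned.
constexpr bool sketchHeaderWordIsAtStart() {
  return offsetof(SketchPacket, header) == 0 &&
         offsetof(SketchPacket, setup) == sizeof(uint16_t) &&
         alignof(SketchPacket) >= alignof(uint32_t);
}

// Fails the build if the layout assumption ever stops holding.
static_assert(sketchHeaderWordIsAtStart(),
              "header and setup must form the first 32-bit word");
static_assert(sizeof(SketchPacket) == 64, "mock packet should be 64 bytes");
```

With a guard like this, the store can keep targeting the start of the packet as a 32-bit word, and a layout change would surface as a build error.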