[OpenMP] incorrect concurrent target reduction #70249
Comments
@llvm/issue-subscribers-openmp Author: Ye Luo (ye-luo)
```
#include <iostream>
#include <vector>
#define N 4
int main()
for(int i = 0; i < N; i++)
```
results: 4 4 4 4
results: 0 4 0 0
OMP_TARGET_OFFLOAD=mandatory OMP_NUM_THREADS=32 LIBOMPTARGET_DEBUG=1 ./a.out >& out && grep "Moving 4 bytes\|result" out
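The pattern named in the issue title, several host threads each launching a target region that performs its own reduction, can be sketched as follows. This is a hypothetical reconstruction, not the reporter's exact program: the loop bodies, the variable names, and the helper `run_concurrent_reductions` are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reconstruction, not the reporter's exact program: each host
// thread launches its own target region, and every region performs a
// reduction that should produce n.
std::vector<int> run_concurrent_reductions(int n) {
  assert(n > 0);
  std::vector<int> sums(n, 0);
  // Host threads launch kernels concurrently when built with -fopenmp.
  #pragma omp parallel for
  for (int i = 0; i < n; i++) {
    int local = 0;
    // One offloaded kernel per host iteration, each with its own reduction.
    #pragma omp target teams distribute parallel for reduction(+ : local)
    for (int j = 0; j < n; j++)
      local += 1;
    sums[i] = local;
  }
  return sums;
}
```

On a correct runtime every element of the returned vector equals `n`, which for `n = 4` corresponds to the expected `results: 4 4 4 4`. Compiled without OpenMP the pragmas are ignored and the function runs serially, so any wrong value can only come from the concurrent offload path.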
This is a known issue. I'll put a bandage on it to make it work for now.
Any idea why it went wrong? Mappings all seem OK.
No, the runtime has always been broken in this case.
The KernelEnvironment is for compile-time information about a kernel. It allows the compiler to feed information to the runtime. The KernelLaunchEnvironment is for dynamic information *per* kernel launch. It allows the runtime to feed information to the kernel that is not shared with other invocations of the kernel. The first use case is to replace the globals that synchronize teams reductions with per-launch versions. This allows concurrent teams reductions. More use cases will follow, e.g., per-launch memory pools. Fixes: llvm#70249
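The split between the two environments can be pictured with a small host-side sketch. The struct names, their fields, and `make_launch_environment` are hypothetical and chosen only for illustration; they are not the actual LLVM definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Compile-time facts about one kernel, shared by all of its launches.
// (Hypothetical fields, for illustration only.)
struct KernelEnvironmentSketch {
  const char *name;
  bool uses_teams_reduction;
  std::size_t reduction_bytes_per_team;
};

// Dynamic state for a single launch, never shared with another invocation.
struct KernelLaunchEnvironmentSketch {
  std::uint32_t teams_done = 0;                // stands in for a global sync counter
  std::vector<std::uint8_t> reduction_buffer;  // stands in for a global scratch buffer
};

// Stand-in for the runtime's launch path: build a fresh launch environment
// sized from the kernel's static description.
std::unique_ptr<KernelLaunchEnvironmentSketch>
make_launch_environment(const KernelEnvironmentSketch &kernel,
                        std::uint32_t num_teams) {
  assert(num_teams > 0);
  auto env = std::make_unique<KernelLaunchEnvironmentSketch>();
  if (kernel.uses_teams_reduction)
    env->reduction_buffer.resize(kernel.reduction_bytes_per_team * num_teams);
  return env;
}
```

Two calls for the same kernel return distinct objects with distinct buffers, which is the property that lets launches run concurrently without touching each other's synchronization state.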
We used to perform team reduction on global memory allocated in the runtime and by clang. This was racy as multiple instances of a kernel, or different kernels with team reductions, would use the same locations. Since we now have the kernel launch environment, we can allocate dynamic memory per-launch, allowing us to move all the state into a non-racy place. Fixes: llvm#70249
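Why shared reduction state is racy can be shown with a deterministic host-side simulation. This is a simplified model under stated assumptions, not the device runtime's actual algorithm: `ReductionState`, `team_finish`, and `run_interleaved` are hypothetical names, and the "last team to arrive finalizes" protocol is an illustrative stand-in.

```cpp
#include <cassert>
#include <vector>

// State used to combine per-team partial sums (hypothetical layout).
struct ReductionState {
  std::vector<int> partials;  // one slot per team
  int teams_done = 0;         // the team that raises this to num_teams finalizes
  explicit ReductionState(int num_teams) : partials(num_teams, 0) {}
};

// One team finishing: publish its partial sum and, if it is the last team to
// arrive, fold all partials into `result`. Returns true if it finalized.
bool team_finish(ReductionState &state, int team, int partial, int &result) {
  int num_teams = static_cast<int>(state.partials.size());
  state.partials[team] = partial;
  if (++state.teams_done != num_teams)
    return false;
  result = 0;
  for (int p : state.partials)
    result += p;
  state.teams_done = 0;  // reset for the next user of this state
  return true;
}

// Two launches (A and B), two teams each, every team contributing 2.
// The steps run in a fixed interleaving: A0, B0, A1, B1.
// With shared == true both launches use one state, like a global buffer.
std::vector<int> run_interleaved(bool shared) {
  ReductionState state_a(2), state_b(2);
  ReductionState &a = state_a;
  ReductionState &b = shared ? state_a : state_b;
  int result_a = 0, result_b = 0;
  team_finish(a, 0, 2, result_a);
  team_finish(b, 0, 2, result_b);
  team_finish(a, 1, 2, result_a);
  team_finish(b, 1, 2, result_b);
  return {result_a, result_b};
}
```

With per-launch state, `run_interleaved(false)` yields `{4, 4}`. With shared state, `run_interleaved(true)` yields `{0, 4}`: launch B's teams consume both "last team" slots, so launch A never finalizes and keeps its initial 0.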
…70401) The KernelEnvironment is for compile-time information about a kernel. It allows the compiler to feed information to the runtime. The KernelLaunchEnvironment is for dynamic information *per* kernel launch. It allows the runtime to feed information to the kernel that is not shared with other invocations of the kernel. The first use case is to replace the globals that synchronize teams reductions with per-launch versions. This allows concurrent teams reductions. More use cases will follow, e.g., per-launch memory pools. Fixes: #70249
My test is still failing sporadically. Tested with 954af75
#70752 is on the way
…lvm#70401) The KernelEnvironment is for compile-time information about a kernel. It allows the compiler to feed information to the runtime. The KernelLaunchEnvironment is for dynamic information *per* kernel launch. It allows the runtime to feed information to the kernel that is not shared with other invocations of the kernel. The first use case is to replace the globals that synchronize teams reductions with per-launch versions. This allows concurrent teams reductions. More use cases will follow, e.g., per-launch memory pools. Fixes: llvm#70249 Change-Id: I06ce8c63cf5020be778e1a9e06053a1950dfb18e
…m#70752) We used to perform team reduction on global memory allocated in the runtime and by clang. This was racy as multiple instances of a kernel, or different kernels with team reductions, would use the same locations. Since we now have the kernel launch environment, we can allocate dynamic memory per-launch, allowing us to move all the state into a non-racy place. Fixes: llvm#70249 Change-Id: Id8a5932a1cde8cfcbb0e17655ef3f390f6f4d050
Running
clang++ -fopenmp --offload-arch=sm_80 main.cpp && ./a.out
I expect
results: 4 4 4 4
but I get random failures such as
results: 0 4 0 0
Turning on debugging info with
OMP_TARGET_OFFLOAD=mandatory OMP_NUM_THREADS=32 LIBOMPTARGET_DEBUG=1 ./a.out >& out && grep "Moving 4 bytes\|result" out
the mapping and transfers seem OK to me. The failure was miserable.
Setting OMP_NUM_THREADS=1, the test passes reliably.