[OpenMP] Inner reduction crashes at runtime #66708
Comments
@llvm/issue-subscribers-openmp
In the following example:
the inner reduction crashes at runtime. Compile line:
The postlink LLVM IR looks OK. The postopt LLVM IR looks suspicious: function bodies are set to unreachable and then called from the kernel. This could well be a red herring, and the error could be somewhere else, but it could be a starting point. At first glance it seems to be an issue with OpenMP Opt.
@rodrigo-ceccato This may be of interest to you.
If we update the state, or indicate a pessimistic fixpoint, we need to consider NestedParallelism too. Fixes llvm#66708. That said, the reproducer still needs malloc, which we don't support on AMD GPU.
If we update the state, or indicate a pessimistic fixpoint, we need to consider NestedParallelism too. Fixes part of #66708. That said, the reproducer still needs malloc, which we don't support on AMD GPU. Will be added later.
The patch contains a basic BumpAllocator for (AMD)GPUs to allow us to run more tests. The allocator implements `malloc`, both internally and externally, while we continue to default to the NVIDIA `malloc` when we target NVIDIA GPUs. Once we have smarter or customizable allocators we should revisit this choice; for now, this allocator is better than none. It traps if it is out of memory, making it easy to debug. Heap size is configured via `LIBOMPTARGET_HEAP_SIZE` and defaults to 512MB. Allocation statistics can be tracked via `LIBOMPTARGET_DEVICE_RTL_DEBUG=8` (together with `-fopenmp-target-debug=8`). Two tests were added, and one was enabled. This is the next step towards fixing llvm#66708.
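For readers unfamiliar with the technique, here is a minimal host-side sketch of a bump allocator that traps when it runs out of memory. It is not the code from the patch: the class name `BumpAllocator`, its methods, the 16-byte default alignment, and the no-op `deallocate` are all assumptions made for illustration, and the real device runtime may differ in every one of them.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch; names and layout are illustrative and are NOT taken
// from the LLVM OpenMP device runtime.
class BumpAllocator {
public:
  // Manage the caller-provided buffer [Base, Base + Size).
  BumpAllocator(void *Base, std::size_t Size)
      : Begin(reinterpret_cast<std::uintptr_t>(Base)), End(Begin + Size),
        Next(Begin) {}

  // Reserve Size bytes aligned to Alignment (assumed to be a power of two).
  // The bump pointer is advanced atomically so concurrent callers are safe.
  void *allocate(std::size_t Size, std::size_t Alignment = 16) {
    std::uintptr_t Old = Next.load(std::memory_order_relaxed);
    std::uintptr_t Aligned, New;
    do {
      Aligned = (Old + Alignment - 1) & ~(std::uintptr_t(Alignment) - 1);
      New = Aligned + Size;
      // Out of memory (or address overflow): trap instead of returning null.
      if (New > End || New < Aligned)
        std::abort();
    } while (!Next.compare_exchange_weak(Old, New, std::memory_order_relaxed));
    return reinterpret_cast<void *>(Aligned);
  }

  // Assumption of this sketch: individual blocks are never reused,
  // so releasing one is a no-op.
  void deallocate(void *) {}

  // Bytes consumed so far, including alignment padding.
  std::size_t bytesUsed() const {
    return Next.load(std::memory_order_relaxed) - Begin;
  }

private:
  std::uintptr_t Begin;
  std::uintptr_t End;
  std::atomic<std::uintptr_t> Next;
};
```

In this sketch the heap size is simply whatever buffer the caller passes in; in the patch it comes from `LIBOMPTARGET_HEAP_SIZE`. Aborting on exhaustion mirrors the "traps if it is out of memory" behavior described above, which surfaces the failure at the allocation site rather than at a later null dereference.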