[OpenMP] offload to amdgpu hit "Input is not an ELF file" error #77798
Labels: openmp:libomptarget (OpenMP offload runtime)
@llvm/issue-subscribers-openmp Author: Ye Luo (ye-luo)
LLVM 164f85d
SLES 15 SP4, ROCm 5.7.0
Using https://github.com/ye-luo/miniqmc
The error is printed at the end of the run.
The backtrace shows:
jhuber6 added a commit to jhuber6/llvm-project that referenced this issue on Jan 11, 2024
Summary: The constructor and destructor handling looks up a symbol directly in the ELF to quickly determine whether constructors and destructors need to run on the GPU. This avoids the much slower lookup through the vendor API.

One problem arises from how we handle the lifetime of these images. Right now there is no invariant specifying the lifetime of the underlying binary image that is loaded. In the typical case, the image comes from the binary itself, in the `.llvm.offloading` section, so its lifetime should match that of the executable. This would work fine if the plugin were not loaded via `dlopen`, which means its teardown can run out of order with the main executable's. This is likely what happened when this failed on some systems but not others.

One potential solution would be to copy images into memory so the runtime does not rely on external references. Another would be to manually zero these references out after initialization so that this mistake cannot happen accidentally. The former has the benefit of making some checks easier and of allowing constant initialization to be done on the ELF itself (normally we cannot do this, because writing to a constant section such as `.llvm.offloading` segfaults). The downside is the extra time required to copy the image in bulk (although the vendor runtimes are likely doing this as well).

This patch takes the quick solution: set a boolean at initialization time recording whether destructors need to be called.

Fixes: llvm#77798
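A minimal C++ sketch of the pattern described in the summary, assuming a simplified model of the plugin: `ImageView`, `LoadedImage`, `initImage`, `needsDestructors` and the `SymbolQuery` callback are hypothetical names, not libomptarget's actual API, and the real ELF validation and symbol lookup are more involved. It shows why re-reading a non-owned image at teardown can fail an ELF check (the printed message mirrors the issue title), and how caching the "needs destructors" decision at initialization avoids touching the image later.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical non-owning view of a device image. In the scenario from the
// summary, Start would point into the executable's .llvm.offloading section,
// memory the plugin does not own.
struct ImageView {
  const unsigned char *Start = nullptr;
  std::size_t Size = 0;
};

// Every ELF file starts with the 4-byte magic 0x7f 'E' 'L' 'F'.
bool isELF(const ImageView &Img) {
  static const unsigned char Magic[4] = {0x7f, 'E', 'L', 'F'};
  return Img.Start != nullptr && Img.Size >= sizeof(Magic) &&
         std::memcmp(Img.Start, Magic, sizeof(Magic)) == 0;
}

// Hypothetical per-image state kept by a plugin.
struct LoadedImage {
  ImageView Image;
  bool HasDestructors = false; // decided once, while Image is still valid
};

// Signature of a hypothetical fast symbol lookup in the ELF image.
using SymbolQuery = bool (*)(const ImageView &);

// Initialization: the image bytes are valid here, so inspect them once
// and cache the answer.
bool initImage(LoadedImage &L, SymbolQuery HasDtorSymbol) {
  if (!isELF(L.Image)) {
    std::fprintf(stderr, "Input is not an ELF file\n");
    return false;
  }
  L.HasDestructors = HasDtorSymbol(L.Image);
  return true;
}

// Teardown: may run after the memory behind L.Image is no longer valid
// (e.g. a dlopen'ed plugin torn down out of order with the executable),
// so it reads only the cached flag and never dereferences L.Image.
bool needsDestructors(const LoadedImage &L) { return L.HasDestructors; }
```

In this sketch, overwriting the buffer after `initImage` makes `isELF` fail, but `needsDestructors` still returns the answer recorded at initialization.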
EugeneZelenko added the openmp:libomptarget (OpenMP offload runtime) label and removed the openmp label on Jan 11, 2024
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this issue on Jan 28, 2024
…lvm#77828) Summary: identical to the commit message above. Fixes: llvm#77798