[NVPTX][Draft] Make __nvvm_nanosleep a no-op if unsupported #81033
Summary:
The LLVM C library currently uses `nanosleep` in the RPC interface and for the C library `nanosleep` function. We currently build the LLVM C library separately for every NVPTX architecture, which is not ideal. The goal is to make the LLVM-IR target independent. Unfortunately, the one snag is the `nanosleep` function, which will crash if used on a GPU older than sm_70. There are three possible solutions to this.
1. Use `__nvvm_reflect(__CUDA_ARCH__)` like the libdevice functions. This works only as long as optimizations are on, which is not ideal.
2. Get rid of the use of nanosleep in `libc`. This isn't ideal, as sleeping during the busy-wait loops is helpful for thread scheduling, and it prevents us from providing `nanosleep` as a C library function.
3. This patch, which simply makes it legal on all architectures but does nothing if the GPU is older than sm_70.
This is a draft to ask whether this is an acceptable hack, as an intrinsic silently doing nothing is not always a good idea. Potentially a new intrinsic could be added instead, but there is also a desire to have intrinsics map 1-to-1 with hardware.
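Option 1 can be sketched roughly as follows. This is an illustration, not code from the patch: `reflect_arch`, `hw_nanosleep`, and `guarded_nanosleep` are hypothetical names, the `"__CUDA_ARCH"` reflect key with a value of 700 for sm_70 is an assumption about how the NVVMReflect pass reports the architecture, and the device branch assumes a Clang that exposes `__nvvm_reflect` and `__nvvm_nanosleep` as builtins on NVPTX. The `#else` branch is a host stand-in added only so the control flow can be exercised without a GPU.

```cpp
#include <cstdint>

#if defined(__NVPTX__)
// Device build (assumed builtins). The reflect call is meant to be folded
// to a constant by the NVVMReflect pass.
static inline unsigned reflect_arch() { return __nvvm_reflect("__CUDA_ARCH"); }
static inline void hw_nanosleep(uint32_t ns) { __nvvm_nanosleep(ns); }
#else
// Host stand-ins, hypothetical and for illustration only.
static unsigned fake_arch = 700;   // plays the role of sm_70
static unsigned sleeps_issued = 0; // counts calls that would reach hardware
static inline unsigned reflect_arch() { return fake_arch; }
static inline void hw_nanosleep(uint32_t) { ++sleeps_issued; }
#endif

// Sleep only where the instruction exists; older GPUs fall through.
static inline void guarded_nanosleep(uint32_t ns) {
  if (reflect_arch() >= 700)
    hw_nanosleep(ns);
}
```

If the branch is not folded away, for example with optimizations off, the sm_70-only call is still present when targeting an older GPU, which is the weakness noted for option 1.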
I do not think this is the right thing to do. "do nothing" is not what one would expect from a

Let's unpack your problem a bit. __nvvm_reflect() is probably closest to what you would need. However, IIUIC, if you use it to provide nanosleep-based variant and an alternative for the older GPUs, the

In other words, pushing nanosleep implementation into an intrinsic makes things compile everywhere at the expense of doing a wrong thing on the older GPUs. I do not think it's a good trade-off.

Perhaps a better approach would be to incorporate dead branch elimination onto NVVMReflect pass itself. We do know that it is the explicit intent of
Thanks, I made this a draft because I figured it wasn't the correct thing to do but wanted to pose the question.
I think that would be a good solution if possible. Would this simply mean scheduling a global DCE pass right after the NVVM reflect pass? Since that seems to be run at

Or, maybe we just have a really shallow implementation in the NVVM reflect pass that collapses the branch?
Okay,
The question is -- who's going to provide a fallback implementation for the nanosleep builtin for the older GPUs. I do not think it's LLVM's job, so constraining the builtin is appropriate. However, nothing stops you from providing your own implementation in libc using inline asm. Something along these lines:
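The snippet from the original comment is not preserved here. As a stand-in, here is a minimal sketch of a libc-side fallback built on inline asm. It is not the commenter's code: `has_hw_nanosleep`, `hw_sleep`, and `portable_nanosleep` are hypothetical names, the `"__CUDA_ARCH"` reflect key and the availability of `__nvvm_reflect` as a Clang builtin are assumptions, and the spin loop is an uncalibrated placeholder for whatever older GPUs should do instead. The `#else` branch is a host stand-in so the logic can be run without a GPU.

```cpp
#include <cstdint>

#if defined(__NVPTX__)
// Device build: ask NVVMReflect for the architecture (assumed key and builtin).
static inline bool has_hw_nanosleep() {
  return __nvvm_reflect("__CUDA_ARCH") >= 700;
}
// Emit the PTX instruction directly instead of going through the builtin.
static inline void hw_sleep(uint32_t ns) {
  asm volatile("nanosleep.u32 %0;" ::"r"(ns) : "memory");
}
#else
// Host stand-ins, hypothetical and for illustration only.
static bool fake_has_sleep = true;
static unsigned hw_sleeps = 0;
static inline bool has_hw_nanosleep() { return fake_has_sleep; }
static inline void hw_sleep(uint32_t) { ++hw_sleeps; }
#endif

static inline void portable_nanosleep(uint32_t ns) {
  if (has_hw_nanosleep()) {
    hw_sleep(ns);
    return;
  }
  // Fallback for older GPUs: spin for a while. The iteration count is not
  // calibrated to real time; it only keeps the loop from being optimized out.
  for (volatile uint32_t i = 0; i < ns; i = i + 1) {
  }
}
```

Because the instruction is written as inline asm, the compiler front end does not check it against the target. The sketch still relies on the untaken branch being removed before PTX for an older GPU is assembled.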