torch.cuda.synchronize blocks CUDA execution on other threads using other devices. #24963
Comments
As far as I know, these are expected CUDA semantics: it synchronizes the entire device context in the process, at the driver level.
Block execution on a different device?
I misread that. That sounds suspicious, cc @csarofeen @ptrblck, any ideas what's up?
Does sound suspicious, we'll have to take a look.
@ptrblck will take a look at this.
I've taken multiple shots at this issue and tried to reproduce it. @heiner I also cannot see the synchronizations in the provided profile, so I used nsight-systems instead. Also, it seems you've just profiled the …
Hey @ptrblck, thanks for taking a stab at this! I am not surprised that using multiple devices works fine with …
As for only profiling the "randint": note that the line in https://gist.github.com/heiner/c812a38a338878f5c02f6193511afc6a#file-cudasync-py-L76
is only an (optional) annotation of that statement, not a request to profile only that block. The statement that requests profiling of the overall program should be https://gist.github.com/heiner/c812a38a338878f5c02f6193511afc6a#file-cudasync-py-L137
Now I agree with your assessment that this bug might not be an issue with CUDA synchronization but rather with the GIL. Notice though that not using CUDA creates a different profiling picture, namely one where not all threads are blocked at the same time. Could it be the case that some CUDA-specific codepath in PyTorch is holding the GIL in a situation where that's not necessary?
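The GIL hypothesis is easy to illustrate without CUDA. A minimal stdlib-only sketch (not from the gist; timings are approximate): a waiting C call that releases the GIL, such as time.sleep, lets two threads wait concurrently, while pure-Python work holds the GIL and serializes the threads. If torch.cuda.synchronize held the GIL while waiting on the device, other Python threads would stall the way the busy loop stalls them here.

```python
import threading
import time

def run_pair(work, arg):
    """Run work(arg) on two threads; return the total wall time."""
    threads = [threading.Thread(target=work, args=(arg,)) for _ in range(2)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start

def releasing_wait(seconds):
    # time.sleep is a C call that releases the GIL while it waits,
    # so two threads can wait concurrently (~0.2 s total below).
    time.sleep(seconds)

def holding_work(iterations):
    # A pure-Python loop holds the GIL between switch intervals,
    # so two threads running it are serialized onto one core.
    total = 0
    for i in range(iterations):
        total += i
    return total

if __name__ == "__main__":
    overlap = run_pair(releasing_wait, 0.2)   # ~0.2 s: the waits overlap
    serial = run_pair(holding_work, 2_000_000)  # roughly 2x one thread's time
    print(f"GIL-releasing pair: {overlap:.2f}s, GIL-holding pair: {serial:.2f}s")
```

If synchronize behaves like releasing_wait, other threads keep running during it; the profiles in this issue look more like the holding_work case.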
🐛 Bug
In a situation in which different Python threads execute CUDA operations on different devices, calling torch.cuda.synchronize blocks CUDA execution on all threads, including those using other CUDA devices.
To Reproduce
git clone https://gist.github.com/c812a38a338878f5c02f6193511afc6a.git cudasync
cd cudasync/
OMP_NUM_THREADS=1 python cudasync.py
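For reference, the shape of the repro in the gist is roughly the following. This is a hedged sketch, not the gist itself: the tensor sizes, iteration count, and two-device guard are placeholders, and it degrades to a skip message without PyTorch or two CUDA devices.

```python
import threading

try:
    import torch
    HAVE_TORCH = True
except ImportError:
    HAVE_TORCH = False

def worker(device):
    # Each thread does all of its CUDA work on its own device and then
    # synchronizes only that device. Per this report, synchronize() on
    # one thread also stalls CUDA execution on the other threads/devices.
    x = torch.randn(1024, 1024, device=device)
    for _ in range(100):
        x = (x @ x) / x.norm()
    torch.cuda.synchronize(device=device)

if __name__ == "__main__":
    if HAVE_TORCH and torch.cuda.device_count() >= 2:
        threads = [threading.Thread(target=worker, args=(f"cuda:{i}",))
                   for i in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print("done")
    else:
        print("skipped: needs PyTorch and at least 2 CUDA devices")
```

Under the expected semantics, the two workers should proceed independently; the trace below shows them stalling together instead.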
(produces a trace file)
Expected behavior
torch.cuda.synchronize(device=my_device)
should not affect execution of CUDA operations on devices other than my_device.
Environment
Additional context
Trace file: cudasync.trace.gz
This probably isn't a GIL issue as it doesn't seem to happen when the other threads execute CPU PyTorch operations.
Perfetto link to trace: https://ui.perfetto.dev/#!/?s=76397c96cea6a47c45aed36cd84586cf54469d34089d3578afb7e795219229
Screenshot: