@xlnwel Thanks for the feedback. It seems that all of the processes show up on the first GPU. This may be caused by the system, CUDA version, etc. Do you mind sharing more information, such as your system, GPUs, etc.?
@xlnwel Unfortunately, I cannot find an exact explanation for this. My guess is that when a sub-process is launched, it somehow initializes CUDA on device 0 and therefore occupies some memory there. However, in our code there is no operation that specifically targets device 0 in the `act` function, so I am not sure about the exact cause. Most systems and CUDA versions should not have this issue; I would suggest trying other systems/CUDA versions.
I ran the code with the following command and noticed that there are 9 processes occupying the first GPU. Why would that be the case?
The initialization logs look fine to me
Here's a snapshot of the output of `nvidia-smi`: