@xlnwel Thanks for the feedback. It seems that all of the processes show up on the first GPU. This may be caused by the system, CUDA version, etc. Do you mind sharing more information, such as your system, GPUs, etc.?
@xlnwel Unfortunately, I cannot find an exact explanation for this. My guess is that when a sub-process is launched, it somehow initializes CUDA on device 0 and therefore occupies some memory there. However, in our code there is no operation that specifically targets device 0 in the `act` function, so I am not sure about the exact cause. Most systems and CUDA versions should not have this issue; I would suggest trying other systems/CUDA versions.
I ran the code with the following command and noticed that there are 9 processes occupying the first GPU. Why would that be the case?
The initialization logs look fine to me
Here's a snapshot of the output of `nvidia-smi`: