Why is the first GPU over-consumed? #31

Closed
xlnwel opened this issue Sep 10, 2021 · 4 comments
Labels
environment Environment relative

Comments

@xlnwel

xlnwel commented Sep 10, 2021

I ran the code with the following command and noticed that 9 processes are occupying the first GPU. Why would that be the case?

python3 train.py --gpu_devices 0,1,2,3 --num_actor_devices 3 --num_actors 3 --training_device 3

The initialization logs look fine to me:
image

Here's a snapshot of the nvidia-smi output:

image
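
Since the screenshot is not reproduced as text, here is a minimal sketch of how the per-GPU process count could be checked from a script. It assumes `nvidia-smi` is on the PATH and supports the `--query-compute-apps` option with the `gpu_uuid`, `pid`, and `used_memory` fields; the helper names `count_processes_per_gpu` and `query_nvidia_smi` are hypothetical and not part of this project.

```python
import subprocess
from collections import Counter


def count_processes_per_gpu(csv_text):
    """Count compute processes per GPU UUID from nvidia-smi CSV rows.

    Each row is expected to look like "<gpu_uuid>, <pid>, <used_memory>".
    Rows whose second field is not a numeric PID (e.g. a header) are skipped.
    """
    counts = Counter()
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        counts[fields[0]] += 1
    return counts


def query_nvidia_smi():
    """Return the raw CSV listing of compute processes (assumes nvidia-smi exists)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=gpu_uuid,pid,used_memory",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    try:
        for gpu_uuid, n in count_processes_per_gpu(query_nvidia_smi()).items():
            print(f"{gpu_uuid}: {n} process(es)")
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"Could not query nvidia-smi: {exc}")
```

A process that holds a CUDA context on more than one GPU shows up once per GPU, so a large count on a single device suggests that many processes have each created a context there.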

@xlnwel xlnwel changed the title Why is first GPU over-consumed? Why is the first GPU over-consumed? Sep 10, 2021
@daochenzha
Collaborator

@xlnwel Thanks for the feedback. It seems that the first GPU got a copy of all the processes. This may be caused by the system, the CUDA version, or similar factors. Would you mind sharing more information, such as your operating system and GPUs?

@xlnwel
Author

xlnwel commented Sep 11, 2021

Hi, thanks for replying.

I use four 1080 Ti GPUs. The system is Ubuntu 20.04.2 LTS. More CUDA information is listed below:
image

@daochenzha
Collaborator

@xlnwel Unfortunately, I cannot find an exact explanation for this. My guess is that when a sub-process is launched, it somehow uses device 0 and therefore occupies some memory on device 0 (for initializing CUDA). However, in our code, there is no operation specifically on device 0 in the act function, so I am not sure about the exact reason. Most systems and CUDA versions should not have this issue. I would suggest trying other systems or CUDA versions.
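
One way to test the guess above is to hide device 0 from a process before it initializes CUDA. The following is only a sketch under that assumption, not this project's code: it relies on the standard `CUDA_VISIBLE_DEVICES` environment variable, and the helpers `env_for_device` and `launch_on_device` are hypothetical names introduced for illustration.

```python
import os
import subprocess
import sys


def env_for_device(device_id, base_env=None):
    """Return a copy of the environment that exposes a single GPU.

    CUDA reads CUDA_VISIBLE_DEVICES when it initializes, so the variable
    must be set before the child process touches CUDA.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(device_id)
    return env


def launch_on_device(device_id, argv):
    """Start a child process that can only see the given physical GPU."""
    return subprocess.Popen(argv, env=env_for_device(device_id))


if __name__ == "__main__":
    # Illustrative child command: it only prints the variable it received.
    child = launch_on_device(
        1,
        [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    )
    child.wait()
```

Inside such a child process the single visible GPU is addressed as device 0, so any code that maps actors to device indices would need to account for the renumbering. If the extra processes on the first physical GPU disappear under this setup, that would support the idea that the memory comes from an implicit CUDA context on the default device.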

@xlnwel
Author

xlnwel commented Sep 11, 2021

Thanks for your response, @daochenzha. I've also read the relevant part of the code without reaching a conclusion. Thanks anyway.

@karoka karoka added the environment Environment relative label Oct 12, 2021
@karoka karoka closed this as completed Oct 12, 2021