Slow training speed and low GPU utilization #147

Sau1-Goodman opened this issue Jun 17, 2024 · 4 comments

@Sau1-Goodman

Hi, thank you for your work, it's amazing! I'm a student who has just started with DRL. I set up the simulation environment according to the tutorial and trained with the original program (by running `python3 train_velodyne_td3.py`). In RViz I can see the robot running normally (just like the GIF in the example), but GPU usage is very low (power: 48W/170W, memory usage: 3074MiB/12050MiB) and each epoch takes quite long (about 10 minutes). My machine has an AMD 5800X CPU and an RTX 3060 GPU, the NVIDIA driver is 470.256.02, and cudatoolkit 11.3.1 is installed in the Anaconda environment. Running `torch.cuda.is_available()` in the Python environment returns True. Is this training speed and GPU usage normal? Thank you very much for your answer!
[Screenshot: GPU_usage]

@reiniscimurs (Owner)

Hi,

CUDA will mostly be used only during the train call of the model, and memory consumption depends on your batch size, gradients, and model size. There is no particular reason why consumption should be high here.

10 minutes between epochs seems reasonable. By default, an epoch runs for approx. 5000 steps with a step length of 0.1 seconds, so one epoch takes at least 8.3 minutes. That makes sense to me.
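As a quick sanity check, here is the arithmetic in plain Python (the step count and step length are just the defaults mentioned above):

```python
# Back-of-the-envelope epoch timing: the simulator itself sets a floor
# on wall-clock time per epoch, regardless of how fast the GPU is.
steps_per_epoch = 5000  # approx. steps per epoch (default)
step_length_s = 0.1     # seconds of simulation per step

floor_s = steps_per_epoch * step_length_s
print(f"Minimum time per epoch: {floor_s:.0f} s = {floor_s / 60:.1f} min")
# -> Minimum time per epoch: 500 s = 8.3 min
```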

@Sau1-Goodman (Author)

Thank you very much for your response. I have two more queries that I hope you can clarify:
1. Would it be feasible to adjust some parameters, such as increasing the batch_size on line 119 of train_velodyne_td3.py, to raise GPU utilization? This could potentially reduce training time and make better use of the GPU's computational power. Do you think this approach is viable? Do you have any other suggestions for optimizing the process?
2. Are the training results in the TD3 -> result & run folder? I'm trying to deploy the trained results onto an actual robot. Could you offer some tips on how to use them? I would greatly appreciate any suggestions or pointers.
Many thanks!

@reiniscimurs (Owner)

  1. It would increase GPU consumption, but only during backpropagation; the average consumption will probably stay about the same. It would not realistically speed up training, since most of the time is spent collecting samples/executing the policy (see the timing sketch after this list). See the tutorial for details: https://medium.com/p/b744852345ac
  2. No, the weights are stored in pytorch_models; see the description in each folder for what is stored there. See test_velodyne_td3.py for how to load model weights (a minimal loading sketch follows below as well). Deploying on a real robot will depend entirely on the robot and sensors used, but you can adapt the env file with the proper topics once you have connected everything to ROS.
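To check for yourself where the time goes, a small timing helper is enough. This is a hypothetical sketch, not code from the repo; the commented usage lines stand in for the corresponding calls in train_velodyne_td3.py:

```python
import time

def time_phase(label, fn, *args, **kwargs):
    """Run fn once and print its wall-clock time; a tiny helper for
    checking where a training iteration spends its time."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.4f} s")
    return result

# Hypothetical usage inside the training loop; `env`, `network`,
# `replay_buffer`, `action`, etc. stand in for the objects used in
# train_velodyne_td3.py. Expect the env step (bounded by simulation
# time) to dwarf the train call, whatever the batch size.
#   next_state, reward, done, target = time_phase("env step", env.step, action)
#   time_phase("train call", network.train, replay_buffer, iterations, batch_size)
```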
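For loading weights, the general PyTorch pattern is to rebuild the network and call load_state_dict on the saved file; test_velodyne_td3.py shows exactly how it is done in this repo. Below is a minimal sketch, assuming an actor checkpoint saved under pytorch_models; the class layout, dimensions, and file name here are illustrative and must be matched to your training run:

```python
import torch
import torch.nn as nn

# Illustrative actor network: the real class lives in the repo's
# train/test scripts; layer sizes, state_dim=24, and action_dim=2
# are assumptions and must match the architecture you trained.
class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 800), nn.ReLU(),
            nn.Linear(800, 600), nn.ReLU(),
            nn.Linear(600, action_dim), nn.Tanh(),  # actions in [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
actor = Actor(state_dim=24, action_dim=2).to(device)

# File name follows the repo's "<filename>_actor.pth" convention;
# adjust the path to wherever your checkpoint was saved.
actor.load_state_dict(
    torch.load("./pytorch_models/TD3_velodyne_actor.pth", map_location=device)
)
actor.eval()  # inference mode for deployment

# One inference step on a dummy state vector.
state = torch.zeros(1, 24, device=device)
with torch.no_grad():
    action = actor(state)
print(action)
```

On the real robot you would build the same state vector from your own sensor topics and turn the network output into velocity commands, which is where adapting the env file comes in.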

@Sau1-Goodman (Author)

Thank you very much for your answer; I will work harder!
