Thanks for the great work.

When I run training with dagger_trainer.py, I found that a large part of the training time is spent on 1) collecting data and 2) training the model on the collected data. The first phase can be sped up by assigning more simulator GPUs (SIMULATOR_GPU_IDS), but by default the second phase uses only a single GPU (TORCH_GPU_ID).

Is there an easy way to use multiple GPUs to speed up the second phase (a rough sketch of the kind of change I have in mind is below), or should I adapt the code myself with torch.distributed?

Many thanks!
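For concreteness, the simplest drop-in I can imagine is torch.nn.DataParallel, roughly as follows. This is only a sketch: `policy` is a stand-in module, not the model actually constructed in dagger_trainer.py.

```python
import torch
import torch.nn as nn

policy = nn.Linear(128, 4)  # stand-in for the actual agent policy
if torch.cuda.device_count() > 1:
    # Replicates the module and splits each batch across all visible GPUs.
    # Easy to drop in, but usually slower than DistributedDataParallel.
    policy = nn.DataParallel(policy)
policy = policy.cuda()

# Forward/backward proceed as usual; inputs are scattered across GPUs.
logits = policy(torch.randn(64, 128).cuda())
loss = logits.sum()
loss.backward()
```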
Hi! For the model-training portion of the DAgger trainer, there is currently no GPU parallelism; adding it would require a new implementation. torch.distributed is used for training waypoint models with the DDPPO trainer, so that code could serve as a reference point; a generic sketch of the pattern follows below. Good luck!
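To give an idea of the shape of such an implementation, here is a minimal, generic DistributedDataParallel sketch. None of this is VLN-CE code: the linear model, random tensors, and loss are placeholders for the agent policy and the DAgger data buffer.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def train(rank: int, world_size: int) -> None:
    # One process per GPU; NCCL is the standard backend for CUDA tensors.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = nn.Linear(128, 4).cuda(rank)  # placeholder for the policy
    model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # DistributedSampler shards the data across processes; this random
    # dataset stands in for the collected (observation, action) pairs.
    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 4, (1024,)))
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    for epoch in range(4):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for obs, actions in loader:
            obs, actions = obs.cuda(rank), actions.cuda(rank)
            loss = nn.functional.cross_entropy(model(obs), actions)
            optimizer.zero_grad()
            loss.backward()  # DDP all-reduces gradients across GPUs here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    torch.multiprocessing.spawn(train, args=(world_size,), nprocs=world_size)
```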