Question about the training speed #17
On either a 1080 Ti or a P100, training runs at around 0.4 batches/sec.
Okay, thanks for your suggestions. I will try it.
I have changed Trainer to MultiGPUTrainer and used 4 x 1080 Ti for training, but training seems slower than with a single GPU :( Is there anything I have overlooked? With a single GPU it runs slightly faster, at 0.01 batches/sec.
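When comparing single-GPU and multi-GPU speed, note that the progress bar reports iterations per second. In a synchronous data-parallel setup where each GPU receives its own batch every iteration (an assumption about how the multi-GPU trainer is configured here), one iteration processes `num_gpus` batches, so raw batches/sec figures are not directly comparable. A minimal sketch; `samples_per_sec`, `speedup`, and `per_gpu_batch` are hypothetical names for illustration:

```python
def samples_per_sec(iters_per_sec: float, per_gpu_batch: int, num_gpus: int) -> float:
    """Samples processed per second, ASSUMING each GPU consumes its own batch
    on every iteration (synchronous data parallelism)."""
    if iters_per_sec < 0 or per_gpu_batch <= 0 or num_gpus <= 0:
        raise ValueError("rates must be non-negative and sizes positive")
    return iters_per_sec * per_gpu_batch * num_gpus


def speedup(single_iters_per_sec: float, multi_iters_per_sec: float,
            per_gpu_batch: int, num_gpus: int) -> float:
    """Ratio of multi-GPU to single-GPU sample throughput under the same assumption."""
    multi = samples_per_sec(multi_iters_per_sec, per_gpu_batch, num_gpus)
    single = samples_per_sec(single_iters_per_sec, per_gpu_batch, 1)
    return multi / single
```

Under that assumption, a 4-GPU run showing half the iterations/sec of a single-GPU run would still process twice as many samples per second.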
@tabsun Hi, first of all, thanks for your interest in our work. Something must be wrong here; here are several potential causes:
@JiahuiYu Thanks for your advice. I tried moving my training data onto the same disk as my code, but the speed did not change. What do you mean by GPU utilization? I have checked that memory is occupied on all 4 GPUs; is that enough? Maybe I should dig into the training code to find out what is blocking the process.
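One way to check whether the input pipeline, rather than the GPUs, is the bottleneck is to time how long each step waits for the next batch versus how long the training step itself takes. A minimal, framework-agnostic sketch; `profile_loop`, `batches`, and `step` are hypothetical stand-ins for the real data iterator and training op, and `step` is assumed to block until the GPU work has finished (otherwise compute time is under-counted):

```python
import time
from typing import Any, Callable, Dict, Iterator


def profile_loop(
    batches: Iterator[Any],
    step: Callable[[Any], None],
    n_steps: int,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Split wall time between waiting for data and running the training step."""
    data_time = 0.0
    compute_time = 0.0
    for _ in range(n_steps):
        t0 = clock()
        batch = next(batches)  # time spent waiting on the input pipeline
        t1 = clock()
        step(batch)            # assumed to block until the GPU work is done
        t2 = clock()
        data_time += t1 - t0
        compute_time += t2 - t1
    total = data_time + compute_time
    if total == 0:
        return {"data_frac": 0.0, "batches_per_sec": 0.0}
    return {"data_frac": data_time / total, "batches_per_sec": n_steps / total}
```

If `data_frac` comes out large, adding GPUs is unlikely to help, which would be one possible explanation for a multi-GPU run not being faster.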
@tabsun You can view your GPU utilization by
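The exact command is missing from the comment above. As an assumption, one common option is `nvidia-smi` with `--query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits`, which prints one CSV line per GPU. Occupied memory alone does not show that the GPUs are busy: TensorFlow reserves most GPU memory by default, so near-0% utilization with full memory often points to a stalled input pipeline. A minimal sketch that runs and parses that query; `parse_gpu_stats` and `query_gpus` are hypothetical helpers:

```python
import subprocess
from typing import List, Tuple

# Assumed nvidia-smi invocation: one "util, mem" line per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text: str) -> List[Tuple[int, int]]:
    """Parse lines like '97, 10800' into (utilization %, memory MiB) tuples.
    Fields reported as '[N/A]' would raise ValueError."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def query_gpus() -> List[Tuple[int, int]]:
    """Run the query on the current machine (requires nvidia-smi on PATH)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)
```

Calling `query_gpus()` a few times while training runs shows whether each GPU is actually computing or mostly idle.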
|####################| 100.00%, 109326/0 sec. train epoch 1, iter 10000/10000, loss 0.089837, 0.09 batches/sec.
[2018-04-20 02:47:04 @logger.py:43] Trigger callback: Trigger ModelSaver: Save model to model_logs/20180418202435371480_node1_imagenet_NORMAL_wgan_gp_full_model_image_256/snap-10000.
|####################| 100.00%, 108851/0 sec. train epoch 2, iter 10000/10000, loss 0.030459, 0.09 batches/sec.
[2018-04-21 09:01:15 @logger.py:43] Trigger callback: Trigger ModelSaver: Save model to model_logs/20180418202435371480_node1_imagenet_NORMAL_wgan_gp_full_model_image_256/snap-20000.
|####################| 100.00%, 108510/0 sec. train epoch 3, iter 10000/10000, loss 0.030714, 0.09 batches/sec.
[2018-04-22 15:09:45 @logger.py:43] Trigger callback: Trigger ModelSaver: Save model to model_logs/20180418202435371480_node1_imagenet_NORMAL_wgan_gp_full_model_image_256/snap-30000.
|####################| 100.00%, 108276/0 sec. train epoch 4, iter 10000/10000, loss 0.038624, 0.09 batches/sec.
[2018-04-23 21:14:21 @logger.py:43] Trigger callback: Trigger ModelSaver: Save model to model_logs/20180418202435371480_node1_imagenet_NORMAL_wgan_gp_full_model_image_256/snap-40000.
|############--------| 61.50%, 66691/41605 sec. train epoch 5, iter 6150/10000, loss 0.038147, 0.09 batches/sec.
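The progress lines above can be checked directly: the field before the slash is elapsed seconds and the one after it is the estimated remaining seconds, so throughput is the iteration count divided by elapsed time. A minimal sketch that parses lines in this format; the regex is written against the lines shown here only, and `Progress` and `parse_progress` are hypothetical names:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches lines like:
# |####################| 100.00%, 109326/0 sec. train epoch 1, iter 10000/10000, loss 0.089837, 0.09 batches/sec.
PROGRESS_RE = re.compile(
    r"\|[#-]+\|\s*([\d.]+)%,\s*(\d+)/(\d+) sec\.\s*"
    r"train epoch (\d+),\s*iter (\d+)/(\d+),\s*loss ([\d.]+),\s*([\d.]+) batches/sec"
)


@dataclass
class Progress:
    epoch: int
    it: int
    iters_per_epoch: int
    elapsed_s: int
    remaining_s: int
    reported_rate: float

    @property
    def measured_rate(self) -> float:
        """Iterations per second computed from the raw counters."""
        return self.it / self.elapsed_s

    @property
    def epoch_hours(self) -> float:
        """Projected wall time for the whole epoch, in hours."""
        return (self.elapsed_s + self.remaining_s) / 3600


def parse_progress(line: str) -> Optional[Progress]:
    m = PROGRESS_RE.search(line)
    if m is None:
        return None  # e.g. ModelSaver callback lines
    _pct, elapsed, remaining, epoch, it, total, _loss, rate = m.groups()
    return Progress(int(epoch), int(it), int(total),
                    int(elapsed), int(remaining), float(rate))
```

For the epoch 1 line this gives 10000 / 109326 ≈ 0.091 batches/sec, consistent with the printed 0.09, or roughly 30 hours per 10,000-iteration epoch. At the 0.4 batches/sec mentioned earlier in the thread, the same epoch would take about 25,000 s (roughly 7 hours).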