Training on a single GPU (Losses keep fluctuating and do not converge) #31
Same question here; my losses even become NaN. What is happening? (I am not sure how to check the number of GPUs, but when I inspected the machine only one device name appeared, so I assume I have a single GPU.) @nuschandra, have you solved this problem? Here is part of the log:
@zizhaozhang Thank you for your hard work on this. Could you please help with this problem? Or is it actually not possible to train on custom data with one GPU?
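One common cause of NaN losses on a single GPU is a learning rate tuned for a multi-GPU (larger-batch) setup. The linear scaling rule suggests shrinking the learning rate in proportion to the batch size. A minimal sketch, where the reference values (8 GPUs, base learning rate 0.01) are illustrative assumptions, not numbers from this repository:

```python
# Hypothetical example of the linear scaling rule: if a reference config
# trained with an effective batch of 8 (one image per GPU on 8 GPUs) at
# lr = 0.01, a single GPU with batch size 1 would scale the rate down by 8.
# All numbers here are illustrative assumptions.
base_lr = 0.01     # assumed reference learning rate for batch size 8
base_batch = 8     # assumed reference effective batch size
my_batch = 1       # batch size on a single GPU under tensorpack's constraint

scaled_lr = base_lr * my_batch / base_batch
print(scaled_lr)   # 0.00125
```

If the losses still diverge, warming the learning rate up from an even smaller value over the first few thousand steps is a standard companion to this rule.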
Hi,
I am training the Faster R-CNN model on 10% of the labelled COCO data. While training with 1 GPU, the losses do not converge. Based on an earlier issue (#12), I understand that with 1 GPU the batch size is fixed at 1 due to tensorpack constraints, and that may be too small for the network to train and converge. If that is the case, what are the alternatives? Is the only option to move away from tensorpack in order to use a larger batch size?
Any inputs/suggestions are more than welcome, as I am a bit stuck at the moment and do not have access to more than 1 GPU.
Regards,
Chandra
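One workaround for the batch-size-of-1 constraint described above, short of leaving tensorpack, is gradient accumulation: run several size-1 forward/backward passes and apply one optimizer step on their averaged gradients, emulating a larger effective batch. A minimal NumPy sketch of the idea on a toy least-squares problem (not tensorpack code; all names and values here are illustrative):

```python
import numpy as np

# Sketch of gradient accumulation: emulate an effective batch of 8 on a
# single GPU by averaging gradients over 8 size-1 mini-batches before
# taking one SGD step. The model is a toy linear regression.
ACCUM_STEPS = 8            # virtual batch size (illustrative)
lr = 0.1                   # learning rate for the accumulated step

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))          # 64 samples, 3 features
true_w = np.array([1.0, -2.0, 0.5])   # ground-truth weights
y = X @ true_w

w = np.zeros(3)                       # model parameters
accum = np.zeros_like(w)              # gradient accumulator

for step, (xi, yi) in enumerate(zip(X, y)):
    err = xi @ w - yi                 # batch-of-1 forward pass
    grad = 2.0 * err * xi             # gradient of the squared error
    accum += grad / ACCUM_STEPS       # average over the virtual batch
    if (step + 1) % ACCUM_STEPS == 0:
        w -= lr * accum               # one update per ACCUM_STEPS samples
        accum[:] = 0.0                # reset for the next virtual batch

print(np.round(w, 2))                 # should approach true_w
```

The same pattern ports to TensorFlow/tensorpack by keeping non-trainable accumulator variables and calling the optimizer's apply step every N iterations; batch normalization statistics, however, still see a batch of 1, so FreezeBN (as the repo's multi-GPU configs do) may also be necessary.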