Any tips to reduce training time #4
Comments
Make sure cuDNN is enabled in your Chainer installation. After I fixed my cuDNN setup, I was able to run with batchsize = 3 on a 12 GB GPU.
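For anyone who wants to check this on their own machine, here is a minimal sketch that prints what Chainer reports about CUDA and cuDNN. It assumes the `chainer.cuda` module exposes `available` and `cudnn_enabled` attributes, as it does in the Chainer versions I am aware of; the helper name `cudnn_status` is made up for this example, and the lookups fall back to `None` if an attribute is missing.

```python
def cudnn_status():
    """Return a short description of what Chainer reports about CUDA/cuDNN."""
    try:
        import chainer
    except Exception as exc:
        # Chainer is not installed, or it failed to import in this environment.
        return "chainer not importable: {}".format(exc)

    # Assumption: chainer.cuda carries `available` and `cudnn_enabled` flags.
    # getattr keeps the sketch from crashing if a version lacks them.
    cuda = getattr(chainer, "cuda", None)
    available = getattr(cuda, "available", None)
    cudnn_enabled = getattr(cuda, "cudnn_enabled", None)
    return "chainer {}: cuda available={}, cudnn enabled={}".format(
        chainer.__version__, available, cudnn_enabled
    )


print(cudnn_status())
```

If the cuDNN flag comes back as `False`, the installation probably was built without cuDNN, and reinstalling after setting up cuDNN may be needed.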
Hi, thank you for the comments, @bobye and @cricket1.
Could somebody share a pretrained model?
@xscjun I guess this project is still in progress and not yet complete. You are welcome to contribute.
In a few weeks I will try to reduce memory usage by implementing a forget function, provided that it lets us increase GPU utilization.
Hello, thanks for the great work. Is there any way to save checkpoints while training the model? I could not find the trained model after training finished.
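One possible way to get checkpoints, sketched under assumptions: Chainer provides `chainer.serializers.save_npz`, which writes the parameters of a link to an `.npz` file. The helpers `checkpoint_path` and `save_checkpoint`, the `result` directory, and the file naming are hypothetical and not part of this project; where to call the function depends on how the project's training loop is written.

```python
import os


def checkpoint_path(out_dir, epoch):
    """Build a file name such as result/model_epoch_0005.npz (hypothetical naming)."""
    return os.path.join(out_dir, "model_epoch_{:04d}.npz".format(epoch))


def save_checkpoint(model, epoch, out_dir="result"):
    """Save the parameters of a Chainer link to an .npz file and return the path."""
    # Imported lazily so the path helper above works without Chainer installed.
    import chainer

    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    path = checkpoint_path(out_dir, epoch)
    chainer.serializers.save_npz(path, model)
    return path
```

Calling `save_checkpoint(model, epoch)` at the end of each epoch (or every few epochs) would leave a file that `chainer.serializers.load_npz` can read back into a model of the same architecture.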
I am using a p2.xlarge instance on AWS. It has 12 GB of GPU memory.
For batch size > 1 I see high GPU utilization and GPU memory usage, which ends in out-of-memory errors.
With batch size = 1, an epoch takes 4 hrs, so 3000 epochs = 500 days :)
Is there a pretrained model that I can use for training?
Is such high usage normal? Each image is 10 KB and each mask is 2.5 KB. I am using Chainer for the first time. Am I doing something wrong?
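The 500-day figure can be reproduced with a few lines of arithmetic. The sketch below also accepts a `speedup` factor; treating a larger batch size as a proportional speedup is an idealized assumption for illustration only, not something measured in this thread, and the function name is made up.

```python
def total_training_days(hours_per_epoch, epochs, speedup=1.0):
    """Convert a per-epoch time into total training time in days.

    `speedup` is a hypothetical factor by which each epoch gets faster,
    for example from a larger batch size; real scaling is rarely linear.
    """
    return hours_per_epoch * epochs / speedup / 24.0


# Figures from the post: 4 hours per epoch, 3000 epochs, batch size 1.
print(total_training_days(4, 3000))  # 500.0

# Idealized case where a batch size of 3 makes each epoch 3x faster.
print(round(total_training_days(4, 3000, speedup=3.0), 1))
```

Even under the optimistic linear assumption, the run stays in the range of months, so the number of epochs or the per-epoch time has to come down substantially.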