
Any tips to reduce training time #4

Closed
cricket1 opened this issue Dec 16, 2016 · 7 comments

Comments

@cricket1

  • I am using a p2.xlarge machine on AWS. It has 12 GB of GPU memory.

  • For batch size > 1 I see high GPU utilization and GPU memory usage, and end up with out-of-memory errors.

  • With batch size = 1, an epoch takes 4 hours, so 3000 epochs = 500 days :)

  • Is there a pretrained model that I can use for training?

  • Is such high usage normal? I see that each image is 10 KB and each mask is 2.5 KB. I am using chainer for the first time. Am I doing something wrong?

@bobye
Contributor

bobye commented Dec 30, 2016

Make sure you have cuDNN enabled in your chainer installation. I am able to run with batchsize = 3 after I fixed the cuDNN issue (on a 12 GB GPU).
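
A quick way to check whether chainer can see cuDNN is to inspect the `cudnn_enabled` flag. This is a sketch, not an official API: the attribute has lived in different places across chainer versions (`chainer.cuda` in older releases, `chainer.backends.cuda` in newer ones), so the helper below probes both defensively.

```python
def cudnn_status():
    """Rough check of whether chainer reports cuDNN support.

    The attribute location varies across chainer versions, so this probes
    both the old (chainer.cuda) and newer (chainer.backends.cuda) modules.
    """
    try:
        import chainer
    except ImportError:
        return "chainer not installed"
    cuda = getattr(chainer, "cuda", None) or getattr(
        getattr(chainer, "backends", None), "cuda", None)
    if bool(getattr(cuda, "cudnn_enabled", False)):
        return "cudnn enabled"
    return "cudnn NOT enabled"

print(cudnn_status())
```

If this reports that cuDNN is not enabled, reinstalling chainer after installing the cuDNN libraries usually fixes it, and per-layer convolution memory drops noticeably.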

@shiba24
Owner

shiba24 commented Dec 31, 2016

Hi, thank you for the comments, @bobye and @cricket1 .
Large memory usage is sometimes inevitable for this kind of neural network.
Just one tip: chainer recently implemented the Forget function (http://docs.chainer.org/en/stable/reference/functions.html#forget), and using it we can definitely reduce memory usage and make the batch size larger! (Sorry, I do not have enough time to implement this now, though...)
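
For context, `chainer.functions.forget(func, *xs)` wraps a subgraph so its intermediate activations are recomputed during the backward pass instead of being stored, trading extra compute for memory. The chainer-free sketch below (an illustrative memory model, not chainer code; the numbers and the sqrt-segment scheme are assumptions for illustration) shows why this helps: without checkpointing every layer's activations are kept, while sqrt-style checkpointing keeps only ~sqrt(n) checkpoints plus one recomputed segment.

```python
import math

def activation_memory_mb(n_layers, per_layer_mb, checkpoint=False):
    """Rough peak-memory model for storing forward activations.

    Without checkpointing, every layer's output is kept for backward:
    n_layers * per_layer_mb.  With sqrt-style checkpointing (the idea
    behind chainer's F.forget), only ~sqrt(n_layers) checkpoints plus
    one recomputed segment are resident at a time.
    """
    if not checkpoint:
        return n_layers * per_layer_mb
    seg = max(math.isqrt(n_layers), 1)  # number of checkpoints kept
    return (seg + math.ceil(n_layers / seg)) * per_layer_mb

# e.g. a hypothetical 100-layer net at 2 MB of activations per layer:
print(activation_memory_mb(100, 2))                   # all activations kept
print(activation_memory_mb(100, 2, checkpoint=True))  # sqrt checkpointing
```

In this toy model, memory drops from 200 MB to 40 MB, which is the headroom that lets the batch size grow; the cost is roughly one extra forward pass per segment during backward.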

@cricket1
Author

cricket1 commented Jan 3, 2017

@bobye @shiba24 thanks for the input, will try both.

@xscjun

xscjun commented Jan 5, 2017

Could somebody share a pretrained model?

@bobye
Contributor

bobye commented Jan 7, 2017

@xscjun I believe this project is still in progress and not yet complete. You are welcome to contribute.

@shiba24
Owner

shiba24 commented Jan 18, 2017

In a few weeks I will try to reduce memory usage by implementing the Forget function, so that we can increase GPU utilization.

@GhadeerMohamad

Hello, thanks for the great work. Is there any way to save checkpoints while training the model? I haven't found the trained model after training finished.
Thank you.

@shiba24 shiba24 closed this as completed Jun 21, 2018