
How much time the training costs? #1

Closed
zizhaozhang opened this issue Nov 12, 2015 · 2 comments

Comments

@zizhaozhang

Hi Saining,

I am wondering what mini-batch size you use in practice for your pre-trained model. Your paper says 10, but your code sets it to 1, and both use the same number of iterations. The two settings should give different results.
I ask because your paper mentions that training takes only 7 hours. I also use a Tesla K40c, but training takes 1 minute per 20 iterations (with a mini-batch size of 1). At this speed, it will need 4 days to finish 10,000 iterations.
It is still running.
Could you help me figure it out?
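The time estimates in this comment come down to simple arithmetic: total time is seconds per iteration multiplied by the number of iterations. A minimal sketch (the function name is hypothetical) that converts a measured iteration speed into an estimated wall-clock time:

```python
def estimate_training_time(seconds_per_iter: float, max_iter: int) -> dict:
    """Estimate wall-clock training time from a measured per-iteration speed."""
    total_seconds = seconds_per_iter * max_iter
    return {
        "minutes": total_seconds / 60,
        "hours": total_seconds / 3600,
        "days": total_seconds / 86400,
    }

# 1 minute per 20 iterations means 3 seconds per iteration.
print(estimate_training_time(60 / 20, 10_000))
# 1 minute per iteration.
print(estimate_training_time(60, 10_000))
```

By this arithmetic, 1 minute per 20 iterations gives about 500 minutes (roughly 8.3 hours) for 10,000 iterations, while 1 minute per iteration gives 10,000 minutes (roughly 6.9 days). Likewise, finishing 10,000 iterations in 7 hours corresponds to about 2.5 seconds per iteration.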

@s9xie
Owner

s9xie commented Nov 14, 2015

Hi Zizhao,

We updated our code base to a newer version of Caffe and now use a mini-batch size of 1 with full-resolution images; this way we were able to boost our performance to 0.790 ODS. If you resize the images to 400x400 as in the paper and set the mini-batch size to 10, training will be much faster. According to my experiment log, 7 hours was enough for 10,000 iterations with the previous setting.
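Whether a mini-batch of 1 and a mini-batch of 10 behave differently depends on how many images contribute to each weight update. Caffe's solver has an `iter_size` parameter that accumulates gradients over several forward/backward passes before a single update; this thread does not say whether the released solver uses it. The toy sketch below (a one-parameter linear model with squared loss; all names are hypothetical, and this is not the project's training code) shows that averaging ten single-image gradients gives the same update as one mini-batch of ten:

```python
import random


def grad_single(w: float, x: float, y: float) -> float:
    """Gradient of the squared loss 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x


def grad_batch(w: float, xs: list, ys: list) -> float:
    """Gradient of the mean loss over one mini-batch containing all samples."""
    return sum(grad_single(w, x, y) for x, y in zip(xs, ys)) / len(xs)


def grad_accumulated(w: float, xs: list, ys: list, iter_size: int) -> float:
    """Run iter_size passes with batch size 1, summing gradients, then average.

    This mirrors the idea behind gradient accumulation: several small
    forward/backward passes contribute to one weight update.
    """
    acc = 0.0
    for i in range(iter_size):
        acc += grad_single(w, xs[i], ys[i])
    return acc / iter_size


random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(10)]
ys = [2.0 * x + random.gauss(0.0, 0.1) for x in xs]
w = 0.5
print(grad_batch(w, xs, ys), grad_accumulated(w, xs, ys, iter_size=10))
```

One practical reason to accumulate instead of batching: images inside a single Caffe blob must share the same dimensions, so full-resolution images of differing sizes cannot simply be stacked into one mini-batch without resizing or cropping.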

@zizhaozhang
Author

Hi Saining,
Thanks for your answer.
So if I understand correctly, you mean that using an image size of 400x400 with a mini-batch size of 10 is much faster than using full-size images with a mini-batch size of 1?
Why? A larger mini-batch should make each iteration slower, right?
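For a fully convolutional network, the work done per iteration grows roughly with the number of input pixels processed. A minimal sketch comparing the two settings; the full-resolution size of 321x481 is an assumption for illustration, since the thread does not state the image size:

```python
def pixels_per_iteration(batch_size: int, height: int, width: int) -> int:
    """Number of input pixels the network processes in one iteration."""
    return batch_size * height * width


resized = pixels_per_iteration(10, 400, 400)  # paper setting: 400x400, batch 10
full_res = pixels_per_iteration(1, 321, 481)  # assumed full-resolution size, batch 1
print(resized, full_res, resized / full_res)
```

Under this assumption, one resized batch of 10 holds about ten times as many pixels as one full-resolution image, so per-iteration cost alone would not make the resized setting faster.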

I still do not understand why my training is so slow. With the data you provided and the parameters set in your code (mini-batch size 1), one iteration takes 1 minute, so 10,000 minutes are needed in total. I am using a Tesla K40c. Do you have any idea what could cause this?

By the way, could you show me your loss plot? Interestingly, the loss stays high and oscillates, yet the results already look visually good after a few thousand iterations.
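With a mini-batch of 1, each logged loss value comes from a single image, so the raw curve is likely to be noisy. A common way to see the trend is to smooth the logged values with an exponential moving average. A minimal sketch, assuming the loss values have already been extracted from the training log (parsing is not shown, and the sample values below are made up, not taken from this thread):

```python
def smooth(values: list, alpha: float = 0.9) -> list:
    """Exponential moving average of a sequence of loss values.

    alpha close to 1 gives heavier smoothing; alpha = 0 returns the raw values.
    """
    smoothed = []
    avg = None
    for v in values:
        avg = v if avg is None else alpha * avg + (1 - alpha) * v
        smoothed.append(avg)
    return smoothed


raw = [5.0, 1.0, 6.0, 0.5, 5.5, 1.5]  # made-up values for illustration
print(smooth(raw, alpha=0.5))
```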

Thank you so much for your answers.
