
Initialization weights #4

Closed
liuyipei opened this issue Mar 2, 2016 · 7 comments

liuyipei commented Mar 2, 2016

This work is very exciting! The provided weights do work as expected, and the prototxt works out of the box with the default ilsvrc2012 LMDB data that comes with Caffe's examples.

However, my training loss from scratch has not decreased even after the full 85k iterations. I tried rebuilding the latest version of Caffe, running a second time, and increasing the batch size by 4x; none of these attempts helped. Am I correct in understanding that the model is meant to be trained end-to-end, without tricks like layer-by-layer training?

To help me diagnose the problem, would it be possible for you to provide a reference initialization caffemodel (and/or one of your earliest intermediate snapshots)?

Thank you for your help!

liuyipei (Author) commented Mar 3, 2016

It turns out that I needed to reduce the learning rate. After reducing the learning rate by 10x and increasing the effective batch size by 2x, I was able to train from scratch. Less extreme measures are most likely sufficient.
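For concreteness, here is a minimal sketch of those two changes applied through Caffe's Python protobuf bindings. `base_lr` and `iter_size` (gradient accumulation) are the standard solver knobs for this; the file names below are assumptions for illustration, not from this thread:

```python
# Hedged sketch: lower the learning rate 10x and double the effective batch
# size via gradient accumulation. File names are hypothetical.
from caffe.proto import caffe_pb2
from google.protobuf import text_format

solver = caffe_pb2.SolverParameter()
with open('solver.prototxt') as f:          # hypothetical input path
    text_format.Merge(f.read(), solver)

solver.base_lr *= 0.1                       # 10x lower learning rate
# iter_size accumulates gradients over N forward/backward passes before
# each weight update, so effective batch = batch_size * iter_size.
solver.iter_size = max(solver.iter_size, 1) * 2

with open('solver_lowlr.prototxt', 'w') as f:  # hypothetical output path
    f.write(text_format.MessageToString(solver))
```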

@ducha-aiki thanks. LSUV does seem to have a slightly faster start; in this case, my biggest problem was the learning rate.

liuyipei closed this as completed Mar 3, 2016

ducha-aiki commented Mar 3, 2016

@liuyipei with LSUV I was able to converge with a big lr. But it is good that other ways work as well :)
See https://github.com/ducha-aiki/caffenet-benchmark/blob/master/prototxt/architectures/SqueezeNet128_lsuv.prototxt
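
For readers who haven't seen it, LSUV (Layer-Sequential Unit-Variance, Mishkin & Matas) pre-initializes each layer with orthonormal weights and then rescales them until the layer's output has unit variance. Below is a minimal pycaffe sketch of the idea, not ducha-aiki's actual implementation; the prototxt path, the output file name, and the choice to sweep every parameterized layer are assumptions for illustration:

```python
# Hedged sketch of LSUV init with pycaffe. Assumes a net whose data layer
# can supply batches (e.g. the ilsvrc2012 LMDB from Caffe's examples).
import numpy as np
import caffe

def svd_orthonormal(shape):
    # Orthonormal matrix (via SVD of a Gaussian) reshaped to the weight tensor.
    flat = (shape[0], int(np.prod(shape[1:])))
    a = np.random.standard_normal(flat)
    u, _, v = np.linalg.svd(a, full_matrices=False)
    q = u if u.shape == flat else v
    return q.reshape(shape).astype(np.float32)

def lsuv_init(net, layer_names, tol=0.1, max_iters=10):
    for name in layer_names:
        weights = net.params[name][0]
        weights.data[...] = svd_orthonormal(weights.data.shape)
        top = net.top_names[name][0]       # blob produced by this layer
        for _ in range(max_iters):
            net.forward()                  # draws a batch from the data layer
            var = net.blobs[top].data.var()
            if abs(var - 1.0) < tol:
                break
            # Rescale weights so the layer's output variance moves toward 1.
            weights.data[...] /= np.sqrt(var)

net = caffe.Net('train_val.prototxt', caffe.TEST)   # hypothetical path
lsuv_init(net, list(net.params.keys()))
net.save('squeezenet_lsuv_init.caffemodel')         # hypothetical output name
```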

forresti (Owner) commented Mar 4, 2016

I like how you have trainval and solver in one file. Does Caffe accept that as-is, or did you customize Caffe to allow it? Anyway, it looks convenient!

forresti (Owner) commented

@liuyipei
One more thing: I've run into a few problems with cuDNN and numerical correctness. I recommend trying a training run with cuDNN disabled, and seeing if you still get divergence.
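
One way to test this without rebuilding Caffe (the other option is leaving `USE_CUDNN` unset in Makefile.config and recompiling) is to force the non-cuDNN engine per layer. A hedged sketch, with hypothetical file names, that rewrites every convolution layer in a prototxt to `engine: CAFFE`:

```python
# Hedged sketch: bypass cuDNN by pinning convolution layers to the CAFFE
# engine, leaving the rest of the net untouched. File names are hypothetical.
from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
with open('train_val.prototxt') as f:
    text_format.Merge(f.read(), net)

for layer in net.layer:
    if layer.type == 'Convolution':
        layer.convolution_param.engine = caffe_pb2.ConvolutionParameter.CAFFE
# (PoolingParameter and ReLUParameter have analogous engine fields if you
#  want to rule out cuDNN entirely.)

with open('train_val_nocudnn.prototxt', 'w') as f:
    f.write(text_format.MessageToString(net))
```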

forresti (Owner) commented

@liuyipei
Update: We have been experimenting with solver configurations, and we have identified a configuration that converges more reliably. We just committed it to SqueezeNet-master: 0bc03d9
