
Initialization weights #4

Closed
liuyipei opened this issue Mar 2, 2016 · 7 comments

liuyipei commented Mar 2, 2016

This work is very exciting! The provided weights do work as expected, and the prototxt works out of the box with the default ilsvrc2012 LMDB data that comes with Caffe's examples.

However, my training loss from scratch has not decreased even after the full 85k iterations. I tried rebuilding the latest version of Caffe, running a second time, and increasing the batch size by 4x; none of these attempts helped. Am I correct in understanding that the model is meant to be trained end-to-end, without tricks like layer-by-layer training?

To help me diagnose the problem, would it be possible for you to provide a reference initialization caffemodel (and/or one of your earliest intermediate snapshots)?

Thank you for your help!

liuyipei (Author) commented Mar 3, 2016

It turns out that I needed to reduce the learning rate. After reducing the learning rate by 10x and increasing the effective batch size by 2x, I was able to train from scratch. Less extreme measures are most likely sufficient.
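For concreteness, here is a minimal sketch of those two changes applied through Caffe's Python protobuf bindings. `base_lr` and `iter_size` (gradient accumulation) are the standard solver knobs for this; the file names below are assumptions for illustration, not from this thread:

```python
# Hedged sketch: lower the learning rate 10x and double the effective batch
# size via gradient accumulation. File names are hypothetical.
from caffe.proto import caffe_pb2
from google.protobuf import text_format

solver = caffe_pb2.SolverParameter()
with open('solver.prototxt') as f:          # hypothetical input path
    text_format.Merge(f.read(), solver)

solver.base_lr *= 0.1                       # 10x lower learning rate
# iter_size accumulates gradients over N forward/backward passes before
# each weight update, so effective batch = batch_size * iter_size.
solver.iter_size = max(solver.iter_size, 1) * 2

with open('solver_lowlr.prototxt', 'w') as f:  # hypothetical output path
    f.write(text_format.MessageToString(solver))
```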

@ducha-aiki thanks. LSUV does seem to have a slightly faster start; in this case, my biggest problem was the learning rate.

liuyipei closed this as completed Mar 3, 2016

ducha-aiki commented Mar 3, 2016

@liuyipei with LSUV I was able to converge with a big lr. But it is good that other ways work as well :)
See https://github.com/ducha-aiki/caffenet-benchmark/blob/master/prototxt/architectures/SqueezeNet128_lsuv.prototxt
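
For readers who haven't seen it, LSUV (Layer-Sequential Unit-Variance, Mishkin & Matas) pre-initializes each layer with orthonormal weights and then rescales them until the layer's output has unit variance. Below is a minimal pycaffe sketch of the idea, not ducha-aiki's actual implementation; the prototxt path, the output file name, and the choice to sweep every parameterized layer are assumptions for illustration:

```python
# Hedged sketch of LSUV init with pycaffe. Assumes a net whose data layer
# can supply batches (e.g. the ilsvrc2012 LMDB from Caffe's examples).
import numpy as np
import caffe

def svd_orthonormal(shape):
    # Orthonormal matrix (via SVD of a Gaussian) reshaped to the weight tensor.
    flat = (shape[0], int(np.prod(shape[1:])))
    a = np.random.standard_normal(flat)
    u, _, v = np.linalg.svd(a, full_matrices=False)
    q = u if u.shape == flat else v
    return q.reshape(shape).astype(np.float32)

def lsuv_init(net, layer_names, tol=0.1, max_iters=10):
    for name in layer_names:
        weights = net.params[name][0]
        weights.data[...] = svd_orthonormal(weights.data.shape)
        top = net.top_names[name][0]       # blob produced by this layer
        for _ in range(max_iters):
            net.forward()                  # draws a batch from the data layer
            var = net.blobs[top].data.var()
            if abs(var - 1.0) < tol:
                break
            # Rescale weights so the layer's output variance moves toward 1.
            weights.data[...] /= np.sqrt(var)

net = caffe.Net('train_val.prototxt', caffe.TEST)   # hypothetical path
lsuv_init(net, list(net.params.keys()))
net.save('squeezenet_lsuv_init.caffemodel')         # hypothetical output name
```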

forresti (Owner) commented Mar 4, 2016

I like how you have trainval and solver in one file. Does Caffe accept that as-is, or did you customize Caffe to allow it? Anyway, it looks convenient!

forresti (Owner) commented

@liuyipei
One more thing: I've run into a few problems with cuDNN and numerical correctness. I recommend trying a training run with cuDNN disabled, and seeing if you still get divergence.
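
One way to test this without rebuilding Caffe (the other option is leaving `USE_CUDNN` unset in Makefile.config and recompiling) is to force the non-cuDNN engine per layer. A hedged sketch, with hypothetical file names, that rewrites every convolution layer in a prototxt to `engine: CAFFE`:

```python
# Hedged sketch: bypass cuDNN by pinning convolution layers to the CAFFE
# engine, leaving the rest of the net untouched. File names are hypothetical.
from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
with open('train_val.prototxt') as f:
    text_format.Merge(f.read(), net)

for layer in net.layer:
    if layer.type == 'Convolution':
        layer.convolution_param.engine = caffe_pb2.ConvolutionParameter.CAFFE
# (PoolingParameter and ReLUParameter have analogous engine fields if you
#  want to rule out cuDNN entirely.)

with open('train_val_nocudnn.prototxt', 'w') as f:
    f.write(text_format.MessageToString(net))
```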

forresti (Owner) commented

@liuyipei
Update: We have been experimenting with solver configurations, and we have identified a configuration that converges more reliably. We just committed it to SqueezeNet-master: 0bc03d9
