
How can I reproduce the results on Caffe? #2

Closed

handong1587 opened this issue May 14, 2017 · 16 comments
handong1587 commented May 14, 2017

Hi, thanks for sharing this MobileNets!
I am just wondering if I can reproduce the same results on Caffe from scratch. Would it be possible for you to share the solver.prototxt you used to achieve this accuracy? And how many days did it take to complete training?
Thanks~

shicai (Owner) commented May 14, 2017

I suggest not training this model from scratch in Caffe, since Caffe uses the group parameter to implement channel-wise (depthwise) convolution, which is very slow and inefficient.
If possible, you can use lr=1e-3 and wd=1e-4 to fine-tune the pretrained model for your own task.
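A minimal solver.prototxt sketch built around these two values might look as follows; only base_lr and weight_decay come from the advice above, while the net path, lr policy, momentum, and iteration counts are illustrative assumptions, not the author's actual settings:

# Hedged fine-tuning solver sketch: only base_lr and weight_decay
# come from the suggestion above; everything else is an assumption.
net: "train_val.prototxt"        # assumed path to your fine-tune net
base_lr: 0.001                   # lr = 1e-3 as suggested
weight_decay: 0.0001             # wd = 1e-4 as suggested
momentum: 0.9                    # common SGD default (assumption)
lr_policy: "step"                # assumed schedule
gamma: 0.1                       # assumed: drop lr by 10x per step
stepsize: 20000                  # assumed step interval
max_iter: 60000                  # assumed training length
snapshot: 10000
snapshot_prefix: "mobilenet_finetune"
solver_mode: GPU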

handong1587 (Author) commented May 14, 2017

Thanks for your advice!

ryusaeba commented Jul 6, 2017

Hi @shicai

If I would like to fine-tune the pretrained model, what settings would you suggest for the Convolution, BatchNorm, and Scale layers? From your suggestion above, I guess it would be lr=1e-3 and wd=1e-4 for Convolution.

For BatchNorm, it would be as shown below:

layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  param {
    lr_mult: 0      # keep zero, correct?
    decay_mult: 0   # keep zero, correct?
  }
  param {
    lr_mult: 0      # keep zero, correct?
    decay_mult: 0   # keep zero, correct?
  }
  param {
    lr_mult: 0      # keep zero, correct?
    decay_mult: 0   # keep zero, correct?
  }
}

The Scale layer would be:

layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  param {
    lr_mult: 1      # is this correct?
    decay_mult: 0
  }
  param {
    lr_mult: 1      # is this correct?
    decay_mult: 0
  }
  scale_param {
    filler {
      value: 1
    }
    bias_term: true
    bias_filler {
      value: 0
    }
  }
}

Please help me check the param { lr_mult, decay_mult } settings. Thanks :)

shicai (Owner) commented Jul 6, 2017

If you use the pretrained weights for detection, I suggest fixing all the BN parameters by setting lr_mult: 0 and decay_mult: 0.

ryusaeba commented Jul 7, 2017

Thanks for your suggestion. I will fine-tune the Convolution layers only and fix all the BN parameters 👍

shicai (Owner) commented Jul 7, 2017

@ryusaeba btw, to fix all the BN parameters you should also set use_global_stats: true in batch_norm_param, so that the BN mean/variance stay unchanged during the fine-tuning stage.
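Putting both pieces of advice together, a fully frozen BatchNorm layer would be a sketch like the one below (same layer names as the earlier snippet; only the use_global_stats line is new):

layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # all three internal blobs (mean, variance, scale factor) frozen
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  batch_norm_param {
    use_global_stats: true   # keep the stored mean/variance fixed
  }
}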

ryusaeba commented Jul 7, 2017

Wow, that is a very helpful reminder. Many thanks :)

ryusaeba commented Jul 7, 2017

@shicai
If we fix all the BN parameters (lr_mult=decay_mult=0 and use_global_stats: true), should we also avoid fine-tuning the convolution layers in the base network? I ask because the mean/variance may no longer match if we fine-tune those convolutions.
Please correct me if my understanding is incorrect. Much appreciated.

shicai (Owner) commented Jul 7, 2017

It's OK to fine-tune the conv layers while fixing the BN parameters, since the BN mean/var statistics are not stable during the detection training stage anyway.
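For reference, a conv layer left trainable while its BN/Scale layers are frozen might be sketched as below; the layer shape, bias handling, and filler are illustrative assumptions, not taken from the released prototxt:

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # weights stay trainable at the base lr/wd while BN is frozen
  param { lr_mult: 1 decay_mult: 1 }
  convolution_param {
    num_output: 32        # assumed width
    bias_term: false      # assumed: bias handled by the Scale layer
    kernel_size: 3
    stride: 2
    pad: 1
    weight_filler { type: "msra" }   # assumed filler
  }
}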

ryusaeba commented Jul 8, 2017

@shicai
So if our target is classification, the fine-tune settings for the BN parameters would be what I posted before, but with use_global_stats: false? (#2 (comment))

shicai (Owner) commented Jul 8, 2017

Yes.

ryusaeba commented Jul 8, 2017

@shicai
Great, thanks! Your experience really helps me a lot 👍 I am glad to have this discussion with you.

ryusaeba commented Jul 11, 2017

@shicai
I have one more question about the mean/var parameters. Why are they not stable during the detection training stage? Please share your experience with me. Many thanks :)
Originally I thought it was because detection networks use negative samples during training, but I am not sure of the real reason.

shicai (Owner) commented Jul 11, 2017

I think it is mainly because the batch size used for training detection models is very small.

ryusaeba commented Jul 11, 2017

Could you share a rough value for the batch size? What numbers count as small or large?

shicai (Owner) commented Jul 11, 2017

If you want stable BN training, you'd better set the batch size to 16 or even larger. But for detection tasks, the batch size is usually set to 1 or 2 due to memory constraints.
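In Caffe the batch size lives in the data layer's batch_size field; a sketch of a classification input layer following this advice might look as below (the source path and transform values are illustrative assumptions):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    crop_size: 224       # assumed MobileNet input crop
    mirror: true
  }
  data_param {
    source: "train_lmdb"   # assumed LMDB path
    batch_size: 16         # >= 16 for stable BN, as suggested
    backend: LMDB
  }
}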
