
How can I reproduce the results on Caffe? #2

Closed

handong1587 opened this issue May 14, 2017 · 16 comments
handong1587 commented May 14, 2017

Hi, thanks for sharing this MobileNets!
I am just wondering if I can reproduce the same results on Caffe from scratch. Would it be possible for you to share the solver.prototxt you used to achieve this accuracy? And how many days did it take to complete training?
Thanks~

shicai (Owner) commented May 14, 2017

I suggest not training this model from scratch in Caffe, since Caffe uses the group parameter to implement channel-wise (depthwise) convolution, which is very slow and inefficient.
If possible, you can use lr=1e-3 and wd=1e-4 to fine-tune the pretrained model for your own task.
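A minimal solver.prototxt sketch built around these two values might look as follows; only base_lr and weight_decay come from the advice above, while the net path, lr policy, momentum, and iteration counts are illustrative assumptions, not the author's actual settings:

# Hedged fine-tuning solver sketch: only base_lr and weight_decay
# come from the suggestion above; everything else is an assumption.
net: "train_val.prototxt"        # assumed path to your fine-tune net
base_lr: 0.001                   # lr = 1e-3 as suggested
weight_decay: 0.0001             # wd = 1e-4 as suggested
momentum: 0.9                    # common SGD default (assumption)
lr_policy: "step"                # assumed schedule
gamma: 0.1                       # assumed: drop lr by 10x per step
stepsize: 20000                  # assumed step interval
max_iter: 60000                  # assumed training length
snapshot: 10000
snapshot_prefix: "mobilenet_finetune"
solver_mode: GPU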

handong1587 (Author) commented May 14, 2017

Thanks for your advice!

ryusaeba commented Jul 6, 2017

Hi @shicai

If I would like to fine-tune the pretrained model, what settings would you suggest for the Convolution, BatchNorm, and Scale layers? From your suggestion above, I guess it would be lr=1e-3 and wd=1e-4 for Convolution.

For BatchNorm, it would be as shown below:

layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  param {
    lr_mult: 0      # keep zero, correct?
    decay_mult: 0   # keep zero, correct?
  }
  param {
    lr_mult: 0      # keep zero, correct?
    decay_mult: 0   # keep zero, correct?
  }
  param {
    lr_mult: 0      # keep zero, correct?
    decay_mult: 0   # keep zero, correct?
  }
}

The Scale layer would be:

layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  param {
    lr_mult: 1      # is this correct?
    decay_mult: 0
  }
  param {
    lr_mult: 1      # is this correct?
    decay_mult: 0
  }
  scale_param {
    filler {
      value: 1
    }
    bias_term: true
    bias_filler {
      value: 0
    }
  }
}

Please help me check the param { lr_mult, decay_mult } settings. Thanks :)

shicai (Owner) commented Jul 6, 2017

If you use the pretrained weights for detection, I suggest fixing all the BN parameters by setting lr_mult: 0 and decay_mult: 0.

ryusaeba commented Jul 7, 2017

Thanks for your suggestion. I will fine-tune the Convolution layers only and fix all the BN parameters 👍

shicai (Owner) commented Jul 7, 2017

@ryusaeba btw, to fix all the BN parameters you should also set use_global_stats: true in batch_norm_param, so that the BN mean/variance stay unchanged during the fine-tuning stage.
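Putting both pieces of advice together, a fully frozen BatchNorm layer would be a sketch like the one below (same layer names as the earlier snippet; only the use_global_stats line is new):

layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # all three internal blobs (mean, variance, scale factor) frozen
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  batch_norm_param {
    use_global_stats: true   # keep the stored mean/variance fixed
  }
}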

ryusaeba commented Jul 7, 2017

Wow, that is a very helpful reminder. Many thanks :)

ryusaeba commented Jul 7, 2017

@shicai
If we fix all the BN parameters (lr_mult=decay_mult=0 and use_global_stats: true), should we also avoid fine-tuning the convolution layers in the base network? I ask because the mean/variance may no longer match if we fine-tune those convolutions.
Please correct me if my understanding is incorrect. Much appreciated.

shicai (Owner) commented Jul 7, 2017

It's OK to fine-tune the conv layers while fixing the BN parameters, since the BN mean/var statistics are not stable during the detection training stage anyway.
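For reference, a conv layer left trainable while its BN/Scale layers are frozen might be sketched as below; the layer shape, bias handling, and filler are illustrative assumptions, not taken from the released prototxt:

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # weights stay trainable at the base lr/wd while BN is frozen
  param { lr_mult: 1 decay_mult: 1 }
  convolution_param {
    num_output: 32        # assumed width
    bias_term: false      # assumed: bias handled by the Scale layer
    kernel_size: 3
    stride: 2
    pad: 1
    weight_filler { type: "msra" }   # assumed filler
  }
}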

ryusaeba commented Jul 8, 2017

@shicai
So if our target is classification, the fine-tune settings for the BN parameters would be what I posted before, but with use_global_stats: false? (#2 (comment))

shicai (Owner) commented Jul 8, 2017

Yes.

ryusaeba commented Jul 8, 2017

@shicai
Great, thanks! Your experience really helps me a lot 👍 I am glad to have this discussion with you.

ryusaeba commented Jul 11, 2017

@shicai
I have one more question about the mean/var parameters. Why are they not stable during the detection training stage? Please share your experience with me. Many thanks :)
Originally I thought it was because detection networks use negative samples during training, but I am not sure of the real reason.

shicai (Owner) commented Jul 11, 2017

I think it is mainly because the batch size used for training detection models is very small.

ryusaeba commented Jul 11, 2017

Could you share a rough value for the batch size? What numbers count as small or large?

shicai (Owner) commented Jul 11, 2017

If you want stable BN training, you'd better set the batch size to 16 or even larger. But for detection tasks, the batch size is usually set to 1 or 2 due to memory constraints.
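In Caffe the batch size lives in the data layer's batch_size field; a sketch of a classification input layer following this advice might look as below (the source path and transform values are illustrative assumptions):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    crop_size: 224       # assumed MobileNet input crop
    mirror: true
  }
  data_param {
    source: "train_lmdb"   # assumed LMDB path
    batch_size: 16         # >= 16 for stable BN, as suggested
    backend: LMDB
  }
}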
