speed improvement by merging batch normalization and scale #5
Hi @weishengchong, I'm afraid I can't answer the exact difference between those two options right now. Speaking of timing, IMO the impact of removing the BN layers won't be that significant at test time.
Hi @sanghoon, thanks for sharing. For our googlenet_bn, we are trying to merge BN into the conv weights and bias.
Very useful information! Thanks for the discussion.
Hi @quietsmile, for merging CBR (convolution, bias, ReLU), refer to NVIDIA GIE figures 3-5.
Hi @weishengchong, can you share some information on how to use the optimized model after GIE, especially for detection tasks? Thanks.
@sanghoon Can you share some experience on how to merge the BN and Scale layers into the Conv layer?
@sanghoon Dear sanghoon, I have the same confusion about how to merge the BN and Scale layers into the Conv layer. I read your Caffe code modifications, and I find that your conv layer code has no modifications compared to the main Caffe branch.
@zimenglan-sisu-512 @xiaoxiongli It's just simple math. The conv layer has parameters conv_weight and conv_bias. Let us define a vector 'alpha' of scale factors for the conv filters:

alpha = scale_weight / sqrt(bn_variance / num_bn_samples + eps)

If we set conv_weight and conv_bias as:

conv_weight = conv_weight * alpha
conv_bias = conv_bias * alpha + (scale_bias - (bn_mean / num_bn_samples) * alpha)

then we get the same result as the original network would with bn_mean[...] = 0 and scale_weight[...] = 1, so we can simply remove the BN and Scale layers. The code is not open, but you can easily implement a script to do this.
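The recipe above can be sketched in NumPy as a sanity check. This is only a minimal sketch of the math, not the (unreleased) original script; the weight layout (out, in, kh, kw) and all parameter names are assumptions following the thread:

```python
import numpy as np

def merge_bn_scale(conv_weight, conv_bias,
                   bn_mean, bn_variance, num_bn_samples,
                   scale_weight, scale_bias, eps=1e-5):
    """Fold a BatchNorm + Scale pair into the preceding convolution.

    conv_weight has shape (out, in, kh, kw); every other argument is a
    length-`out` vector (num_bn_samples is Caffe's accumulated scale factor).
    """
    # Per-output-channel scale factor, exactly as defined above.
    alpha = scale_weight / np.sqrt(bn_variance / num_bn_samples + eps)
    merged_weight = conv_weight * alpha[:, None, None, None]
    merged_bias = conv_bias * alpha + (scale_bias - (bn_mean / num_bn_samples) * alpha)
    return merged_weight, merged_bias
```

After the fold, running the merged conv alone should reproduce conv -> BN -> Scale of the original network, so the two extra layers can be dropped from the prototxt.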
@kyehyeon thank you very much! I got it... Now in the github: Right? ^_^
@zimenglan-sisu-512 Do you mean how to use GIE for a detection task?
Hi @sanghoon, after merging for a 25% speed-up, is there any side effect on training accuracy?
@weishengchong Yes, I have tried, but failed.
I haven't tried it. What was your GIE error message? FYI, at GTC Taipei last month the page below was introduced to TensorRT (= GIE) users. Have you tried reproducing the tutorial results?
@xiaoxiongli @weishengchong
@weishengchong Yes, I have followed these instructions, but since I want to use it with py-faster-rcnn, I don't know how to do it.
@weishengchong @quietsmile @zimenglan-sysu-512 @xiaoxiongli Have you guys implemented the code to merge the BN layer into the Conv layer?
@sanghoon Could you release the code for merging BN layers into Conv layers and the scripts to generate the prototxt of the networks?
@weishengchong @sanghoon
@hengck23 Dear hengck23, I see your inference result; 15ms is really amazing! How did you get this result: using GIE, or your own modification?
@xiaoxiongli
@hengck23 Which website? Do you mean PVANET's Caffe branch? As far as I know, PVANET's Caffe does not implement the code for merging the BN/Scale layers into the Convolution layer.
@xiaoxiongli
@hengck23 Dear hengck23, I know that the merged models are provided, but when I carefully read the PVANET Caffe branch code, I find that the conv layer code has no modifications compared to the main Caffe branch. Do you mean the PVANET Caffe branch code has already merged the BN/Scale layers into the Convolution layer? I cannot find where that is. So if I want to reproduce your 15ms result, I need to implement the "merge BN/Scale layers into the Convolution layer" code myself, am I right?
@xiaoxiongli The conv layer implementation is the same, i.e. the same source code. To reproduce the 15ms result, just use the new caffemodel files: test.pt & test_690K.model
@hengck23 Dear hengck23, where can I find the new caffemodel files "test.pt & test_690K.model" that you mentioned above? Please help. And which "train.pt" file did you use to re-train? ^_^
@xiaoxiongli models/pvanet/lite/test.model These are the two models with which I obtained 15 ms and 35 ms respectively.
Thanks @sanghoon for sharing this framework with the community. I set up PVA-Net on my TX1 and ran the lite version successfully.
@sanghoon Dear sanghoon, you said that after merging the BN and Scale layers into the Conv layer, training iterations become 25% faster. How about the inference time?
@xiaoxiongli Batch normalization applies a linear transformation to its input in the evaluation phase, so it can be absorbed into the neighboring convolution layer by manipulating that layer's weights and biases.
@xiaoxiongli
@hengck23 Thanks for your kind help. I only find three params under the BatchNorm layer in original.pt, but the code you mentioned needs four params: scale, bias, mean, var. How can I solve this problem?
@kyehyeon Dear kyehyeon, in your reply you said: conv_bias = conv_bias * alpha + (scale_bias - (bn_mean / num_bn_samples) * alpha), and I wonder how you derived this formula. Starting from what the author writes in the Batch Normalization paper, what I get differs from what appears in your reply and in hengck23's code above. @sanghoon @hengck23 I do not know what's wrong. I feel so confused; please help, thank you very much ^_^
@kyehyeon @xiaoxiongli What kyehyeon says above is correct. Please modify the python code based on his comments. I deduce:
@hengck23 Dear hengck23, I agree with you ^_^, but what I care about is how we can deduce the formula conv_bias = conv_bias * alpha + (scale_bias - (bn_mean / num_bn_samples) * alpha), especially the first term.
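For what it's worth, the first term falls out of plain algebra: write the conv output of one channel as y = W.x + conv_bias, apply BN and then Scale, and distribute. Below is a numeric sanity check of the two forms; all values are arbitrary, and this is only a sketch of the derivation, not project code:

```python
import math

# One channel, arbitrary values.
y = 2.3                          # conv output: y = W.x + conv_bias
mu, var, eps = 0.7, 1.9, 1e-5    # mu = bn_mean / num_bn_samples, etc.
gamma, beta = 1.4, -0.2          # Scale layer weight and bias

# BN followed by Scale, applied literally:
z = gamma * (y - mu) / math.sqrt(var + eps) + beta

# Define alpha and distribute it over (y - mu):
alpha = gamma / math.sqrt(var + eps)
z_merged = alpha * y + (beta - mu * alpha)

assert math.isclose(z, z_merged)
```

Substituting y = W.x + conv_bias into alpha * y gives (alpha * W).x + conv_bias * alpha, which is where the first term comes from; the remaining (beta - mu * alpha) is kyehyeon's (scale_bias - (bn_mean / num_bn_samples) * alpha).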
Hi @hengck23 @xiaoxiongli, the average mean is computed as the accumulated mean divided by the normalization factor.
In the code below: I know this code needs some modifications, and I can get the scale and bias from Caffe's Scale layer. But from Caffe's BatchNorm layer I get three parameters: mean, var, and moving_average_fraction. My question is: how should moving_average_fraction be used when merging the BN/Scale layers into the Conv layer (i.e. absorbing the BN parameters)? Should I just ignore this parameter?
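For reference, Caffe's BatchNorm layer stores three blobs: the accumulated mean, the accumulated variance, and a single scale (normalization) factor. moving_average_fraction only controls how those accumulators decay during training, so when merging you can ignore it and just divide the first two blobs by the third, roughly like this sketch (blob layout assumed to follow Caffe's batch_norm_layer):

```python
def bn_inference_stats(mean_blob, var_blob, scale_blob):
    """Recover inference-time mean/var from Caffe BatchNorm blobs.

    scale_blob holds one accumulated normalization factor; the stored
    statistics are divided by it (guarding against zero, as Caffe does).
    """
    factor = 0.0 if scale_blob[0] == 0 else 1.0 / scale_blob[0]
    mean = [m * factor for m in mean_blob]
    var = [v * factor for v in var_blob]
    return mean, var
```

The resulting mean and var are what go into the alpha formula discussed above.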
Hi @hengck23 @xiaoxiongli @swearos, I've committed a simple script to merge 'Conv-BN-Scale' layers into a single Conv layer. Please note that it seems to work correctly; however, I haven't tested it thoroughly.
@xiaoxiongli Have you tested the performance before and after that?
@zimenglan-sysu-512 mAP is the same; before: 110 ms, after: 93 ms.
@sanghoon
@xiaoxiongli
@sanghoon @hengck23 @xiaoxiongli @kyehyeon By the way, I also tried to use the parameters extracted via net.params["layer name"] in 00_classification (in the Caffe examples) to imitate the forward pass of the BatchNorm layer: (conv_out - bn_mean / num_bn_samples) / sqrt(bn_variance / num_bn_samples + eps)
Hi @sanghoon, I used your script to merge the model, but I find that the output of the merged one does not match the original one. Hi @maxenceliu, can you share your script? Thanks.
Hi @sanghoon,
Can anyone provide some complete example code for TensorFlow on how to merge Conv and BN?
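I don't have complete TensorFlow code, but the fold itself is framework-independent. Below is a NumPy sketch using TF-style conventions: a channels-last kernel of shape (kh, kw, in, out), and the gamma/beta/moving_mean/moving_variance arrays you would read off a BatchNormalization layer (e.g. via get_weights()). The names and the eps default are assumptions; treat it as a sketch, not drop-in code:

```python
import numpy as np

def fold_bn(kernel, bias, gamma, beta, moving_mean, moving_variance, eps=1e-3):
    """Fold BN variables into a channels-last conv kernel and bias.

    kernel: (kh, kw, in_ch, out_ch); all BN vectors have length out_ch.
    """
    alpha = gamma / np.sqrt(moving_variance + eps)
    # alpha broadcasts over the trailing out_ch axis of the kernel.
    return kernel * alpha, bias * alpha + (beta - moving_mean * alpha)
```

Load the folded kernel/bias back into a plain conv layer and drop the BN layer from the graph; the outputs should match the unfolded conv + BN at inference time.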
Assuming the PVANET paper's Table 2 reported FPS for conv layers merged with BN and scaling/shifting/ReLU, I'd be glad if the community could share FPS or speed-up ratios for PVANET with and without the above merging. Thank you very much.