speed improvement by merging batch normalization and scale #5

Closed
VincentChong123 opened this issue Sep 28, 2016 · 50 comments

@VincentChong123

Assuming the PVANET paper's Table 2 reports FPS for conv layers merged with BN, scaling/shifting/ReLU, I'd be glad if the community could share the FPS or speed-up ratio for PVANET with and without this merging. Thank you very much.

@sanghoon
Owner

Hi @weishengchong,

I'm afraid that I can't answer the exact difference between those two options right now.

Regarding training time: when I fine-tune a network after eliminating the BN and Scale layers (by merging them where possible), the iterations become 25% faster.

IMO, the impact of removing BN layers won't be that significant during test time.

@VincentChong123
Author

VincentChong123 commented Oct 10, 2016

Hi @sanghoon,

Thanks for sharing.

For our googlenet_bn, we are trying to merge BN into the conv weights and biases, then share blob memory between conv layers and merge CBR layers.

@quietsmile

Very useful information! Thanks for the discussion.
@weishengchong What is a CBR layer?

@VincentChong123
Author

Hi @quietsmile, for merging CBR (convolution, bias, ReLU) layers, refer to NVIDIA GIE figures 3-5.

@zimenglan-sysu-512

Hi @weishengchong, can you share some information about how to use the optimized model after GIE, especially for a detection task? Thanks.

@zimenglan-sysu-512

@sanghoon Can you share some experience on how to merge the BN and Scale layers into the Conv layer?

@xiaoxiongli

@sanghoon Dear sanghoon, I have the same confusion about how to merge the BN and Scale layers into the Conv layer. I read your Caffe code changes and found that your Conv layer code has no modifications compared to the main Caffe branch.

@kyehyeon

@zimenglan-sysu-512 @xiaoxiongli

It's just simple math.
Given the parameters for each layer as:

conv layer: conv_weight, conv_bias
bn layer: bn_mean, bn_variance, num_bn_samples
scale layer: scale_weight, scale_bias

Let us define a vector 'alpha' of scale factors for conv filters:

alpha = scale_weight / sqrt(bn_variance / num_bn_samples + eps)

If we set conv_bias and conv_weight as:

  1. conv_bias = conv_bias * alpha + (scale_bias - (bn_mean / num_bn_samples) * alpha)
  2. for i in range(len(alpha)):
    conv_weight[i] = conv_weight[i] * alpha[i]

Then we get the same result as the original network, as if the BN and Scale parameters were set to:

bn_mean[...] = 0
bn_variance[...] = 1
num_bn_samples = 1

scale_weight[...] = 1
scale_bias[...] = 0

Thus we can simply remove the BN and Scale layers.

The code is not open-sourced, but you can easily implement a script to do this.
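For example, a minimal NumPy sketch of the arithmetic above (names follow this comment; the eps value and the exact blob layout depend on your Caffe version):

    import numpy as np

    def fold_bn_scale_into_conv(conv_weight, conv_bias,
                                bn_mean, bn_variance, num_bn_samples,
                                scale_weight, scale_bias, eps=1e-5):
        # conv_weight: (out_ch, in_ch, kh, kw); all other arrays: (out_ch,)
        alpha = scale_weight / np.sqrt(bn_variance / num_bn_samples + eps)
        new_bias = conv_bias * alpha + (scale_bias - (bn_mean / num_bn_samples) * alpha)
        new_weight = conv_weight * alpha[:, None, None, None]
        return new_weight, new_bias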

@xiaoxiongli

xiaoxiongli commented Oct 12, 2016

@kyehyeon thank you very much! I got it...

Now in the GitHub repo:
For example_train_384\train.prototxt --> the original prototxt, so I can train without modifying the conv layers.
For example_finetune\train.prototxt --> BN and Scale layers merged into the Conv layers, so I can NOT train unless I modify Caffe's Conv layer according to what you said above.

Right? ^_^

@VincentChong123
Author

@zimenglan-sysu-512

Do you mean how to use gie for detection task?


@VincentChong123
Author

Hi @sanghoon,

After merging for the 25% speed-up, is there any side effect on training accuracy?


@zimenglan-sysu-512

@weishengchong Yes, I have tried, but failed.

@VincentChong123
Author

Hi @zimenglan-sysu-512,

I haven't tried it. What was your GIE error message?

FYI, at GTC Taipei last month, the page below was introduced to TensorRT (= GIE) users:
https://github.com/dusty-nv/jetson-inference

Have you tried reproducing the tutorial results?

@sanghoon
Owner

sanghoon commented Oct 12, 2016

@xiaoxiongli
You can still fine-tune the network; you just can't batch-normalize the training data.
However, in the Faster R-CNN training we haven't used batch normalization at all, so the result will be almost the same.

@weishengchong
I haven't compared the two cases, but I guess there will be no harmful effect.
On the contrary, merging the scale/bias layer may even improve the resulting accuracy.
It's something I'm planning to try.

@zimenglan-sysu-512

@weishengchong Yes, I have followed those instructions, but since I want to use it with py-faster-rcnn, I don't know how to do it.

@baiyancheng20

@weishengchong @quietsmile @zimenglan-sysu-512 @xiaoxiongli Have you guys implemented the code to merge the BN layer into the Conv layer?

@baiyancheng20

@sanghoon Could you release the code for merging BN layers into Conv layers and the scripts to generate the prototxt of networks?

@hengck23

hengck23 commented Oct 23, 2016

@weishengchong @sanghoon
Here are my comparison results (Python, Windows, GTX 1080):

[screenshot: speed comparison table]

@xiaoxiongli

@hengck23 Dear hengck23, I see your inference result -- 15 ms is really amazing! How did you get this result: using GIE or your own modifications?

@hengck23

@xiaoxiongli
I use the Caffe code from this repository, with CUDA 8 / cuDNN 5.1.
I haven't used GIE/TensorRT yet (but I am working on TensorRT this week).

@xiaoxiongli

@hengck23 Which repository? Do you mean PVANET's Caffe branch? As far as I know, PVANET's Caffe does not implement the code for merging the BN/Scale layers into the Convolution layer.

@hengck23

@xiaoxiongli
There is no code for merging, but both the original and merged models are provided.

@hengck23

hengck23 commented Oct 26, 2016

@xiaoxiongli
As a reference, I also provide the ZF-Net speed. I retrained ZF-Net from Ross's faster-rcnn using pva-faster-rcnn here.
[screenshot: ZF-Net speed comparison]

@xiaoxiongli

xiaoxiongli commented Oct 26, 2016

@hengck23 Dear hengck23, I know that the merged models are provided, but when I carefully read the PVANET Caffe branch code, I find that the Conv layer code has no modifications compared to the main Caffe branch. Do you mean the PVANET Caffe branch code already merges the BN/Scale layers into the Convolution layer? I cannot find where it is...

So if I want to reproduce your 15 ms result, I need to implement the "merge BN/Scale layers into the Convolution layer" code by myself, am I right?

@hengck23

hengck23 commented Oct 26, 2016

@xiaoxiongli
Original prototxt: conv --> bn --> relu --> ...
After merging, the test prototxt is: conv --> relu --> ...

The conv layer implementation is the same, i.e. the same source code.
But the parameter values change, i.e. the caffemodel file changes.

To reproduce the 15 ms result, just use the new model files: test.pt & test_690K.model
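A rough pycaffe timing sketch (file names as above; assumes a GPU build and that the repo's Python layers are importable):

    import time
    import caffe

    caffe.set_mode_gpu()
    net = caffe.Net('test.pt', 'test_690K.model', caffe.TEST)

    net.forward()  # warm-up; the first pass includes initialization
    start = time.time()
    for _ in range(10):
        net.forward()
    print('forward pass: %.1f ms' % ((time.time() - start) / 10 * 1000))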

@xiaoxiongli

xiaoxiongli commented Oct 26, 2016

@hengck23 Dear hengck23, where can I find the new model files "test.pt & test_690K.model" that you mentioned above? Please help...

And which "train.pt" file did you use to retrain? ^_^

@hengck23

@xiaoxiongli
Please refer to:
https://github.com/sanghoon/pva-faster-rcnn/blob/master/models/pvanet/download_lite_models.sh
https://github.com/sanghoon/pva-faster-rcnn/tree/master/models/pvanet/lite

models/pvanet/lite/test.model
models/pvanet/lite/original.model
test.pt
original.pt

These are the two models from which I obtained 15 ms and 35 ms, respectively.

For ZF-Net, you have to modify the original one.
[attached image: zfnet_1]

train.prototxt.txt

@karthikmswamy

Thanks @sanghoon for sharing this framework with the community.

I set up PVA-Net on my TX1 and ran the lite version successfully.
Out of the box, the net forward pass takes 243 ms. However, when you run TX1 at max performance, the net forward pass takes 184 ms. You get a ~60 ms speed improvement just by running your TX1 at max performance.

@xiaoxiongli

@sanghoon Dear sanghoon, you said that after merging the BN and Scale layers into the Conv layer, the training iterations become 25% faster. What about the inference time?

@xiaoxiongli

@hengck23 @sanghoon

using full/test.model: Mean AP = 0.8385, 92 ms on a K40
using full/original.model: Mean AP = 0.8385 (same as above), 110 ms on a K40

@hengck23

hengck23 commented Oct 27, 2016

@xiaoxiongli
https://github.com/e-lab/torch-toolbox/blob/master/BN-absorber/BN-absorber.lua

Batch normalization applies a linear transformation to its input in the evaluation phase, so it can be absorbed into the preceding convolution layer by manipulating that layer's weights and biases.

@hengck23

hengck23 commented Oct 27, 2016

@xiaoxiongli
https://github.com/terrychenism/NeuralNetTests/blob/master/caffe_utils/gen_bn_inference_v2.py
https://github.com/terrychenism/NeuralNetTests/blob/master/caffe_utils/gen_bn_inference.py

    # Excerpt: absorb the BN parameters into the preceding convolution layers.
    # Assumes `model` is the parsed NetParameter of the deploy prototxt,
    # `to_be_absorbed` is the set of BN layer names to fold, and
    # `args.model` / `args.weights` point to the prototxt / caffemodel.
    # NOTE: this unpacking assumes the BN layer stores four blobs
    # (scale, bias, mean, var); standard Caffe BatchNorm stores three
    # (mean, var, factor) -- see the discussion below.
    import numpy as np
    import caffe

    weights = caffe.Net(args.model, args.weights, caffe.TEST)
    for i, layer in enumerate(model.layer):
        if layer.name not in to_be_absorbed:
            continue
        scale, bias, mean, var = [p.data.ravel() for p in weights.params[layer.name]]

        eps = 1e-5
        invstd = 1. / np.sqrt(var + eps)
        invstd = invstd * scale

        # Walk backwards to find the layer that produces this BN layer's input
        for j in xrange(i - 1, -1, -1):
            bottom_layer = model.layer[j]
            if layer.bottom[0] in bottom_layer.top:
                W, b = weights.params[bottom_layer.name]
                num = W.data.shape[0]
                if bottom_layer.type == 'Convolution':
                    W.data[...] = W.data * invstd.reshape(num, 1, 1, 1)
                    b.data[...] = (b.data[...] - mean) * invstd + bias

@swearos

swearos commented Oct 30, 2016

@hengck23, thanks for your kind help. I only find three params under the BatchNorm layer in original.pt, but the code you mentioned needs four params ("scale, bias, mean, var"). How can I solve this problem?

    layer {
      name: "conv1/bn"
      type: "BatchNorm"
      bottom: "conv1"
      top: "conv1"
      param { lr_mult: 0 decay_mult: 0 }  # scale
      param { lr_mult: 0 decay_mult: 0 }  # shift/bias
      param { lr_mult: 0 decay_mult: 0 }  # global mean
      # global var???
      batch_norm_param { use_global_stats: true }
    }

@xiaoxiongli

xiaoxiongli commented Oct 30, 2016

@kyehyeon Dear kyehyeon, in your reply you said that:

conv_bias = conv_bias * alpha + (scale_bias - (bn_mean / num_bn_samples) * alpha)

and I wonder how you derived the above formula. In the batch normalization paper, the authors say:

[screenshot of the batch-normalization equations from the paper]

So what I get is:
conv_bias = scale_bias - (bn_mean / num_bn_samples) * alpha. How can we get the first term (conv_bias * alpha)?

But in your reply and in hengck23's code above: @sanghoon @hengck23

        if bottom_layer.type == 'Convolution':
            W.data[...] = W.data * invstd.reshape(num, 1, 1, 1)
            b.data[...] = (b.data[...] - mean) * invstd + bias

I don't know what's wrong... I am so confused. Please help, thank you very much ^_^

@hengck23

hengck23 commented Oct 30, 2016

@kyehyeon @xiaoxiongli
Note that:

  • Caffe may not implement the paper directly (I haven't checked in detail yet).
  • Be careful of the wording used in different code bases: "scale" in one code base may not be the same thing as in another code base or in the paper.
  • Different versions of Caffe implement the BN layer differently (be careful to read the correct documentation). If I am not wrong, in current PVANET:
      • the paper uses "mean, var, gamma, beta"
      • the Caffe BN layer uses "mean, var, scale"
      • the Caffe Scale layer uses "multiplier, offset"
      • the combination of the Caffe BN layer + Scale layer implements the paper's batch normalization

What kyehyeon says above is correct: the convolution output is conv_weight * x + conv_bias, so when the folded BN/Scale transform scales that whole output by alpha, the existing conv_bias is scaled as well; that is where the conv_bias * alpha term comes from. Please modify the Python code based on his comments.
If you look at "batch_norm_layer.cpp":

    const Dtype scale_factor = this->blobs_[2]->cpu_data()[0] == 0 ?
        0 : 1 / this->blobs_[2]->cpu_data()[0];
    caffe_cpu_scale(variance_.count(), scale_factor,
        this->blobs_[0]->cpu_data(), mean_.mutable_cpu_data());
    caffe_cpu_scale(variance_.count(), scale_factor,
        this->blobs_[1]->cpu_data(), variance_.mutable_cpu_data());

I deduce:

    layer {
      name: "conv1/bn"
      type: "BatchNorm"
      bottom: "conv1"
      top: "conv1"
      param { lr_mult: 0 decay_mult: 0 }  # mean
      param { lr_mult: 0 decay_mult: 0 }  # var
      param { lr_mult: 0 decay_mult: 0 }  # scale factor
      batch_norm_param { use_global_stats: true }
    }

@xiaoxiongli

@hengck23 Dear hengck23, I agree with you ^_^, but what I care about is how to derive the formula conv_bias = conv_bias * alpha + (scale_bias - (bn_mean / num_bn_samples) * alpha), especially the first term.

@sanghoon sanghoon self-assigned this Oct 31, 2016
@sanghoon
Owner

sanghoon commented Oct 31, 2016

Hi @hengck23 @xiaoxiongli,
The params in the BatchNorm layer contain the following data, respectively:

  • mean
  • variance
  • normalization factor (for the moving average)

The average mean is computed as (mean) over (normalization factor), and likewise for the variance.
I'm working on a short script for merging BN layers.
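In the meantime, a minimal pycaffe sketch of reading those blobs (the layer and file names are just examples):

    import caffe

    net = caffe.Net('original.pt', 'original.model', caffe.TEST)

    mean_blob, var_blob, factor_blob = net.params['conv1/bn']
    factor = factor_blob.data[0]
    scale = 0.0 if factor == 0 else 1.0 / factor
    running_mean = mean_blob.data * scale  # average mean
    running_var = var_blob.data * scale    # average variance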

@xiaoxiongli

xiaoxiongli commented Nov 1, 2016

Dear @hengck23 @sanghoon,

In the code below:
https://github.com/terrychenism/NeuralNetTests/blob/master/caffe_utils/gen_bn_inference_v2.py

    scale, bias, mean, var = [p.data.ravel() for p in weights.params[layer.name]]

I know this code needs some modifications, and I can get the scale and bias from Caffe's Scale layer.

But from Caffe's BatchNorm layer I can get 3 parameters: mean, var, and the moving_average_fraction. My question is: how do I use the moving_average_fraction parameter when merging the BN/Scale layers into the Conv layer (i.e. when absorbing the BN parameters)? Or do I just ignore it?

@sanghoon
Owner

sanghoon commented Nov 2, 2016

Hi @hengck23 @xiaoxiongli @swearos

I've committed a simple script to merge 'Conv-BN-Scale' layers into a single Conv layer.
Please check out 39570aa.

Please note that it seems to work correctly; however, I haven't tested it thoroughly.
I'd appreciate it if you could give your feedback on it.

@xiaoxiongli

@sanghoon @hengck23 @swearos Dear sanghoon,
It is very kind of you; your script seems to work fine. Thank you!

I tested the models/pvanet/full model on a K40 GPU:

before your script: 110 ms

after your script:
without cuDNN: 93 ms
with cuDNN: 91 ms (1x1 convolution layers use the Caffe engine)

Thank you! ^_^

@zimenglan-sysu-512

@xiaoxiongli have you tested the performance before and after that?

@xiaoxiongli

@zimenglan-sysu-512 The mAP is the same; before: 110 ms, after: 93 ms.

@hengck23

hengck23 commented Nov 9, 2016

@sanghoon
Thank you very much!

@maxenceliu

@xiaoxiongli
Hi, I also implemented my own conv+BN+scale merge code. The inference speed does increase, but not as significantly as yours -- about 16% faster than before, 63 ms -> 53 ms. The network looks like Google Inception v2.

@shuyu0815

@sanghoon @hengck23 @xiaoxiongli @kyehyeon
Hi, I applied the conv+BN+scale merge code to my own model. The inference speed does increase, but the output has a significant shift!
Can anyone give me some suggestions? Thanks!

By the way, I also tried to use the parameters extracted via net.params["layer name"] in 00_classification (in the Caffe examples) to imitate the forward pass of the BatchNorm layer.
I used net.params["layer name"] to extract the bn_mean, bn_variance and num_bn_samples (the normalization factor) of the BatchNorm layer and used the following formula to get the output, but the result is different from the output extracted via net.blobs["conv"] (the blob after the BatchNorm layer):

(conv_out - bn_mean / num_bn_samples) / sqrt(bn_variance / num_bn_samples)
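For reference, a minimal NumPy version of that computation (note that Caffe also adds an eps term under the square root, and the following Scale layer applies its own weight and bias on top of this):

    import numpy as np

    def bn_eval(conv_out, bn_mean, bn_variance, num_bn_samples, eps=1e-5):
        # conv_out: (channels, height, width); the BN statistics are per channel
        mean = bn_mean / num_bn_samples
        var = bn_variance / num_bn_samples
        return (conv_out - mean[:, None, None]) / np.sqrt(var[:, None, None] + eps)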

@zimenglan-sysu-512

Hi @sanghoon, I used your script to merge the model, but I find that the output of the merged model does not match the original one.

Hi @maxenceliu, can you share your script?

Thanks.

@pasxalinamed

After running ./tools/gen_merged_model.py, it executed correctly, but the output model produces detections that make no sense! What went wrong? Before that, the detections were fine.
[screenshots of the incorrect detections]

@zimenglan-sysu-512

Hi @sanghoon,
I find that if I change np.finfo(np.double).eps to 1e-5, which is the default eps value in the BN layer, I get the right results.
Thanks.

@PapaMadeleine2022

Can anyone provide some complete example code for TensorFlow showing how to merge a conv layer and a BN layer into one conv layer?

@PSlearner

> Can anyone provide some complete example code for TensorFlow showing how to merge a conv layer and a BN layer into one conv layer?

Do you have some example code for TensorFlow for merging Conv and BN?
