add Batch Normalization immediately before non-linearity or after in Keras? #5465

Closed
xiaoming-qxm opened this issue Feb 21, 2017 · 3 comments

xiaoming-qxm commented Feb 21, 2017

from keras import backend as K
from keras.layers import Convolution2D, BatchNormalization, Activation

def conv2d_bn(x, nb_filter, nb_row, nb_col,
              border_mode='same', subsample=(1, 1),
              name=None):
    '''Utility function to apply conv + BN (here BN comes after the ReLU).
    '''
    if name is not None:
        conv_name = name + '_conv'
        bn_name = name + '_bn'
    else:
        conv_name = None
        bn_name = None
    # channel axis: 1 for Theano ('th') dim ordering, 3 for TensorFlow ('tf')
    bn_axis = 1 if K.image_dim_ordering() == 'th' else 3
    x = Convolution2D(nb_filter, nb_row, nb_col,
                      subsample=subsample,
                      activation='relu',
                      border_mode=border_mode,
                      name=conv_name)(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name)(x)
    return x

When I use the official inception_v3 model in Keras, I find that it applies BatchNormalization after the 'relu' nonlinearity, as in the code above.

But in the Batch Normalization paper, the authors said:

we add the BN transform immediately before the nonlinearity, by normalizing x = Wu + b.
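
For reference, the BN transform in the paper normalizes each activation over the mini-batch and then applies a learned scale and shift:

x_hat = (x - mean_B) / sqrt(var_B + epsilon)
y = gamma * x_hat + beta

so placing it immediately before the ReLU means the nonlinearity sees the normalized, re-scaled pre-activation x = Wu + b.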

Then I looked at the Inception implementation in TensorFlow, which adds BN immediately before the nonlinearity, as the paper describes. For more details, see inception ops.py.

I'm confused. Why do people use the first style in Keras rather than the following?

def conv2d_bn(x, nb_filter, nb_row, nb_col,
              border_mode='same', subsample=(1, 1),
              name=None):
    '''Utility function to apply conv + BN + ReLU (BN before the nonlinearity).
    '''
    # conv_name, bn_name and bn_axis derived exactly as in the snippet above
    x = Convolution2D(nb_filter, nb_row, nb_col,
                      subsample=subsample,
                      border_mode=border_mode,
                      name=conv_name)(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name)(x)
    x = Activation('relu')(x)
    return x

In the Dense case:

x = Dense(1024, name='fc')(x)
x = BatchNormalization(axis=bn_axis, name=bn_name)(x)
x = Activation('relu')(x)
achalshah20 (Contributor) commented
Ideally, BN should be applied before the nonlinearity, but I have seen some papers where BN performed better when applied after the ReLU.

Check this: https://github.com/ducha-aiki/caffenet-benchmark/blob/master/batchnorm.md
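
A minimal way to compare the two placements empirically is to make the ordering a switch (a hedged sketch; the conv2d_bn_flexible name and the bn_before_activation flag are just for illustration, not from the Keras source):

from keras import backend as K
from keras.layers import Convolution2D, BatchNormalization, Activation

def conv2d_bn_flexible(x, nb_filter, nb_row, nb_col,
                       border_mode='same', subsample=(1, 1),
                       bn_before_activation=True):
    '''Conv block that can place BN either before or after the ReLU.'''
    # channel axis: 1 for Theano ('th') dim ordering, 3 for TensorFlow ('tf')
    bn_axis = 1 if K.image_dim_ordering() == 'th' else 3
    x = Convolution2D(nb_filter, nb_row, nb_col,
                      subsample=subsample,
                      border_mode=border_mode)(x)
    if bn_before_activation:
        x = BatchNormalization(axis=bn_axis)(x)
        x = Activation('relu')(x)
    else:
        x = Activation('relu')(x)
        x = BatchNormalization(axis=bn_axis)(x)
    return x

Training the same model twice with the flag flipped is an easy way to reproduce the kind of comparison the benchmark above does.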

stale bot commented May 23, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.

stale bot closed this as completed Jun 22, 2017
Mickky666 commented

I'm reading the code of DCGAN, and it seems that they use BN between the convolutional layer and the LeakyReLU. However, Jeremy Howard (http://forums.fast.ai/t/questions-about-batch-normalization/230/2) claims that we should apply it after the nonlinearity.
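
For reference, a minimal sketch of the two orderings being discussed, written with the same Keras 1.x API as the snippets above (the helper names are made up for illustration; this is not the DCGAN authors' code):

from keras.layers import Convolution2D, BatchNormalization
from keras.layers.advanced_activations import LeakyReLU

def conv_bn_lrelu(x, nb_filter, nb_row, nb_col, bn_axis=1):
    '''DCGAN-style block: BN between the convolution and the LeakyReLU.'''
    x = Convolution2D(nb_filter, nb_row, nb_col, border_mode='same')(x)
    x = BatchNormalization(axis=bn_axis)(x)
    x = LeakyReLU(0.2)(x)
    return x

def conv_lrelu_bn(x, nb_filter, nb_row, nb_col, bn_axis=1):
    '''Alternative ordering: BN applied after the nonlinearity.'''
    x = Convolution2D(nb_filter, nb_row, nb_col, border_mode='same')(x)
    x = LeakyReLU(0.2)(x)
    x = BatchNormalization(axis=bn_axis)(x)
    return x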
