
Batch Normalization Problem #1446

Closed
KlaymenGC opened this issue Jan 11, 2016 · 10 comments

@KlaymenGC

I'm building an end-to-end image segmentation network whose output has the same size as the input.

When I add a BatchNormalization layer to the model, the prediction for every image in the train/test set is strangely always the same; when I remove the BN layer, everything works normally.

I've looked into the code of BatchNormalization many times but I still can't figure out what's wrong. Any ideas?

@keunwoochoi
Contributor

I think I'm in a very similar situation. I'm building a convnet for a regression task on audio (spectrograms), and within 10-20 minutes of training the loss converges to a certain value regardless of the optimiser or other settings. I'll check what happens if I remove BN.

@lemuriandezapada

Check your layer activations; maybe you're just saturating your neurons, and once you get there, learning stops.
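
For example, one rough way to spot that (a sketch only, assuming an already-built `model` and a batch `X_batch`; the layer index is hypothetical and the exact accessor names vary across Keras versions):

```python
from keras import backend as K

# Build a backend function that returns the output of an intermediate layer
# (layer 3 here is just an example index, e.g. right after a ReLU).
get_activations = K.function([model.layers[0].input],
                             [model.layers[3].output])

acts = get_activations([X_batch])[0]
# If almost everything is pinned at 0 (dead ReLUs) or at the same value,
# the layer is saturated and learning will stall.
print(acts.min(), acts.mean(), acts.max())
```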

@KlaymenGC
Author

@lemuriandezapada thank you for the reply, could you elaborate? The basic structure looks like this:

Convolutional Layer
BatchNormalization
ReLU
...
Deconvolutional Layer
BatchNormalization
ELU
...
Softmax

and the loss keeps decreasing. I've visualized the layers; it seems that after certain deconvolutional layers the activations become very noisy and look like a mess...
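
Roughly, that block ordering corresponds to something like this (a sketch only, in Keras-1.x-era style; the filter counts, input shape, and arguments are illustrative, and the deconvolution step is approximated with UpSampling2D + Convolution2D since Keras doesn't ship a dedicated deconvolution layer at the moment):

```python
from keras.models import Sequential
from keras.layers import Convolution2D, UpSampling2D, Activation
from keras.layers.advanced_activations import ELU
from keras.layers.normalization import BatchNormalization

model = Sequential()

# Convolutional layer -> BatchNormalization -> ReLU
model.add(Convolution2D(64, 3, 3, border_mode='same', input_shape=(3, 256, 256)))
model.add(BatchNormalization(axis=1))   # normalize over the channel axis
model.add(Activation('relu'))
# ... more conv blocks ...

# "Deconvolutional" layer -> BatchNormalization -> ELU
model.add(UpSampling2D(size=(2, 2)))
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(BatchNormalization(axis=1))
model.add(ELU())
# ... up to a final per-pixel softmax ...
```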

@lemuriandezapada

It already sounds rather weird to me to have the BatchNorm before the activation function. You're basically forcing half your neurons to be active at all times. Maybe that's not the best representation? I haven't worked much with it, but I wouldn't put it before anything that doesn't have weights.
It would make much more sense to put it after the activations.
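
That alternative ordering would look something like this (same illustrative sketch style as above):

```python
from keras.models import Sequential
from keras.layers import Convolution2D, Activation
from keras.layers.normalization import BatchNormalization

model = Sequential()
model.add(Convolution2D(64, 3, 3, border_mode='same', input_shape=(3, 256, 256)))
model.add(Activation('relu'))            # nonlinearity first...
model.add(BatchNormalization(axis=1))    # ...then normalize the activations
```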

@KlaymenGC
Author

@lemuriandezapada I was doing that because I saw it in a paper on a similar image segmentation task. Before that, I was putting the BatchNormalization layer after the ReLU, but the problem was still there.

@keunwoochoi
Contributor

I was also confused about where to put BN. I think we should put BN after the activation for Dense layers and before the activation for Convolution2D layers. At least that seems to be what they're saying.

@KlaymenGC
Author

After reading the paper again, I think putting the BN layer before the nonlinearity makes sense, and that's what they did in the paper (see Sec. 3.2):

This formulation covers both fully-connected and convolutional layers. We add the BN transform immediately before the nonlinearity, by normalizing x = Wu + b

Anyway, the BN in Keras doesn't work in my case; I'll try implementing it myself.
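
In code, the quoted formulation amounts to normalizing x = Wu + b and applying the nonlinearity to the result, e.g. for a fully-connected layer (a sketch; the paper also notes that the layer bias b becomes redundant, since BN learns its own shift beta):

```python
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers.normalization import BatchNormalization

model = Sequential()
model.add(Dense(256, input_dim=128))  # x = Wu + b
model.add(BatchNormalization())       # BN(x), immediately before the nonlinearity
model.add(Activation('relu'))         # g(BN(x))
```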

@mdering
Contributor

mdering commented Jan 19, 2016

Is there a deconvolutional layer in Keras that I'm missing?

@KlaymenGC
Author

@keunwoochoi I think @fchollet fixed the problem a few days ago; now BN works without a problem :) 👍

@rvenugop

Hey, can you let us know how you got BN to work?
