
Batch Normalization layer gives significant difference between train and validation loss on the exact same data #7265

Closed
ghost opened this issue Jul 7, 2017 · 4 comments


ghost commented Jul 7, 2017

Hi,
I have a pretty simple gist reproducing the problem: https://gist.github.com/izikgo/2579b8c26231d5c9a5a2c7d313860d33

In short, I get VERY different results between training and validation in a CNN with BN layers, even when I disable scale and center, on the exact same data. The data is a single batch of 128 examples from the MNIST dataset.
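Roughly, the setup is along these lines (a minimal sketch, not the exact gist; tf.keras syntax and layer sizes are illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

# One batch of 128 MNIST images, normalized to [0, 1].
(x, y), _ = keras.datasets.mnist.load_data()
x = x[:128].astype("float32")[..., None] / 255.0
y = y[:128]

model = keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.BatchNormalization(center=False, scale=False),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Train and evaluate on the exact same 128 examples.
hist = model.fit(x, y, batch_size=128, epochs=20, verbose=0)
print("final training loss:", hist.history["loss"][-1])
print("evaluate() on the same batch:", model.evaluate(x, y, verbose=0))
```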

Does anyone know whether this is expected behavior? I know that BN acts differently during training and inference, but the difference looks too big to me.

Thanks,
Izik


srxdev0619 commented Jul 7, 2017

I am facing a similar issue: the distribution of activations of the same Conv layer is very different during training and inference on the same data.

[Screenshot: distribution of activations during inference]

[Screenshot: distribution of activations during training]

The value of $\gamma$ is very close to 1 and the value of $\beta$ is very close to 0 for this particular layer.
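For reference, BN applies the same affine transform in both modes; only the statistics $\mu$ and $\sigma^2$ differ (this matches the Ioffe & Szegedy formulation):

$$ y = \gamma \cdot \frac{x - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta $$

During training, $\mu$ and $\sigma^2$ are the current batch's mean and variance; at inference they are the moving averages. So with $\gamma \approx 1$ and $\beta \approx 0$, any train/inference gap must come from the statistics themselves.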


ghost commented Jul 10, 2017

After reading the code I understand why I'm getting these results. During training, two moving averages are updated from each batch: the mean and the variance. These values are meant to approximate the population statistics. They are initialized to zero and one respectively; at each step the running value is multiplied by the momentum (default 0.99) and the current batch statistic is added with weight 0.01. At inference (test) time, the normalization uses these moving statistics instead of the batch statistics. As a result, it takes the moving averages a while to arrive at the "real" mean and variance of the data. If I lower the momentum for my specific example, the results make much more sense.
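A quick way to see the lag (a minimal standalone sketch, independent of Keras; it assumes every batch happens to have the same mean):

```python
true_mean = 5.0      # suppose every batch has this mean
momentum = 0.99      # Keras default
moving_mean = 0.0    # the moving mean starts at zero

for step in range(100):
    moving_mean = momentum * moving_mean + (1 - momentum) * true_mean

# Closed form after k updates: moving_mean = (1 - momentum**k) * true_mean
print(moving_mean)                    # ~3.17, still far from 5.0
print((1 - 0.99 ** 100) * true_mean)  # same value, in closed form
```

With momentum=0.9 instead, 0.9**100 is about 3e-5, so the moving mean has essentially converged after the same 100 updates. That is why lowering the momentum closes the gap when training on a tiny dataset.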

ghost closed this as completed on Jul 10, 2017

ysyyork commented Jul 26, 2017

Hi @izikgo, would you mind sharing what value you set for the momentum? I also came across this issue.

weiguanwang commented

@izikgo Thank you so much for your hint! I reduced the momentum and it solved the problem! I guess I need to read the paper to understand what the momentum means.
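Concretely, the momentum is an argument of the layer constructor (a sketch assuming the tf.keras API; the value 0.9 is illustrative and should be tuned to how many batches you train on):

```python
from tensorflow.keras import layers

# Lower momentum -> moving statistics track the batch statistics faster.
bn = layers.BatchNormalization(momentum=0.9)
```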
