Batch Normalization layer gives significant difference between train and validation loss on the exact same data #7265

ghost · 2017-07-07T08:45:43Z

Hi,
I have a pretty simple gist reproducing the problem: https://gist.github.com/izikgo/2579b8c26231d5c9a5a2c7d313860d33

In short, I get VERY different results between train and validation in a CNN with BN layers, even when I disable scale and center, on the exact same data. The data is only a single batch of size 128 from the MNIST dataset.

Does anyone know whether this is expected behavior? I know that BN acts differently in training and inference, but the difference looks too big to me.

Thanks,
Izik


srxdev0619 · 2017-07-07T09:25:42Z

I am facing a similar issue: the distribution of activations of the same Conv layer is very different between training and inference on the same data.

This is the distribution of activations during inference

This is the distribution of the activations during training.

The value of $\gamma$ is very close to 1 and the value of $\beta$ is very close to 0 for this particular layer.

ghost · 2017-07-10T07:28:41Z

After reading the code I understand why I'm getting these results. At training time there are two moving averages that are updated based on each batch: the mean and the variance. These values are supposed to approximate the population statistics. They are initialized to zero and one respectively, and then at each step the moving value is multiplied by the momentum (default is 0.99) and the new batch value times (1 - momentum), i.e. 0.01, is added to it. At inference (test) time, the normalization uses these moving statistics instead of the batch statistics. For this reason, it takes these values a while to arrive at the "real" mean and variance of the data. If I lower the momentum for my specific example, the results make much more sense...
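To make the warm-up effect concrete, here is a minimal sketch of that update rule in plain Python. It is not the Keras code itself: the helper name `ema_after` and the numbers (a batch variance of 25.0) are made up for illustration, and it assumes every step sees the same batch, as in the single-batch setup described above.

```python
def ema_after(n_steps, batch_stat, init, momentum):
    """Moving statistic after n_steps updates when every batch yields batch_stat.

    Each step applies: moving = momentum * moving + (1 - momentum) * batch_stat
    """
    moving = init
    for _ in range(n_steps):
        moving = momentum * moving + (1.0 - momentum) * batch_stat
    return moving


# Hypothetical numbers: the batch variance is 25.0, while the moving
# variance starts at its initial value of 1.0.
true_var, init_var = 25.0, 1.0

for momentum in (0.99, 0.9, 0.5):
    for steps in (10, 100, 1000):
        est = ema_after(steps, true_var, init_var, momentum)
        print(f"momentum={momentum:<4} steps={steps:<4} moving_var={est:8.3f}")
```

After n steps the initial value still carries a weight of momentum**n, so with momentum 0.99 roughly a third of the initial value is still present after 100 updates, while with a lower momentum the moving statistic reaches the batch statistic within a handful of steps.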

ysyyork · 2017-07-26T07:16:57Z

Hi @izikgo, would you mind sharing what value you set for momentum? I also came across this issue.

weiguanwang · 2018-05-30T11:05:47Z

@izikgo Thank you so much for your hint!!! I tried reducing the momentum and that solved it! I guess I need to read the paper to see what the momentum means and understand the reason.

ghost changed the title ~~Batch Normalization layer given a significant difference in train and validation loss on the exact same data.~~ Batch Normalization layer gives significant difference between train and validation loss on the exact same data Jul 9, 2017

ghost closed this as completed Jul 10, 2017

lminer mentioned this issue Jul 13, 2018

Batch normalization causes validation loss to fluctuate wildly #10666

Closed

w4nderlust mentioned this issue May 10, 2019

CLI loss\accuracy output is displayed incorrectly after a training resume ludwig-ai/ludwig#328

Closed

This issue was closed.
