Model params change with 0 learning rate #2293

Closed

vabh opened this issue Aug 4, 2017 · 8 comments

vabh commented Aug 4, 2017

While training a convnet with SGD, the train/test loss and error change from epoch to epoch even when the learning rate is 0.

Log:

[TEST] Loss: 2.3440, Error: 8999/10000 (90%)

[TRAIN Epoch 1] Loss: 2.33110598225, Error: 44996/50000
[TEST] Loss: 2.3364, Error: 9027/10000 (90%)

[TRAIN Epoch 2] Loss: 2.33058553279, Error: 45001/50000
[TEST] Loss: 2.3342, Error: 9025/10000 (90%)

[TRAIN Epoch 3] Loss: 2.33120793699, Error: 44981/50000
[TEST] Loss: 2.3358, Error: 9030/10000 (90%)

[TRAIN Epoch 4] Loss: 2.33072890223, Error: 44987/50000
[TEST] Loss: 2.3350, Error: 9024/10000 (90%)

[TRAIN Epoch 5] Loss: 2.33064097578, Error: 45025/50000
[TEST] Loss: 2.3367, Error: 9029/10000 (90%)

[TRAIN Epoch 6] Loss: 2.33016999603, Error: 44991/50000
[TEST] Loss: 2.3359, Error: 9026/10000 (90%)

[TRAIN Epoch 7] Loss: 2.33080320681, Error: 44999/50000
[TEST] Loss: 2.3352, Error: 9035/10000 (90%)

[TRAIN Epoch 8] Loss: 2.33087820165, Error: 44996/50000
[TEST] Loss: 2.3365, Error: 9018/10000 (90%)

[TRAIN Epoch 9] Loss: 2.33066928387, Error: 45002/50000
[TEST] Loss: 2.3356, Error: 9025/10000 (90%)

This happens with DenseNet and ResNet.

Training script: https://gist.github.com/vabh/50c12ca28619836e32a869aa0e52ea38
The architecture can be chosen on lines 52-65 of the gist.

Links to implementations:
DenseNet: https://github.com/bamos/densenet.pytorch
DenseNet: https://github.com/andreasveit/densenet-pytorch
ResNeXt: https://github.com/prlz77/ResNeXt.pytorch

PyTorch version: 0.1.12_2

apaszke (Contributor) commented Aug 4, 2017

Probably because BatchNorm changes its running averages
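
Not from the thread, just for illustration: a minimal sketch showing that a BatchNorm layer updates its running statistics during the forward pass in train mode, with no backward pass and no optimizer step involved, and that eval mode freezes them.

import torch
import torch.nn as nn

# Minimal sketch: BatchNorm running statistics are buffers updated in the
# forward pass while the module is in train mode, so they drift even if the
# optimizer never takes a step (lr = 0 is irrelevant here).
bn = nn.BatchNorm2d(3)
print(bn.running_mean)                        # tensor([0., 0., 0.]) initially

bn.train()
_ = bn(torch.randn(8, 3, 16, 16))             # forward only: no backward, no optimizer.step()
print(bn.running_mean)                        # already moved toward the batch mean

bn.eval()
frozen = bn.running_mean.clone()
_ = bn(torch.randn(8, 3, 16, 16))
print(torch.equal(frozen, bn.running_mean))   # True: eval mode does not update the stats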

apaszke (Contributor) commented Aug 4, 2017

Try with model.eval()
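
Not from the thread, for illustration: a minimal sketch of that pattern, using a placeholder model and a random batch. Switching to eval mode makes BatchNorm use its stored running statistics during testing; switch back to train mode afterwards.

import torch
import torch.nn as nn

# Placeholder model and batch, only to make the sketch self-contained.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
batch = torch.randn(4, 3, 16, 16)

model.eval()                  # BatchNorm uses running mean/var, Dropout is disabled
with torch.no_grad():         # no autograd bookkeeping needed while testing
    output = model(batch)

model.train()                 # back to train mode before the next training epoch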

rasoolfa commented Aug 4, 2017

I observe the same thing with BatchNorm even with model.eval()

@shreyassaxena

The running mean and variance of the BatchNorm layers will change from epoch to epoch, even when the LR is 0. To get the same loss, you can set the momentum parameter of the BatchNorm layers to 0 (in conjunction with the LR being 0). This should fix the problem.
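
A minimal sketch of this suggestion (the model below is a placeholder, not the one from the issue). PyTorch updates the stats as running = (1 - momentum) * running + momentum * batch_stat, so momentum = 0 leaves them unchanged.

import torch.nn as nn

def zero_bn_momentum(m):
    # With momentum = 0, running = (1 - 0) * running + 0 * batch_stat,
    # i.e. the running mean/var are never updated.
    if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
        m.momentum = 0.0

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())  # placeholder network
model.apply(zero_bn_momentum)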

fmassa (Member) commented Aug 4, 2017

Do you have weight_decay? That could also explain the change.

vabh (Author) commented Aug 5, 2017

In the training script, I do the testing after calling model.eval().
weight_decay is also set to 0.

I had missed that the running mean/std change on every iteration. After setting momentum = 0 in the BN modules, the train/test loss and error no longer change.

vabh closed this as completed Aug 5, 2017
bigbrother33 commented Aug 23, 2019

Thanks! I use this to freeze the BN layers during training:

def fix_bn(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1:
        m.eval()
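
A usage sketch for the helper above (the model here is a placeholder for your own network): call it after switching to train mode, so only the BatchNorm layers are flipped back to eval mode and their running statistics stay frozen.

import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())  # placeholder network

model.train()          # put the whole network into train mode first
model.apply(fix_bn)    # then switch only the BatchNorm layers back to eval mode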

@kluaspan

When I use parameter-server training and set the learning rate to 0, the loss changes a lot between epochs:
epoch 1: loss: 0.9186
epoch 2: loss: 0.8939
epoch 3: loss: 0.9186
epoch 4: loss: 0.9957
epoch 5: loss: 0.9710
Without the parameter server, the loss stays at 0.6916. My model is a single linear layer.
