
The loss of Step 2 #27

Closed
YueLiao opened this issue Nov 6, 2017 · 9 comments

@YueLiao

YueLiao commented Nov 6, 2017

During my training, the loss suddenly jumps from a low value in step 1 (around 2.1) to a high value (around 13) in step 2. Is that a normal situation?

@waleedka
Collaborator

waleedka commented Nov 6, 2017

Doesn't seem normal. Does it go down afterwards? You can try a smaller learning rate and see if that improves the training.

@YueLiao
Author

YueLiao commented Nov 7, 2017

The rpn_loss and mrcnn_loss are normal, but the total loss (the l1_loss) jumps to a high value (e.g. epoch 40: loss = 1.9, epoch 41: loss = 13.1, while the other losses stay normal). I also tried smaller learning rates (lr = 0.001 and 0.0001), but the same thing happens.

@Dref360

Dref360 commented Nov 7, 2017

Yeah, I have a similar problem. All the losses are small except this one.
[screenshot: loss graphs]

@Sharathnasa

Sharathnasa commented Nov 7, 2017 via email

@waleedka
Collaborator

waleedka commented Nov 8, 2017

@Dref360 Did you change anything around step 7?

The main losses to pay attention to are the individual losses like rpn_class_loss, mrcnn_bbox_loss, etc. You'd want to see nice graphs for those, like the ones posted by @Dref360 above.

The total loss is the sum of the individual losses plus the L2 weight regularization loss. The regularization loss is a sum across all trainable weights, so it can change drastically if you change the number of layers included in training. If you train the heads only and then switch to training all the layers, you'll see a big jump in the total loss because more layers are included and therefore the sum of the L2 penalties over the weights is larger. This is okay.

It might be a good idea to divide the L2 regularization by the number of weights to get a mean rather than a sum, which should remove that unexpected behavior. I'll look into doing that this weekend.
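Here's a minimal sketch of that behavior, assuming tf.keras and an illustrative WEIGHT_DECAY value (not the repo's exact code): summing the per-weight L2 penalties makes the regularization term grow with the number of trainable weights, while dividing each penalty by the weight count keeps its scale roughly comparable when more layers are unfrozen.

# Illustrative sketch only, not the repo's implementation.
import tensorflow as tf

WEIGHT_DECAY = 0.0001  # assumed value for illustration

def regularization_loss(trainable_weights, as_mean=False):
    """L2 penalty over the given weight tensors, either summed or size-normalized."""
    terms = []
    for w in trainable_weights:
        penalty = tf.keras.regularizers.l2(WEIGHT_DECAY)(w)
        if as_mean:
            penalty = penalty / tf.cast(tf.size(w), tf.float32)
        terms.append(penalty)
    return tf.add_n(terms)

# Stand-ins for "heads only" vs. "all layers": more trainable weights => much larger sum.
heads_only = [tf.ones([256, 81])]
all_layers = heads_only + [tf.ones([1024, 1024]) for _ in range(10)]
print(float(regularization_loss(heads_only)), float(regularization_loss(all_layers)))              # sum jumps sharply
print(float(regularization_loss(heads_only, True)), float(regularization_loss(all_layers, True)))  # mean grows far less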

@leicaand

leicaand commented Nov 14, 2017

# Add L2 Regularization
reg_losses = [keras.regularizers.l2(self.config.WEIGHT_DECAY)(w)
              for w in self.keras_model.trainable_weights
              if 'gamma' not in w.name and 'beta' not in w.name]

Gamma and beta parameters shouldn't be included in the regularization loss (batch norm isn't updated by backprop).

@waleedka
Collaborator

@leicaand Good catch. I pushed the fix. Thanks.

I also pushed an update to divide the weight regularization by the number of weights so the loss is the mean of the L2 rather than the sum. This removes the confusing jump in the total loss in the graphs.

@DingkunLiu
Contributor

DingkunLiu commented Feb 24, 2018

I am confused about this issue.
First, in the batchnorm layer, setting trainable to False stops the running mean and std from being updated, but not beta and gamma; those are still trainable, because beta and gamma are updated via gradients rather than by that update op. As further evidence, the beta and gamma in the trained model are not zero and one, indicating that they were updated during training.
Second, does it make sense to divide the L2 loss by its size? Since its gradient is also divided by this factor, the bigger a weight matrix is, the less it's updated each step by the weight regularization loss. I don't think it is a good idea.
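A quick sketch of that concern, assuming tf.keras and an illustrative decay value: dividing the L2 term by the weight count also divides its gradient, so a larger weight matrix receives a smaller per-element decay at each step.

# Hedged illustration only; values are made up, not taken from the repo.
import tensorflow as tf

decay = 0.0001  # assumed value for illustration

for n in (10, 1000):
    w = tf.Variable(tf.ones([n]))
    with tf.GradientTape() as tape:
        # size-normalized L2 penalty: decay * sum(w^2) / n
        reg = decay * tf.reduce_sum(tf.square(w)) / tf.cast(tf.size(w), tf.float32)
    grad = tape.gradient(reg, w)
    print(n, float(grad[0]))  # per-element gradient is 2*decay/n, so it shrinks as n grows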

@DingkunLiu
Contributor

Batchnorm has 4 different weights: the running mean and std are updated by a moving-average operation, while beta and gamma are updated via gradients. If you want to skip the weights that aren't updated during backprop, you should exclude 'moving_mean' and 'moving_variance', not 'beta' and 'gamma'.
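A small sketch of that distinction, assuming tf.keras's BatchNormalization and its default weight naming (which may differ across Keras versions): gamma and beta appear in trainable_weights and are updated by gradients, while moving_mean and moving_variance are non-trainable and are updated by the moving-average op.

# Illustrative check of which batchnorm weights backprop actually touches.
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
bn.build((None, 64))

print([w.name for w in bn.trainable_weights])
# e.g. ['batch_normalization/gamma:0', 'batch_normalization/beta:0']
print([w.name for w in bn.non_trainable_weights])
# e.g. ['batch_normalization/moving_mean:0', 'batch_normalization/moving_variance:0']

# The moving statistics never show up in trainable_weights, so filtering
# 'gamma'/'beta' out of that list removes the only BN weights that gradients update.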
