
The loss of Step 2 #27

Closed
YueLiao opened this issue Nov 6, 2017 · 9 comments

@YueLiao

YueLiao commented Nov 6, 2017

During my training, the loss suddenly jumps from a low value in step 1 (around 2.1) to a high value (around 13) in step 2. Is that a normal situation?

@waleedka
Collaborator

waleedka commented Nov 6, 2017

Doesn't seem normal. Does it go down afterwards? You can try a smaller learning rate and see if that improves the training.

@YueLiao
Author

YueLiao commented Nov 7, 2017

The rpn_loss and mrcnn_loss are normal, but the total loss (the l1_loss) jumps to a high value (e.g. epoch 40: loss = 1.9, epoch 41: loss = 13.1, while the other losses stay normal). I also tried smaller learning rates (lr = 0.001 and 0.0001), but the same thing happens.

@Dref360

Dref360 commented Nov 7, 2017

Yeah, I have a similar problem. All the losses are small except this one.
[screenshot: loss graphs]

@Sharathnasa

Sharathnasa commented Nov 7, 2017 via email

@waleedka
Collaborator

waleedka commented Nov 8, 2017

@Dref360 Did you change anything around step 7?

The main losses to pay attention to are the individual losses like rpn_class_loss, mrcnn_bbox_loss, etc. You'd want to see nice graphs for those, like the ones posted by @Dref360 above.

The total loss is the sum of the individual losses plus the L2 weight regularization loss. The regularization loss is a sum across all trainable weights, so it can change drastically if you change the number of layers included in training. If you train the heads only and then switch to training all the layers, you'll see a big jump in the total loss because more layers are included and therefore the sum of the L2 penalties over the weights is larger. This is okay.

It might be a good idea to divide the L2 regularization by the number of weights to get a mean rather than a sum, which should remove that unexpected behavior. I'll look into doing that this weekend.
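Here's a minimal sketch of that behavior, assuming tf.keras and an illustrative WEIGHT_DECAY value (not the repo's exact code): summing the per-weight L2 penalties makes the regularization term grow with the number of trainable weights, while dividing each penalty by the weight count keeps its scale roughly comparable when more layers are unfrozen.

# Illustrative sketch only, not the repo's implementation.
import tensorflow as tf

WEIGHT_DECAY = 0.0001  # assumed value for illustration

def regularization_loss(trainable_weights, as_mean=False):
    """L2 penalty over the given weight tensors, either summed or size-normalized."""
    terms = []
    for w in trainable_weights:
        penalty = tf.keras.regularizers.l2(WEIGHT_DECAY)(w)
        if as_mean:
            penalty = penalty / tf.cast(tf.size(w), tf.float32)
        terms.append(penalty)
    return tf.add_n(terms)

# Stand-ins for "heads only" vs. "all layers": more trainable weights => much larger sum.
heads_only = [tf.ones([256, 81])]
all_layers = heads_only + [tf.ones([1024, 1024]) for _ in range(10)]
print(float(regularization_loss(heads_only)), float(regularization_loss(all_layers)))              # sum jumps sharply
print(float(regularization_loss(heads_only, True)), float(regularization_loss(all_layers, True)))  # mean grows far less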

@leicaand

leicaand commented Nov 14, 2017

# Add L2 Regularization
reg_losses = [keras.regularizers.l2(self.config.WEIGHT_DECAY)(w)
              for w in self.keras_model.trainable_weights
              if 'gamma' not in w.name and 'beta' not in w.name]

Gamma and beta parameters shouldn't be included in the regularization loss (batch norm isn't updated by backprop).

@waleedka
Collaborator

@leicaand Good catch. I pushed the fix. Thanks.

I also pushed an update to divide the weight regularization by the number of weights so the loss is the mean of the L2 rather than the sum. This removes the confusing jump in the total loss in the graphs.

@DingkunLiu
Contributor

DingkunLiu commented Feb 24, 2018

I am confused about this issue.
First, in the batchnorm layer, setting trainable to False stops the running mean and std from being updated, but not beta and gamma; those are still trainable, because beta and gamma are updated via gradients rather than by that update op. As further evidence, the beta and gamma in the trained model are not zero and one, indicating that they were updated during training.
Second, does it make sense to divide the L2 loss by its size? Since its gradient is also divided by this factor, the bigger a weight matrix is, the less it's updated each step by the weight regularization loss. I don't think it is a good idea.
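A quick sketch of that concern, assuming tf.keras and an illustrative decay value: dividing the L2 term by the weight count also divides its gradient, so a larger weight matrix receives a smaller per-element decay at each step.

# Hedged illustration only; values are made up, not taken from the repo.
import tensorflow as tf

decay = 0.0001  # assumed value for illustration

for n in (10, 1000):
    w = tf.Variable(tf.ones([n]))
    with tf.GradientTape() as tape:
        # size-normalized L2 penalty: decay * sum(w^2) / n
        reg = decay * tf.reduce_sum(tf.square(w)) / tf.cast(tf.size(w), tf.float32)
    grad = tape.gradient(reg, w)
    print(n, float(grad[0]))  # per-element gradient is 2*decay/n, so it shrinks as n grows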

@DingkunLiu
Contributor

Batchnorm has 4 different weights: the running mean and std are updated by a moving-average operation, while beta and gamma are updated via gradients. If you want to skip the weights that aren't updated during backprop, you should exclude 'moving_mean' and 'moving_variance', not 'beta' and 'gamma'.
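A small sketch of that distinction, assuming tf.keras's BatchNormalization and its default weight naming (which may differ across Keras versions): gamma and beta appear in trainable_weights and are updated by gradients, while moving_mean and moving_variance are non-trainable and are updated by the moving-average op.

# Illustrative check of which batchnorm weights backprop actually touches.
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
bn.build((None, 64))

print([w.name for w in bn.trainable_weights])
# e.g. ['batch_normalization/gamma:0', 'batch_normalization/beta:0']
print([w.name for w in bn.non_trainable_weights])
# e.g. ['batch_normalization/moving_mean:0', 'batch_normalization/moving_variance:0']

# The moving statistics never show up in trainable_weights, so filtering
# 'gamma'/'beta' out of that list removes the only BN weights that gradients update.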
