Loss stuck, not decreasing #27
Hi, I'm noticing a very strange loss behavior during the training phase.
![image](https://user-images.githubusercontent.com/25117311/75141130-9dee4780-56f0-11ea-9c43-a840cdb77e3d.png)
Initially, the loss decreases as it should. At a certain point, it reaches a plateau from which, most of the time, it cannot escape.
In particular, if I use pre-extracted features without fine-tuning the image encoder, the plateau is overcome almost immediately, as shown in the following plot:
However, if I try to fine-tune, the loss gets stuck forever:
![image](https://user-images.githubusercontent.com/25117311/75143951-55d22380-56f6-11ea-889b-bf275566a993.png)
I noticed that the loss gets stuck at a very specific value, namely 2 * (batch_size * loss_margin).
It seems that the loss is collapsing to values where the difference between positive and negative pair similarities is always 0:

`s(i, c') = s(i, c)`

and

`s(i', c) = s(i, c)`
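For reference, here is a minimal sketch of a bidirectional hinge-based triplet loss with a switch between summing over all negatives (SUM) and keeping only the hardest one (MAX); the function name, the `max_violation` flag, and the shapes are just for illustration, not necessarily this repo's exact code. Assuming the MAX formulation (which is consistent with the stuck value), a fully collapsed similarity matrix makes every hinge term equal to the margin, so the loss freezes at exactly 2 * batch_size * margin:

```python
import torch

def contrastive_loss(scores, margin=0.2, max_violation=True):
    # scores: (B, B) similarity matrix, scores[i, j] = s(image_i, caption_j);
    # the diagonal holds the positive pairs. Illustrative sketch only.
    B = scores.size(0)
    pos = scores.diag().view(B, 1)

    # hinge cost for every (anchor, negative) pair, in both retrieval directions
    cost_cap = (margin + scores - pos).clamp(min=0)      # image -> caption
    cost_img = (margin + scores - pos.t()).clamp(min=0)  # caption -> image

    # ignore the positive pairs on the diagonal
    eye = torch.eye(B, dtype=torch.bool, device=scores.device)
    cost_cap = cost_cap.masked_fill(eye, 0)
    cost_img = cost_img.masked_fill(eye, 0)

    if max_violation:  # MAX loss: keep only the hardest negative per anchor
        return cost_cap.max(1)[0].sum() + cost_img.max(0)[0].sum()
    return cost_cap.sum() + cost_img.sum()  # SUM loss over all negatives

# Collapsed embeddings: every similarity identical -> every hinge term == margin,
# so the MAX loss is frozen at 2 * B * margin (e.g. 2 * 32 * 0.2 = 12.8).
collapsed = torch.full((32, 32), 0.5)
print(contrastive_loss(collapsed, margin=0.2, max_violation=True))  # tensor(12.8000)
```

With the SUM variant, the same collapse would instead give 2 * batch_size * (batch_size - 1) * margin, so the plateau value itself suggests that the hardest-negative terms are all saturated at the margin.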
I'm using margin = 0.2. For the pre-extracted features I used a batch size of 128, while for fine-tuning the batch size is 32. The configuration is exactly the same as yours.
In general, I noticed this behavior happening when the network is too complex.
Maybe the reason is that good hard negatives cannot be found with batch sizes smaller than 128. However, I have hardware constraints.
Did you notice a similar behavior in your experiments? If so, how did you solve it?
Thank you very much
Follow-up comment from the author:

Thank you very much for these hints. Actually, I think that stage-wise optimization is the way to go. If I first optimize using the SUM loss and then resume, after 10 epochs, with the MAX loss, the problem disappears and the validation metrics keep increasing smoothly. However, I will also pay attention to the batch size, as you suggested.
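A minimal sketch of that stage-wise schedule, reusing the illustrative `contrastive_loss` from the earlier sketch and an assumed warm-up length (neither is this repo's actual training code):

```python
# Stage-wise optimization: SUM loss for the first epochs, then switch to MAX.
WARMUP_EPOCHS = 10  # number of SUM-loss epochs mentioned above; tune as needed

def loss_for_epoch(scores, epoch, margin=0.2):
    # contrastive_loss is the illustrative function sketched earlier in the thread
    use_max = epoch >= WARMUP_EPOCHS
    return contrastive_loss(scores, margin=margin, max_violation=use_max)
```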