Reduce validation loss in training #30
Comments
First try: dropout = 0.1. Run: lunar-serenity-26.
Promising: clearly better results than with 0 dropout. Now trying …
Also tried 0.5 (worthy-sunset-28). In both cases the gap between validation loss and step loss stays slightly smaller initially than in the lower-dropout runs. However, in logical-butterfly-29 we see that after more training the validation loss still starts climbing again, while the training loss more or less vanishes, so the model again overfits the training set at some point. Performance is better, though, and still improving at this moment (epoch 130 of 300), so let's see where it goes.
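For reference, a minimal PyTorch sketch of where a dropout rate like the ones compared above plugs into a transformer encoder. The layer sizes and the helper name are placeholders, not the actual platalea_transformer configuration.

```python
import torch.nn as nn

# Minimal sketch (not the actual platalea configuration): the dropout
# probability compared in these runs is the `dropout` argument of the
# encoder layers; d_model/nhead/num_layers are placeholder values.
def build_encoder(dropout: float = 0.3) -> nn.TransformerEncoder:
    layer = nn.TransformerEncoderLayer(
        d_model=512,           # placeholder embedding size
        nhead=8,               # placeholder number of attention heads
        dim_feedforward=2048,  # placeholder feed-forward size
        dropout=dropout,       # the regularization knob varied across these runs
        batch_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=6)
```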
We ran another sweep over dropout values, and again values between 0.2 and 0.5 seem to perform best. However, validation loss still starts increasing in all runs after some time. As already mentioned in this report, we should probably look into additional regularization options to correct this overfitting on the training data. See #61.
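For context, a dropout sweep like this can be expressed as a W&B sweep definition. The grid of values and the metric name below are illustrative, not the exact sweep used; only the entity and project names come from the report URL in this thread.

```python
import wandb

# Illustrative sweep definition (values and metric name are placeholders,
# not the exact sweep referenced above): grid-search the dropout rate and
# optimize for validation loss.
sweep_config = {
    "method": "grid",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "dropout": {"values": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]},
    },
}

sweep_id = wandb.sweep(sweep_config, entity="spokenlanguage", project="platalea_transformer")
# wandb.agent(sweep_id, function=train)  # `train` would read wandb.config.dropout
```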
I think we solved the validation loss (overfitting) issue during our regularization sweeps (https://wandb.ai/spokenlanguage/platalea_transformer/reports/Jan-29-Project-Update-Regularization-rates-conclusion--Vmlldzo0MzY3MDg). |
In run dainty-dawn-20 we saw the validation loss increasing again, starting from approximately epoch 20. We should find a way to train that reduces validation loss together with training loss.
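One generic way to guard against the pattern described in this thread (training loss shrinking while validation loss climbs) is to monitor both per epoch, keep the checkpoint with the lowest validation loss, and optionally stop early. This is a generic sketch, not the project's actual training loop; `train_one_epoch` and `evaluate` are assumed helpers.

```python
import copy

# Generic sketch, not the project's actual training loop: keep the weights
# from the epoch with the lowest validation loss and stop once validation
# loss has not improved for `patience` epochs, so the run does not drift
# into the regime where training loss vanishes but validation loss climbs.
def fit(model, train_one_epoch, evaluate, epochs=300, patience=20):
    best_val, best_state, stale = float("inf"), None, 0
    for epoch in range(epochs):
        train_loss = train_one_epoch(model)  # assumed helper: one pass over the training set
        val_loss = evaluate(model)           # assumed helper: loss on the validation set
        print(f"epoch {epoch}: train={train_loss:.4f} val={val_loss:.4f}")
        if val_loss < best_val:
            best_val, stale = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            stale += 1
            if stale >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)    # restore the best-validation checkpoint
    return model
```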