Inconsistent Performance and Loss when Resuming Training #13
config.yaml
@nuaazs The 'lr_scheduler' is a warmup cosine scheduler. If you only raise 'num_epoch' to 200 and then resume training, the learning rate will increase again at epoch 100. To avoid this, I recommend adjusting the 'lr_scheduler' configuration so that it keeps the learning rate low after resuming. Alternatively, you may simply need to train for more epochs to reach optimal performance.
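As a minimal sketch of the behaviour described above (the function name, warmup length, and learning-rate values here are assumptions for illustration, not this repository's actual implementation), a warmup cosine schedule computes the learning rate from the *total* number of epochs, so raising 'num_epoch' from 100 to 200 and resuming at epoch 100 lands the run back in the middle of the cosine curve:

```python
import math

def warmup_cosine_lr(epoch, num_epochs, warmup_epochs=5,
                     base_lr=0.1, min_lr=1e-4):
    """Generic warmup + cosine decay schedule (illustrative values only)."""
    if epoch < warmup_epochs:
        # Linear warmup from min_lr up to base_lr
        return min_lr + (base_lr - min_lr) * epoch / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs
    progress = (epoch - warmup_epochs) / (num_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Original run: num_epochs=100, so the LR has fully decayed by epoch 100.
print(warmup_cosine_lr(100, num_epochs=100))   # ~1e-4 (min_lr)

# Resume with num_epoch raised to 200: epoch 100 now sits mid-schedule,
# so the LR jumps back up and perturbs the already-converged weights.
print(warmup_cosine_lr(100, num_epochs=200))   # ~0.05 (roughly half of base_lr)
```

Under these assumed values, the resumed run restarts at roughly half the base learning rate instead of the decayed minimum, which is consistent with the accuracy drop reported in this issue.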
Thank you for your response, @wanghuii1.
Thank you for your excellent work. 🙂
We have observed that whenever we resume training with a different total number of epochs after a run has completed, the loaded checkpoint performs significantly worse than it did at the corresponding epoch of the original run. For instance, a model loaded from epoch 100 performs only on par with a model trained for about 30 epochs.
This inconsistency after resuming makes it difficult for us to continue training from a checkpoint and obtain the desired results.