When can models stop training? #9
Typically, the checkpoint needs to be saved when the lowest loss (PPL) is achieved. But during my experiments, I found that the model's performance can be further improved by training for more epochs. In my opinion, the generative task is a little different from the classification task, and you can try to train longer even after the lowest loss has been reached.
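Concretely, that suggests a loop like the sketch below: keep the usual best-valid-PPL checkpoint, but also snapshot later epochs so their generation quality can be compared afterwards. This is only a minimal illustration; `train_one_epoch` and `evaluate_valid_loss` are placeholder stubs, not functions from this repo.

```python
import math
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs; in practice these would be the real
# model, training step, and validation pass.
model = nn.Linear(4, 4)
def train_one_epoch(m): pass
def evaluate_valid_loss(m): return 1.0  # placeholder valid loss

best_ppl = math.inf
for epoch in range(30):
    train_one_epoch(model)
    ppl = math.exp(evaluate_valid_loss(model))

    # Keep the classic "best valid PPL" checkpoint ...
    if ppl < best_ppl:
        best_ppl = ppl
        torch.save(model.state_dict(), 'ckpt_best_ppl.pt')

    # ... but also snapshot later epochs, since for generative models
    # BLEU/Distinct can keep improving after valid PPL bottoms out.
    torch.save(model.state_dict(), f'ckpt_epoch_{epoch}.pt')
```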
Thank you very much for your project. The code is beautiful, the logic is clear, and I've learned a lot from it. Will you continue to open-source GAN-based dialog models, RL-based conversation models, or transformer-based dialog models? I'm looking forward to it. Besides, I see that in data_loader.py the order in which the data is loaded is fixed first. Although randomness is added within each batch afterwards, global randomness cannot be achieved.
...
Does that make a difference? Does the model memorize the order of the data, leading to overfitting? After I added global shuffling, the MReCoSa model's valid loss curve looked better, and the test PPL was also reduced.
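For reference, one way to add the global shuffle described above. This is a sketch under the assumption that the dataset is a flat list of (src, tgt) pairs, which may differ from the actual layout in data_loader.py:

```python
import random

def shuffled_batches(pairs, batch_size, seed=None):
    # Global shuffle: permute the *whole* dataset each epoch, instead of
    # only shuffling inside each batch, so no fixed ordering survives.
    rng = random.Random(seed)
    indices = list(range(len(pairs)))
    rng.shuffle(indices)
    for i in range(0, len(indices), batch_size):
        yield [pairs[j] for j in indices[i:i + batch_size]]

# Usage: re-shuffle every epoch with a fresh seed.
pairs = [('hi', 'hello'), ('how are you', 'fine'), ('bye', 'see you')]
for epoch in range(2):
    for batch in shuffled_batches(pairs, batch_size=2, seed=epoch):
        pass  # feed `batch` to the model
```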
Thank you so much for your attention to this repo.
I'm a little confused about some code in DSHRED.py:
Why does the highlighted line read self.attn(output[0].unsqueeze(0), output) rather than self.attn(output[-1].unsqueeze(0), output)?
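For context, assuming output is the usual non-batch-first PyTorch GRU output of shape (seq_len, batch, hidden), the two indices pick different time steps as the attention query. This is an illustration of the shapes involved, not the actual DSHRED.py code:

```python
import torch
import torch.nn as nn

seq_len, batch, hidden = 5, 2, 8
gru = nn.GRU(hidden, hidden)          # single layer, unidirectional
output, h_n = gru(torch.randn(seq_len, batch, hidden))

first = output[0].unsqueeze(0)   # (1, batch, hidden): state after the FIRST step
last = output[-1].unsqueeze(0)   # (1, batch, hidden): state after the LAST step,
                                 # i.e. the summary of the whole sequence that an
                                 # attention query would typically use

# For this configuration, the last output equals the final hidden state.
assert torch.allclose(last.squeeze(0), h_n.squeeze(0))
```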
I observed that after the model's test PPL stops falling, the other test metrics can still improve. For example, at epoch 22 the valid loss and test PPL may be at their lowest, but beyond epoch 22 the BLEU and Distinct scores keep rising.
According to the usual experimental procedure, model parameters should be saved at the lowest point of the valid-set loss, and the model's evaluation results on the test set should then be regarded as the final numbers. Is that right?
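As a side note on the Distinct numbers, here is a minimal sketch of the common way Distinct-n is computed (unique n-grams divided by total n-grams over all generated replies); the repo's own evaluation script may differ:

```python
def distinct_n(sentences, n):
    # Distinct-n: #unique n-grams / #total n-grams over all generated replies.
    ngrams = []
    for sent in sentences:
        tokens = sent.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)

replies = ['i do not know', 'i do not know', 'that sounds great']
print(distinct_n(replies, 1), distinct_n(replies, 2))  # Distinct-1, Distinct-2
```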