Hi authors, thanks for your excellent work! When I try to reproduce your results on small models, I find that past a certain training step the training return increases rapidly while the eval results drop significantly. Could you tell me what might be going wrong here? I also experimented with Qwen models as well as SFT models, and the results are similar to what I observed on Llama-3.2-3B.
Originally, I suspected a training-eval gap, but I don't think that is what is really happening here, since the training and eval data come from roughly the same distribution.
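One other mismatch worth ruling out is decoding: rollouts during RL training are usually sampled with temperature, while eval often decodes greedily, so the training return and eval score are measured under different policies. Below is a minimal sketch (not from this repo; the checkpoint path and `reward_fn` are placeholders, assuming a Hugging Face causal LM) that scores the same eval prompts under both decoding modes. If sampled rewards stay high while greedy rewards collapse, the divergence is a decoding mismatch or reward hacking rather than a data-distribution gap:

```python
# Diagnostic sketch (placeholder names: adjust ckpt and reward_fn to your setup).
# Compares mean reward of training-style sampled generations vs eval-style
# greedy generations on the same prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "path/to/rl_checkpoint"  # placeholder: the checkpoint to probe
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.bfloat16, device_map="auto"
)

def complete(prompt: str, sample: bool) -> str:
    """Generate one completion: sampled (training-style) or greedy (eval-style)."""
    ids = tok(prompt, return_tensors="pt").to(model.device)
    kwargs = dict(max_new_tokens=512, do_sample=sample)
    if sample:
        kwargs.update(temperature=1.0, top_p=1.0)  # match your rollout settings
    out = model.generate(**ids, **kwargs)
    # Decode only the newly generated tokens, not the prompt.
    return tok.decode(out[0, ids["input_ids"].shape[1]:], skip_special_tokens=True)

def mean_reward(prompts, answers, reward_fn, sample: bool) -> float:
    """Average reward over the eval prompts under one decoding mode."""
    comps = [complete(p, sample) for p in prompts]
    return sum(reward_fn(c, a) for c, a in zip(comps, answers)) / len(prompts)

# Usage (reward_fn is whatever scorer your eval uses, e.g. exact match):
# r_sampled = mean_reward(eval_prompts, eval_answers, reward_fn, sample=True)
# r_greedy  = mean_reward(eval_prompts, eval_answers, reward_fn, sample=False)
```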