Always getting NaNs in long training #33
Comments
I have the same problem :(
Same NaN issue with the CosineAnnealing scheduler after the first epoch.
May I know the learning rate schedule you are using?
Same issue; I set a large weight decay to avoid it. I suppose that `update = sign * lr` keeps enlarging `abs(parameter)` as long as the sign does not change.
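A minimal sketch of that intuition (assuming the published Lion update rule, not necessarily this repository's exact code; `lion_step`, the constant gradient, and the hyperparameter values are all made up for illustration): with a gradient whose sign never flips and no weight decay, `abs(param)` grows by `lr` every step, whereas decoupled weight decay caps it near `1 / weight_decay`.

```python
def sign(x):
    # returns +1, 0, or -1
    return (x > 0) - (x < 0)

def lion_step(param, grad, momentum, lr=1e-4, beta1=0.95, beta2=0.98, weight_decay=0.0):
    update = sign(beta1 * momentum + (1.0 - beta1) * grad)   # sign-based update
    param = param * (1.0 - lr * weight_decay) - lr * update  # decoupled weight decay + fixed-size step
    momentum = beta2 * momentum + (1.0 - beta2) * grad       # EMA of the gradient
    return param, momentum

for wd in (0.0, 0.1):
    param, momentum = 0.0, 0.0
    for _ in range(500_000):
        # gradient whose sign never flips
        param, momentum = lion_step(param, 1.0, momentum, weight_decay=wd)
    print(f"weight_decay={wd}: param={param:.2f}")
# weight_decay=0.0: param=-50.00  (|param| keeps growing by lr every step)
# weight_decay=0.1: param=-9.93   (|param| is capped near 1 / weight_decay = 10)
```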
Same here. Sudden NaN losses during 100-epoch training with OneCycleLR and gradient clipping.
I've been experimenting with the LION optimizer in your other (great) Imagen repository. I can share my anecdotal experience and combinations:
- `beta1 = 0.95` and `beta2 = 0.98`
- learning rates of `1e-4`, `3e-5`, and `1e-5`
- both `True` and `False`

Training was indeed fast, but unfortunately it always ended up yielding NaNs in the end.
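For reference, the combinations above were set up roughly like this (a sketch assuming the `lion-pytorch` package; the linear layer is a stand-in model and the `weight_decay` value is hypothetical, since the actual runs used the Imagen codebase):

```python
import torch
from lion_pytorch import Lion  # assuming this repository's package

model = torch.nn.Linear(16, 16)  # stand-in model for illustration

optimizer = Lion(
    model.parameters(),
    lr=3e-5,             # also tried 1e-4 and 1e-5
    betas=(0.95, 0.98),  # the beta1 / beta2 pair listed above
    weight_decay=1e-2,   # hypothetical value, not reported above
)
```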
I think a potential issue could be how LION interacts with a warmup schedule; I am not sure whether you're supposed to use warmup with this optimizer or not (I always did).
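For what it's worth, the warmup I used looks roughly like this (a generic PyTorch sketch, not something this repository prescribes; `warmup_steps` is a made-up value):

```python
from torch.optim.lr_scheduler import LambdaLR

warmup_steps = 1_000  # hypothetical warmup length

# linearly ramp the learning rate from ~0 to its full value, then hold it
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps))

# in the training loop, call scheduler.step() after every optimizer.step()
```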