
Always getting NaNs in long training #33

Open
danbochman opened this issue Dec 5, 2023 · 5 comments

Comments

@danbochman

danbochman commented Dec 5, 2023

I've been experimenting with the LION optimizer in your other (great) Imagen repository. I can share my anecdotal experience and combinations:

  • Models of different sizes: 0.2B, 0.7B, and 1B parameters.
  • Betas such as beta1 = 0.95 and beta2 = 0.98.
  • Learning rates of 1e-4, 3e-5, and 1e-5.
  • The Triton kernel toggled both on (True) and off (False).

Training was indeed fast, but unfortunately it always ended up yielding NaNs in the end.

I think a potential issue could be how LION interacts with a warmup schedule; I am not sure whether you are supposed to use warmup with this optimizer or not (I always did). A rough sketch of the kind of setup I mean is below.
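For concreteness, a minimal sketch of that setup, assuming the lion-pytorch package and a simple linear warmup via LambdaLR; the model, warmup length, and weight decay value here are placeholders, not my exact configuration:

```python
# Minimal sketch, assuming lion-pytorch plus a linear warmup via LambdaLR.
# The model, warmup length, and weight decay are placeholders, not the exact config.
import torch
from torch.optim.lr_scheduler import LambdaLR
from lion_pytorch import Lion

model = torch.nn.Linear(512, 512)  # stand-in for the actual Imagen-style model

optimizer = Lion(
    model.parameters(),
    lr=1e-4,              # one of the learning rates listed above
    betas=(0.95, 0.98),   # betas listed above
    weight_decay=1e-2,    # placeholder value
    use_triton=False,     # the Triton kernel was tried both on and off
)

warmup_steps = 1_000  # placeholder
scheduler = LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))

for step in range(10):  # abbreviated training loop
    loss = model(torch.randn(8, 512)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```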


@ysesst93013

I have the same problem :(

@SergeySakharovskiy

Same NaN issue with a CosineAnnealing scheduler after the first epoch.

@xiangning-chen
Contributor


May I know the learning rate schedule you are using?

@zjutzyl

zjutzyl commented Mar 12, 2024

Same issue. I set a large weight decay to avoid it. I suppose that because the update is `sign(m) * lr`, abs(parameter) keeps growing as long as the sign of the update does not change; a toy sketch of that intuition is below.
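A toy illustration of that intuition, assuming the plain Lion-style update from the paper rather than the repository's actual kernel: with a constant-sign gradient and no weight decay, the parameter drifts by lr every step without bound, while decoupled weight decay caps its magnitude near 1/wd.

```python
# Toy illustration, assuming the plain Lion-style update (not the repo's Triton kernel).
import torch

lr, wd, beta1, beta2 = 1e-4, 0.0, 0.95, 0.98  # try wd = 1e-2 for comparison

p = torch.tensor(1.0)      # a single parameter
m = torch.tensor(0.0)      # momentum (EMA of gradients)
grad = torch.tensor(1e-3)  # gradient whose sign never flips

for _ in range(20_000):
    update = torch.sign(beta1 * m + (1 - beta1) * grad)  # always +1 here
    p = p * (1 - lr * wd) - lr * update                  # decoupled decay, then sign step
    m = beta2 * m + (1 - beta2) * grad

# With wd = 0 the parameter moves by lr in the same direction every step
# (p goes from 1.0 to about -1.0 after 20k steps) and its magnitude is unbounded;
# with wd > 0 the decay balances the constant-sign step, so |p| settles near 1/wd.
print(p)
```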

@lindakasabian

lindakasabian commented Apr 16, 2024

Same here: sudden NaN losses during 100-epoch training with OneCycleLR and gradient clipping.
