
Always getting NaNs in long training #33

Open
danbochman opened this issue Dec 5, 2023 · 5 comments

Comments

@danbochman

danbochman commented Dec 5, 2023

I've been experimenting with the LION optimizer in your other (great) Imagen repository. I can share my anecdotal experience and combinations:

  • Models of different sizes: 0.2B, 0.7B, and 1B parameters.
  • Betas such as beta1 = 0.95 and beta2 = 0.98.
  • Learning rates of 1e-4, 3e-5, and 1e-5.
  • The Triton kernel toggled both on (True) and off (False).

Training was indeed fast, but unfortunately it always ended up yielding NaNs in the end.

I think a potential issue could be how LION interacts with a warmup schedule; I am not sure whether you are supposed to use warmup with this optimizer or not (I always did). A rough sketch of the kind of setup I mean is below.
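For concreteness, a minimal sketch of that setup, assuming the lion-pytorch package and a simple linear warmup via LambdaLR; the model, warmup length, and weight decay value here are placeholders, not my exact configuration:

```python
# Minimal sketch, assuming lion-pytorch plus a linear warmup via LambdaLR.
# The model, warmup length, and weight decay are placeholders, not the exact config.
import torch
from torch.optim.lr_scheduler import LambdaLR
from lion_pytorch import Lion

model = torch.nn.Linear(512, 512)  # stand-in for the actual Imagen-style model

optimizer = Lion(
    model.parameters(),
    lr=1e-4,              # one of the learning rates listed above
    betas=(0.95, 0.98),   # betas listed above
    weight_decay=1e-2,    # placeholder value
    use_triton=False,     # the Triton kernel was tried both on and off
)

warmup_steps = 1_000  # placeholder
scheduler = LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))

for step in range(10):  # abbreviated training loop
    loss = model(torch.randn(8, 512)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```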


@ysesst93013

I have the same problem :(

@SergeySakharovskiy

Same NaN issue with a CosineAnnealing scheduler after the first epoch.

@xiangning-chen
Contributor


May I know the learning rate schedule you are using?

@zjutzyl

zjutzyl commented Mar 12, 2024

Same issue. I set a large weight decay to avoid it. I suppose that because the update is `sign(m) * lr`, abs(parameter) keeps growing as long as the sign of the update does not change; a toy sketch of that intuition is below.
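A toy illustration of that intuition, assuming the plain Lion-style update from the paper rather than the repository's actual kernel: with a constant-sign gradient and no weight decay, the parameter drifts by lr every step without bound, while decoupled weight decay caps its magnitude near 1/wd.

```python
# Toy illustration, assuming the plain Lion-style update (not the repo's Triton kernel).
import torch

lr, wd, beta1, beta2 = 1e-4, 0.0, 0.95, 0.98  # try wd = 1e-2 for comparison

p = torch.tensor(1.0)      # a single parameter
m = torch.tensor(0.0)      # momentum (EMA of gradients)
grad = torch.tensor(1e-3)  # gradient whose sign never flips

for _ in range(20_000):
    update = torch.sign(beta1 * m + (1 - beta1) * grad)  # always +1 here
    p = p * (1 - lr * wd) - lr * update                  # decoupled decay, then sign step
    m = beta2 * m + (1 - beta2) * grad

# With wd = 0 the parameter moves by lr in the same direction every step
# (p goes from 1.0 to about -1.0 after 20k steps) and its magnitude is unbounded;
# with wd > 0 the decay balances the constant-sign step, so |p| settles near 1/wd.
print(p)
```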

@lindakasabian

lindakasabian commented Apr 16, 2024

Same here: sudden NaN losses during 100-epoch training with OneCycleLR and gradient clipping.
