
MAE finetune train loss nan #42
Open

CodingMice opened this issue Feb 16, 2022 · 5 comments

@CodingMice
info : Loss is nan, stopping training.

@Jeff-LiangF

Hey @CodingMice,

It might be due to amp.autocast(). Disabling it via amp.autocast(enabled=False) solved the problem for me.
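For context, a minimal sketch of where that flag goes in a typical AMP finetuning loop. The model, data, and loss here are illustrative stand-ins, not taken from the MAE code, and a CUDA device is assumed:

```python
import torch
import torch.nn as nn

# Illustrative model and data; the real script would use the MAE ViT and its data loader.
model = nn.Linear(16, 10).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for _ in range(10):
    x = torch.randn(8, 16, device="cuda")
    y = torch.randint(0, 10, (8,), device="cuda")

    optimizer.zero_grad()
    # enabled=False turns mixed precision off, so the forward pass runs in fp32.
    with torch.cuda.amp.autocast(enabled=False):
        loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```

Note that this trades the memory and speed benefits of mixed precision for fp32 numerical stability.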

@endernewton
Contributor

It would be great if more context were provided here. There are multiple ways the loss can go to NaN, and AMP can indeed be one of them.

@daisukelab

daisukelab commented Jul 28, 2022

FYI - this PyTorch issue thread, which has a long history, could be a hint:
pytorch/pytorch#40497

And here is the troubleshooting guide for this problem (also suggested in that thread):
https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html#loss-is-inf-nan

In any case, a quick fix is the one suggested by @Jeff-LiangF above.
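For reference, the recipe linked above keeps autocast enabled but wraps the backward/step in a GradScaler, which inspects the gradients each iteration and skips the optimizer update when they contain inf/NaN. A minimal sketch, with illustrative names and a CUDA device assumed:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 10).cuda()          # stand-in for the finetuned ViT
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(8, 16, device="cuda")
    y = torch.randint(0, 10, (8,), device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(x), y)

    scaler.scale(loss).backward()
    scaler.step(optimizer)   # the step is skipped if the unscaled grads contain inf/NaN
    scaler.update()          # adjusts the loss scale for the next iteration
```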

@exx8

exx8 commented Nov 25, 2022

I've faced the same issue.
Remarkably, using gradient clipping solved the issue and also improved the results.

@CharlesChen24

> I've faced the same issue. Remarkably, using gradient clipping solved the issue and also improved the results.

How do you set the value for gradient clipping? 0.1?
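For what it's worth, a common pattern is clipping by global norm after unscaling the gradients; max-norm values around 1.0 are a typical default, while 0.1 clips much harder, and the best value would need to be confirmed empirically for this setup. A minimal sketch, with illustrative names not taken from the MAE scripts:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 10).cuda()          # stand-in for the finetuned ViT
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()
max_norm = 1.0                            # assumed value; tune for your run

for _ in range(10):
    x = torch.randn(8, 16, device="cuda")
    y = torch.randint(0, 10, (8,), device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(x), y)

    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)            # clip on the true (unscaled) gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    scaler.step(optimizer)
    scaler.update()
```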
