
loss is nan, stopping training! #6

Closed
Haoqing-Wang opened this issue Sep 25, 2022 · 2 comments

@Haoqing-Wang

Wonderful job! When I try to train Swin-B for 800 epochs, I run into this error: 'loss is nan, stopping training'. However, the logged loss values look fine. If I skip this check and keep training, the loss stays NaN forever. Do you have any suggestions for this problem? Thanks very much!
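
For context, this message most likely comes from a finite-loss guard in the per-iteration training loop, as in MAE-style pre-training code. A rough sketch of that kind of check (names are illustrative, not necessarily this repo's exact code):

```python
import math
import sys

# Inside the per-iteration training loop (hypothetical variable names):
loss_value = loss.item()
if not math.isfinite(loss_value):
    # The smoothed loss in the log can still look fine even though a
    # single iteration produced inf/NaN and triggered this exit.
    print("Loss is {}, stopping training".format(loss_value))
    sys.exit(1)
```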

@LayneH
Owner

LayneH commented Sep 26, 2022

Hi, there can be multiple reasons for the loss becoming NaN, e.g., AMP.
I did not encounter this problem in my pre-training experiments, so I am not sure what the cause is.
Similar issues have been raised in the MAE repo; you might want to check whether they help (facebookresearch/mae#42).
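
For anyone hitting this later, a minimal sketch of two things that are usually worth trying first when AMP is suspected: run the forward/backward in full FP32 to rule AMP out, and clip gradients before the optimizer step. This is a hypothetical loop for illustration, not the repo's actual training code; `model`, `data_loader`, and `optimizer` come from your own setup, and the model is assumed to return its loss directly, as in MAE.

```python
import torch

def train_one_epoch_sketch(model, data_loader, optimizer, device, use_amp=True):
    """Hypothetical sketch: keep AMP but clip gradients, or pass
    use_amp=False to check whether AMP itself causes the NaNs."""
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    model.train()
    for samples, _ in data_loader:
        samples = samples.to(device, non_blocking=True)
        optimizer.zero_grad()

        # Disabling autocast runs the forward/backward in full FP32,
        # which is the quickest way to see whether AMP is the culprit.
        with torch.cuda.amp.autocast(enabled=use_amp):
            loss = model(samples)  # assumes the model returns its loss

        scaler.scale(loss).backward()

        # Unscale first so the clip threshold applies to the true gradients;
        # exploding gradients are another common source of NaNs.
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

        # GradScaler skips the optimizer step automatically when it finds
        # inf/NaN gradients, instead of letting them corrupt the weights.
        scaler.step(optimizer)
        scaler.update()
```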

@Haoqing-Wang
Author

Thanks a lot!

@LayneH closed this as completed Nov 1, 2022