
using Python 3.7.6 and PyTorch 1.4.0 loss is always NaN #2

Closed
skariel opened this issue May 16, 2020 · 9 comments

@skariel

skariel commented May 16, 2020

Haven't tested on Python 2

@HobbitLong (Owner)

My environment:

Python 3.6.9 (default, Nov  7 2019, 10:44:02)
>>> torch.__version__
'1.2.0'

@HobbitLong (Owner)

I will close this for now; feel free to reopen.

@BestJuly

BestJuly commented Jun 3, 2020

I also ran into the same problem on one server (Python 3.7.4, PyTorch 1.2.0) but did not have it on another (Python 3.7.4, PyTorch 1.3.0), so I suspect it is caused by a difference between the environments (but why?).
A strange thing is that in the very first iteration, before the loss is even calculated, some extracted features are already NaN. I have not figured out why, or how to solve it, and for now I am running on the server that works. One workaround may be to set up the same experimental environment on both machines.
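
For reference, a quick way to confirm this kind of failure is to check the encoder output for NaNs before it ever reaches the loss (a generic PyTorch check, not code from this repository; the features tensor here is a stand-in):

import torch

features = torch.randn(32, 128)  # stand-in for the extracted features

# Checking the features before the loss separates an encoder-side
# failure from a loss-side one.
if torch.isnan(features).any():
    raise RuntimeError("NaN detected in extracted features")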

@HobbitLong (Owner)

@BestJuly ,

I added a fix 4 days ago to improve the numerical stability. Can you try again now?
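
For context, a standard stability fix for this kind of contrastive loss is the log-sum-exp trick: subtract the row-wise maximum from the logits before exponentiating. Whether this matches the pushed fix is an assumption; anchor_dot_contrast stands in for the temperature-scaled similarity matrix:

import torch

anchor_dot_contrast = torch.randn(8, 8) / 0.07  # stand-in similarity matrix

# Subtracting the per-row max keeps exp() from overflowing; the shift
# cancels in the softmax, so the loss value is mathematically unchanged.
logits_max, _ = torch.max(anchor_dot_contrast, dim=1, keepdim=True)
logits = anchor_dot_contrast - logits_max.detach()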

@BestJuly

@HobbitLong Thank you for your kindness. The problem vanished several days ago, though I do not know why. When I use the newest version of your code, it also runs well.

By the way, I think in the newest version a .cuda() call is missing in main_supcon.py.
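
For illustration, the missing call presumably looks something like the standard PyTorch device transfer below; the names model and criterion are assumptions, not the repository's exact code:

import torch

def move_to_gpu(model, criterion):
    # Hypothetical sketch: move model parameters and loss buffers to the GPU.
    if torch.cuda.is_available():
        model = model.cuda()
        criterion = criterion.cuda()
        torch.backends.cudnn.benchmark = True
    return model, criterion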

@HobbitLong (Owner)

@BestJuly,

Ah, nice catch! Just pushed a fix. Thanks for spotting this!

@shijianjian

To me, the problem lies in:

exp_logits = torch.exp(logits) * logits_mask
log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True))

My exp_logits occasionally becomes 0, which causes an undefined log result (-inf). So I simply added 1e-20 inside the log.
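
In code, the workaround described above would look like the sketch below; the epsilon value and its placement are the commenter's choice, not the repository's committed fix:

import torch

# Minimal stand-ins for the tensors in the quoted snippet.
logits = torch.randn(8, 8)
logits_mask = torch.ones(8, 8) - torch.eye(8)

# Adding a tiny epsilon inside the log prevents log(0) = -inf when a
# masked row of exp_logits sums to zero.
exp_logits = torch.exp(logits) * logits_mask
log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True) + 1e-20)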

@shaunakjoshi12

(quoting @shijianjian's comment above about adding 1e-20 inside the log)

Thanks man, you saved my day!

@HenryPengZou

HenryPengZou commented Sep 27, 2021

One way to avoid exp_logits becoming 0 (and the loss becoming NaN) is to L2-normalize your feature vectors before passing them to the loss function. Hope it helps!
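
A minimal sketch of that normalization, assuming features shaped [batch_size, n_views, feat_dim] as SupConLoss expects (the random tensor stands in for real encoder outputs):

import torch
import torch.nn.functional as F

features = torch.randn(32, 2, 128)  # [batch_size, n_views, feat_dim]

# L2-normalize along the feature dimension so every embedding has unit
# norm; this bounds the dot-product logits and avoids exp() overflow.
features = F.normalize(features, dim=2)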

This issue was closed.