
using Python 3.7.6 and PyTorch 1.4.0 loss is always NaN #2

Closed
skariel opened this issue May 16, 2020 · 9 comments

@skariel

skariel commented May 16, 2020

Haven't tested on Python 2

@HobbitLong (Owner)

My environment:

Python 3.6.9 (default, Nov  7 2019, 10:44:02)
>>> torch.__version__
'1.2.0'

@HobbitLong (Owner)

I will close this for now; feel free to reopen.

@BestJuly

BestJuly commented Jun 3, 2020

I also ran into the same problem on one server (Python 3.7.4, PyTorch 1.2.0) but did not have it on another (Python 3.7.4, PyTorch 1.3.0), so I suspect it is caused by a difference between the environments (but why?).
A strange thing is that in the very first iteration, before the loss is even calculated, some extracted features are already NaN. I have not figured out why, or how to solve it, and for now I am running on the server that works. One workaround may be to set up the same experimental environment on both machines.
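
For reference, a quick way to confirm this kind of failure is to check the encoder output for NaNs before it ever reaches the loss (a generic PyTorch check, not code from this repository; the features tensor here is a stand-in):

import torch

features = torch.randn(32, 128)  # stand-in for the extracted features

# Checking the features before the loss separates an encoder-side
# failure from a loss-side one.
if torch.isnan(features).any():
    raise RuntimeError("NaN detected in extracted features")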

@HobbitLong (Owner)

@BestJuly ,

I added a fix 4 days ago to improve the numerical stability. Can you try again now?
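
For context, a standard stability fix for this kind of contrastive loss is the log-sum-exp trick: subtract the row-wise maximum from the logits before exponentiating. Whether this matches the pushed fix is an assumption; anchor_dot_contrast stands in for the temperature-scaled similarity matrix:

import torch

anchor_dot_contrast = torch.randn(8, 8) / 0.07  # stand-in similarity matrix

# Subtracting the per-row max keeps exp() from overflowing; the shift
# cancels in the softmax, so the loss value is mathematically unchanged.
logits_max, _ = torch.max(anchor_dot_contrast, dim=1, keepdim=True)
logits = anchor_dot_contrast - logits_max.detach()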

@BestJuly

@HobbitLong Thank you for your kindness. The problem vanished several days ago, though I do not know why. When I use the newest version of your code, it also runs well.

By the way, I think in the newest version a .cuda() call is missing in main_supcon.py.
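
For illustration, the missing call presumably looks something like the standard PyTorch device transfer below; the names model and criterion are assumptions, not the repository's exact code:

import torch

def move_to_gpu(model, criterion):
    # Hypothetical sketch: move model parameters and loss buffers to the GPU.
    if torch.cuda.is_available():
        model = model.cuda()
        criterion = criterion.cuda()
        torch.backends.cudnn.benchmark = True
    return model, criterion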

@HobbitLong (Owner)

@BestJuly,

Ah, nice catch! Just pushed a fix. Thanks for spotting this!

@shijianjian

To me, the problem lies in:

exp_logits = torch.exp(logits) * logits_mask
log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True))

My exp_logits occasionally becomes 0, which causes an undefined log result (-inf). So I simply added 1e-20 inside the log.
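
In code, the workaround described above would look like the sketch below; the epsilon value and its placement are the commenter's choice, not the repository's committed fix:

import torch

# Minimal stand-ins for the tensors in the quoted snippet.
logits = torch.randn(8, 8)
logits_mask = torch.ones(8, 8) - torch.eye(8)

# Adding a tiny epsilon inside the log prevents log(0) = -inf when a
# masked row of exp_logits sums to zero.
exp_logits = torch.exp(logits) * logits_mask
log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True) + 1e-20)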

@shaunakjoshi12

(quoting @shijianjian's comment above about adding 1e-20 inside the log)

Thanks man, you saved my day!

@HenryPengZou

HenryPengZou commented Sep 27, 2021

One way to avoid exp_logits becoming 0 (and the loss becoming NaN) is to L2-normalize your feature vectors before passing them to the loss function. Hope it helps!
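
A minimal sketch of that normalization, assuming features shaped [batch_size, n_views, feat_dim] as SupConLoss expects (the random tensor stands in for real encoder outputs):

import torch
import torch.nn.functional as F

features = torch.randn(32, 2, 128)  # [batch_size, n_views, feat_dim]

# L2-normalize along the feature dimension so every embedding has unit
# norm; this bounds the dot-product logits and avoids exp() overflow.
features = F.normalize(features, dim=2)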

This issue was closed.