About loss being NaN, lr_scheduler.step(), optimizer.step() #116

Open · FlyDre opened this issue Mar 23, 2022 · 2 comments

FlyDre commented Mar 23, 2022

I've read issue #44. Like that case, I changed the ResNet50 backbone to a different one.
So I checked the link you mentioned:
https://discuss.pytorch.org/t/optimizer-step-before-lr-scheduler-step-error-using-gradscaler/92930/7

I therefore changed the code as shown below:

[image: change]

But all three losses are still NaN, and the warning "UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate" still appears.
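
For context, a minimal sketch of the standard torch.cuda.amp loop (placeholder names only; not this repo's exact training code) shows where that warning comes from:

```python
import torch

# `model`, `loader`, `compute_losses`, `optimizer`, and `scheduler` are
# placeholders standing in for the repo's own objects.
scaler = torch.cuda.amp.GradScaler()

for data in loader:
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        out = model(data)
        loss = compute_losses(out, data)  # stand-in for the three losses

    scaler.scale(loss).backward()
    # scaler.step() silently skips the underlying optimizer.step() whenever it
    # detects inf/NaN gradients (e.g. while the loss scale is still being
    # calibrated), so the scheduler.step() below then triggers the
    # "lr_scheduler.step() before optimizer.step()" warning.
    scaler.step(optimizer)
    scaler.update()
    scheduler.step()
```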

[image: problem]

Is this normal or do I need to modify losses.py?

Thank you.

hkchengrex (Owner) commented

The warning can be ignored; it doesn't matter.
I think the problem is in the backbone (I see that you are using ConvNeXt). I've also tried ConvNeXt, but:

1. It gives NaN unless I turn off AMP (--no_amp); see the sketch below.
2. Even when it works, it converges much more slowly than ResNet50.

I would love to learn why, or whether you have a solution.
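
To help isolate the problem, here is a hedged sketch of how an AMP on/off switch is usually wired (the repo's actual --no_amp handling may differ; `args`, `model`, `loader`, `compute_losses`, and `optimizer` are placeholders), with a quick check that distinguishes AMP-induced NaNs from a genuinely diverging loss:

```python
import torch

# If --no_amp is set, run everything in full precision.
use_amp = not args.no_amp

# GradScaler(enabled=False) becomes a no-op, so the same loop works either way.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for data in loader:
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(enabled=use_amp):
        out = model(data)
        loss = compute_losses(out, data)

    # A non-finite loss that only appears with AMP enabled points at fp16
    # overflow/underflow in the backbone rather than a bug in losses.py.
    if not torch.isfinite(loss):
        print("non-finite loss:", loss.item())

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```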

FlyDre commented Mar 24, 2022

Thanks for replying. I'll update this issue if I figure it out.
