
MAE finetune train loss nan #42
Open

CodingMice opened this issue Feb 16, 2022 · 5 comments

@CodingMice
info : Loss is nan, stopping training.

@Jeff-LiangF

Hey @CodingMice,

It might be due to amp.autocast(). Disabling it via amp.autocast(enabled=False) solved the problem for me.
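For context, a minimal sketch of where that flag goes in a typical AMP finetuning loop. The model, data, and loss here are illustrative stand-ins, not taken from the MAE code, and a CUDA device is assumed:

```python
import torch
import torch.nn as nn

# Illustrative model and data; the real script would use the MAE ViT and its data loader.
model = nn.Linear(16, 10).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for _ in range(10):
    x = torch.randn(8, 16, device="cuda")
    y = torch.randint(0, 10, (8,), device="cuda")

    optimizer.zero_grad()
    # enabled=False turns mixed precision off, so the forward pass runs in fp32.
    with torch.cuda.amp.autocast(enabled=False):
        loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```

Note that this trades the memory and speed benefits of mixed precision for fp32 numerical stability.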

@endernewton
Contributor

It would be great if more context were provided here. There are multiple ways the loss can go to NaN, and AMP can indeed be one of them.

@daisukelab

daisukelab commented Jul 28, 2022

FYI - this PyTorch issue thread, which has a long history, could be a hint:
pytorch/pytorch#40497

And here is the troubleshooting guide for this problem (also suggested in that thread):
https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html#loss-is-inf-nan

In any case, a quick fix is the one suggested by @Jeff-LiangF above.
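For reference, the recipe linked above keeps autocast enabled but wraps the backward/step in a GradScaler, which inspects the gradients each iteration and skips the optimizer update when they contain inf/NaN. A minimal sketch, with illustrative names and a CUDA device assumed:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 10).cuda()          # stand-in for the finetuned ViT
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(8, 16, device="cuda")
    y = torch.randint(0, 10, (8,), device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(x), y)

    scaler.scale(loss).backward()
    scaler.step(optimizer)   # the step is skipped if the unscaled grads contain inf/NaN
    scaler.update()          # adjusts the loss scale for the next iteration
```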

@exx8

exx8 commented Nov 25, 2022

I've faced the same issue.
Remarkably, using gradient clipping solved the issue and also improved the results.

@CharlesChen24

> I've faced the same issue. Remarkably, using gradient clipping solved the issue and also improved the results.

How do you set the value for gradient clipping? 0.1?
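For what it's worth, a common pattern is clipping by global norm after unscaling the gradients; max-norm values around 1.0 are a typical default, while 0.1 clips much harder, and the best value would need to be confirmed empirically for this setup. A minimal sketch, with illustrative names not taken from the MAE scripts:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 10).cuda()          # stand-in for the finetuned ViT
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()
max_norm = 1.0                            # assumed value; tune for your run

for _ in range(10):
    x = torch.randn(8, 16, device="cuda")
    y = torch.randint(0, 10, (8,), device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(x), y)

    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)            # clip on the true (unscaled) gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    scaler.step(optimizer)
    scaler.update()
```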
