
About step_loss of version2 #46

Closed
suzhenghang opened this issue Apr 10, 2023 · 4 comments

Comments

@suzhenghang

During the training of version2, the step loss easily becomes NaN, even if the learning rate is lowered. Have you encountered this issue before?

@ExponentialML
Owner

@suzhenghang No, I haven't. What kind of setup are you running (GPU, CPU, Python version, etc.)?

@suzhenghang
Author

suzhenghang commented Apr 10, 2023

Thanks, I solved this issue by disabling xformers during training. In the previous v1 version, I had it enabled. Reference link
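For anyone looking for where this toggle lives, here is a minimal sketch in a diffusers-style script. The `enable_xformers` flag and the small 2D UNet are illustrative stand-ins, not this repository's actual code; the point is that xformers attention is opt-in and can simply be left off.

```python
import torch
from diffusers import UNet2DConditionModel

# Tiny randomly initialized UNet as a stand-in for the repo's video model.
unet = UNet2DConditionModel(
    sample_size=32,
    block_out_channels=(32, 64),
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
    cross_attention_dim=64,
)

enable_xformers = False  # leaving this off avoided the NaN step loss above
if enable_xformers:
    # Requires the xformers package; skipped entirely when the flag is off.
    unet.enable_xformers_memory_efficient_attention()
```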

@ExponentialML
Owner

> Thanks, I solved this issue by disabling xformers during training. In the previous v1 version, I had it enabled. Reference link

Glad you solved it! I have tried xformers with Torch 2.0, and while it does work without NaN loss, I don't see any immediate improvement. If you ever want to try it, it should work.

@Rbrq03

Rbrq03 commented Oct 31, 2023

I encountered this issue as well. It appears to be a problem with version2 when using fp16. Disabling mixed precision resolves it, though I'm not certain of the exact cause.

@ExponentialML If you're looking to address this issue, I'd be happy to provide more information to help you reproduce the bug. For anyone else facing this problem, disabling mixed precision might be the best solution.
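For anyone unsure what "disabling mixed precision" looks like in practice, here is a minimal sketch of a plain PyTorch AMP training loop with a finiteness guard. The toy model and random data are placeholders; only the `use_amp` toggle and the `torch.isfinite` check carry the point. It assumes a CUDA device, as in the reports above.

```python
import torch
import torch.nn as nn

# Toy stand-in for the real video UNet; the guard logic is what matters.
model = nn.Linear(8, 1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

use_amp = False  # fp16 autocast was implicated in the NaN losses above
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for step in range(10):
    x = torch.randn(4, 8, device="cuda")
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16, enabled=use_amp):
        loss = model(x).pow(2).mean()
    # Fail fast instead of silently training on NaN/Inf losses.
    if not torch.isfinite(loss):
        raise RuntimeError(f"Non-finite step loss at step {step}: {loss.item()}")
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```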
