About step_loss of version2 #46
Comments
@suzhenghang No, I haven't. What kind of setup are you running on (GPU, CPU, Python version, etc.)?
Thanks, I solved this issue by disabling xformers during training. In the previous v1, I had it enabled. Reference link
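In diffusers-style training scripts, xformers is typically turned on with a single call (`enable_xformers_memory_efficient_attention` is the actual diffusers method name); disabling it usually just means guarding that call behind a flag. A minimal sketch, where `configure_attention` and the surrounding scaffolding are hypothetical:

```python
def configure_attention(unet, use_xformers: bool) -> str:
    """Enable memory-efficient attention only when explicitly requested.

    Several reports in this thread tie xformers (with fp16) to NaN
    step loss in version2, so exposing it as an opt-in flag makes it
    easy to toggle off while debugging.
    """
    if use_xformers:
        try:
            import xformers  # noqa: F401 -- only checks availability
        except ImportError:
            return "default"  # xformers not installed; fall back
        unet.enable_xformers_memory_efficient_attention()
        return "xformers"
    return "default"  # leave the model on its default attention
```

With `use_xformers=False` the model is simply never switched over, which matches the workaround described above.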
Glad you solved it! I have tried xformers with Torch 2.0, and while it does work without NaN loss, I don't see any noticeable improvement. If you ever think about trying it, it should work.
I encountered this issue as well. It appears to be a problem with version2 when using fp16; disabling mixed precision resolves it, though I'm not certain of the exact cause. @ExponentialML If you're looking to address this, I'd be happy to provide more information to help you reproduce the bug. For anyone else facing this problem, disabling mixed precision may be the best workaround for now.
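If the training script is launched with Hugging Face Accelerate, mixed precision can be disabled from the command line without touching the code (`train.py` and its arguments are illustrative; `--mixed_precision` is a real Accelerate launch flag):

```shell
# Force full fp32 training to avoid the fp16 NaN step loss.
accelerate launch --mixed_precision="no" train.py
```

Equivalently, setting `mixed_precision: "no"` in the Accelerate config file (via `accelerate config`) has the same effect.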
During the training of version2, the step loss easily becomes NaN, even if the learning rate is lowered. Have you encountered this issue before?
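While debugging, a lightweight guard is to detect a non-finite step loss and skip that update instead of letting one overflowed batch poison the weights. A minimal pure-Python sketch (`should_skip_step` and the loop snippet are hypothetical, not part of the repo):

```python
import math

def should_skip_step(loss_value: float) -> bool:
    """Return True when the step loss is NaN or Inf.

    A non-finite loss is the usual symptom of fp16 overflow in
    mixed-precision training; skipping the optimizer step keeps
    a single bad batch from corrupting the model.
    """
    return not math.isfinite(loss_value)

# Inside a training loop one might do (optimizer handling elided):
#   if should_skip_step(loss.item()):
#       optimizer.zero_grad(set_to_none=True)
#       continue
```

This only masks the symptom; disabling xformers or mixed precision, as discussed above, addresses the trigger itself.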