Fine-tuning DeepSeek_R1_distill_Qwen_7B on 8 V100 GPUs: loss is always 0 and grad_norm is NaN #7637
Unanswered
KnightKKOP
asked this question in
Q&A
Replies: 1 comment 1 reply
Are you using DeepSpeed ZeRO-3 (z3)? Try reducing the bucket size and the batch size, and increase gradient accumulation.
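As a rough illustration of the suggestion above, here is a minimal sketch that builds a DeepSpeed ZeRO-3 config with a small micro-batch, more gradient accumulation, and reduced bucket sizes. The helper name `build_zero3_config` and all numeric values are assumptions chosen for illustration, not settings taken from this thread; the config keys are standard DeepSpeed options, but suitable values depend on your model and memory budget.

```python
import json


def build_zero3_config(num_gpus=8, micro_batch_per_gpu=1,
                       grad_accum_steps=16, bucket_size=5e7):
    """Hypothetical helper: return a DeepSpeed ZeRO-3 config dict.

    All default values are illustrative assumptions, not tuned settings.
    """
    bucket = int(bucket_size)  # bucket sizes are counted in elements
    return {
        # Small per-GPU batch, compensated by more accumulation steps.
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        # DeepSpeed expects: micro batch * accumulation steps * GPU count.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * num_gpus,
        "zero_optimization": {
            "stage": 3,
            # Smaller buckets than a typical 5e8 setting.
            "reduce_bucket_size": bucket,
            "stage3_prefetch_bucket_size": bucket,
        },
    }


config = build_zero3_config()
print(json.dumps(config, indent=2))
```

The printed JSON can be saved to a file and passed to the training framework as its DeepSpeed config. Keeping the effective batch size fixed while shrinking the micro-batch is one way to test whether the zero loss is related to memory or communication settings rather than to the data.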
1 reply
The problem occurs with both FP16 and FP32. Could anyone experienced take a look?