
the gradient of all parameters is None #283

Closed
nankepan opened this issue Apr 16, 2024 · 5 comments
Labels: help wanted (Extra attention is needed)

Comments

@nankepan commented Apr 16, 2024

[screenshot: the code location where param.grad is printed]
Hi,
I print param.grad here and find that the gradient of all parameters is None. Is this caused by using ColossalAI? How can I obtain the gradients of the parameters? Thank you.

@JThh (Collaborator) commented Apr 16, 2024

It should only be None right after optimizer.zero_grad(); booster.backward essentially calls the wrapped optimizer's backward(loss) for you. Would you mind printing the contents of loss to see whether it is NaN?
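
For anyone hitting the same thing, here is a minimal sketch of the check suggested above, assuming a standard ColossalAI Booster training loop; `model`, `criterion`, `inputs`, and `targets` are placeholder names, not taken from the original report:

```python
import torch

def train_step(booster, model, optimizer, criterion, inputs, targets):
    """One training step with the loss sanity check suggested above."""
    outputs = model(inputs)
    loss = criterion(outputs, targets)

    # Inspect the loss before running backward.
    print("loss:", loss.item())
    if torch.isnan(loss) or torch.isinf(loss):
        raise RuntimeError("loss is NaN/Inf; gradients would be unusable")

    booster.backward(loss, optimizer)  # backward goes through the booster, not loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss
```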

@nankepan (Author) commented Apr 17, 2024

> It should only be None right after optimizer.zero_grad(); booster.backward essentially calls the wrapped optimizer's backward(loss) for you. Would you mind printing the contents of loss to see whether it is NaN?

Thanks for the reply. The loss is normal, but the gradient is None even before optimizer.zero_grad(), which is strange.
I trained the model; the loss decreased steadily and model performance kept improving, so the None gradients are confusing.

@zhengzangw (Collaborator)

This is because ColossalAI manages the gradients itself, so you cannot access them directly via param.grad. @ver217 Could you please help with this?

zhengzangw added the "help wanted" label on May 10, 2024
@ver217 (Member) commented Jun 24, 2024

Hi, gradients are managed by the zero optimizer and p.grad is None; this is expected behavior. If you want to check the grads manually, you can refer to https://github.com/hpcaitech/ColossalAI/blob/7f8b16635b42013b73e1cb1ffdebc07b4d71ac93/tests/test_zero/test_low_level/test_zero1_2.py#L164
Note that the grads are sharded and flat.
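
For readers who do not want to dig through the link, the pattern in that test looks roughly like the sketch below. The internal names (`_grad_store`, `get_partitioned_gradients_by_param_id`) are an assumption based on that test and may differ across ColossalAI versions, so treat the linked file as the authoritative reference, not this sketch.

```python
# Rough sketch of reading ZeRO-managed gradients after backward, following the
# pattern in the linked test. ASSUMPTION: the low-level zero optimizer exposes
# `_grad_store.get_partitioned_gradients_by_param_id(group_id, param_id)`;
# verify against the linked test for your ColossalAI version.
def dump_sharded_grads(optimizer):
    for group_id, group in enumerate(optimizer.param_groups):
        for param in group["params"]:
            shards = optimizer._grad_store.get_partitioned_gradients_by_param_id(group_id, id(param))
            # Each rank holds only its own flat shard(s); shapes will not match param.shape.
            print(tuple(param.shape), [tuple(s.shape) for s in shards])

# Usage: call right after booster.backward(loss, optimizer) and before optimizer.step().
```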

@281LinChenjian

I also need to extract p.grad for subsequent calculations. Is there any way to get p.grad correctly? I have read the code above but still don't know how to do it.

> Hi, gradients are managed by the zero optimizer and p.grad is None; this is expected behavior. If you want to check the grads manually, you can refer to https://github.com/hpcaitech/ColossalAI/blob/7f8b16635b42013b73e1cb1ffdebc07b4d71ac93/tests/test_zero/test_low_level/test_zero1_2.py#L164 Note that the grads are sharded and flat.
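
If you only need per-parameter gradients for your own computation (rather than the exact values ZeRO stores), one version-agnostic workaround, sketched below, is to register plain PyTorch tensor hooks that snapshot each local gradient as autograd produces it, before the zero optimizer takes ownership. This is not a ColossalAI API: whether the hooks fire depends on how the chosen plugin wraps the model, and the captured values are the unreduced local gradients on each rank, so verify it on your setup. `model`, `booster`, `optimizer`, and `loss` are placeholders for your own objects.

```python
# Workaround sketch (plain PyTorch, not a ColossalAI API): capture each
# parameter's local gradient via tensor hooks during backward. Note these are
# the unreduced local grads on this rank, not the reduced/sharded ZeRO grads.
import torch

captured_grads = {}

def _make_hook(name):
    def hook(grad):
        captured_grads[name] = grad.detach().clone()
        return grad  # leave the gradient itself unchanged
    return hook

for name, param in model.named_parameters():  # `model` is the boosted model
    if param.requires_grad:
        param.register_hook(_make_hook(name))

# ... later, inside the training loop:
booster.backward(loss, optimizer)  # hooks fire during this backward pass
print({name: g.norm().item() for name, g in captured_grads.items()})
```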
