gradient scaling in fp16 training #44

luguansong · 2022-06-28T12:48:43Z

Hi,

I would like to first thank you for open-sourcing your code for the community.

While using the code for fp16 (or amp) training, I found something confusing in fp16_util.py (https://github.com/openai/guided-diffusion/blob/main/guided_diffusion/fp16_util.py#L202): why is the gradient only scaled for scalar parameters? What about the gradients of matrix parameters?

Sincerely looking forward to your reply.

unixpickle · 2022-07-13T23:16:35Z

This actually does look like a bug. I don't know whether it was introduced when porting our code over to this public repo, or whether it also affected our own experiments. I will have to look into it.
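For readers following along, here is a minimal sketch of the concern raised above. It is not the repository's code: the function names, the plain-list gradients, and the two-group layout (group 0 for scalar parameters, group 1 for matrix parameters) are assumptions made for illustration. With loss scaling, the loss is multiplied by `2 ** lg_loss_scale` before the backward pass, so every gradient comes out multiplied by the same factor and every gradient has to be divided by it again before the optimizer step.

```python
# Illustrative sketch only. The names and the two-group layout are
# hypothetical and are not taken from guided_diffusion/fp16_util.py.


def unscale_first_group_only(grad_groups, lg_loss_scale):
    """Suspected buggy pattern: only group 0 is divided by the loss scale."""
    inv_scale = 1.0 / (2.0 ** lg_loss_scale)
    return [
        [g * inv_scale for g in group] if i == 0 else list(group)
        for i, group in enumerate(grad_groups)
    ]


def unscale_all_groups(grad_groups, lg_loss_scale):
    """Expected pattern: every group is divided by the loss scale."""
    inv_scale = 1.0 / (2.0 ** lg_loss_scale)
    return [[g * inv_scale for g in group] for group in grad_groups]


lg_loss_scale = 4  # loss scale = 2 ** 4 = 16

# Assumed layout: group 0 = scalar parameters, group 1 = matrix parameters.
true_grads = [[0.5, -0.25], [1.0, 2.0]]

# Backpropagating a scaled loss multiplies every gradient by the loss scale.
scaled_grads = [[g * 2.0 ** lg_loss_scale for g in group] for group in true_grads]

buggy = unscale_first_group_only(scaled_grads, lg_loss_scale)
fixed = unscale_all_groups(scaled_grads, lg_loss_scale)

print(buggy)  # group 1 is still 16x too large
print(fixed)  # matches true_grads
```

If only the first group is unscaled, the remaining gradients reach the optimizer inflated by the loss scale, which could plausibly lead to unstable training.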


unixpickle mentioned this issue Jul 13, 2022

nan occurs when training ImageNet 128x128 #50

Open

unixpickle added a commit that referenced this issue Jul 13, 2022

attempt to fix optimizer state

6911960

fixes #44

unixpickle mentioned this issue Jul 13, 2022

attempt to fix fp16 #52

Merged

unixpickle closed this as completed in #52 Jul 15, 2022

unixpickle added a commit that referenced this issue Jul 15, 2022

fix for fp16 loss scaling (#52)

22e0df8

fixes #44
