
Purpose of Gradient Scaling #100

Closed
low5545 opened this issue Nov 4, 2022 · 2 comments

Comments

@low5545

low5545 commented Nov 4, 2022

Gradient scaling is used in train_mlp_nerf.py, train_ngp_nerf.py and train_mlp_dnerf.py without autocasting. Moreover, gradient unscaling is not performed before optimizer.step(). Hence, there isn't any automatic mixed precision training here.

Thus, what's the purpose of the gradient scaling?
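For context, the pattern being asked about looks roughly like the sketch below. This is a minimal, self-contained approximation, not the actual training scripts; the model, data, and hyperparameters are placeholders.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder model and data standing in for the NeRF pipeline.
model = torch.nn.Linear(3, 1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
grad_scaler = torch.cuda.amp.GradScaler(init_scale=2**10, enabled=(device == "cuda"))

x, y = torch.randn(8, 3, device=device), torch.randn(8, 1, device=device)

# Forward pass in full fp32 -- note there is no torch.autocast() context.
loss = torch.nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
grad_scaler.scale(loss).backward()  # gradients end up multiplied by the scale factor
optimizer.step()                    # no grad_scaler.unscale_() / grad_scaler.step() before this
```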

@liruilong940607
Collaborator

liruilong940607 commented Nov 4, 2022

Hi, it is just for scaling up the loss, not for mixed precision.

The reason we do this is kinda tricky -- we found that the gradients of the network parameters can sometimes be super small when using tiny-cuda-nn -- on the order of 1e-17. And when using Adam as the optimizer, it computes grad ** 2, which gets flushed to zero due to the floating-point precision limit.

Simply scaling up the loss will scale up the gradients and avoid this issue. Also, since Adam is not sensitive to the scale of the gradients, scaling up won't affect the optimization at all, so there is no need to scale it back.
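For intuition: Adam's step is proportional to m_hat / (sqrt(v_hat) + eps), so a constant factor c on the gradient shows up in both m_hat and sqrt(v_hat) and cancels, up to the eps term. A quick illustrative sketch of both points (squaring a small gradient underflowing at low precision, and Adam's near scale-invariance), using standard PyTorch rather than the actual nerfacc code; the numbers are only for illustration:

```python
import torch

# 1) Squaring a small gradient can underflow at low precision:
#    1e-5 is representable in fp16, but its square (1e-10) flushes to zero.
g = torch.tensor(1e-5, dtype=torch.float16)
print(g * g)  # tensor(0., dtype=torch.float16)

# 2) Adam is (almost) insensitive to a constant scale on the loss/gradients:
#    training the same model on a loss scaled by 1024 gives nearly identical weights.
def train(scale):
    torch.manual_seed(0)  # identical init and data for both runs
    model = torch.nn.Linear(3, 1)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    x, y = torch.randn(8, 3), torch.randn(8, 1)
    for _ in range(10):
        loss = scale * torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model.weight.detach().clone()

print(torch.allclose(train(1.0), train(1024.0), atol=1e-5))  # True (up to the eps term)
```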

@low5545
Author

low5545 commented Nov 4, 2022

Thanks for the explanation!

low5545 closed this as completed Nov 4, 2022