Gradient scaling is used in train_mlp_nerf.py, train_ngp_nerf.py, and train_mlp_dnerf.py without autocasting. Moreover, the gradients are never unscaled before optimizer.step(). Hence there is no automatic mixed-precision training here.
What, then, is the purpose of the gradient scaling?
Hi, it is just for scaling up the loss, not for mixed precision.
The reason we do this is a bit tricky: we found that the gradients of the network parameters can sometimes be extremely small when using tiny-cuda-nn (on the order of 1e-17). When Adam is the optimizer, it computes grad ** 2, which gets flushed to zero due to the limits of floating-point precision.
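As a minimal illustration of the flush (the 1e-4 gradient and the 1024 scale are made-up values; I'm using half precision here since tiny-cuda-nn keeps much of its state in fp16, where squaring underflows far earlier than in fp32):

```python
import torch

# A small but representable fp16 gradient whose square underflows:
g = torch.tensor(1e-4, dtype=torch.float16)
print(g * g)  # tensor(0., dtype=torch.float16) -- 1e-8 is below
              # fp16's smallest subnormal (~6e-8), so it flushes to zero

# Scaling the gradient first keeps the square representable:
scale = 1024.0
print((g * scale) ** 2)  # roughly tensor(0.0105, dtype=torch.float16)
```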
Simply scaling up the loss scales up the gradients and avoids this issue. And since Adam is not sensitive to the scale of the gradients, the scaling does not affect the optimization at all, so there is no need to scale it back.
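In code, the idea boils down to something like the sketch below. The model, data, and scale factor are all hypothetical stand-ins, not values taken from the training scripts; the point is only that the loss gets a constant multiplier and the optimizer steps on the scaled gradients directly:

```python
import torch

# Hypothetical stand-ins for the NeRF network and a training batch.
model = torch.nn.Linear(16, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_scale = 1024.0  # assumed constant scale factor

x, target = torch.randn(8, 16), torch.randn(8, 3)

optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), target)
# Scale the loss so that tiny gradients survive Adam's grad ** 2 term.
(loss * loss_scale).backward()
# No unscaling before the step: Adam divides by the square root of the
# running mean of grad ** 2, so a constant factor on every gradient
# cancels out of the update (up to the eps term in the denominator).
optimizer.step()
```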