
Purpose of Gradient Scaling #100

Closed
low5545 opened this issue Nov 4, 2022 · 2 comments

Comments

@low5545

low5545 commented Nov 4, 2022

Gradient scaling is used in train_mlp_nerf.py, train_ngp_nerf.py and train_mlp_dnerf.py without autocasting. Moreover, gradient unscaling is not performed before optimizer.step(). Hence, there isn't any automatic mixed precision training here.

Thus, what's the purpose of the gradient scaling?
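For context, the pattern being asked about looks roughly like the sketch below. This is a minimal, self-contained approximation, not the actual training scripts; the model, data, and hyperparameters are placeholders.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder model and data standing in for the NeRF pipeline.
model = torch.nn.Linear(3, 1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
grad_scaler = torch.cuda.amp.GradScaler(init_scale=2**10, enabled=(device == "cuda"))

x, y = torch.randn(8, 3, device=device), torch.randn(8, 1, device=device)

# Forward pass in full fp32 -- note there is no torch.autocast() context.
loss = torch.nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
grad_scaler.scale(loss).backward()  # gradients end up multiplied by the scale factor
optimizer.step()                    # no grad_scaler.unscale_() / grad_scaler.step() before this
```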

@liruilong940607
Collaborator

liruilong940607 commented Nov 4, 2022

Hi, it is just for scaling up the loss, not for mixed precision.

The reason we do this is kinda tricky -- we found that the gradients of the network parameters can sometimes be super small when using tiny-cuda-nn -- on the order of 1e-17. And when using Adam as the optimizer, it computes grad ** 2, which gets flushed to zero due to the floating-point precision limit.

Simply scaling up the loss will scale up the gradients and avoid this issue. Also, since Adam is not sensitive to the scale of the gradients, scaling up won't affect the optimization at all, so there is no need to scale it back.
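For intuition: Adam's step is proportional to m_hat / (sqrt(v_hat) + eps), so a constant factor c on the gradient shows up in both m_hat and sqrt(v_hat) and cancels, up to the eps term. A quick illustrative sketch of both points (squaring a small gradient underflowing at low precision, and Adam's near scale-invariance), using standard PyTorch rather than the actual nerfacc code; the numbers are only for illustration:

```python
import torch

# 1) Squaring a small gradient can underflow at low precision:
#    1e-5 is representable in fp16, but its square (1e-10) flushes to zero.
g = torch.tensor(1e-5, dtype=torch.float16)
print(g * g)  # tensor(0., dtype=torch.float16)

# 2) Adam is (almost) insensitive to a constant scale on the loss/gradients:
#    training the same model on a loss scaled by 1024 gives nearly identical weights.
def train(scale):
    torch.manual_seed(0)  # identical init and data for both runs
    model = torch.nn.Linear(3, 1)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    x, y = torch.randn(8, 3), torch.randn(8, 1)
    for _ in range(10):
        loss = scale * torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model.weight.detach().clone()

print(torch.allclose(train(1.0), train(1024.0), atol=1e-5))  # True (up to the eps term)
```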

@low5545
Author

low5545 commented Nov 4, 2022

Thanks for the explanation!

low5545 closed this as completed Nov 4, 2022