
loss calculation in MultiGPULossCompute #23

Closed
mehdimashayekhi opened this issue Feb 14, 2019 · 1 comment
mehdimashayekhi commented Feb 14, 2019

Hi, thanks very much for sharing. I have a quick question about the loss calculation for multiple GPUs. I am not very familiar with PyTorch and am having a hard time understanding this line: o1.backward(gradient=o2). Why is backprop done on out (not on the loss), and why do we need to pass o2? Thanks for your answer.


ghost commented May 7, 2019

In MultiGPULossCompute, the model output out is detached and copied as a leaf node of a new computational graph, so inside this function the graph runs only from that copy of out (o1) to the loss. l.backward() therefore only computes the gradient of the loss with respect to the copy of out, and those gradients are gathered into o2, i.e. o2 = dl/do1. By the chain rule, dl/dw = dl/do1 * do1/dw, so calling o1.backward(gradient=o2) pushes dl/do1 back through the original model graph and the correct gradient ends up stored in w.grad.
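For anyone who lands here later, here is a minimal, self-contained sketch of the same two-stage backward pattern (this is not the repository's code; the linear model, shapes, and MSE loss are made up for illustration). The loss is computed on a detached leaf copy of the output, and the copy's gradient is then passed back into the model graph via out.backward(gradient=...), which is exactly what o1.backward(gradient=o2) does.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(4, 3)
x = torch.randn(2, 4)
target = torch.randn(2, 3)

# Stage 1: forward through the model (graph: w -> out)
out = model(x)                        # "o1": still attached to the model's graph

# Stage 2: compute the loss on a detached leaf copy (graph: out_copy -> loss)
out_copy = out.detach().requires_grad_(True)
loss = ((out_copy - target) ** 2).sum()
loss.backward()                       # fills out_copy.grad = dloss/d(out), i.e. "o2"

# Stage 3: chain rule, dloss/dw = dloss/d(out) * d(out)/dw
out.backward(gradient=out_copy.grad)  # the o1.backward(gradient=o2) step

# Reference: a single backward through an undetached graph gives the same grads
model_ref = nn.Linear(4, 3)
model_ref.load_state_dict(model.state_dict())
loss_ref = ((model_ref(x) - target) ** 2).sum()
loss_ref.backward()

print(torch.allclose(model.weight.grad, model_ref.weight.grad))  # True
```

The split buys nothing in this toy example, but in MultiGPULossCompute it lets the loss/generator part run in chunks and on multiple devices before the single backward pass through the transformer itself.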

srush closed this as completed on May 2, 2022.