Hi, thanks very much for sharing. I have a quick question about the loss calculation for multi-GPU. I am not very familiar with PyTorch and am having a hard time understanding this line: `o1.backward(gradient=o2)`. Why is backprop done on `out` (not on the loss), and why do we need to pass `o2`? Thanks for your answer.
In the MultiGPULoss function, the variable `out` is cloned as a leaf node of a new computational graph. That means that inside this function the graph starts at the clone of `out` (o1) and ends at the loss, so `l.backward()` only computes the gradient of the loss with respect to that clone, i.e. `o2 = dl/d(o1)`, and stops there without ever reaching the model weights. By the chain rule, `dl/dw = dl/d(o1) * d(o1)/dw`, so we still have to multiply `o2` by `d(o1)/dw`. Calling `o1.backward(gradient=o2)` does exactly that: it seeds the backward pass of the original graph with `o2`, and the correct gradient ends up accumulated in `w.grad`.
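Here is a minimal sketch of that two-stage backward, assuming the usual detach-and-clone setup; the model, loss, and tensor names are illustrative, not the repo's actual code:

```python
import torch
import torch.nn as nn

# Illustrative stand-ins: a tiny model and a loss that, in the real
# code, would be evaluated on other GPUs inside MultiGPULoss.
model = nn.Linear(4, 2)
criterion = nn.MSELoss()

x = torch.randn(8, 4)
target = torch.randn(8, 2)

# Forward pass on the model's graph.
out = model(x)  # o1 in the issue

# The output is cloned as a *leaf*, which cuts the graph:
# everything below builds a second, separate graph.
out_leaf = out.detach().clone().requires_grad_(True)
l = criterion(out_leaf, target)

# This backward stops at the leaf clone: it only fills
# out_leaf.grad with dl/d(out); it never reaches the weights.
l.backward()
o2 = out_leaf.grad  # o2 = dl/d(out)

# Chain rule by hand: dl/dw = dl/d(out) * d(out)/dw.
# Seeding backward with o2 continues backprop through the
# original graph, so the correct gradients land in w.grad.
out.backward(gradient=o2)
```

After the last call, `p.grad` for each `p` in `model.parameters()` matches what a single `criterion(model(x), target).backward()` would have produced.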