Hi, thanks very much for sharing. I have a quick question about the loss calculation for multi-GPU. I am not very familiar with PyTorch and am having a hard time understanding this line: `o1.backward(gradient=o2)`. Why is backprop done on `out` (not on the loss), and why do we need to pass `o2`? Thanks for your answer.
In the MultiGPULoss function, the variable `out` is cloned as a leaf node of a new computational graph. That means that inside this function the graph starts at the clone of `out` (o1) and ends at the loss, so `l.backward()` only computes the gradient of the loss with respect to that clone, i.e. `o2 = dl/d(o1)`, and stops there without ever reaching the model weights. By the chain rule, `dl/dw = dl/d(o1) * d(o1)/dw`, so we still have to multiply `o2` by `d(o1)/dw`. Calling `o1.backward(gradient=o2)` does exactly that: it seeds the backward pass of the original graph with `o2`, and the correct gradient ends up accumulated in `w.grad`.
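Here is a minimal sketch of that two-stage backward, assuming the usual detach-and-clone setup; the model, loss, and tensor names are illustrative, not the repo's actual code:

```python
import torch
import torch.nn as nn

# Illustrative stand-ins: a tiny model and a loss that, in the real
# code, would be evaluated on other GPUs inside MultiGPULoss.
model = nn.Linear(4, 2)
criterion = nn.MSELoss()

x = torch.randn(8, 4)
target = torch.randn(8, 2)

# Forward pass on the model's graph.
out = model(x)  # o1 in the issue

# The output is cloned as a *leaf*, which cuts the graph:
# everything below builds a second, separate graph.
out_leaf = out.detach().clone().requires_grad_(True)
l = criterion(out_leaf, target)

# This backward stops at the leaf clone: it only fills
# out_leaf.grad with dl/d(out); it never reaches the weights.
l.backward()
o2 = out_leaf.grad  # o2 = dl/d(out)

# Chain rule by hand: dl/dw = dl/d(out) * d(out)/dw.
# Seeding backward with o2 continues backprop through the
# original graph, so the correct gradients land in w.grad.
out.backward(gradient=o2)
```

After the last call, `p.grad` for each `p` in `model.parameters()` matches what a single `criterion(model(x), target).backward()` would have produced.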