Central to all neural networks in PyTorch is the __autograd__ package.

__autograd.Variable__ wraps a tensor that can be updated and differentiated automatically, like W and b in a neural network.

In [6]:
import torch

requires_grad=True means operations on this variable are tracked for the backward pass, so the result b has a grad_fn object; otherwise,
b.grad_fn is None.

In [11]:
a = torch.ones(2,2,requires_grad=True)
print(a)
b = a + 2
print(b, b.grad_fn)

# Without requires_grad nothing is tracked, so b1.grad_fn is None
a1 = torch.ones(2,2,requires_grad=False)
b1 = a1 + 2
print(b1, "grad of b1 is {}".format(b1.grad_fn))

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward>) <AddBackward object at 0x7f903f16e978>
tensor([[3., 3.],
        [3., 3.]]) grad of b1 is None


### Use Variable.backward() to compute the derivatives of the Variable

Here __b, c, out__ are all Variables.

.backward() can be called without arguments because out is a scalar.

Calling it starts backpropagation from this point in the network and computes derivatives all the way back through the graph.

See the example below:

If a Variable is a scalar (i.e. it holds a single element), you don’t need to specify any arguments to backward(). If it has more elements, you need to specify a gradient argument that is a tensor of matching shape.

In [8]:
c = b*b*3
out = c.mean()
print(c,out)

tensor([[27., 27.],
        [27., 27.]]) tensor(27.)


### Running __out.backward()__ computes every derivative of out

$\frac{\partial out}{\partial c}$,
$\frac{\partial out}{\partial b}$,
$\frac{\partial out}{\partial a}$,

### Reading a.grad, b.grad or c.grad should then give
$\frac{\partial out}{\partial a}$, $\frac{\partial out}{\partial b}$ or $\frac{\partial out}{\partial c}$ respectively,

but as the forum thread below says:

### Forum thread: https://discuss.pytorch.org/t/why-cant-i-see-grad-of-an-intermediate-variable/94
Gradients are only retained for leaf variables. Non-leaf variables’ gradients are not retained to be inspected later. This was done by design, to save memory.

So you cannot get b.grad and c.grad by default.

But a hook can capture them, as shown later in this section.

After ``out.backward()``, ``a.grad`` should be a matrix of ``4.5``. Let’s call the ``out``
*Variable* “$o$”.
We have that $o = \frac{1}{4}\sum_i c_i$,
$c_i = 3(a_i+2)^2$ and $c_i\bigr\rvert_{a_i=1} = 27$.
Therefore,
$\frac{\partial o}{\partial a_i} = \frac{3}{2}(a_i+2)$, and since $a_1 = a_2 = a_3 = a_4 = 1$ here,
$\frac{\partial o}{\partial a_i}\bigr\rvert_{a_i=1} = \frac{9}{2} = 4.5$.


In [9]:
out.backward()
print(a.grad)

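tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

If you instead see "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn", then a was created without requires_grad=True (for example because an earlier cell was re-run). Recreate a, b, c and out, then call out.backward() again.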
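The derivation above can be checked numerically. The following is a minimal sketch, assuming PyTorch 0.4 or later (where plain tensors accept `requires_grad`); it rebuilds the same graph and compares `a.grad` with the analytic expression $\frac{3}{2}(a_i+2)$. The name `expected` is introduced here for illustration only.

```python
import torch

# Rebuild the same graph as in the cells above.
a = torch.ones(2, 2, requires_grad=True)
b = a + 2
c = b * b * 3
out = c.mean()

out.backward()

# Analytic gradient from the derivation: d(out)/d(a_i) = 3/2 * (a_i + 2).
expected = 1.5 * (a.detach() + 2)

print(a.grad)
print(torch.allclose(a.grad, expected))
```

If the two disagree, the most likely cause is a stale gradient left in `a.grad` from an earlier `backward()` call.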

## a.grad is itself a Variable

### But b.grad and c.grad are None.
### PyTorch retains gradients only for leaf variables, to save memory. In
b = a + 2

c = $b * b * 3$

out = c.mean()

### only a is a leaf variable. To get the gradient of an intermediate variable, use a hook as below:

In [10]:
print(b.grad, c.grad)

None None
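To see which variables count as leaves, the `is_leaf` attribute can be inspected. This is a small sketch assuming PyTorch 0.4 or later; it only reads attributes and does not run a backward pass.

```python
import torch

a = torch.ones(2, 2, requires_grad=True)  # created directly by the user: a leaf
b = a + 2                                  # result of an operation: not a leaf
c = b * b * 3
out = c.mean()

# Leaves have no grad_fn; everything computed from them does.
for name, t in [("a", a), ("b", b), ("c", c), ("out", out)]:
    print(name, "is_leaf =", t.is_leaf, "| has grad_fn =", t.grad_fn is not None)
```

Only `a` reports `is_leaf = True`, which is why only `a.grad` is populated after `backward()`.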


In [61]:
from __future__ import print_function
from torch.autograd import Variable
import torch

xx = Variable(torch.ones(1,1), requires_grad = True)
yy = 3*xx
zz = yy**2

yy.register_hook(print)  # prints d(zz)/d(yy) when it is computed during backward

zz.backward()
print(xx.grad)

Variable containing:
 6
[torch.FloatTensor of size 1x1]

Variable containing:
 18
[torch.FloatTensor of size 1x1]
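The same hook mechanism can be applied to the earlier `a`, `b`, `c`, `out` example to capture the gradients that are not retained. The sketch below assumes PyTorch 0.4 or later; the `grads` dictionary and the `save_grad` helper are hypothetical names introduced for illustration, not part of PyTorch.

```python
import torch

a = torch.ones(2, 2, requires_grad=True)
b = a + 2
c = b * b * 3
out = c.mean()

grads = {}  # hypothetical container for the captured gradients

def save_grad(name):
    # Returns a hook that stores the incoming gradient under `name`.
    def hook(grad):
        grads[name] = grad.clone()
    return hook

b.register_hook(save_grad("b"))
c.register_hook(save_grad("c"))

out.backward()

print(grads["c"])  # d(out)/d(c)
print(grads["b"])  # d(out)/d(b)
print(a.grad)      # d(out)/d(a), retained because a is a leaf
```

Since `out` is the mean of four elements, each entry of d(out)/d(c) is 0.25, and d(out)/d(b) is 0.25 * 6 * b = 4.5, which matches `a.grad` because b = a + 2.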



### Two more things!
#### 1. Calling .backward() twice on the same graph raises an error, because the intermediate results are freed once the backward pass completes. To run it again, rebuild the graph by recomputing the Variables (or pass retain_graph=True to the first call).
#### 2. Gradients accumulate. If you do not clear a variable's gradient before running .backward() again, the new gradient is added to the old one: in the earlier example, rebuilding the graph and running out.backward() a second time raises a.grad from 4.5 to 9!
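Both points can be reproduced with the earlier example. This is a rough sketch assuming PyTorch 0.4 or later; the `forward` helper is a hypothetical name used here only to rebuild the graph on demand.

```python
import torch

def forward(a):
    # Same computation as the earlier example.
    b = a + 2
    c = b * b * 3
    return c.mean()

a = torch.ones(2, 2, requires_grad=True)

out = forward(a)
out.backward()
first = a.grad.clone()

# 1. A second backward pass through the same graph fails,
#    because the intermediate buffers were freed.
try:
    out.backward()
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("RuntimeError:", err)

# 2. Rebuilding the graph and calling backward again ADDS to a.grad.
forward(a).backward()
accumulated = a.grad.clone()

# Clearing the gradient first gives the plain value again.
a.grad.zero_()
forward(a).backward()
fresh = a.grad.clone()

print(first, accumulated, fresh)
```

This accumulation is why training loops usually zero the gradients before each backward pass.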

### Important!!
#### In the previous example, .backward() was called without arguments, which works only for a scalar.
#### In fact every Variable can call .backward(), but a non-scalar one needs a tensor argument of the same shape that selects which elements contribute to the gradient and with what relative weight.

In [64]:
from torch.autograd import Variable
import torch
x = Variable(torch.FloatTensor([[1, 2, 3, 4]]), requires_grad=True)
z = 2*x
loss = z.sum(dim=1)
print(z, loss)

# backward for the first element of z only
# (retain_graph=True keeps the graph so backward can be called again)
z.backward(torch.FloatTensor([[1, 0, 0, 0]]), retain_graph=True)
print(x.grad.data)
x.grad.data.zero_() # clear x.grad, otherwise gradients accumulate

# backward for the second element of z only
z.backward(torch.FloatTensor([[0, 1, 0, 0]]), retain_graph=True)
print(x.grad.data)
x.grad.data.zero_()

# backward for all elements of z, weighted by the derivative of
# loss w.r.t. z_1, z_2, z_3 and z_4 (all equal to 1)
z.backward(torch.FloatTensor([[1, 1, 1, 1]]), retain_graph=True)
print(x.grad.data)
x.grad.data.zero_()

# or backprop directly from loss
loss.backward() # equivalent to loss.backward(torch.FloatTensor([1.0]))
print(x.grad.data)


Variable containing:
 2  4  6  8
[torch.FloatTensor of size 1x4]
 Variable containing:
 20
[torch.FloatTensor of size 1]


 2  0  0  0
[torch.FloatTensor of size 1x4]


 0  2  0  0
[torch.FloatTensor of size 1x4]


 2  2  2  2
[torch.FloatTensor of size 1x4]


 2  2  2  2
[torch.FloatTensor of size 1x4]



In [49]:
import torch

In [50]:
a = torch.ones(3,5)

In [51]:
a.size()

torch.Size([3, 5])

In [54]:
a.size(1)

5

In [55]:
a.size()[1]

5