In [1]:
import torch
x = torch.randn(3, requires_grad=True)
print(x)

tensor([ 0.4444, -1.0966, -0.2144], requires_grad=True)


The autograd package provides automatic differentiation for all operations on Tensors.

requires_grad=True -> autograd tracks all operations on the tensor.
In the next cell, y is created as the result of an operation, so it has a grad_fn attribute.
grad_fn: references the Function that created the Tensor (it is None for tensors created directly by the user).
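A minimal sketch of how to inspect this bookkeeping, assuming a recent PyTorch version; the tensor values are made up for illustration. Leaf tensors created by the user have no grad_fn, results of operations do, and grad_fn.next_functions links each node to the nodes that produced its inputs.

```python
import torch

x = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)  # illustrative values
y = x + 2
z = y * y * 2

# Leaf tensor created by the user: no grad_fn
print(x.is_leaf, x.grad_fn)      # True None

# Non-leaf tensors remember the operation that created them
print(y.is_leaf, y.grad_fn)      # False <AddBackward0 object at ...>
print(z.grad_fn)                 # <MulBackward0 object at ...>

# Each node points back to the nodes that produced its inputs
print(z.grad_fn.next_functions)
```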

In [15]:
y = x+2
print(y)
z = y*y*2
print(z)

tensor([2.4444, 0.9034, 1.7856], grad_fn=<AddBackward0>)
tensor([11.9498,  1.6324,  6.3770], grad_fn=<MulBackward0>)


In [16]:
z = z.mean()
print(z)

tensor(6.6531, grad_fn=<MeanBackward0>)


Let's compute the gradients with backpropagation.
When we finish our computation, we can call .backward() and have all the gradients computed automatically.
The gradient with respect to each leaf tensor (here x) is accumulated into that tensor's .grad attribute.
It is the partial derivative of the function w.r.t. the tensor.
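For the computation above, z = mean(2(x+2)^2) over 3 elements, so the chain rule gives dz/dx_i = 4(x_i + 2)/3. A minimal sketch that checks autograd against this hand-derived formula; the seed is an arbitrary choice for reproducibility.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
x = torch.randn(3, requires_grad=True)

y = x + 2
z = (y * y * 2).mean()   # z = mean(2 * (x + 2)^2)
z.backward()

# Hand-derived gradient: d/dx_i [(1/3) * sum_j 2 (x_j + 2)^2] = 4 (x_i + 2) / 3
expected = 4 * (x.detach() + 2) / 3
print(x.grad)
print(expected)
print(torch.allclose(x.grad, expected))  # True
```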

In [17]:
z.backward()
print(x.grad)

tensor([8.4738, 9.6366, 4.7760])


Generally speaking, torch.autograd is an engine for computing vector-Jacobian products.
It computes the partial derivatives by applying the chain rule backwards through the graph.
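Autograd does not build the full Jacobian J during backward; it computes v^T J directly. A minimal sketch comparing the two, using torch.autograd.functional.jacobian (available in recent PyTorch versions) only to build J explicitly for the comparison; the function f and the values are illustrative assumptions.

```python
import torch

def f(x):
    # Element-wise function: its Jacobian is diagonal with entries 2x + 3
    return x ** 2 + 3 * x

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # illustrative values
v = torch.tensor([0.5, 1.0, 2.0])

# Explicit Jacobian, J[i, j] = d f_i / d x_j (3 x 3 here)
J = torch.autograd.functional.jacobian(f, x)

# The backward pass computes v^T J without materializing J
y = f(x)
y.backward(v)

print(x.grad)   # expected values: 2.5, 7.0, 18.0
print(v @ J)    # same values, via the explicit Jacobian
```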

-------------
Model with non-scalar output:
If a Tensor is non-scalar (more than one element), backward() needs an argument:
a gradient tensor of matching shape, which plays the role of the vector v in the vector-Jacobian product.
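A minimal sketch of what happens with and without that argument, mirroring the x*2048 example that follows (doubling 11 times); the input values are illustrative. It also shows that y.backward(v) gives the same gradient as backpropagating from the scalar sum(v * y). The exact error text may vary between PyTorch versions.

```python
import torch

x = torch.tensor([0.5, -1.0, 2.0, 0.25], requires_grad=True)  # illustrative values
v = torch.tensor([0.1, 0.1, 1.0, 0.001])

# Calling backward() on a non-scalar without a gradient argument fails
y = x * 2048
try:
    y.backward()
except RuntimeError as err:
    print("RuntimeError:", err)

# Passing v computes the vector-Jacobian product v^T J
y.backward(v)
grad_with_v = x.grad.clone()

# Equivalent: backpropagate from the scalar sum(v * y)
x.grad.zero_()
y = x * 2048                 # rebuild the graph (the previous one was freed)
(v * y).sum().backward()
grad_from_sum = x.grad.clone()

print(grad_with_v)    # 2048 * v
print(grad_from_sum)  # same values
```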

In [18]:
x = torch.randn(4, requires_grad=True)
print(x)

tensor([ 0.3599, -0.0089, -0.2701, -1.9900], requires_grad=True)


In [20]:
y = x*2
for _ in range(10):
    y = y*2
print(y)
print(y.shape)

tensor([  737.1655,   -18.1494,  -553.1948, -4075.4771],
       grad_fn=<MulBackward0>)
torch.Size([4])


In [22]:
v = torch.tensor([0.1, 0.1, 1.0, 0.001], dtype=torch.float32)
y.backward(v)
print(x.grad)

tensor([ 204.8000,  204.8000, 2048.0000,    2.0480])


-------------
Stop a tensor from tracking history:
For example, in our training loop, the weight update
should not be part of the gradient computation.
- x.requires_grad_(False)  (changes the flag in-place)
- x.detach()  (returns a new tensor that does not require grad)
- wrap in 'with torch.no_grad():'
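Why the weight update needs one of these: PyTorch rejects an in-place update of a leaf tensor that requires grad while gradient tracking is on. A minimal sketch, where the loss (w * 2).sum() and the learning rate 0.1 are illustrative choices:

```python
import torch

w = torch.ones(3, requires_grad=True)
loss = (w * 2).sum()
loss.backward()              # w.grad is now [2., 2., 2.]

# In-place update of a leaf that requires grad, outside no_grad, is rejected
try:
    w -= 0.1 * w.grad
except RuntimeError as err:
    print("RuntimeError:", err)

# Inside no_grad the update is not recorded in the graph
with torch.no_grad():
    w -= 0.1 * w.grad

print(w)          # tensor([0.8000, 0.8000, 0.8000], requires_grad=True)
print(w.grad_fn)  # None: w is still a leaf
```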

In [23]:
# .requires_grad_(...) changes an existing flag in-place.

a = torch.randn(2,2)
print(a.requires_grad)

False


In [24]:
b = ((a*3)/(a-1))
print(b.grad_fn)  # a does not require grad, so nothing is tracked -> grad_fn is None

None


In [25]:
a.requires_grad_(True)
print(a.requires_grad)

True


In [26]:
b = (a*a).sum()
print(b.grad_fn)

<SumBackward0 object at 0x0000023C1CFF5090>


In [27]:
# .detach(): get a new Tensor with the same content (sharing storage) that does not require grad:

a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)
b = a.detach()
print(b.requires_grad)

True
False
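One consequence worth knowing: .detach() does not copy the data, so an in-place write through the detached tensor is visible in the original. A minimal sketch; the index [0, 0] and the value 100.0 are arbitrary, and .clone() is used to get an independent copy.

```python
import torch

a = torch.randn(2, 2, requires_grad=True)
b = a.detach()            # shares storage with a
c = a.detach().clone()    # independent copy

b[0, 0] = 100.0           # in-place write through the detached tensor
print(a[0, 0].item())     # 100.0: a sees the change
print(c[0, 0].item())     # still the original random value
print(b.data_ptr() == a.data_ptr())  # True: same underlying memory
```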


In [29]:
# wrap in 'with torch.no_grad():'

a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)
with torch.no_grad():
    print((a**2).requires_grad)

True
False
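torch.no_grad() can also be used as a function decorator, and grad mode is restored once the block or call ends. A minimal sketch; the function name evaluate is hypothetical.

```python
import torch

a = torch.randn(2, 2, requires_grad=True)

@torch.no_grad()
def evaluate(t):  # hypothetical helper, e.g. for a validation step
    # Nothing computed here is recorded in the graph
    return (t ** 2).sum()

inside = evaluate(a)
outside = (a ** 2).sum()

print(inside.requires_grad)     # False
print(outside.requires_grad)    # True
print(torch.is_grad_enabled())  # True: grad mode is restored after the call
```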


-------------
backward() accumulates (adds) the gradient into the tensor's .grad attribute instead of overwriting it.
!!! We need to be careful during optimization !!!
Use .grad.zero_() to empty the gradients before the next optimization step!
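A minimal sketch of the accumulation itself, using the same dummy model (weights*3).sum() as the next cell but without any weight update or zeroing, so each backward() adds another 3 to every entry of .grad:

```python
import torch

w = torch.ones(4, requires_grad=True)
grads = []

for step in range(3):
    out = (w * 3).sum()
    out.backward()
    grads.append(w.grad.clone())
    print(w.grad)   # 3s, then 6s, then 9s: gradients add up

w.grad.zero_()      # reset before the next real optimization step
print(w.grad)       # zeros
```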

In [37]:
weights = torch.ones(4, requires_grad=True)

for epoch in range(3):
    # just a dummy example
    model_output = (weights*3).sum()
    model_output.backward()

    print(weights.grad)
    # optimize model, i.e. adjust weights...
    with torch.no_grad():
        weights -= 0.1 * weights.grad

    # this is important! It affects the final weights & output
    weights.grad.zero_()

print(weights)
print(model_output)


tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([0.1000, 0.1000, 0.1000, 0.1000], requires_grad=True)
tensor(4.8000, grad_fn=<SumBackward0>)


The optimizer has a zero_grad() method:
    optimizer = torch.optim.SGD([weights], lr=0.1)
During training:
    optimizer.step()       # update the weights using .grad
    optimizer.zero_grad()  # reset the gradients for the next iteration
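A runnable sketch of the earlier loop rewritten with torch.optim.SGD, assuming plain SGD (no momentum or weight decay), so optimizer.step() performs the same update as weights -= 0.1 * weights.grad. In recent PyTorch releases zero_grad() sets .grad to None by default rather than filling it with zeros; either way, the next backward() starts fresh.

```python
import torch

weights = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([weights], lr=0.1)

for epoch in range(3):
    model_output = (weights * 3).sum()  # same dummy "model" as above
    model_output.backward()             # fills weights.grad
    optimizer.step()                    # weights -= lr * weights.grad
    optimizer.zero_grad()               # clear .grad for the next iteration

print(weights)       # all entries approximately 0.1, as in the manual loop
print(model_output)  # approximately 4.8, computed before the last update
```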