# Automatic Differentiation with ```torch.autograd```

In [1]:
import torch

x = torch.ones(5)   # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weights; autograd tracks operations on them
b = torch.randn(3, requires_grad=True)     # bias; also tracked
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

## Tensors, Functions and Computational Graph

In [2]:
print(f'Gradient function for z = {z.grad_fn}')
print(f'Gradient function for loss = {loss.grad_fn}')

Gradient function for z = <AddBackward0 object at 0x0000026403FAEB80>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x0000026403784FA0>


## Computing Gradients

In [3]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.2268, 0.0426, 0.0868],
        [0.2268, 0.0426, 0.0868],
        [0.2268, 0.0426, 0.0868],
        [0.2268, 0.0426, 0.0868],
        [0.2268, 0.0426, 0.0868]])
tensor([0.2268, 0.0426, 0.0868])
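
Every row of ```w.grad``` above equals ```b.grad```. One way to see why is to write the gradient by hand: with the default mean reduction, the derivative of the loss with respect to ```z``` is ```(sigmoid(z) - y) / N```, and because ```x``` is all ones, each row of ```w.grad``` is a copy of that vector. The following is a minimal sketch of that check, not part of the original notebook; the seed and the names ```dz```, ```expected_w_grad``` and ```expected_b_grad``` are illustrative assumptions.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, chosen only for reproducibility

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Hand-derived gradient of the mean-reduced loss with respect to z
dz = (torch.sigmoid(z.detach()) - y) / y.numel()

# z = x @ w + b, so d(loss)/db = dz and d(loss)/dw = outer(x, dz)
expected_b_grad = dz
expected_w_grad = torch.outer(x, dz)

print(torch.allclose(b.grad, expected_b_grad, atol=1e-6))
print(torch.allclose(w.grad, expected_w_grad, atol=1e-6))
```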


## Disabling Gradient Tracking

In [4]:
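z = torch.matmul(x, w) + b
print(z.requires_grad)  # z depends on w and b, so it is tracked

# Operations inside no_grad() are not recorded in the graph
with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)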

True
False


In [5]:
z = torch.matmul(x, w) + b
z_det = z.detach()  # same values as z, but cut off from the computational graph
print(z_det.requires_grad)

False
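
The two approaches above can be compared side by side. This is a small sketch rather than part of the original notebook; the names ```z_ng``` and ```z_det``` are illustrative. It checks that neither result carries a ```grad_fn```, and that ```detach()``` returns a tensor sharing memory with the original instead of a copy.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

# Tracked computation: z has a grad_fn
z = torch.matmul(x, w) + b

# Option 1: detach an already tracked tensor
z_det = z.detach()

# Option 2: compute without recording anything
with torch.no_grad():
    z_ng = torch.matmul(x, w) + b

print(z.grad_fn is not None)
print(z_det.grad_fn is None)
print(z_ng.grad_fn is None)

# detach() does not copy: both tensors point at the same storage
print(z_det.data_ptr() == z.data_ptr())

# The parameters themselves are unaffected by either option
print(w.requires_grad, b.requires_grad)
```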


## More on Computational Graphs
### In a forward pass, ```autograd``` does two things simultaneously:
* run the requested operation to compute a resulting tensor
* maintain the operation's gradient function in the DAG.
### The backward pass kicks off when ```.backward()``` is called on the DAG root. ```autograd``` then:
* computes the gradients from each ```.grad_fn```,
* accumulates them in the respective tensor's ```.grad``` attribute,
* using the chain rule, propagates all the way to the leaf tensors.
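
The forward and backward steps listed above can be observed directly on a small graph. This is a minimal sketch under simple assumptions: ```loss = z.sum()``` is a stand-in chosen so that the expected gradients are all ones, and the variable names are illustrative.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

# Forward pass: each operation computes its result and records a grad_fn
z = torch.matmul(x, w) + b
loss = z.sum()  # stand-in scalar loss; this is the DAG root

# w and b were created by the user, so they are leaves; z was computed
leaf_flags = (w.is_leaf, b.is_leaf, z.is_leaf)
print(leaf_flags)

# next_functions links a node to the nodes that produced its inputs
n_inputs = len(z.grad_fn.next_functions)
print(z.grad_fn, n_inputs)

# Nothing is stored in .grad until backward() runs
grad_before = w.grad
loss.backward()

# Backward pass: gradients are propagated to the leaves and stored in .grad
print(grad_before is None)
print(w.grad)
print(b.grad)
```

Because ```x``` is all ones and the loss is a plain sum, every entry of ```w.grad``` and ```b.grad``` here is 1.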

## Optional Reading: Tensor Gradients and Jacobian Products

In [6]:
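inp = torch.eye(4, 5, requires_grad=True)
out = (inp + 1).pow(2).t()
# out is not a scalar, so backward() needs a tensor of the same shape as out
out.backward(torch.ones_like(out), retain_graph=True)
print(f'First call\n{inp.grad}')
# A second backward() call adds to the gradients already stored in inp.grad
out.backward(torch.ones_like(out), retain_graph=True)
print(f'\nSecond call\n{inp.grad}')
# Zero the stored gradients in place to get a fresh result
inp.grad.zero_()
out.backward(torch.ones_like(out), retain_graph=True)
print(f'\nCall after zeroing gradients\n{inp.grad}')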

First call
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])

Second call
tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.]])

Call after zeroing gradients
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])
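
When ```backward()``` is given a tensor ```v```, the value stored in ```.grad``` is the product of ```v``` with the Jacobian, not the Jacobian itself. The sketch below illustrates this on a tiny example that is not part of the original notebook: the function ```f``` and the values of ```x``` and ```v``` are hypothetical, and the full Jacobian is built with ```torch.autograd.functional.jacobian``` purely for comparison.

```python
import torch
from torch.autograd.functional import jacobian


def f(x):
    # Hypothetical function from R^3 to R^2, used only for illustration
    return torch.stack([x[0] * x[1], x[1] + x[2] ** 2])


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([1.0, 0.5])

# backward(v) stores the vector-Jacobian product v^T J in x.grad
y = f(x)
y.backward(v)

# Full Jacobian, shape (2, 3), computed on a detached copy of x
J = jacobian(f, x.detach())
vjp = v @ J

print(J)
print(vjp)
print(torch.allclose(x.grad, vjp))
```

For these inputs the Jacobian is ```[[2, 1, 0], [0, 1, 6]]```, so the product with ```v``` is ```[2, 1.5, 3]```.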
