# Automatic Differentiation with `torch.autograd`

**Very important**

In [1]:
print("Automatic Differentiation with torch.autograd")

Automatic Differentiation with torch.autograd


In [2]:
import torch


In [6]:
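x = torch.ones(5)   # input tensor
y = torch.zeros(3)  # expected output

# Parameters to optimize; requires_grad=True tells autograd to track them
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)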

print(x, "\n", y, "\n", w, "\n", b)

print(z)
print(loss)

tensor([1., 1., 1., 1., 1.]) 
 tensor([0., 0., 0.]) 
 tensor([[ 0.1166, -0.8524, -0.0466],
        [ 0.3575, -0.0199,  1.2706],
        [ 0.7350,  1.6097,  0.2307],
        [-0.2324, -0.4027, -1.5891],
        [-0.4959, -0.3405,  0.9123]], requires_grad=True) 
 tensor([ 1.2327, -0.8374, -0.6333], requires_grad=True)
tensor([ 1.7134, -0.8432,  0.1445], grad_fn=<AddBackward0>)
tensor(1.0017, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)


#### Tensors, Functions and Computational Graph


In [7]:
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x11af53c70>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x11af534c0>


#### Computing Gradients

In [8]:
loss.backward()
print(w.grad)
print(b.grad)
print(loss)

tensor([[0.2824, 0.1003, 0.1787],
        [0.2824, 0.1003, 0.1787],
        [0.2824, 0.1003, 0.1787],
        [0.2824, 0.1003, 0.1787],
        [0.2824, 0.1003, 0.1787]])
tensor([0.2824, 0.1003, 0.1787])
tensor(1.0017, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)


#### Disabling Gradient Tracking

In [9]:
z = torch.matmul(x,w) + b
print(z)

with torch.no_grad():
    z = torch.matmul(x,w) + b
print(z)

tensor([ 1.7134, -0.8432,  0.1445], grad_fn=<AddBackward0>)
tensor([ 1.7134, -0.8432,  0.1445])


Another way to achieve this is to call the `detach()` method on the tensor.

In [10]:
z = torch.matmul(x,w) + b
z_det = z.detach()
print(z_det)
print(z_det.requires_grad)

tensor([ 1.7134, -0.8432,  0.1445])
False


Reasons you might want to disable gradient tracking:
- To mark some parameters in your neural network as frozen parameters.
- To speed up computations when you are only doing a forward pass, because computations on tensors that do not track gradients are more efficient.
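
As a minimal sketch of the first use case, parameters can be frozen by switching off `requires_grad`. The two-layer `torch.nn.Sequential` model and its layer sizes below are hypothetical, chosen only for illustration; they are not part of the example above.

```python
import torch

# Hypothetical two-layer model, used only for illustration.
model = torch.nn.Sequential(
    torch.nn.Linear(5, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 3),
)

# Freeze the first layer: autograd stops tracking its parameters.
for param in model[0].parameters():
    param.requires_grad_(False)

out = model(torch.ones(5))
out.sum().backward()

frozen_grads = [p.grad for p in model[0].parameters()]
trainable_grads = [p.grad for p in model[2].parameters()]
print(frozen_grads)                         # frozen parameters receive no gradient
print([g.shape for g in trainable_grads])   # the last layer still gets gradients
```

Only the parameters of the last layer end up with a populated `.grad`, so an optimizer step would leave the frozen layer unchanged.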

#### More on Computational Graphs

Conceptually, autograd keeps a record of data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) consisting of `Function` objects. In this DAG, leaves are the input tensors and roots are the output tensors.
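
The graph can be inspected directly. The sketch below rebuilds `z` as in the earlier cells and walks one level of the graph through `grad_fn.next_functions`; the exact node names printed may vary between PyTorch versions.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b

# Tensors created directly by the user are leaves and have no grad_fn.
print(w.is_leaf, w.grad_fn)
# z is the result of an operation, so it is not a leaf and has a grad_fn.
print(z.is_leaf, z.grad_fn)
# next_functions links a node to the nodes that produced its inputs.
for fn, _ in z.grad_fn.next_functions:
    print(fn)
```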

In the forward pass, autograd does two things simultaneously:
- run the requested operation to compute a resulting tensor
- maintain the operation's gradient function in the DAG

The backward pass kicks off when `.backward()` is called on the DAG root. autograd then:
- computes the gradients from each `.grad_fn`
- accumulates them in the respective tensor's `.grad` attribute
- using the chain rule, propagates all the way to the leaf tensors
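
To make the chain rule concrete, the gradients from the earlier example can be checked by hand. This is a sketch that assumes the default mean reduction of `binary_cross_entropy_with_logits`, for which the derivative of the loss with respect to `z` is `(sigmoid(z) - y) / n`; the names `dloss_dz`, `manual_b_grad` and `manual_w_grad` are introduced here for illustration.

```python
import torch

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Hand-derived gradient of the mean BCE-with-logits loss w.r.t. z.
with torch.no_grad():
    dloss_dz = (torch.sigmoid(z) - y) / y.numel()
    manual_b_grad = dloss_dz                  # dz/db is the identity
    manual_w_grad = torch.outer(x, dloss_dz)  # dz_j/dw_ij = x_i

print(torch.allclose(b.grad, manual_b_grad, atol=1e-6))
print(torch.allclose(w.grad, manual_w_grad, atol=1e-6))
```

Because `x` is all ones, every row of `w.grad` equals `b.grad`, which is the pattern visible in the output of the "Computing Gradients" cell.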

NOTE: DAGs are dynamic in PyTorch. The graph is recreated from scratch: after each `.backward()` call, autograd starts populating a new graph. This is exactly what allows you to use control flow statements in your model; you can change the shape, size and operations at every iteration if needed.
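
A small sketch of what this allows: the hypothetical `forward` function below (not part of the original example) picks a different computation depending on its input, and autograd differentiates whichever branch actually ran.

```python
import torch

def forward(x, w):
    # Ordinary Python control flow decides which operations get recorded.
    if x.sum() > 0:
        return (x * w).sum()
    return (x * w).pow(2).sum()

w = torch.tensor([1.0, 2.0], requires_grad=True)

# Each call builds a fresh graph that matches the branch taken.
(grad_pos,) = torch.autograd.grad(forward(torch.tensor([1.0, 1.0]), w), w)
(grad_neg,) = torch.autograd.grad(forward(torch.tensor([-1.0, -1.0]), w), w)

print(grad_pos)  # gradient of the linear branch
print(grad_neg)  # gradient of the squared branch
```

`torch.autograd.grad` is used here instead of `.backward()` so that the two gradients are returned separately rather than accumulated in `w.grad`.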



#### Tensor Gradients and Jacobian Products

In [12]:
inp = torch.eye(4,5, requires_grad=True)
out = (inp+1).pow(2).t()

print(inp, "\n", out)

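# out is not a scalar, so backward() needs a tensor of the same shape as out;
# retain_graph=True keeps the graph so backward() can be called again
out.backward(torch.ones_like(out), retain_graph=True)
print(f"First call\n{inp.grad}")

# Gradients accumulate: the second call adds to the existing inp.grad
out.backward(torch.ones_like(out), retain_graph=True)
print(f"Second call\n{inp.grad}")

# Reset the accumulated gradient in place before calling backward() again
inp.grad.zero_()
out.backward(torch.ones_like(out), retain_graph=True)
print(f"\nCall after zeroing gradients\n{inp.grad}")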

tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.]], requires_grad=True) 
 tensor([[4., 1., 1., 1.],
        [1., 4., 1., 1.],
        [1., 1., 4., 1.],
        [1., 1., 1., 4.],
        [1., 1., 1., 1.]], grad_fn=<TBackward0>)
First call
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])
Second call
tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.]])

Call after zeroing gradients
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])
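
The tensor passed to `backward()` plays the role of the vector `v` in a vector-Jacobian product: autograd computes `v^T J` without ever materializing `J`. The sketch below checks this on a small vector by building the full Jacobian with `torch.autograd.functional.jacobian`; the function `f` and the values of `v` are assumptions made for illustration.

```python
import torch
from torch.autograd.functional import jacobian

def f(t):
    # Same elementwise operation as in the cell above, on a small vector.
    return (t + 1).pow(2)

inp = torch.tensor([0.0, 1.0, 2.0], requires_grad=True)
v = torch.tensor([1.0, 10.0, 100.0])

# backward(v) accumulates the vector-Jacobian product v^T J into inp.grad.
f(inp).backward(v)

# Build the full Jacobian explicitly, only for comparison.
J = jacobian(f, inp.detach())
print(inp.grad)
print(v @ J)
```

Both prints show the same tensor. In the cell above, `torch.ones_like(out)` is this `v`, so each entry of `inp.grad` is the sum of the derivatives of all outputs with respect to that input.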
