## **Differentiation in Autograd**

**Backpropagation:** The gradient of the loss function with respect to each parameter is computed, and the parameters are adjusted according to that gradient.

`torch.autograd`: PyTorch's built-in differentiation engine. It computes the gradients automatically.
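As a minimal sketch of how these two ideas fit together, the loop below minimizes a toy loss by hand: autograd supplies the gradient and a plain gradient-descent rule adjusts the parameter. The target value `3.0`, the learning rate `lr` and the number of steps are illustrative assumptions, not part of the cells in this notebook.

```python
import torch

# Single scalar parameter; the toy loss (w - 3)^2 has its minimum at w = 3.
w = torch.tensor(0.0, requires_grad=True)
lr = 0.1  # illustrative learning rate (assumption)

for _ in range(50):
    loss = (w - 3.0) ** 2
    loss.backward()               # autograd fills w.grad with d(loss)/dw
    with torch.no_grad():         # the update itself must not be tracked
        w -= lr * w.grad
    w.grad.zero_()                # reset, otherwise gradients accumulate

print(w.item())                   # ends up close to 3.0
```

In practice an optimizer from `torch.optim` does the update and the zeroing, but the steps are the same.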

In [1]:
import torch
import torchvision
from torch import nn, optim

In [2]:
# input tensor

x = torch.ones(5)
print(x)

tensor([1., 1., 1., 1., 1.])


In [3]:
# output tensor

y = torch.zeros(3)
print(y)

tensor([0., 0., 0.])


In the cells below, `w` and `b` are the parameters we need to optimize, so their `requires_grad` attribute is set to `True`.

In [4]:
# randomly specifying weight value

w = torch.randn(5, 3, requires_grad=True)
print(w)

tensor([[-0.5493,  0.0203,  0.1157],
        [-0.7170,  0.2039,  1.6857],
        [ 0.8402,  0.7463,  1.8053],
        [-1.3074, -0.0734,  0.7490],
        [-0.2875, -0.8547,  1.3091]], requires_grad=True)


In [5]:
# randomly specifying bias value

b = torch.randn(3, requires_grad=True)
print(b)

tensor([-0.6305,  0.2254, -0.5351], requires_grad=True)


In [6]:
z = torch.matmul(x, w)+b
print(z)

tensor([-2.6515,  0.2677,  5.1298], grad_fn=<AddBackward0>)


In [7]:
# calculating loss

loss = nn.functional.binary_cross_entropy_with_logits(z, y)
print(loss)

tensor(2.0133, grad_fn=<BinaryCrossEntropyWithLogitsBackward>)


`grad_fn`: Every tensor produced by an operation stores here a reference to the function that computes its gradient during backward propagation.

In [8]:
print('Gradient function for z =',z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)

Gradient function for z = <AddBackward0 object at 0x000002A20B95E308>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward object at 0x000002A20B95E2C8>
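The `grad_fn` objects are linked to each other, which is how autograd walks back through the graph. A small sketch on a hypothetical two-step computation (the tensors below are illustrative, not the `z` and `loss` above):

```python
import torch

x = torch.ones(2, requires_grad=True)   # leaf tensor created by the user
y = (x * 3).sum()                        # result of two operations: mul, then sum

print(x.is_leaf, x.grad_fn)              # leaf tensors have no grad_fn
print(y.grad_fn)                         # backward node of the last operation (sum)
print(y.grad_fn.next_functions)          # link to the backward node of the previous operation (mul)
```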


#### **Computing Gradients**

`backward()`: computes the derivatives of the loss with respect to the parameters.

`grad`: the gradients are retrieved from this attribute. It is only populated for leaf tensors whose `requires_grad` value is `True`.

In [9]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.0220, 0.1888, 0.3314],
        [0.0220, 0.1888, 0.3314],
        [0.0220, 0.1888, 0.3314],
        [0.0220, 0.1888, 0.3314],
        [0.0220, 0.1888, 0.3314]])
tensor([0.0220, 0.1888, 0.3314])
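The values printed by autograd can be cross-checked by hand. For `binary_cross_entropy_with_logits` with its default mean reduction, the derivative with respect to each logit is `(sigmoid(z) - y) / N`, where `N` is the number of elements. The sketch below rebuilds the same setup with fresh random weights (the seed is an arbitrary assumption) and compares both results.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

with torch.no_grad():
    dz = (torch.sigmoid(z) - y) / z.numel()  # d(loss)/dz for mean reduction
    expected_w = torch.outer(x, dz)          # chain rule: dz/dw[i, j] = x[i]
    expected_b = dz                          # chain rule: dz/db = 1

print(torch.allclose(w.grad, expected_w, atol=1e-6))
print(torch.allclose(b.grad, expected_b, atol=1e-6))
```

Because `x` is all ones, every row of `w.grad` equals `b.grad`, which matches the repeated rows in the output above.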


#### **Disabling Gradient Tracking**

Gradient tracking can be disabled using `torch.no_grad()` or `detach()`.

In [10]:
with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)

False


In [11]:
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

False
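A detached tensor is treated as a constant during backward propagation. The sketch below, on a hypothetical one-element parameter, shows the effect on the resulting gradient.

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

h = w * 3                   # tracked: h depends on w
out = h.detach() * w        # h.detach() is the constant 6, so out = 6 * w
out.backward()

print(w.grad)               # tensor([6.])
```

Without `detach()` the expression would be `3 * w**2`, and the gradient at `w = 2` would be `12` instead of `6`.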


### **Tensor Gradients and Jacobian Products**

This demonstration shows that PyTorch accumulates gradients: each call to `backward()` adds the newly computed gradients to the `grad` attribute of the leaf tensors of the graph. To get the original value again, the gradient has to be set to zero first using `grad.zero_()`.

In [12]:
inp = torch.eye(5, requires_grad=True)
out = (inp+1).pow(2)

In [13]:
out.backward(torch.ones_like(inp), retain_graph=True)
print("First call\n", inp.grad)

First call
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])


In [14]:
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nSecond call\n", inp.grad)


Second call
 tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.],
        [4., 4., 4., 4., 8.]])


In [15]:
inp.grad.zero_()
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nCall after zeroing gradients\n", inp.grad)


Call after zeroing gradients
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])
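The tensor passed to `backward()` plays the role of a vector `v`, and the result stored in `grad` is the product of `v` with the Jacobian of the output. A small sketch that compares this product with the full Jacobian from `torch.autograd.functional.jacobian`; the function `f` and the values of `x` and `v` are illustrative assumptions.

```python
import torch
from torch.autograd.functional import jacobian

def f(t):
    # same elementwise function as in the cells above
    return (t + 1).pow(2)

x = torch.tensor([0.0, 1.0, 2.0], requires_grad=True)
v = torch.tensor([1.0, 0.5, 2.0])   # illustrative vector (assumption)

f(x).backward(v)                     # stores v^T J in x.grad
J = jacobian(f, x.detach())          # full 3x3 Jacobian, diagonal here

print(x.grad)
print(v @ J)
print(torch.allclose(x.grad, v @ J))
```

Computing only the product avoids building the full Jacobian, which would be very large for real models.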


### **Another Iteration**

`a` and `b` are two tensors.

In [16]:
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

Creating tensor `Q` from tensors `a` and `b`:

> Q=3a<sup>3</sup> - b<sup>2</sup>

In [17]:
Q = 3*a**3 - b**2

Here, `a` and `b` are the parameters and `Q` is the error.

* `Q` is a vector, so a `gradient` argument has to be passed when `Q.backward()` is called; autograd then calculates the gradients and stores them in `a.grad` and `b.grad`.
* `gradient` is a tensor of the same shape as `Q`.

In [18]:
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

Checking gradients

In [19]:
print(9*a**2 == a.grad)
print(-2*b == b.grad)

tensor([True, True])
tensor([True, True])
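Passing a `gradient` of ones is equivalent to reducing `Q` to a scalar with `sum()` and calling `backward()` without arguments. A short sketch using the same values of `a` and `b`:

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3 * a**3 - b**2

Q.sum().backward()   # scalar output, so no gradient argument is needed

print(a.grad)        # 9 * a**2 -> tensor([36., 81.])
print(b.grad)        # -2 * b   -> tensor([-12., -8.])
```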


#### **Exclusion from the DAG**

In [20]:
x = torch.rand(5, 5)
y = torch.rand(5, 5)
z = torch.rand((5, 5), requires_grad=True)

a = x + y
print(f"Does `a` require gradients? : {a.requires_grad}")
b = x + z
print(f"Does `b` require gradients?: {b.requires_grad}")

Does `a` require gradients? : False
Does `b` require gradients?: True


`requires_grad` is set to `False` for all parameters in the model, which freezes them.

In [21]:
model = torchvision.models.resnet18(pretrained=True)

# Freeze all the parameters in the network
for param in model.parameters():
    param.requires_grad = False

Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to C:\Users\91939/.cache\torch\hub\checkpoints\resnet18-5c106cde.pth



All layers are frozen except the last one, `model.fc`, so gradients are computed only for the weights and bias of that layer.

In [22]:
# Unfreeze the last layer, then optimize only the classifier
model.fc.requires_grad_(True)
optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
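The same freezing pattern can be tried without downloading ResNet-18. The sketch below uses a hypothetical two-layer `nn.Sequential` as a stand-in, with layer sizes, seed and random input chosen arbitrarily. It hands only the trainable parameters to the optimizer and checks that the frozen layer is left untouched after one step.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Hypothetical stand-in model: model[0] is the "backbone", model[-1] the "classifier".
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

for param in model.parameters():       # freeze everything
    param.requires_grad = False
for param in model[-1].parameters():   # unfreeze the last layer only
    param.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=1e-2, momentum=0.9)

frozen_before = model[0].weight.clone()

loss = model(torch.randn(16, 4)).pow(2).mean()   # dummy loss on random input
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(len(trainable))                              # weight and bias of the last layer
print(model[0].weight.grad)                        # None: no gradient for frozen parameters
print(torch.equal(model[0].weight, frozen_before)) # frozen layer is unchanged
```

Passing `model.parameters()` to the optimizer also works, since parameters without a gradient are skipped, but filtering makes the intent explicit.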