```markdown
## Gradients
## Backpropagation
## Update Rule
## Gradient Descent
## Optimizers
## torch.nn
## Sequential and Functional Models
```

In [1]:
import torch

import warnings
warnings.filterwarnings("ignore")

**Gradients**
```markdown
These are partial derivatives of a function with respect to its parameters. 
```
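As a quick illustration of the definition, the sketch below approximates both partial derivatives of a made-up two-parameter function with central finite differences and compares them with the analytic values. The function `f`, the helper `partial`, and the step size `h` are assumptions chosen for this example; none of them come from PyTorch.

```python
# Hypothetical example function: f(w, b) = w**2 * b + 3 * b
def f(w, b):
    return w ** 2 * b + 3 * b


def partial(func, args, index, h=1e-5):
    """Central-difference estimate of d(func)/d(args[index])."""
    up = list(args)
    down = list(args)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)


w, b = 2.0, 5.0
df_dw = partial(f, (w, b), 0)  # analytic value: 2 * w * b = 20
df_db = partial(f, (w, b), 1)  # analytic value: w**2 + 3 = 7
print(round(df_dw, 4), round(df_db, 4))
```

Autograd computes the same quantities exactly instead of approximating them, which is what the next cell demonstrates.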

In [2]:
# The autograd package provides automatic differentiation 
# for all operations on Tensors
# requires_grad = True -> tracks all operations on the tensor. 
x = torch.tensor([1], dtype=torch.float32, requires_grad=True)
y = x*2 + 2 # dy/dx = 2

# y was created as a result of an operation, so it has a grad_fn attribute.
# grad_fn: references a Function that has created the Tensor
print(x) # created by the user -> grad_fn is None
print(y)
print(y.grad_fn)

tensor([1.], requires_grad=True)
tensor([4.], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x7f578c729c90>


In [3]:
# grad_fn is recorded for every further operation that involves y
z = y * y * 3 # dz/dy = 6y -> dz/dx = dz/dy * dy/dx
print(z)
z = z.mean()
print(z)


tensor([48.], grad_fn=<MulBackward0>)
tensor(48., grad_fn=<MeanBackward0>)
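Calling `backward()` with no arguments is only defined for a scalar result, so reducing `z` to a single value (here with `mean()`) matters before the backward pass. A minimal sketch of the alternative, assuming a two-element input chosen for illustration: pass an explicit `gradient` tensor for a non-scalar output.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)  # illustrative input
y = x * 2 + 2
z = y * y * 3  # non-scalar: one value per element of x

# Calling backward() on a non-scalar tensor without arguments raises an error.
raised = False
try:
    z.backward()
except RuntimeError:
    raised = True

# Passing a tensor of ones weights every element equally, like z.sum().backward().
z.backward(gradient=torch.ones_like(z))
print(raised, x.grad)  # x.grad holds dz/dx = 12 * y for each element
```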


In [4]:

# Let's compute the gradients with backpropagation
# Once the computation is finished, calling .backward() computes all gradients automatically.
# The gradient for each leaf tensor with requires_grad=True is accumulated into its .grad attribute.
# It is the partial derivative of the function w.r.t. the tensor

z.backward()
print(x.grad) # dz/dx


tensor([48.])
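The printed value can be checked by hand with the chain rule. Because `z` holds a single element, `mean()` leaves its value unchanged, so with $x = 1$:

$$
y = 2x + 2 = 4, \qquad \frac{dz}{dy} = 6y = 24, \qquad \frac{dy}{dx} = 2
$$

$$
\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx} = 24 \cdot 2 = 48
$$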


**Snippet of a typical training loop.**

```markdown
# Training loop
- Forward Pass (W^{T}X + b)
- Compute Loss
- Compute Gradients (`loss.backward()`)
- Update Weights (`optimizer.step()`)
- Zero Gradients (`optimizer.zero_grad()`)
```



In [6]:
x = torch.tensor(1.0)
y = torch.tensor(2.0)

# This is the parameter we want to optimize -> requires_grad=True
w = torch.tensor(0.0, requires_grad=True)
# b = torch.tensor(1.0, requires_grad=True)

# forward pass to compute loss
y_predicted = w * x 
loss = (y_predicted - y)**2

# backward pass to compute gradient dLoss/dw
loss.backward()

# update weights, this operation should not be part of the computational graph
with torch.no_grad():
    w -= 0.01 * w.grad
    # b -= 0.01 * b.grad
    ## optimizer.step() -> if optimizer is used
# don't forget to zero the gradients
w.grad.zero_()
# b.grad.zero_()

# Optimizer has zero_grad() method
# optimizer = torch.optim.SGD([weights,biases], lr=0.01)
# During training:
# optimizer.zero_grad()

print(loss.item())

4.0
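Zeroing is needed because `backward()` adds to `.grad` instead of overwriting it. A minimal sketch, reusing the same values for `x`, `y`, and `w` as the cell above (the helper `compute_loss` is a name introduced here for illustration):

```python
import torch

x = torch.tensor(1.0)
y = torch.tensor(2.0)
w = torch.tensor(0.0, requires_grad=True)


def compute_loss():
    # Same squared-error loss as above; a fresh graph is built on every call.
    return (w * x - y) ** 2


compute_loss().backward()
first = w.grad.item()   # dLoss/dw = 2 * x * (w * x - y) = -4

compute_loss().backward()
second = w.grad.item()  # accumulated: -4 + -4 = -8

w.grad.zero_()
compute_loss().backward()
after_reset = w.grad.item()  # back to -4 once the old gradient is cleared

print(first, second, after_reset)
```

Without the reset, each update would use the sum of all gradients seen so far rather than the gradient of the current step.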


**Gradient Descent**
```markdown
Iteratively updates the weights of the model in order to minimize the loss function.
```
$w_{t+1} = w_{t} - \mathrm{lr} \cdot \frac{dl}{dw}$
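
For the single-sample squared-error loss used in these cells, $l = (w x - y)^2$, the derivative and the resulting update can be written out explicitly:

$$
\frac{dl}{dw} = 2x\,(w x - y), \qquad w_{t+1} = w_{t} - \mathrm{lr} \cdot 2x\,(w_{t} x - y)
$$

For example, with $x = 1$, $y = 2$, $w_{t} = 1$ and $\mathrm{lr} = 0.1$, the gradient is $2 \cdot 1 \cdot (1 - 2) = -2$, so $w_{t+1} = 1 - 0.1 \cdot (-2) = 1.2$.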

In [7]:
# Increase the number of epochs to move w closer to the target value 2
x = torch.tensor(1.0)
y = torch.tensor(2.0)
lr = 0.1

w = torch.tensor(1.0, requires_grad=True)

for epoch in range(10):
    
    y_predicted = w * x 
    loss = (y_predicted - y)**2
    
    print(f"w: {w.item():.2f}-> y_pred: {y_predicted.item():.2f} -> loss: {loss.item():.2f}")

    # backward pass to compute gradient dLoss/dw
    loss.backward()

    with torch.no_grad():
        w -= lr * w.grad
        
    w.grad.zero_()


w: 1.00-> y_pred: 1.00 -> loss: 1.00
w: 1.20-> y_pred: 1.20 -> loss: 0.64
w: 1.36-> y_pred: 1.36 -> loss: 0.41
w: 1.49-> y_pred: 1.49 -> loss: 0.26
w: 1.59-> y_pred: 1.59 -> loss: 0.17
w: 1.67-> y_pred: 1.67 -> loss: 0.11
w: 1.74-> y_pred: 1.74 -> loss: 0.07
w: 1.79-> y_pred: 1.79 -> loss: 0.04
w: 1.83-> y_pred: 1.83 -> loss: 0.03
w: 1.87-> y_pred: 1.87 -> loss: 0.02
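
The printed weights follow a simple pattern. Substituting the gradient $2x(wx - y)$ into the update shows that, for this one-sample problem, the distance to the optimum $y/x$ shrinks by the constant factor $1 - 2 \cdot \mathrm{lr} \cdot x^2$ on every step. The plain-Python sketch below does not use autograd. It replays the same ten epochs and compares them with that closed form; the variable names are chosen here for illustration.

```python
x, y = 1.0, 2.0
lr = 0.1
w = 1.0

w_opt = y / x                  # minimiser of (w * x - y) ** 2
factor = 1 - 2 * lr * x ** 2   # per-step shrink factor of (w - w_opt); 0.8 here

history = []
for epoch in range(10):
    history.append(w)
    grad = 2 * x * (w * x - y)  # analytic dLoss/dw
    w -= lr * grad

# Closed form: w_t = w_opt + factor**t * (w_0 - w_opt)
closed_form = [w_opt + factor ** t * (1.0 - w_opt) for t in range(10)]
print([round(v, 2) for v in history])
```

A factor between 0 and 1 gives the steady approach seen above, a negative factor makes the weight overshoot and oscillate around the optimum, and a magnitude above 1 makes the iteration diverge.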


**OPTIMIZERS**
```markdown
`torch.optim` provides implementations of various optimization algorithms.

- SGD
- Adam
- RMSprop
- Adagrad

Optimizers update the weights of the model using the gradients computed by `backward()`.

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```
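
Because every optimizer in `torch.optim` exposes the same `step()` / `zero_grad()` interface, switching algorithms only changes the constructor call. A minimal sketch under assumed settings: the toy data, the learning rate of 0.1, and the 200 steps are illustrative choices, and the helper `fit` is a name introduced here.

```python
import torch

X = torch.tensor([1.0, 2.0, 3.0, 4.0])
Y = torch.tensor([2.0, 4.0, 6.0, 8.0])  # generated by f(x) = 2 * x


def fit(optimizer_cls, steps=200, lr=0.1):
    """Fit f(x) = w * x with the given optimizer class; return (w, final loss)."""
    w = torch.tensor(0.0, requires_grad=True)
    optimizer = optimizer_cls([w], lr=lr)
    for _ in range(steps):
        loss = ((w * X - Y) ** 2).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    final_loss = ((w * X - Y) ** 2).mean().item()
    return w.item(), final_loss


results = {}
for cls in (torch.optim.SGD, torch.optim.Adam, torch.optim.RMSprop, torch.optim.Adagrad):
    results[cls.__name__] = fit(cls)
    print(cls.__name__, results[cls.__name__])
```

The final values differ between algorithms at identical settings, so the learning rate usually has to be tuned for each optimizer.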

In [8]:
import torch

# Linear regression
# f = w * x 
# end goal : f = 2 * x

# dataset
X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)

w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

def forward(x):
    return w * x

LR = 0.1
EPOCHS = 10

# optimizer
optimizer = torch.optim.SGD([w], lr=LR)

# Training loop
for epoch in range(EPOCHS):
    #forward pass
    y_predicted = forward(X)

    # loss: MSELoss takes (input, target), i.e. the prediction comes first
    l = torch.nn.MSELoss()(y_predicted, Y)
    print(f"w: {w.item():.2f} -> loss: {l.item():.2f}")

    # calculate gradients 
    l.backward()

    # update weights
    optimizer.step()

    # zero the gradients after updating
    optimizer.zero_grad()

w: 0.00 -> loss: 30.00
w: 3.00 -> loss: 7.50
w: 1.50 -> loss: 1.88
w: 2.25 -> loss: 0.47
w: 1.88 -> loss: 0.12
w: 2.06 -> loss: 0.03
w: 1.97 -> loss: 0.01
w: 2.02 -> loss: 0.00
w: 1.99 -> loss: 0.00
w: 2.00 -> loss: 0.00


**torch.nn.Module**
```markdown
- Base class for all neural network modules.
- Your models should also subclass this class.
- It has a `forward` method that defines the computation performed at every call.
- Layers also are subclasses of `nn.Module`
```

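```python
import torch

class LinearModel(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearModel, self).__init__()
        # Define layers
        self.linearLayer = torch.nn.Linear(input_dim, output_dim)
        ...

    def forward(self, x):
        # Computation performed on every call to model(x)
        output = self.linearLayer(x)
        return output

# Sizes assumed here for illustration: one input feature, one output feature
input_size, output_size = 1, 1
model = LinearModel(input_size, output_size)
```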
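
Subclassing `nn.Module` also registers every layer assigned as an attribute, which is how `model.parameters()` finds the weights to optimize. A small sketch, assuming a one-input, one-output layer and a class name (`TinyLinear`) picked for illustration:

```python
import torch


class TinyLinear(torch.nn.Module):  # hypothetical name used only in this sketch
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.linear = torch.nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.linear(x)


model = TinyLinear(1, 1)

# Layers assigned as attributes are registered, so their tensors show up here.
names_and_shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
print(names_and_shapes)

# Calling the module runs forward() (plus any registered hooks).
out = model(torch.tensor([[3.0]]))
print(out.shape)
```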

In [9]:
import torch 
import torch.nn as nn
from collections import OrderedDict

# Dataset
X = torch.tensor([1, 2, 3, 4], dtype=torch.float32).unsqueeze(1)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32).unsqueeze(1)

print(f"input_shape: {X.shape}, target:{Y.shape}")

input_shape: torch.Size([4, 1]), target:torch.Size([4, 1])


In [None]:

# 1 MODEL
# model = nn.Linear(in_features=X.shape[1], out_features=Y.shape[1])

# 2 Sequential API
# model = nn.Sequential(
#     nn.Linear(in_features=X.shape[1], out_features=Y.shape[1])
# )

# model = nn.Sequential(
#     OrderedDict([("L1",nn.Linear(in_features=X.shape[1], out_features=Y.shape[1]))])
# )

#3 Custom model
class LinearModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearModel, self).__init__()
        # define layers
        self.linear = nn.Linear(input_dim, output_dim)
    def forward(self, x):
        output = self.linear(x)
        return output
    
model = LinearModel(input_dim=X.shape[1], output_dim=Y.shape[1])
print(model)  # show the registered layers


LinearModel(
  (linear): Linear(in_features=1, out_features=1, bias=True)
)
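
The three variants in this cell build the same single-layer model and differ mainly in how the layer is addressed afterwards. A brief sketch, assuming one input and one output feature as in the dataset above:

```python
from collections import OrderedDict

import torch
import torch.nn as nn

plain = nn.Linear(in_features=1, out_features=1)
seq = nn.Sequential(nn.Linear(1, 1))
named = nn.Sequential(OrderedDict([("L1", nn.Linear(1, 1))]))

# Unnamed Sequential layers are indexed by position; named ones also by attribute.
first_layer = seq[0]
same_layer = named.L1 is named[0]

# Each variant maps a (batch, 1) input to a (batch, 1) output.
X = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
shapes = [tuple(m(X).shape) for m in (plain, seq, named)]
print(same_layer, shapes)
```

A custom `nn.Module` subclass becomes worthwhile once the forward pass is no longer a straight chain of layers.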

In [15]:

LR = 0.1
EPOCHS = 10

# loss 
loss = nn.MSELoss()

# optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=LR)

# Training loop
for epoch in range(EPOCHS):
    #forward pass
    y_predicted = model(X)

    # loss
    l = loss(y_predicted, Y)  # MSELoss takes (input, target)
    print(f"loss: {l.item():.2f}")

    # calculate gradients 
    l.backward()

    # update weights
    optimizer.step()

    # zero the gradients after updating
    optimizer.zero_grad()

loss: 41.43
loss: 18.66
loss: 8.44
loss: 3.84
loss: 1.78
loss: 0.84
loss: 0.42
loss: 0.23
loss: 0.14
loss: 0.10
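
After training, the learned parameters can be read from the layer itself and the model can be used for prediction. A self-contained sketch, assuming a bare `nn.Linear` model, a fixed seed, and more epochs with a smaller learning rate than the cell above so that the fit ends up close to `f(x) = 2 * x`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the run is repeatable

X = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
Y = torch.tensor([[2.0], [4.0], [6.0], [8.0]])

model = nn.Linear(in_features=1, out_features=1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

for epoch in range(1000):
    loss = criterion(model(X), Y)  # (input, target) order
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Read the fitted parameters directly from the layer.
weight = model.weight.item()
bias = model.bias.item()

# Inference does not need gradients, so skip graph construction.
with torch.no_grad():
    prediction = model(torch.tensor([[5.0]])).item()

print(f"w: {weight:.2f}, b: {bias:.2f}, f(5) = {prediction:.2f}")
```

Reading `model.weight` (or `model.linear.weight` for the custom class) reports the parameter that the optimizer actually updates, unlike a standalone tensor left over from an earlier cell.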

