# Gradient descent

### PART 1

- Prediction MANUALLY 
- Gradients Computation MANUALLY
- Loss Computation MANUALLY
- Parameter Update MANUALLY

### PART 2

- Prediction MANUALLY 
- Gradients Computation **Autograd**
- Loss Computation MANUALLY
- Parameter Update MANUALLY

### PART 3

- Prediction MANUALLY 
- Gradients Computation **Autograd**
- Loss Computation **PyTorch Loss**
- Parameter Update **PyTorch Optimizer**

### PART 4

- Prediction **PyTorch Model** 
- Gradients Computation **Autograd**
- Loss Computation **PyTorch Loss**
- Parameter Update **PyTorch Optimizer**



## PART 1

- Prediction MANUALLY 
- Gradients Computation MANUALLY
- Loss Computation MANUALLY
- Parameter Update MANUALLY

We have observations of the following "unknown" function:

$$
f(x) = 2x
$$

Predictions are calculated as:

$$
\hat{y} = wX
$$

The loss is the mean squared error (MSE) between the targets and the predictions:

$$
MSE = \frac{1}{n} (y-\hat{y})^T(y-\hat{y})
$$

The gradient of the loss with respect to $w$, computed by hand, is:

$$
\frac{\partial MSE}{\partial w} = \frac{2}{n}X^T(wX-y)
$$
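
The gradient follows from differentiating the MSE term by term. As a short worked derivation, writing $x_i$ and $y_i$ for the components of $X$ and $y$ (this index notation is introduced here for illustration) and using $\hat{y}_i = w x_i$:

$$
MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - w x_i)^2
$$

$$
\frac{\partial MSE}{\partial w} = \frac{1}{n}\sum_{i=1}^{n} 2\,(y_i - w x_i)(-x_i) = \frac{2}{n}\sum_{i=1}^{n} x_i\,(w x_i - y_i) = \frac{2}{n}X^T(wX-y)
$$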

In [3]:
import numpy as np 
import time

# training data
X = np.array([1, 2, 3, 4], dtype=np.float32)
y = np.array([2, 4, 6, 8], dtype=np.float32)

# weight initialization
w = 0.0

# model prediction
def forward(x):
    return w*x

# loss
def loss(y,y_pred):
    return ((y-y_pred)**2).mean()

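# gradient of the loss with respect to w
# np.dot sums over the samples and returns a scalar, so this computes
# 2 * X^T (wX - y), i.e. the formula above without the 1/n factor
# (calling .mean() on a scalar would not change it).
def gradient(x, y, y_pred):
    return np.dot(2*x, y_pred - y)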
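
Because `np.dot` already reduces over the samples, the `gradient` function above returns a sum over the samples rather than a mean. As a hedged sketch (the names `mse`, `gradient_sum`, and `gradient_mean` are illustrative and not part of the notebook), the following compares both variants against a central finite difference of the MSE at $w = 0$:

```python
import numpy as np

X = np.array([1, 2, 3, 4], dtype=np.float64)
y = np.array([2, 4, 6, 8], dtype=np.float64)

def mse(w):
    # mean squared error for a given weight
    return ((y - w * X) ** 2).mean()

def gradient_sum(w):
    # 2 * X^T (wX - y): summed over the samples, no 1/n factor
    return np.dot(2 * X, w * X - y)

def gradient_mean(w):
    # (2/n) * X^T (wX - y): matches the MSE gradient formula
    return np.dot(2 * X, w * X - y) / X.size

w0, eps = 0.0, 1e-6
# central finite difference approximation of dMSE/dw at w0
numeric = (mse(w0 + eps) - mse(w0 - eps)) / (2 * eps)

print(gradient_sum(w0))   # -120.0
print(gradient_mean(w0))  # -30.0
print(numeric)            # approximately -30
```

Only the averaged variant agrees with the finite difference, so the summed variant behaves like the MSE gradient with a learning rate that is $n = 4$ times larger.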

Now let's train the model and learn a better value for the parameter $w$.

In [4]:
learning_rate = 0.01
total_epochs = 10

for epoch in range(total_epochs):
    # prediction (forward pass)
    y_pred = forward(X)

    # loss
    l = loss(y,y_pred)

    # gradients
    dw = gradient(X,y,y_pred)

    # update weights
    w -= learning_rate * dw

    print(f'Epoch {epoch+1}/{total_epochs}: w = {w:.3f}, loss = {l:.8f}')
    time.sleep(2)


Epoch 1/10: w = 1.200, loss = 30.00000000
Epoch 2/10: w = 1.680, loss = 4.79999924
Epoch 3/10: w = 1.872, loss = 0.76800019
Epoch 4/10: w = 1.949, loss = 0.12288000
Epoch 5/10: w = 1.980, loss = 0.01966083
Epoch 6/10: w = 1.992, loss = 0.00314570
Epoch 7/10: w = 1.997, loss = 0.00050332
Epoch 8/10: w = 1.999, loss = 0.00008053
Epoch 9/10: w = 1.999, loss = 0.00001288
Epoch 10/10: w = 2.000, loss = 0.00000206
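
For this one-parameter problem, each update shrinks the error $w - 2$ by a constant factor, which explains how quickly the printed values approach 2. The sketch below assumes the summed gradient $2X^T(wX-y)$ computed by the Part 1 code and, for comparison, the averaged gradient from the MSE formula; the helper `w_after` and the variable names are illustrative only.

```python
X = [1.0, 2.0, 3.0, 4.0]
learning_rate = 0.01
sum_sq = sum(x * x for x in X)  # sum of x_i^2 = 30.0

# Per-update contraction factor of the error (w - 2):
# w_new - 2 = (1 - learning_rate * c) * (w - 2), where c is the gradient's slope in w
factor_sum = 1 - 2 * learning_rate * sum_sq            # summed gradient
factor_mean = 1 - 2 * learning_rate * sum_sq / len(X)  # averaged gradient

def w_after(epochs, factor, w0=0.0, w_true=2.0):
    # closed-form weight after a number of gradient descent updates
    return w_true + (w0 - w_true) * factor ** epochs

for epoch in (1, 2, 3):
    print(f'epoch {epoch}: '
          f'summed w = {w_after(epoch, factor_sum):.3f}, '
          f'averaged w = {w_after(epoch, factor_mean):.3f}')
```

The summed-gradient column agrees with the `w` values printed above (1.200, 1.680, 1.872), with a factor of about 0.4 per epoch. With the averaged gradient the factor is about 0.85, so convergence at the same learning rate is noticeably slower.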


## PART 2

- Prediction MANUALLY
- Gradients Computation **Autograd**
- Loss Computation MANUALLY
- Parameter Update MANUALLY
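
Before relying on autograd in the training loop, it can be checked against the manual formula from Part 1. A minimal sketch, assuming the same training data and an initial weight of $w = 0$ (the names `autograd_grad` and `manual_grad` are illustrative):

```python
import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

# forward pass and MSE loss; autograd records these operations
l = ((y - w * X) ** 2).mean()
l.backward()  # fills w.grad with dl/dw
autograd_grad = w.grad.item()

# manual gradient: (2/n) * X^T (wX - y)
with torch.no_grad():
    manual_grad = (2.0 / X.numel() * torch.dot(X, w * X - y)).item()

print(autograd_grad, manual_grad)  # both -30.0
```

With a learning rate of 0.01, a gradient of -30 moves $w$ from 0 to 0.3 in the first update.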




In [7]:
import torch 

# training data
X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)

# weight initialization
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

total_epochs = 30

for epoch in range(total_epochs):
    # prediction (forward pass)
    y_pred = forward(X)

    # loss
    l = loss(y,y_pred)

    # gradients
    l.backward() # calculates dl/dw 

    # update weights
    # The update of the weights w must be done outside the computational
    # graph. Otherwise, the update would be recorded as an operation in the
    # graph and gradients would be tracked for it as well.
    with torch.no_grad():
        w -= learning_rate * w.grad

    # backward() writes the gradients to the w.grad attribute and adds to
    # whatever is already stored there, so we reset them to zero after each
    # update.
    w.grad.zero_() # the trailing _ means in-place operation

    print(f'Epoch {epoch+1}/{total_epochs}: w = {w:.3f}, loss = {l:.8f}')
    time.sleep(0.2)

    if l < 0.001:
        break




Epoch 1/30: w = 0.300, loss = 30.00000000
Epoch 2/30: w = 0.555, loss = 21.67499924
Epoch 3/30: w = 0.772, loss = 15.66018772
Epoch 4/30: w = 0.956, loss = 11.31448650
Epoch 5/30: w = 1.113, loss = 8.17471695
Epoch 6/30: w = 1.246, loss = 5.90623236
Epoch 7/30: w = 1.359, loss = 4.26725292
Epoch 8/30: w = 1.455, loss = 3.08308983
Epoch 9/30: w = 1.537, loss = 2.22753215
Epoch 10/30: w = 1.606, loss = 1.60939169
Epoch 11/30: w = 1.665, loss = 1.16278565
Epoch 12/30: w = 1.716, loss = 0.84011245
Epoch 13/30: w = 1.758, loss = 0.60698116
Epoch 14/30: w = 1.794, loss = 0.43854395
Epoch 15/30: w = 1.825, loss = 0.31684780
Epoch 16/30: w = 1.851, loss = 0.22892261
Epoch 17/30: w = 1.874, loss = 0.16539653
Epoch 18/30: w = 1.893, loss = 0.11949898
Epoch 19/30: w = 1.909, loss = 0.08633806
Epoch 20/30: w = 1.922, loss = 0.06237914
Epoch 21/30: w = 1.934, loss = 0.04506890
Epoch 22/30: w = 1.944, loss = 0.03256231
Epoch 23/30: w = 1.952, loss = 0.02352631
Epoch 24/30: w = 1.960, loss = 0.016997
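
The comment in the loop about resetting `w.grad` can be seen in isolation. The following is a small sketch on a toy expression $w \cdot x$ (the values 1.0 and 3.0 are arbitrary choices for illustration), showing that `backward()` adds to the stored gradient until it is zeroed:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(3.0)

(w * x).backward()
first = w.grad.item()        # d(w*x)/dw = 3.0

(w * x).backward()           # no reset: the new gradient is added to w.grad
accumulated = w.grad.item()  # 3.0 + 3.0 = 6.0

w.grad.zero_()               # in-place reset
(w * x).backward()
after_reset = w.grad.item()  # 3.0 again

print(first, accumulated, after_reset)
```

Without the reset, every epoch would update `w` with the sum of all gradients computed so far instead of the current one.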

## PART 3

- Prediction MANUALLY
- Gradients Computation **Autograd**
- Loss Computation **PyTorch Loss**
- Parameter Update **PyTorch Optimizer**

A typical PyTorch pipeline has three steps:

1. Design model (input, output size, forward pass)
2. Construct loss and optimizer
3. Training loop
    - Forward pass: compute prediction and loss
    - Backward pass: gradients
    - Update weights

In [10]:
import torch.nn as nn

# weight initialization
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

pytorch_loss = nn.MSELoss()
optimizer = torch.optim.SGD([w], # parameters to learn
                            lr = 0.01) # learning rate

for epoch in range(total_epochs):
    
    # prediction (forward pass)
    y_pred = forward(X)

    # loss: nn.MSELoss expects (prediction, target)
    l = pytorch_loss(y_pred, y)

    # gradients
    l.backward() # calculates dl/dw 

    # update weights
    # optimizer.step() replaces the manual update:
    #     with torch.no_grad():
    #         w -= learning_rate * w.grad
    optimizer.step()

    # reset gradients
    # optimizer.zero_grad() replaces w.grad.zero_()
    optimizer.zero_grad()

    print(f'Epoch {epoch+1}/{total_epochs}: w = {w:.3f}, loss = {l:.8f}')
    time.sleep(0.2)

    if l < 0.001:
        break

Epoch 1/30: w = 0.300, loss = 30.00000000
Epoch 2/30: w = 0.555, loss = 21.67499924
Epoch 3/30: w = 0.772, loss = 15.66018772
Epoch 4/30: w = 0.956, loss = 11.31448650
Epoch 5/30: w = 1.113, loss = 8.17471695
Epoch 6/30: w = 1.246, loss = 5.90623236
Epoch 7/30: w = 1.359, loss = 4.26725292
Epoch 8/30: w = 1.455, loss = 3.08308983
Epoch 9/30: w = 1.537, loss = 2.22753215
Epoch 10/30: w = 1.606, loss = 1.60939169
Epoch 11/30: w = 1.665, loss = 1.16278565
Epoch 12/30: w = 1.716, loss = 0.84011245
Epoch 13/30: w = 1.758, loss = 0.60698116
Epoch 14/30: w = 1.794, loss = 0.43854395
Epoch 15/30: w = 1.825, loss = 0.31684780
Epoch 16/30: w = 1.851, loss = 0.22892261
Epoch 17/30: w = 1.874, loss = 0.16539653
Epoch 18/30: w = 1.893, loss = 0.11949898
Epoch 19/30: w = 1.909, loss = 0.08633806
Epoch 20/30: w = 1.922, loss = 0.06237914
Epoch 21/30: w = 1.934, loss = 0.04506890
Epoch 22/30: w = 1.944, loss = 0.03256231
Epoch 23/30: w = 1.952, loss = 0.02352631
Epoch 24/30: w = 1.960, loss = 0.016997
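
The output is the same as in Part 2 because plain SGD performs the same update as the manual rule. As a hedged sketch (the names `w_manual` and `w_optim` are illustrative), one training step done both ways on the same data gives the same weight:

```python
import torch
import torch.nn as nn

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
lr = 0.01

# --- manual loss and manual update (as in Part 2) ---
w_manual = torch.tensor(0.0, requires_grad=True)
l_manual = ((w_manual * X - y) ** 2).mean()
l_manual.backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad
w_manual.grad.zero_()

# --- nn.MSELoss and torch.optim.SGD (as in Part 3) ---
w_optim = torch.tensor(0.0, requires_grad=True)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD([w_optim], lr=lr)
l_optim = loss_fn(w_optim * X, y)
l_optim.backward()
optimizer.step()
optimizer.zero_grad()

print(l_manual.item(), l_optim.item())  # same loss
print(w_manual.item(), w_optim.item())  # same weight after one step
```

This equivalence holds for `torch.optim.SGD` with its default settings (no momentum, no weight decay); other optimizers or options apply different update rules.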

## PART 4

- Prediction **PyTorch Model**
- Gradients Computation **Autograd**
- Loss Computation **PyTorch Loss**
- Parameter Update **PyTorch Optimizer**

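We no longer need to write the forward function ourselves, since it is part of the model we select. We also don't need to define the weight $w$, since the model creates and stores its own parameters. Note that `nn.Linear` also learns a bias term $b$ by default, which is why the output below reports both `w` and `b`.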
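
To make the implicit forward pass concrete, here is a small sketch that inspects the parameters of an `nn.Linear` layer and reproduces its output by hand. The seed and the test input `x_test` are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so the random initialization is repeatable

model = nn.Linear(in_features=1, out_features=1)
print(model.weight.shape)  # torch.Size([1, 1])
print(model.bias.shape)    # torch.Size([1])

x_test = torch.tensor([[5.0]])  # shape (n_samples, n_features) = (1, 1)
with torch.no_grad():
    from_model = model(x_test)                       # implicit forward pass
    by_hand = x_test @ model.weight.T + model.bias   # x W^T + b

print(torch.allclose(from_model, by_hand))  # True
```

The parameters start at random values, so the first prediction is not meaningful until the model has been trained.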

In [15]:
# Training data must have the shape (n_samples, n_features).
X = torch.tensor([[1], # 4 samples of 1 feature 
                  [2],
                  [3],
                  [4]],
                  dtype=torch.float32)
y = torch.tensor([[2], # 4 outputs of dim=1
                  [4],
                  [6],
                  [8]],
                  dtype=torch.float32)

n_samples, n_features = X.shape

input_size = n_features
output_size = y.shape[1]

model = nn.Linear(input_size, output_size)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(total_epochs):
        
    # prediction (forward pass)
    y_pred = model(X)

    # loss: nn.MSELoss expects (prediction, target)
    l = pytorch_loss(y_pred, y)

    # gradients
    l.backward() # calculates dl/dw and dl/db

    # update weights
    optimizer.step()

    # reset gradients
    optimizer.zero_grad()

    w, b = model.parameters() # nn.Linear holds a weight and a bias
    print(f'Epoch {epoch+1}/{total_epochs}: w = {w.item():.3f}, b = {b.item():.2f} loss = {l:.4f}')

    if l < 0.001:
        break

Epoch 1/30: w = -0.130, b = 0.45 loss = 42.3191
Epoch 2/30: w = 0.167, b = 0.55 loss = 29.4204
Epoch 3/30: w = 0.414, b = 0.63 loss = 20.4699
Epoch 4/30: w = 0.621, b = 0.70 loss = 14.2591
Epoch 5/30: w = 0.793, b = 0.75 loss = 9.9491
Epoch 6/30: w = 0.936, b = 0.80 loss = 6.9582
Epoch 7/30: w = 1.056, b = 0.83 loss = 4.8826
Epoch 8/30: w = 1.156, b = 0.86 loss = 3.4420
Epoch 9/30: w = 1.239, b = 0.89 loss = 2.4421
Epoch 10/30: w = 1.309, b = 0.91 loss = 1.7480
Epoch 11/30: w = 1.367, b = 0.93 loss = 1.2660
Epoch 12/30: w = 1.416, b = 0.94 loss = 0.9313
Epoch 13/30: w = 1.456, b = 0.95 loss = 0.6987
Epoch 14/30: w = 1.490, b = 0.96 loss = 0.5370
Epoch 15/30: w = 1.519, b = 0.96 loss = 0.4245
Epoch 16/30: w = 1.543, b = 0.97 loss = 0.3461
Epoch 17/30: w = 1.563, b = 0.97 loss = 0.2914
Epoch 18/30: w = 1.580, b = 0.98 loss = 0.2531
Epoch 19/30: w = 1.594, b = 0.98 loss = 0.2263
Epoch 20/30: w = 1.606, b = 0.98 loss = 0.2074
Epoch 21/30: w = 1.616, b = 0.98 loss = 0.1939
Epoch 22/30: w = 

## Custom linear regression model 

In [18]:
class LinearRegression(nn.Module):

    def __init__(self, input_dim, output_dim):
        super().__init__()

        # define layers
        self.lin = nn.Linear(input_dim, output_dim)

    # forward pass
    def forward(self, x):
        return self.lin(x)
    

newmodel = LinearRegression(input_size, output_size)
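
The custom model can be trained with the same loop as in Part 4, since it exposes its parameters through `parameters()` like any other `nn.Module`. The following self-contained sketch assumes the Part 4 data, a learning rate of 0.01, and 100 epochs; the seed, the epoch count, and the `losses` list are illustrative choices rather than part of the notebook.

```python
import torch
import torch.nn as nn

class LinearRegression(nn.Module):

    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.lin = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.lin(x)

torch.manual_seed(0)  # arbitrary seed for a repeatable initialization

X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
y = torch.tensor([[2], [4], [6], [8]], dtype=torch.float32)

newmodel = LinearRegression(X.shape[1], y.shape[1])
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(newmodel.parameters(), lr=0.01)

losses = []
for epoch in range(100):
    y_pred = newmodel(X)    # forward pass through the custom module
    l = loss_fn(y_pred, y)  # loss
    l.backward()            # gradients
    optimizer.step()        # update weights
    optimizer.zero_grad()   # reset gradients
    losses.append(l.item())

print(f'first loss = {losses[0]:.4f}, last loss = {losses[-1]:.4f}')
```

Wrapping the layer in a class adds nothing for a single `nn.Linear`, but the same structure extends to models with several layers.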
