<a href="https://colab.research.google.com/github/jurados/NotesPytorch/blob/main/%5B02%5D_Linear_Regression_with_PyTorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

https://www.youtube.com/watch?v=vo_fUOk-IKk&list=PLWKjhJtqVAbm3T2Eq1_KgloC7ogdXxdRa

# 02 Linear Regression with PyTorch

|Region|Temp. (F)|Rainfall (mm)|Humidity (%)|Apples (ton)|Oranges (ton)|
|------|:------:|:------------:|:----------:|:----------:|:-----------:|
|Kanto |73      |67            |43          |56          |70           |  
|Johto |91      |88            |64          |81          |101          |  
|Hoenn |87      |134           |58          |119         |133          |  
|Sinnoh|102     |43            |37          |22          |37           |  
|Unova |69      |96            |70          |103         |119          |  


In [None]:
import numpy as np
import torch

# Inputs (temp, rainfall, humidity)
inputs = np.array([[73,67,43],
                   [91,88,64],
                   [87,134,58],
                   [102,43,37],
                   [69,96,40]], dtype='float32')

In [None]:
# Targets (apples, oranges)
targets = np.array([[56,70],
                   [81,101],
                   [119,133],
                   [22,37],
                   [103,119]], dtype='float32')

In [None]:
# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
inputs, targets

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.],
         [102.,  43.,  37.],
         [ 69.,  96.,  40.]]),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.],
         [ 22.,  37.],
         [103., 119.]]))

## 2.1 Linear regression model from scratch

The weights and biases (`w11`, `w12`, $\cdots$, `w23`, `b1` and `b2`) can also be represented as matrices: a $2 \times 3$ weight matrix `W` and a bias vector `b` of length 2, both initialized with random values.
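
Written out, the model assumed here is the standard linear regression formulation that the weight names above refer to: each crop's yield is a weighted sum of the three input variables plus a bias. Stacking the regions as the rows of $X$ ($5 \times 3$) gives the matrix form, where $b$ is broadcast across rows:

$$
\begin{aligned}
\text{yield}_{\text{apple}}  &= w_{11}\,\text{temp} + w_{12}\,\text{rainfall} + w_{13}\,\text{humidity} + b_1 \\
\text{yield}_{\text{orange}} &= w_{21}\,\text{temp} + w_{22}\,\text{rainfall} + w_{23}\,\text{humidity} + b_2 \\
\hat{Y} &= X W^\top + b, \qquad W \in \mathbb{R}^{2 \times 3},\; b \in \mathbb{R}^{2}
\end{aligned}
$$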

In [None]:
# Weights and biases
W = torch.rand(2,3, requires_grad=True)
b = torch.rand(2,requires_grad=True)
print(f'Weights:\n{W}')
print(f"Biases:\n{b}")

Weights:
tensor([[0.5785, 0.8296, 0.1514],
        [0.1127, 0.5120, 0.2324]], requires_grad=True)
Biases:
tensor([0.3732, 0.3643], requires_grad=True)


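The model multiplies the input matrix `X` by the transposed weights (`@` is matrix multiplication and `.t()` returns the transpose) and adds the bias vector `b`: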

In [None]:
model = lambda X: X @ W.t() + b
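
To see that the matrix form matches the per-weight equations, the sketch below computes the apple prediction for Kanto both ways. The weight and bias values are hypothetical, chosen only for illustration, and are not the random values printed above.

```python
import torch

# Hypothetical weights and biases, chosen only for illustration
W = torch.tensor([[0.5, 0.8, 0.1],
                  [0.1, 0.5, 0.2]])
b = torch.tensor([0.3, 0.4])
X = torch.tensor([[73., 67., 43.]])  # Kanto: temp, rainfall, humidity

# Matrix form: (1x3) @ (3x2) + (2,) -> (1x2), one column per crop
matrix_pred = X @ W.t() + b

# Per-weight form for the first output (apples): w11*temp + w12*rainfall + w13*humidity + b1
apple_pred = W[0, 0] * 73 + W[0, 1] * 67 + W[0, 2] * 43 + b[0]

print(matrix_pred.shape)                             # torch.Size([1, 2])
print(torch.isclose(matrix_pred[0, 0], apple_pred))  # tensor(True)
```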

In [None]:
# Generate predictions
preds = model(inputs)
preds

tensor([[104.6970,  52.8924],
        [135.7107,  70.5547],
        [170.6506,  92.2628],
        [100.6550,  42.4778],
        [125.9877,  66.5932]], grad_fn=<AddBackward0>)

In [None]:
# Compare with targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

## 2.2 Loss function
We can compare the model's predictions with the actual targets using the following method:
- Calculate the difference between the two matrices (`preds` and `targets`).
- Square all elements of the difference matrix to remove negative values.
- Calculate the average of the elements in the resulting matrix.


The result is a single number, known as the **Mean Squared Error** (MSE).
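
A small, hand-checkable example of the three steps above, using made-up numbers rather than the notebook's predictions:

```python
import torch

preds = torch.tensor([[3., 1.], [5., 2.]])
targets = torch.tensor([[2., 3.], [2., 2.]])

diff = preds - targets                       # [[1., -2.], [3., 0.]]
squared = diff * diff                        # [[1., 4.], [9., 0.]]
mse_value = squared.sum() / squared.numel()  # (1 + 4 + 9 + 0) / 4
print(mse_value)                             # tensor(3.5000)
```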

In [None]:
# MSE loss
def mse(t1,t2):
  diff = t1 - t2
  return torch.sum(diff*diff)/diff.numel()

`torch.sum` returns the sum of all the elements in a tensor, and the `.numel` method returns the number of elements in a tensor.

In [None]:
loss = mse(preds, targets)
loss

tensor(2040.3079, grad_fn=<DivBackward0>)

## 2.3 Compute gradients

In [None]:
# Compute gradients
loss.backward()

In [None]:
# Gradients for weights
print(W)
print(W.grad)

tensor([[0.5785, 0.8296, 0.1514],
        [0.1127, 0.5120, 0.2324]], requires_grad=True)
tensor([[ 4527.2246,  4117.4814,  2484.1870],
        [-2124.1692, -2815.9377, -1388.0955]])
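
`loss.backward()` fills `W.grad` and `b.grad` with the partial derivatives of the loss. For this particular model and loss they can also be derived by hand: with $\hat{Y} = XW^\top + b$ and $N$ the number of elements in $\hat{Y}$, $\partial L/\partial W = \frac{2}{N}(\hat{Y} - Y)^\top X$ and $\partial L/\partial b = \frac{2}{N}\sum_i (\hat{Y}_i - Y_i)$. The sketch below, on random data (an assumption for illustration), checks that autograd agrees with these formulas.

```python
import torch

torch.manual_seed(0)
X = torch.randn(5, 3, dtype=torch.float64)   # 5 samples, 3 features
Y = torch.randn(5, 2, dtype=torch.float64)   # 5 samples, 2 targets
W = torch.randn(2, 3, dtype=torch.float64, requires_grad=True)
b = torch.randn(2, dtype=torch.float64, requires_grad=True)

preds = X @ W.t() + b
diff = preds - Y
loss = torch.sum(diff * diff) / diff.numel()
loss.backward()

# Hand-derived gradients of the MSE loss
N = diff.numel()
grad_W = (2 / N) * diff.detach().t() @ X      # shape (2, 3), same as W
grad_b = (2 / N) * diff.detach().sum(dim=0)   # shape (2,), same as b

print(torch.allclose(W.grad, grad_W), torch.allclose(b.grad, grad_b))  # True True
```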


We reset the gradients to zero by calling the `.zero_()` method. This is necessary because PyTorch accumulates gradients: the next time we call `.backward()` on the loss, the new gradient values are added to the existing ones, which may lead to unexpected results.

In [None]:
W.grad.zero_()
b.grad.zero_()
W.grad, b.grad

(tensor([[0., 0., 0.],
         [0., 0., 0.]]),
 tensor([0., 0.]))
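
The accumulation behaviour is easy to see on a single scalar. In this sketch (a toy loss $w^2$, not the notebook's model), calling `.backward()` twice without zeroing doubles the stored gradient:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

# First backward pass: d(w*w)/dw = 2*w = 6
(w * w).backward()
first = w.grad.item()

# Second backward pass WITHOUT zeroing: the new 6 is added to the stored 6
(w * w).backward()
second = w.grad.item()

# Reset before the next update
w.grad.zero_()
print(first, second, w.grad.item())  # 6.0 12.0 0.0
```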

## 2.4 Adjust weights and biases using gradient descent
We'll reduce the loss and improve our model using the gradient descent optimization algorithm, which consists of the following steps:

1. Generate predictions.
2. Calculate the loss.
3. Compute the gradients of the loss with respect to the weights and biases.
4. Adjust the weights and biases by subtracting a small quantity proportional to the gradient.
5. Reset the gradients to zero.
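
Step 4 is the gradient descent update rule. With a learning rate $\eta$ (the notebook uses $\eta = 10^{-5}$), each parameter moves a small step against its gradient:

$$
W \leftarrow W - \eta \, \frac{\partial L}{\partial W}, \qquad
b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
$$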

In [None]:
# Generate predictions
preds = model(inputs)
preds

tensor([[104.6970,  52.8924],
        [135.7107,  70.5547],
        [170.6506,  92.2628],
        [100.6550,  42.4778],
        [125.9877,  66.5932]], grad_fn=<AddBackward0>)

In [None]:
# Calculate the loss
loss = mse(preds, targets)
loss

tensor(2040.3079, grad_fn=<DivBackward0>)

In [None]:
# Compute gradients
loss.backward()
W.grad, b.grad

(tensor([[ 4527.2246,  4117.4814,  2484.1870],
         [-2124.1692, -2815.9377, -1388.0955]]),
 tensor([ 51.3402, -27.0438]))

Finally, we update the weights and biases using the gradients computed above.

In [None]:
# Adjust weights and reset gradients
with torch.no_grad():  # autograd should not record the update operations on W and b
  W -= W.grad * 1e-5  # multiply the gradients by a small number (the learning rate, 1e-5) so the weights change only slightly
  b -= b.grad * 1e-5
  W.grad.zero_()
  b.grad.zero_()
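
The update has to happen inside a context manager such as `torch.no_grad()` (or `torch.inference_mode()`): outside of it, PyTorch refuses an in-place update of a leaf tensor that requires gradients. A minimal sketch on a fresh random `W`:

```python
import torch

W = torch.rand(2, 3, requires_grad=True)

try:
    W -= 0.1  # in-place update of a leaf tensor that autograd is tracking
    update_failed = False
except RuntimeError as err:
    update_failed = True
    print("RuntimeError:", err)

with torch.no_grad():
    W -= 0.1  # allowed: autograd does not record operations inside no_grad

print(update_failed, W.requires_grad)  # True True
```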

In [None]:
W, b

(tensor([[0.5785, 0.8296, 0.1514],
         [0.1127, 0.5120, 0.2324]], requires_grad=True),
 tensor([0.3732, 0.3643], requires_grad=True))

In [None]:
# Regenerate predictions with the updated weights, then calculate the loss
preds = model(inputs)
loss = mse(preds, targets)
loss

tensor(2040.3079, grad_fn=<DivBackward0>)

## 2.5 Train for multiple epochs

In [None]:
from tqdm.auto import tqdm

# Train for 100 epochs
for i in tqdm(range(100)):
  preds = model(inputs)
  loss = mse(preds, targets)
  loss.backward()

  with torch.no_grad():
    W -= W.grad * 1e-5
    b -= b.grad * 1e-5
    W.grad.zero_()
    b.grad.zero_()
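
Looking only at the final loss hides how training progressed. A variant of the loop above that records the loss at every epoch makes the downward trend visible. The data are the five regions from the table above; the seed and the `losses` list are additions for illustration.

```python
import torch

# Five regions from the table: (temp, rainfall, humidity) -> (apples, oranges)
inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                       [102., 43., 37.], [69., 96., 70.]])
targets = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                        [22., 37.], [103., 119.]])

torch.manual_seed(42)
W = torch.rand(2, 3, requires_grad=True)
b = torch.rand(2, requires_grad=True)

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

losses = []
for epoch in range(100):
    preds = inputs @ W.t() + b
    loss = mse(preds, targets)
    losses.append(loss.item())  # loss before this epoch's update
    loss.backward()
    with torch.no_grad():
        W -= W.grad * 1e-5
        b -= b.grad * 1e-5
        W.grad.zero_()
        b.grad.zero_()

print(f"first epoch: {losses[0]:.2f}, last epoch: {losses[-1]:.2f}")
```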


In [None]:
preds = model(inputs)
loss = mse(preds, targets)
loss

tensor(95.2721, grad_fn=<DivBackward0>)

## 2.6 Linear Regression using PyTorch built-ins

In [None]:
import torch.nn as nn

In [None]:
# Inputs (temp, rainfall, humidity)
inputs = np.array([[73,67,43],
                   [91,88,64],
                   [87,134,58],
                   [102,43,37],
                   [69,96,70],
                   [73,67,43],
                   [91,88,64],
                   [87,134,58],
                   [102,43,37],
                   [69,96,70],
                   [73,67,43],
                   [91,88,64],
                   [87,134,58],
                   [102,43,37],
                   [69,96,70]], dtype='float32')

# Targets (apples, oranges)
targets = np.array([[56,70],
                    [81,101],
                    [119,133],
                    [22,37],
                    [103,119],
                    [56,70],
                    [81,101],
                    [119,133],
                    [22,37],
                    [103,119],
                    [56,70],
                    [81,101],
                    [119,133],
                    [22,37],
                    [103,119]], dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

## 2.7 Dataset and DataLoader

We'll create a `TensorDataset`, which allows access to rows from `inputs` and `targets` as tuples, and provides standard APIs for working with many different types of datasets in PyTorch.

In [None]:
from torch.utils.data import TensorDataset

In [None]:
# Define dataset
train_dataset = TensorDataset(inputs, targets)
train_dataset[:3]

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))
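
Indexing a `TensorDataset` with a single integer returns one `(input, target)` pair, and `len()` gives the number of rows. A small sketch using the first three regions:

```python
import torch
from torch.utils.data import TensorDataset

inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.]])
targets = torch.tensor([[56., 70.], [81., 101.], [119., 133.]])
dataset = TensorDataset(inputs, targets)

print(len(dataset))   # 3: number of rows (first dimension of each tensor)
x0, y0 = dataset[0]   # one (input, target) pair
print(x0, y0)         # tensor([73., 67., 43.]) tensor([56., 70.])
```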

We'll also create a `DataLoader`, which splits the data into batches of a predefined size during training. It also provides other utilities, such as shuffling and random sampling of the data.

In [None]:
from torch.utils.data import DataLoader

In [None]:
# Define data loader
batch_size = 5
train_dataloader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)

In [None]:
for x, y in train_dataloader:
  print(x)
  print(y)

tensor([[ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 73.,  67.,  43.],
        [ 69.,  96.,  70.],
        [ 69.,  96.,  70.]])
tensor([[119., 133.],
        [103., 119.],
        [ 81., 101.],
        [103., 119.],
        [ 56.,  70.]])
tensor([[102.,  43.,  37.],
        [ 73.,  67.,  43.],
        [102.,  43.,  37.],
        [ 87., 134.,  58.],
        [ 91.,  88.,  64.]])
tensor([[103., 119.],
        [ 81., 101.],
        [ 22.,  37.],
        [ 22.,  37.],
        [ 81., 101.]])
tensor([[ 73.,  67.,  43.],
        [ 87., 134.,  58.],
        [ 69.,  96.,  70.],
        [ 91.,  88.,  64.],
        [ 91.,  88.,  64.]])
tensor([[ 56.,  70.],
        [ 22.,  37.],
        [ 22.,  37.],
        [119., 133.],
        [119., 133.]])
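
With 15 rows and `batch_size=5`, each epoch yields 3 batches, reshuffled every epoch when `shuffle=True`. When the batch size does not divide the dataset size, the last batch is smaller unless `drop_last=True` is passed. A sketch with random data (not the notebook's) to check this:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(torch.randn(15, 3), torch.randn(15, 2))  # 15 rows, as in the notebook

loader = DataLoader(dataset, batch_size=5, shuffle=True)
print(len(loader))  # 3 batches of 5

loader_4 = DataLoader(dataset, batch_size=4, shuffle=True)
sizes_4 = [len(x) for x, _ in loader_4]
print(len(loader_4), sizes_4)  # 4 [4, 4, 4, 3]

loader_4_drop = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)
print(len(loader_4_drop))  # 3: the incomplete last batch is discarded
```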


## 2.8 `nn.Linear`

In [None]:
# Define model
model = nn.Linear(in_features=3, out_features=2)
print(f'weight model: {model.weight} and bias model: {model.bias}')

weight model: Parameter containing:
tensor([[ 0.4197,  0.2760, -0.5534],
        [-0.0582,  0.3293, -0.2790]], requires_grad=True) and bias model: Parameter containing:
tensor([ 0.0629, -0.0830], requires_grad=True)


In [None]:
# Parameters
list(model.parameters())  # returns a list containing all the weight and bias matrices in the model

[Parameter containing:
 tensor([[ 0.4197,  0.2760, -0.5534],
         [-0.0582,  0.3293, -0.2790]], requires_grad=True),
 Parameter containing:
 tensor([ 0.0629, -0.0830], requires_grad=True)]
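
`nn.Linear` creates the weight matrix (shape `(out_features, in_features)`, i.e. 2×3 here) and the bias vector automatically, and its forward pass computes the same expression as the hand-written `model` from section 2.1. A quick check on two of the regions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(in_features=3, out_features=2)
X = torch.tensor([[73., 67., 43.], [91., 88., 64.]])

print(linear.weight.shape, linear.bias.shape)  # torch.Size([2, 3]) torch.Size([2])

# nn.Linear computes X @ weight.T + bias, the same form as the manual model
manual = X @ linear.weight.t() + linear.bias
print(torch.allclose(linear(X), manual))  # True
```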

In [None]:
# Generate predictions
preds = model(inputs)
preds

tensor([[25.4018,  5.7355],
        [27.1324,  5.7448],
        [41.4715, 22.8009],
        [34.2697, -2.1833],
        [16.7860,  7.9866],
        [25.4018,  5.7355],
        [27.1324,  5.7448],
        [41.4715, 22.8009],
        [34.2697, -2.1833],
        [16.7860,  7.9866],
        [25.4018,  5.7355],
        [27.1324,  5.7448],
        [41.4715, 22.8009],
        [34.2697, -2.1833],
        [16.7860,  7.9866]], grad_fn=<AddmmBackward0>)

## 2.9 Loss function
Instead of defining a loss function manually, we can use the built-in loss function `mse_loss`.

In [None]:
# Import nn.functional
import torch.nn.functional as F

The `nn.functional` package contains many useful loss functions and several other utilities.

In [None]:
# Define loss function
loss_fn = F.mse_loss

In [None]:
loss = loss_fn(model(inputs), targets)
loss

tensor(5882.3721, grad_fn=<MseLossBackward0>)
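
With its default `reduction='mean'`, `F.mse_loss` computes the same quantity as the hand-written `mse` function from section 2.2. A quick comparison on random tensors:

```python
import torch
import torch.nn.functional as F

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

torch.manual_seed(0)
preds = torch.randn(15, 2)
targets = torch.randn(15, 2)

same = torch.allclose(mse(preds, targets), F.mse_loss(preds, targets))
print(same)  # True
```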

## 2.10 Optimizer

In [None]:
# Define optimizer
opt = torch.optim.SGD(params=model.parameters(), lr=1e-5)
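
With only `lr` set (no momentum or weight decay), `opt.step()` applies the same rule as the manual loop: each parameter `p` becomes `p - lr * p.grad`. A sketch that checks this on random data, with `lr=0.1` chosen only to make the change visible:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(3, 2)
X = torch.randn(5, 3)
Y = torch.randn(5, 2)
lr = 0.1
opt = torch.optim.SGD(model.parameters(), lr=lr)

loss = F.mse_loss(model(X), Y)
loss.backward()

# What plain gradient descent would produce, computed before the optimizer step
expected = [p.detach() - lr * p.grad for p in model.parameters()]

opt.step()       # updates the parameters in place
opt.zero_grad()  # clears the gradients for the next iteration

matches = all(torch.allclose(p, e) for p, e in zip(model.parameters(), expected))
print(matches)  # True
```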

## 2.11 Train the model

In [None]:
# Utility function to train the model
def fit(num_epochs, model, loss_fn, opt, train_dataloader):

  # Repeat for given number of epochs
  for epoch in tqdm(range(num_epochs)):

    # Train with batches of data:
    for X_train, y_train in train_dataloader:

      # 1. Generate predictions
      pred = model(X_train)

      # 2. Calculate loss
      loss = loss_fn(pred, y_train)

      # 3. Compute gradients
      loss.backward()

      # 4. Update parameters using gradients
      opt.step()

      # 5. Reset the gradient to zero
      opt.zero_grad()

    # Report the loss of the last batch every 10 epochs
    if (epoch + 1) % 10 == 0:
      print(f"Epoch: {epoch+1} | Loss: {loss.item():.4f}")

Instead of updating parameters (weights and biases) manually, we use `opt.step()` to perform the update, and `opt.zero_grad()` to reset the gradients to zero.

In [None]:
fit(num_epochs=100, model=model, loss_fn=loss_fn, opt=opt, train_dataloader=train_dataloader)


Epoch: 10 | Loss: 2970.0747
Epoch: 20 | Loss: 1940.7731
Epoch: 30 | Loss: 1057.7996
Epoch: 40 | Loss: 1301.7987
Epoch: 50 | Loss: 2324.1821
Epoch: 60 | Loss: 1331.4520
Epoch: 70 | Loss: 1843.0999
Epoch: 80 | Loss: 1752.9121
Epoch: 90 | Loss: 739.1357
Epoch: 100 | Loss: 2137.1538


In [None]:
# Generate predictions
preds = model(inputs)
preds

tensor([[ 62.4116,  74.8610],
        [ 77.0069,  95.4333],
        [ 75.5634,  90.3677],
        [ 87.5529, 100.1554],
        [ 57.4666,  76.5839],
        [ 62.4116,  74.8610],
        [ 77.0069,  95.4333],
        [ 75.5634,  90.3677],
        [ 87.5529, 100.1554],
        [ 57.4666,  76.5839],
        [ 62.4116,  74.8610],
        [ 77.0069,  95.4333],
        [ 75.5634,  90.3677],
        [ 87.5529, 100.1554],
        [ 57.4666,  76.5839]], grad_fn=<AddmmBackward0>)
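
Because `fit` prints the loss of the last mini-batch only, the printed values jump around. To judge the model, it is more informative to compute the loss over the whole dataset under `torch.no_grad()`, and the trained model can then be applied to new inputs. A self-contained sketch; full-batch training, the fixed seed, and the new region's values are all assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Five regions from the table above
inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                       [102., 43., 37.], [69., 96., 70.]])
targets = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                        [22., 37.], [103., 119.]])

torch.manual_seed(0)
model = nn.Linear(3, 2)
opt = torch.optim.SGD(model.parameters(), lr=1e-5)

def full_loss():
    # Loss over the whole dataset, without building an autograd graph
    with torch.no_grad():
        return F.mse_loss(model(inputs), targets).item()

initial = full_loss()
for epoch in range(100):
    loss = F.mse_loss(model(inputs), targets)  # full-batch update for simplicity
    loss.backward()
    opt.step()
    opt.zero_grad()
final = full_loss()
print(f"loss before: {initial:.2f}, after: {final:.2f}")

# Predict yields for a hypothetical new region: 75 F, 63 mm rainfall, 44 % humidity
with torch.no_grad():
    new_pred = model(torch.tensor([[75., 63., 44.]]))
print(new_pred)
```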