
In [1]:
import numpy as np
import torch

Training data

In [2]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70]], dtype='float32')

In [3]:
# Targets (apples, oranges)
targets = np.array([[56, 70],
                    [81, 101],
                    [119, 133],
                    [22, 37],
                    [103, 119]], dtype='float32')

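The inputs and targets are kept in separate arrays. Each row of `inputs` holds the (temp, rainfall, humidity) values for one sample, and the same row of `targets` holds the corresponding (apples, oranges) values.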

In [4]:
# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


Linear regression model from scratch

In [5]:
# Weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[-0.4790,  0.1384,  1.5438],
        [ 2.5427,  0.5653,  2.4507]], requires_grad=True)
tensor([-0.1121, -0.1192], requires_grad=True)


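With $X$ = `inputs` and $Y$ = `targets`, the model computes its predictions as a single matrix operation:

$$\hat{Y} = X W^{T} + b$$

Training adjusts $W$ (`w`) and $b$ (`b`) so that $\hat{Y}$ gets as close as possible to $Y$.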

In [6]:
def model(x):
    return x @ w.t() + b

`@` represents matrix multiplication in PyTorch, and the `.t` method returns the transpose of a tensor.
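It can help to check how the shapes in `x @ w.t() + b` line up: the (5, 3) input times the (3, 2) transposed weights gives a (5, 2) result, and the bias of shape (2,) is broadcast across all five rows. This sketch uses fresh random tensors with an arbitrary seed (an assumption for reproducibility), not the notebook's `w` and `b`.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

x = torch.randn(5, 3)   # 5 samples, 3 features (temp, rainfall, humidity)
w = torch.randn(2, 3)   # 2 outputs (apples, oranges), 3 inputs each
b = torch.randn(2)      # one bias per output

out = x @ w.t() + b     # (5, 3) @ (3, 2) -> (5, 2); b is broadcast over the rows
print(w.t().shape)      # torch.Size([3, 2])
print(out.shape)        # torch.Size([5, 2])

# Element (i, j) is the dot product of sample i with weight row j, plus bias j
manual = torch.stack([torch.stack([x[i] @ w[j] + b[j] for j in range(2)])
                      for i in range(5)])
print(torch.allclose(out, manual, atol=1e-5))
```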

The matrix obtained by passing the input data into the model is a set of predictions for the target variables.

In [7]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[ 40.5739, 328.7503],
        [ 67.2768, 437.8539],
        [ 66.2952, 438.9808],
        [ 14.0997, 374.2185],
        [ 88.1841, 401.1406]], grad_fn=<AddBackward0>)


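The output above is the model's predictions (`preds`). Because `w` and `b` were initialized randomly, the predictions are not expected to be close yet, so let's compare them with the actual targets.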

In [8]:
# Compare with targets
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


Loss function

In [12]:
diff = preds - targets
torch.sum(diff*diff)/diff.numel()  # numel() counts the elements of diff: 5 x 2 = 10 here

tensor(47085.2188, grad_fn=<DivBackward0>)

Since we'll compute this quantity repeatedly, let's wrap it in a function called `mse`, which computes the mean squared error (MSE) loss.

In [13]:
# MSE loss
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()
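Written out, `mse` averages the squared differences over every element of the prediction matrix. With $N$ samples and $K$ outputs per sample ($N = 5$ and $K = 2$ here, so `diff.numel()` is 10):

$$\mathrm{MSE}(\hat{Y}, Y) = \frac{1}{NK} \sum_{i=1}^{N} \sum_{j=1}^{K} \left( \hat{y}_{ij} - y_{ij} \right)^2$$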

In [14]:
# Compute loss
loss = mse(preds, targets)
print(loss)

tensor(47085.2188, grad_fn=<DivBackward0>)


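A loss of about 4.7e4 is very high: its square root, √47085 ≈ 217, means a typical prediction is off by roughly 217 units. The goal is to bring the MSE much closer to zero, ideally near 1e-2.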

**Compute gradients**

In [15]:
# Compute gradients
loss.backward()

In [16]:
# Gradients for weights
print(w)
print(w.grad)

tensor([[-0.4790,  0.1384,  1.5438],
        [ 2.5427,  0.5653,  2.4507]], requires_grad=True)
tensor([[-1757.6727, -2213.1357, -1185.5824],
        [26005.3613, 25913.3496, 16531.7461]])
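Because the model is linear and the loss is an MSE, these gradients also have a closed form. With $\hat{Y} = XW^T + b$ and $L = \frac{1}{NK}\sum_{i,j} (\hat{y}_{ij} - y_{ij})^2$, differentiating gives $\frac{\partial L}{\partial W} = \frac{2}{NK}(\hat{Y} - Y)^T X$ and $\frac{\partial L}{\partial b} = \frac{2}{NK}\sum_i (\hat{Y} - Y)_{i,:}$. The sketch below rebuilds the notebook's data with a fresh random initialization (the seed is arbitrary, so the numbers differ from the output above) and checks that autograd agrees with these formulas.

```python
import torch

# Same training data as in the notebook
inputs = torch.tensor([[73, 67, 43], [91, 88, 64], [87, 134, 58],
                       [102, 43, 37], [69, 96, 70]], dtype=torch.float32)
targets = torch.tensor([[56, 70], [81, 101], [119, 133],
                        [22, 37], [103, 119]], dtype=torch.float32)

torch.manual_seed(0)  # arbitrary seed; the notebook's weights are not reproduced
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

preds = inputs @ w.t() + b
loss = mse(preds, targets)
loss.backward()  # autograd fills w.grad and b.grad

# Closed-form gradients of the MSE for a linear model
with torch.no_grad():
    err = preds - targets               # shape (5, 2)
    scale = 2.0 / err.numel()           # 2 / (N * K) = 2 / 10
    grad_w = scale * err.t() @ inputs   # shape (2, 3), same as w
    grad_b = scale * err.sum(dim=0)     # shape (2,), same as b

print(torch.allclose(w.grad, grad_w, rtol=1e-4, atol=1e-2))
print(torch.allclose(b.grad, grad_b, rtol=1e-4, atol=1e-2))
```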


The loss is a [quadratic function](https://en.wikipedia.org/wiki/Quadratic_function) of our weights and biases, and our objective is to find the set of weights where the loss is the lowest. If we plot a graph of the loss w.r.t. any individual weight or bias element, it will look like the figure shown below. A key insight from calculus is that the gradient indicates the rate of change of the loss, or the [slope](https://en.wikipedia.org/wiki/Slope) of the loss function w.r.t. the weights and biases.

If a gradient element is **positive**:
* **increasing** the element's value slightly will **increase** the loss.
* **decreasing** the element's value slightly will **decrease** the loss.

![positive-gradient](https://i.imgur.com/hFYoVgU.png)

If a gradient element is **negative**:
* **increasing** the element's value slightly will **decrease** the loss.
* **decreasing** the element's value slightly will **increase** the loss.

![negative-gradient](https://i.imgur.com/w3Wii7C.png)

The increase or decrease in loss by changing a weight element is proportional to the value of the gradient of the loss w.r.t. that element. This forms the basis for the optimization algorithm that we'll use to improve our model.
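This proportionality can be checked numerically: nudging a single weight by a small amount $\epsilon$ should change the loss by roughly $\epsilon$ times that weight's gradient. The sketch below does this for one entry of `w`, using a fresh seeded initialization (an assumption) and float64 tensors to keep rounding noise out of the comparison.

```python
import torch

inputs = torch.tensor([[73, 67, 43], [91, 88, 64], [87, 134, 58],
                       [102, 43, 37], [69, 96, 70]], dtype=torch.float64)
targets = torch.tensor([[56, 70], [81, 101], [119, 133],
                        [22, 37], [103, 119]], dtype=torch.float64)

torch.manual_seed(0)  # arbitrary seed
w = torch.randn(2, 3, dtype=torch.float64, requires_grad=True)
b = torch.randn(2, dtype=torch.float64, requires_grad=True)

def loss_of(w, b):
    diff = inputs @ w.t() + b - targets
    return torch.sum(diff * diff) / diff.numel()

loss = loss_of(w, b)
loss.backward()

eps = 1e-4
with torch.no_grad():
    w_nudged = w.clone()
    w_nudged[0, 0] += eps                          # increase one weight slightly
    actual_change = (loss_of(w_nudged, b) - loss).item()
predicted_change = (eps * w.grad[0, 0]).item()     # first-order (gradient) estimate

print(actual_change, predicted_change)  # nearly equal; same sign as w.grad[0, 0]
```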

In [17]:
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
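Resetting matters because PyTorch *accumulates* gradients: each call to `.backward()` adds into `.grad` instead of overwriting it. A tiny sketch with a made-up scalar function shows the effect.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

y = x * x          # dy/dx = 2x = 6
y.backward()
print(x.grad)      # tensor(6.)

y = x * x          # rebuild the graph and differentiate again
y.backward()
print(x.grad)      # tensor(12.): the new 6 was added to the old 6

x.grad.zero_()     # reset, as the notebook does for w and b
y = x * x
y.backward()
print(x.grad)      # tensor(6.)
```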


## Adjust weights and biases using gradient descent

We'll reduce the loss and improve our model using the gradient descent optimization algorithm, which has the following steps:

1. Generate predictions

2. Calculate the loss

3. Compute gradients w.r.t the weights and biases

4. Adjust the weights and biases by subtracting a small quantity proportional to the gradient

5. Reset the gradients to zero
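Step 4 is the gradient descent update. Writing $\eta$ for the learning rate (this notebook uses $\eta = 10^{-5}$):

$$W \leftarrow W - \eta \, \frac{\partial L}{\partial W}, \qquad b \leftarrow b - \eta \, \frac{\partial L}{\partial b}$$

For example, with the gradient printed earlier, the first entry of `w` moves from $-0.4790$ to $-0.4790 - 10^{-5} \times (-1757.6727) \approx -0.4614$.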

Let's implement the above step by step.

In [18]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[ 40.5739, 328.7503],
        [ 67.2768, 437.8539],
        [ 66.2952, 438.9808],
        [ 14.0997, 374.2185],
        [ 88.1841, 401.1406]], grad_fn=<AddBackward0>)


In [19]:
# Calculate the loss
loss = mse(preds, targets)
print(loss)

tensor(47085.2188, grad_fn=<DivBackward0>)


In [20]:
# Compute gradients
loss.backward()
print(w.grad)
print(b.grad)

tensor([[-1757.6727, -2213.1357, -1185.5824],
        [26005.3613, 25913.3496, 16531.7461]])
tensor([-20.9141, 304.1888])


In [21]:
# Adjust weights & reset gradients
with torch.no_grad():
    w -= w.grad * 1e-5  # Step against the gradient, scaled by the learning rate 1e-5: a negative gradient increases the weight, a positive gradient decreases it.
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()
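The update runs inside `torch.no_grad()` because the update itself should not be recorded by autograd, and PyTorch refuses to modify in place a leaf tensor that requires gradients while tracking is on. A minimal sketch with a throwaway tensor and a made-up gradient:

```python
import torch

w = torch.ones(2, requires_grad=True)
w.grad = torch.tensor([0.5, -0.5])   # pretend gradient, just for illustration

raised = False
try:
    w -= w.grad * 0.1                # in-place update with tracking enabled
except RuntimeError as e:
    raised = True
    print("RuntimeError:", e)

with torch.no_grad():
    w -= w.grad * 0.1                # allowed: autograd is not recording
print(w)                             # tensor([0.9500, 1.0500], requires_grad=True)
```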

In [22]:
print(w)
print(b)

tensor([[-0.4614,  0.1605,  1.5556],
        [ 2.2827,  0.3061,  2.2854]], requires_grad=True)
tensor([-0.1119, -0.1222], requires_grad=True)


In [23]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(32238.0117, grad_fn=<DivBackward0>)


## Train for multiple epochs

To reduce the loss further, we can repeat the process of adjusting the weights and biases using the gradients multiple times. Each iteration is called an epoch. Let's train the model for 100 epochs.

In [24]:
# Train for 100 epochs
for i in range(100):
    preds = model(inputs)
    loss = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()

In [25]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(495.8057, grad_fn=<DivBackward0>)


In [26]:
# Train for 100 epochs
for i in range(100):
    preds = model(inputs)
    loss = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()

In [27]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(178.5366, grad_fn=<DivBackward0>)


In [28]:
# Train for 100 epochs
for i in range(100):
    preds = model(inputs)
    loss = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()

In [29]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(82.8563, grad_fn=<DivBackward0>)


In [32]:
# Train for 100 epochs
for i in range(100):
    preds = model(inputs)
    loss = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()

In [33]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(50.4891, grad_fn=<DivBackward0>)
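The four 100-epoch cells above repeat the same loop by hand. A sketch that runs all 400 epochs in one loop and records the loss at each epoch follows; the seed and the `history` list are illustrative additions, and because the initialization differs from the notebook's, the exact numbers will differ too.

```python
import torch

inputs = torch.tensor([[73, 67, 43], [91, 88, 64], [87, 134, 58],
                       [102, 43, 37], [69, 96, 70]], dtype=torch.float32)
targets = torch.tensor([[56, 70], [81, 101], [119, 133],
                        [22, 37], [103, 119]], dtype=torch.float32)

torch.manual_seed(42)  # arbitrary seed
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

def model(x):
    return x @ w.t() + b

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

lr = 1e-5
history = []
for epoch in range(400):
    loss = mse(model(inputs), targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * lr
        b -= b.grad * lr
        w.grad.zero_()
        b.grad.zero_()
    history.append(loss.item())  # loss measured before this epoch's update

for epoch in (0, 99, 199, 299, 399):
    print(f"epoch {epoch + 1}: loss {history[epoch]:.4f}")
```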


In [34]:
# Predictions
preds

tensor([[ 56.7682,  71.1420],
        [ 87.2994, 105.7621],
        [107.7489, 120.0381],
        [ 19.4258,  42.4568],
        [111.7233, 124.7916]], grad_fn=<AddBackward0>)

In [31]:
#Comparison with targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

## Linear regression using PyTorch built-ins

The model and training process above were implemented using basic matrix operations. But since this is such a common pattern, PyTorch provides several built-in functions and classes that make it easy to create and train models.

Let's begin by importing the `torch.nn` package from PyTorch, which contains utility classes for building neural networks.

In [35]:
import torch.nn as nn

In [36]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], [91, 88, 64], [87, 134, 58],
                   [102, 43, 37], [69, 96, 70], [73, 67, 43],
                   [91, 88, 64], [87, 134, 58], [102, 43, 37],
                   [69, 96, 70], [73, 67, 43], [91, 88, 64],
                   [87, 134, 58], [102, 43, 37], [69, 96, 70]],
                  dtype='float32')

# Targets (apples, oranges)
targets = np.array([[56, 70], [81, 101], [119, 133],
                    [22, 37], [103, 119], [56, 70],
                    [81, 101], [119, 133], [22, 37],
                    [103, 119], [56, 70], [81, 101],
                    [119, 133], [22, 37], [103, 119]],
                   dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
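The 15-row arrays above are the original five rows repeated three times, presumably so the `DataLoader` can form several batches. Assuming that is the intent, `np.tile` builds the same arrays more compactly:

```python
import numpy as np
import torch

base_inputs = np.array([[73, 67, 43], [91, 88, 64], [87, 134, 58],
                        [102, 43, 37], [69, 96, 70]], dtype='float32')
base_targets = np.array([[56, 70], [81, 101], [119, 133],
                         [22, 37], [103, 119]], dtype='float32')

# Stack the 5 rows 3 times along axis 0 (and once along axis 1)
inputs = torch.from_numpy(np.tile(base_inputs, (3, 1)))
targets = torch.from_numpy(np.tile(base_targets, (3, 1)))
print(inputs.shape, targets.shape)  # torch.Size([15, 3]) torch.Size([15, 2])
```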

## Dataset and DataLoader

We'll create a `TensorDataset`, which allows access to rows from `inputs` and `targets` as tuples, and provides standard APIs for working with many different types of datasets in PyTorch.

In [37]:
from torch.utils.data import TensorDataset

In [38]:
# Define dataset
train_ds = TensorDataset(inputs, targets)
train_ds[0:3]

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))

Indexing the dataset with a slice returns a tuple: the first three rows of `inputs` and the corresponding three rows of `targets`.
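Indexing with a single integer returns one `(input, target)` pair, and `len()` gives the number of rows. A small sketch using only the first three rows of the data, to keep it short:

```python
import torch
from torch.utils.data import TensorDataset

inputs = torch.tensor([[73, 67, 43], [91, 88, 64], [87, 134, 58]], dtype=torch.float32)
targets = torch.tensor([[56, 70], [81, 101], [119, 133]], dtype=torch.float32)

ds = TensorDataset(inputs, targets)
x0, y0 = ds[0]      # a single row: (features, targets)
print(len(ds))      # 3
print(x0, y0)       # tensor([73., 67., 43.]) tensor([56., 70.])
```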

In [39]:
from torch.utils.data import DataLoader

In [40]:
# Define data loader
batch_size = 5
train_dl = DataLoader(train_ds, batch_size, shuffle=True)

In [43]:
for xb, yb in train_dl:
    print(xb)
    print(yb)
    break

tensor([[73., 67., 43.],
        [91., 88., 64.],
        [69., 96., 70.],
        [91., 88., 64.],
        [69., 96., 70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [103., 119.],
        [ 81., 101.],
        [103., 119.]])
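`DataLoader` splits the dataset into batches of `batch_size` rows. With `shuffle=True`, the row order is reshuffled every time the loader is iterated, which is why the batch printed above mixes rows from across the dataset (and, because the data is repeated, can contain identical rows). With 15 rows and `batch_size = 5`, each epoch yields three batches. This sketch uses placeholder data (`torch.arange`) so that every row is distinct and the check is meaningful.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)  # arbitrary seed so the shuffle is repeatable
inputs = torch.arange(15 * 3, dtype=torch.float32).reshape(15, 3)   # placeholder data
targets = torch.arange(15 * 2, dtype=torch.float32).reshape(15, 2)

train_dl = DataLoader(TensorDataset(inputs, targets), batch_size=5, shuffle=True)

batch_shapes = [(xb.shape, yb.shape) for xb, yb in train_dl]
print(len(train_dl))     # 3 batches per epoch
print(batch_shapes[0])   # (torch.Size([5, 3]), torch.Size([5, 2]))

# Every row appears exactly once per epoch, just in a shuffled order
seen = torch.cat([xb for xb, _ in train_dl])
print(sorted(seen[:, 0].tolist()) == inputs[:, 0].tolist())
```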


## nn.Linear

Instead of initializing the weights & biases manually, we can define the model using the `nn.Linear` class from PyTorch, which does it automatically.

In [44]:
# Define model
model = nn.Linear(3, 2)
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[ 0.4361, -0.5064, -0.4604],
        [ 0.4444, -0.2891,  0.5440]], requires_grad=True)
Parameter containing:
tensor([-0.0143, -0.3639], requires_grad=True)
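`nn.Linear(3, 2)` stores a weight of shape (2, 3) and a bias of shape (2,), just like the hand-made `w` and `b` (though it initializes them with its own scheme rather than `torch.randn`), and its forward pass computes the same `x @ weight.t() + bias` as the from-scratch `model`. A quick check, with an arbitrary seed and random input batch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed
layer = nn.Linear(3, 2)
x = torch.randn(4, 3)

manual = x @ layer.weight.t() + layer.bias   # the from-scratch formula
print(layer.weight.shape, layer.bias.shape)  # torch.Size([2, 3]) torch.Size([2])
print(torch.allclose(layer(x), manual, atol=1e-5))
```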


PyTorch models also have a helpful `.parameters()` method, which returns an iterator over all the weight and bias tensors in the model; wrapping it in `list(...)` displays them. For our linear regression model, there is one weight matrix and one bias vector.

In [45]:
# Parameters
list(model.parameters())

[Parameter containing:
 tensor([[ 0.4361, -0.5064, -0.4604],
         [ 0.4444, -0.2891,  0.5440]], requires_grad=True),
 Parameter containing:
 tensor([-0.0143, -0.3639], requires_grad=True)]
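Iterating over the parameters is also a convenient way to summarize a model, for example to count its trainable values: 2 × 3 weights plus 2 biases gives 8.

```python
import torch.nn as nn

model = nn.Linear(3, 2)
for name, p in model.named_parameters():
    print(name, tuple(p.shape))   # weight (2, 3), then bias (2,)

n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_params)                   # 8
```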

In [46]:
# Generate predictions
preds = model(inputs)
preds

tensor([[-21.9071,  36.1016],
        [-34.3603,  49.4542],
        [-56.6370,  31.1134],
        [  5.6547,  52.6646],
        [-50.7672,  40.6279],
        [-21.9071,  36.1016],
        [-34.3603,  49.4542],
        [-56.6370,  31.1134],
        [  5.6547,  52.6646],
        [-50.7672,  40.6279],
        [-21.9071,  36.1016],
        [-34.3603,  49.4542],
        [-56.6370,  31.1134],
        [  5.6547,  52.6646],
        [-50.7672,  40.6279]], grad_fn=<AddmmBackward0>)

Loss function

In [47]:
# Import nn.functional
import torch.nn.functional as F

In [48]:
# Define loss function
loss_fn = F.mse_loss

In [49]:
loss = loss_fn(model(inputs), targets)
print(loss)

tensor(9471.1914, grad_fn=<MseLossBackward0>)
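`F.mse_loss` averages over all elements by default (`reduction='mean'`), so it computes the same quantity as the hand-written `mse` from earlier. A quick check on random tensors of an arbitrary shape:

```python
import torch
import torch.nn.functional as F

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

torch.manual_seed(0)  # arbitrary seed
a = torch.randn(15, 2)
b = torch.randn(15, 2)

print(torch.allclose(F.mse_loss(a, b), mse(a, b), atol=1e-6))
# reduction='sum' gives the total, so dividing by the element count matches too
print(F.mse_loss(a, b, reduction='sum') / a.numel(), mse(a, b))
```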


## Optimizer

Instead of manually updating the model's weights and biases using their gradients, we can use the optimizer `torch.optim.SGD`. SGD stands for *stochastic gradient descent*. It is called *stochastic* because the parameters are updated using batches of samples (often randomly shuffled) rather than the whole dataset at once.

In [50]:
# Define optimizer
opt = torch.optim.SGD(model.parameters(), lr=1e-5)
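With its default settings (no momentum, no weight decay), `torch.optim.SGD` performs the same update as the manual loop: each parameter `p` becomes `p - lr * p.grad`. The sketch below checks this on a fresh `nn.Linear`; the seed, the random batch, and the larger learning rate are arbitrary choices that make the change visible.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed
model = nn.Linear(3, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(5, 3)
y = torch.randn(5, 2)

loss = F.mse_loss(model(x), y)
loss.backward()

# What a plain gradient descent step should produce
expected = [(p - 0.1 * p.grad).detach().clone() for p in model.parameters()]

opt.step()       # update the parameters in place
opt.zero_grad()  # reset gradients before the next batch

for p, e in zip(model.parameters(), expected):
    print(torch.allclose(p, e))
```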

Train the model

In [51]:
# Utility function to train the model
def fit(num_epochs, model, loss_fn, opt):

    # Repeat for given number of epochs
    for epoch in range(num_epochs):

        # Train with batches of data
        for xb,yb in train_dl:

            # 1. Generate predictions
            pred = model(xb)

            # 2. Calculate loss
            loss = loss_fn(pred, yb)

            # 3. Compute gradients
            loss.backward()

            # 4. Update parameters using gradients
            opt.step()

            # 5. Reset the gradients to zero
            opt.zero_grad()

        # Print the loss of the last batch every 10 epochs
        if (epoch+1) % 10 == 0:
            print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

In [55]:
fit(200, model, loss_fn, opt)

Epoch [10/200], Loss: 52.0584
Epoch [20/200], Loss: 27.1371
Epoch [30/200], Loss: 34.8862
Epoch [40/200], Loss: 20.5423
Epoch [50/200], Loss: 12.7401
Epoch [60/200], Loss: 14.3472
Epoch [70/200], Loss: 10.3887
Epoch [80/200], Loss: 7.8502
Epoch [90/200], Loss: 6.9559
Epoch [100/200], Loss: 9.1184
Epoch [110/200], Loss: 4.2265
Epoch [120/200], Loss: 20.0168
Epoch [130/200], Loss: 3.9181
Epoch [140/200], Loss: 11.0691
Epoch [150/200], Loss: 9.1498
Epoch [160/200], Loss: 6.5885
Epoch [170/200], Loss: 3.0802
Epoch [180/200], Loss: 2.3974
Epoch [190/200], Loss: 2.7813
Epoch [200/200], Loss: 3.9513
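The printed loss jumps around (for example 3.9181 at epoch 130, then 11.0691 at epoch 140) because `fit` reports only the loss of the last batch in each epoch, and each batch is a different random sample of five rows. A hypothetical variant, `fit_avg`, that reports the mean loss over all batches of an epoch gives a smoother progress signal. It is a sketch that rebuilds the notebook's data and uses an arbitrary seed, so its numbers will not match the log above.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

base_x = np.array([[73, 67, 43], [91, 88, 64], [87, 134, 58],
                   [102, 43, 37], [69, 96, 70]], dtype='float32')
base_y = np.array([[56, 70], [81, 101], [119, 133],
                   [22, 37], [103, 119]], dtype='float32')
inputs = torch.from_numpy(np.tile(base_x, (3, 1)))   # same 15 rows as above
targets = torch.from_numpy(np.tile(base_y, (3, 1)))

torch.manual_seed(0)  # arbitrary seed
train_dl = DataLoader(TensorDataset(inputs, targets), batch_size=5, shuffle=True)
model = nn.Linear(3, 2)
opt = torch.optim.SGD(model.parameters(), lr=1e-5)

def fit_avg(num_epochs, model, loss_fn, opt, dl):
    """Hypothetical variant of fit() that records the mean batch loss per epoch."""
    history = []
    for epoch in range(num_epochs):
        total, n_batches = 0.0, 0
        for xb, yb in dl:
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
            opt.zero_grad()
            total += loss.item()
            n_batches += 1
        history.append(total / n_batches)
        if (epoch + 1) % 50 == 0:
            print(f"Epoch [{epoch + 1}/{num_epochs}], mean loss: {history[-1]:.4f}")
    return history

history = fit_avg(200, model, F.mse_loss, opt, train_dl)
```

Passing the loader in as an argument (rather than reading the global `train_dl`, as `fit` does) also makes the function reusable with other datasets.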


In [56]:
# Generate predictions
preds = model(inputs)
preds

tensor([[ 57.5014,  70.3195],
        [ 80.1338, 100.4124],
        [123.1151, 133.5355],
        [ 22.6223,  37.4001],
        [ 97.3938, 118.3916],
        [ 57.5014,  70.3195],
        [ 80.1338, 100.4124],
        [123.1151, 133.5355],
        [ 22.6223,  37.4001],
        [ 97.3938, 118.3916],
        [ 57.5014,  70.3195],
        [ 80.1338, 100.4124],
        [123.1151, 133.5355],
        [ 22.6223,  37.4001],
        [ 97.3938, 118.3916]], grad_fn=<AddmmBackward0>)

In [54]:
# Compare with targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])