# Linear Regression and Gradient Descent

| Region | Temp. (F) | Rainfall (mm) | Humidity (%) | Apples (tons) | Oranges (tons) |
|--------|-----------|---------------|--------------|---------------|---------------|
| Kanto  | 73        | 67            | 43           | 56            | 70            |
| Johto  | 91        | 88            | 64           | 81            | 101           |
| Hoenn  | 87        | 134           | 58           | 119           | 133           |
| Sinnoh | 102       | 43            | 37           | 22            | 37            |
| Unova  | 69        | 96            | 70           | 103           | 119           |

#### In a linear regression model, each target variable is estimated to be a weighted sum of the input variables, offset by some constant, known as a bias:

#### yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
#### yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2

#### The model learns by adjusting the weights (w) and biases (b) slightly, many times over, to make better predictions, using an optimization technique called gradient descent.
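Here is a minimal sketch of the idea on a toy problem. It assumes a one-parameter function f(w) = (w - 3)², whose gradient is 2(w - 3). The function, the starting point and the learning rate are illustrative choices and are not part of the crop dataset. Each step moves w a little against the gradient, and after enough steps w settles near the minimum at 3.

```python
# Toy gradient descent on f(w) = (w - 3)^2 (illustrative only)

def f(w):
    return (w - 3.0) ** 2

def grad_f(w):
    # Derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

w_toy = 0.0   # hypothetical starting point
lr_toy = 0.1  # hypothetical learning rate

for step in range(100):
    w_toy -= lr_toy * grad_f(w_toy)  # small step against the gradient

print(round(w_toy, 4))  # prints 3.0
```

The learning rate controls the step size. If it is too small, progress is slow. If it is too large, the updates can overshoot the minimum.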

## Training data

In [1]:
import numpy as np
import torch

#### In practice, the data is usually read from (CSV) files as numpy arrays, processed, and then converted to PyTorch tensors. Here the arrays are defined directly:

In [2]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [3]:
# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

In [4]:
# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [5]:
# Weights and biases, initialized as random values
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[-0.0092, -0.0168,  0.4618],
        [ 0.3842,  0.6609, -0.4085]], requires_grad=True)
tensor([0.0557, 0.0212], requires_grad=True)


##### @ represents matrix multiplication in PyTorch, and the .t method returns the transpose of a tensor:

##### model = X @ W^T + b (X and W are matrices; b is a vector that is added to every row)
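The following sketch checks the matrix form. It computes one prediction term by term, exactly as in the yield_apple / yield_orange equations, and compares the result with x @ W^T + b. The weights and biases are hypothetical values chosen only for illustration. The `_demo` names keep the notebook's own `w` and `b` from being overwritten.

```python
import torch

# One row of the table above (Kanto): temp, rainfall, humidity
x_demo = torch.tensor([[73.0, 67.0, 43.0]])

# Hypothetical weights and biases, for illustration only
W_demo = torch.tensor([[0.1, 0.2, 0.3],   # w11, w12, w13 (apples)
                       [0.4, 0.5, 0.6]])  # w21, w22, w23 (oranges)
b_demo = torch.tensor([1.0, 2.0])         # b1, b2

# Term by term, as in the two equations
apple = 0.1 * 73 + 0.2 * 67 + 0.3 * 43 + 1.0
orange = 0.4 * 73 + 0.5 * 67 + 0.6 * 43 + 2.0

# Matrix form: (1 x 3) @ (3 x 2) gives (1 x 2); b_demo is broadcast across the row
pred_demo = x_demo @ W_demo.t() + b_demo

print(torch.allclose(pred_demo, torch.tensor([[apple, orange]])))  # True
```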

In [6]:
def model(x):
    return x @ w.t() + b

In [7]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[18.1137, 54.7850],
        [27.2924, 67.0017],
        [23.7873, 98.3161],
        [15.4773, 52.5177],
        [30.1324, 61.3845]], grad_fn=<AddBackward0>)


In [8]:
# Compare with targets
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


##### You can see that there's a huge difference between the predictions of our model and the actual values of the target variables. This is expected: we've initialized the model with random weights and biases, so we can't expect it to just work.

# Loss function (MSE loss)

#### A loss function lets us evaluate how well our model is performing. For the MSE loss:
#### 1. Calculate the difference between the two matrices (preds and targets);
#### 2. Square all elements of the difference matrix to remove negative values;
#### 3. Calculate the average of the elements in the resulting matrix.
#### The result is a single number, known as the mean squared error (MSE).
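You can trace the three steps by hand on a tiny example. The 2 x 2 tensors below are made-up numbers, not the crop data, chosen so the arithmetic is easy to follow.

```python
import torch

# Made-up predictions and targets, for illustration only
toy_preds = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
toy_targets = torch.tensor([[1.0, 0.0], [3.0, 1.0]])

diff = toy_preds - toy_targets   # step 1: [[0, 2], [0, 3]]
squared = diff * diff            # step 2: [[0, 4], [0, 9]]
toy_mse = squared.mean()         # step 3: (0 + 4 + 0 + 9) / 4

print(toy_mse.item())            # prints 3.25
```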

##### torch.sum returns the sum of all the elements in a tensor, and the .numel method returns the number of elements in a tensor:

In [9]:
# MSE loss
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

In [10]:
# Compute loss
loss = mse(preds, targets)
print(loss)

tensor(2488.8269, grad_fn=<DivBackward0>)


In [11]:
# Compute gradients
loss.backward()

In [12]:
# Gradients for weights
print(w)
print(w.grad)

tensor([[-0.0092, -0.0168,  0.4618],
        [ 0.3842,  0.6609, -0.4085]], requires_grad=True)
tensor([[-4325.9546, -5459.7832, -3186.1606],
        [-1922.9406, -2704.5452, -1660.1467]])
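You can cross-check the numbers in w.grad against a gradient derived by hand. For this loss, with N = preds.numel(), the derivative with respect to the predictions is 2(preds - targets)/N. The chain rule then gives dL/dW = (dL/dpreds)^T @ X, and dL/db is the column sums of dL/dpreds. The sketch below recomputes both using fresh, seeded random parameters (`_demo` names), so its values will not match the ones printed above. It then compares them with what autograd produces.

```python
import torch

torch.manual_seed(0)  # seeded so the check is reproducible

X_demo = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                       [102., 43., 37.], [69., 96., 70.]])
Y_demo = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                       [22., 37.], [103., 119.]])
W_demo = torch.randn(2, 3, requires_grad=True)
b_demo = torch.randn(2, requires_grad=True)

# Forward pass and MSE loss, as in the cells above
P_demo = X_demo @ W_demo.t() + b_demo
loss_demo = torch.sum((P_demo - Y_demo) ** 2) / P_demo.numel()
loss_demo.backward()  # autograd fills W_demo.grad and b_demo.grad

# Hand-derived gradients
with torch.no_grad():
    dP = 2 * (P_demo - Y_demo) / P_demo.numel()  # dL/dpreds, shape (5, 2)
    grad_W = dP.t() @ X_demo                     # dL/dW, shape (2, 3)
    grad_b = dP.sum(dim=0)                       # dL/db, shape (2,)

print(torch.allclose(W_demo.grad, grad_W, rtol=1e-4, atol=1e-2))  # True
print(torch.allclose(b_demo.grad, grad_b, rtol=1e-4, atol=1e-2))  # True
```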


In [13]:
# Reset the gradients
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
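Resetting is needed because .backward() adds the new gradients to whatever is already stored in .grad instead of overwriting it. Here is a minimal demonstration with a single scalar parameter; the value is hypothetical and unrelated to the model.

```python
import torch

p_demo = torch.tensor(2.0, requires_grad=True)

(3 * p_demo).backward()       # d(3p)/dp = 3
print(p_demo.grad.item())     # prints 3.0

(3 * p_demo).backward()       # the new gradient is added to the old one
print(p_demo.grad.item())     # prints 6.0

p_demo.grad.zero_()           # reset, as in the cell above
print(p_demo.grad.item())     # prints 0.0
```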


# Adjust weights and biases using gradient descent

#### We can reduce the loss and improve our model using the gradient descent optimization algorithm, which has the following steps:
#### 1. Generate predictions
#### 2. Calculate the loss
#### 3. Compute gradients w.r.t. the weights and biases
#### 4. Adjust the weights and biases by subtracting a small quantity proportional to the gradient
#### 5. Reset the gradients to zero
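Written as an equation, step 4 is the update rule below. Here L is the MSE loss and η is the learning rate, a small constant (the cells that follow use 1e-5):

$$
W \leftarrow W - \eta \, \frac{\partial L}{\partial W}, \qquad b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
$$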

In [14]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[18.1137, 54.7850],
        [27.2924, 67.0017],
        [23.7873, 98.3161],
        [15.4773, 52.5177],
        [30.1324, 61.3845]], grad_fn=<AddBackward0>)


In [15]:
# Calculate the loss
loss = mse(preds, targets)
print(loss)

tensor(2488.8269, grad_fn=<DivBackward0>)


In [16]:
# Compute gradients
loss.backward()
print(w.grad)
print(b.grad)

tensor([[-4325.9546, -5459.7832, -3186.1606],
        [-1922.9406, -2704.5452, -1660.1467]])
tensor([-53.2394, -25.1990])


In [17]:
# Adjust weights & reset gradients
# (inside no_grad, so autograd does not track the update itself)
with torch.no_grad():
    w -= w.grad * 1e-5   # learning rate: 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()
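The torch.no_grad() block matters here. w and b are leaf tensors with requires_grad=True, and PyTorch raises a RuntimeError if you modify such a tensor in place while autograd is recording. The small sketch below uses a hypothetical tensor, not the model's own weights:

```python
import torch

w_demo = torch.randn(2, 3, requires_grad=True)

raised = False
try:
    w_demo -= 0.1  # in-place update on a leaf that requires grad
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

with torch.no_grad():
    w_demo -= 0.1  # allowed: autograd is not recording inside no_grad

print(raised, w_demo.requires_grad)  # True True
```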

In [18]:
print(w)
print(b)

tensor([[ 0.0340,  0.0378,  0.4936],
        [ 0.4035,  0.6879, -0.3919]], requires_grad=True)
tensor([0.0563, 0.0214], requires_grad=True)


In [19]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(1828.7830, grad_fn=<DivBackward0>)


# Train for multiple epochs

In [20]:
# Train for 100 epochs
for i in range(100):
    preds = model(inputs)
    loss = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()
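The cell above only reports the loss after training. A variation that records the loss at every epoch makes it easy to confirm that the loss keeps going down. The sketch below is self-contained, with its own seeded random parameters (`_demo` names), so the exact values will differ from the notebook's run. It uses .mean(), which gives the same value as the mse function defined earlier.

```python
import torch

torch.manual_seed(42)

X_demo = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                       [102., 43., 37.], [69., 96., 70.]])
Y_demo = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                       [22., 37.], [103., 119.]])
W_demo = torch.randn(2, 3, requires_grad=True)
b_demo = torch.randn(2, requires_grad=True)

history = []
for epoch in range(100):
    preds_demo = X_demo @ W_demo.t() + b_demo
    loss_demo = ((preds_demo - Y_demo) ** 2).mean()  # same value as mse()
    loss_demo.backward()
    with torch.no_grad():
        W_demo -= W_demo.grad * 1e-5
        b_demo -= b_demo.grad * 1e-5
        W_demo.grad.zero_()
        b_demo.grad.zero_()
    history.append(loss_demo.item())  # loss before this epoch's update

print(f"first epoch: {history[0]:.2f}, last epoch: {history[-1]:.2f}")
```

The recorded list can also be plotted to visualize convergence.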

In [21]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(177.0228, grad_fn=<DivBackward0>)


In [22]:
# Predictions
preds

tensor([[ 60.7972,  74.2010],
        [ 85.1362,  94.8627],
        [106.1921, 139.9095],
        [ 42.1846,  58.7653],
        [ 94.7126,  96.3465]], grad_fn=<AddBackward0>)

In [23]:
# Targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

# Linear regression using PyTorch built-ins

#### PyTorch has several built-in functions and classes that make it easy to create and train models.

In [24]:
import torch.nn as nn

In [25]:
# Input (temp, rainfall, humidity): the 5 rows above, repeated 3 times
inputs = np.array([[73, 67, 43], [91, 88, 64], [87, 134, 58], 
                   [102, 43, 37], [69, 96, 70], [73, 67, 43], 
                   [91, 88, 64], [87, 134, 58], [102, 43, 37], 
                   [69, 96, 70], [73, 67, 43], [91, 88, 64], 
                   [87, 134, 58], [102, 43, 37], [69, 96, 70]], 
                  dtype='float32')

# Targets (apples, oranges)
targets = np.array([[56, 70], [81, 101], [119, 133], 
                    [22, 37], [103, 119], [56, 70], 
                    [81, 101], [119, 133], [22, 37], 
                    [103, 119], [56, 70], [81, 101], 
                    [119, 133], [22, 37], [103, 119]], 
                   dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

# Create a TensorDataset and a DataLoader

In [26]:
from torch.utils.data import TensorDataset

#### The TensorDataset allows us to access a small section of the training data using the array indexing notation ([0:3] in the code below). It returns a tuple (or pair), in which the first element contains the input variables for the selected rows, and the second contains the targets.

In [27]:
# Define dataset
train_ds = TensorDataset(inputs, targets)
train_ds[0:3]

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]), tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))

In [28]:
from torch.utils.data import DataLoader

In [29]:
# Define data loader
batch_size = 5
train_dl = DataLoader(train_ds, batch_size, shuffle=True)

In [30]:
for xb, yb in train_dl:
    print(xb)
    print(yb)
    break

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 69.,  96.,  70.],
        [102.,  43.,  37.],
        [ 87., 134.,  58.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [103., 119.],
        [ 22.,  37.],
        [119., 133.]])
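With batch_size=5, the DataLoader yields 3 batches per epoch from the 15 rows. shuffle=True reorders the rows each epoch but keeps every input row paired with its target row. The sketch below uses made-up data so the pairing can be checked after shuffling: row i of the inputs is [3i, 3i+1, 3i+2], and row i of the targets is [2i, 2i+1].

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Made-up data: 15 rows, like the dataset above
X_demo = torch.arange(45, dtype=torch.float32).reshape(15, 3)
Y_demo = torch.arange(30, dtype=torch.float32).reshape(15, 2)

demo_dl = DataLoader(TensorDataset(X_demo, Y_demo), batch_size=5, shuffle=True)
print(len(demo_dl))  # prints 3 (15 rows / 5 rows per batch)

for xb_demo, yb_demo in demo_dl:
    print(xb_demo.shape, yb_demo.shape)  # torch.Size([5, 3]) torch.Size([5, 2])
    # The row index recovered from each tensor must agree if the pairing is intact
    assert torch.equal(xb_demo[:, 0] / 3, yb_demo[:, 0] / 2)
```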


# Using nn.Linear to create the weights and biases automatically

In [31]:
# Define model
model = nn.Linear(3, 2)
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[ 0.2812, -0.3992,  0.3545],
        [-0.4438,  0.1369,  0.2265]], requires_grad=True)
Parameter containing:
tensor([ 0.1692, -0.4314], requires_grad=True)


In [32]:
# Parameters (weight matrix and bias vector)
list(model.parameters())

[Parameter containing:
 tensor([[ 0.2812, -0.3992,  0.3545],
         [-0.4438,  0.1369,  0.2265]], requires_grad=True),
 Parameter containing:
 tensor([ 0.1692, -0.4314], requires_grad=True)]

In [33]:
# Generate predictions
preds = model(inputs)
preds

tensor([[  9.1884, -13.9154],
        [ 13.3095, -14.2721],
        [ -8.3060,  -7.5564],
        [ 24.7965, -31.4312],
        [  6.0571,  -2.0540],
        [  9.1884, -13.9154],
        [ 13.3095, -14.2721],
        [ -8.3060,  -7.5564],
        [ 24.7965, -31.4312],
        [  6.0571,  -2.0540],
        [  9.1884, -13.9154],
        [ 13.3095, -14.2721],
        [ -8.3060,  -7.5564],
        [ 24.7965, -31.4312],
        [  6.0571,  -2.0540]], grad_fn=<AddmmBackward>)
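nn.Linear(3, 2) creates a weight of shape (2, 3) and a bias of shape (2,). Its forward pass computes x @ weight.t() + bias, the same formula as the hand-written model earlier. The quick check below runs on a separate, seeded layer; the hypothetical name `layer_demo` leaves the notebook's `model` untouched.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer_demo = nn.Linear(3, 2)
x_demo = torch.tensor([[73., 67., 43.], [91., 88., 64.]])

manual = x_demo @ layer_demo.weight.t() + layer_demo.bias
print(layer_demo.weight.shape, layer_demo.bias.shape)  # torch.Size([2, 3]) torch.Size([2])
print(torch.allclose(layer_demo(x_demo), manual, atol=1e-4))  # True
```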

# Loss Function

In [34]:
# Import nn.functional
import torch.nn.functional as F

In [35]:
# Define loss function
loss_fn = F.mse_loss

In [36]:
loss = loss_fn(model(inputs), targets)
print(loss)

tensor(9180.8359, grad_fn=<MseLossBackward>)
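With its default reduction='mean', F.mse_loss computes the same quantity as the mse function written by hand earlier. Below is a short check on random tensors. The helper is re-declared under the hypothetical name manual_mse so the cell is self-contained.

```python
import torch
import torch.nn.functional as F

def manual_mse(t1, t2):
    # Same as the hand-written mse() earlier in the notebook
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

torch.manual_seed(0)
a_demo = torch.randn(15, 2)
c_demo = torch.randn(15, 2)

print(torch.allclose(F.mse_loss(a_demo, c_demo), manual_mse(a_demo, c_demo)))  # True
```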


# Optimizer

In [37]:
# Define optimizer
opt = torch.optim.SGD(model.parameters(), lr=1e-5)

# Train the model (similar to the manual loop above)

#### 1. We use the data loader defined earlier to get a batch of data for every iteration.

#### 2. Instead of updating the parameters (weights and biases) manually, we use opt.step to perform the update and opt.zero_grad to reset the gradients to zero.

#### 3. We've also added a log statement that prints the loss from the last batch of data every 10th epoch, to track the progress of training. loss.item returns the value stored in the loss tensor as a Python number.
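For plain SGD (no momentum or weight decay, as configured above), opt.step() performs the same update as the manual loop: each parameter p becomes p - lr * p.grad. The sketch below checks this on a separate, seeded layer. `layer_demo`, `opt_demo` and the two data rows are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
layer_demo = nn.Linear(3, 2)
opt_demo = torch.optim.SGD(layer_demo.parameters(), lr=1e-5)

x_demo = torch.tensor([[73., 67., 43.], [91., 88., 64.]])
y_demo = torch.tensor([[56., 70.], [81., 101.]])

loss_demo = F.mse_loss(layer_demo(x_demo), y_demo)
loss_demo.backward()

# Expected result of one plain SGD step, computed before the step
expected_w = layer_demo.weight.detach() - 1e-5 * layer_demo.weight.grad
expected_b = layer_demo.bias.detach() - 1e-5 * layer_demo.bias.grad

opt_demo.step()       # update the parameters in place
opt_demo.zero_grad()  # reset gradients for the next iteration

print(torch.allclose(layer_demo.weight, expected_w),
      torch.allclose(layer_demo.bias, expected_b))  # True True
```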

In [38]:
# Utility function to train the model
def fit(num_epochs, model, loss_fn, opt):
    
    # Repeat for given number of epochs
    for epoch in range(num_epochs):
        
        # Train with batches of data (train_dl is the global data loader defined above)
        for xb,yb in train_dl:
            
            # 1. Generate predictions
            pred = model(xb)
            
            # 2. Calculate loss
            loss = loss_fn(pred, yb)
            
            # 3. Compute gradients
            loss.backward()
            
            # 4. Update parameters using gradients
            opt.step()
            
            # 5. Reset the gradients to zero
            opt.zero_grad()
        
        # Print the progress
        if (epoch+1) % 10 == 0:
            print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

#### Train the model for 100 epochs

In [39]:
fit(100, model, loss_fn, opt)

Epoch [10/100], Loss: 532.5573
Epoch [20/100], Loss: 372.1456
Epoch [30/100], Loss: 352.4548
Epoch [40/100], Loss: 160.7981
Epoch [50/100], Loss: 64.4002
Epoch [60/100], Loss: 99.3134
Epoch [70/100], Loss: 74.4656
Epoch [80/100], Loss: 51.2767
Epoch [90/100], Loss: 23.8580
Epoch [100/100], Loss: 18.0098
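The log above reports the loss of only the last batch, so the printed values jump around from one checkpoint to the next. The sketch below shows a hypothetical helper, evaluate, that averages the loss over the whole dataset instead. It weights each batch by its size, so an uneven final batch is handled correctly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

def evaluate(model, loss_fn, dl):
    """Hypothetical helper: mean loss over every row served by the data loader."""
    total, count = 0.0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for xb, yb in dl:
            total += loss_fn(model(xb), yb).item() * len(xb)
            count += len(xb)
    return total / count

# Self-contained check on made-up data: 15 rows in batches of 4, 4, 4 and 3
torch.manual_seed(0)
X_demo = torch.randn(15, 3)
Y_demo = torch.randn(15, 2)
dl_demo = DataLoader(TensorDataset(X_demo, Y_demo), batch_size=4)
layer_demo = nn.Linear(3, 2)

full = F.mse_loss(layer_demo(X_demo), Y_demo).item()
print(abs(evaluate(layer_demo, F.mse_loss, dl_demo) - full) < 1e-5)  # True
```

In this notebook it could be called as evaluate(model, loss_fn, train_dl) after training.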


In [40]:
# Generate predictions
preds = model(inputs)
preds

tensor([[ 58.7915,  70.8567],
        [ 83.4265,  99.1782],
        [113.5642, 135.2807],
        [ 30.1790,  40.5762],
        [ 98.7473, 114.3938],
        [ 58.7915,  70.8567],
        [ 83.4265,  99.1782],
        [113.5642, 135.2807],
        [ 30.1790,  40.5762],
        [ 98.7473, 114.3938],
        [ 58.7915,  70.8567],
        [ 83.4265,  99.1782],
        [113.5642, 135.2807],
        [ 30.1790,  40.5762],
        [ 98.7473, 114.3938]], grad_fn=<AddmmBackward>)

In [41]:
# Compare with targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])