# Linear Regression with PyTorch

## Example: Predicting crop yields

- We will create a linear regression model to predict the yields for apples and oranges (target variable).

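- The training data contain the average temperature, rainfall, and humidity (input variables or features) of five regions, and the corresponding yields in these regions. (The arrays in the code below hold 15 rows: three slightly varied rows for each of these five regions.)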

  ![linear-regression-training-data](https://i.imgur.com/6Ujttb4.png)


- The linear regression model (a system of linear equations) contains a set of weights ($w_{ij}$, the weight of feature $j$ in the equation for target $i$) and a set of biases ($b_i$, one per target):

  ```
  yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
  yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
  ```
- We will use PyTorch to train the model, i.e. to learn the weights and biases of the model.
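
To make the two equations concrete, here is a minimal sketch that evaluates them for one region in plain Python. The weights and biases are made-up numbers chosen only for illustration; they are not learned values, and `predict` is a hypothetical helper name.

```python
# Hypothetical weights and biases (illustration only, not learned values)
w = [[0.3, 0.5, 0.2],   # w11, w12, w13 (apples)
     [0.4, 0.6, 0.3]]   # w21, w22, w23 (oranges)
b = [1.0, 2.0]          # b1, b2


def predict(temp, rainfall, humidity):
    """Evaluate both linear equations for a single region."""
    features = [temp, rainfall, humidity]
    return [sum(wi * xi for wi, xi in zip(row, features)) + bias
            for row, bias in zip(w, b)]


# First region from the training data: temp=73, rainfall=67, humidity=43
yield_apple, yield_orange = predict(73, 67, 43)
print(yield_apple, yield_orange)
```

Training replaces these made-up numbers with values that bring the predictions close to the observed yields.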


In [24]:
import numpy as np
import torch

# Training data

# Input (row: each region; column: temp, rainfall, humidity)
inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70],
                   [74, 66, 43],
                   [91, 87, 65],
                   [88, 134, 59],
                   [101, 44, 37],
                   [68, 96, 71],
                   [73, 66, 44],
                   [92, 87, 64],
                   [87, 135, 57],
                   [103, 43, 36],
                   [68, 97, 70]],
                  dtype='float32')

# Targets (row: each region; column: yields of apples, oranges)
targets = np.array([[56, 70],
                    [81, 101],
                    [119, 133],
                    [22, 37],
                    [103, 119],
                    [57, 69],
                    [80, 102],
                    [118, 132],
                    [21, 38],
                    [104, 118],
                    [57, 69],
                    [82, 100],
                    [118, 134],
                    [20, 38],
                    [102, 120]],
                   dtype='float32')

# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)


## Steps for training a model using gradient descent

After defining the model, we can train it using the following steps:

1. Generate predictions
2. Calculate the loss
3. Compute gradients w.r.t. the weights and biases (parameters)
4. Update the parameters by subtracting a small quantity proportional to the gradient
5. Reset the gradients to zero

We can perform the above steps manually or with built-in PyTorch functions.
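
The five steps can be sketched on a toy problem with a single parameter before applying them to the crop data. The data (`y = 2 * x`), the starting value, and the learning rate below are assumptions chosen so that the loop converges quickly; they are not part of the crop-yield example.

```python
import torch

# Toy data for illustration only: the targets follow y = 2 * x exactly
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])

# A single parameter, started at zero (hypothetical setup)
w = torch.zeros(1, requires_grad=True)
lr = 0.05  # learning rate chosen for this toy problem

losses = []
for _ in range(50):
    pred = w * x                      # 1. generate predictions
    loss = ((pred - y) ** 2).mean()   # 2. calculate the loss (MSE)
    loss.backward()                   # 3. compute d(loss)/dw
    with torch.no_grad():
        w -= lr * w.grad              # 4. step against the gradient
    w.grad.zero_()                    # 5. reset the gradient to zero
    losses.append(loss.item())

print(w.item(), losses[0], losses[-1])
```

The parameter should approach 2 and the loss should shrink toward zero as the loop runs.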


#### More info about gradient descent:
- Animation: https://www.youtube.com/watch?v=IHZwWFHWa-w
- Notes on derivatives and gradient descent: https://storage.googleapis.com/supplemental_media/udacityu/315142919/Gradient%20Descent.pdf


## Training a model manually

In [25]:
# Define the model

# Initialize weights and biases with random values
w = torch.randn(2, 3, requires_grad=True) # 2x3 tensor (each row: weights for one linear equation in the model)
b = torch.randn(2, requires_grad=True) # tensor with 2 elements (each element: bias for one linear equation in the model)

# Create a function for the model
def model(x):
    return x @ w.t() + b # x will be the inputs; @ means matrix multiplication; .t() transposes w (see figure below)

Note: The model is a system of linear equations. It can be represented as matrix multiplication:
![matrix-mult](https://i.imgur.com/WGXLFvA.png)
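
As a quick check of the matrix form, the sketch below compares `x @ w.t() + b` with the same equations evaluated one at a time. The weights are random (with a fixed seed as an assumption for repeatability), and only the first two regions are used to keep the output small.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

x = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.]])  # first two regions from the training data
w = torch.randn(2, 3)                # one row of weights per target
b = torch.randn(2)                   # one bias per target

# (2, 3) @ (3, 2) -> (2, 2); b is broadcast across the rows
batched = x @ w.t() + b

# The same numbers, computed one equation at a time
explicit = torch.stack([
    torch.stack([(w[j] * x[i]).sum() + b[j] for j in range(2)])
    for i in range(2)
])

print(batched.shape)
print(torch.allclose(batched, explicit, atol=1e-3))
```

Each row of the result corresponds to one region and each column to one target, which is why `w` is transposed before multiplying.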


In [26]:
# Define loss function (here we use MSE, mean squared error)
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

# Train for 100 epochs
for i in range(100):
    preds = model(inputs) # Generate predictions
    loss = mse(preds, targets) # Calculate the loss
    loss.backward() # Compute gradients
    # Update weights & reset gradients
    with torch.no_grad(): # no_grad() tells PyTorch not to track these in-place updates in the autograd graph
        w -= w.grad * 1e-5 # 1e-5 is the learning rate
        b -= b.grad * 1e-5
        w.grad.zero_() # reset gradients to zero; otherwise the next .backward() call would add to the existing values
        b.grad.zero_()

# generate predictions using the trained model
# model(inputs)
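
As a sanity check on the hand-written `mse` function, the sketch below compares it with PyTorch's `torch.nn.functional.mse_loss`, which averages the squared differences by default. The prediction values are made up for illustration.

```python
import torch
import torch.nn.functional as F


def mse(t1, t2):
    """Mean squared error, written out by hand as in the cell above."""
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()


actual = torch.tensor([[56., 70.], [81., 101.]])  # two target rows from the training data
preds = torch.tensor([[58., 71.], [80., 99.]])    # hypothetical predictions

manual = mse(preds, actual)
builtin = F.mse_loss(preds, actual)
print(manual.item(), builtin.item())  # 2.5 2.5
```

The differences are 2, 1, -1, and -2, so the squared errors sum to 10 and the mean over four elements is 2.5 in both cases.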

## Training a model using PyTorch built-ins

In [27]:
import torch.nn as nn
from torch.utils.data import TensorDataset # use this to define dataset (merging inputs and targets into one dataset object)
from torch.utils.data import DataLoader # use this to split dataset into batches (data loader)
import torch.nn.functional as F # use this to get the loss function

In [28]:
# Define dataset
train_ds = TensorDataset(inputs, targets)
train_ds[0:3]

# Define data loader
batch_size = 5
train_dl = DataLoader(train_ds, batch_size, shuffle=True) # Shuffling helps randomize input to the optimization algorithm later

# Define the linear regression model
model = nn.Linear(3, 2) # 3 input features and 2 target outputs (the model will contain a 2x3 tensor for weights and a 2-element tensor for biases)
# Note: to access weights and biases, use model.weight and model.bias. Use model.parameters() to get a generator of all parameters.

# Define loss function
loss_fn = F.mse_loss

# Define optimizer (use SGD, i.e. "stochastic gradient descent")
# This will be used to update the parameters
opt = torch.optim.SGD(model.parameters(), lr=1e-5) # lr: learning rate
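
To see what the data loader and the linear layer hold, the sketch below inspects their shapes. It uses stand-in data with the same shapes as the crop data (15 rows, 3 features, 2 targets); the values themselves are arbitrary, and the names `xs`, `ys`, `ds`, `dl`, and `layer` are chosen here to avoid overwriting the notebook's variables.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data with the same shapes as the crop data (values are arbitrary)
xs = torch.arange(45, dtype=torch.float32).reshape(15, 3)
ys = torch.arange(30, dtype=torch.float32).reshape(15, 2)

ds = TensorDataset(xs, ys)
dl = DataLoader(ds, batch_size=5, shuffle=True)

# Each iteration yields one (inputs, targets) pair of batches
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in dl]
print(len(batch_shapes))  # number of batches per epoch
print(batch_shapes[0])

layer = nn.Linear(3, 2)
print(tuple(layer.weight.shape), tuple(layer.bias.shape))
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)  # 2*3 weights + 2 biases
```

With 15 rows and a batch size of 5, each epoch consists of three parameter updates rather than one.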

In [29]:
def fit(num_epochs, model, loss_fn, opt, train_dl):

    for epoch in range(num_epochs):
        for xb,yb in train_dl: # Train with batches of data
            pred = model(xb)         # 1. Generate predictions
            loss = loss_fn(pred, yb) # 2. Calculate loss
            loss.backward()          # 3. Compute gradients
            opt.step()               # 4. Update parameters using gradients
            opt.zero_grad()          # 5. Reset the gradients to zero

        # Print the progress every 10 epochs (the loss shown is for the last batch of the epoch, so it can fluctuate)
        if (epoch+1) % 10 == 0:
            print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

In [30]:
# Train the model
fit(100, model, loss_fn, opt, train_dl)

# Generate predictions using the trained model
preds = model(inputs)
print(preds)

# Predict one observation
model(torch.tensor([[75, 63, 44.]]))

Epoch [10/100], Loss: 648.2901
Epoch [20/100], Loss: 193.2015
Epoch [30/100], Loss: 153.7234
Epoch [40/100], Loss: 102.9023
Epoch [50/100], Loss: 128.5764
Epoch [60/100], Loss: 22.4976
Epoch [70/100], Loss: 66.1930
Epoch [80/100], Loss: 33.0335
Epoch [90/100], Loss: 27.8171
Epoch [100/100], Loss: 21.7172
tensor([[ 58.0666,  71.2510],
        [ 82.0000,  99.1841],
        [115.7624, 134.1540],
        [ 28.1871,  43.0441],
        [ 97.4741, 113.0348],
        [ 57.0199,  70.2665],
        [ 81.7898,  98.9884],
        [116.0784, 134.6326],
        [ 29.2338,  44.0287],
        [ 98.3105, 113.8237],
        [ 57.8563,  71.0554],
        [ 80.9533,  98.1995],
        [115.9727, 134.3496],
        [ 27.3506,  42.2553],
        [ 98.5208, 114.0193]], grad_fn=<AddmmBackward0>)


tensor([[54.9764, 68.2570]], grad_fn=<AddmmBackward0>)
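
The `grad_fn=<AddmmBackward0>` in the printed tensors shows that autograd tracked the forward pass. When predictions are only needed for inference, one option is to wrap the call in `torch.no_grad()`. The sketch below uses an untrained `nn.Linear` as a stand-in for the trained model, so its numbers will differ from the output above.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)  # untrained stand-in for the trained model
x = torch.tensor([[75., 63., 44.]])

tracked = layer(x)       # autograd records this call
with torch.no_grad():
    untracked = layer(x)  # same arithmetic, but no graph is built

print(tracked.requires_grad, untracked.requires_grad)
print(torch.allclose(tracked.detach(), untracked))
```

The two results hold the same values; only the gradient bookkeeping differs.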