## Multioutput Regression: PyTorch

Multioutput regression problems involve predicting two or more numerical values for a given input example. One example is predicting a coordinate, i.e. both an x and a y value, from an input. Another is multi-step time series forecasting, which involves predicting multiple future values of a given variable.

We'll create a model that predicts crop yields for apples and oranges (target variables) by looking at the average temperature, rainfall and humidity (input variables or features) in a region. In a linear regression model, each target variable is estimated as a weighted sum of the input variables, offset by a constant known as the bias:

```
yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
```

Geometrically, this means that each yield (for example, the yield of apples) is a linear, or planar, function of temperature, rainfall and humidity.
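The two equations above can also be collected into a single matrix equation, which is the form the code later in this tutorial works with. The bracket notation is introduced here only for illustration; the entries are exactly the weights, inputs and biases named above:

```latex
\begin{bmatrix} \text{yield\_apple} \\ \text{yield\_orange} \end{bmatrix}
=
\begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{bmatrix}
\begin{bmatrix} \text{temp} \\ \text{rainfall} \\ \text{humidity} \end{bmatrix}
+
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
```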

Our objective: Find a suitable set of weights and biases using the training data, to make accurate predictions.



## Import libraries

In [1]:
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader
import torch.nn.functional as F

## Training data

In [2]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [3]:
# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

In [4]:
# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


## Linear Regression Model (simple)

The weights and biases can also be represented as matrices, initialized with random values. The first row of w and the first element of b are used to predict the first target variable, i.e. the yield of apples, and similarly the second row and element are used for oranges.

In [5]:
# Weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[ 0.9336, -0.8414,  1.0118],
        [ 1.2007, -0.7262, -0.4521]], requires_grad=True)
tensor([-1.2941,  0.3321], requires_grad=True)


The model is simply a function that multiplies the input matrix X by the transposed weight matrix w and adds the bias b, which is broadcast across all observations.

In [6]:
# Define the model
def model(X):
    return torch.add(torch.matmul(X, w.T), b)
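As a quick check of the shapes involved, the sketch below uses small hand-picked weights (hypothetical values, not the random ones printed above) and confirms that the matrix form gives the same number as writing out the weighted sum for one row. The names `w_demo`, `b_demo` and `model_demo` are illustrative only.

```python
import torch

# Two rows of the training inputs (temp, rainfall, humidity)
X = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.]])

# Hypothetical weights and biases, chosen by hand for illustration
w_demo = torch.tensor([[0.1, 0.2, 0.3],    # row 0: weights for apples
                       [0.4, 0.5, 0.6]])   # row 1: weights for oranges
b_demo = torch.tensor([1.0, 2.0])

def model_demo(X):
    # (2, 3) @ (3, 2) -> (2, 2); b_demo of shape (2,) is broadcast across rows
    return X @ w_demo.T + b_demo

out = model_demo(X)
print(out.shape)  # one row per observation, one column per target

# Apple yield for the first row, written out as a weighted sum
apple_first = 0.1 * 73 + 0.2 * 67 + 0.3 * 43 + 1.0
print(out[0, 0].item(), apple_first)
```

The bias has shape `(2,)` while the product has shape `(2, 2)`, so PyTorch's broadcasting adds the same bias pair to every row.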

### Step 1: Forward pass

In [7]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[ 53.9922,  19.8917],
        [ 74.3752,  16.7616],
        [ 25.8670, -18.7322],
        [ 95.1887,  74.8520],
        [ 53.1759, -18.1753]], grad_fn=<AddBackward0>)


In [8]:
# Compare with targets
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


Because we've started with random weights and biases, the model does not do a very good job of predicting the target variables.

### Step 2: Calculate loss (cost) by comparison of prediction and target



In [9]:
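# MSE loss: squared errors summed over all elements, divided by the number of rows.
# Note: this divides by len(my_y) (rows), not by the total number of elements.
def mse(my_y_hat, my_y):
    mse_cost = (1 / len(my_y)) * torch.sum((my_y_hat - my_y) ** 2)
    return mse_cost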

In [10]:
# Compute loss
loss = mse(preds, targets)
print(loss)

tensor(13888.0303, grad_fn=<MulBackward0>)


The resulting number is called the loss, because it indicates how bad the model is at predicting the target variables. The lower the loss, the better the model.
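Written as a formula, the `mse` function defined above computes the following, where $N$ is the number of rows (5 here), $j$ indexes the two targets, and $\hat{y}$ and $y$ stand for `my_y_hat` and `my_y`. Because it divides by the number of rows rather than the number of elements, it equals the per-element mean squared error multiplied by the number of targets:

```latex
\text{loss} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{2} \left( \hat{y}_{ij} - y_{ij} \right)^2
```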

### Step 3: Compute gradients of loss with respect to w and b

A key insight from calculus is that the gradient indicates the rate of change of the loss, or the slope of the loss function w.r.t. the weights and biases.

If a gradient element is positive,
- increasing the corresponding weight or bias slightly will increase the loss.
- decreasing the corresponding weight or bias slightly will decrease the loss.

If a gradient element is negative,
- increasing the corresponding weight or bias slightly will decrease the loss.
- decreasing the corresponding weight or bias slightly will increase the loss.

The increase or decrease is proportional to the value of the gradient.
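The sign rule can be checked on a toy problem with a single parameter. The sketch below is not part of the crop-yield model; `w_toy` and `toy_loss` are hypothetical names, and the numbers are chosen so the gradient can be verified by hand.

```python
import torch

# A one-parameter "model": prediction = w_toy * 3, target = 6 (toy numbers)
w_toy = torch.tensor(1.0, requires_grad=True)

def toy_loss(w):
    return (w * 3.0 - 6.0) ** 2

loss = toy_loss(w_toy)
loss.backward()
print(w_toy.grad)  # d(loss)/dw = 2 * (3w - 6) * 3 = -18 at w = 1

# The gradient is negative, so nudging the parameter upward lowers the loss
with torch.no_grad():
    loss_up = toy_loss(w_toy + 0.01)
    loss_down = toy_loss(w_toy - 0.01)
print(loss_down.item() > loss.item() > loss_up.item())
```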

In [11]:
# Compute gradients
loss.backward()

In [12]:
# Gradients for weights
print(w)
print(w.grad)

tensor([[ 0.9336, -0.8414,  1.0118],
        [ 1.2007, -0.7262, -0.4521]], requires_grad=True)
tensor([[ -1929.8463,  -5933.3320,  -2676.6975],
        [-12051.3965, -17057.4160,  -9819.2520]])


In [13]:
# Gradients for bias
print(b)
print(b.grad)

tensor([-1.2941,  0.3321], requires_grad=True)
tensor([ -31.3604, -154.1609])


### Step 4: Adjust w and b with gradient descent



In [14]:
# Adjust weights & reset gradients
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()
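The calls to `.zero_()` matter because PyTorch adds each newly computed gradient to whatever is already stored in `.grad`. A minimal sketch with a toy tensor (`p` is a hypothetical name) shows the accumulation:

```python
import torch

p = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d(sum(p**2))/dp = 2 * p
(p ** 2).sum().backward()
first = p.grad.clone()

# Second backward pass without resetting: the new gradient is added on top
(p ** 2).sum().backward()
accumulated = p.grad.clone()

# After zero_(), the next backward pass starts from a clean slate
p.grad.zero_()
(p ** 2).sum().backward()
fresh = p.grad.clone()

print(first, accumulated, fresh)
```

The `torch.no_grad()` block in the cell above serves a different purpose: it tells PyTorch not to track the weight update itself as part of the computation graph.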

In [15]:
print(w)
print(b)

tensor([[ 0.9529, -0.7820,  1.0385],
        [ 1.3212, -0.5556, -0.3539]], requires_grad=True)
tensor([-1.2937,  0.3336], requires_grad=True)


With the new weights and biases, the model should have a lower loss.

In [16]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(9111.3730, grad_fn=<MulBackward0>)


### Iteratively minimize loss (MSE) toward zero

To reduce the loss further, we repeat the process of adjusting the weights and biases using the gradients multiple times. Each iteration is called an epoch.

In [17]:
# Train for 100 epochs
for i in range(100):
    preds = model(inputs)
    loss = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()

In [18]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(538.9111, grad_fn=<MulBackward0>)


In [19]:
# Print predictions
preds

tensor([[ 61.0541,  75.9017],
        [ 89.2427,  98.6308],
        [ 96.4353, 128.6068],
        [ 45.3944,  68.6145],
        [ 99.7155,  97.2147]], grad_fn=<AddBackward0>)

In [20]:
# Print targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])
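To watch the loss fall rather than only inspecting it at the end, the loop can record the loss at every epoch. The sketch below is a self-contained variant of the training loop above; the seed, the `w_demo` and `b_demo` names and the `history` list are assumptions added for illustration, so its numbers will not match the outputs printed in this section.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                       [102., 43., 37.], [69., 96., 70.]])
targets = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                        [22., 37.], [103., 119.]])

w_demo = torch.randn(2, 3, requires_grad=True)
b_demo = torch.randn(2, requires_grad=True)

history = []
for epoch in range(100):
    preds = inputs @ w_demo.T + b_demo
    # Same loss as the mse function above: squared errors summed, divided by rows
    loss = torch.sum((preds - targets) ** 2) / len(targets)
    history.append(loss.item())
    loss.backward()
    with torch.no_grad():
        w_demo -= w_demo.grad * 1e-5
        b_demo -= b_demo.grad * 1e-5
        w_demo.grad.zero_()
        b_demo.grad.zero_()

# Loss before the first update and before the last update
print(history[0], history[-1])
```

Plotting `history` (for example with matplotlib) would show a steep initial drop followed by slower progress.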

## Linear Regression Model (PyTorch built-ins)

### Load data

In [21]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], [91, 88, 64], [87, 134, 58], [102, 43, 37], [69, 96, 70], 
                   [73, 67, 43], [91, 88, 64], [87, 134, 58], [102, 43, 37], [69, 96, 70], 
                   [73, 67, 43], [91, 88, 64], [87, 134, 58], [102, 43, 37], [69, 96, 70]], dtype='float32')
# Targets (apples, oranges)
targets = np.array([[56, 70], [81, 101], [119, 133], [22, 37], [103, 119], 
                    [56, 70], [81, 101], [119, 133], [22, 37], [103, 119], 
                    [56, 70], [81, 101], [119, 133], [22, 37], [103, 119]], dtype='float32')

In [22]:
# convert from numpy to torch format
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

In [23]:
inputs.shape

torch.Size([15, 3])

In [24]:
inputs

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.],
        [ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.],
        [ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])

In [25]:
targets.shape

torch.Size([15, 2])

In [26]:
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

## Dataset and DataLoader

We'll create a TensorDataset, which allows access to rows from inputs and targets as tuples. We'll also create a DataLoader, to split the data into batches while training. It also provides other utilities like shuffling and sampling.

In [27]:
# Define dataset
train_ds = TensorDataset(inputs, targets)
train_ds[0:3] # first three rows of inputs and targets

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]), tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))

In [28]:
# Define data loader
batch_size = 5 # give five rows of targets and inputs per iteration
train_dl = DataLoader(train_ds, batch_size, shuffle=True)
next(iter(train_dl))

[tensor([[ 69.,  96.,  70.],
         [ 73.,  67.,  43.],
         [ 87., 134.,  58.],
         [ 73.,  67.,  43.],
         [102.,  43.,  37.]]), tensor([[103., 119.],
         [ 56.,  70.],
         [119., 133.],
         [ 56.,  70.],
         [ 22.,  37.]])]
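With 15 rows and a batch size of 5, one pass over the loader yields three batches. The sketch below illustrates this with dummy tensors standing in for the real data; `X`, `Y`, `ds` and `dl` are hypothetical names.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 15 dummy rows with 3 features and 2 targets (stand-ins for the real data)
X = torch.arange(45, dtype=torch.float32).reshape(15, 3)
Y = torch.arange(30, dtype=torch.float32).reshape(15, 2)

ds = TensorDataset(X, Y)
dl = DataLoader(ds, batch_size=5, shuffle=True)

print(len(ds))  # number of rows
print(len(dl))  # number of batches per epoch

# One pass over the loader is one epoch: every row is visited exactly once
shapes = [(xb.shape, yb.shape) for xb, yb in dl]
print(shapes)
```

Because `shuffle=True`, the rows inside each batch change from one pass to the next, but the batch shapes stay the same.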

### torch.nn.Linear

Instead of initializing the weights & biases manually, we can define the model using nn.Linear.

In [29]:
# Define model
model = torch.nn.Linear(3, 2) # 3 inputs, 2 targets
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[-0.3137,  0.4896, -0.4389],
        [-0.1163,  0.2314, -0.0939]], requires_grad=True)
Parameter containing:
tensor([-0.5385,  0.3544], requires_grad=True)
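`nn.Linear` stores its weight with shape `(out_features, in_features)`, the same layout as the hand-written `w` earlier, and applies the same computation as the manual model. A small check, using an arbitrary seed and a hypothetical `layer` variable:

```python
import torch

torch.manual_seed(0)  # arbitrary seed for repeatability
layer = torch.nn.Linear(3, 2)

x = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.]])

# weight has shape (2, 3) and bias has shape (2,)
manual = x @ layer.weight.T + layer.bias
print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(layer(x), manual, atol=1e-4))
```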


### Optimizer

Instead of manually manipulating the weights & biases using gradients, we can use the optimizer optim.SGD.

In [30]:
# Define optimizer
opt = torch.optim.SGD(model.parameters(), lr=1e-5)
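For plain SGD (no momentum or weight decay), `opt.step()` performs the same update as the manual `w -= w.grad * lr` used earlier. The toy example below, with a hypothetical parameter `p` and optimizer `opt_demo`, makes the correspondence explicit:

```python
import torch

p = torch.tensor([1.0, -2.0], requires_grad=True)  # toy parameter
opt_demo = torch.optim.SGD([p], lr=0.1)

loss = (p ** 2).sum()   # gradient is 2 * p = [2, -4]
loss.backward()

expected = p.detach() - 0.1 * p.grad  # manual update: p - lr * grad
opt_demo.step()        # updates p in place
opt_demo.zero_grad()   # clears the stored gradient for the next iteration

print(p.detach(), expected)
```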

### Loss function

Instead of defining a loss function manually, we can use the built-in loss function mse_loss.

In [31]:
# Define loss function
loss_fn = F.mse_loss

In [32]:
# compute loss after forward pass
loss = loss_fn(model(inputs), targets)
print(loss)

tensor(8071.2856, grad_fn=<MseLossBackward>)
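By default `F.mse_loss` averages over every element (`reduction='mean'`), whereas the hand-written `mse` from the first part divides the summed squared error by the number of rows. With two targets per row the two differ by a factor of two, which is worth keeping in mind when comparing loss values between the two parts. A small check with made-up numbers:

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([[1.0, 2.0], [3.0, 5.0]])
true = torch.tensor([[0.0, 2.0], [1.0, 1.0]])
# Squared errors: 1, 0, 4, 16 (sum = 21)

per_element = F.mse_loss(pred, true)                            # 21 / 4 elements
per_row = F.mse_loss(pred, true, reduction='sum') / len(true)   # 21 / 2 rows
print(per_element.item(), per_row.item())
```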


### Train model

We are ready to train the model now. We can define a utility function fit which trains the model for a given number of epochs.

In [33]:
# Define a utility function to train the model
def fit(num_epochs, model, loss_fn, opt):
    for epoch in range(num_epochs):
        for xb,yb in train_dl:
            # Step 0: reset gradients to 0
            opt.zero_grad()
            # Step 1: make prediction
            pred = model(xb)
            # Step 2: compute loss
            loss = loss_fn(pred, yb)
            # Step 3: compute grad of loss w.r.t. w & b
            loss.backward()
            # Step 4: adjust w & b via gradient descent
            opt.step()
    print('Training loss: ', loss_fn(model(inputs), targets))

In [34]:
# Train the model for 100 epochs
fit(100, model, loss_fn, opt)

Training loss:  tensor(40.8121, grad_fn=<MseLossBackward>)
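The `fit` function above reads `train_dl`, `inputs` and `targets` from the enclosing notebook scope. One possible variant, sketched here under the assumption that passing the loader explicitly and returning a per-epoch loss history is useful, is shown below. `fit_with_history`, `loader`, `net` and `sgd` are hypothetical names, not part of the original notebook, and the seed is arbitrary.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

def fit_with_history(num_epochs, model, loss_fn, opt, train_dl):
    """Train the model; return the average batch loss of each epoch."""
    history = []
    for epoch in range(num_epochs):
        running, seen = 0.0, 0
        for xb, yb in train_dl:
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
            running += loss.item() * len(xb)
            seen += len(xb)
        history.append(running / seen)
    return history

torch.manual_seed(0)  # arbitrary seed for repeatability
# The five base rows from this tutorial, repeated three times
X = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                  [102., 43., 37.], [69., 96., 70.]]).repeat(3, 1)
Y = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                  [22., 37.], [103., 119.]]).repeat(3, 1)
loader = DataLoader(TensorDataset(X, Y), batch_size=5, shuffle=True)

net = torch.nn.Linear(3, 2)
sgd = torch.optim.SGD(net.parameters(), lr=1e-5)
history = fit_with_history(100, net, F.mse_loss, sgd, loader)
print(history[0], history[-1])
```

Returning the history instead of printing inside the function keeps the training routine reusable with other datasets and makes the loss curve easy to plot.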


### Make predictions

In [35]:
# Generate predictions
preds = model(inputs)
preds

tensor([[ 58.1122,  71.5185],
        [ 77.9521,  97.3291],
        [127.0259, 138.6841],
        [ 26.4697,  43.2231],
        [ 91.2556, 109.7681],
        [ 58.1122,  71.5185],
        [ 77.9521,  97.3291],
        [127.0259, 138.6841],
        [ 26.4697,  43.2231],
        [ 91.2556, 109.7681],
        [ 58.1122,  71.5185],
        [ 77.9521,  97.3291],
        [127.0259, 138.6841],
        [ 26.4697,  43.2231],
        [ 91.2556, 109.7681]], grad_fn=<AddmmBackward>)

In [36]:
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

## Feed Forward Neural Network

Conceptually, you can think of a feedforward neural network as two or more linear regression models stacked on top of one another, with a non-linear activation function applied between them. To use a feedforward neural network instead of linear regression, we can extend the nn.Module class from PyTorch.

In [37]:
# create neural network
class SimpleNet(torch.nn.Module):
    # Initialize the layers
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(3, 3) # 3 inputs, 3 outputs
        self.act1 = torch.nn.ReLU() # Activation function
        self.linear2 = torch.nn.Linear(3, 2) # 3 inputs, 2 outputs
    
    # Perform the computation
    def forward(self, x):
        x = self.linear1(x) # feed into first layer
        x = self.act1(x) # apply activation function
        x = self.linear2(x) # feed through second layer
        return x # output
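The activation between the layers is what makes the network more expressive than a single linear model. Without it, two stacked linear layers collapse into one linear map, as the sketch below checks numerically. The names `lin1`, `lin2`, `W_merged` and `b_merged` are hypothetical, and the seed is arbitrary.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for repeatability
lin1 = torch.nn.Linear(3, 3)
lin2 = torch.nn.Linear(3, 2)
x = torch.randn(4, 3)

# Two linear layers with nothing in between
stacked = lin2(lin1(x))

# The same result from one linear map with merged weights and bias
W_merged = lin2.weight @ lin1.weight              # shape (2, 3)
b_merged = lin2.weight @ lin1.bias + lin2.bias    # shape (2,)
single = x @ W_merged.T + b_merged

print(torch.allclose(stacked, single, atol=1e-5))
```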

Now we can define the model, optimizer and loss function exactly as before.

In [38]:
# define model
model = SimpleNet()
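For a network this small, the same stack of layers can also be written with `torch.nn.Sequential`. This is an alternative sketch, not the approach used in this tutorial, and `net` is a hypothetical name. Counting the parameters is a quick sanity check on the layer sizes:

```python
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(3, 3),  # 3 inputs, 3 hidden units
    torch.nn.ReLU(),        # activation function
    torch.nn.Linear(3, 2),  # 3 hidden units, 2 outputs
)

x = torch.tensor([[73., 67., 43.]])
print(net(x).shape)  # one row in, one row of two predictions out

n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # (3*3 + 3) + (3*2 + 2) = 20
```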

In [39]:
# define optimizer
opt = torch.optim.SGD(model.parameters(), lr=1e-5)

In [40]:
# define loss function
loss_fn = F.mse_loss

Finally, we can apply gradient descent to train the model using the same fit function defined earlier for linear regression.

In [41]:
# train model
fit(100, model, loss_fn, opt)

Training loss:  tensor(34.7236, grad_fn=<MseLossBackward>)


In [42]:
# Generate predictions
preds = model(inputs)
preds

tensor([[ 59.7845,  70.2596],
        [ 81.6336,  96.0347],
        [120.6448, 142.0556],
        [ 31.6014,  37.0124],
        [ 94.5566, 111.2798],
        [ 59.7845,  70.2596],
        [ 81.6336,  96.0347],
        [120.6448, 142.0556],
        [ 31.6014,  37.0124],
        [ 94.5566, 111.2798],
        [ 59.7845,  70.2596],
        [ 81.6336,  96.0347],
        [120.6448, 142.0556],
        [ 31.6014,  37.0124],
        [ 94.5566, 111.2798]], grad_fn=<AddmmBackward>)

In [43]:
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])