# Linear Regression and Gradient Descent

| Region | Temp. (F) | Rainfall (mm) | Humidity (%) | Apples (tons) | Oranges (tons) |
|--------|-----------|---------------|--------------|---------------|---------------|
| Kanto  | 73        | 67            | 43           | 56            | 70            |
| Johto  | 91        | 88            | 64           | 81            | 101           |
| Hoenn  | 87        | 134           | 58           | 119           | 133           |
| Sinnoh | 102       | 43            | 37           | 22            | 37            |
| Unova  | 69        | 96            | 70           | 103           | 119           |

#### In a linear regression model, each target variable is estimated to be a weighted sum of the input variables, offset by some constant, known as a bias:

#### yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
#### yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
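
As a quick illustration of the weighted sum, the sketch below evaluates the first equation for the Kanto row of the table. The weight and bias values are made up for the example (they are not learned values), so the result is not expected to match the actual yield.

```python
# Hypothetical weights and bias, chosen only for illustration
w11, w12, w13 = 0.3, 0.2, 0.5
b1 = 1.0

# Kanto row from the table: temperature, rainfall, humidity
temp, rainfall, humidity = 73, 67, 43

# Weighted sum of the inputs, offset by the bias
yield_apple = w11 * temp + w12 * rainfall + w13 * humidity + b1
print(yield_apple)
```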

#### The weights (w) and biases (b) are found by adjusting them slightly, many times over, so that the model makes better predictions. The optimization technique used for this is called gradient descent.
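
To make the idea of repeated small adjustments concrete, here is a minimal sketch of gradient descent on a single value. The toy function f(x) = (x - 3)^2, the starting value, the learning rate, and the number of steps are all assumptions picked for illustration; they are not part of the regression model.

```python
# Toy objective: f(x) = (x - 3)^2, which is smallest at x = 3
def grad(x):
    # Derivative of f with respect to x
    return 2 * (x - 3)

weight = 0.0         # arbitrary starting value (assumed)
learning_rate = 0.1  # size of each adjustment (assumed)

for step in range(50):
    # Move a small step in the direction that reduces f
    weight -= learning_rate * grad(weight)

print(weight)  # ends up close to 3
```

Each step changes the value only a little, but the repeated updates move it steadily toward the minimum.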

## Training data

In [1]:
import numpy as np
import torch

#### Training data is typically handled like this: read some CSV files as NumPy arrays, do some processing, and then convert them to PyTorch tensors. Here the arrays are written out by hand.
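
A sketch of the CSV route, assuming a file with one header row and the five numeric columns from the table. An in-memory string stands in for a file on disk, and the names `csv_text`, `inputs_np`, and `targets_np` are hypothetical.

```python
import io
import numpy as np

# Stand-in for a CSV file on disk (assumed layout: header row, then numbers)
csv_text = """temp,rainfall,humidity,apples,oranges
73,67,43,56,70
91,88,64,81,101
87,134,58,119,133
102,43,37,22,37
69,96,70,103,119
"""

# Skip the header row and parse every value as float32
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, dtype="float32")

# First three columns are inputs, last two are targets
inputs_np = data[:, :3]
targets_np = data[:, 3:]
print(inputs_np.shape, targets_np.shape)
```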

In [2]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [3]:
# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

In [4]:
# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [5]:
# Weights and biases, initialized as random values
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[-0.4600, -0.1477, -0.5588],
        [-1.0777,  1.8221,  0.9706]], requires_grad=True)
tensor([ 0.6040, -0.7516], requires_grad=True)


##### @ represents matrix multiplication in PyTorch, and the .t method returns the transpose of a tensor:

##### model = X @ W^T + b (X and W^T are matrices; b is a vector that is added to every row of the product)

In [6]:
def model(x):
    return x @ w.t() + b

In [7]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[-66.9026,  84.3877],
        [-90.0199, 123.6341],
        [-91.6192, 205.9379],
        [-73.3472,   3.5795],
        [-84.4331, 167.7447]], grad_fn=<AddBackward0>)
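
The matrix form computes both weighted sums for every region at once. The sketch below checks this against explicit per-element sums, using the inputs and the weight and bias values printed earlier. Those printed values are rounded to four decimals, so the results agree with the predictions above only approximately. The names `x`, `w_demo`, `b_demo`, `matrix_form`, and `by_hand` are introduced here for illustration.

```python
import torch

x = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.],
                  [87., 134., 58.],
                  [102., 43., 37.],
                  [69., 96., 70.]])
# Rounded copies of the weights and biases printed earlier
w_demo = torch.tensor([[-0.4600, -0.1477, -0.5588],
                       [-1.0777, 1.8221, 0.9706]])
b_demo = torch.tensor([0.6040, -0.7516])

# (5, 3) @ (3, 2) -> (5, 2); b_demo has shape (2,) and is broadcast across rows
matrix_form = x @ w_demo.t() + b_demo

# The same numbers, one weighted sum at a time
by_hand = torch.zeros(5, 2)
for i in range(5):       # regions
    for j in range(2):   # targets: apples, oranges
        by_hand[i, j] = (w_demo[j] * x[i]).sum() + b_demo[j]

print(matrix_form.shape)
print(torch.allclose(matrix_form, by_hand, atol=1e-3))
```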


In [8]:
# Compare with targets
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


##### There is a huge difference between the model's predictions and the actual values of the target variables. This is expected: we initialized the model with random weights and biases, so we can't expect it to just work.

## Loss function (MSE loss)

#### A loss function lets us evaluate how well our model is performing. We compare the predictions with the actual targets as follows:
#### 1. Calculate the difference between the two matrices (preds and targets);
#### 2. Square all elements of the difference matrix to remove negative values;
#### 3. Calculate the average of the elements in the resulting matrix.
#### The result is a single number, known as the mean squared error (MSE).
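
A small worked example of the three steps, using two made-up 2x2 tensors. The names `p`, `t`, `squared`, and `mse_value`, and the values themselves, are assumptions for illustration only.

```python
import torch

p = torch.tensor([[1., 2.], [3., 4.]])  # stand-in predictions
t = torch.tensor([[1., 0.], [0., 4.]])  # stand-in targets

diff = p - t                # step 1: [[0, 2], [3, 0]]
squared = diff * diff       # step 2: [[0, 4], [9, 0]]
mse_value = squared.mean()  # step 3: (0 + 4 + 9 + 0) / 4 = 3.25

print(mse_value)
```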

##### torch.sum returns the sum of all the elements in a tensor, and the .numel method returns the number of elements in a tensor:

In [9]:
# MSE loss
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

In [10]:
# Compute loss
loss = mse(preds, targets)
print(loss)

tensor(14246.7783, grad_fn=<DivBackward0>)
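
As a cross-check, PyTorch's built-in `torch.nn.functional.mse_loss` computes the same quantity with its default mean reduction. The sketch below applies it to the predictions and targets printed earlier. Because the printed predictions are rounded, the result should match the loss above only approximately. The names `preds_printed`, `targets_printed`, `builtin_loss`, and `manual_loss` are introduced here for illustration.

```python
import torch
import torch.nn.functional as F

# Rounded copies of the predictions and targets printed earlier
preds_printed = torch.tensor([[-66.9026, 84.3877],
                              [-90.0199, 123.6341],
                              [-91.6192, 205.9379],
                              [-73.3472, 3.5795],
                              [-84.4331, 167.7447]])
targets_printed = torch.tensor([[56., 70.],
                                [81., 101.],
                                [119., 133.],
                                [22., 37.],
                                [103., 119.]])

# Built-in MSE (averages over all elements by default)
builtin_loss = F.mse_loss(preds_printed, targets_printed)

# Same formula as the hand-written mse function
diff = preds_printed - targets_printed
manual_loss = torch.sum(diff * diff) / diff.numel()

print(builtin_loss, manual_loss)
```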
