**Linear regression**
- Create a model that predicts **target variables** by looking at **input variables**
- Our model:
  - Target variables: crop yields for apples and oranges
  - Input variables or features: average temperature, rainfall, humidity
- Each target var is estimated as a weighted sum of the input vars, offset by a constant (bias):

```
yield_apple = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
```
- i.e. each target var is a linear function of all input vars
- *Learning* = using training data to find **weights** and **biases** that result in accurate predictions for new data
- *Training* = start w/ random weights, adjust them slightly many times using **gradient descent**
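
The two yield equations above amount to one matrix-vector product. A minimal sketch for a single data point; the values of `W` and `b` are made up for illustration and are not learned weights:

```python
import numpy as np

# Hypothetical weights: one row per target (apples, oranges),
# one column per input (temp, rainfall, humidity)
W = np.array([[0.5, 0.3, 0.2],
              [0.4, 0.6, 0.1]])
# Hypothetical biases: one per target
b = np.array([1.0, 2.0])

# One data point: temp=73, rainfall=67, humidity=43
x = np.array([73.0, 67.0, 43.0])

# Written out term by term, as in the equations
yield_apple = W[0, 0] * x[0] + W[0, 1] * x[1] + W[0, 2] * x[2] + b[0]
yield_orange = W[1, 0] * x[0] + W[1, 1] * x[1] + W[1, 2] * x[2] + b[1]

# The same computation as a single matrix-vector product
yields = W @ x + b

print(yield_apple, yield_orange)
print(yields)
```

Both prints should show the same two numbers, which is why the weights can be stored as one matrix instead of six separate variables.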


In [1]:
import numpy as np
import torch

**Training data**
- Each row is a data point
- Each column is a variable

In [2]:
# inputs (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70]], dtype='float32')

# targets (apples, oranges)
targets = np.array([[56, 70],
                    [81, 101],
                    [119, 113],
                    [22, 37],
                    [103, 119]], dtype='float32')

# separated inputs and targets because we'll operate on them separately

# use floating point nums because the model will make non-integer predictions
# (pytorch can also only compute gradients for floating point tensors)

In [3]:
# typically read in training data from CSV as numpy arrays, do some processing,
# and convert to pytorch tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 113.],
        [ 22.,  37.],
        [103., 119.]])
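
A rough sketch of the CSV route mentioned in the comment above. The CSV text and its column names are made up for illustration (they reuse the first two data points), and `np.loadtxt` is only one of several ways to read such a file:

```python
import io

import numpy as np
import torch

# Hypothetical CSV contents; in practice this would be a file on disk
csv_text = """temp,rainfall,humidity,apples,oranges
73,67,43,56,70
91,88,64,81,101
"""

# Read as float32, skipping the header row
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, dtype=np.float32)

# First 3 columns are inputs, last 2 are targets
inputs_np = data[:, :3]
targets_np = data[:, 3:]

# from_numpy keeps the dtype and shares memory with the numpy array
inputs_t = torch.from_numpy(inputs_np)
targets_t = torch.from_numpy(targets_np)
print(inputs_t.dtype, inputs_t.shape, targets_t.shape)

# Because memory is shared, changing the numpy array changes the tensor too
inputs_np[0, 0] = 74.0
print(inputs_t[0, 0])
```

The shared memory is worth remembering: later in-place edits to the numpy array also show up in the tensor.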


**Linear regression model from scratch**

In [4]:
# notice: 
# weights form a matrix w/ num rows = num targets, num cols = num input vars
# biases form a vector w/ one element per target

# start with random weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
# torch.randn chooses elements from normal distribution (mean 0, stdev 1)
print(w)
print(b)

tensor([[ 2.1329, -0.1202,  0.9868],
        [-2.2505,  0.5090,  1.4815]], requires_grad=True)
tensor([-0.6191, -1.2895], requires_grad=True)


In [5]:
# matrix mult of inputs (x) with weights w (transposed), then add bias b
def model(x):
  # @ is python's matrix mult operator, which pytorch tensors support
  # .t() returns transpose of tensor
  return x @ w.t() + b
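
To see why the transpose is needed, it helps to check shapes: `inputs` is 5x3 and `w` is 2x3, so `w.t()` is 3x2 and the product is 5x2. The length-2 `b` is then added to every row. A small sketch using the weights and biases printed earlier, rounded to 4 decimals, so the results only approximately match the notebook output:

```python
import torch

inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.],
                       [102., 43., 37.],
                       [69., 96., 70.]])

# Weights and biases as printed in the notebook (rounded)
w = torch.tensor([[2.1329, -0.1202, 0.9868],
                  [-2.2505, 0.5090, 1.4815]])
b = torch.tensor([-0.6191, -1.2895])

print(inputs.shape, w.t().shape)  # shapes (5, 3) and (3, 2)

# (5, 3) @ (3, 2) gives (5, 2); b has shape (2,) and is added to every row
preds = inputs @ w.t() + b
print(preds.shape)

# First row, apples column, written out by hand
by_hand = 2.1329 * 73 + (-0.1202) * 67 + 0.9868 * 43 + (-0.6191)
print(preds[0, 0].item(), by_hand)
```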

In [6]:
# generate predictions
preds = model(inputs)
print(preds)

tensor([[ 189.4614,  -67.7708],
        [ 246.0520,  -66.4800],
        [ 226.0715,  -42.9540],
        [ 248.2786, -154.1400],
        [ 204.0879,   -4.0080]], grad_fn=<AddBackward0>)


In [7]:
# compare with targets
print(targets)
# predictions are far off because the weights and biases are still random

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 113.],
        [ 22.,  37.],
        [103., 119.]])


**Loss function**
- Evaluate model by comparing prediction w/ actual targets:
  - Calculate difference between preds and targets
  - Square all elements to make all positive
  - Calculate average of resulting elements
- Result is a single number called the **mean squared error** (MSE)
- This number is the model's **loss**: higher loss = higher error = worse model

In [8]:
# MSE loss function
def mse(t1, t2):
  diff = t1 - t2
  return torch.sum(diff * diff) / diff.numel() # .numel = num elements
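
A quick sanity check of the three steps on a tiny made-up example, compared against PyTorch's built-in `torch.nn.functional.mse_loss` (whose default reduction is also the mean). The `mse` function is repeated so the block runs on its own:

```python
import torch
import torch.nn.functional as F

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

# Tiny made-up example: the differences are 1, -2, 0, 3
preds = torch.tensor([[2., 0.], [5., 7.]])
targets = torch.tensor([[1., 2.], [5., 4.]])

# Squares are 1, 4, 0, 9, so the mean is 14 / 4 = 3.5
ours = mse(preds, targets)
builtin = F.mse_loss(preds, targets)
print(ours, builtin)
```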

In [9]:
# compute loss
loss = mse(preds, targets)
print(loss)

# sqrt(loss) is the root mean squared error: a rough measure of how far each
# element in the prediction is from its actual target

# preds is a function of our weights and biases (inputs are fixed)
# targets are also fixed, so...
# loss is a function of our weights and biases

tensor(24095.6680, grad_fn=<DivBackward0>)


**Compute gradients**

In [10]:
# compute gradients w.r.t. weights and biases
loss.backward()

In [11]:
# gradients stored in .grad property of respective tensors
# note that the gradient of the (scalar) loss w.r.t. a matrix is a matrix w/ same dimensions
print(w)
print(w.grad)
# each entry in gradient matrix is partial derivative of loss w.r.t.
# corresponding entry in weights matrix

tensor([[ 2.1329, -0.1202,  0.9868],
        [-2.2505,  0.5090,  1.4815]], requires_grad=True)
tensor([[ 12826.6240,  11449.6992,   7592.1562],
        [-13369.9570, -12978.9023,  -8274.1885]])
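
One way to convince yourself that an entry of `w.grad` really is a partial derivative is to nudge the matching weight and watch the loss change. A sketch under some assumptions: it uses the weights and biases printed above (rounded to 4 decimals), switches to float64 to keep rounding error small, and `loss_fn` is a helper name introduced only for this example:

```python
import torch

# float64 so the finite-difference estimate is not drowned in rounding error
inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.],
                       [102., 43., 37.],
                       [69., 96., 70.]], dtype=torch.float64)
targets = torch.tensor([[56., 70.],
                        [81., 101.],
                        [119., 113.],
                        [22., 37.],
                        [103., 119.]], dtype=torch.float64)

# Weights and biases as printed above (rounded to 4 decimals)
w = torch.tensor([[2.1329, -0.1202, 0.9868],
                  [-2.2505, 0.5090, 1.4815]], dtype=torch.float64, requires_grad=True)
b = torch.tensor([-0.6191, -1.2895], dtype=torch.float64, requires_grad=True)

def loss_fn(w, b):
    # Hypothetical helper: model and MSE loss in one function
    diff = inputs @ w.t() + b - targets
    return torch.sum(diff * diff) / diff.numel()

# Partial derivative of the loss w.r.t. w[0, 0], from autograd
loss_fn(w, b).backward()
autograd_value = w.grad[0, 0].item()

# Same partial derivative estimated by nudging w[0, 0] up and down
eps = 1e-4
with torch.no_grad():
    nudge = torch.zeros_like(w)
    nudge[0, 0] = eps
    numeric_value = ((loss_fn(w + nudge, b) - loss_fn(w - nudge, b)) / (2 * eps)).item()

print(autograd_value, numeric_value)
```

The two values should agree closely, and both should be near the top-left entry of the gradient matrix printed above.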


- If gradient > 0:
  - Increase weight --> increase loss
  - Decrease weight --> decrease loss (goal)
- If gradient < 0:
  - Increase weight --> decrease loss (goal)
  - Decrease weight --> increase loss
- To decrease loss, need to change each weight in the direction opposite to its gradient
- For a small change in a weight, the resulting change in loss is approximately proportional to the gradient value
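
A one-weight sketch of the sign rule, with made-up numbers: the best weight is 3, so starting above it gives a positive gradient and starting below it gives a negative one. `loss_at` is a helper name introduced only for this example:

```python
import torch

# One-weight toy problem (made-up numbers): predict y = 6 from x = 2
x, y = torch.tensor(2.0), torch.tensor(6.0)

def loss_at(w_value):
    # Returns the squared error and its gradient at a given weight
    w = torch.tensor(w_value, requires_grad=True)
    loss = (w * x - y) ** 2
    loss.backward()
    return loss.item(), w.grad.item()

# w too large (5 > 3): gradient is positive, so decreasing w lowers the loss
loss_hi, grad_hi = loss_at(5.0)
# w too small (1 < 3): gradient is negative, so increasing w lowers the loss
loss_lo, grad_lo = loss_at(1.0)
print(grad_hi, grad_lo)

# Step each weight a little in the direction opposite to its gradient
lr = 0.01
new_loss_hi, _ = loss_at(5.0 - lr * grad_hi)
new_loss_lo, _ = loss_at(1.0 - lr * grad_lo)
print(loss_hi, "->", new_loss_hi)
print(loss_lo, "->", new_loss_lo)
```

In both cases the same update, `w - lr * grad`, moves the weight toward 3 and the loss goes down.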

In [12]:
# reset gradients to zero
# needed because pytorch accumulates gradients: each .backward() call adds
# to .grad instead of overwriting it
w.grad.zero_()
b.grad.zero_()

print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
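
A small sketch of what goes wrong without the reset, using a made-up two-element weight whose gradient is 3 everywhere:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# loss = sum(3 * w), so d(loss)/dw is 3 for every element
(3 * w).sum().backward()
first = w.grad.clone()

# Second backward pass without resetting: the new gradient is added on top
(3 * w).sum().backward()
accumulated = w.grad.clone()

# After zeroing, the next backward pass gives the plain gradient again
w.grad.zero_()
(3 * w).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```

The middle value is double the true gradient, which would make every later weight update too large.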


**Adjust weights and biases using gradient descent**
1. Generate predictions
2. Calculate loss
3. Compute gradients w.r.t. weights and biases
4. Adjust weights and biases by subtracting a small quantity proportional to the gradient
5. Reset gradients to 0

In [13]:
# generate predictions
preds = model(inputs)
print(preds)

tensor([[ 189.4614,  -67.7708],
        [ 246.0520,  -66.4800],
        [ 226.0715,  -42.9540],
        [ 248.2786, -154.1400],
        [ 204.0879,   -4.0080]], grad_fn=<AddBackward0>)


In [14]:
# calculate loss
loss = mse(preds, targets)
print(loss)

tensor(24095.6680, grad_fn=<DivBackward0>)


In [15]:
# compute gradients
loss.backward()
print(w.grad)
print(b.grad)

tensor([[ 12826.6240,  11449.6992,   7592.1562],
        [-13369.9570, -12978.9023,  -8274.1885]])
tensor([ 146.5903, -155.0706])


In [16]:
# adjust weights and biases, then reset gradients

# w and b have requires_grad=True, so pytorch would normally track operations on them
# use torch.no_grad() to tell pytorch not to track the update step itself
with torch.no_grad():
  # subtract a small quantity proportional to the gradient (gradient * 1e-5 here)
  # the multiplier is kept small so we don't overcorrect and change weights too much
  # this multiplier is called the "learning rate" of the algorithm
  w -= w.grad * 1e-5
  b -= b.grad * 1e-5
  # reset gradients to zero to avoid affecting future computations
  w.grad.zero_()
  b.grad.zero_()
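
A sketch of why the `torch.no_grad()` block matters, using a made-up one-element weight. Outside the block, PyTorch refuses an in-place update of a leaf tensor that requires gradients:

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
(w * 2).sum().backward()  # w.grad is now tensor([2.])

# In-place update outside no_grad: PyTorch raises a RuntimeError
try:
    w -= w.grad * 0.1
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Same update inside no_grad: allowed, and not recorded for autograd
with torch.no_grad():
    w -= w.grad * 0.1
    w.grad.zero_()

print(w)  # still a leaf tensor with requires_grad=True
```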

In [17]:
# look at new weights and biases
print(w)
print(b)

tensor([[ 2.0046, -0.2347,  0.9109],
        [-2.1168,  0.6388,  1.5642]], requires_grad=True)
tensor([-0.6206, -1.2879], requires_grad=True)


In [18]:
# with the new weights and biases, model should now have lower loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(17091.5508, grad_fn=<DivBackward0>)
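
To illustrate why the learning rate is kept small, the sketch below runs 20 update steps from the same starting point with two rates. It assumes the weights and biases printed earlier (rounded to 4 decimals); `run` is a helper name introduced for this example, and `1e-3` is an arbitrary "too large" value chosen for contrast:

```python
import math

import torch

inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.],
                       [102., 43., 37.],
                       [69., 96., 70.]])
targets = torch.tensor([[56., 70.],
                        [81., 101.],
                        [119., 113.],
                        [22., 37.],
                        [103., 119.]])

def run(lr, steps=20):
    # Start from the weights and biases printed earlier (rounded)
    w = torch.tensor([[2.1329, -0.1202, 0.9868],
                      [-2.2505, 0.5090, 1.4815]], requires_grad=True)
    b = torch.tensor([-0.6191, -1.2895], requires_grad=True)
    for _ in range(steps):
        diff = inputs @ w.t() + b - targets
        loss = torch.sum(diff * diff) / diff.numel()
        loss.backward()
        with torch.no_grad():
            w -= w.grad * lr
            b -= b.grad * lr
            w.grad.zero_()
            b.grad.zero_()
    # Loss after the last update
    diff = inputs @ w.t() + b - targets
    return (torch.sum(diff * diff) / diff.numel()).item()

start = run(lr=1e-5, steps=0)  # loss before any update
small = run(lr=1e-5)           # small steps: loss goes down
large = run(lr=1e-3)           # steps too large: loss blows up (may print inf or nan)
print(start, small, large)
```

With inputs on the order of 100, the gradients are in the thousands, so even a rate of `1e-3` overshoots on every step.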


**Train for multiple epochs**
- To further reduce loss, keep repeating this process of adjusting weights and biases using the gradients
- Each iteration here is one pass over the full training set, which is called an **epoch**

In [47]:
# train for 1000 epochs
for i in range(1000):
  # generate predictions
  preds = model(inputs)
  # calculate loss
  loss = mse(preds, targets)
  # calculate gradients
  loss.backward()
  with torch.no_grad():
    # adjust weights and biases
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    # reset gradients to zero
    w.grad.zero_()
    b.grad.zero_()
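
A variant of the same loop that also records the loss every 100 epochs, to confirm it is going down. This is a sketch that starts from the weights and biases printed earlier (rounded to 4 decimals); the exact numbers depend on the starting weights and on how many epochs have been run, so they need not match the notebook output:

```python
import torch

inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.],
                       [102., 43., 37.],
                       [69., 96., 70.]])
targets = torch.tensor([[56., 70.],
                        [81., 101.],
                        [119., 113.],
                        [22., 37.],
                        [103., 119.]])

# Starting weights and biases as printed earlier (rounded)
w = torch.tensor([[2.1329, -0.1202, 0.9868],
                  [-2.2505, 0.5090, 1.4815]], requires_grad=True)
b = torch.tensor([-0.6191, -1.2895], requires_grad=True)

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

history = []  # (epoch, loss) pairs, recorded every 100 epochs
for epoch in range(1000):
    preds = inputs @ w.t() + b
    loss = mse(preds, targets)
    if epoch % 100 == 0:
        history.append((epoch, loss.item()))
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()

final_loss = mse(inputs @ w.t() + b, targets).item()
for epoch, value in history:
    print(epoch, value)
print("final", final_loss)
```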

In [48]:
# calculate new loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss) # much better!

tensor(1.7933, grad_fn=<DivBackward0>)


In [49]:
# new predictions
print(preds)

tensor([[ 57.0545,  66.8739],
        [ 82.3012, 100.8705],
        [118.6544, 113.2716],
        [ 21.0794,  38.1854],
        [101.9591, 120.3450]], grad_fn=<AddBackward0>)


In [24]:
# compare with targets
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 113.],
        [ 22.,  37.],
        [103., 119.]])
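
To put a number on "much better", the sketch below compares the final predictions and targets printed above element by element. The values are copied from the outputs, so they are rounded to 4 decimals:

```python
import torch

# Final predictions and targets as printed above
preds = torch.tensor([[57.0545, 66.8739],
                      [82.3012, 100.8705],
                      [118.6544, 113.2716],
                      [21.0794, 38.1854],
                      [101.9591, 120.3450]])
targets = torch.tensor([[56., 70.],
                        [81., 101.],
                        [119., 113.],
                        [22., 37.],
                        [103., 119.]])

# Absolute error of every prediction
abs_err = (preds - targets).abs()
print(abs_err)
print(abs_err.max())  # largest miss on any single value

# Recompute the MSE; it should agree with the loss printed earlier (about 1.7933)
mse_value = torch.mean((preds - targets) ** 2)
print(mse_value)
print(torch.sqrt(mse_value))  # root mean squared error, in the same units as the yields
```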
