# ML 101 Recap

**model + loss + optimizer**


## Linear regression example

1. Model:
  - $\hat y = X \beta$

2. Loss / criterion:
  - $err_i = y_i - f(X_i)$
  - $MSE = \frac{1}{n} \sum_{i=1}^{n} err_i^2$
  
3. Optimize:
  - minimizing the MSE yields the optimal $\hat\beta$ (after doing some math)
  - $\hat\beta = (X^TX)^{-1}X^Ty$
  - (or, more generally, use gradient descent to optimize the parameters)
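
The "some math" in step 3 can be sketched as follows, assuming $X^T X$ is invertible (the matrix form of the MSE and the symbol $\nabla_\beta$ are notation introduced here, not taken from the list above). Write the loss in matrix form, set its gradient with respect to $\beta$ to zero, and solve:

$$
MSE(\beta) = \frac{1}{n} (y - X\beta)^T (y - X\beta)
$$

$$
\nabla_\beta MSE(\beta) = -\frac{2}{n} X^T (y - X\beta) = 0
\quad\Longrightarrow\quad
X^T X \beta = X^T y
\quad\Longrightarrow\quad
\hat\beta = (X^T X)^{-1} X^T y
$$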

In [None]:
import numpy as np
from numpy.linalg import inv
from numpy.linalg import multi_dot as mdot

import matplotlib.pyplot as plt

%matplotlib inline

## LinReg with numpy

In [None]:
X = np.random.random((5, 3))
y = np.random.random(5)
X.shape, y.shape

Calculate the optimal parameter:
$\hat\beta = (X^T X)^{-1} X^T y$

In [None]:
XT = X.T  # transpose

beta_ = mdot([inv(XT @ X), XT, y])
beta_

In [None]:
XT = X.T  # transpose

beta_ = inv(XT @ X) @ XT @ y
beta_
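
As a side note, the explicit inverse is not the only way to evaluate the formula. The sketch below compares it with `np.linalg.solve` (solving the normal equations as a linear system) and `np.linalg.lstsq` (a least-squares solver), which are usually preferred numerically. The seeded generator and the names `beta_inv`, `beta_solve`, and `beta_lstsq` are assumptions made for this illustration only.

```python
import numpy as np

# Assumption: a seeded generator so the comparison is reproducible.
rng = np.random.default_rng(0)
X = rng.random((5, 3))
y = rng.random(5)

# Closed form with an explicit inverse, as in the cells above.
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Solve the normal equations (X^T X) beta = X^T y without forming the inverse.
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Let a least-squares solver work on X and y directly.
beta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

# All three should agree up to floating-point error.
print(np.allclose(beta_inv, beta_solve), np.allclose(beta_inv, beta_lstsq))
```

For a small, well-conditioned `X` like this one, the three results should be practically identical. The difference tends to matter only when $X^T X$ is close to singular.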

The model $f$:

In [None]:
def f(X, beta):
    return X @ beta

f(X, beta_)

## LinReg with PyTorch

In [None]:
import torch

In [None]:
# X = torch.rand((5, 3))
# y = torch.rand(5)
X = torch.from_numpy(X)
y = torch.from_numpy(y)
X.shape, y.shape

$\hat\beta = (X^T X)^{-1} X^T y$

In [None]:
XT = X.t()

beta__ = (XT @ X).inverse() @ XT @ y
beta__

In [None]:
beta__.numpy() - beta_

## LinReg with PyTorch and Gradient Descent

Previously, we had to do some math to derive the optimal $\hat\beta$ in closed form.
PyTorch computes the gradients for us automatically (more on that later),
so we can use some variant of gradient descent to find $\hat\beta$ instead.
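
To make the idea concrete before bringing in `nn.Module` and an optimizer, here is a minimal sketch of gradient descent written by hand on top of autograd. The synthetic data (true slope 3), the learning rate, and the step count are assumptions chosen for illustration, and the model has no bias term.

```python
import torch

torch.manual_seed(0)

# Hypothetical data: y = 3 * x + small noise.
X = torch.randn(100, 1)
y = 3.0 * X + 0.1 * torch.randn(100, 1)

# The parameter to learn; requires_grad=True tells autograd to track it.
beta = torch.zeros(1, 1, requires_grad=True)
lr = 0.1  # assumed learning rate

for _ in range(200):
    loss = ((X @ beta - y) ** 2).mean()  # MSE
    loss.backward()                      # fills beta.grad with d(loss)/d(beta)
    with torch.no_grad():
        beta -= lr * beta.grad           # plain gradient descent update
    beta.grad.zero_()                    # gradients accumulate, so reset them

print(beta.item())
```

The learned `beta` should end up close to the slope used to generate the data. `optim.SGD` performs essentially this update step and `optimizer.zero_grad()` does the reset.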

In [None]:
from sklearn.datasets import make_regression

n_features = 1
n_samples = 100

X, y = make_regression(
    n_samples=n_samples,
    n_features=n_features,
    noise=10,
)

fig, ax = plt.subplots()
ax.plot(X, y, ".")

In [None]:
X = torch.from_numpy(X).float()
y = torch.from_numpy(y.reshape(-1, 1)).float()  # column vector, matches the model output shape

In [None]:
from torch import nn

class LinReg(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.beta = nn.Linear(input_dim, 1)
        
    def forward(self, X):
        return self.beta(X)


model = LinReg(n_features)

In [None]:
criterion = nn.MSELoss()

In [None]:
from torch import optim

optimizer = optim.SGD(model.parameters(), lr=0.00001)

In [None]:
# Train step
model.train()
optimizer.zero_grad()

y_ = model(X)

loss = criterion(y_, y)
loss.backward()
optimizer.step()

# Eval
model.eval()
with torch.no_grad():
    y_ = model(X)
    

# Vis
fig, ax = plt.subplots()
ax.plot(X.numpy(), y_.numpy(), ".", label="pred")
ax.plot(X.numpy(), y.numpy(), ".", label="data")
ax.set_title(f"MSE: {loss.item():0.1f}")
ax.legend();
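
The cell above performs a single optimization step, so it has to be re-run many times before the fit improves visibly. The same train step can be wrapped in a loop, as sketched below. The stand-in data (generated with torch instead of `make_regression`), the learning rate of 0.1, and the 100 epochs are assumptions chosen so that the example converges quickly; `nn.Linear(1, 1)` plays the role of the `LinReg` model.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical stand-in data: y = 2 * x + 1 + small noise.
X = torch.randn(100, 1)
y = 2.0 * X + 1.0 + 0.1 * torch.randn(100, 1)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)  # assumed learning rate

losses = []
for epoch in range(100):
    model.train()
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = criterion(model(X), y)  # forward pass and loss
    loss.backward()                # backward pass
    optimizer.step()               # parameter update
    losses.append(loss.item())

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

Plotting `losses` against the epoch number is a quick way to check that training is actually converging.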

In [None]:
model.beta

In [None]:
model.beta.weight

In [None]:
model.beta.weight.data

In [None]:
model.beta.bias

## LinReg with GPU

Simply move the data and the model to the GPU.

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = LinReg(n_features).to(device)  # <-- here
optimizer = optim.SGD(model.parameters(), lr=0.0001)
criterion = nn.MSELoss()

X, y = X.to(device), y.to(device)  # <-- here

The rest stays the same.

In [None]:
# Train step
model.train()
optimizer.zero_grad()

y_ = model(X)
loss = criterion(y_, y)

loss.backward()
optimizer.step()

# Eval
model.eval()
with torch.no_grad():
    y_ = model(X)    

# Vis
fig, ax = plt.subplots()
ax.plot(X.cpu().numpy(), y_.cpu().numpy(), ".", label="pred")
ax.plot(X.cpu().numpy(), y.cpu().numpy(), ".", label="data")
ax.set_title(f"MSE: {loss.item():0.1f}")
ax.legend();