# Pattern Guide to Machine Learning Models
Steve Henty, 2022

### Pattern Template

**Name** (linear regression, logistic regression, classification, k-means, PCA, ...)
- architecture (neural network, regression, SVM, ...)
- type (supervised, unsupervised, ...)
- use (real prediction, category classification, clustering, time-series prediction, ...)
  - strength / weakness (aside from typical under- and over-fitting)
- characteristics
  - objective
  - learning algorithm
- mechanics
  - forward func / prediction
  - loss function
  - cost function
  - backward func / learning
  - parameters
  - hyperparameters
- evaluation
  - bias / variance (aka under- / over-fitting)
  - decision boundary
  - learning curve
  - error rate
  - precision / recall
- reference / library support


## Regression Patterns

### Linear Regression - univariate
**Abbreviation**
linreguni

**Architecture**
Polynomial in one variable (aka 'input feature')

**Type**
Supervised

**Use**
Real value prediction

<u>_Weakness_</u>
- underfits non-linear data (high bias)

**Characteristics**
<u>_Objective_</u>
Tune the coefficients of a polynomial that takes the input feature and returns a predicted output value

<u>_Learning algorithm_</u>
Gradient descent

**Mechanics**

In [5]:
# Packages needed for the examples
import numpy as np

rng = np.random.default_rng(seed=42)

<u>_Forward function / Prediction_</u>
> $\large
\begin{align}
h_{\theta}(x) &= \theta_0 x_0 + \theta_1 x_1 \\
  &= \theta \cdot x \\
\end{align}
$

> where,
> $
\begin{align}
\theta &\text{ is a row vector of coefficients (parameters) of the polynomial } h_{\theta}(x) \\
x &\text{ is a col vector with } x_0 \text{ set to } 1
\end{align}
$

> **Example linreguni-1:**
>
> Given,
> $
\begin{align}
\theta &= \begin{bmatrix}
2 & 5
\end{bmatrix}
,\;
x = \begin{bmatrix}
1 \\
2
\end{bmatrix} \\
\end{align}
$

> Then,
> $
\begin{align}
\theta \cdot x &= \begin{bmatrix} (\theta_{[0,0]} \times x_{[0,0]}) + (\theta_{[0,1]} \times x_{[1,0]}) \end{bmatrix} \\
  &= \begin{bmatrix} (2 \times 1) + (5 \times 2) \end{bmatrix} \\
  &= \begin{bmatrix} 2 + 10 \end{bmatrix} \\
  &= \begin{bmatrix} 12 \end{bmatrix}
\end{align}
$


In [64]:
def h_theta(theta, features):
  """
  :param theta: Row vector of polynomial coefficients (i.e. parameters)
  :param features: Col vector of features, with x_0 set to 1 (i.e. bias term)
  :return: Estimate of y (i.e. y_hat)
  """

  y_hat = np.dot(theta, features)
  return y_hat

In [82]:
theta = np.array([[2., 5.]])
x = np.array([2])
x_bias = np.ones(x.shape[0])
X = np.array([x_bias,x])
print('theta (shape): ', theta.shape); print(theta, '\n')
print('x (shape): ', x.shape); print(x, '\n')
print('X (shape) with bias terms: ', X.shape); print(X, '\n')

y_hat = h_theta(theta, X)
print('y_hat (shape): ', y_hat.shape); print(y_hat)

theta (shape):  (1, 2)
[[2. 5.]] 

x (shape):  (1,)
[2] 

X (shape) with bias terms:  (2, 1)
[[1.]
 [2.]] 

y_hat (shape):  (1, 1)
[[12.]]


_--- Vectorized (across samples) ---_
> $\large
\begin{align}
h_{\theta}(X) &= \theta \cdot X \\
\end{align}
$

> where,
> $
\begin{align}
\theta &\text{ is a row vector of coefficients (parameters) of the polynomial } h_{\theta} \\
X &\text{ is a } n \times m \text{ matrix of samples} \\
n &\text{ is the number of features, including the bias term; for univariate, } n = 2, (x_0,\;x_1) \\
m &\text{ is the number of samples}
\end{align}
$

> **Example linreguni-2:**
>
> Given,
> $
\begin{align}
\theta = \begin{bmatrix}
2 & 5
\end{bmatrix}
,\;
X = \begin{bmatrix}
1 & 1 & 1 \\
2 & 3 & 5 \\
\end{bmatrix}
\end{align}
$
> so $X$ contains three samples (the cols of $X$)


> Then,
> $
\begin{align}
\theta \cdot X &= \begin{bmatrix}
  (\theta_{[0,0]} \times X_{[0,0]}) + (\theta_{[0,1]} \times X_{[1,0]}) &
  (\theta_{[0,0]} \times X_{[0,1]}) + (\theta_{[0,1]} \times X_{[1,1]}) &
  (\theta_{[0,0]} \times X_{[0,2]}) + (\theta_{[0,1]} \times X_{[1,2]})
\end{bmatrix} \\
  &= \begin{bmatrix}
    (2 \times 1) + (5 \times 2) &
    (2 \times 1) + (5 \times 3) &
    (2 \times 1) + (5 \times 5)
  \end{bmatrix} \\
  &= \begin{bmatrix}
    (2 + 10) & (2 + 15) & (2 + 25)
  \end{bmatrix} \\
  &= \begin{bmatrix}
    12 & 17 & 27
  \end{bmatrix}
\end{align}
$
> so three predictions are returned (the cols of the result)


In [85]:
theta = np.array([[2., 5.]])
x = np.array([2, 3, 5])
x_bias = np.ones(x.shape[0])
X = np.array([x_bias,x])
print('theta (shape): ', theta.shape); print(theta, '\n')
print('x (shape): ', x.shape); print(x, '\n')
print('X (shape) with bias terms: ', X.shape); print(X, '\n')

y_hat = h_theta(theta, X)
print('y_hat (shape): ', y_hat.shape); print(y_hat)

theta (shape):  (1, 2)
[[2. 5.]] 

x (shape):  (3,)
[2 3 5] 

X (shape) with bias terms:  (2, 3)
[[1. 1. 1.]
 [2. 3. 5.]] 

y_hat (shape):  (1, 3)
[[12. 17. 27.]]


<u>_Loss function_</u>
> $\large
\begin{align}
L_{\theta}(x^{(i)}) &= h_{\theta}(x^{(i)}) - y^{(i)}
\end{align}
$


> where,
> $
\begin{align}
y &\text{ is the training value that } h_{\theta}(x) \text{ should match} \\
i &\text{ is the index into } m \text{ samples of } x
\end{align}
$
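
As a minimal sketch of the loss above, the per-sample error can be computed for all samples at once by reusing the vectorized prediction. The function name `loss` and the target values in `y` are hypothetical, chosen only for illustration; `theta` and `X` repeat Example linreguni-2.

```python
import numpy as np

def loss(theta, X, y):
    """Per-sample error h_theta(x^(i)) - y^(i), one column per sample."""
    return np.dot(theta, X) - y

theta = np.array([[2., 5.]])
X = np.array([[1., 1., 1.],
              [2., 3., 5.]])
y = np.array([[11., 18., 27.]])  # hypothetical training values

errors = loss(theta, X, y)
print('errors (shape): ', errors.shape)
print(errors)  # errors are 1, -1 and 0 for the three samples
```

A positive error means the prediction is above the training value, and a negative error means it is below.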


<u>_Cost function_</u>
> $\large
\begin{align}
J(\theta) &= \frac{1}{2m}\sum\limits_{i=1}^{m} L_{\theta}(x^{(i)})^2 \\
  &= \frac{1}{2m}\sum\limits_{i=1}^{m} \left[h_{\theta}(x^{(i)}) - y^{(i)}\right]^2
\end{align}
$

> where,
> $
\begin{align}
m &\text{ is the number of samples}
\end{align}
$
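
The cost can be sketched in the same vectorized style: square the per-sample errors, sum them, and divide by $2m$. The function name `cost` and the values in `y` are assumptions made for this example, not part of the original notebook.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta): sum of squared per-sample errors divided by 2m."""
    m = X.shape[1]                    # number of samples (columns of X)
    errors = np.dot(theta, X) - y     # h_theta(x^(i)) - y^(i) for each sample
    return float(np.sum(errors ** 2) / (2 * m))

theta = np.array([[2., 5.]])
X = np.array([[1., 1., 1.],
              [2., 3., 5.]])
y = np.array([[11., 18., 27.]])  # hypothetical training values

J = cost(theta, X, y)
print('J: ', J)  # (1 + 1 + 0) / (2 * 3) = 1/3
```

Because the errors are squared, over- and under-predictions of the same size contribute equally to the cost.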

<u>_Backward function / Learning / Parameter update_</u>
> $\large
\begin{align}
\theta_{j} &:= \theta_{j} - \alpha \frac{1}{m}\sum\limits_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})x_{j}^{(i)}
\end{align}
$

> where,
> $
\begin{align}
\theta &\text{ is the set of parameters, i.e. polynomial coefficients} \\
j &\text{ is an index into the polynomial terms} \\
\hat{y}^{(i)} &\text{ is the prediction } h_{\theta}(x^{(i)}) \text{ for sample } i \\
m &\text{ is the number of samples} \\
\alpha &\text{ is the learning rate}
\end{align}
$
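
The update rule can be sketched as a single vectorized step that adjusts every $\theta_j$ at once, followed by a short loop that repeats it. The function name `gradient_step`, the learning rates, and the iteration count are illustrative assumptions; the targets `y` are generated from $\theta = [2, 5]$ so the expected fit is known.

```python
import numpy as np

def gradient_step(theta, X, y, alpha):
    """One batch gradient-descent update of all theta_j."""
    m = X.shape[1]                    # number of samples (columns of X)
    errors = np.dot(theta, X) - y     # (1, m): y_hat - y per sample
    grad = np.dot(errors, X.T) / m    # (1, n): mean of error * x_j over samples
    return theta - alpha * grad

X = np.array([[1., 1., 1.],
              [2., 3., 5.]])
y = np.array([[12., 17., 27.]])      # targets generated from theta = [2, 5]

# A single step starting from theta = [0, 0]
theta_one_step = gradient_step(np.zeros((1, 2)), X, y, alpha=0.01)
print('theta after one step: ', theta_one_step)

# Repeating the step; alpha and the iteration count are illustrative choices
theta_fit = np.zeros((1, 2))
for _ in range(5000):
    theta_fit = gradient_step(theta_fit, X, y, alpha=0.05)
print('theta after 5000 steps: ', theta_fit)  # approaches [2, 5]
```

For this data a learning rate much larger than about 0.14 makes the updates diverge, which is one reason $\alpha$ is treated as a hyperparameter to tune.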

<u>_Parameters_</u>
- polynomial coefficients, $\theta$

<u>_Hyperparameters_</u>
- learning rate, alpha: $\alpha$
- polynomial degree / higher-order terms: $x_{j}^{p}$
- (for multivariate:  $x_{j}^{p}x_{k}^{q}$)
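
To illustrate the polynomial-degree hyperparameter, the sketch below builds the feature matrix with one row per power of the input, so the same $\theta \cdot X$ prediction fits a higher-degree polynomial. The helper name `polynomial_features` and the coefficient values are hypothetical.

```python
import numpy as np

def polynomial_features(x, degree):
    """Stack x**0, x**1, ..., x**degree as rows: a (degree + 1) x m matrix."""
    return np.vstack([x ** p for p in range(degree + 1)]).astype(float)

x = np.array([2, 3, 5])
X2 = polynomial_features(x, degree=2)
print('X2 (shape): ', X2.shape)
print(X2)  # rows are [1, 1, 1], [2, 3, 5] and [4, 9, 25]

theta = np.array([[2., 5., 1.]])     # hypothetical coefficients for 1, x, x^2
y_hat = np.dot(theta, X2)
print('y_hat: ', y_hat)              # 2 + 5x + x^2 gives 16, 26 and 52
```

Raising the degree adds rows to $X$ and entries to $\theta$, which lets the model follow non-linear data at the cost of a greater risk of over-fitting.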
