# Guide to Machine Learning Models
Steve Henty, 2022

Pattern Template
- name (linear regression, logistic regression, classification, k-means, PCA, ...)
- architecture (neural network, regression, SVM, ...)
- type (supervised, unsupervised, ...)
- use (real prediction, category classification, clustering, time-series prediction, ...)
  - strength / weakness (aside from typical under- and over-fitting)
- characteristics
  - objective
  - learning algorithm
- mechanics
  - forward func / prediction
  - loss function
  - cost function
  - backward func / learning
  - parameters
  - hyperparameters
- evaluation
  - bias / variance (aka under- / over-fitting)
  - decision boundary
  - learning curve
  - error rate
  - precision / recall
- reference / library support


## Linear Regression - univariate
**Architecture**
Polynomial in one variable (aka 'input feature')

**Type**
Supervised

**Use**
Real value prediction

<u>_Weakness_</u>
- inefficient for data with complex, non-linear correlations

**Characteristics**
<u>_Objective_</u>
Tune the coefficients of a polynomial that takes the input feature and returns a predicted output value

<u>_Learning algorithm_</u>
Gradient descent

**Mechanics**
<u>_Forward function / Prediction_</u>
> $\large
\begin{align}
h_{\theta}(x) &= \theta^{T}x \\
\hat{y}^{(i)} &= h_{\theta}(x^{(i)})
\end{align}
$

where,
> $
\begin{align}
\theta &\text{ is the set of coefficients (parameters) of the polynomial } h_{\theta}(x) \\
i &\text{ is an index into the samples } x \\
\hat{y} &\text{ is the prediction from } h_{\theta}(x)
\end{align}
$
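
A minimal sketch of the forward function in plain Python. It assumes that $x$ in $\theta^{T}x$ is the vector of polynomial terms $(1, x, x^2, \dots)$ built from the single input feature. The helper names `polynomial_features` and `predict` and the coefficient values are illustrative only.

```python
def polynomial_features(x, degree):
    """Expand a scalar input into the terms [1, x, x^2, ..., x^degree]."""
    return [x ** p for p in range(degree + 1)]


def predict(theta, features):
    """Compute h_theta(x) = theta^T x as a dot product."""
    return sum(t * f for t, f in zip(theta, features))


theta = [1.0, 2.0, 3.0]                        # illustrative coefficients
features = polynomial_features(2.0, degree=2)  # [1.0, 2.0, 4.0]
y_hat = predict(theta, features)               # 1*1 + 2*2 + 3*4 = 17.0
```

Including the constant term $1$ as the first feature lets $\theta_0$ act as the intercept, so the whole prediction stays a single dot product.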

<u>_Loss function_</u>
> $\large
\begin{align}
L_{(i)}(y, \hat{y}) &= \frac{1}{2}(\hat{y}^{(i)} - y^{(i)})^2
\end{align}
$

<u>_Cost function_</u>
> $\large
\begin{align}
J(y, \hat{y}) &= \frac{1}{2m}\sum\limits_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})^2
\end{align}
$
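
The cost formula can be checked by hand on a few values. The sketch below is plain Python with made-up predictions and targets; the function name `cost` is an assumption for illustration.

```python
def cost(y_hat, y):
    """Half the mean squared error: J = 1/(2m) * sum((y_hat - y)^2)."""
    m = len(y)
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / (2 * m)


y_hat = [1.0, 2.0, 4.0]  # illustrative predictions
y = [1.0, 3.0, 2.0]      # illustrative targets
J = cost(y_hat, y)       # squared errors 0, 1, 4 -> 5 / (2 * 3)
```

The factor $\frac{1}{2}$ does not change where the minimum lies; it only cancels the 2 that appears when the squared term is differentiated.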

<u>_Backward function / Learning / Parameter update_</u>
> $\large
\begin{align}
\theta_{j} &:= \theta_{j} - \alpha \frac{1}{m}\sum\limits_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})x_{j}^{(i)}
\end{align}
$

where,
> $
\begin{align}
m &\text{ is the number of samples} \\
j &\text{ is an index into the polynomial terms} \\
\theta &\text{ is the set of parameters, i.e. polynomial coefficients} \\
\alpha &\text{ is the learning rate}
\end{align}
$
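
The update rule can be sketched as batch gradient descent in plain Python. The data set (points on the line $y = 1 + 2x$), the learning rate, the iteration count, and the names `predict` and `gradient_step` are all assumptions chosen for illustration.

```python
def predict(theta, row):
    """h_theta(x) = theta^T x for one sample's feature row."""
    return sum(t * f for t, f in zip(theta, row))


def gradient_step(theta, X, y, alpha):
    """Apply one simultaneous update to every theta_j."""
    m = len(y)
    # Errors are computed once, from the old theta, before any update.
    errors = [predict(theta, row) - target for row, target in zip(X, y)]
    return [
        theta_j - alpha * sum(e * row[j] for e, row in zip(errors, X)) / m
        for j, theta_j in enumerate(theta)
    ]


# Each row holds the terms [1, x]; the targets follow y = 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

first_step = gradient_step([0.0, 0.0], X, y, alpha=0.1)

theta = [0.0, 0.0]
for _ in range(5000):
    theta = gradient_step(theta, X, y, alpha=0.1)
# theta should end up close to [1.0, 2.0]
```

Computing all errors before touching any coefficient keeps the update simultaneous: every $\theta_j$ is adjusted using predictions from the same, old $\theta$.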


<u>_Parameters_</u>
- polynomial coefficients, $\theta$

<u>_Hyper-parameters_</u>
- learning rate, alpha: $\alpha$
- polynomial degree / higher-order terms: $x_{j}^{p}$
- (for multivariate:  $x_{j}^{p}x_{k}^{q}$)
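
To illustrate why the learning rate is treated as a hyper-parameter, the sketch below runs the same gradient descent with two values of $\alpha$ and compares the final cost. The data, the step count, both $\alpha$ values, and the function name `cost_after_training` are assumptions for illustration, not recommended settings.

```python
def cost_after_training(alpha, steps=50):
    """Run batch gradient descent from theta = 0 and return the final cost J."""
    # Rows hold the terms [1, x]; targets follow y = 1 + 2x.
    X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
    y = [1.0, 3.0, 5.0, 7.0]
    m = len(y)
    theta = [0.0, 0.0]

    def errors_for(theta):
        return [sum(t * f for t, f in zip(theta, row)) - target
                for row, target in zip(X, y)]

    for _ in range(steps):
        errors = errors_for(theta)
        theta = [
            theta[j] - alpha * sum(e * row[j] for e, row in zip(errors, X)) / m
            for j in range(len(theta))
        ]
    return sum(e ** 2 for e in errors_for(theta)) / (2 * m)


initial_cost = cost_after_training(alpha=0.1, steps=0)  # cost at theta = 0
small_alpha = cost_after_training(alpha=0.1)            # cost shrinks
large_alpha = cost_after_training(alpha=0.5)            # cost grows: steps overshoot
```

On this data the smaller rate drives the cost down while the larger one overshoots the minimum on every step, so the cost ends up above its starting value.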
