# Linear regression

<a href="https://nbviewer.jupyter.org/github/hongjiaherng/ML-Collections/blob/main/just4funml/notes/note_linear_regression.ipynb" 
   target="_parent">
   <img align="left" 
      src="https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg" 
      width="109" height="20">
</a>

## 1. Model hypothesis
- In this part, the hypothesis is defined for a single training example.

First, define the feature vector $\mathbf{x}$, the parameter vector $\theta$, and the target $y$:
<br><br>
$
\mathbf{x} = 
\begin{bmatrix}
x_0 \\
x_1 \\
\vdots \\
x_n \\
\end{bmatrix}
,
\theta =
\begin{bmatrix}
\theta_0 \\
\theta_1 \\
\vdots \\
\theta_n \\
\end{bmatrix}
,
y \in \mathbb{R}
$

### Hypothesis
$
h_\theta(\mathbf{x}) = \theta^\top\mathbf{x}, \quad \text{where } h_\theta(\mathbf{x}) \in \mathbb{R}
$
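
As a minimal sketch, the hypothesis for one example is a single dot product. The function name `hypothesis` and the sample values are illustrative, and the code assumes the common convention $x_0 = 1$ so that $\theta_0$ acts as the intercept (the definitions above do not state this explicitly).

```python
import numpy as np

def hypothesis(theta, x):
    """Return h_theta(x) = theta^T x for a single example.

    Assumption: x[0] is the constant bias feature 1.
    """
    return float(theta @ x)

theta = np.array([1.0, 2.0, 3.0])   # illustrative parameters
x = np.array([1.0, 0.5, -1.0])      # x_0 = 1, followed by two feature values
prediction = hypothesis(theta, x)   # 1*1 + 2*0.5 + 3*(-1)
```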

## 2. Cost function

$
J(\theta) = \dfrac{1}{2m} \sum\limits_{i=1}^{m} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right)^{2}
$

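- This cost is the mean squared error (MSE) scaled by $\tfrac{1}{2}$; the extra factor cancels the 2 produced when the square is differentiated.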
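
The cost can be sketched in vectorized form as follows. Stacking the $m$ examples as rows of a matrix `X` is an assumption made for the code, as are the function name `cost` and the toy data.

```python
import numpy as np

def cost(theta, X, y):
    """Return J(theta) = (1 / 2m) * sum((X theta - y)^2).

    Assumption: X has shape (m, n + 1), one example per row, first column all ones.
    """
    m = X.shape[0]
    residuals = X @ theta - y
    return float(residuals @ residuals) / (2 * m)

# Toy data generated by y = x_1 (illustrative only)
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

cost_perfect = cost(np.array([0.0, 1.0]), X, y)  # exact fit, so every residual is zero
cost_zero = cost(np.zeros(2), X, y)              # (1 + 4 + 9) / (2 * 3)
```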

### Minimizing $J(\theta)$
- To minimize $J(\theta)$, we need its gradient with respect to $\theta$:

$
\nabla_{\theta} J(\theta) = 
\begin{bmatrix}
\dfrac{\partial}{\partial\theta_0} J(\theta) \\
\dfrac{\partial}{\partial\theta_1} J(\theta) \\
\vdots \\
\dfrac{\partial}{\partial\theta_n} J(\theta) \\
\end{bmatrix}
$

### Compute derivative of $J(\theta)$

$
J(\theta) = \dfrac{1}{2m} \sum\limits_{i=1}^{m} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right)^{2}
$

$
\begin{align*}
\dfrac{\partial}{\partial\theta_j} J(\theta) 
&=
\dfrac{\partial}{\partial\theta_j} \dfrac{1}{2m} \sum\limits_{i=1}^{m} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right)^{2} 
\\ &=
\dfrac{1}{2m} \sum\limits_{i=1}^{m} \dfrac{\partial}{\partial\theta_j} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right)^{2}
\\ &=
\dfrac{1}{2m} \sum\limits_{i=1}^{m} (2) \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right) \dfrac{\partial}{\partial\theta_j} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right) 
\\ &=
\dfrac{1}{m} \sum\limits_{i=1}^{m} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right) \left( \dfrac{\partial}{\partial\theta_j} \theta^\top\mathbf{x}^{(i)} - 0 \right)
\\ &=
\dfrac{1}{m} \sum\limits_{i=1}^{m} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right) \mathbf{x}^{(i)}_j
\end{align*}
$
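
One way to sanity-check the derivation is to compare the analytic gradient with a central finite-difference estimate of $J(\theta)$. The sketch below assumes the examples are stacked as rows of `X`, so all partial derivatives can be written at once as $\frac{1}{m} X^\top (X\theta - \mathbf{y})$; the helper names and data are illustrative.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = (1 / 2m) * sum of squared residuals."""
    residuals = X @ theta - y
    return float(residuals @ residuals) / (2 * X.shape[0])

def gradient(theta, X, y):
    """Analytic gradient: entry j is (1/m) * sum_i (h(x_i) - y_i) * x_ij."""
    return X.T @ (X @ theta - y) / X.shape[0]

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central-difference estimate of each partial derivative of the cost."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
    return grad

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.5, -0.25])

analytic = gradient(theta, X, y)
numeric = numerical_gradient(theta, X, y)
```

If the derivation is right, `analytic` and `numeric` agree up to floating-point error.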

## 3. Linear regression variants

a) **Linear regression** (unregularized) <br>
b) **Ridge regression** ($\ell_2$ regularization) <br>
c) **Lasso regression** ($\ell_1$ regularization) <br>
d) **ElasticNet** (combines $\ell_2$ and $\ell_1$ regularization) <br>
e) **Polynomial regression** (adds polynomial features so that a linear model can fit nonlinear data) <br>
f) **Normal equation** (closed-form solution that gives $\theta$ directly) <br>

### a) Linear regression

$
J(\theta) = \dfrac{1}{2m} \sum\limits_{i=1}^{m} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right)^{2}
$

$\dfrac{\partial}{\partial\theta_j} J(\theta)  = \dfrac{1}{m} \sum\limits_{i=1}^{m} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right) \mathbf{x}^{(i)}_j$
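
These notes do not name an optimizer, so the following is only a sketch of one common choice, batch gradient descent, which repeatedly steps against the gradient above. The learning rate, iteration count, and toy data are assumptions picked for illustration.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, n_iterations=2000):
    """Minimize the unregularized cost with batch gradient descent.

    Assumptions: X has a leading column of ones; learning_rate and
    n_iterations are hypothetical defaults, not tuned values.
    """
    m, n_params = X.shape
    theta = np.zeros(n_params)
    for _ in range(n_iterations):
        grad = X.T @ (X @ theta - y) / m   # gradient of J at the current theta
        theta -= learning_rate * grad      # move against the gradient
    return theta

# Toy data generated by y = 1 + 2 * x_1 (illustrative only)
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta_fit = gradient_descent(X, y)   # should approach [1, 2]
```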

### b) Ridge regression

$
J(\theta) = \dfrac{1}{2m} \sum\limits_{i=1}^{m} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right)^{2} + \dfrac{\lambda}{2} \sum\limits_{j=1}^{n} \theta_j^{2}
$

$\dfrac{\partial}{\partial\theta_j} J(\theta)  = \dfrac{1}{m} \sum\limits_{i=1}^{m} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right) \mathbf{x}^{(i)}_j + \lambda \theta_j, \quad \text{where } j \in \lbrace 1, 2, \dots, n \rbrace$
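
A minimal sketch of the Ridge gradient follows. Because the penalty sum starts at $j = 1$, the code leaves $\theta_0$ unpenalized. The function name `ridge_gradient`, the row-stacked matrix `X`, and the sample values are assumptions.

```python
import numpy as np

def ridge_gradient(theta, X, y, lam):
    """Gradient of the Ridge cost: data term plus lam * theta_j for j >= 1."""
    m = X.shape[0]
    data_term = X.T @ (X @ theta - y) / m
    penalty = lam * theta   # new array; theta itself is not modified
    penalty[0] = 0.0        # theta_0 is not regularized
    return data_term + penalty

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.0, 1.0])   # fits the toy data exactly, so the data term is zero

ridge_grad = ridge_gradient(theta, X, y, lam=0.5)   # only the penalty on theta_1 remains
```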

### c) Lasso regression

$
J(\theta) = \dfrac{1}{2m} \sum\limits_{i=1}^{m} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right)^{2} + \lambda \sum\limits_{j=1}^{n} |\theta_j|
$

$\dfrac{\partial}{\partial\theta_j} J(\theta)  = \dfrac{1}{m} \sum\limits_{i=1}^{m} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right) \mathbf{x}^{(i)}_j + \lambda \operatorname{sign}(\theta_j), \quad \text{where } j \in \lbrace 1, 2, \dots, n \rbrace$
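
Since $|\theta_j|$ is not differentiable at $\theta_j = 0$, the expression above is best read as a subgradient. The sketch below uses `np.sign`, which returns 0 at zero; that is one valid subgradient choice and is an assumption here, as are the helper names and the data.

```python
import numpy as np

def mse_gradient(theta, X, y):
    """Gradient of the unregularized cost: (1/m) * X^T (X theta - y)."""
    return X.T @ (X @ theta - y) / X.shape[0]

def lasso_subgradient(theta, X, y, lam):
    """Subgradient of the Lasso cost; theta_0 is left unpenalized."""
    penalty = lam * np.sign(theta)   # np.sign(0.0) is 0.0
    penalty[0] = 0.0                 # penalty sum starts at j = 1
    return mse_gradient(theta, X, y) + penalty

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 0.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([2.0, -1.0, 0.0])
lam = 0.5

# Isolate the contribution of the L1 penalty
penalty_part = lasso_subgradient(theta, X, y, lam) - mse_gradient(theta, X, y)
```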

### d) ElasticNet
- $r$ is the mix ratio: $r = 1$ gives the Lasso penalty and $r = 0$ gives the Ridge penalty.

$
J(\theta) = \dfrac{1}{2m} \sum\limits_{i=1}^{m} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right)^{2} + r \lambda \sum\limits_{j=1}^{n} |\theta_j| + \dfrac{1 - r}{2} \lambda \sum\limits_{j=1}^{n} \theta_j^{2}
$

$\dfrac{\partial}{\partial\theta_j} J(\theta)  = \dfrac{1}{m} \sum\limits_{i=1}^{m} \left( h_\theta \left( \mathbf{x}^{(i)} \right) - y^{(i)} \right) \mathbf{x}^{(i)}_j + r \lambda \operatorname{sign}(\theta_j) + (1 - r) \lambda \theta_j, \quad \text{where } j \in \lbrace 1, 2, \dots, n \rbrace$
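
The ElasticNet gradient blends the two penalty terms with the ratio $r$. As a rough sketch (function names, data, and the use of `np.sign` at zero are assumptions), it could be written as:

```python
import numpy as np

def mse_gradient(theta, X, y):
    """Gradient of the unregularized cost: (1/m) * X^T (X theta - y)."""
    return X.T @ (X @ theta - y) / X.shape[0]

def elastic_net_gradient(theta, X, y, lam, r):
    """Data term + r * lam * sign(theta_j) + (1 - r) * lam * theta_j for j >= 1."""
    penalty = r * lam * np.sign(theta) + (1.0 - r) * lam * theta
    penalty[0] = 0.0   # theta_0 is not regularized
    return mse_gradient(theta, X, y) + penalty

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 0.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([2.0, -1.0, 0.5])
lam = 0.2

base = mse_gradient(theta, X, y)
mixed_penalty = elastic_net_gradient(theta, X, y, lam, r=0.5) - base
lasso_like = elastic_net_gradient(theta, X, y, lam, r=1.0) - base   # pure L1 penalty
ridge_like = elastic_net_gradient(theta, X, y, lam, r=0.0) - base   # pure L2 penalty
```

Setting `r=1.0` or `r=0.0` recovers the Lasso and Ridge penalty terms respectively.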

### e) Polynomial regression
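
This section has no body yet, so the following is only a sketch of the idea stated in the list of variants: expand the input with polynomial features, then fit an ordinary linear model on the expanded features. It assumes a single input feature, uses NumPy's `np.vander` to build the columns $[1, x, x^2, \dots, x^d]$, and swaps in `np.linalg.lstsq` as the solver for brevity; the helper name and the data are hypothetical.

```python
import numpy as np

def polynomial_features(x, degree):
    """Map a 1-D feature array to columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, N=degree + 1, increasing=True)

# Noise-free quadratic data: y = 1 + 2x - 3x^2 (illustrative only)
x = np.linspace(-1.0, 1.0, 7)
y = 1.0 + 2.0 * x - 3.0 * x**2

X_poly = polynomial_features(x, degree=2)   # shape (7, 3)

# The model is still linear in theta, so a least-squares solver applies directly
theta_fit, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
```

The model stays linear in $\theta$; only the features are nonlinear in $x$, which is why the same cost and gradient formulas still apply to `X_poly`.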

### f) Normal equation
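
This section has no body yet. For the unregularized cost, setting the gradient to zero gives the standard closed form $\theta = (X^\top X)^{-1} X^\top \mathbf{y}$, where $X$ (an assumption of this sketch, not defined above) stacks the examples as rows and $X^\top X$ is assumed invertible. The code solves the linear system rather than forming the inverse explicitly; the function name and data are illustrative.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y for theta.

    Assumption: X^T X is invertible (no perfectly collinear features).
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data generated by y = 1 + 2 * x_1 (illustrative only)
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta_closed_form = normal_equation(X, y)   # should recover [1, 2]
```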