# Training Models

## Linear Regression

In a Linear Regression model, the prediction is a weighted sum of the input features plus a constant called the bias term:

$y = \theta_{0} + \theta_{1}x_{1} + \theta_{2}x_{2} + \dots + \theta_{n}x_{n}$

- $y$ is the predicted value.
- $n$ is the number of features.
- $x_{i}$ is the $i^{th}$ feature value.
- $\theta_{j}$ is the $j^{th}$ model parameter (including the bias term $\theta_{0}$ and the feature weights $\theta_{1},\theta_{2},...,\theta_{n}$).
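As a small illustration of the equation above, the sketch below computes one prediction term by term. The parameter and feature values are hypothetical, chosen only so the arithmetic is easy to check by hand, and `predict` is an illustrative helper, not a library function.

```python
# Hypothetical parameter and feature values, chosen only for illustration
theta = [4.0, 3.0, -2.0]   # theta_0 (bias), theta_1, theta_2
x = [1.5, 0.5]             # x_1, x_2 (n = 2 features)

def predict(theta, x):
    """Return theta_0 + theta_1*x_1 + ... + theta_n*x_n."""
    y = theta[0]  # start from the bias term
    for theta_j, x_j in zip(theta[1:], x):
        y += theta_j * x_j  # add each weighted feature
    return y

print(predict(theta, x))  # 4 + 3*1.5 - 2*0.5 = 7.5
```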

This can be written much more concisely using a vectorized form:

$y = h_{\theta}(x) = \theta \cdot x$

- $\theta$ is the model's parameter vector, containing the bias term $\theta_{0}$ and the feature weights $\theta_{1}$ to $\theta_{n}$.
- $x$ is the instance's feature vector, containing $x_{0}$ to $x_{n}$, with $x_{0}$ always equal to 1.
- $\theta \cdot x$ is the dot product of the vectors $\theta$ and $x$, which equals $\theta_{0}x_{0} + \theta_{1}x_{1} + \theta_{2}x_{2} + \dots + \theta_{n}x_{n}$; since $x_{0} = 1$, this is exactly the expanded prediction equation.
- $h_{\theta}$ is the hypothesis function, using the model parameters $\theta$.
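The vectorized form can be sketched with NumPy: prepending $x_{0} = 1$ to the feature vector lets a single dot product absorb the bias term, and stacking instances as rows of a matrix gives all predictions at once. The numbers below are the same hypothetical values as a hand-checkable example; `x_full` and `X_b` are illustrative names.

```python
import numpy as np

theta = np.array([4.0, 3.0, -2.0])   # [theta_0, theta_1, theta_2]
x = np.array([1.5, 0.5])             # raw features x_1, x_2

# Prepend x_0 = 1 so the bias is handled by the dot product
x_full = np.concatenate(([1.0], x))
y_hat = theta @ x_full  # same as theta.dot(x_full)
print(y_hat)  # 7.5

# For many instances at once, stack the feature vectors as rows of a matrix
X = np.array([[1.5, 0.5],
              [0.0, 0.0],
              [1.0, 2.0]])
X_b = np.c_[np.ones((X.shape[0], 1)), X]  # add x_0 = 1 to each row
print(X_b @ theta)  # one prediction per row
```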

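To train a Linear Regression model, we need to find the value of $\theta$ that minimizes the root mean square error (RMSE). In practice, it is simpler to minimize the mean square error (MSE) than the RMSE, and it leads to the same result: the square root is a strictly increasing function, so the $\theta$ that minimizes the MSE also minimizes the RMSE.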
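A quick numerical check of the MSE/RMSE equivalence: the sketch below evaluates both measures over a grid of candidate slopes on synthetic data and confirms they select the same slope. To keep it one-dimensional, it assumes the bias is fixed at 4; the seed, grid, and data size are arbitrary choices for illustration.

```python
import numpy as np

# Synthetic data in the same spirit as the notebook: y = 4 + 3x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=50)
y = 4 + 3 * x + rng.normal(size=50)

# Candidate slopes; the bias is held at 4 here purely for simplicity
slopes = np.linspace(0, 6, 601)
mse = np.array([np.mean((4 + s * x - y) ** 2) for s in slopes])
rmse = np.sqrt(mse)

# sqrt is strictly increasing, so both curves reach their minimum at the same slope
best_mse = slopes[np.argmin(mse)]
best_rmse = slopes[np.argmin(rmse)]
print(best_mse, best_rmse, best_mse == best_rmse)
```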

The MSE cost function for a Linear Regression model, computed over a training set $X$ of $m$ instances, is:

$MSE(X,h_{\theta}) = \frac{1}{m} \sum_{i=1}^{m} (\theta^{T}x^{(i)} - y^{(i)})^{2}$
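The cost function translates almost directly into NumPy. This is a minimal sketch assuming the instances are stored as rows of a matrix whose first column is $x_{0} = 1$; the helper name `mse` and the toy data are illustrative.

```python
import numpy as np

def mse(X_b, y, theta):
    """MSE(X, h_theta) = (1/m) * sum_i (theta^T x^(i) - y^(i))^2.

    X_b: (m, n+1) matrix whose rows are x^(i), with x_0 = 1 in the first column.
    y: (m,) vector of targets. theta: (n+1,) parameter vector.
    """
    errors = X_b @ theta - y      # theta^T x^(i) - y^(i) for every instance
    return np.mean(errors ** 2)   # average of the squared errors

X_b = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])
y = np.array([1.0, 3.0, 4.0])
theta = np.array([1.0, 2.0])      # predicts 1, 3, 5
print(mse(X_b, y, theta))         # 1/3: only the last instance is off, by 1
```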

## The Normal Equation

To find the value of $\theta$ that minimizes the cost function, there is a closed-form solution, a mathematical equation that gives the result directly. It is called the Normal Equation:

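$\hat{\theta} = (X^{T}X)^{-1} X^{T} y$

In this equation:

- $\hat{\theta}$ is the value of $\theta$ that minimizes the cost function.
- $X$ is the matrix of training instances: each row is a feature vector $x^{(i)}$, with $x_{0} = 1$.
- $y$ is the vector of target values containing $y^{(1)}$ to $y^{(m)}$.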
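To see the Normal Equation recover known parameters, here is a sketch on three hypothetical points that lie exactly on $y = 1 + 2x$, so $\hat{\theta}$ should come out as $[1, 2]$ up to floating-point rounding.

```python
import numpy as np

# Three points lying exactly on y = 1 + 2x (hypothetical toy data)
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 5.0])

X_b = np.c_[np.ones((len(X), 1)), X]                # add x_0 = 1 to each instance
theta_hat = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y  # (X^T X)^{-1} X^T y
print(theta_hat)  # approximately [1. 2.]
```

By hand: $X^{T}X = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}$ and $X^{T}y = (9, 13)^{T}$, which gives $\hat{\theta} = (1, 2)^{T}$.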

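```python
import numpy as np

# 100 instances of one feature, drawn uniformly from [0, 2)
X = 2 * np.random.rand(100, 1)
# Targets follow y = 4 + 3x plus standard Gaussian noise
y = 4 + 3 * X + np.random.randn(100, 1)
```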
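The Normal Equation can then be applied to data generated this way. The sketch below regenerates the data so it runs on its own; the fixed seed is an assumption added for reproducibility. Because of the noise, the estimates land near, but not exactly on, the true parameters 4 and 3. It also cross-checks the result against `np.linalg.lstsq`, which solves the same least-squares problem without forming an explicit inverse.

```python
import numpy as np

np.random.seed(42)  # assumption: fixed seed so reruns give the same numbers
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Add x_0 = 1 to every instance, then apply the Normal Equation
X_b = np.c_[np.ones((100, 1)), X]
theta_best = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(theta_best.ravel())  # near [4, 3], not exact because of the noise

# Cross-check with NumPy's least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(np.allclose(theta_best, theta_lstsq))

# Predict for new instances x = 0 and x = 2
X_new = np.array([[0.0], [2.0]])
X_new_b = np.c_[np.ones((2, 1)), X_new]
print(X_new_b @ theta_best)
```

Computing the inverse directly mirrors the equation, but a least-squares solver such as `lstsq` is usually the more numerically robust choice in practice.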

