### The Problem of Overfitting

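* Underfitting, or "high bias": the model is too simple for the data (e.g., fitting a linear model to data generated by a quadratic function).
* Overfitting, or "high variance": the model is too complex for the data (e.g., fitting a fifth-order polynomial to data generated by a quadratic function). It fits the training set closely but generalizes poorly to new examples.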
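
The two cases above can be sketched numerically. The snippet below is a minimal illustration, not part of the course material: the synthetic quadratic data, the noise level, and the helper `train_mse` are all assumptions made for the example. It fits polynomials of degree 1, 2, and 5 and compares their training errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a quadratic plus a little noise (assumed for illustration).
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.shape)

def train_mse(degree):
    """Least-squares polynomial fit of the given degree; returns training MSE."""
    coeffs = np.polyfit(x, y, degree)
    residuals = np.polyval(coeffs, x) - y
    return float(np.mean(residuals**2))

mse_linear = train_mse(1)     # too simple: cannot follow the curvature
mse_quadratic = train_mse(2)  # same form as the data-generating function
mse_fifth = train_mse(5)      # extra flexibility is spent on fitting noise

print(mse_linear, mse_quadratic, mse_fifth)
```

The training error never increases as the degree grows, so a low training error alone cannot reveal overfitting; that requires checking the error on examples the model has not seen.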


#### Addressing overfitting

Options:
1. Reduce number of features.
    * Manually select which features to keep.
    * Use a model selection algorithm (covered later in the course).
2. Regularization.
    * Keep all features, but reduce the magnitude of the parameters $\theta_j$.
    * Works well when we have many features, each of which contributes a little to predicting $y$.

### Cost Function

#### Intuition
The idea is to penalize some of the parameters so that they become small and contribute less to the hypothesis.

For example:
* correct hypothesis: $h_\theta(x) = \theta_0 + \theta_1x + \theta_2x^2$
* current hypothesis: $h_\theta(x) = \theta_0 + \theta_1x + \theta_2x^2 + \theta_3x^3 + \theta_4x^4$

We want to remove the influence of the $\theta_3$ and $\theta_4$ terms, so we add an additional term to the error function that "forces" $\theta_3$ and $\theta_4$ to be small:

$\min_\theta \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 + 1000\theta_3^2 + 1000\theta_4^2$

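To keep this cost low, $\theta_3$ and $\theta_4$ must be close to zero, so the cubic and quartic terms nearly vanish and the hypothesis is approximately quadratic.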
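
The effect of the two penalty terms can be checked with a small sketch. Everything here is an assumption made for illustration: the synthetic quadratic data, the helper `fit`, and the choice to minimize the cost in closed form by setting its gradient to zero, which gives the linear system $(X^TX/m + 2P)\theta = X^Ty/m$ with $P$ the diagonal matrix of penalty weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic quadratic data (assumed for illustration).
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.shape)
m = x.size

# Design matrix with columns 1, x, x^2, x^3, x^4 (fourth-order hypothesis).
X = np.vander(x, N=5, increasing=True)

def fit(penalty):
    """Minimize (1/2m)*||X theta - y||^2 + sum_j penalty[j] * theta_j^2.

    Setting the gradient to zero gives (X^T X / m + 2*diag(penalty)) theta = X^T y / m.
    """
    A = X.T @ X / m + 2.0 * np.diag(penalty)
    b = X.T @ y / m
    return np.linalg.solve(A, b)

theta_plain = fit(np.zeros(5))                                     # no penalty
theta_penalized = fit(np.array([0.0, 0.0, 0.0, 1000.0, 1000.0]))  # 1000*theta_3^2 + 1000*theta_4^2

print("unpenalized:", theta_plain)
print("penalized:  ", theta_penalized)
```

With the penalty in place, the last two entries of `theta_penalized` are pushed close to zero, while the first three remain free to fit the quadratic shape of the data.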

#### Regularization
Keeping the parameters $\theta_0, \theta_1, \dots, \theta_n$ small gives:
* A simpler hypothesis
* Less tendency to overfit

The regularization term gives the hypothesis a smoother shape.


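$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda\sum_{j=1}^{n}\theta_j^2$

* $\lambda\sum_{j=1}^{n}\theta_j^2$: regularization term
* $\lambda$: regularization parameter, which controls the trade-off between fitting the training set and keeping the parameters small

By convention the sum starts at $j = 1$, so $\theta_0$ is not penalized.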
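
As a minimal sketch, the regularized cost can be written as a short function. The name `regularized_cost` and the tiny data set are hypothetical, chosen so the result can be checked by hand. The sketch assumes a linear hypothesis $h_\theta(x) = \theta^Tx$ with a leading column of ones in `X`, excludes $\theta_0$ from the penalty (the usual convention), and keeps the penalty outside the $\frac{1}{2m}$ factor, as in the formula in these notes.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = (1/2m) * sum((X @ theta - y)^2) + lam * sum(theta[1:]^2)."""
    m = y.size
    residuals = X @ theta - y
    data_term = residuals @ residuals / (2.0 * m)
    penalty = lam * np.sum(theta[1:] ** 2)  # theta[0] (intercept) is not penalized
    return float(data_term + penalty)

# Tiny hand-checkable example: predictions are [2, 4], residuals are [1, 2].
X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 2.0])
theta = np.array([0.0, 2.0])

cost = regularized_cost(theta, X, y, lam=0.5)
print(cost)  # data term 5/4 = 1.25, penalty 0.5 * 2^2 = 2.0, total 3.25
```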

If $\lambda$ is too large (e.g., $\lambda = 10^{10}$), the penalty dominates the cost, the penalized parameters are driven toward zero, and the model underfits the training data.
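
The underfitting caused by a very large $\lambda$ can be seen in a small experiment. This is a sketch under stated assumptions: the synthetic quadratic data, the helpers `fit_regularized` and `train_mse`, and the closed-form solution are illustrative choices, and $\theta_0$ is left unpenalized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic quadratic data (assumed for illustration).
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.shape)
m = x.size
X = np.vander(x, N=3, increasing=True)  # columns: 1, x, x^2

def fit_regularized(lam):
    """Minimize (1/2m)*||X theta - y||^2 + lam * (theta_1^2 + theta_2^2)."""
    D = np.diag([0.0, 1.0, 1.0])  # theta_0 is not penalized
    A = X.T @ X / m + 2.0 * lam * D
    b = X.T @ y / m
    return np.linalg.solve(A, b)

def train_mse(theta):
    """Mean squared error of the hypothesis on the training set."""
    return float(np.mean((X @ theta - y) ** 2))

theta_small = fit_regularized(1e-3)
theta_huge = fit_regularized(1e10)

print("lambda = 1e-3:", theta_small, train_mse(theta_small))
print("lambda = 1e10:", theta_huge, train_mse(theta_huge))
```

With $\lambda = 10^{10}$, $\theta_1$ and $\theta_2$ collapse to essentially zero and $\theta_0$ settles at the mean of $y$, so the hypothesis is a nearly flat line and the training error is much larger than with a small $\lambda$.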