# Cost Function

## What is it?

- Tells us how well the model is doing so we can try to improve it
> **Parameters of a model:** The variables you can adjust during training in order to improve the model
- Can also be referred to as **coefficients** or **weights**
- Examples in linear regression:

    - $w$: gradient (slope)
    - $b$: y-intercept

$$f_{w,b}(x) = wx + b$$


The cost function measures how well the linear function with parameters $w$ and $b$ fits the training data.

## Cost Function for Linear Regression

> **Errors (Residuals):** The difference between the predicted target $\hat{y}$ and the true target $y$

![errors](errors.png)

1. Calculate the residual $\hat{y}_i - y_i$ for each training example $i$
2. Square each residual:
    - **Removes the sign:** positive and negative residuals cannot cancel each other out.
    - **Penalises larger residuals:** large errors are emphasised, so the model is pushed towards a better fit.
    - **Easy to differentiate:** unlike the absolute error $|\hat{y}_i - y_i|$, which is not differentiable at zero, the square is smooth everywhere.
3. Sum the squared residuals over all $m$ training examples
4. Divide by $m$ to get the **Mean Squared Error**
5. By convention, also divide by 2, so that the factor of 2 produced when differentiating the square cancels out later
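The five steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation; the function name `compute_cost` and the sample data are hypothetical.

```python
def compute_cost(xs, ys, w, b):
    """Squared error cost J(w, b) for the linear model f(x) = w*x + b."""
    m = len(xs)                    # number of training examples
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = w * x + b          # prediction for this example
        residual = y_hat - y       # step 1: residual
        total += residual ** 2     # steps 2 and 3: square it and add to the sum
    mse = total / m                # step 4: mean squared error
    return mse / 2                 # step 5: halve, by convention


# Hypothetical example: three training points and one choice of (w, b)
cost = compute_cost([1.0, 2.0, 3.0], [2.0, 2.5, 3.5], w=1.0, b=0.5)
print(cost)  # about 0.0417 (exactly 1/24 in real arithmetic)
```

Only the first point is missed (by 0.5), so the cost is $0.5^2 / (2 \times 3) = 1/24$.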

### Squared Error Cost Function:

![reg_cost](regression_cost_function.png)
- This is the most commonly used cost function for regression models in general, not just linear regression
- We want to find the values of the parameters $w$ and $b$ that make $J(w,b)$ as small as possible
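Written out in symbols, the five steps combine into the following formula. This is reconstructed from the steps listed earlier; the notation $x_i, y_i$ for the $i$-th training example is assumed and may differ from the figure.

$$
J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \bigl(f_{w,b}(x_i) - y_i\bigr)^2 = \frac{1}{2m} \sum_{i=1}^{m} \bigl(wx_i + b - y_i\bigr)^2
$$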

### Aim

$$\min_{w,b} J(w,b)$$

## Simplify the Cost Function

To build intuition, simplify the model by keeping one parameter and setting the other to 0, so the cost depends on a single parameter.
Linear regression example:

Keeping only $w$ ($b = 0$):
$$f_{w}(x) = wx$$

Keeping only $b$ ($w = 0$):
$$f_{b}(x) = b$$

### Plot the Cost Function:


With respect to w:

$$
J(w) = \frac{1}{2m} \sum_{i=1}^{m} \bigl(wx_i - y_i\bigr)^2
$$

Suppose the training points lie exactly on the line $y = x$: (0,0), (1,1), ...

![cost_plot](cost_function_plotted.png)
The cost function is a parabola with a single minimum.
What value of $w$ minimises $J(w)$? From the plot, it is $w = 1$: the line $f_w(x) = x$ passes through every training point, so every residual is 0 and $J(1) = 0$.
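The plot can be reproduced numerically. A minimal sketch, assuming hypothetical training points (0,0), (1,1), (2,2), (3,3) on the line $y = x$; the function name `cost_w` is also hypothetical. It evaluates the simplified cost $J(w)$ (with $b = 0$) for a few candidate slopes and picks the smallest.

```python
def cost_w(w, xs, ys):
    """Simplified cost J(w) for f_w(x) = w*x, i.e. b fixed at 0."""
    m = len(xs)
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical training points lying exactly on y = x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 3.0]

# Evaluate J(w) for slopes -1.0, -0.5, ..., 3.0
candidates = [i / 2 for i in range(-2, 7)]
costs = {w: cost_w(w, xs, ys) for w in candidates}
for w, j in costs.items():
    print(f"w = {w:4.1f}   J(w) = {j:.3f}")

best_w = min(costs, key=costs.get)
print("w that minimises J(w):", best_w)  # 1.0, where J(w) = 0
```

For these four points $J(w) = 1.75\,(w-1)^2$, a parabola symmetric about $w = 1$, where it reaches 0.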

In general, which values of $w$ and $b$ minimise $J(w,b)$?
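A crude way to answer this for a small example is to evaluate $J(w,b)$ over a grid of candidate $(w, b)$ pairs and keep the smallest. This brute-force search is only an illustration; it scales badly as the number of parameters grows, and in practice an iterative method such as gradient descent is used instead. The data below are hypothetical points on $y = 2x + 1$.

```python
from itertools import product


def compute_cost(xs, ys, w, b):
    """Squared error cost J(w, b) for f(x) = w*x + b."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical training points on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

ws = [i / 2 for i in range(-2, 9)]  # w from -1.0 to 4.0 in steps of 0.5
bs = [j / 2 for j in range(-4, 7)]  # b from -2.0 to 3.0 in steps of 0.5

# Try every (w, b) pair on the grid and keep the one with the lowest cost
best_w, best_b = min(product(ws, bs), key=lambda wb: compute_cost(xs, ys, *wb))
print(best_w, best_b)  # 2.0 1.0
```

The grid only finds the minimum exactly because $(2, 1)$ happens to be one of the grid points; with real data the true minimum usually lies between them.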

## Cost Function in 3D:
![3d](3d_cost_function.png)

> **Contour plot:** Shows sets of points on the 3D surface that are at the same height, i.e. that share the same value of $J(w,b)$

To get a contour plot, take horizontal 'slices' through the 3D surface. Each slice collects all the points at one height, and for this bowl-shaped cost function each slice is an ellipse (oval).
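One slice can be approximated numerically by scanning a grid of $(w, b)$ values and keeping those whose cost is close to a chosen height. A minimal sketch, assuming hypothetical data points (0,0), (1,1), (2,2) and the height $J = 1/3$; the helper `contour_slice` is hypothetical. On a finite grid a slice is a thin band ($|J - c| \le \text{tol}$) rather than an exact curve; plotting tools such as Matplotlib's `contour` interpolate the curve instead.

```python
def compute_cost(xs, ys, w, b):
    """Squared error cost J(w, b) for f(x) = w*x + b."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def contour_slice(xs, ys, level, ws, bs, tol):
    """Grid points (w, b) whose cost lies within tol of the given height."""
    return [(w, b) for w in ws for b in bs
            if abs(compute_cost(xs, ys, w, b) - level) <= tol]


# Hypothetical data on y = x, so the minimum is at (w, b) = (1, 0)
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]

ws = [i / 10 for i in range(-10, 31)]  # w from -1.0 to 3.0
bs = [j / 10 for j in range(-20, 21)]  # b from -2.0 to 2.0

points = contour_slice(xs, ys, level=1/3, ws=ws, bs=bs, tol=0.01)

# Three quite different (w, b) pairs that all sit on the J = 1/3 slice
for w, b in [(0.0, 1.0), (0.8, 1.0), (2.0, -1.0)]:
    print((w, b), (w, b) in points, round(compute_cost(xs, ys, w, b), 4))
```

Each of the three pairs prints `True` with a cost of about 0.3333: different values of $w$ and $b$, same height on the surface.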

![contour](contour_plot.png)

The top-right graph is the contour plot.
The blue, orange and green points all lie on the same contour, so they have the same value of $J$ despite having different values of $w$ and $b$.

The centre of the concentric ellipses represents the minimum of $J$.

![cont_min](contour_min.png)

See the lab for an interactive contour plot.