# Cost function

The squared error cost function is the most commonly used cost function for a [[regression problem]]. It measures the error between the predictions of the hypothesis and the provided training data.

**aka:** "Squared error function", or "Mean squared error".

## Hypothesis

We could use the [linear hypothesis](linear_hypothesis.ipynb#Equation).

## Parameters

$\theta_0$ - y intercept at $x = 0$  
$\theta_1$ - slope

## Cost function

The cost function can be defined as:

$$
J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2
$$

In the cost function, $h_{\theta}(x^{(i)})$ is the same as $\theta_0 + \theta_1x^{(i)}$ (this is the hypothesis itself, so its value is affected by changing the $\theta$ values).
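
The formula above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `hypothesis` and `compute_cost` and the training data are assumptions made for this example.

```python
def hypothesis(theta0, theta1, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def compute_cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) over the m training examples."""
    m = len(xs)
    squared_errors = [
        (hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)
    ]
    return sum(squared_errors) / (2 * m)


# Hypothetical training data, chosen only for illustration (y = 2x).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

cost_perfect = compute_cost(0.0, 2.0, xs, ys)  # line passes through every point
cost_off = compute_cost(1.0, 1.0, xs, ys)      # predictions are 2, 3, 4
```

Here `cost_perfect` is `0.0` because every prediction matches its target, while `cost_off` is $\frac{0 + 1 + 4}{2 \times 3} = \frac{5}{6}$.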

It is equal to $\frac{1}{2}\bar{x}$, where $\bar{x}$ is the mean of the squares of $h_\theta(x^{(i)}) - y^{(i)}$, the difference between the predicted and the actual value. The mean is halved for convenience when computing [[gradient descent]]: differentiating the square produces a factor of $2$ that cancels the $\frac{1}{2}$.
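
To see the cancellation, take the partial derivative with respect to $\theta_1$ as one example (using $\frac{\partial}{\partial\theta_1} h_\theta(x^{(i)}) = x^{(i)}$ for the linear hypothesis):

$$
\frac{\partial}{\partial\theta_1} J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^m 2\,(h_\theta(x^{(i)}) - y^{(i)})\,x^{(i)} = \frac{1}{m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})\,x^{(i)}
$$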

## Goal

We want to choose $\theta_0$ and $\theta_1$ such that $h_{\theta}(x)$ is close to $y$ for our training examples $(x, y)$ (minimizing error).

$\substack{\text{minimize}\\\theta_0, \theta_1} J(\theta_0, \theta_1)$

$\substack{\text{minimize}\\\theta_0, \theta_1}$ means "find me the values of $\theta_0$ and $\theta_1$ that minimize $J(\theta_0, \theta_1)$".
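
One naive way to picture "find the minimizing values" is a brute-force search over candidate parameters. The sketch below is only an illustration under assumed data and an assumed candidate grid. It is not gradient descent, and the name `grid_search_minimum` is hypothetical.

```python
def compute_cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def grid_search_minimum(xs, ys, candidates):
    """Return (cost, theta0, theta1) with the lowest cost among the candidates."""
    best = None
    for theta0 in candidates:
        for theta1 in candidates:
            cost = compute_cost(theta0, theta1, xs, ys)
            if best is None or cost < best[0]:
                best = (cost, theta0, theta1)
    return best


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

candidates = [i * 0.5 for i in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
best_cost, best_theta0, best_theta1 = grid_search_minimum(xs, ys, candidates)
```

With this data the search lands on $\theta_0 = 1$, $\theta_1 = 2$ with a cost of $0$. A grid like this only finds the minimum if it happens to contain it, which is one reason iterative methods are preferred in practice.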

## Examples

### With zero-value for $\theta_0$

Simplify by setting $\theta_0 = 0$ (same as removing $\theta_0$ from the equations).

Hypothesis: $h_{\theta}(x) = \theta_{1}x$  
Parameters: $\theta_1$  
Cost function: $J(\theta_1) = \frac{1}{2m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^{2}$  
Goal: $\substack{\text{minimize}\\\theta_1} J(\theta_1)$

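- for fixed $\theta_1$, the hypothesis $h_\theta(x)$ is a function of $x$
- the cost function $J(\theta_1)$ is a function of $\theta_1$

For the training set $(1, 1)$, $(2, 2)$, $(3, 3)$ (so $m = 3$), the cost at three values of $\theta_1$ is: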

$$
J(1) = \frac{1}{2\times3}((1-1)^2 + (2-2)^2 + (3-3)^2) = \frac{1}{2\times3}(0 + 0 + 0) = 0
$$

$$
J(0.5) = \frac{1}{2\times3}((0.5-1)^2 + (1-2)^2 + (1.5-3)^2)= \frac{1}{2\times3}(0.25 + 1 + 2.25) = \frac{3.5}{6} \approx 0.583
$$

$$
J(0) = \frac{1}{2\times3}((0-1)^2 + (0-2)^2 + (0-3)^2)= \frac{1}{2\times3}(1 + 4 + 9) = \frac{14}{6} \approx 2.333
$$

![](../static/cost_function_example.png)

From this we can see that the global minimum is at $\theta_1 = 1$.
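
The three values above can be checked with a few lines of Python. The training set $(1, 1)$, $(2, 2)$, $(3, 3)$ is the one implied by the terms inside the sums. The function name `cost_theta1` is an assumption made for this sketch.

```python
def cost_theta1(theta1, xs, ys):
    """J(theta1) with theta0 fixed at 0, so h(x) = theta1 * x."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Training set implied by the worked sums: (1, 1), (2, 2), (3, 3).
xs = [1, 2, 3]
ys = [1, 2, 3]

for theta1 in (1, 0.5, 0):
    print(f"J({theta1}) = {cost_theta1(theta1, xs, ys):.3f}")
# J(1) = 0.000
# J(0.5) = 0.583
# J(0) = 2.333
```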

### With non-zero $\theta_0$

In this example we will not set $\theta_0$ to equal $0$.

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1x$  
Parameters: $\theta_0$, $\theta_1$  
Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^{2}$  
Goal: $\substack{\text{minimize}\\\theta_0, \theta_1} J(\theta_0, \theta_1)$

Since there are two parameters, we need a new way to graph $J(\theta_0, \theta_1)$. Rather than use a 3D surface plot, it can be represented in two dimensions using a contour plot.

> For linear regression with the squared error cost, $J$ will always be a "convex" function: it is bowl-shaped, with a single low point, so any local minimum is also the global minimum.
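
Convexity can be spot-checked numerically: for a convex function, the value at the midpoint of two parameter pairs is never above the average of the values at those pairs. The sketch below is a sanity check on assumed data, not a proof. The name `satisfies_midpoint_convexity` and the sampling range are hypothetical choices.

```python
import random


def compute_cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def satisfies_midpoint_convexity(a, b, xs, ys, tol=1e-9):
    """Check J(midpoint of a and b) <= average of J(a) and J(b)."""
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    j_mid = compute_cost(mid[0], mid[1], xs, ys)
    j_avg = (compute_cost(a[0], a[1], xs, ys) + compute_cost(b[0], b[1], xs, ys)) / 2
    return j_mid <= j_avg + tol  # tol absorbs floating-point rounding


# Hypothetical training data, chosen only for illustration.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.5, 3.5]

rng = random.Random(0)  # fixed seed so the check is repeatable
pairs = [
    ((rng.uniform(-5, 5), rng.uniform(-5, 5)), (rng.uniform(-5, 5), rng.uniform(-5, 5)))
    for _ in range(1000)
]
all_convex = all(satisfies_midpoint_convexity(a, b, xs, ys) for a, b in pairs)
```

Every sampled pair passes the check, which is consistent with $J$ being convex in $(\theta_0, \theta_1)$.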

![](../static/3d_surface_vs_contour_plots.png)
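
A contour plot of this kind can be produced by evaluating $J(\theta_0, \theta_1)$ on a grid of parameter values. The following is a rough sketch: the data, the grid ranges, and the helper names `cost_grid` and `plot_contour` are assumptions, and the plotting helper assumes `matplotlib` is installed.

```python
def compute_cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def cost_grid(theta0_values, theta1_values, xs, ys):
    """Evaluate J on a grid: one row per theta1 value, one column per theta0 value."""
    return [
        [compute_cost(t0, t1, xs, ys) for t0 in theta0_values]
        for t1 in theta1_values
    ]


def plot_contour(theta0_values, theta1_values, grid):
    """Draw the contour plot; assumes matplotlib is available."""
    import matplotlib.pyplot as plt  # imported lazily so the grid code runs without it

    plt.contour(theta0_values, theta1_values, grid, levels=20)
    plt.xlabel("theta0")
    plt.ylabel("theta1")
    plt.show()


# Hypothetical training data lying on y = x, so the minimum is at theta0 = 0, theta1 = 1.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

theta0_values = [i * 0.5 for i in range(-4, 5)]  # -2.0 ... 2.0
theta1_values = [i * 0.5 for i in range(-2, 7)]  # -1.0 ... 3.0
grid = cost_grid(theta0_values, theta1_values, xs, ys)
```

Calling `plot_contour(theta0_values, theta1_values, grid)` would then draw nested contour rings around the lowest grid value, which for this data sits at $\theta_0 = 0$, $\theta_1 = 1$.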