# Cost function

The cost function measures how far a hypothesis, evaluated with a given set of parameters, is from the actual target values. The result is the "cost", or error, between the two. It can be used with [gradient descent](gradient_descent.ipynb) to gradually find the parameters that minimize this error.

## Contents

- [Quick reference](#Quick-reference) - quick reference for equations
- [Equation](#Equation) - the cost function
- [Vectorized](#Vectorized) - the vectorized equation
- [Octave](#Octave) - Octave implementation
- [Python](#Python) - Python implementation

## Quick reference

Basic equation:

$$
J(\theta) = \frac{1}{2m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2
$$

Vectorized equation (with [linear hypothesis](linear_hypothesis.ipynb)):

$$
J(\theta) = \frac{1}{2m} (X\theta - y)^T(X\theta - y)
$$

## Equation

The difference between the hypothesis and the actual value is calculated for each item in the training set, squared, and summed. The sum is then divided by the number of items $m$ in the training set to get the mean squared error. It is also halved so that the factor of 2 produced by differentiating the square cancels out in gradient descent:

$$
J(\theta) = \frac{1}{2m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2
$$
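As a rough illustration, the summation can be translated term by term into a plain Python loop. This is only a sketch: the name `cost_loop`, the `hypothesis` callback, and the toy data are assumptions made for this example and do not come from the notes.

```python
def cost_loop(xs, ys, hypothesis):
    """Evaluate J by looping over the training set, one example at a time."""
    m = len(ys)
    total = 0.0
    for x_i, y_i in zip(xs, ys):
        # Squared difference between the prediction and the actual value
        total += (hypothesis(x_i) - y_i) ** 2
    # Divide by m for the mean, and by 2 for the derivative convenience
    return total / (2 * m)


# Toy data where the true relationship is y = 2x (made up for illustration)
xs = [1, 2, 3]
ys = [2, 4, 6]

perfect = cost_loop(xs, ys, lambda x: 2 * x)  # every error is zero
worse = cost_loop(xs, ys, lambda x: x)        # errors are -1, -2, -3
```

Here `perfect` is zero because every prediction matches its target, while `worse` is $(1 + 4 + 9) / 6$.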

## Vectorized

To run the cost function in a vectorized way, you first need to prepare a matrix $X$ with a row for each training example and a column for each feature. You will also need a vector $y$ containing the output value for each training example.
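A minimal sketch of this preparation step with NumPy might look like the following. It assumes one row per training example and a leading column of ones for the intercept parameter, which matches the sample data used in the Python section. The variable names `sizes` and `prices` are hypothetical.

```python
import numpy as np

# Hypothetical raw data: one feature (size) and one target (price) per example
sizes = np.array([3000.0, 4000.0, 5000.0])
prices = np.array([1550000.0, 2050000.0, 2550000.0])

m = len(sizes)

# One row per training example; the column of ones pairs with the intercept
X = np.column_stack([np.ones(m), sizes])  # shape (3, 2)

# Targets as a column vector so that X @ theta - y lines up row by row
y = prices.reshape(-1, 1)                 # shape (3, 1)
```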

First, choose your hypothesis. For example, let's use the [linear hypothesis](linear_hypothesis.ipynb):

$$
X\theta
$$

Now calculate the error:

$$
X\theta - y
$$

Now multiply the transposed error vector by the error vector itself. This inner product squares each error and sums the results in a single step:

$$
J(\theta) = \frac{1}{2m} (X\theta - y)^T(X\theta - y)
$$
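The reason this works is that, for an error vector $e = X\theta - y$, the product $e^Te$ expands to $\sum_{i=1}^m e_i^2$. The short check below illustrates this with a made-up error vector (the values are arbitrary, chosen only for the example):

```python
import numpy as np

# Arbitrary error column vector, shape (3, 1)
err = np.array([[1.0], [-2.0], [3.0]])

# Square each entry, then sum: 1 + 4 + 9
sum_of_squares = np.sum(err ** 2)

# (1 x 3) @ (3 x 1) gives a 1 x 1 matrix; .item() extracts the scalar
inner_product = (err.T @ err).item()

print(sum_of_squares, inner_product)
```

Both values come out to 14.0, so the matrix product can replace the explicit summation.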

## Octave

> View the commented code for [costFunction](https://github.com/liamross/machine-learning-notes/blob/master/octave_examples/costFunction.m).

```octave
function J = costFunction (X, y, theta)

    m = length(y);
    hypotheses = X * theta;
    err = hypotheses - y;
    J = (1 / (2 * m)) * err' * err;

end
```

## Python

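```python
import numpy as np


def costFunction(X, y, theta):
    m = len(y)              # number of training examples
    hypothesis = X @ theta  # predictions, one row per example
    err = hypothesis - y    # error for each example
    # err.T @ err is a 1x1 matrix; .item extracts the scalar from it
    return (1 / (2 * m)) * (err.T @ err).item((0, 0))
```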

Let's run it against some fake housing data:

```python
X = np.array([
    [1, 3000],
    [1, 4000],
    [1, 5000],
])

y = np.array([
    [1550000],
    [2050000],
    [2550000],
])

bad_theta = np.array([
    [0],
    [0],
])

good_theta = np.array([
    [50000],
    [500],
])

print("With bad theta values  - cost:", costFunction(X, y, bad_theta))

print("With good theta values - cost:", costFunction(X, y, good_theta))
```

```
With bad theta values  - cost: 2184583333333.3333
With good theta values - cost: 0.0
```
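Note that `costFunction` relies on `y` and `theta` being column vectors, since `.item((0, 0))` expects a 1x1 matrix. Mixing 1-D arrays with column vectors is also risky in NumPy, because subtracting a shape `(m, 1)` array from a shape `(m,)` array broadcasts to an `m` by `m` matrix rather than raising an error. One possible workaround, sketched here under the hypothetical name `cost_function_flat`, is to flatten the inputs first:

```python
import numpy as np


def cost_function_flat(X, y, theta):
    """Hypothetical variant that accepts 1-D or column-vector y and theta."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()          # flatten to shape (m,)
    theta = np.asarray(theta, dtype=float).ravel()  # flatten to shape (n,)
    err = X @ theta - y                             # shape (m,)
    # For 1-D arrays, err @ err is the dot product: square and sum in one step
    return float(err @ err) / (2 * len(y))
```

With the housing data above, this should give the same costs whether `y` and `theta` are passed as nested column arrays or as flat lists such as `[50000, 500]`.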
