# Cost Functions
A cost function, also known as a loss function or an error function, plays a vital role in machine learning, particularly in training neural networks. It measures the difference between the model's predicted output and the actual output, giving feedback on how 'far off' the model's predictions are from the actual outcomes.

To borrow the analogy of a GPS navigation system, think of the cost function as a measure of the 'distance' between the current location (the current prediction) and the destination (the correct outcome). During training, the model adjusts its parameters to reduce this 'distance', navigating the 'terrain' of parameter space toward a better solution.
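
To make the 'navigation' picture concrete, here is a minimal sketch. It assumes a one-feature linear model $\hat{y} = wx + b$ and plain gradient descent, neither of which the text above specifies. At each step it evaluates the MSE cost (defined in the next section) and moves $w$ and $b$ a small step against its gradient. The toy data, learning rate `lr`, and step count are hypothetical choices for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

def fit_line(x, y, lr=0.1, steps=2000):
    """Fit y ~ w * x + b by plain gradient descent on the MSE cost (illustrative sketch)."""
    w, b = 0.0, 0.0  # arbitrary starting 'location' in parameter space
    history = []
    for _ in range(steps):
        error = (w * x + b) - y
        # Gradients of mean((w*x + b - y)^2) with respect to w and b
        grad_w = 2.0 * np.mean(error * x)
        grad_b = 2.0 * np.mean(error)
        # Step 'downhill' to reduce the cost
        w -= lr * grad_w
        b -= lr * grad_b
        history.append(mse(y, w * x + b))
    return w, b, history

# Hypothetical noise-free data generated from y = 2x + 1
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0
w, b, history = fit_line(x, y)
print(f"w={w:.3f}, b={b:.3f}, final MSE={history[-1]:.6f}")
```

The recorded `history` shows the cost shrinking as the parameters approach the values that generated the data.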

## Mean Squared Error (MSE)
MSE is typically used for regression problems. It calculates the average of the squares of the differences between the predicted and actual values. Mathematically, it's represented as:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2$$

Where:
- $n$ is the total number of data points
- $y_i$ is the actual value
- $\hat{y_i}$ is the predicted value
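
As a worked example, take actual values $y = (1.0, 1.5, 2.0, 2.5, 3.0)$ and predictions $\hat{y} = (0.8, 1.5, 1.8, 2.6, 3.2)$, the same numbers used in the code below:

$$MSE = \frac{(1.0-0.8)^2 + (1.5-1.5)^2 + (2.0-1.8)^2 + (2.5-2.6)^2 + (3.0-3.2)^2}{5} = \frac{0.04 + 0 + 0.04 + 0.01 + 0.04}{5} = \frac{0.13}{5} = 0.026$$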

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Calculate Mean Squared Error."""
    mse = np.mean((y_true - y_pred) ** 2)
    return mse

# Example usage
y_true = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
y_pred = np.array([0.8, 1.5, 1.8, 2.6, 3.2])

print(f'Mean Squared Error: {mean_squared_error(y_true, y_pred)}')
```

## Cross-Entropy
Cross-Entropy is often used for binary or multi-class classification problems. It measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label.

For binary classification, it's calculated as:

$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \left[y_i \log(\hat{y_i}) + (1 - y_i) \log(1 - \hat{y_i})\right]$$

Where:
- $n$ is the total number of data points
- $y_i$ is the actual label (0 or 1)
- $\hat{y_i}$ is the predicted probability that $y_i = 1$
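
As a worked example, take labels $y = (0, 0, 1, 1, 0)$ and predicted probabilities $\hat{y} = (0.1, 0.2, 0.9, 0.8, 0.1)$, using the natural logarithm (as `np.log` does). Each example with $y_i = 0$ contributes $\log(1 - \hat{y_i})$ and each example with $y_i = 1$ contributes $\log(\hat{y_i})$, so:

$$\text{Cross-Entropy} = -\frac{1}{5}\left[3\log(0.9) + 2\log(0.8)\right] \approx -\frac{1}{5}\left[3(-0.1054) + 2(-0.2231)\right] \approx \frac{0.7624}{5} \approx 0.1525$$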

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Calculate Binary Cross-Entropy Loss."""
    # Clip predictions away from exactly 0 and 1 so np.log never receives 0
    y_pred = np.clip(y_pred, eps, 1 - eps)
    bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return bce

# Example usage
y_true = np.array([0, 0, 1, 1, 0])
y_pred = np.array([0.1, 0.2, 0.9, 0.8, 0.1])

print(f'Binary Cross-Entropy Loss: {binary_cross_entropy(y_true, y_pred)}')
```
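
To see how the loss grows as the predicted probability diverges from the actual label, the sketch below evaluates the per-example cross-entropy term for a positive example (label 1) at several predicted probabilities. The helper name `bce_per_sample` and the clipping constant `eps` are illustrative assumptions, not part of any library.

```python
import numpy as np

def bce_per_sample(y_true, y_pred, eps=1e-12):
    """Per-example binary cross-entropy (natural log), clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

# The actual label is 1; the predicted probability drifts away from it
probs = np.array([0.99, 0.9, 0.5, 0.1, 0.01])
losses = bce_per_sample(np.ones_like(probs), probs)
for p, loss in zip(probs, losses):
    print(f"p={p:.2f} -> loss={loss:.3f}")
```

The loss rises from about 0.01 at $p = 0.99$ to about 4.6 at $p = 0.01$, so a confident wrong prediction costs several hundred times more than a confident correct one.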

## Log Loss
Log Loss is another name for binary cross-entropy: the formula is identical. It measures how well a classifier's predicted probabilities match the true labels, penalising confident but wrong predictions most heavily. It's calculated as:

$$\text{Log Loss} = -\frac{1}{n} \sum_{i=1}^{n} \left[y_i \log(\hat{y_i}) + (1 - y_i) \log(1 - \hat{y_i})\right]$$

Where:
- $n$ is the total number of data points
- $y_i$ is the actual label (0 or 1)
- $\hat{y_i}$ is the predicted probability that $y_i = 1$
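
Because only one of the two terms in the bracket is non-zero for each example, log loss reduces to the mean negative log of the probability the model assigned to the class that actually occurred. The minimal numpy sketch below checks that this 'true-class' form agrees with the formula above. Both helper names are hypothetical.

```python
import numpy as np

def log_loss_true_class(y_true, y_pred, eps=1e-12):
    """Log loss as the mean negative log-probability assigned to the true class."""
    y_true = np.asarray(y_true)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    # Probability the model gave to the label that actually occurred
    p_true = np.where(y_true == 1, y_pred, 1.0 - y_pred)
    return -np.mean(np.log(p_true))

def log_loss_formula(y_true, y_pred, eps=1e-12):
    """Log loss written term by term as in the formula above."""
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([0, 0, 1, 1, 0])
y_pred = np.array([0.1, 0.2, 0.9, 0.8, 0.1])
ll = log_loss_true_class(y_true, y_pred)
bce = log_loss_formula(y_true, y_pred)
print(f"true-class form: {ll:.4f}, formula form: {bce:.4f}")
```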

## Choosing the right cost function
The choice of cost function can depend on the problem at hand:

- **Regression Problems:** Mean Squared Error (MSE) is a common choice; because each error is squared, large errors are penalised disproportionately. Other options include Mean Absolute Error (MAE) and Huber loss.

- **Binary Classification:** Binary Cross-Entropy/Log Loss is often used. It penalises predictions that are confident and wrong.

- **Multi-class Classification:** Multi-class (categorical) Cross-Entropy is a common choice. It works like binary cross-entropy, but generalised to more than two classes.
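
For $K$ classes, multi-class cross-entropy is commonly written as $-\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K} y_{ik}\log(\hat{y}_{ik})$, where $y_{ik}$ is 1 if example $i$ belongs to class $k$ and 0 otherwise, and $\hat{y}_{ik}$ is the predicted probability of class $k$. The sketch below assumes one-hot labels and probability rows that already sum to 1 (for example, softmax outputs). The function name and toy values are hypothetical.

```python
import numpy as np

def categorical_cross_entropy(y_true_onehot, y_pred_probs, eps=1e-12):
    """Mean over examples of -sum_k y_ik * log(p_ik), with one-hot labels."""
    # Clip so np.log never receives an exact 0
    y_pred_probs = np.clip(np.asarray(y_pred_probs, dtype=float), eps, 1.0)
    return -np.mean(np.sum(y_true_onehot * np.log(y_pred_probs), axis=1))

# Three examples, three classes; each row of probabilities sums to 1
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.3, 0.5]])
cce = categorical_cross_entropy(y_true, y_pred)
print(f"Categorical Cross-Entropy: {cce:.4f}")
```

With two classes and one-hot labels, only the true-class probability survives the inner sum, so this reduces to the binary formula.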

It's worth noting that choosing a loss function is not always straightforward, and it can be beneficial to experiment with different loss functions for a specific task.
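
As a small experiment of this kind, the sketch below compares how MSE, MAE, and Huber loss react when a single prediction is badly wrong. The data, the outlier value, and the Huber threshold `delta=1.0` are hypothetical choices. Huber loss here uses the standard definition: quadratic for errors up to `delta`, linear beyond it.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def huber(y_true, y_pred, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    error = np.abs(y_true - y_pred)
    quadratic = 0.5 * error ** 2
    linear = delta * (error - 0.5 * delta)
    return np.mean(np.where(error <= delta, quadratic, linear))

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred_clean = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
y_pred_outlier = y_pred_clean.copy()
y_pred_outlier[-1] = 15.0  # one badly wrong prediction (hypothetical)

for name, fn in [("MSE", mse), ("MAE", mae), ("Huber", huber)]:
    print(f"{name}: clean={fn(y_true, y_pred_clean):.3f}, "
          f"with outlier={fn(y_true, y_pred_outlier):.3f}")
```

The single outlier inflates MSE by roughly a factor of 900, MAE by about 15, and Huber loss somewhere in between. This is why MAE or Huber loss can be preferable when the data contain occasional large errors that should not dominate training.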

## Real-world examples of cost function usage
Different cost functions can be appropriate for different scenarios. Here are a few examples:

- **Mean Squared Error (MSE):** A real estate company trying to predict house prices based on features like location, size, and condition of the house. In this case, it's a regression problem and using MSE as the cost function would be suitable.

- **Cross-Entropy:** A healthcare company building a model to predict whether a patient has a disease (yes/no) based on their symptoms. This is a binary classification problem and using binary cross-entropy as the cost function would be appropriate.

- **Log Loss:** A tech company building a spam email detector. The model classifies emails into 'spam' or 'not spam' categories. This is a binary classification problem and using log loss as the cost function could work well.