# Cost Functions

In this lab, we'll be exploring cost functions, which are a pivotal part of how machine learning algorithms, particularly neural networks, learn from data. Just like a GPS navigation system provides feedback to guide a driver towards their destination, cost functions help guide the machine learning model towards the best solution in the vast 'terrain' of parameter space.

## What is a Cost Function?

A cost function (also referred to as 'loss function' or 'objective function') is a measure of how 'off' a machine learning model's predictions are from the actual outcomes. The objective of the training process is to minimize this difference. This is achieved through an iterative process of adjusting the model's internal parameters, guided by the gradient of the cost function.
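
To make the idea of gradient-guided adjustment concrete, here is a minimal sketch of gradient descent on a one-variable linear model under a mean-squared-error cost. The function name `gradient_descent_step`, the toy data, the learning rate, and the iteration count are all illustrative assumptions, not part of this lab's required code.

```python
import numpy as np

def gradient_descent_step(w, b, x, y, lr):
    """One parameter update for the model y_hat = w * x + b under an MSE cost."""
    y_hat = w * x + b
    error = y_hat - y
    # Gradients of mean((y_hat - y) ** 2) with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Move each parameter a small step against its gradient
    return w - lr * grad_w, b - lr * grad_b

# Hypothetical toy data generated from y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0  # arbitrary starting parameters
for _ in range(2000):
    w, b = gradient_descent_step(w, b, x, y, lr=0.05)

print(w, b)
```

On this toy data the parameters should approach w = 2 and b = 1, the values used to generate it. A learning rate that is too large would make the updates diverge instead, so the value here is a choice that suits this particular data.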

## Types of Cost Functions

There are various types of cost functions used in machine learning, each with its own uses and characteristics. We'll go over some of the most commonly used: Mean Squared Error (MSE) and Cross-Entropy, along with Log Loss, the name under which binary cross-entropy often appears.

### Mean Squared Error (MSE)

MSE is commonly used in regression problems. It averages the squared differences between the actual and predicted values, so the penalty grows quadratically with the size of each error.

Mathematically, for N instances, where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for instance $i$, it is defined as:

$$MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

In [1]:
import numpy as np

def mse(y, y_hat):
    """
    This function computes the mean squared error (MSE).

    Args:
    y : numpy array, true values
    y_hat : numpy array, predicted values
    
    Returns:
    float, Mean Squared Error
    """
    return ((y - y_hat) ** 2).mean()
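
As a quick illustration of the quadratic penalty, the sketch below repeats the `mse` function so that it runs on its own and compares two sets of predictions that differ only in how far one value misses. The arrays are made-up numbers chosen for easy arithmetic.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error between true values y and predictions y_hat."""
    return ((y - y_hat) ** 2).mean()

y = np.array([3.0, 5.0, 7.0])
small_miss = np.array([4.0, 5.0, 7.0])  # first prediction is off by 1
large_miss = np.array([5.0, 5.0, 7.0])  # first prediction is off by 2

print(mse(y, small_miss))  # (1**2 + 0 + 0) / 3
print(mse(y, large_miss))  # (2**2 + 0 + 0) / 3
```

Doubling the size of the single error multiplies the MSE by four, which is the quadratic penalty in action.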

### Cross-Entropy

Cross-entropy is the standard cost function for classification problems, both binary and multi-class. It measures the dissimilarity between the true label distribution and the predicted label distribution, and it is smallest when the model assigns high probability to the correct class.

For binary classification, where $y_i$ is the actual class (0 or 1) of instance $i$ and $p_i$ is the predicted probability that instance $i$ belongs to class 1, the formula is:

$$\text{Cross-Entropy} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$$

In [2]:
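def cross_entropy(y, p, eps=1e-15):
    """
    This function computes the binary cross-entropy.

    Args:
    y : numpy array, true labels (0 or 1)
    p : numpy array, predicted probabilities of class 1
    eps : float, small constant that keeps p away from exactly 0 and 1
    
    Returns:
    float, Cross-Entropy
    """
    # Clip so that np.log never receives 0, which would yield -inf or nan
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()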

### Log Loss

Log loss is another name for cross-entropy computed on predicted probabilities, and the term is used most often in binary classification. It penalizes both types of errors (false positives and false negatives), and the penalty grows without bound as a confident prediction moves toward the wrong class.

For binary classification, its formula is identical to the formula for cross-entropy.
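
To see how sharply the penalty grows, the short sketch below evaluates the per-example log loss for a single instance whose true class is 1, where the loss reduces to -log(p). The probability values are arbitrary examples.

```python
import numpy as np

# Per-example log loss when the true label is 1: -log(p),
# where p is the predicted probability of class 1.
probs = np.array([0.9, 0.5, 0.1, 0.01])
penalties = -np.log(probs)

for p, loss in zip(probs, penalties):
    print(f"p = {p:<5} loss = {loss:.3f}")
```

The losses come out to roughly 0.105, 0.693, 2.303, and 4.605. A confident correct prediction costs almost nothing, while a confident wrong one costs many times more than an uncertain guess.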

## Choosing the Right Cost Function

The choice of cost function depends on the problem at hand. Here are some guidelines:

- **Regression problems:** Mean Squared Error or Mean Absolute Error are commonly used.
- **Binary classification problems:** Cross-Entropy (also called Log Loss) is often used.
- **Multi-class classification problems:** Multi-class Cross-Entropy is a good option.

However, these are not hard and fast rules. The best cost function also depends on the specific data and task. For example, if outliers are a concern in a regression problem, you may opt for Mean Absolute Error (which is less sensitive to outliers) instead of Mean Squared Error.
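
The outlier point can be checked with a small experiment. The sketch below searches for the single constant prediction that minimizes each cost on a made-up data set containing one outlier. The helper names `mae` and `mse`, the data, and the search grid are assumptions chosen for illustration.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return np.abs(y - y_hat).mean()

def mse(y, y_hat):
    """Mean squared error."""
    return ((y - y_hat) ** 2).mean()

# Hypothetical targets: four typical values and one outlier
targets = np.array([10.0, 12.0, 11.0, 13.0, 100.0])

# Try every constant prediction from 0 to 100 in steps of 0.1
candidates = np.linspace(0.0, 100.0, 1001)
best_for_mse = candidates[np.argmin([mse(targets, c) for c in candidates])]
best_for_mae = candidates[np.argmin([mae(targets, c) for c in candidates])]

print("Best constant under MSE:", best_for_mse)
print("Best constant under MAE:", best_for_mae)
```

Under MSE the best constant lands at the mean of the targets (29.2), pulled far from the typical values by the outlier. Under MAE it lands at the median (12), which the outlier barely affects.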

## Real-World Examples

Choosing the right cost function can have a significant impact on the performance of the model. Here are some examples of scenarios where different cost functions would be appropriate:

- **Predicting house prices (a regression problem):** Here, we might use the Mean Squared Error. Because errors are squared, a prediction that is off by $100,000 costs 100 times as much as one that is off by a tenth of that amount, which pushes the model to correct its largest errors first.

- **Email spam detection (a binary classification problem):** Cross-Entropy (Log Loss) would be suitable here, as it scores the predicted probability for problems where the output is one of two possible classes (spam or not spam).

- **Digit recognition (a multi-class classification problem):** Multi-class Cross-Entropy could be used here, as we are dealing with multiple classes (ten digits from 0 to 9).
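
For the multi-class case, a minimal sketch of cross-entropy is shown below. It assumes integer class labels and a matrix of predicted probabilities whose rows sum to 1, and it uses the standard form $-\frac{1}{N}\sum_{i=1}^{N} \log p_{i, y_i}$, where $p_{i, y_i}$ is the probability assigned to the correct class of instance $i$. The function name `categorical_cross_entropy` and the three-class example data are illustrative assumptions.

```python
import numpy as np

def categorical_cross_entropy(labels, probs, eps=1e-15):
    """
    Multi-class cross-entropy.

    Args:
    labels : numpy array of integer class indices, shape (N,)
    probs : numpy array of predicted class probabilities, shape (N, K)
    eps : float, lower bound applied to probabilities to avoid log(0)

    Returns:
    float, mean cross-entropy over the N instances
    """
    probs = np.clip(probs, eps, 1.0)
    # Select the probability the model assigned to each instance's true class
    correct_class_probs = probs[np.arange(len(labels)), labels]
    return -np.log(correct_class_probs).mean()

# Hypothetical example with two instances and three classes
labels = np.array([0, 2])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

print(categorical_cross_entropy(labels, probs))
```

Only the probability of the correct class enters the loss for each instance. With two classes, this reduces to the binary cross-entropy formula given earlier.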

## Conclusion

Cost functions play an essential role in the learning process of neural networks and other machine learning models. Understanding different cost functions and their implications can help make informed decisions when designing your machine learning models. It's crucial to match the cost function with the problem type, characteristics of the data, and the specific goals of your model.