# Loss/Cost Functions and Their Types

Supervised learning problems in Deep Learning (Artificial Neural Networks) fall into two broad categories:

1. Regression (Output will be a continuous value)

2. Classification (Output will be a categorical value)

Different loss/cost functions are used for Regression and Classification. Before proceeding, it's important to understand the difference between Loss and Cost functions.

## Difference between Loss and Cost Function

For example, consider a neural network that must be trained on 100 records. If we pass the records one at a time, we compute the error for each record individually and then repeat for the next one; this per-record error is the loss function. Doing this for every record is cumbersome. To avoid it, we can pass 10 records together as a single batch, so that 10 batches cover the whole dataset. The error aggregated over all records in a batch is the cost function.

In simple words, 

| Loss Function | Cost Function |
| :---: | :---: |
| It applies to a single record given as input | It applies to a batch of records passed together |
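The distinction can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the sample values, and the choice to average the per-record losses over the batch are all assumptions made for the example.

```python
def squared_error_loss(y_true, y_pred):
    """Loss for a single record: 1/2 * (y - y_hat)^2."""
    return 0.5 * (y_true - y_pred) ** 2


def batch_cost(y_true_batch, y_pred_batch):
    """Cost for a batch: the per-record losses averaged over the batch (assumed aggregation)."""
    losses = [squared_error_loss(y, y_hat)
              for y, y_hat in zip(y_true_batch, y_pred_batch)]
    return sum(losses) / len(losses)


# Hypothetical targets and predictions for a batch of 4 records.
targets = [3.0, 5.0, 2.0, 7.0]
predictions = [2.0, 5.0, 4.0, 6.0]

per_record = [squared_error_loss(y, p) for y, p in zip(targets, predictions)]
cost = batch_cost(targets, predictions)

print(per_record)  # one loss value per record
print(cost)        # a single cost value for the whole batch
```

The loss gives one number per record, while the cost collapses the whole batch into a single number that an optimizer can minimize.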

## Types of Loss/Cost Functions w.r.t Regression

Since regression produces a continuous output, the loss or cost function must work with continuous values. Three are most commonly used:

1. Mean Squared Error (MSE)

2. Mean Absolute Error (MAE)

3. Huber Loss

### 1. Mean Squared Error (MSE)

$\displaystyle \text{Loss Function} = \frac{1}{2} (y - \hat{y})^2$

The loss function is a quadratic equation. Plotting the loss against the prediction error gives a parabola (a U-shaped curve), and gradient descent moves along this curve toward its minimum.

$\displaystyle \text{Cost Function} = \frac{1}{2n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$

where $n$ is the number of records in the batch.

**Advantages:**
- Differentiable
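- It has only one minimum with respect to the prediction (the local minimum is also the global minimum)
- It usually converges faster, because the gradient shrinks as the error approaches zero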

**Disadvantages:**
- Not robust to outliers (squaring magnifies large errors, so outliers are penalized heavily and shift the best fit line toward them)
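To make the outlier sensitivity concrete, here is a small sketch that computes the MSE cost with and without one extreme record. The function name and the data are hypothetical, and the cost is assumed to be averaged over the batch with the same 1/2 factor as the loss above.

```python
def mse_cost(y_true, y_pred):
    """Mean squared error with the 1/2 factor: 1/(2n) * sum((y - y_hat)^2)."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / (2 * n)


# Hypothetical data: every prediction is off by 0.5.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 2.5, 3.5]
clean = mse_cost(y_true, y_pred)

# Add one outlier record whose prediction is off by 45.
with_outlier = mse_cost(y_true + [50.0], y_pred + [5.0])

print(clean)
print(with_outlier)
```

Because the error of the outlier is squared, that single record dominates the cost for the whole batch.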

### 2. Mean Absolute Error (MAE)

$\displaystyle \text{Loss Function} = |y - \hat{y}|$

$\displaystyle \text{Cost Function} = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|$

**Advantages:**
- Robust to outliers (the error is not squared, so an outlier contributes only linearly)

**Disadvantages:**
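- Time consuming to optimize (the absolute value is not differentiable at zero, so sub-gradient methods are needed, and the gradient does not shrink as the error approaches zero)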
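A minimal sketch of the MAE cost on a small batch that contains one extreme record. The function name and the data are hypothetical, and the cost is assumed to be the average absolute error over the batch.

```python
def mae_cost(y_true, y_pred):
    """Mean absolute error: 1/n * sum(|y - y_hat|)."""
    n = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n


# Hypothetical data: every prediction is off by 0.5.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 2.5, 3.5]
clean = mae_cost(y_true, y_pred)

# Add one outlier record whose prediction is off by 45.
with_outlier = mae_cost(y_true + [50.0], y_pred + [5.0])

print(clean)
print(with_outlier)
```

The outlier still raises the cost, but only in proportion to its error rather than to the square of its error.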

### 3. Huber Loss

Huber loss is a combination of Mean Squared Error (MSE) and Mean Absolute Error (MAE): it is quadratic for small errors and linear for large ones.

It relies on a hyperparameter $\displaystyle \delta$, which sets the threshold beyond which an error is treated as an outlier. The Huber loss function is defined by two conditions:

**Loss Function if there are no outliers**:

$\displaystyle \text{Loss Function} = \frac{1}{2} (y - \hat{y})^2$

if $|y - \hat{y}| \leq \delta$


**Loss Function if there are outliers**:

$\displaystyle \text{Loss Function} = \delta |y - \hat{y}| - \frac{1}{2} \delta^2$

if $|y - \hat{y}| > \delta$
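The two branches can be written as a single function. This is a sketch under stated assumptions: the function name and the default `delta=1.0` are chosen for illustration only, and the linear branch is assumed to apply whenever the absolute error exceeds $\delta$.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Huber loss for one record; delta is the outlier threshold (assumed default 1.0)."""
    error = abs(y_true - y_pred)
    if error <= delta:
        # Small error: quadratic branch, same as the MSE loss.
        return 0.5 * error ** 2
    # Large error: linear branch, grows like the MAE loss.
    return delta * error - 0.5 * delta ** 2


small = huber_loss(3.0, 2.5)    # |error| = 0.5, quadratic branch
large = huber_loss(50.0, 5.0)   # |error| = 45, linear branch

print(small)
print(large)
```

At $|y - \hat{y}| = \delta$ both branches give $\frac{1}{2}\delta^2$, so the loss is continuous where it switches from quadratic to linear.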



## Types of Loss/Cost Functions w.r.t Classification

Since classification produces a categorical output, the loss or cost function must work with class labels and predicted probabilities. Two are most commonly used:

1. Binary Cross Entropy (for Binary Classification)

2. Categorical Cross Entropy (for Multiclass Classification)

### 1. Binary Cross Entropy a.k.a Log loss

Since this is binary classification, we use the Sigmoid activation function at the output layer. This loss only applies when the output has exactly two classes: 0 and 1.

The image below shows the loss function:

![Image](https://arize.com/wp-content/uploads/2022/11/log-loss-1.png)
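For a single record with label $y \in \{0, 1\}$ and predicted probability $\hat{y}$, the standard binary cross entropy is $-[y \ln(\hat{y}) + (1 - y) \ln(1 - \hat{y})]$. A minimal sketch follows; the function names and the clipping constant `eps` (used to avoid taking the log of zero) are assumptions made for this example.

```python
import math


def sigmoid(z):
    """Sigmoid activation: maps a raw output score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Log loss for one record; y_true is 0 or 1, y_pred is a probability."""
    # Clip the probability so that log(0) is never evaluated.
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


confident_right = binary_cross_entropy(1, 0.9)
confident_wrong = binary_cross_entropy(1, 0.1)

print(confident_right)
print(confident_wrong)
```

A confident wrong prediction is penalized far more heavily than a confident correct one, which is the behavior the log terms are meant to produce.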

### 2. Categorical Cross Entropy

This loss is used for multi-class classification problems, where there are more than 2 output classes. Since it's multi-class classification, the Softmax activation function should be used at the output layer.

$\displaystyle \text{Loss}(x_i, y_i) = - \sum_{j=1}^c y_{ij} \ln(\hat{y}_{ij})$

where

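- $y_i = [y_{i1}, y_{i2}, y_{i3}, \ldots, y_{ic}]$ is the one-hot encoded target for record $i$

- $y_{ij}$ = 1 if record $i$ belongs to class $j$, 0 otherwise

- $\displaystyle \hat{y}_{ij}$ is the predicted probability of class $j$ for record $i$, produced by the Softmax activation function (applied at the output layer since it's multi-class classification)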
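The formula can be sketched in plain Python for a single record. The function names, the sample scores, and the clipping constant `eps` are assumptions made for illustration; the Softmax here subtracts the maximum score before exponentiating, a common trick for numerical stability.

```python
import math


def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one record: -sum_j y_j * ln(y_hat_j), with y_true one-hot encoded."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))


# Hypothetical output-layer scores for 3 classes; the true class is the first.
scores = [2.0, 1.0, 0.1]
one_hot = [1, 0, 0]

probs = softmax(scores)
loss = categorical_cross_entropy(one_hot, probs)

print(probs)
print(loss)
```

Because the target is one-hot, only the term for the true class survives the sum, so the loss reduces to the negative log of the probability assigned to the correct class.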