
In [1]:
import numpy as np
import tensorflow as tf
print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.7.0


# Loss Functions
Given an input vector, a loss function measures how badly a model performs at predicting a desired output quantity (regression) or at correctly labeling the input vector (classification).


## Mean Squared Error (MSE)

Mean Squared Error is the most commonly used regression loss function. It is defined as the mean of the squared differences between the desired and the predicted outputs.

$$\large MSE=\frac{1}{N}\sum_{i=1}^{N} {(t_i-y_i)}^2$$

where $N$ is the total number of samples, $t_i$ is the desired output for sample $i$, and $y_i$ is the predicted output for sample $i$.
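As a worked example of the formula, take $N = 3$ hypothetical samples with desired outputs $t = [1, 0, 1]$ and predictions $y = [0.8, 0.3, 0.4]$ (values made up for illustration). The squared differences are $0.04$, $0.09$ and $0.36$, so $MSE = 0.49/3 \approx 0.163$. The same arithmetic as a short NumPy sketch:

```python
import numpy as np

# Hypothetical desired outputs t_i and predictions y_i, for illustration only
t = np.array([1.0, 0.0, 1.0])
y = np.array([0.8, 0.3, 0.4])

# Squared difference for each sample, then the mean over the N = 3 samples
squared_diffs = (t - y) ** 2
mse_value = squared_diffs.mean()
print("Squared differences:", squared_diffs)
print("MSE:", mse_value)
```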

In [2]:
y_true = tf.Variable(np.random.randint(0, 2, size=(2, 3)))
y_pred = tf.Variable(np.random.random(size=(2, 3)))

y_true = tf.cast(y_true, y_pred.dtype)

def mean_squared_error(y_true, y_pred):
  return tf.reduce_mean(tf.math.square(y_pred - y_true))

print("Mean Squared Error :: ", mean_squared_error(y_true, y_pred).numpy())

# Mean Squared Error using Keras API
mse = tf.keras.losses.MeanSquaredError()
print("Mean Squared Error using Keras API ::", mse(y_true, y_pred).numpy())

Mean Squared Error ::  0.2618975787629521
Mean Squared Error using Keras API :: 0.26189759373664856


## Mean Absolute Error (MAE)

Mean Absolute Error is another common loss function for regression models. It is defined as the mean of the absolute differences between the desired and the predicted outputs.

$$\large MAE=\frac{1}{N}\sum_{i=1}^{N} {|t_i-y_i|}$$

where $N$ is the total number of samples, $t_i$ is the desired output for sample $i$, and $y_i$ is the predicted output for sample $i$.
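As a worked example, take $N = 3$ hypothetical samples with desired outputs $t = [1, 0, 1]$ and predictions $y = [0.8, 0.3, 0.4]$ (values made up for illustration). The absolute differences are $0.2$, $0.3$ and $0.6$, so $MAE = 1.1/3 \approx 0.367$. The same arithmetic in NumPy:

```python
import numpy as np

# Hypothetical desired outputs t_i and predictions y_i, for illustration only
t = np.array([1.0, 0.0, 1.0])
y = np.array([0.8, 0.3, 0.4])

# Absolute difference for each sample, then the mean over the N = 3 samples
abs_diffs = np.abs(t - y)
mae_value = abs_diffs.mean()
print("Absolute differences:", abs_diffs)
print("MAE:", mae_value)
```

Unlike MSE, the largest error (0.6) enters linearly rather than being squared, so MAE is less dominated by a few large errors.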

In [3]:
y_true = tf.Variable(np.random.randint(0, 2, size=(2, 3)))
y_pred = tf.Variable(np.random.random(size=(2, 3)))

y_true = tf.cast(y_true, y_pred.dtype)

def mean_absolute_error(y_true, y_pred):
  return tf.reduce_mean(tf.math.abs(y_pred - y_true))

print("Mean Absolute Error :: ", mean_absolute_error(y_true, y_pred).numpy())

# Mean Absolute Error using Keras API
mae = tf.keras.losses.MeanAbsoluteError()
print("Mean Absolute Error using Keras API ::", mae(y_true, y_pred).numpy())

Mean Absolute Error ::  0.5366752401709879
Mean Absolute Error using Keras API :: 0.536675214767456


## Hinge (Multiclass Support Vector Machine) Loss

The hinge loss is used for classification and is based on the concept of maximum-margin classification. The hinge loss for sample number $\large s$ is:

$$\large L_s=\sum_{j \neq s_t}^{C} \max(0,\, y_j-y_{s_t}+\Delta)$$

where $\large s$ is the sample number, $\large C$ is the number of classes, $\large y_j$ is the model's score for class $j$, $\large s_t$ is the index of the true class for sample $\large s$, and $\Delta$ is a constant margin.


The total loss across all the samples can be calculated as:

$$\large L=\frac {1}{N} \sum_{s} L_s$$


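where $N$ is the number of samples, $L$ is the total loss over all samples, and $s$ is the sample number.

Note that `tf.keras.losses.Hinge`, used below, is the binary hinge loss rather than the multiclass formula above: it converts 0/1 labels to -1/+1 and averages $\max(0, 1 - t \cdot y)$ over the last axis. For one-hot multiclass targets, Keras provides `tf.keras.losses.CategoricalHinge`.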
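The multiclass formula itself can be sketched directly in NumPy. The scores, the labels and the choice $\Delta = 1$ below are illustrative assumptions, and `multiclass_hinge` is a hypothetical helper, not a TensorFlow function.

```python
import numpy as np

def multiclass_hinge(scores, true_idx, delta=1.0):
  """Average multiclass hinge loss L = (1/N) * sum_s L_s.

  scores:   (N, C) array of class scores y_j for each sample
  true_idx: (N,) array of true class indices s_t
  delta:    margin constant (1.0 is an illustrative choice)
  """
  n = scores.shape[0]
  # Score of the true class for every sample, shape (N, 1)
  true_scores = scores[np.arange(n), true_idx][:, None]
  # max(0, y_j - y_{s_t} + delta) for every class j
  margins = np.maximum(0.0, scores - true_scores + delta)
  # The j == s_t term would contribute exactly delta, so drop it from the sum
  margins[np.arange(n), true_idx] = 0.0
  per_sample = margins.sum(axis=1)  # L_s for each sample
  return per_sample.mean(), per_sample

# Hypothetical scores for N = 2 samples and C = 3 classes
scores = np.array([[3.2, 5.1, -1.7],
                   [1.3, 4.9, 2.0]])
true_idx = np.array([0, 1])

loss, per_sample = multiclass_hinge(scores, true_idx)
print("Per-sample L_s:", per_sample)
print("Total loss L:", loss)
```

For the first sample, only class 1 violates the margin ($5.1 - 3.2 + 1 = 2.9$); the second sample's true score beats every other class by more than $\Delta$, so its loss is 0.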




In [4]:
y_true = tf.Variable([[0., 1.], [0., 0.]])
y_pred = tf.Variable([[0.6, 0.4], [0.4, 0.6]])

hinge_loss = tf.keras.losses.Hinge()

print("Hinge Loss :: ", hinge_loss(y_true, y_pred).numpy())

Hinge Loss ::  1.3
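To see where the 1.3 comes from, the sketch below applies in NumPy the formula that the TensorFlow documentation gives for `tf.keras.losses.Hinge`: 0/1 labels are converted to -1/+1, then `mean(maximum(1 - y_true * y_pred, 0), axis=-1)` is computed per sample and averaged over the batch. It is a reimplementation for checking the number, not the library's code.

```python
import numpy as np

# Same inputs as the cell above
y_true = np.array([[0., 1.], [0., 0.]])
y_pred = np.array([[0.6, 0.4], [0.4, 0.6]])

# Map 0/1 labels to -1/+1, as the Keras Hinge loss does for binary labels
t = 2.0 * y_true - 1.0
# Element-wise hinge max(0, 1 - t * y), averaged over the last axis
per_sample = np.maximum(0.0, 1.0 - t * y_pred).mean(axis=-1)
# Default reduction: average over the samples in the batch
loss = per_sample.mean()
print("Per-sample:", per_sample)
print("Hinge loss:", loss)
```

Each element is treated as an independent binary label, which is why this value differs from the multiclass sum over the wrong classes.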


## Cross Entropy Loss

The cross entropy loss is used for classification. In the multiclass case, it uses the softmax function to turn the model's outputs into probabilities before computing the loss.

### Softmax Function

The softmax function gives a probabilistic interpretation to the output values and it is formulated as:

$$\large S(i)=\frac{e^{y_i}}{\sum_{j=1}^{C}e^{y_j}}$$

where $\large S(i)$ is the softmax value corresponding to the output $\large y_i$, and $C$ is the number of classes. The softmax function interprets the outputs as unnormalized log probabilities of the classes, and the denominator normalizes the exponentiated values so that they sum to 1.

In other words, the softmax function maps a vector of real numbers to a vector of values between zero and one that add up to 1, with each output proportional to the exponential of the corresponding input.
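The softmax definition can be sketched in NumPy as below. Subtracting the maximum score before exponentiating is a common numerical-stability trick that is not part of the formula above; it leaves the result unchanged because the factor $e^{-\max_j y_j}$ cancels between numerator and denominator. The input scores are made up for illustration.

```python
import numpy as np

def softmax(y):
  """Softmax S(i) = exp(y_i) / sum_j exp(y_j) over the last axis."""
  # Shift by the max: cancels in the ratio but keeps np.exp from overflowing
  shifted = y - np.max(y, axis=-1, keepdims=True)
  exp_y = np.exp(shifted)
  return exp_y / np.sum(exp_y, axis=-1, keepdims=True)

scores = np.array([1.0, 2.0, 3.0])  # hypothetical outputs y_i
probs = softmax(scores)
print("Probabilities:", probs)
print("Sum:", probs.sum())

# Adding a constant to every score does not change the probabilities
print("Shifted input:", softmax(scores + 1000.0))
```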

Using the softmax function, the cross entropy loss for sample $\large s$ can be calculated as:

$$\large L_s=-\log\left(\frac{e^{y_{s_t}}}{\sum_{j=1}^{C} e^{y_j}}\right)$$

where $\large s$ is the sample number, $C$ is the number of classes, and $\large s_t$ is the index of the true class for sample $\large s$.
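A NumPy sketch of the per-sample loss, using the identity $-\log\frac{e^{y_{s_t}}}{\sum_j e^{y_j}} = -y_{s_t} + \log\sum_j e^{y_j}$. Computing the log-sum-exp with the maximum score factored out is an implementation choice (not part of the formula) that avoids overflow for large scores. The helper name and the scores are hypothetical.

```python
import numpy as np

def cross_entropy_from_scores(scores, true_idx):
  """Per-sample loss L_s = -y_{s_t} + log(sum_j exp(y_j)).

  scores:   (C,) array of unnormalized scores y_j for one sample
  true_idx: index s_t of the true class
  """
  m = np.max(scores)
  # log(sum_j exp(y_j)) computed as m + log(sum_j exp(y_j - m))
  log_sum_exp = m + np.log(np.sum(np.exp(scores - m)))
  return log_sum_exp - scores[true_idx]

scores = np.array([2.0, 1.0, 0.1])
loss = cross_entropy_from_scores(scores, 0)

# Naive version taken straight from the formula, fine for small scores
naive = -np.log(np.exp(scores[0]) / np.sum(np.exp(scores)))
print("Stable:", loss, "Naive:", naive)

# Large scores: np.exp(1000.0) would overflow, the shifted form does not
big = cross_entropy_from_scores(np.array([1000.0, 1001.0]), 1)
print("Large scores:", big)
```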


This equation is a special case of the cross entropy between two discrete distributions.

Suppose we have a true discrete distribution $p$ and an estimated discrete distribution $q$. The cross entropy between these two distributions is defined as:

$$\large H(p,q)= - \sum_{x}p(x)\log(q(x))$$

In a multiclass classification problem, the true distribution $p$ is zero everywhere except at the correct class $s_t$, where it equals 1:

$$\large p=[0, 0, \ldots, 1, \ldots, 0]$$

Substituting this $p$ into the general cross entropy equation leaves only the term for the correct class, $-\log(q(s_t))$. Taking $q$ to be the softmax of the outputs gives the simplified cross entropy loss:

$$\large L_s=-\log\left(\frac{e^{y_{s_t}}}{\sum_{j=1}^{C} e^{y_j}}\right)$$

where $\large s_t$ is the index of the correct class.

To calculate the overall loss, we still need to average the per-sample losses over all samples:

$$\large L=\frac {1}{N} \sum_{s} L_s$$

where $N$ is the number of samples and $L_s$ is the cross entropy loss for sample $s$.
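The reduction from the general cross entropy to the simplified loss, and the final averaging, can be checked numerically. The sketch below assumes made-up scores for $N = 2$ samples and $C = 3$ classes, builds the one-hot $p$ for each sample, and takes $q$ as the softmax of the scores.

```python
import numpy as np

def softmax(y):
  # Row-wise softmax with the max subtracted for numerical stability
  e = np.exp(y - y.max(axis=-1, keepdims=True))
  return e / e.sum(axis=-1, keepdims=True)

# Hypothetical scores for N = 2 samples and C = 3 classes
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
true_idx = np.array([0, 1])

q = softmax(scores)          # estimated distribution for each sample
p = np.eye(3)[true_idx]      # one-hot true distribution for each sample

# General cross entropy H(p, q) = -sum_x p(x) log q(x), one value per sample
h_general = -np.sum(p * np.log(q), axis=1)
# Simplified form: -log of the softmax probability of the true class
h_simplified = -np.log(q[np.arange(len(true_idx)), true_idx])

# Overall loss L: average of L_s over the N samples
L = h_simplified.mean()
print("General:", h_general)
print("Simplified:", h_simplified)
print("Overall loss:", L)
```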

**Note:** There is no "softmax loss"; the correct term is "cross entropy loss", which uses the softmax function to calculate the loss.

Cross entropy can be applied to both binary and multiclass classification tasks.

In [5]:
y_true = tf.Variable([0, 1, 0, 0])
y_pred = tf.Variable([-18.6, 0.51, 2.94, -12.8])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

print("Binary Cross Entropy Loss : ", bce(y_true, y_pred).numpy())

Binary Cross Entropy Loss :  0.865458


The `from_logits=True` argument tells the loss function that the values produced by the model are not normalized, i.e. they are logits. In other words, no sigmoid (for binary cross entropy) or softmax (for categorical cross entropy) has been applied to turn them into probabilities. [Learn more](https://datascience.stackexchange.com/questions/73093/what-does-from-logits-true-do-in-sparsecategoricalcrossentropy-loss-function)
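As a check on the value printed by the binary cross entropy cell, the sketch below recomputes it in NumPy in two ways: directly from the logits, using the numerically stable form $\max(x, 0) - x z + \log(1 + e^{-|x|})$ documented for `tf.nn.sigmoid_cross_entropy_with_logits`, and by first applying the sigmoid and then using $-[z \log p + (1 - z)\log(1 - p)]$. Both are averaged over the four elements. This reimplements the math for illustration; it is not the Keras source.

```python
import numpy as np

z = np.array([0.0, 1.0, 0.0, 0.0])        # labels from the example above
x = np.array([-18.6, 0.51, 2.94, -12.8])  # logits (unnormalized outputs)

# 1) Directly from the logits, numerically stable form
from_logits = np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

# 2) Apply the sigmoid first, then the probability form of the loss
p = 1.0 / (1.0 + np.exp(-x))
from_probs = -(z * np.log(p) + (1.0 - z) * np.log(1.0 - p))

print("From logits:", from_logits.mean())
print("From probabilities:", from_probs.mean())
```

The two routes agree to floating-point precision, which is why the flag matters: it tells the loss which of the two kinds of input it is receiving.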

TensorFlow provides several cross entropy loss functions, so **how do you choose the right one?**

[See this Stack Overflow discussion.](https://stackoverflow.com/questions/47034888/how-to-choose-cross-entropy-loss-in-tensorflow)

# References

https://www.tensorflow.org/api_docs/python/tf/math/reduce_mean

https://www.tensorflow.org/api_docs/python/tf/losses

https://www.tensorflow.org/api_docs/python/tf/nn

https://datascience.stackexchange.com/questions/73093/what-does-from-logits-true-do-in-sparsecategoricalcrossentropy-loss-function

https://stackoverflow.com/questions/47034888/how-to-choose-cross-entropy-loss-in-tensorflow

https://github.com/farhadkamangar/CSE5368

https://cognitiveclass.ai/courses/course-v1:BigDataUniversity+ML0120EN+v2
