https://neptune.ai/blog/keras-loss-functions

In [29]:
import tensorflow as tf
import tensorflow_addons as tfa
import numpy as np
from tensorflow.keras.layers import Dense, Activation, InputLayer
from tensorflow.keras import Sequential
from tensorflow.keras.losses import SparseCategoricalCrossentropy

In [4]:
model = Sequential()
model.add(InputLayer(input_shape=(10,)))
model.add(Dense(units=64, kernel_initializer='uniform'))
model.add(Activation('softmax'))

In [5]:
# The model ends in a softmax, so it outputs probabilities, not logits
loss_fn = SparseCategoricalCrossentropy(from_logits=False)
model.compile(loss=loss_fn, optimizer='adam')

In [6]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 64)                704       
_________________________________________________________________
activation (Activation)      (None, 64)                0         
Total params: 704
Trainable params: 704
Non-trainable params: 0
_________________________________________________________________


## Binary Classification

### Binary Cross Entropy

`BinaryCrossentropy` computes the cross-entropy loss between the true labels and the predicted probabilities; for each sample, the element-wise losses are averaged over the last axis. By default, the `sum_over_batch_size` reduction is used, which means the loss returns the average of the per-sample losses in the batch.

In [7]:
y_true = [[0., 1.0], [0.2, 0.8],[0.3, 0.7],[0.4, 0.6]]
y_pred = [[0.6, 0.4], [0.4, 0.6],[0.6, 0.4],[0.8, 0.2]]
bce = tf.keras.losses.BinaryCrossentropy(reduction='sum_over_batch_size')
print(bce(y_true, y_pred).numpy())

0.83944494


With `reduction='sum'`, the loss returns the sum of the per-sample losses in the batch instead.

In [8]:
y_true = [[0., 1.0], [0.2, 0.8],[0.3, 0.7],[0.4, 0.6]]
y_pred = [[0.6, 0.4], [0.4, 0.6],[0.6, 0.4],[0.8, 0.2]]
bce = tf.keras.losses.BinaryCrossentropy(reduction='sum')
print(bce(y_true, y_pred).numpy())

3.3577797


With `reduction='none'`, the loss returns the full array of per-sample losses.

In [9]:
y_true = [[0., 1.0], [0.2, 0.8],[0.3, 0.7],[0.4, 0.6]]
y_pred = [[0.6, 0.4], [0.4, 0.6],[0.6, 0.4],[0.8, 0.2]]
bce = tf.keras.losses.BinaryCrossentropy(reduction='none')
print(bce(y_true, y_pred).numpy())

[0.9162905  0.5919184  0.79465103 1.0549197 ]
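To see where these three numbers come from, you can recompute them by hand. The sketch below is plain Python rather than TensorFlow: for each sample it averages $-(y \log p + (1 - y) \log(1 - p))$ over the last axis, then applies the three reductions. It skips the small epsilon clipping Keras applies to the predictions, so it should agree with the outputs above only to about six decimal places.

```python
import math

y_true = [[0., 1.0], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
y_pred = [[0.6, 0.4], [0.4, 0.6], [0.6, 0.4], [0.8, 0.2]]


def bce_per_sample(true_row, pred_row):
    """Binary cross-entropy averaged over the last axis of one sample."""
    terms = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
             for y, p in zip(true_row, pred_row)]
    return sum(terms) / len(terms)


per_sample = [bce_per_sample(t, p) for t, p in zip(y_true, y_pred)]

loss_none = per_sample                          # reduction='none'
loss_sum = sum(per_sample)                      # reduction='sum'
loss_mean = sum(per_sample) / len(per_sample)   # reduction='sum_over_batch_size'

print(loss_none, loss_sum, loss_mean)
```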


## Multiclass classification

### Categorical Crossentropy

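The `CategoricalCrossentropy` class also computes the cross-entropy loss between the true classes and the predicted classes. It expects the labels in one-hot format (for example `[0, 1, 0]` for class 1) and computes $-\sum_j y_{true,j} \log(y_{pred,j})$ for each sample. The example below passes arbitrary non-negative values instead of one-hot rows; the formula still applies, but in practice each row of `y_true` should be a one-hot vector or a probability distribution.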

In [10]:
y_true = [[0.1, 1.0, 0.8], [0.1, 0.9, 0.1],[0.2, 0.7, 0.1],[0.3, 0.1, 0.6]]
y_pred = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2],[0.7, 0.1, 0.2],[0.8, 0.1, 0.1]]
cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(y_true, y_pred).numpy())

1.8131356
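A minimal plain-Python sketch of the same computation: each sample's loss is $-\sum_j y_{true,j} \log(y_{pred,j})$, and the default reduction averages over the batch. Keras first rescales each prediction row to sum to 1 (these rows already do) and clips by a small epsilon, which the sketch leaves out. The last line scores a genuine one-hot label, where only the true class's term survives.

```python
import math

y_true = [[0.1, 1.0, 0.8], [0.1, 0.9, 0.1], [0.2, 0.7, 0.1], [0.3, 0.1, 0.6]]
y_pred = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.7, 0.1, 0.2], [0.8, 0.1, 0.1]]


def cce_per_sample(true_row, pred_row):
    """-sum(y_true * log(y_pred)) for one sample."""
    # Rescale the predictions so they sum to 1 (these rows already do)
    total = sum(pred_row)
    return -sum(t * math.log(p / total) for t, p in zip(true_row, pred_row))


per_sample = [cce_per_sample(t, p) for t, p in zip(y_true, y_pred)]
loss = sum(per_sample) / len(per_sample)  # default sum_over_batch_size reduction
print(loss)

# With a one-hot label, the loss reduces to -log(probability of the true class)
one_hot_loss = cce_per_sample([0.0, 1.0, 0.0], [0.2, 0.6, 0.2])
print(one_hot_loss)
```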


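### Sparse Categorical Crossentropy

When the labels are integer class indices rather than one-hot vectors, use `SparseCategoricalCrossentropy`. In the example below, `y_true = [0, 1, 2]` stands for the one-hot rows `[1, 0, 0]`, `[0, 1, 0]`, and `[0, 0, 1]`.

In [11]: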
y_true = [0, 1, 2]
y_pred = [[0.95, 0.05, 0], [0.1, 0.8, 0.1],[0.1, 0.8, 0.1]]
scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(scce(y_true, y_pred).numpy())

0.85900736
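To check that the sparse and one-hot forms describe the same loss, this plain-Python sketch computes both: the sparse form reads off the predicted probability of the true class, while the dense form expands each index into a one-hot row and applies the categorical formula. Only the standard library is assumed.

```python
import math

y_true = [0, 1, 2]  # integer class indices
y_pred = [[0.95, 0.05, 0.0], [0.1, 0.8, 0.1], [0.1, 0.8, 0.1]]

# Sparse form: only the predicted probability of the true class is used
sparse_losses = [-math.log(row[label]) for label, row in zip(y_true, y_pred)]

# Dense form: expand each index into a one-hot row, then apply the categorical formula
one_hot = [[1.0 if j == label else 0.0 for j in range(len(row))]
           for label, row in zip(y_true, y_pred)]
dense_losses = [
    # Zero-label terms contribute nothing; skipping them also avoids log(0)
    -sum(t * math.log(p) for t, p in zip(t_row, p_row) if t > 0)
    for t_row, p_row in zip(one_hot, y_pred)
]

sparse_loss = sum(sparse_losses) / len(sparse_losses)
dense_loss = sum(dense_losses) / len(dense_losses)
print(sparse_loss, dense_loss)
```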


### The Poisson Loss

You can also use the `Poisson` class to compute the Poisson loss, `mean(y_pred - y_true * log(y_pred))`. It's a good choice when the targets are counts that follow a Poisson distribution, for example the number of calls a call center receives per hour.

In [12]:
y_true = [[0.1, 1.0, 0.8], [0.1, 0.9, 0.1],[0.2, 0.7, 0.1],[0.3, 0.1, 0.6]]
y_pred = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2],[0.7, 0.1, 0.2],[0.8, 0.1, 0.1]]

p = tf.keras.losses.Poisson()
print(p(y_true, y_pred).numpy())

0.9377117
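A plain-Python sketch of the per-sample formula, `mean(y_pred - y_true * log(y_pred + EPS))`, averaged over the batch. `EPS = 1e-7` is an assumption standing in for Keras's default epsilon. The last lines use a hypothetical single count of 3 to show that the loss is smallest when the prediction equals the observed count.

```python
import math

EPS = 1e-7  # assumption: stands in for Keras's default epsilon

y_true = [[0.1, 1.0, 0.8], [0.1, 0.9, 0.1], [0.2, 0.7, 0.1], [0.3, 0.1, 0.6]]
y_pred = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.7, 0.1, 0.2], [0.8, 0.1, 0.1]]


def poisson_per_sample(true_row, pred_row):
    """mean(y_pred - y_true * log(y_pred)) over the last axis."""
    terms = [p - t * math.log(p + EPS) for t, p in zip(true_row, pred_row)]
    return sum(terms) / len(terms)


per_sample = [poisson_per_sample(t, p) for t, p in zip(y_true, y_pred)]
loss = sum(per_sample) / len(per_sample)
print(loss)

# Hypothetical observed count of 3: compare predictions of 2, 3 and 4
count_losses = [poisson_per_sample([3.0], [p]) for p in (2.0, 3.0, 4.0)]
print(count_losses)
```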


### Kullback-Leibler Divergence Loss

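The Kullback-Leibler divergence (relative entropy) can be computed with the `KLDivergence` class, which returns $\sum_j y_{true,j} \log(y_{true,j} / y_{pred,j})$ for each sample. KL divergence is not symmetric, so strictly speaking it is not a distance metric. The PyTorch documentation for `KLDivLoss` describes it as follows: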

***KL divergence** is a useful distance measure for continuous distributions and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions.*

In [13]:
y_true = [[0.1, 1.0, 0.8], [0.1, 0.9, 0.1],[0.2, 0.7, 0.1],[0.3, 0.1, 0.6]]
y_pred = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2],[0.7, 0.1, 0.2],[0.8, 0.1, 0.1]]
kl = tf.keras.losses.KLDivergence()
kl(y_true, y_pred).numpy()

1.1471658
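The same divergence can be sketched in plain Python, clipping both inputs to `[EPS, 1]` before taking the ratio (the `EPS = 1e-7` value is an assumption meant to mirror Keras's default epsilon). It also evaluates the divergence with the arguments swapped, which gives a different value.

```python
import math

EPS = 1e-7  # assumption: mirrors Keras's default epsilon used for clipping


def kl_per_sample(true_row, pred_row):
    """sum(y_true * log(y_true / y_pred)), with both clipped to [EPS, 1]."""
    total = 0.0
    for t, p in zip(true_row, pred_row):
        t = min(max(t, EPS), 1.0)
        p = min(max(p, EPS), 1.0)
        total += t * math.log(t / p)
    return total


def kl_loss(y_true, y_pred):
    """Per-sample divergence averaged over the batch."""
    per_sample = [kl_per_sample(t, p) for t, p in zip(y_true, y_pred)]
    return sum(per_sample) / len(per_sample)


y_true = [[0.1, 1.0, 0.8], [0.1, 0.9, 0.1], [0.2, 0.7, 0.1], [0.3, 0.1, 0.6]]
y_pred = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.7, 0.1, 0.2], [0.8, 0.1, 0.1]]

forward = kl_loss(y_true, y_pred)
backward = kl_loss(y_pred, y_true)  # swapping the arguments changes the result
print(forward, backward)
```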

## Object Detection

### The Focal Loss

In classification problems with imbalanced data, and in object detection problems, you can use the Focal Loss. It adjusts the cross-entropy criterion by reshaping it so that the loss assigned to well-classified examples is down-weighted. This helps the model learn from the minority class instead of being dominated by the many easy majority-class examples.

Concretely, the cross-entropy is multiplied by a modulating factor $(1 - p_t)^\gamma$, where $p_t$ is the predicted probability of the true class and $\gamma \ge 0$. The factor decays to zero as the confidence in the correct class increases, so during training it down-weights the contribution of easy samples and focuses on the hard ones. With an optional class-balancing weight $\alpha_t$, the loss is

$$FL(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)$$

In [14]:
y_true = [[1.0], [1.0], [0.0]]     # ground-truth binary labels
y_pred = [[0.97], [0.91], [0.03]]  # predicted probabilities
sfc = tfa.losses.SigmoidFocalCrossEntropy()
print(sfc(y_true, y_pred).numpy())

[6.8532745e-06 1.9097870e-04 2.0559824e-05]
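The down-weighting can be seen by comparing plain cross-entropy with the focal loss on a few binary predictions. This plain-Python sketch assumes the TensorFlow Addons defaults `alpha=0.25` and `gamma=2.0`; the labels and probabilities are illustrative, and the focal values should match `SigmoidFocalCrossEntropy` only up to small epsilon effects.

```python
import math


def binary_cross_entropy(y, p):
    """Plain cross-entropy for a binary label y and predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sigmoid_focal_loss(y, p, alpha=0.25, gamma=2.0):
    """Focal loss: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = y * p + (1 - y) * (1 - p)              # probability given to the true class
    alpha_t = y * alpha + (1 - y) * (1 - alpha)  # class-balancing weight
    return alpha_t * (1 - p_t) ** gamma * binary_cross_entropy(y, p)


labels = [1.0, 1.0, 0.0]    # illustrative binary labels
probs = [0.97, 0.91, 0.03]  # illustrative predicted probabilities

ce_losses = [binary_cross_entropy(y, p) for y, p in zip(labels, probs)]
focal_losses = [sigmoid_focal_loss(y, p) for y, p in zip(labels, probs)]

for ce, fl in zip(ce_losses, focal_losses):
    print(f"cross-entropy {ce:.6f}  focal {fl:.3e}")
```

All three examples are already well classified, so the focal loss shrinks each of them by more than two orders of magnitude relative to plain cross-entropy.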


### Generalized Intersection over Union

The `GIoULoss` class from TensorFlow Addons implements the Generalized Intersection over Union loss. Intersection over Union (IoU) is a very common metric in object detection, but it is a poor training signal for non-overlapping bounding boxes: IoU is zero for every pair of disjoint boxes, no matter how far apart they are, so it gives no indication of how to move a predicted box closer to its target.

Generalized Intersection over Union (GIoU) was introduced to address this. It keeps the scale-invariant property of IoU, encodes the shape properties of the compared objects into the region property, and stays strongly correlated with IoU when the objects overlap. It subtracts from IoU the fraction of the smallest enclosing box $C$ that is not covered by the union of the two boxes $A$ and $B$, and the loss is $1 - GIoU$:

$$GIoU = IoU - \frac{|C \setminus (A \cup B)|}{|C|}$$

In [17]:
gl = tfa.losses.GIoULoss()
boxes1 = [[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]]
boxes2 = [[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0]]

loss = gl(boxes1, boxes2)
print(loss)

tf.Tensor(1.5041667, shape=(), dtype=float32)
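A plain-Python sketch of the GIoU loss for a single pair of boxes, assuming each box is given as `[y_min, x_min, y_max, x_max]` (the order TensorFlow Addons' `GIoULoss` documents) and has positive area. Averaging over the two pairs from the example reproduces the mean-reduced value; the second pair has an IoU of 0, yet it still gets a loss that reflects how far apart the boxes are.

```python
def giou_loss(box_a, box_b):
    """1 - GIoU for two boxes given as [y_min, x_min, y_max, x_max]."""
    ay0, ax0, ay1, ax1 = box_a
    by0, bx0, by1, bx1 = box_b
    area_a = (ay1 - ay0) * (ax1 - ax0)
    area_b = (by1 - by0) * (bx1 - bx0)

    # Intersection: clamp at zero so disjoint boxes get an empty overlap
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter = inter_h * inter_w
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box C enclosing both boxes
    enclose = (max(ay1, by1) - min(ay0, by0)) * (max(ax1, bx1) - min(ax0, bx0))

    # GIoU subtracts the fraction of C not covered by the union
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou


boxes1 = [[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]]
boxes2 = [[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0]]

pair_losses = [giou_loss(a, b) for a, b in zip(boxes1, boxes2)]
loss = sum(pair_losses) / len(pair_losses)  # mean over the batch
print(pair_losses, loss)
```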


## Regression

### Mean Squared Error

The `MeanSquaredError` class computes the mean of the squared errors between the predictions and the true values.

Because the errors are squared, large errors are penalized much more than small ones, so use Mean Squared Error when large errors are especially undesirable.

In [21]:
y_true = [12.0, 20.0, 29.0, 60.0]
y_pred = [14.0, 18.0, 27.0, 55.0]

mse = tf.keras.losses.MeanSquaredError()
print(mse(y_true, y_pred).numpy())

9.25
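As a worked check of the 9.25 above, the four errors $y_{true} - y_{pred}$ are $-2$, $2$, $2$, and $5$:

$$MSE = \frac{(-2)^2 + 2^2 + 2^2 + 5^2}{4} = \frac{4 + 4 + 4 + 25}{4} = \frac{37}{4} = 9.25$$

The single error of 5 contributes 25 of the 37, which is what penalizing large errors more means in practice.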


### Mean Absolute Percentage Error

The mean absolute percentage error over $n$ values is

$$loss = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_{true,i} - y_{pred,i}}{y_{true,i}} \right|$$

In Keras it is computed with the `MeanAbsolutePercentageError` class.

In [22]:
y_true = [12.0, 20.0, 29.0, 60.0]
y_pred = [14.0, 18.0, 27.0, 55.0]
mape = tf.keras.losses.MeanAbsolutePercentageError()
print(mape(y_true, y_pred).numpy())

10.474138
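Plugging in the four values reproduces the output above:

$$loss = \frac{100}{4} \left( \frac{|12 - 14|}{12} + \frac{|20 - 18|}{20} + \frac{|29 - 27|}{29} + \frac{|60 - 55|}{60} \right) \approx 25 \times (0.16667 + 0.10000 + 0.06897 + 0.08333) \approx 10.474$$

Because each error is measured relative to its target, the error of 2 on the smallest target (12) weighs exactly twice as much as the error of 5 on the largest target (60).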


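### Mean Squared Logarithmic Error

The `MeanSquaredLogarithmicError` class computes the mean of the squared differences between $\log(1 + y_{pred})$ and $\log(1 + y_{true})$. Because it compares logarithms, for large values it depends mainly on the ratio between prediction and target rather than on their absolute difference.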

In [23]:
y_true = [12.0, 20.0, 29.0, 60.0]
y_pred = [14.0, 18.0, 27.0, 55.0]
msle = tf.keras.losses.MeanSquaredLogarithmicError()
print(msle(y_true, y_pred).numpy())

0.010642167
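A plain-Python sketch of the same formula, using `math.log1p` for $\log(1 + y)$. The second part compares a 20% over-prediction at two scales (hypothetical values 10 vs 12 and 1000 vs 1200): the squared error grows by a factor of 10,000, while the MSLE changes only modestly.

```python
import math

y_true = [12.0, 20.0, 29.0, 60.0]
y_pred = [14.0, 18.0, 27.0, 55.0]


def msle(y_true, y_pred):
    """mean((log(1 + y_pred) - log(1 + y_true)) ** 2)"""
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(sq) / len(sq)


print(msle(y_true, y_pred))

# Same relative error at two very different scales (hypothetical values)
small = msle([10.0], [12.0])
large = msle([1000.0], [1200.0])
print(small, large)
```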


### Cosine Similarity Loss

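If you want to measure the cosine similarity between the true and predicted values, use the `CosineSimilarity` class. As a loss it returns the *negative* of the cosine similarity, averaged over the batch, so values close to -1 mean the vectors point in nearly the same direction and minimizing the loss makes them more similar. The `axis` argument selects the dimension along which the similarity is computed.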

In [25]:
y_true = [[12.0, 20.0], [29.0, 60.0]]
y_pred = [[14.0, 18.0], [27.0, 55.0]]
cosine_loss = tf.keras.losses.CosineSimilarity(axis=1)
print(cosine_loss(y_true, y_pred).numpy())

-0.9963575
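A plain-Python sketch of the same computation. It computes the cosine similarity of each row pair, averages over the batch, and negates the result so that more similar vectors give a lower loss. The final line shows that cosine similarity ignores magnitude: a vector and its double have similarity 1.

```python
import math

y_true = [[12.0, 20.0], [29.0, 60.0]]
y_pred = [[14.0, 18.0], [27.0, 55.0]]


def cosine_similarity(a, b):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Negated similarity, averaged over the batch (one row per sample, axis=1)
sims = [cosine_similarity(t, p) for t, p in zip(y_true, y_pred)]
loss = -sum(sims) / len(sims)
print(sims, loss)

# Scaling a vector does not change its direction
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```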

### LogCosh Loss

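The `LogCosh` class computes the logarithm of the hyperbolic cosine of the prediction error. For small errors $x$, $\log(\cosh(x))$ is approximately $x^2 / 2$, and for large errors it is approximately $|x| - \log(2)$, so it behaves like the mean squared error near zero while being less affected by occasional large errors.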

In [26]:
y_true = [[12.0, 20.0], [29.0, 60.0]]
y_pred = [[14.0, 18.0], [27.0, 55.0]]

l = tf.keras.losses.LogCosh()
print(l(y_true, y_pred).numpy())

2.0704765
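A plain-Python sketch that averages $\log(\cosh(e))$ over each sample's errors and then over the batch. It also checks the two regimes of the function: close to $e^2 / 2$ for a small error and close to $|e| - \log(2)$ for a large one. `math.cosh` overflows for errors above roughly 710, so this direct form is only meant for illustration.

```python
import math

y_true = [[12.0, 20.0], [29.0, 60.0]]
y_pred = [[14.0, 18.0], [27.0, 55.0]]


def log_cosh(x):
    """log(cosh(x)); fine for moderate x, overflows for |x| above ~710."""
    return math.log(math.cosh(x))


# Average over each sample's errors, then over the batch
per_sample = []
for t_row, p_row in zip(y_true, y_pred):
    errors = [p - t for t, p in zip(t_row, p_row)]
    per_sample.append(sum(log_cosh(e) for e in errors) / len(errors))
loss = sum(per_sample) / len(per_sample)
print(loss)

# Small error: close to x**2 / 2; large error: close to |x| - log(2)
print(log_cosh(0.1), 0.1 ** 2 / 2)
print(log_cosh(20.0), 20.0 - math.log(2))
```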

### Huber loss

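When the loss should be less sensitive to outliers than the mean squared error, the Huber loss is a common choice. It is quadratic for errors $e$ smaller than `delta` ($\delta$, 1.0 by default) and linear beyond it:

$$L_\delta(e) = \begin{cases} \frac{1}{2} e^2 & \text{if } |e| \le \delta \\ \delta |e| - \frac{1}{2} \delta^2 & \text{otherwise} \end{cases}$$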

In [27]:
y_true = [12.0, 20.0, 29.0, 60.0]
y_pred = [14.0, 18.0, 27.0, 55.0]

h = tf.keras.losses.Huber()
print(h(y_true, y_pred).numpy())

2.25
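A plain-Python sketch of the Huber loss with a `delta` threshold (the Keras argument of the same name defaults to 1.0). With the default, every error in the example exceeds `delta`, so each term is linear; with a large `delta`, every term is quadratic and the result is half the mean squared error.

```python
def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    losses = []
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            losses.append(0.5 * err ** 2)
        else:
            losses.append(delta * err - 0.5 * delta ** 2)
    return sum(losses) / len(losses)


y_true = [12.0, 20.0, 29.0, 60.0]
y_pred = [14.0, 18.0, 27.0, 55.0]

print(huber(y_true, y_pred))              # all errors exceed delta=1: linear terms
print(huber(y_true, y_pred, delta=10.0))  # all errors within delta: half the MSE of 9.25
```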


## Creating custom loss functions in Keras

A custom loss function can be created by defining a function that takes the true values and the predicted values as its two required parameters and returns one loss value per sample (Keras then applies the reduction to get a scalar). The function can then be passed to `model.compile`.

In [28]:
def custom_loss_function(y_true, y_pred):
    # Per-sample mean of the squared differences (equivalent to MSE)
    squared_difference = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_difference, axis=-1)

model.compile(optimizer='adam', loss=custom_loss_function)

In [30]:
y_true = [12.0, 20.0, 29.0, 60.0]
y_pred = [14.0, 18.0, 27.0, 55.0]

cl = custom_loss_function(np.array(y_true), np.array(y_pred))
print(cl.numpy())

9.25


## Use of Keras loss weights

During training, the loss can be weighted per observation or sample. The weights can be arbitrary, but a typical choice is class weights derived from the label distribution: each observation is weighted by the inverse of its class's frequency, so that observations from minority classes contribute more to the loss.

One way to do this is to pass class weights to `model.fit` through its `class_weight` argument.

The weights are passed as a dictionary that maps each class index to its weight. You can compute the weights with scikit-learn or based on your own criterion.

In [31]:
weights = { 
 0: 1.01300017,
 1: 0.88994364,
 2: 1.00704935,
 3: 0.97863318,
 4: 1.02704553,
 5: 1.10680686,
 6: 1.01385603,
 7: 0.95770152,
 8: 1.02546573,
 9: 1.00857287
}
# model.fit(x_train, y_train, verbose=1, epochs=10,class_weight=weights)
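A minimal sketch of one common criterion, the "balanced" heuristic used by scikit-learn's `compute_class_weight(class_weight='balanced', ...)`: each class gets `n_samples / (n_classes * count)`. The label list is hypothetical and deliberately imbalanced; the resulting dictionary has the shape `class_weight` expects.

```python
from collections import Counter


def balanced_class_weights(labels):
    """n_samples / (n_classes * count) for each class, as in scikit-learn's 'balanced' mode."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * counts[cls]) for cls in sorted(counts)}


# Hypothetical, imbalanced labels used only for illustration
labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]
balanced_weights = balanced_class_weights(labels)
print(balanced_weights)
# model.fit(x_train, y_train, epochs=10, class_weight=balanced_weights)
```

With these weights, every class contributes the same total weight (`n_samples / n_classes`), regardless of how many samples it has.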

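A second weighting option at the compile stage is `loss_weights`. Note, however, that it does not weight classes: it scales the loss contribution of each model *output*, which is mainly useful for multi-output models, and a list must contain one entry per output. This model has a single output, so it takes a single weight; per-class weighting belongs in `class_weight` (or in per-sample weights passed to `model.fit` through `sample_weight`).

In [32]:
model.compile(optimizer=tf.keras.optimizers.SGD(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              loss_weights=[1.0],  # one scalar per model output
              metrics=['accuracy'])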