date: 2020-12-09 12:17:17
author: Jerry Su
slug: CategoricalCrossentropy VS SparseCategoricalCrossentropy
title: CategoricalCrossentropy VS SparseCategoricalCrossentropy
category: 
tags: Deep Learning, TensorFlow

## SparseCategoricalCrossentropy

Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as **integers** (class indices). If you want to provide labels in a **one_hot** representation, please use the **CategoricalCrossentropy** loss instead. There should be `num_classes` floating point values per feature in `y_pred` and a single class index per feature in `y_true`.
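As a minimal sketch of the difference, the two losses should agree when the same targets are encoded as class indices for one and as one-hot vectors for the other. The logits and labels below are made-up illustrative values, not taken from any real model.

```python
import tensorflow as tf

# Illustrative values (assumed for this sketch): 2 samples, 3 classes.
logits = tf.constant([[2.0, 4.0, 5.0], [1.0, 4.0, 0.0]])
int_labels = tf.constant([1, 2])                   # class indices, shape (2,)
one_hot_labels = tf.one_hot(int_labels, depth=3)   # one-hot rows, shape (2, 3)

sparse_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
dense_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

# Integer labels go to the sparse loss, one-hot labels to the dense one.
sparse_value = float(sparse_loss(int_labels, logits))
dense_value = float(dense_loss(one_hot_labels, logits))
print(sparse_value, dense_value)
```

The sparse variant avoids materializing the one-hot matrix, which matters mostly when the number of classes is large.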

[tf.keras.losses.SparseCategoricalCrossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy)

[tf.keras.losses.Reduction](https://www.tensorflow.org/api_docs/python/tf/keras/losses/Reduction)
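The `reduction` argument controls whether the loss object returns one value per example or a single aggregated scalar. The sketch below compares the two on made-up values. It passes the string `"none"`, which is assumed here to be equivalent to `tf.keras.losses.Reduction.NONE` in TensorFlow 2.x.

```python
import tensorflow as tf

# Illustrative values (assumed for this sketch): 2 samples, 3 classes.
logits = tf.constant([[2.0, 4.0, 5.0], [1.0, 4.0, 0.0]])
labels = tf.constant([1, 2])

# reduction="none" keeps one loss value per example.
per_example_fn = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction="none"
)
# The default reduction averages over the batch and returns a scalar.
mean_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

per_example = per_example_fn(labels, logits)   # shape (2,)
mean_loss = mean_fn(labels, logits)            # shape ()
print(per_example.shape, mean_loss.shape)
```

Keeping the per-example values is what makes it possible to mask or re-weight individual tokens before averaging.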

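```
import warnings

import tensorflow as tf
# shape_list is a helper from Hugging Face transformers that returns a tensor's shape as a list
from transformers.modeling_tf_utils import shape_list


class TFTokenClassificationLoss: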
    """
    Loss function suitable for token classification.
    
    .. note::
        Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.
    """

    def compute_loss(self, labels, logits):
        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=True, reduction=tf.keras.losses.Reduction.NONE
        )
        # make sure only labels that are not equal to -100
        # are taken into account as loss
        if tf.math.reduce_any(labels == -1):
            warnings.warn("Using `-1` to mask the loss for the token is deprecated. Please use `-100` instead.")
            active_loss = tf.reshape(labels, (-1,)) != -1
        else:
            active_loss = tf.reshape(labels, (-1,)) != -100
        reduced_logits = tf.boolean_mask(tf.reshape(logits, (-1, shape_list(logits)[2])), active_loss)
        labels = tf.boolean_mask(tf.reshape(labels, (-1,)), active_loss)

        return loss_fn(labels, reduced_logits)
```

In [1]:
import tensorflow as tf

In [2]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

In [3]:
labels = tf.constant([[[1, 2, 0, -100], [2, 1, -100, -100]]])
labels

<tf.Tensor: shape=(1, 2, 4), dtype=int32, numpy=
array([[[   1,    2,    0, -100],
        [   2,    1, -100, -100]]], dtype=int32)>

In [4]:
logits = tf.constant([[[2, 4, 5, 0], [1, 4, 0, 0]]])
logits

<tf.Tensor: shape=(1, 2, 4), dtype=int32, numpy=
array([[[2, 4, 5, 0],
        [1, 4, 0, 0]]], dtype=int32)>

In [5]:
tf.reshape(labels, (-1,))

<tf.Tensor: shape=(8,), dtype=int32, numpy=array([   1,    2,    0, -100,    2,    1, -100, -100], dtype=int32)>

In [6]:
active_loss = tf.reshape(labels, (-1,)) != -100
active_loss

<tf.Tensor: shape=(8,), dtype=bool, numpy=array([ True,  True,  True, False,  True,  True, False, False])>

In [7]:
reduced_logits = tf.boolean_mask(tf.reshape(logits, (-1,)), active_loss)
reduced_logits

<tf.Tensor: shape=(5,), dtype=int32, numpy=array([2, 4, 5, 1, 4], dtype=int32)>

In [8]:
labels = tf.boolean_mask(tf.reshape(labels, (-1,)), active_loss)
labels

<tf.Tensor: shape=(5,), dtype=int32, numpy=array([1, 2, 0, 2, 1], dtype=int32)>

In [9]:
loss_fn(y_true=labels, y_pred=reduced_logits)

ValueError: Shape mismatch: The shape of labels (received (5,)) should equal the shape of logits except for the last dimension (received (1, 5)).
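The error comes from the logits, not from the loss function. `SparseCategoricalCrossentropy` expects labels of shape `(N,)` and logits of shape `(N, num_classes)`, but in the session above `logits` has the same shape as `labels` and is then flattened with `(-1,)`, which drops the class axis. The sketch below redoes the walkthrough under assumed shapes: 2 sequences, 4 tokens each, 3 classes. The logits are arbitrary deterministic values chosen only for illustration.

```python
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction="none"
)

# Assumed shapes for this sketch: labels (batch, seq_len), logits (batch, seq_len, num_classes).
labels = tf.constant([[1, 2, 0, -100], [2, 1, -100, -100]])           # (2, 4)
logits = tf.reshape(tf.range(24, dtype=tf.float32), (2, 4, 3)) / 10.0  # (2, 4, 3)

num_classes = logits.shape[-1]
flat_labels = tf.reshape(labels, (-1,))               # (8,)
flat_logits = tf.reshape(logits, (-1, num_classes))   # (8, 3), class axis preserved

# Keep only the tokens whose label is not the ignore index -100.
active = tf.not_equal(flat_labels, -100)
active_labels = tf.boolean_mask(flat_labels, active)  # (5,)
active_logits = tf.boolean_mask(flat_logits, active)  # (5, 3)

token_losses = loss_fn(active_labels, active_logits)  # one loss per active token
print(token_losses.shape)
```

With the class axis kept, the five remaining labels line up with five rows of logits, and the loss returns one value per unmasked token.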