<a href="https://colab.research.google.com/github/maciejskorski/nn_hessian_intialization/blob/master/categorical_crossentropy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Categorical Cross Entropy Loss

Cross-entropy loss compares two distributions: the predicted one and the ground truth.

When predicting one-out-of-many categories, the loss formula simplifies greatly because the true distribution is a point mass on the correct class. It is also better to use class indices rather than one-hot-encoded vectors. Essentially, **lookups replace vector multiplications**!
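To make the "lookup instead of multiplication" point concrete, here is a minimal NumPy sketch. The names `toy_probs` and `toy_labels` and their values are made up for illustration and are not part of the notebook.

```python
import numpy as np

# Hypothetical toy values, chosen only for illustration.
toy_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.3, 0.6]])   # predicted distributions, shape [batch_size, n_classes]
toy_labels = np.array([0, 2])             # true class indices, shape [batch_size]

# One-hot route: build a [batch_size, n_classes] matrix, multiply elementwise and sum.
one_hot = np.eye(toy_probs.shape[1])[toy_labels]
loss_one_hot = -np.sum(one_hot * np.log(toy_probs), axis=1)

# Index route: pick the probability of the true class directly.
loss_lookup = -np.log(toy_probs[np.arange(len(toy_labels)), toy_labels])

print(np.allclose(loss_one_hot, loss_lookup))
```

Both routes give the same per-example loss, but the second never materializes the one-hot matrix.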

---


To compute Categorical Cross Entropy Loss, we need only

*   Logits: unnormalized log-probabilities from the network output (the last layer, before softmax!), of shape [batch_size,n_classes]
*   True labels, of shape [batch_size]

Note that this is much more efficient than pushing the logits through softmax and then applying standard cross-entropy.
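The shortcut rests on the identity log softmax(z)_y = z_y - logsumexp(z), so the loss for true class y is logsumexp(z) - z_y. The NumPy sketch below compares the two routes on random inputs. The function names and the variables `z` and `y` are assumptions made for this illustration only.

```python
import numpy as np

def softmax_then_cross_entropy(z, y):
    # Two-step route: normalize to probabilities, then take -log of the true class.
    shifted = z - z.max(axis=1, keepdims=True)  # shift by the row maximum for stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(y)), y])

def logsumexp_minus_lookup(z, y):
    # One-step route: log-sum-exp of the logits minus the logit of the true class.
    m = z.max(axis=1)
    lse = m + np.log(np.exp(z - m[:, None]).sum(axis=1))
    return lse - z[np.arange(len(y)), y]

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 10))       # logits, shape [batch_size, n_classes]
y = rng.integers(0, 10, size=4)    # true labels, shape [batch_size]

print(np.allclose(softmax_then_cross_entropy(z, y), logsumexp_minus_lookup(z, y)))
```

The one-step route skips computing the full probability matrix and its logarithm.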

We discuss **custom implementation** because 


*   it is actually simple (log-sum-exp and a lookup, that's it!), but sadly not well known
*   TensorFlow has bugs when it comes to the second derivative of its built-in loss, see https://github.com/andrewharp/tensorflow/commit/42add47d50ae7572dbd3a4b69711b800a429ec1b

## TF bug

In [1]:
%tensorflow_version 1.x

import tensorflow as tf

tf.reset_default_graph()

n_batch = 1
n_classes = 10

labels = tf.random.uniform(shape=[n_batch],minval=0,maxval=n_classes,dtype=tf.int32)
logits = tf.random.normal(shape=[n_batch,n_classes])

loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,logits=logits)
loss = tf.reduce_mean(loss)

# asking for the second derivative of the built-in loss raises LookupError
tf.hessians(loss,logits)

TensorFlow 1.x selected.


LookupError: ignored
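
For reference, the quantity the failing call was meant to produce has a simple closed form for a single example: the Hessian of the loss with respect to the logits is diag(p) - p p^T, where p = softmax(logits). The NumPy sketch below checks this against finite differences. The names `loss_fn`, `analytic_hessian` and `numeric_hessian`, and the toy inputs, are assumptions for illustration and do not come from the notebook.

```python
import numpy as np

def loss_fn(z, y):
    # Cross-entropy for one example: log-sum-exp of the logits minus the true-class logit.
    m = z.max()
    return m + np.log(np.exp(z - m).sum()) - z[y]

def analytic_hessian(z):
    # Closed form diag(p) - p p^T with p = softmax(z); it does not depend on the label.
    p = np.exp(z - z.max())
    p /= p.sum()
    return np.diag(p) - np.outer(p, p)

def numeric_hessian(z, y, eps=1e-4):
    # Central finite differences, used here only as an independent check.
    n = len(z)
    H = np.zeros((n, n))
    I = np.eye(n) * eps
    for i in range(n):
        for j in range(n):
            H[i, j] = (loss_fn(z + I[i] + I[j], y) - loss_fn(z + I[i] - I[j], y)
                       - loss_fn(z - I[i] + I[j], y) + loss_fn(z - I[i] - I[j], y)) / (4 * eps**2)
    return H

rng = np.random.default_rng(0)
z = rng.normal(size=10)   # hypothetical logits for a single example (n_classes = 10)
y = 3                     # hypothetical true class

print(np.allclose(analytic_hessian(z), numeric_hessian(z, y), atol=1e-5))
```

Each row of this Hessian sums to zero, because shifting all logits by the same constant leaves the loss unchanged.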

## Custom implementation



In [0]:
## implementation

def SparseCategoricalCrossentropy(labels,logits):
  ''' labels: shape [n_batch], true classes as integers from 0 to n_classes-1
      logits: shape [n_batch,n_classes], unnormalized log-probabilities
      returns: per-example loss, shape [n_batch] '''
  # log of the softmax normalizer, one value per example
  Z = tf.reduce_logsumexp(logits,axis=-1)
  # pairs (row index, true class) used to look up the logit of the true class
  lookup_labels = tf.stack([tf.range(tf.shape(labels)[0]),tf.cast(labels,tf.int32)],1)
  true_logits = tf.gather_nd(logits,lookup_labels,batch_dims=0)
  return -true_logits + Z

In [3]:
## test that it gives the same results as TF

loss1 = SparseCategoricalCrossentropy(labels,logits)
loss1 = tf.reduce_mean(loss1)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  val = sess.run([loss,loss1])
  print(val)

[1.0842774, 1.0842774]
