The previous computation: $\sum_i x_i \log(p_i)$ &ensp;&ensp;&ensp;        (1)&ensp;   where the $x_i$ are the counts

Probabilities: $p_i = \frac{\exp(y_i)}{\sum_j \exp(y_j)}$ &ensp;&ensp;&ensp;      (2)&ensp;   where the $y_i$ are the logits

Computing eq(1) directly is numerically unstable, because $p_i$ can underflow to 0 and $\log(p_i)$ then becomes $\log(0)$. For the same reason, tf.nn.softmax_cross_entropy_with_logits() takes logits rather than probabilities.
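
The underflow can be reproduced without TensorFlow. The following is a minimal sketch in plain Python (double precision); `naive_softmax` and the logits are hypothetical names and values chosen for illustration, not part of the notebook's code.

```python
import math

def naive_softmax(logits):
    """Softmax computed directly from eq(2), with no stabilisation."""
    exps = [math.exp(y) for y in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: one entry lies far below the others.
probs = naive_softmax([-1000.0, 0.0, 0.0])
print(probs)  # the first probability has underflowed to exactly 0.0

try:
    math.log(probs[0])
except ValueError as err:
    # math.log raises on 0; array libraries typically return -inf instead.
    print("log(p_0) failed:", err)
```

Once a probability has been rounded to 0, the information in its logit is lost, so no later step can recover the correct log-probability.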

Substituting eq(2) for $p_i$, eq(1) becomes $\sum_i x_i \left(y_i - \log\sum_j \exp(y_j)\right)$
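
Spelled out step by step, the substitution looks as follows. The index $j$ inside the normalizer is a relabeling introduced here to keep it distinct from the outer index $i$.

$$
\begin{aligned}
\log p_i &= y_i - \log\sum_j \exp(y_j) \\
\sum_i x_i \log p_i &= \sum_i x_i \left(y_i - \log\sum_j \exp(y_j)\right) \\
&= \sum_i x_i y_i - \left(\sum_i x_i\right) \log\sum_j \exp(y_j)
\end{aligned}
$$

The last line shows that the logsumexp term does not depend on $i$, so it only needs to be computed once and is then scaled by the total count $\sum_i x_i$.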

The logsumexp term $\log\sum_j \exp(y_j)$ can be computed without overflow, typically by subtracting the largest logit before exponentiating. TensorFlow provides this as reduce_logsumexp, which is numerically stable.
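
As a rough illustration of the idea, here is a sketch of the common max-shift trick in plain Python. It is not TensorFlow's actual implementation, and `logsumexp` and `weighted_log_softmax_sum` are hypothetical helper names used only for this example.

```python
import math

def logsumexp(logits):
    """log(sum_j exp(y_j)), computed by factoring out the largest logit."""
    m = max(logits)
    # After the shift every exponent is <= 0, so exp() cannot overflow,
    # and the largest term contributes exp(0) = 1, so the sum is >= 1.
    return m + math.log(sum(math.exp(y - m) for y in logits))

def weighted_log_softmax_sum(counts, logits):
    """sum_i x_i * (y_i - logsumexp(y)): eq(1) without ever forming p_i."""
    lse = logsumexp(logits)
    return sum(x * (y - lse) for x, y in zip(counts, logits))

# Same counts and logits as the two experiments in the cell below.
print(weighted_log_softmax_sum([1., 2., 3.], [-1., 2., 3.]))
print(weighted_log_softmax_sum([1., 2., 3.], [-1000., 2000., 2000.]))
```

The shift relies on the identity $\log\sum_j \exp(y_j) = m + \log\sum_j \exp(y_j - m)$, which holds for any $m$. Choosing $m = \max_j y_j$ keeps every exponential in $[0, 1]$.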

In [1]:
from tensorflow.python.ops import math_ops
import tensorflow as tf

with tf.Session() as sess:
    counts = [1., 2., 3.]
    logits = [-1., 2., 3.]
    probs = tf.exp(logits) / tf.reduce_sum(tf.exp(logits))
    
    # The previous one
    print('The previous one', sess.run(math_ops.reduce_sum(counts * math_ops.log(probs), -1)))
    
    # Minus cross entropy
    print('Minus Cross entropy', sess.run(-tf.nn.softmax_cross_entropy_with_logits(labels=counts, logits=logits)))
    
    # The current one
    logsumexp = math_ops.reduce_logsumexp(logits, -1, keep_dims=True)
    print('The current one',sess.run(math_ops.reduce_sum(counts * (logits - logsumexp), -1)))

    print('')  # blank line between the two experiments
    # Let some of probs be close to zero
    counts = [1., 2., 3.]
    logits = [-1000., 2000., 2000.]
    probs = tf.exp(logits) / tf.reduce_sum(tf.exp(logits))
    
    # The previous one
    print('The previous one',sess.run(math_ops.reduce_sum(counts * math_ops.log(probs), -1)))
    
    # Minus cross entropy
    print('Minus Cross entropy', sess.run(-tf.nn.softmax_cross_entropy_with_logits(labels=counts, logits=logits)))
    
    # The current one
    logsumexp = math_ops.reduce_logsumexp(logits, -1, keep_dims=True)
    print('The current one',sess.run(math_ops.reduce_sum(counts * (logits - logsumexp), -1)))
    
    

('The previous one', -7.9593754)
('Minus Cross entropy', -7.9593763)
('The current one', -7.9593763)

('The previous one', nan)
('Minus Cross entropy', -3004.1587)
('The current one', -3004.1587)
