# Softmax vs Sigmoid

What is the difference between these two functions? Let's experiment!

In [47]:
import tensorflow as tf
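s = tf.Session()  # TensorFlow 1.x API; under TensorFlow 2.x this needs tf.compat.v1.Session() with eager execution disabled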

example_labels = [0.0, 0.0, 1.0, 0.0]

sm = s.run(tf.nn.softmax([1.0, 2.0, 3.0, 4.0]))
sm_ce = s.run(tf.nn.softmax_cross_entropy_with_logits(logits=[1.0, 2.0, 3.0, 4.0], labels=example_labels))
sig = s.run(tf.nn.sigmoid_cross_entropy_with_logits(logits=[1.0, 2.0, 3.0, 4.0], labels=example_labels))

## What is the difference between these 3 functions?

1) `tf.nn.softmax` just applies the softmax function to an input tensor. The softmax "squishes" the inputs so that every output lies between 0 and 1 and the outputs sum to 1; it's a way of normalizing. The outputs of softmax can therefore be interpreted as probabilities.

In [48]:
print(sm)

[ 0.0320586   0.08714432  0.23688281  0.64391422]
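
To make the normalization concrete, here is a minimal NumPy sketch of the same computation. The `softmax` helper is written for illustration only and is not a TensorFlow function; subtracting the maximum is a common stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(logits):
    """Plain NumPy softmax over a 1-D array (illustrative helper, not a TensorFlow API)."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()  # same result mathematically, but avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

probs = softmax([1.0, 2.0, 3.0, 4.0])
print(probs)        # close to the tf.nn.softmax output above
print(probs.sum())  # the outputs sum to 1 (up to floating-point rounding)
```

Each output is the exponential of one input divided by the sum of all exponentials, which is why larger inputs take a larger share of the total.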


2) `tf.nn.softmax_cross_entropy_with_logits` measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.
The cross entropy is computed after applying the softmax function, but both steps are done together in a numerically more careful way. The result is similar to:
`sm = tf.nn.softmax(x)`
followed by
`ce = cross_entropy(sm, labels)`
where `cross_entropy` is shorthand for `-sum(labels * log(sm))`, not an actual TensorFlow function.

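See also [this Stack Overflow question on the difference between `tf.nn.softmax` and `tf.nn.softmax_cross_entropy_with_logits`](http://stackoverflow.com/questions/34240703/difference-between-tensorflow-tf-nn-softmax-and-tf-nn-softmax-cross-entropy-with)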

In [58]:
print(sm_ce)

# Let's demonstrate that tf.nn.softmax_cross_entropy_with_logits is equivalent to sm = tf.nn.softmax(x) and ce = cross_entropy(sm)

sm = tf.nn.softmax([1.0, 2.0, 3.0, 4.0])
sm_vals = s.run(sm)
print(sm_vals)
# Simple way to calculate the cross entropy for the example above: -sum(label * log(probability))
import math
print(-(math.log(sm_vals[0]) * example_labels[0] +
        math.log(sm_vals[1]) * example_labels[1] +
        math.log(sm_vals[2]) * example_labels[2] +  # Only this term counts, as every other entry of example_labels is 0
        math.log(sm_vals[3]) * example_labels[3]))


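# The sklearn way... log_loss is sklearn's name for cross entropy.
# Caution: with 1-D inputs, log_loss treats these as four separate binary samples,
# not as one 4-class example, so it does not compute the same quantity as above.
# The eps clipping value was hand-tuned so the result lands near 1.44; the outcome
# depends on the scikit-learn version (recent releases have deprecated eps).
from sklearn.metrics import log_loss
ce = log_loss(example_labels, sm_vals, eps=0.845)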
print(ce)


1.44019
[ 0.0320586   0.08714432  0.23688281  0.64391422]
1.44018975034
1.44035237283
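
The "numerically more careful" part can be sketched in NumPy as well. The `softmax_cross_entropy` helper below is an illustrative assumption, not TensorFlow's actual implementation: it uses one common approach, computing the log of the softmax directly with a log-sum-exp shift instead of taking `log` of already-rounded probabilities.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Cross entropy computed from raw logits (illustrative helper, not a TensorFlow API)."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    shifted = logits - logits.max()                          # largest entry becomes 0
    log_softmax = shifted - np.log(np.exp(shifted).sum())    # log(softmax) without forming softmax
    return -(labels * log_softmax).sum()

labels = [0.0, 0.0, 1.0, 0.0]

loss = softmax_cross_entropy([1.0, 2.0, 3.0, 4.0], labels)
print(loss)  # approximately 1.44019, matching the values above

# With very large logits, exp() of the raw values would overflow,
# but the shifted version still returns a finite loss.
big = softmax_cross_entropy([1000.0, 2000.0, 3000.0, 4000.0], labels)
print(big)
```

Computing softmax first and then taking its logarithm would turn tiny probabilities into `log(0)` for inputs like the second example, which is one reason to fuse the two steps.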


3) `tf.nn.sigmoid_cross_entropy_with_logits` measures the probability error in discrete classification tasks in which each class is independent and not mutually exclusive. For instance, one could perform multilabel classification where a picture can contain both an elephant and a dog at the same time. Because every class is scored on its own, the result has one loss value per class rather than a single number.


In [59]:
print(sig)

[ 1.31326175  2.12692809  0.04858735  4.01814985]
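
The four values above can be reproduced by hand. The sketch below assumes the formula given in the TensorFlow documentation for this op, `max(x, 0) - x * z + log(1 + exp(-abs(x)))` with logits `x` and labels `z`, and compares it with the textbook binary cross entropy applied to each class separately. Both helper names are made up for this example.

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    """Stable per-class sigmoid cross entropy (illustrative helper, not a TensorFlow API)."""
    x = np.asarray(logits, dtype=np.float64)
    z = np.asarray(labels, dtype=np.float64)
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

def naive_sigmoid_cross_entropy(logits, labels):
    """Textbook form: -(z * log(p) + (1 - z) * log(1 - p)) with p = sigmoid(x)."""
    x = np.asarray(logits, dtype=np.float64)
    z = np.asarray(labels, dtype=np.float64)
    p = 1.0 / (1.0 + np.exp(-x))
    return -(z * np.log(p) + (1.0 - z) * np.log(1.0 - p))

logits = [1.0, 2.0, 3.0, 4.0]
labels = [0.0, 0.0, 1.0, 0.0]

per_class = sigmoid_cross_entropy(logits, labels)
print(per_class)                                    # one loss per class, close to the output above
print(naive_sigmoid_cross_entropy(logits, labels))  # same values for these moderate logits
```

Unlike the softmax case, the four losses are not coupled: changing one logit changes only its own entry, which is what makes this loss suitable for multilabel problems.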
