# Softmax Tutorial (MNIST Dataset)

* Usually used as the output layer of a neural network for classification.
* Activation function: a_i = exp(z_i) / sum_j exp(z_j), where the sum in the denominator runs over all output neurons j.
* The exponentials ensure that all the output activations are positive, and the sum in the denominator ensures that the softmax outputs sum to 1.
* Softmax can be thought of as rescaling the z_i with the exponential and then normalizing them so that, together, they form a probability distribution.
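The definition above can be checked in plain Python. This is a minimal sketch using only the standard library. Subtracting the maximum z before exponentiating is a common numerical-stability trick added here (it is not part of the text above). It does not change the result, because the common factor exp(-max) cancels between numerator and denominator.

```python
import math

def softmax(z):
    """Return softmax probabilities for a list of scores z."""
    m = max(z)                                # shift by the max so exp() cannot overflow
    exps = [math.exp(z_i - m) for z_i in z]   # every exponential is strictly positive
    total = sum(exps)                         # normalizer: sum of exp over all classes
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(sum(probs), 6))          # 1.0
```

The largest score receives the largest probability, but the smaller scores still get non-zero mass. That is the "soft" part of softmax, compared with a hard argmax.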

* Tutorials:
    * http://neuralnetworksanddeeplearning.com/chap3.html#softmax
    * https://www.tensorflow.org/versions/r1.1/get_started/mnist/beginners
    * http://colah.github.io/posts/2014-10-Visualizing-MNIST/

In [None]:
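# Requires TensorFlow 1.x: tensorflow.examples.tutorials and tf.placeholder were removed in TensorFlow 2.
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf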

### Load MNIST dataset

In [None]:
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
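With `one_hot=True`, each label comes back as a one-hot vector instead of an integer digit. It is a 10-dimensional vector with a 1 at the index of the true digit and 0 everywhere else. The helper `one_hot` below is a hypothetical illustration of that encoding, not part of the TensorFlow API.

```python
def one_hot(label, num_classes=10):
    """Encode an integer class label (0..num_classes-1) as a one-hot list."""
    vec = [0.0] * num_classes
    vec[label] = 1.0  # mark the position of the true class
    return vec

print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

This encoding matches the shape of the softmax output (one entry per class), which is what allows the cost function further down to multiply labels and predictions element-wise.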

### Each input is a 28x28 image, flattened into a 784-dimensional vector (pixel intensities from 0 to 1)

In [None]:
x = tf.placeholder(tf.float32, [None, 784])  # None: this dimension can have any length (here, the number of images fed in)
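Flattening concatenates the 28 rows of the image one after another, so pixel (row r, column c) lands at index 28*r + c of the 784-vector. The sketch below uses a hypothetical synthetic image rather than real MNIST data. `input_data.read_data_sets` already returns the images in this flattened form.

```python
ROWS, COLS = 28, 28

# Hypothetical synthetic image: each pixel value encodes its own position, scaled to [0, 1].
image = [[(r * COLS + c) / (ROWS * COLS - 1) for c in range(COLS)] for r in range(ROWS)]

# Row-major flattening: concatenate the rows one after another.
flat = [pixel for row in image for pixel in row]

print(len(flat))  # 784
r, c = 5, 17
print(flat[r * COLS + c] == image[r][c])  # True
```

Flattening discards the 2-D neighborhood structure of the pixels. Softmax regression treats every pixel as an independent feature anyway, so nothing it uses is lost.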

### Set the variables for the weights and biases

In [None]:
W = tf.Variable(tf.zeros([784, 10]))  # 784 pixels x 10 classes (digits 0 to 9)
b = tf.Variable(tf.zeros([10]))       # one bias per class
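W holds one weight for every (pixel, class) pair and b holds one bias per class, so the layer has 784*10 + 10 = 7850 trainable parameters. Because both start at zero, every class initially receives the same evidence (0) and therefore the same probability, 1/10, for any input. A pure-Python check of both facts, written as a sketch rather than TensorFlow code:

```python
import math

N_PIXELS, N_CLASSES = 784, 10

# One weight per (pixel, class) pair plus one bias per class.
n_params = N_PIXELS * N_CLASSES + N_CLASSES
print(n_params)  # 7850

# Zero-initialized parameters, mirroring tf.zeros([784, 10]) and tf.zeros([10]).
W = [[0.0] * N_CLASSES for _ in range(N_PIXELS)]
b = [0.0] * N_CLASSES

x = [0.5] * N_PIXELS  # an arbitrary input vector; any values give the same result here
evidence = [sum(x[i] * W[i][j] for i in range(N_PIXELS)) + b[j] for j in range(N_CLASSES)]
exps = [math.exp(e) for e in evidence]
total = sum(exps)
probs = [e / total for e in exps]
print(probs[0])  # 0.1
```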

### Softmax function and layer

In [None]:
evidence = tf.matmul(x, W) + b  # logits: weighted evidence for (positive) or against (negative) each class
y = tf.nn.softmax(evidence)  # softmax = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis), with logits = evidence and axis = -1 (the class dimension)
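The shapes are what matter here. x has shape [N, 784] and W has shape [784, 10], so tf.matmul(x, W) has shape [N, 10], and the bias b of shape [10] is added to every row by broadcasting. Softmax is then applied along the last axis, independently for each image. The pure-Python sketch below mirrors these steps on hypothetical toy sizes (2 images, 4 pixels, 3 classes) instead of 784 and 10. The helper names `matmul` and `softmax_rows` are illustrative and are not TensorFlow functions.

```python
import math

def matmul(left, right):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as nested lists."""
    return [[sum(left[i][t] * right[t][j] for t in range(len(right)))
             for j in range(len(right[0]))]
            for i in range(len(left))]

def softmax_rows(z):
    """Apply softmax to each row independently (the last axis)."""
    out = []
    for row in z:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

# Hypothetical toy sizes: 2 images, 4 "pixels", 3 classes (MNIST uses 784 and 10).
x = [[1.0, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.5]]
W = [[ 1.0, -1.0, 0.0],
     [-1.0,  1.0, 0.0],
     [ 0.5,  0.0, 0.0],
     [ 0.0,  0.5, 0.0]]
b = [0.0, 0.0, 0.1]

# evidence = x @ W + b, with b added to every row (broadcasting over the batch dimension).
evidence = [[v + b[j] for j, v in enumerate(row)] for row in matmul(x, W)]
print(evidence)  # [[1.25, -1.0, 0.1], [-1.0, 1.25, 0.1]]

y = softmax_rows(evidence)  # one probability distribution per image
```

Each row of `y` sums to 1, and the predicted class for each image is the index of its largest entry (class 0 for the first image, class 1 for the second).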

### Define Cost Function

In [None]:
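y_ = tf.placeholder(tf.float32, [None, 10])  # one-hot true labels: n images x 10 classes

# Cross-entropy cost: sum over the classes (axis 1), then average over the batch.
# Note: tf.log(y) gives -inf (and then NaN) if a probability underflows to 0; a numerically
# stable alternative works on the logits instead:
# cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=evidence))
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))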
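For one-hot labels, the inner sum keeps only one term, so each example's loss is -log of the probability assigned to its true class, and `reduce_mean` averages that over the batch. The same loss can be computed directly from the logits with the log-sum-exp identity, log softmax(z)_j = z_j - log sum_k exp(z_k), which never takes log(0). TensorFlow's `tf.nn.softmax_cross_entropy_with_logits` takes logits rather than probabilities for this reason. The pure-Python sketch below uses hypothetical logits and labels (not TensorFlow code) and checks that both formulations agree.

```python
import math

def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_probs(y_true, y_pred):
    """-sum_j y_true[j] * log(y_pred[j]) for one example, as in the tf.log(y) formulation."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

def cross_entropy_from_logits(y_true, z):
    """The same loss computed from logits via log-sum-exp, so log(0) never occurs."""
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
    return -sum(t * (v - log_sum_exp) for t, v in zip(y_true, z))

# Hypothetical batch of 2 examples with 3 classes each.
logits = [[2.0, 1.0, 0.1], [0.5, 2.5, -1.0]]
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# Mean over the batch of the per-example sums (what reduce_mean does above).
loss_probs = sum(cross_entropy_from_probs(t, softmax(z)) for t, z in zip(labels, logits)) / len(labels)
loss_logits = sum(cross_entropy_from_logits(t, z) for t, z in zip(labels, logits)) / len(labels)
```

With extreme logits such as [1000.0, 0.0, 0.0], `softmax` returns exact zeros for the last two classes, so `cross_entropy_from_probs` fails in `math.log(0.0)` (the plain-Python analogue of the -inf/NaN from `tf.log(y)`). `cross_entropy_from_logits` returns a loss of 0.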