Inspired by https://www.tensorflow.org/get_started/mnist/beginners

In [1]:
# import
import tensorflow as tf

# import mnist data
# data hosted at http://yann.lecun.com/exdb/mnist/
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz


* A one-hot vector is 0 in most dimensions and 1 in a single dimension. In this case, the digit n is represented as a vector which is 1 in the nth dimension (counting from 0). For example, 3 would be [0,0,0,1,0,0,0,0,0,0].
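As a minimal sketch of the encoding described above, the helpers below build a one-hot vector for a digit and recover the digit from it. The names `one_hot` and `from_one_hot` are illustrative and are not part of TensorFlow.

```python
def one_hot(digit, num_classes=10):
    """Return a list that is 1 at index `digit` and 0 everywhere else."""
    if not 0 <= digit < num_classes:
        raise ValueError("digit must be in the range [0, num_classes)")
    vec = [0] * num_classes
    vec[digit] = 1
    return vec


def from_one_hot(vec):
    """Recover the class index: the position of the largest entry."""
    return vec.index(max(vec))


encoded = one_hot(3)
print(encoded)                # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(from_one_hot(encoded))  # 3
```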

In [2]:
print("lets explore the mnist data that we imported")
print("======================================")
print("train data")
print("======================================")
print("images shape", mnist.train.images.shape)
print("labels shape", mnist.train.labels.shape)
print("train number of examples", mnist.train.num_examples)
print("train epoch completed", mnist.train.epochs_completed) # full passes over this split made so far by next_batch(); 0 before any batch is drawn
print("======================================")
print("same goes for test data")
print("======================================")
print("images shape", mnist.test.images.shape)
print("labels shape", mnist.test.labels.shape)
print("test number of examples", mnist.test.num_examples)
print("test epoch completed", mnist.test.epochs_completed) # full passes over this split made so far by next_batch(); 0 before any batch is drawn
print("======================================")
print("same goes for validation data")
print("======================================")
print("images shape", mnist.validation.images.shape)
print("labels shape", mnist.validation.labels.shape)
print("validation number of examples", mnist.validation.num_examples)
print("validation epoch completed", mnist.validation.epochs_completed) # full passes over this split made so far by next_batch(); 0 before any batch is drawn

lets explore the mnist data that we imported
train data
images shape (55000, 784)
labels shape (55000, 10)
train number of examples 55000
train epoch completed 0
same goes for test data
images shape (10000, 784)
labels shape (10000, 10)
test number of examples 10000
test epoch completed 0
same goes for validation data
images shape (5000, 784)
labels shape (5000, 10)
validation number of examples 5000
validation epoch completed 0


* We can see the data is split into three parts: 55,000 data points of training data (mnist.train), 10,000 points of test data (mnist.test), and 5,000 points of validation data (mnist.validation).
* Every MNIST data point has two parts: an image of a handwritten digit and a corresponding label. We'll call the images 'x' and the labels 'y'.
* mnist.train.images is a tensor (an n-dimensional array) with a shape of [55000, 784]. The first dimension is an index into the list of images and the second dimension is the index for each pixel in each image. Each entry in the tensor is a pixel intensity between 0 and 1, for a particular pixel in a particular image.
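* Each image in MNIST has a corresponding label, a number between 0 and 9 representing the digit drawn in the image. Because the data was loaded with one_hot=True, each label is stored as a one-hot vector, so mnist.train.labels has a shape of [55000, 10].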
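The 784 entries per image come from flattening a 28x28 pixel grid. The sketch below assumes row-major flattening (pixel at row r, column c lands at index r * 28 + c); the helper names `flatten` and `unflatten` are illustrative and a tiny synthetic image stands in for real MNIST data.

```python
ROWS, COLS = 28, 28  # MNIST images are 28x28 pixels, i.e. 784 values


def flatten(image):
    """Turn a ROWS x COLS nested list into one flat list, row by row."""
    return [pixel for row in image for pixel in row]


def unflatten(flat, rows=ROWS, cols=COLS):
    """Rebuild the 2D grid from a flat list of rows * cols pixels."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# Synthetic all-black image with a single bright pixel at row 5, column 7.
image = [[0.0] * COLS for _ in range(ROWS)]
image[5][7] = 1.0

flat = flatten(image)
print(len(flat))            # 784
print(flat[5 * COLS + 7])   # 1.0
```

Reshaping a row of mnist.train.images back to 28x28 in the same way is what makes it possible to display an example as a picture.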

### Softmax Regression (probability distribution over classes)

Every image in MNIST is of a handwritten digit between zero and nine, so there are only ten possible things that a given image can be. We want to be able to look at an image and give the probabilities for it being each digit. For example, our model might look at a picture of a nine and be 80% sure it's a nine, but give a 5% chance to it being an eight (because of the top loop) and a bit of probability to all the others because it isn't 100% sure.

This is a classic case where a softmax regression is a natural, simple model. If you want to assign probabilities to an object being one of several different things, softmax is the thing to do, because softmax gives us a list of values between 0 and 1 that add up to 1.

A softmax regression has two steps: first we add up the evidence of our input being in certain classes, and then we convert that evidence into probabilities.

To tally up the evidence that a given image is in a particular class, we do a weighted sum of the pixel intensities. The weight is negative if that pixel having a high intensity is evidence against the image being in that class, and positive if it is evidence in favor.

We also add some extra evidence called a bias. Basically, we want to be able to say that some things are more likely independent of the input. 
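The two steps described above (tally the evidence, then convert it into probabilities) can be sketched in plain Python. The 3-pixel, 2-class weights and biases below are made-up illustrative values, not trained MNIST parameters, and subtracting the maximum score before exponentiating is a common stability trick rather than something the text prescribes.

```python
import math


def evidence(pixels, weights, biases):
    """Weighted sum of pixel intensities plus a bias, one score per class.

    weights[i][j] is the weight of pixel i for class j, the same layout
    as a [num_pixels, num_classes] matrix.
    """
    num_classes = len(biases)
    return [
        sum(p * w_row[j] for p, w_row in zip(pixels, weights)) + biases[j]
        for j in range(num_classes)
    ]


def softmax(scores):
    """Convert scores into values between 0 and 1 that add up to 1."""
    shift = max(scores)  # does not change the result, avoids overflow
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


pixels = [0.0, 1.0, 0.5]                         # toy "image" with 3 pixels
weights = [[0.2, -0.1], [1.0, -1.0], [-0.5, 0.5]]  # hypothetical values
biases = [0.0, 0.1]                              # hypothetical values

scores = evidence(pixels, weights, biases)
probs = softmax(scores)
print(scores)
print(probs)
```

Positive weights push the score of a class up when the pixel is bright, negative weights push it down, and the bias shifts the score regardless of the input.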



In [3]:
# x isn't a specific value. It's a placeholder, a value that we'll input when we ask TensorFlow to run a computation. 
# We want to be able to input any number of MNIST images, each flattened into a 784-dimensional vector. 
# We represent this as a 2-D tensor of floating-point numbers, with a shape [None, 784]. 
x = tf.placeholder(tf.float32, [None, 784], name="x") # None means that a dimension can be of any length.

# weights and biases will be trainable. Therefore they will be Variable
W = tf.Variable(tf.zeros([784,10]), name="w")
b = tf.Variable(tf.zeros([10]), name="b")

# model
y = tf.matmul(x,W) + b # unnormalized class scores (logits); softmax is applied inside the loss below

# define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, 10], name="y_") # true labels (one-hot)

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

train_step = tf.train.GradientDescentOptimizer(0.1).minimize(cross_entropy)

In [4]:
# now that we have defined the computation graph, the only remaining task is to run it
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# train our model
for _ in range(10000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x:batch_xs, y_:batch_ys})
    
# after our model is run, we need to check the accuracy
# tf.argmax gives you the index of the highest entry in a tensor along some axis. 
# For example, tf.argmax(y,1) is the label our model thinks is most likely for each input, 
# while tf.argmax(y_,1) is the correct label. We can use tf.equal to check if our prediction matches the truth.
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1)) # tf.arg_max is a deprecated alias of tf.argmax

# That gives us a list of booleans. To determine what fraction are correct, we cast to floating point numbers 
# and then take the mean. For example, [True, False, True, True] would become [1,0,1,1] which would become 0.75.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# printing the accuracy
print("Accuracy ", sess.run(accuracy, feed_dict={x:mnist.test.images, y_:mnist.test.labels}))


Accuracy  0.9226
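The accuracy computation above (argmax, compare, cast, mean) can be sketched without TensorFlow. The logits and labels below are made-up values chosen to reproduce the [True, False, True, True] example from the comments; `argmax` and `accuracy` are illustrative helpers, not library functions.

```python
def argmax(values):
    """Index of the largest entry, like tf.argmax along one row."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(logits, one_hot_labels):
    """Fraction of rows where the predicted class equals the true class."""
    correct = [
        argmax(row) == argmax(label)
        for row, label in zip(logits, one_hot_labels)
    ]
    # True counts as 1 and False as 0, so the mean is the fraction correct.
    return sum(correct) / len(correct)


logits = [
    [2.0, 0.1, -1.0],  # predicts class 0
    [0.3, 0.2, 0.9],   # predicts class 2
    [0.0, 1.5, 0.2],   # predicts class 1
    [1.0, 0.9, 0.8],   # predicts class 0
]
labels = [
    [1, 0, 0],  # true class 0 -> correct
    [0, 1, 0],  # true class 1 -> wrong
    [0, 1, 0],  # true class 1 -> correct
    [1, 0, 0],  # true class 0 -> correct
]

print(accuracy(logits, labels))  # 0.75
```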


An accuracy of about 92% is modest for MNIST. One limitation of this model is that flattening the input images discards information about the 2D structure of the image.
Next, we will be using more sophisticated methods.

### Explanation: why we use "softmax_cross_entropy_with_logits" instead of raw calculations of cross_entropy
https://stackoverflow.com/questions/37312421/tensorflow-whats-the-difference-between-sparse-softmax-cross-entropy-with-logi/37317322#37317322
TensorFlow provides both softmax_cross_entropy_with_logits and sparse_softmax_cross_entropy_with_logits. Having two different functions is a convenience, as they produce the same result when given equivalent labels.

The difference is simple:

* For sparse_softmax_cross_entropy_with_logits, labels must have the shape [batch_size] and the dtype int32 or int64. Each label is an int in range [0, num_classes-1].
* For softmax_cross_entropy_with_logits, labels must have the shape [batch_size, num_classes] and dtype float32 or float64.

Labels used in softmax_cross_entropy_with_logits are the one-hot version of labels used in sparse_softmax_cross_entropy_with_logits.
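To illustrate why the two label formats lead to the same loss, here is a small plain-Python sketch for a single example. It is not TensorFlow's implementation; `log_softmax`, `cross_entropy_one_hot` and `cross_entropy_sparse` are hypothetical helper names, and the logits are made-up values.

```python
import math


def log_softmax(logits):
    """Log of the softmax probabilities, computed via the log-sum-exp trick."""
    shift = max(logits)
    log_total = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return [v - log_total for v in logits]


def cross_entropy_one_hot(logits, one_hot_label):
    """Label is a vector of length num_classes (one-hot)."""
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))


def cross_entropy_sparse(logits, class_index):
    """Label is a single integer in the range [0, num_classes - 1]."""
    return -log_softmax(logits)[class_index]


logits = [1.2, -0.3, 2.5]
loss_dense = cross_entropy_one_hot(logits, [0.0, 0.0, 1.0])
loss_sparse = cross_entropy_sparse(logits, 2)
print(loss_dense, loss_sparse)
```

With a one-hot label, every term of the sum except the true class is multiplied by 0, so the dense form reduces to picking out a single log-probability, which is exactly what the sparse form does.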


The raw formulation of cross-entropy is:

* tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.nn.softmax(y)),reduction_indices=[1]))

This can be numerically unstable: for very negative logits, tf.nn.softmax can underflow to exactly 0, tf.log(0) is -inf, and multiplying that by a 0 in y_ gives NaN.

So here we use tf.nn.softmax_cross_entropy_with_logits on the raw logits 'y', and then average across the batch.
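The instability can be reproduced in plain Python. The sketch below compares the raw formulation with one common stable rewrite (the log-sum-exp trick); it is not claimed to be how TensorFlow implements the fused op, and the function names and the extreme logits are illustrative. In Python's math module the log of 0 raises ValueError, whereas floating-point array libraries typically return -inf or NaN instead.

```python
import math


def naive_cross_entropy(logits, class_index):
    """Raw formulation: softmax first, then log of the true-class probability."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[class_index])  # fails if the probability is 0.0


def stable_cross_entropy(logits, class_index):
    """Same quantity computed from the logits, never forming tiny probabilities."""
    shift = max(logits)
    log_total = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return log_total - logits[class_index]


# The true class (index 1) has a very negative logit.
logits = [0.0, -1000.0]

try:
    naive = naive_cross_entropy(logits, 1)
except ValueError:
    naive = None  # exp(-1000) underflowed to 0.0, so log(0.0) is undefined

stable = stable_cross_entropy(logits, 1)
print(naive)   # None
print(stable)  # 1000.0
```

For moderate logits both versions agree, so the stable form only changes the result in the cases where the raw one breaks down.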