# MNIST For ML Beginners 
-------------------------------
[Tutorial link](https://www.tensorflow.org/get_started/mnist/beginners)



### Dependencies
 - python 3.5, 3.6
 - tensorflow 1.2
 - tqdm (pip install tqdm): progress bar for loops


In [None]:
import tensorflow as tf
from tqdm import tqdm_notebook as tqdm
from tensorflow.examples.tutorials.mnist import input_data

In [None]:
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

## Create simple linear classifier
-----------------------------------
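Each 28 x 28 MNIST image is flattened into a 784-dimensional vector. The first dimension of the input placeholder is **'None'** because we feed a batch of images (mini-batch) of arbitrary size to our classification layer. The matrix multiplication maps the [batch, 784] input to a [batch, 10] output, and when the bias `b` of shape [10] is added, TensorFlow applies it to every row of that output. This is called **'broadcasting.'**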

https://www.tensorflow.org/performance/xla/broadcasting

In [None]:
W = tf.Variable(tf.random_uniform([784, 10], -0.1, 0.1))
b = tf.Variable(tf.zeros([10]))
x = tf.placeholder(tf.float32, [None, 784]) # input images
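
The shape bookkeeping above can be checked outside TensorFlow. The following is a minimal NumPy sketch, not part of the original tutorial: the batch size of 4 and the random input values are assumptions chosen only for illustration, while the shapes of `W` and `b` mirror the variables defined above.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size = 4                                # assumed; any mini-batch size works
x = rng.random((batch_size, 784))             # stands in for the placeholder x
W = rng.uniform(-0.1, 0.1, size=(784, 10))    # same shape as the W variable
b = np.zeros(10)                              # same shape as the b variable

z = x @ W        # ordinary matrix product: (4, 784) @ (784, 10) -> (4, 10)
logits = z + b   # broadcasting: b of shape (10,) is added to every row of z

print(z.shape, b.shape, logits.shape)
```

Only the addition of `b` relies on broadcasting; the matrix product itself works for any batch size because the inner dimensions (784) match.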

### Softmax function
----------------------------------
https://en.wikipedia.org/wiki/Softmax_function  
http://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/



\begin{equation*}
\begin{split}
\boldsymbol{y} & = \mathrm{softmax}(W \boldsymbol{x} + \boldsymbol{b}) \\
y_i & = \frac {e^{z_i}} {\sum_{k=1}^K e^{z_k}}  \qquad\textrm{where}\  \boldsymbol{z} = W \boldsymbol{x} + \boldsymbol{b} \\
\end{split}
\end{equation*}


In [None]:
y = tf.nn.softmax(tf.matmul(x, W) + b)
print(y)
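
To make the softmax formula concrete, here is a small NumPy sketch that applies it row by row to hand-picked logits. The helper name `softmax` and the input values are illustrative assumptions; this is a direct transcription of the formula, not how `tf.nn.softmax` is implemented.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax of a 2-D array of logits (direct form of the formula)."""
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

z = np.array([[1.0, 2.0, 3.0],    # unequal logits: the largest one dominates
              [0.0, 0.0, 0.0]])   # equal logits: uniform distribution
y = softmax(z)

print(y)
print(y.sum(axis=1))   # each row sums to 1
```

Each output row is a probability distribution over the classes, which is why it can be compared against one-hot labels with the cross entropy.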

## Define cross entropy
------------------------------

[The entropy](https://en.wikipedia.org/wiki/Entropy_(information_theory)) is defined as follows:



\begin{equation*}
H(\boldsymbol{p}) = -\sum \boldsymbol{p}\log{\boldsymbol{p}} \\
\end{equation*}




[The cross entropy](https://en.wikipedia.org/wiki/Cross_entropy) is defined as follows:



\begin{equation*}
\begin{split}
H(\boldsymbol{p},\boldsymbol{q}) & = H(\boldsymbol{p}) + D_{KL}(\boldsymbol{p}||\boldsymbol{q}) \\
& = -\sum \boldsymbol{p}\log{\boldsymbol{p}} + \sum \boldsymbol{p} \log{\frac{\boldsymbol{p}}{\boldsymbol{q}}} \\
& = -\sum \boldsymbol{p} \log(\boldsymbol{q})
\end{split}
\end{equation*}
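
The identity above can be verified numerically. The sketch below uses two small, hand-picked distributions (the values of `p` and `q` are assumptions for illustration only) and checks that the entropy plus the KL divergence equals the cross entropy.

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])   # "true" distribution (illustrative values)
q = np.array([0.4, 0.4, 0.2])     # model distribution (illustrative values)

entropy = -np.sum(p * np.log(p))         # H(p)
kl = np.sum(p * np.log(p / q))           # D_KL(p || q)
cross_entropy = -np.sum(p * np.log(q))   # H(p, q)

print(entropy + kl, cross_entropy)       # the two numbers agree
```

Because the entropy of the labels does not depend on the model, minimizing the cross entropy with respect to `q` is the same as minimizing the KL divergence.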



## Define loss
-----------------------------
We define our loss function using the cross entropy as follows:


\begin{equation*}
L = -\frac{1}{N}\sum \boldsymbol{y'}\log{\boldsymbol{y}}
  \qquad \textrm{where} \ N \ \textrm{is the size of the mini-batch}
\end{equation*}


In [None]:
y_ = tf.placeholder(tf.float32, [None, 10]) # correct labels

In [None]:
# cross-entropy
# 'axis=1' sums over the classes, giving one value per example
H = - tf.reduce_sum(y_ * tf.log(y), axis=1)

# the mean over all the examples in the batch
# Note that this formulation is numerically unstable: if a softmax output
# underflows to 0, tf.log(y) returns -inf and the loss becomes inf or NaN.
# TensorFlow provides a helper function that deals with this,
# tf.nn.softmax_cross_entropy_with_logits.
# The next tutorial shows how to use this function.
L = tf.reduce_mean(H)
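
The instability mentioned in the comments can be reproduced with NumPy. The sketch below is an illustration under assumed, deliberately extreme logits. It contrasts the naive "softmax, then log" route with the log-sum-exp trick, a standard way to compute log-softmax directly from the logits. Whether the TensorFlow helper uses exactly this formulation internally is not stated in this tutorial.

```python
import numpy as np

logits = np.array([[1000.0, 0.0, -1000.0]])   # extreme values, for illustration
labels = np.array([[0.0, 1.0, 0.0]])          # one-hot label

# Naive route: softmax first, then log (warnings silenced on purpose).
with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    e = np.exp(logits)                        # exp(1000) overflows to inf
    y = e / e.sum(axis=1, keepdims=True)      # inf / inf gives nan
    naive = -np.sum(labels * np.log(y), axis=1)

# Stable route: log-softmax computed from shifted logits (log-sum-exp trick).
shifted = logits - logits.max(axis=1, keepdims=True)
log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
stable = -np.sum(labels * log_softmax, axis=1)

print(naive, stable)
```

Subtracting the row maximum leaves the softmax unchanged mathematically but keeps every exponent at or below zero, so nothing overflows.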

In [None]:
# Note the tensor shapes
print(y_)
print(H)
print(L)

## Create gradient descent optimizer

In [None]:
optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(L)
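
Conceptually, each run of this optimizer subtracts the learning rate (0.5 here) times the gradient of `L` from every variable. TensorFlow derives the gradients automatically. The NumPy sketch below writes them out by hand for a toy problem, using the standard result that the gradient of the mean cross entropy with respect to the logits is `(y - y') / N`. The toy dimensions (8 examples, 4 features, 3 classes), the random data, and the helper names are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical safety."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss_fn(W, b, x, y_true):
    """Mean cross entropy over the batch."""
    y = softmax(x @ W + b)
    return -np.mean(np.sum(y_true * np.log(y), axis=1))

rng = np.random.default_rng(0)
x = rng.random((8, 4)) * 0.5                        # toy inputs (assumed sizes)
y_true = np.eye(3)[rng.integers(0, 3, size=8)]      # random one-hot labels
W = rng.uniform(-0.1, 0.1, size=(4, 3))
b = np.zeros(3)
learning_rate = 0.5                                 # same value as in the cell above

before = loss_fn(W, b, x, y_true)

# Gradients of the mean cross entropy, written out by hand.
y = softmax(x @ W + b)
grad_z = (y - y_true) / len(x)    # gradient w.r.t. the logits
grad_W = x.T @ grad_z             # chain rule through z = x W + b
grad_b = grad_z.sum(axis=0)

# One gradient descent step.
W -= learning_rate * grad_W
b -= learning_rate * grad_b

after = loss_fn(W, b, x, y_true)
print(before, after)              # the loss goes down after the step
```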

## Create 'InteractiveSession' for interactive python

In [None]:
sess = tf.InteractiveSession()
tf.global_variables_initializer().run() # init all variables

## Let's check our variables

In [None]:
print('W: \n', W.eval()) # randomly initialized
print('b: \n', b.eval()) # zero constant

## Let's train!

In [None]:
for iteration in tqdm(range(1000)):
    images, labels = mnist.train.next_batch(128) # get mini-batch images and corresponding labels
    _, loss = sess.run([optimizer, L], feed_dict={x: images, y_: labels})
    
    if iteration % 100 == 0:
        print("iter : {:4d},  loss: {:.5f}".format(iteration, loss))

## Evaluate our model

In [None]:
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

In [None]:
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
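
The accuracy computation can be traced by hand on a tiny example. The NumPy sketch below mirrors the `argmax`, `equal`, cast, and mean steps. The four predictions and labels are made-up values for illustration, not MNIST data.

```python
import numpy as np

# Predicted class probabilities for 4 examples and 3 classes (made-up values).
y = np.array([[0.1, 0.7, 0.2],
              [0.8, 0.1, 0.1],
              [0.3, 0.3, 0.4],
              [0.6, 0.3, 0.1]])
# One-hot labels for the same 4 examples.
y_true = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])

# Compare the most probable predicted class with the labelled class.
correct = np.argmax(y, axis=1) == np.argmax(y_true, axis=1)
# Cast booleans to floats and average: the fraction of correct predictions.
accuracy = correct.astype(np.float32).mean()

print(correct, accuracy)
```

The first two examples are classified correctly and the last two are not, so the accuracy is 0.5.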

In [None]:
# after training
print('W: \n', W.eval()) 
print('b: \n', b.eval()) 