# Classifying MNIST digits
![mnist digits](https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/img/6a54f12d0f63c9bc.png)

## Let's see how we can design a deep learning model to classify these digits
- We learn to use TensorFlow's low-level APIs
- We will try a single-layer ANN first and then move on to deeper networks


In [58]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

import numpy as np

### Helper function for loading data
#### No need to worry about this for now

In [86]:
dataset = input_data.read_data_sets('mnist_digit', one_hot=True, validation_size=0)

Extracting mnist_digit/train-images-idx3-ubyte.gz
Extracting mnist_digit/train-labels-idx1-ubyte.gz
Extracting mnist_digit/t10k-images-idx3-ubyte.gz
Extracting mnist_digit/t10k-labels-idx1-ubyte.gz


In [87]:
print(dataset.train.images.shape)
print(dataset.test.images.shape)
print(dataset.train.labels.shape)
print(dataset.test.labels.shape)

(60000, 784)
(10000, 784)
(60000, 10)
(10000, 10)
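
The label arrays have 10 columns because `one_hot=True` stores each digit as a 10-element vector with a single 1 at the digit's index. A minimal NumPy sketch of that encoding, where the `digits` values are made up for illustration:

```python
import numpy as np

# Hypothetical digit labels, for illustration only
digits = np.array([3, 0, 9])

# Row i of the 10x10 identity matrix is the one-hot vector for class i
one_hot = np.eye(10, dtype=np.float32)[digits]

print(one_hot.shape)               # (3, 10)
print(one_hot[0])                  # 1.0 at index 3, 0.0 elsewhere
print(np.argmax(one_hot, axis=1))  # recovers the digits: [3 0 9]
```

`np.argmax` along axis 1 turns a one-hot row back into the digit, which is the same trick the accuracy computation later relies on.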


## Let's define a function to create a single ANN layer
- Variables are all the parameters that you want the training algorithm to determine for you
- As we know, an ANN unit consists of weights, biases, and an activation function


![ANN](https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/img/d5222c6e3d15770a.png)

# For simplicity, let's use an activation function known to us

![ANN](https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/img/604a9797da2a48d7.png)

In [88]:
# def dense_layer(x, in_units, out_units):
#     # in_units -> number of inputs feeding into this layer
#     # out_units -> number of neurons, i.e. outputs of this layer
#     # weight matrix for the whole layer:
#     # each of the out_units neurons has in_units weights, so the shape is (in_units, out_units)
#     W = tf.Variable(tf.random_normal((in_units, out_units)))
#     # one bias for each neuron
#     B = tf.Variable(tf.zeros([out_units]))
    
#     # take weighted sum of input, apply activation and return
#     out = tf.matmul(x, W) + B
#     return tf.nn.softmax(out)

def dense_layer(x, in_units, out_units, activation=tf.nn.relu):
    # in_units -> number of inputs feeding into this layer
    # out_units -> number of neurons, i.e. outputs of this layer
    # weight matrix for the whole layer:
    # each of the out_units neurons has in_units weights, so the shape is (in_units, out_units)
    W = tf.Variable(tf.random_normal((in_units, out_units)))
    # one bias for each neuron
    B = tf.Variable(tf.zeros([out_units]))
    
    # take weighted sum of input, apply activation and return
    out = tf.matmul(x, W) + B
    return activation(out)

def simple_layer(x, in_units, out_units):
    # in_units -> number of inputs feeding into this layer
    # out_units -> number of neurons, i.e. outputs of this layer
    # weight matrix for the whole layer:
    # each of the out_units neurons has in_units weights, so the shape is (in_units, out_units)
    W = tf.Variable(tf.random_normal((in_units, out_units)))
    # one bias for each neuron
    B = tf.Variable(tf.zeros([out_units]))
    
    return tf.matmul(x, W) + B


# A visualization of the weighted sum happening in a single layer


![Weighted sum](https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/img/21dabcf6d44e4d6f.png)

![out](https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/img/206327168bc85294.png)
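
The weighted sum pictured above can be sketched in plain NumPy to make the shapes concrete. The batch size of 4 and the random inputs are assumptions chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

batch = 4                              # hypothetical batch size
x = rng.random((batch, 784))           # 4 flattened 28x28 "images"
W = rng.standard_normal((784, 10))     # one column of 784 weights per neuron
b = np.zeros(10)                       # one bias per neuron

# (4, 784) @ (784, 10) -> (4, 10); b is broadcast across the 4 rows
out = x @ W + b
print(out.shape)  # (4, 10)

# Each entry is the weighted sum of one image with one neuron's weights
manual = np.dot(x[0], W[:, 0]) + b[0]
print(np.isclose(out[0, 0], manual))
```

This mirrors what `tf.matmul(x, W) + B` computes: one matrix product handles every image in the batch and every neuron in the layer at once.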

## Here we define the actual ANN architecture
- We make use of the functions defined above

In [89]:
# placeholder for an unknown number of flattened 28*28 images
X = tf.placeholder(tf.float32, shape=(None, 28*28))
# placeholder for the one-hot labels
Y = tf.placeholder(tf.int32, shape=(None, 10))

# simple single layer network
logits = simple_layer(X, 784, 10)
out = tf.nn.softmax(logits)


# two layer network
# out = dense_layer(X, 784, 512)
# logits = simple_layer(out, 512, 10)
# out = tf.nn.softmax(logits)


*Note: `Y` will contain the actual one-hot labels, `out` will contain the predicted probability for each digit*
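
To see how softmax turns logits into something comparable with one-hot labels, here is a small NumPy sketch. The logit values are made up, and this is an illustration of the formula rather than TensorFlow's implementation:

```python
import numpy as np

def softmax(logits):
    # Subtracting the row-wise max avoids overflow and leaves the result unchanged
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)

# Hypothetical logits for 2 examples and 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])
probs = softmax(logits)

print(probs.sum(axis=1))         # each row sums to 1
print(np.argmax(probs, axis=1))  # predicted class per row: [0 2]
```

The predicted digit is the index of the largest probability, which is why `tf.argmax` is applied to both `Y` and `out` when computing accuracy.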

## Lots of TensorFlow magic here
- We only need to tell TensorFlow which loss function and optimizer to use
- A broad set of optimizers is available, such as Adam, Adagrad and GradientDescent

## Cross entropy loss function
_Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. So predicting a probability of .012 when the actual observation label is 1 would be bad and result in a high loss value. A perfect model would have a log loss of 0._

![](https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/img/1d8fc59e6a674f1c.png)
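
The example in the definition above can be checked numerically. This sketch uses only the standard library; the three-class label and the probability vectors are made up for illustration:

```python
import math

def cross_entropy(true_one_hot, predicted_probs):
    # -sum(y_i * log(p_i)); with one-hot labels only the true class contributes
    return -sum(y * math.log(p)
                for y, p in zip(true_one_hot, predicted_probs) if y > 0)

label = [0, 1, 0]                 # the true class is index 1
bad = [0.5, 0.012, 0.488]         # only 0.012 assigned to the true class
good = [0.05, 0.9, 0.05]          # 0.9 assigned to the true class

print(cross_entropy(label, bad))   # about 4.42
print(cross_entropy(label, good))  # about 0.105
```

A confident correct prediction is penalized lightly, while a confident wrong one is penalized heavily, and the loss only reaches 0 when the true class gets probability 1.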

# Gradient Descent

![](https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/img/34e9e76c7715b719.png)
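
As a minimal sketch of the idea, gradient descent repeatedly steps against the gradient of the loss. The one-dimensional function below is a toy stand-in for the network's loss, chosen because its minimum is known:

```python
def gradient_descent(grad, start, learning_rate, steps):
    # Repeatedly move the parameter a small step against the gradient
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Toy loss f(w) = (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3
w_final = gradient_descent(lambda w: 2 * (w - 3),
                           start=0.0, learning_rate=0.1, steps=100)
print(w_final)  # very close to 3
```

In the network, `w` is replaced by every weight and bias at once, and TensorFlow computes the gradients automatically.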


## `softmax_cross_entropy_with_logits` measures the probability error in discrete classification tasks in which the classes are mutually exclusive
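
The function takes raw logits rather than softmax outputs so that the two steps can be combined in a numerically stable way. The NumPy sketch below shows one common way to do that (the log-sum-exp trick). It is an assumption about the general technique, not a copy of TensorFlow's implementation, and the inputs are made up:

```python
import numpy as np

def softmax_cross_entropy(labels, logits):
    # log-softmax computed via log-sum-exp, so exp() never overflows
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    # one loss value per example
    return -np.sum(labels * log_probs, axis=1)

logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 0.0, -1000.0]])  # extreme values on purpose
labels = np.array([[1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])

losses = softmax_cross_entropy(labels, logits)
print(losses)           # first loss is about 0.417, second is 0
print(np.mean(losses))  # a batch mean, as tf.reduce_mean would compute
```

A naive `np.exp(1000.0)` would overflow, whereas the shifted version stays finite.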

In [90]:
# but we don't need to worry much about the mathematical formula
# tf makes it easy!

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits)
loss = tf.reduce_mean(cross_entropy)
# let's use the most common optimizer known to us
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
# here we define what we actually want to run during training
# we want to minimize the cross-entropy, i.e. the difference between predicted and actual probabilities
# note: cross_entropy holds one loss value per example, so the gradients are summed over the batch;
# pass `loss` instead to minimize the batch mean
train_op = optimizer.minimize(cross_entropy)

# fraction of correct answers in the batch
is_correct = tf.equal(tf.argmax(Y,1), tf.argmax(out,1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))
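
The accuracy computation can be traced by hand on a tiny batch. This NumPy sketch mirrors the two TensorFlow lines above, using made-up labels and predictions:

```python
import numpy as np

# Hypothetical one-hot labels and predicted probabilities for 4 examples
labels = np.array([[0, 0, 1],
                   [1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
probs = np.array([[0.1, 0.2, 0.7],
                  [0.6, 0.3, 0.1],
                  [0.5, 0.4, 0.1],   # wrong: predicts class 0, label is class 1
                  [0.2, 0.2, 0.6]])

# Compare the index of the true class with the index of the top prediction
is_correct = np.argmax(labels, axis=1) == np.argmax(probs, axis=1)
accuracy = np.mean(is_correct.astype(np.float32))
print(accuracy)  # 0.75
```

Casting the booleans to floats turns the mean into the fraction of correct predictions.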

In [91]:
# create a new tensorflow session
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

## Let the training begin!

In [98]:
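batch_size = 128
epochs = 10000
# number of training images (assumed definition of train_size)
train_size = dataset.train.images.shape[0]


# note: each iteration draws a single mini-batch, so an "epoch" here is one training iteration
for i in range(epochs):
    batchX, batchY = dataset.train.next_batch(batch_size)
    # the update is repeated on the same mini-batch for each value of b
    for b in range(0, train_size // batch_size, batch_size):
        sess.run([train_op], feed_dict={X: batchX, Y: batchY})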
        
    if i % 1000 == 0:
        res = sess.run([accuracy, loss], feed_dict={X: batchX, Y: batchY})
        print('Epoch', i, 'Train Accuracy =', res[0], '| Loss =', res[1])
        res = sess.run([accuracy, loss], feed_dict={X: dataset.test.images, Y: dataset.test.labels})
        print('Epoch', i, 'Test Accuracy =', res[0], '| Loss = ', res[1])
        print('-'*70)


Epoch 0 Train Accuracy = 0.9453125 | Loss = 0.1268822
Epoch 0 Test Accuracy = 0.9228 | Loss =  0.2932236
----------------------------------------------------------------------
Epoch 1000 Train Accuracy = 0.921875 | Loss = 0.37477785
Epoch 1000 Test Accuracy = 0.921 | Loss =  0.29630587
----------------------------------------------------------------------
Epoch 2000 Train Accuracy = 0.9453125 | Loss = 0.190063
Epoch 2000 Test Accuracy = 0.9222 | Loss =  0.29114234
----------------------------------------------------------------------
Epoch 3000 Train Accuracy = 0.9375 | Loss = 0.19934976
Epoch 3000 Test Accuracy = 0.9223 | Loss =  0.29254723
----------------------------------------------------------------------
Epoch 4000 Train Accuracy = 0.9609375 | Loss = 0.14497551
Epoch 4000 Test Accuracy = 0.9209 | Loss =  0.2931765
----------------------------------------------------------------------
Epoch 5000 Train Accuracy = 0.9375 | Loss = 0.28321114
Epoch 5000 Test Accuracy = 0.9218 | Loss 
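
An epoch conventionally means one full pass over the training set, split into mini-batches. The sketch below shows that idea in NumPy. The helper `iterate_minibatches` and the random stand-in data are hypothetical, and this is not how `next_batch` is implemented:

```python
import numpy as np

def iterate_minibatches(images, labels, batch_size, rng):
    # Shuffle once per epoch, then walk through the data in fixed-size slices
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]

rng = np.random.default_rng(0)
images = rng.random((1000, 784))                      # stand-in for MNIST images
labels = np.eye(10)[rng.integers(0, 10, size=1000)]   # stand-in one-hot labels

batches = list(iterate_minibatches(images, labels, 128, rng))
print(len(batches))            # 8 batches: 7 full ones plus a final batch of 104
print(batches[0][0].shape)     # (128, 784)
print(batches[-1][0].shape)    # (104, 784)
```

With 60000 training images and a batch size of 128, one such pass would take 469 batches, the last one smaller than the rest.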

# Let's go deep

![deep](https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/img/77bc41f211c9fb29.png)
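
Going deeper means feeding the output of one layer into the next. As a rough NumPy sketch of the forward pass for the 784 → 512 → 10 network that appears commented out in the architecture cell (random weights and inputs are stand-ins, so the predictions are meaningless):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Keep positive values, zero out the rest
    return np.maximum(z, 0.0)

def softmax(z):
    shifted = z - np.max(z, axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)

x = rng.random((4, 784))                       # 4 hypothetical flattened images

# Hidden layer: 784 inputs -> 512 neurons, ReLU activation
W1, b1 = rng.standard_normal((784, 512)), np.zeros(512)
# Output layer: 512 inputs -> 10 classes, softmax applied to the logits
W2, b2 = rng.standard_normal((512, 10)), np.zeros(10)

hidden = relu(x @ W1 + b1)
logits = hidden @ W2 + b2
probs = softmax(logits)

print(hidden.shape, probs.shape)  # (4, 512) (4, 10)
```

The only requirement when stacking layers is that the output width of one layer matches the input width of the next.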


# Additional Info

# Activation Functions


![sigmoid](https://cdn-images-1.medium.com/max/1600/1*XxxiA0jJvPrHEJHD4z893g.png)
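
For reference, two common activation functions are easy to write out directly. This sketch uses only the standard library and is meant as an illustration of the formulas:

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Passes positive values through unchanged and clips negatives to 0
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 4), relu(z))
```

Sigmoid saturates for large positive or negative inputs, while ReLU stays linear for positive inputs, which is one reason ReLU is the default in `dense_layer`.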