## Classify grayscale images of handwritten digits (28 × 28 pixels) into their 10 categories (0 through 9)

#### This requires installing Keras. We use the standard MNIST dataset

##### MNIST is the "Hello World" of deep learning

In [20]:
from keras.datasets import mnist
from keras import models
from keras import layers
from keras.utils import to_categorical

In [3]:
# Import the dataset here
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


In [None]:
# What process will we follow?
# train_images and train_labels form the training set, the data the model will learn from.
# The model will then be tested on the test set, test_images and test_labels.

In [4]:
train_images.shape # 60,000 training images, each 28 x 28 pixels

(60000, 28, 28)

In [8]:
len(train_labels)

60000

In [9]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [6]:
test_images.shape # 10,000 test images, each 28 x 28 pixels

(10000, 28, 28)

In [10]:
len(test_labels)

10000

In [11]:
test_labels

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

In [None]:
# How to proceed: first, train the network on the training data (train_images and train_labels),
# so that it learns to associate each image with its label.
# Then use the network to produce predictions for test_images,
# and finally check how many of those predictions match test_labels.

In [14]:
# define your network here
network = models.Sequential()

network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
# 512 and 10 are the layers' units: a positive integer giving the dimensionality of each layer's output space.
# activation is the activation function applied to each layer's output.
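A `Dense` layer computes `activation(dot(input, kernel) + bias)`. The NumPy sketch below reproduces that forward pass for the 784 → 512 → 10 network and counts its trainable parameters. The helper names (`dense`, `relu`, `softmax`) and the random weights are illustrative assumptions: they are placeholders, not the weights Keras initializes or learns.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, kernel, bias, activation):
    """Forward pass of a fully connected layer: activation(x @ kernel + bias)."""
    return activation(x @ kernel + bias)

def relu(z):
    return np.maximum(z, 0)

def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Random placeholder weights (illustration only, not trained values)
w1, b1 = rng.normal(scale=0.05, size=(784, 512)), np.zeros(512)
w2, b2 = rng.normal(scale=0.05, size=(512, 10)), np.zeros(10)

batch = rng.random((3, 784))            # 3 fake flattened 28x28 images in [0, 1)
hidden = dense(batch, w1, b1, relu)     # shape (3, 512)
probs = dense(hidden, w2, b2, softmax)  # shape (3, 10); each row sums to 1

# 784*512 + 512 + 512*10 + 10 trainable parameters
n_params = w1.size + b1.size + w2.size + b2.size
print(hidden.shape, probs.shape, n_params)
```

The parameter count (401,920 for the first layer plus 5,130 for the second, 407,050 in total) is what `network.summary()` should report for this architecture.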

In [None]:
# What is relu? relu(x) = max(x, 0). It outputs x if x is positive and 0 otherwise.


## Adapted from Wikipedia
# What is softmax?
#If we take an input of [1, 2, 3, 4, 1, 2, 3], 
#the softmax of that is [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]. 
#The output has most of its weight where the '4' was in the original input. 
#This is what the function is normally used for: to highlight the largest values and 
#suppress values which are significantly below the maximum value. But note: softmax 
#is not scale invariant, so if the input were [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3] (which sums to 1.6) 
#the softmax would be [0.125, 0.138, 0.153, 0.169, 0.125, 0.138, 0.153]. 
#This shows that for values between 0 and 1, softmax in fact de-emphasizes the maximum value
#(note that 0.169 is not only less than 0.475, it is also less than the initial value of 0.4).
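Both activations are short enough to write directly. This minimal NumPy sketch (the function names are our own, not Keras internals) should reproduce the Wikipedia softmax figures quoted above to three decimals. Subtracting the maximum before exponentiating is a standard stability trick: softmax is invariant to adding a constant to every input, even though it is not scale invariant.

```python
import numpy as np

def relu(x):
    """Element-wise max(x, 0)."""
    return np.maximum(x, 0)

def softmax(x):
    """Exponentiate and normalize so the outputs are positive and sum to 1."""
    e = np.exp(x - np.max(x))  # shifting by the max leaves the result unchanged
    return e / e.sum()

print(relu(np.array([-2.0, 0.0, 3.0])))
print(np.round(softmax(np.array([1, 2, 3, 4, 1, 2, 3], dtype=float)), 3))
print(np.round(softmax(np.array([0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3])), 3))
```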

In [15]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
## Many optimizers are available, the classic one being (batch) gradient descent.
# Common loss functions include MSE, mean absolute error, and cosine proximity; categorical cross-entropy is used for multi-class classification.
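To make the `compile` arguments concrete, here is a hedged NumPy sketch of what the loss and optimizer compute. For one-hot targets, categorical cross-entropy is `-sum(y_true * log(y_pred))`, averaged over the batch. The RMSprop step below follows the textbook rule (a running average of squared gradients scales each step); `lr=0.001` and `rho=0.9` are assumed to match the usual Keras defaults, and Keras's actual implementation may differ in details such as where epsilon is added. The helper names are hypothetical.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred)) for one-hot targets."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update: divide the step by a running RMS of past gradients."""
    cache = rho * cache + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

y_true = np.array([[0.0, 0.0, 1.0]])
y_pred = np.array([[0.1, 0.2, 0.7]])
print(categorical_crossentropy(y_true, y_pred))  # equals -log(0.7), about 0.357

# Minimize f(w) = w**2 (gradient 2w), starting from w = 1
w, cache = np.array(1.0), np.array(0.0)
for _ in range(100):
    w, cache = rmsprop_step(w, 2 * w, cache)
print(w)  # w has moved from 1 toward the minimum at 0
```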

#### Prepare the dataset here

In [16]:
train_images = train_images.reshape((60000, 28 * 28))

In [17]:
train_images = train_images.astype('float32') / 255
# Why 255? MNIST pixels are stored as uint8 grayscale intensities from 0 to 255,
# so dividing by 255 rescales them to floats in [0, 1]; networks generally train
# more reliably on small, normalized inputs.

In [18]:
test_images = test_images.reshape((10000, 28 * 28))

In [19]:
test_images = test_images.astype('float32') / 255
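The two preprocessing steps (flatten, then rescale) can be checked on a small stand-in array. In this sketch, `fake_images` is a hypothetical batch of random uint8 "images", not real MNIST data. Using `images.shape[0]` instead of a hard-coded 60000 or 10000 lets the same code handle both the training and test sets.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 random "images" with uint8 pixels in 0..255
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a 784-element vector
flat = fake_images.reshape((fake_images.shape[0], 28 * 28))
# Convert to float32 and rescale pixel intensities from [0, 255] to [0, 1]
scaled = flat.astype('float32') / 255

print(flat.shape, scaled.dtype, scaled.min(), scaled.max())
```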

#### Prepare the Labels here

In [21]:
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
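`to_categorical` turns each integer label into a one-hot row of length 10. A minimal NumPy equivalent is sketched below; `one_hot` is a hypothetical helper that indexes rows of an identity matrix, assumed (not verified against the Keras source) to match `to_categorical`'s default float32 output for labels 0 through 9.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    return np.eye(num_classes, dtype='float32')[labels]

labels = np.array([5, 0, 4])  # the first three training labels shown earlier
encoded = one_hot(labels)
print(encoded)
```

Each row has a 1 in the column of its digit, which is the format `categorical_crossentropy` expects.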

#### Now train the network here

In [22]:
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x119189748>

In [None]:
# The network will iterate on the training data in mini-batches of 128 samples,
# 5 times over (each iteration over all the training data is called an epoch).
# At each iteration, the network computes the gradients of the weights with respect
# to the loss on the batch and updates the weights accordingly.
# After these 5 epochs, the network will have performed 2,345 gradient updates (469 per epoch):
# 60000/128 = 468.75, rounded up to 469 because the last, smaller batch also counts; 469 * 5 = 2345.
# In neural network terminology:

#one epoch = one forward pass and one backward pass of all the training examples
#batch size = the number of training examples in one forward/backward pass. 
#The higher the batch size, the more memory space you'll need.
#number of iterations = number of passes, each pass using [batch size] number of examples. 
#To be clear, one pass = one forward pass + one backward pass (we do not count 
#the forward pass and backward pass as two different passes).
#Example: if you have 1000 training examples, and your batch size is 500, 
#then it will take 2 iterations to complete 1 epoch.
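The epoch and batch arithmetic can be checked in a few lines of plain Python. This sketch assumes, as `fit` does by default with NumPy arrays, that leftover samples form a final smaller batch: 60,000 = 468 × 128 + 96, so each epoch has 469 updates. The helper names are hypothetical.

```python
import math

def updates_per_epoch(num_samples, batch_size):
    """Mini-batches per epoch; a final partial batch still counts as one update."""
    return math.ceil(num_samples / batch_size)

def batch_bounds(num_samples, batch_size):
    """Yield (start, stop) index pairs covering one epoch in mini-batches."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)

per_epoch = updates_per_epoch(60000, 128)  # 468 full batches + one batch of 96
total = per_epoch * 5                      # gradient updates over 5 epochs
print(per_epoch, total, updates_per_epoch(1000, 500))
print(list(batch_bounds(60000, 128))[-1])  # the last, smaller batch
```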

#### Now evaluate the network on the test dataset

In [23]:
test_loss, test_acc = network.evaluate(test_images, test_labels)



In [24]:
print('test_acc:', test_acc)

test_acc: 0.9777


In [None]:
## The network reaches a test accuracy of about 97.8% (0.9777), i.e. roughly 9,777 of the 10,000 test images are classified correctly
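For one-hot labels, the accuracy reported by `evaluate` is conceptually the fraction of samples whose highest-probability class matches the true class. The sketch below computes that on toy arrays. With the real model, the probabilities would come from `network.predict(test_images)`; the `accuracy` helper and the toy data are illustrative assumptions, not Keras internals.

```python
import numpy as np

def accuracy(y_true_onehot, y_pred_probs):
    """Fraction of samples whose argmax prediction matches the argmax label."""
    predicted = np.argmax(y_pred_probs, axis=1)
    actual = np.argmax(y_true_onehot, axis=1)
    return float(np.mean(predicted == actual))

# Toy example: 4 samples, 3 classes; the third prediction is wrong
y_true = np.eye(3)[[0, 1, 2, 1]]
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.5, 0.3, 0.2],
                   [0.1, 0.6, 0.3]])
print(accuracy(y_true, y_pred))  # 0.75
```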