 <strong>The MNIST database of handwritten digits</strong> has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

<br>
<strong>The MNIST database (Modified National Institute of Standards and Technology database)</strong> is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.

<br>

<strong>The MNIST database</strong> contains 60,000 training images and 10,000 testing images. Half of the training set and half of the test set were taken from NIST's training dataset, while the other half of the training set and the other half of the test set were taken from NIST's testing dataset. The original creators of the database keep a list of some of the methods tested on it. In their original paper, they used a support-vector machine to reach an error rate of 0.8%. An extended dataset similar to MNIST, called EMNIST, was published in 2017. It contains 240,000 training images and 40,000 testing images of handwritten digits and characters.

<br>


<img src = "https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png">
                          


In [None]:
#pip install tensorflow-gpu==1.15  # GPU

In [None]:
import tensorflow as tf 

tf.__version__

In [None]:
#import the important modules, libraries, and frameworks
import numpy as np 

import matplotlib.pyplot as  plt

import os 

import cv2

import pandas as pd 

In [None]:
#import the training and test sets
train_df = pd.read_csv("../input/mnist-in-csv/mnist_train.csv")
test_df = pd.read_csv("../input/mnist-in-csv/mnist_test.csv")

In [None]:
#print the shape of the training and test data sets
print(train_df.shape)
print(test_df.shape)

In [None]:
#Explore the dataset
train_df.head()

In [None]:
#split the training dataset into  features --> tr_x and labels --> tr_y
tr_x = train_df[train_df.columns[train_df.columns != 'label']]
tr_y = train_df[train_df.columns[train_df.columns == 'label']]
print("The shape of training features:{0}\nThe shape of labels:{1}".format(tr_x.shape,tr_y.shape))

In [None]:
#Convert the training features and labels into numpy arrays to feed the CNN during training
tr_x = tr_x.values

tr_y = tr_y.values

In [None]:
#split the testing dataset into  features --> test_x and labels --> test_y
test_x = test_df[test_df.columns[test_df.columns != 'label']]
test_y = test_df[test_df.columns[test_df.columns == 'label']]
print("The shape of testing features:{0}\nThe shape of labels:{1}".format(test_x.shape,test_y.shape))

In [None]:
#Convert the testing features and labels into numpy arrays to feed the CNN during testing
test_x = test_x.values

test_y = test_y.values

# Preparing the data for CNN Model

<br>
<strong>For MNIST,</strong> we will use two convolutional layers, each followed by a ReLU activation and a max-pooling layer,
and then two fully connected layers. The strides for all convolutional layers are [1, 1, 1, 1].

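To see why the first fully connected layer later in this notebook expects 7*7*64 inputs, it can help to trace the spatial size through the layers. The sketch below is only illustrative: it assumes 'SAME' padding, 2x2 max-pooling with stride 2, and 32 then 64 filters, as used in the code cells further down. The helper name `same_output_size` is made up for this example.

```python
import math

def same_output_size(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return math.ceil(size / stride)

height, width, channels = 28, 28, 1  # one grayscale MNIST image

for out_channels in (32, 64):
    # Convolution with stride 1 keeps the spatial size and sets the channel count
    height = same_output_size(height, 1)
    width = same_output_size(width, 1)
    channels = out_channels
    # 2x2 max-pooling with stride 2 halves the spatial size
    height = same_output_size(height, 2)
    width = same_output_size(width, 2)
    print("after conv + pool:", (height, width, channels))

flattened = height * width * channels
print("inputs to the first fully connected layer:", flattened)
```
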
<br>

<strong>The architecture of the model looks like this:</strong>

<img src = "https://i.imgur.com/lAh42E5.jpg">
     


## 1- The first step in building the CNN is to set up its weights

In [None]:
#Define the input and output placeholders
x = tf.placeholder(tf.float32, [None, 28*28])
y = tf.placeholder(tf.float32, [None, 10])

#Apply 32 convolutions of window-size 5*5
w1 = tf.Variable(tf.random_normal([5,5,1,32]))
b1 = tf.Variable(tf.random_normal([32]))

#Then apply 64 convolutions of window-size 5*5 on top of the 32 feature maps
w2 = tf.Variable(tf.random_normal([5,5,32,64]))
b2 = tf.Variable(tf.random_normal([64]))

#Then we introduce a fully-connected layer
w3 = tf.Variable(tf.random_normal([7*7*64,1024]))
b3 = tf.Variable(tf.random_normal([1024]))

#Finally, we define the variables for a fully-connected linear output layer
w_out = tf.Variable(tf.random_normal([1024,10]))
b_out = tf.Variable(tf.random_normal([10]))

In [None]:
#Collect the parameters in a dict so that their trained values can be retrieved later
parameters = {"W1": w1, "b1": b1, "W2": w2, "b2": b2, "W3": w3, "b3": b3, "WO": w_out, "bO": b_out}

# 2- Define some helper functions
<strong>We define helper functions</strong> to perform a convolution, add a bias term, and
then apply an activation function. <strong>Together,</strong> these three steps form a convolutional layer of the
network.

<br>

<strong>The helper functions are:</strong>
<ul>
    <li>conv_layer: creates a convolutional layer</li>
    <li>maxpool_layer: creates a max-pool layer</li>
    </ul>

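To make the three steps concrete (convolution, bias, activation) together with the pooling step, here is a rough NumPy sketch on a single-channel 4x4 image. It is not the TensorFlow implementation, only an illustration: the function names are invented for this example, the kernel is assumed to have odd height and width, and, like `tf.nn.conv2d`, the kernel is not flipped.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Single-channel, stride-1 convolution with zero padding ('SAME' output size)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2  # assumes odd kernel sizes
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def relu(values):
    """Element-wise max(values, 0)."""
    return np.maximum(values, 0.0)

def maxpool2x2(values):
    """2x2 max-pooling with stride 2; assumes even height and width."""
    h, w = values.shape
    return values.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.zeros((3, 3))
kernel[1, 1] = 1.0   # identity kernel, so the convolution returns the image itself
bias = -5.0          # a negative bias lets the ReLU zero out the small values

activated = relu(conv2d_same(image, kernel) + bias)
pooled = maxpool2x2(activated)
print(activated)
print(pooled)
```
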
In [None]:
#Create a convolutional layer
def conv_layer(x, w, b):
    conv = tf.nn.conv2d(x, w, strides = [1,1,1,1], padding = 'SAME')
    conv_with_b = tf.nn.bias_add(conv, b)
    conv_out = tf.nn.relu(conv_with_b)
    return conv_out

In [None]:
#Create a max-pool layer
def maxpool_layer(conv, k = 2):
    return tf.nn.max_pool(conv, ksize = [1,k,k,1], strides = [1,k,k,1], padding = 'SAME')

## Using these helper functions to create the architecture of the CNN model
<strong>We will create a function called model(x) that builds the CNN model.</strong>
<ul>
    <li>The function takes the input features, x</li>
    <li>The function returns the output scores (logits) for the ten classes</li>
    </ul>

In [None]:
def model(x):
    #Reshape the features into the form (m, height, width, channels), where m is the number of examples in the batch
    x_reshaped = tf.reshape(x, shape = [-1, 28, 28, 1])
    
    #Construct the first layer of convolution and max-pooling
    conv_out1 = conv_layer(x_reshaped, w1, b1)
    maxpool_out1 = maxpool_layer(conv_out1)
    
    #Construct the second layer of convolution and max-pooling
    conv_out2 = conv_layer(maxpool_out1, w2, b2)
    maxpool_out2 = maxpool_layer(conv_out2)
    
    #Finally, Construct the final fully connected layer
    ##1.First flatten the output from the second layer
    maxpool_reshaped = tf.reshape(maxpool_out2, [-1, w3.get_shape().as_list()[0]])
    
    ##2.Create the linear part of the fully connected layer
    linear_part = tf.add(tf.matmul(maxpool_reshaped, w3), b3)
    
    ##3.Create the non-linear part of the fully connected layer
    ##in other words, apply the activation function to the linear part
    nonlinear_part = tf.nn.relu(linear_part)
    
    #Get the output scores for the ten classes
    output = tf.add(tf.matmul(nonlinear_part, w_out), b_out)
    
    return output

# 3- Define some operations to measure the cost and accuracy

<strong>Some things you should know:</strong>
<ul>
    <li><strong>Cross entropy:</strong> the loss function we use for this classification problem</li>
    <li><strong>Adam optimizer:</strong> the optimizer we use in place of plain gradient descent</li>
    </ul>
    
    <br>
## Note:

<strong>correct_pred</strong> is a boolean tensor, so we cast it to float32 before averaging. The mean of the resulting 0s and 1s is the fraction of correctly classified examples.

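As a rough illustration of what the cost and accuracy operations compute, the NumPy sketch below evaluates softmax cross-entropy and accuracy on four made-up examples with three classes. The logits and labels are invented for the example, and the function is a plain re-implementation for clarity, not the TensorFlow op itself.

```python
import numpy as np

def softmax_cross_entropy(logits, onehot_labels):
    """Per-example cross-entropy between softmax(logits) and one-hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(onehot_labels * log_probs).sum(axis=1)

# Made-up scores for 4 examples and 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3],
                   [1.2, 0.2, 3.0],
                   [3.0, 0.1, 0.2]])
labels = np.array([0, 1, 2, 1])   # the last example is misclassified
onehot_labels = np.eye(3)[labels]

losses = softmax_cross_entropy(logits, onehot_labels)
cost = losses.mean()

# Comparing predictions with labels gives booleans; casting to float lets us average them
correct_pred = logits.argmax(axis=1) == labels
accuracy = correct_pred.astype(np.float32).mean()

print("cost:", cost)
print("accuracy:", accuracy)
```

The misclassified example contributes the largest loss, which is the behaviour the optimizer exploits during training.
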

In [None]:
#Construct the model
model_op = model(x)

#Define the classification loss function
cost  = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits = model_op, labels = y))

#Define the training optimizer to minimize the loss function
train_op = tf.train.AdamOptimizer(learning_rate = 0.001).minimize(cost)

#Define the Calculation to get the accuracy of the model
##1.First compare the model's predictions with the ground truth
correct_pred = tf.equal(tf.argmax(model_op, 1), tf.argmax(y, 1))

##2.Then define the accuracy formula, which is the number of examples correctly classified over the total number of examples
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# 4- Process the data before feeding the CNN for training 

### We will process the data using:
<ul>
    <li><strong>One-hot encoding:</strong> In digital circuits, one-hot refers to a group of bits among which the only legal combinations of values are those with a single high (1) bit and all the others low (0). Here, one-hot encoding means that if the image shows the digit 7, the label is encoded as a vector of 10 elements that are all 0 except for the element at index 7, which is 1.</li>
    <li><strong>Normalizing the data:</strong> we divide the pixel values by 255 so that they fall in a fixed range between 0 and 1, which tends to make training faster.</li>
</ul>

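The two preprocessing steps can be tried on a tiny made-up example first. This is a minimal sketch, assuming the labels arrive as a column of integers with shape (m, 1), which is what selecting the label column as a DataFrame and calling `.values` produces. The name `one_hot_sketch` is used only here.

```python
import numpy as np

def one_hot_sketch(labels, num_classes=10):
    """Encode integer labels as rows with a single 1 at the label's index."""
    labels = np.asarray(labels).ravel()  # (m, 1) -> (m,), so each row gets exactly one 1
    encoded = np.zeros((labels.size, num_classes))
    encoded[np.arange(labels.size), labels] = 1
    return encoded

# Three made-up labels stored as a column, as they would come out of a DataFrame
labels = np.array([[7], [0], [3]])
encoded = one_hot_sketch(labels)
print(encoded[0])

# Made-up pixel intensities in the 0..255 range
pixels = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = pixels / 255
print(scaled)
```

Flattening the labels matters: indexing with an (m, 1) array alongside `np.arange(m)` would broadcast the two index arrays against each other instead of pairing them row by row.
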
In [None]:
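#Encode the labels using the one_hot() function
def one_hot(labels, num_classes = 10):
    #Flatten first: a label array of shape (m, 1) would otherwise broadcast against np.arange(m)
    labels = np.asarray(labels).ravel()
    labels_ = np.zeros((labels.shape[0], num_classes))
    labels_[np.arange(labels.shape[0]), labels] = 1
    return labels_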

In [None]:
#Normalizing the features in the training and testing data
tr_x = tr_x /255
test_x = test_x /255

# 5- Training the classifier

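The training loop in this section steps through the 60,000 examples in slices of 256. The small sketch below only shows how those slices fall out; `batch_bounds` is a helper name invented for this example. Note that the last batch is smaller than the others, because 60,000 is not a multiple of 256.

```python
def batch_bounds(num_examples, batch_size):
    """Yield (start, stop) index pairs that cover the dataset in order."""
    for start in range(0, num_examples, batch_size):
        yield start, min(start + batch_size, num_examples)

bounds = list(batch_bounds(60000, 256))
num_batches = len(bounds)
last_start, last_stop = bounds[-1]
last_batch_size = last_stop - last_start

# Progress is printed whenever the start index is a multiple of 2048, i.e. every 8th batch
logged_batches = sum(1 for start, _ in bounds if start % 2048 == 0)

print("batches per epoch:", num_batches)
print("size of the last batch:", last_batch_size)
print("progress lines per epoch:", logged_batches)
```
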
In [None]:
def model_train(parameters):
    sess = tf.Session()
    
    #The variables must be initialized before they can be used
    sess.run(tf.global_variables_initializer())
    
    onehot_labels = one_hot(tr_y)
    batch_size = 256
    #Loop through 1000 epochs
    for i in range(0, 1000):
        #Train the network in batches
        for j in range(0, 60000, batch_size):
            batch_features = tr_x[j:j+batch_size, :]
            batch_onehot_labels = onehot_labels[j:j+batch_size, :]
            feed = {x: batch_features, y: batch_onehot_labels}
            #Apply one optimizer step, then measure cost and accuracy on the same batch
            sess.run(train_op, feed_dict = feed)
            cost_, accuracy_ = sess.run([cost, accuracy], feed_dict = feed)

            if j % 2048 == 0:
                print("At j:{0}, the accuracy:{1}".format(j, accuracy_))
        print("Reached epoch", i, "cost J = ", cost_)
        
    # Fetch the trained parameter values as numpy arrays
    parameters = sess.run(parameters)
    print("\n\nParameters have been trained!") 
    return parameters,sess

In [None]:
parameters,sess = model_train(parameters)