<a href="https://colab.research.google.com/github/rohans1029/pythonPlayground/blob/master/firstNeuralNetworkMNIST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is my attempt at a neural network! We will use the MNIST dataset, the "Hello, world!" of machine learning problems.

In [0]:
import tensorflow as tf
import keras

In [0]:
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

We have just imported TensorFlow and Keras, the two libraries needed for this network. Keras ships with several common datasets, so all we need to do is import *mnist* from *keras.datasets*. Then we call mnist.load_data() and unpack its result into the training and testing sets, each made up of images and their classes (labels).

As we know, computers read images as arrays of numbers. To break this down, we must first define the word "tensor": a tensor is a container for numbers arranged along some number of dimensions (axes). When data is three-dimensional, we say it is a 3D tensor. The following hierarchy describes this:


*   0D tensor: a single number (scalar)
*   1D tensor: an array (vector)
*   2D tensor: a 2D array (matrix)
*   3D tensor: 2D tensors in another array
*   nD tensor: (n-1)D tensors in another array
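The hierarchy above can be checked with NumPy, whose `ndim` attribute reports the number of axes of an array. This is only a small sketch; the values inside the arrays are made up for illustration.

```python
import numpy as np

scalar = np.array(12)                      # 0D tensor: a single number
vector = np.array([12, 3, 6, 14])          # 1D tensor: an array
matrix = np.array([[5, 78, 2],
                   [6, 79, 3]])            # 2D tensor: a 2D array
cube = np.array([matrix, matrix, matrix])  # 3D tensor: 2D tensors in another array

for name, tensor in [("scalar", scalar), ("vector", vector),
                     ("matrix", matrix), ("cube", cube)]:
    print(name, "ndim =", tensor.ndim, "shape =", tensor.shape)
```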






Now that we have tensors covered, we must define the meaning of "shape". 

A shape is a tuple (an immutable sequence in Python) of integers that describes how many elements the tensor has along each axis. The length of the tuple equals the number of dimensions of the tensor, so a shape of length 4 means the data is 4D.

Image datasets are generally stored as 3D or 4D tensors: 3D if grayscale and 4D if RGB, because there is an extra value in the shape tuple for the color channels.

Take for example, the mnist training set shape: 

In [0]:
train_images.shape

The result is (60000, 28, 28), since the dataset is composed of grayscale images. The shape tuple has length 3, meaning the data is three-dimensional. The first number, 60000, is the number of images in our training set. The second number, 28, is the height of each image in pixels, and the third, also 28, is the width. In summary: (# of samples, height, width).

If the images were in color, the data would be 4D and the shape tuple would look like this: (# of samples, height, width, channels), where channels is the number of color values per pixel, usually 3 when using RGB (one number for each of red, green, and blue).

It is also worth noting that a grayscale dataset can be stored as a 4D tensor too, simply by making the fourth integer a 1 to represent the single grayscale value: (60000, 28, 28, 1).
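As a rough illustration of the grayscale and color shapes just described, here is a sketch using a tiny stand-in batch of blank images (the arrays `gray_batch` and `rgb_batch` are made up for this example and are not part of MNIST).

```python
import numpy as np

# Hypothetical stand-in for a few grayscale images: 5 samples of 28x28 pixels.
gray_batch = np.zeros((5, 28, 28), dtype="uint8")

# Append a channels axis of length 1: (samples, height, width, channels).
gray_4d = np.expand_dims(gray_batch, axis=-1)

# A color batch of the same size would carry 3 channels instead.
rgb_batch = np.zeros((5, 28, 28, 3), dtype="uint8")

print(gray_batch.shape, gray_4d.shape, rgb_batch.shape)
```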

Now let's see the data labels:

In [0]:
train_labels 

In [0]:
train_images

Now it is time to build the neural network. The workflow is as follows: we feed the network the training data so it can learn the association between the images and their labels. Then we test it by asking the network to produce predictions for test_images. Finally, we check how well those predictions match test_labels.

In [0]:
# import models and layers for keras. 
from keras import models 
from keras import layers 

# create an empty Sequential model (a linear stack of layers)
network = models.Sequential() 
# create the layers 
# Dense (fully connected) layer with 512 units that learns representations of the flattened 28 * 28 input
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
# 10-way softmax layer that returns an array of 10 probability scores (summing to 1)
network.add(layers.Dense(10, activation='softmax'))

We have just built a very simple network: a Sequential model (a linear stack of layers, each feeding the next) named *network*, with two layers. The first is a Dense layer that receives the input and learns feature representations from it. Those representations are passed to the second Dense layer, the classifier. It takes the representations produced by the previous layer and outputs an array of 10 floating-point numbers, each giving the probability that the image shows the corresponding digit (like percentages, so all 10 numbers sum to one). It produces these probabilities through the *softmax* activation function.

To better understand what is under the hood here, take François Chollet's explanation: 
"The core building block of neural networks is the layer, a data-processing module that you can think of as a filter for data. Some data goes in, and it comes out in a more useful form. Specifically, layers extract representations out of the data fed into them—hopefully, representations that are more meaningful for the problem at hand. Most of deep learning consists of chaining together simple layers that will implement a form of progressive data distillation. A deep-learning model is like a sieve for data processing, made of a succession of increasingly refined data filters—the layers. 

Here, our network consists of a sequence of two Dense layers, which are densely connected (also called fully connected) neural layers. The second (and last) layer is a 10-way softmax layer, which means it will return an array of 10 probability scores (summing to 1). Each score will be the probability that the current digit image belongs to one of our 10 digit classes."-Deep Learning with Python. 
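To make the 10-way softmax output more concrete, here is a minimal NumPy sketch of what a softmax function computes. The `raw_scores` values are made up, and this is an illustration of the idea rather than the code Keras runs internally.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw outputs of the last layer, one per digit class 0-9.
raw_scores = np.array([1.0, 2.0, 0.5, 0.1, 3.0, 0.0, -1.0, 0.3, 0.2, 1.5])
probabilities = softmax(raw_scores)

# The predicted digit is the index with the highest probability.
predicted_digit = int(np.argmax(probabilities))
print(probabilities.sum(), predicted_digit)
```

The largest raw score sits at index 4, so that is the digit this made-up example would predict.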

Now that the network is built, we are almost ready for training. First, however, we need to do what's called the "compilation step", which consists of defining three things:


*   **Loss function**: a function used during training that measures the network's performance on the training data, thus helping steer it in the right direction. The loss function supplies the information; the optimizer uses it to adjust the representations learned in the network. 
*   **Optimizer**: François Chollet says it best: "The mechanism through which the network will update itself based on the data it sees and its loss function."-Deep Learning with Python. 
*   **Metrics to monitor during training and testing**: accuracy (for this case). 

In summary, the loss function supplies the feedback signal that the optimizer uses to update the network.
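To get a feel for what the loss function measures, here is a hedged NumPy sketch of categorical crossentropy (the loss named in the compile call) for a single image. The function and the example predictions are written for illustration; the Keras implementation works on whole batches and may differ in details.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Crossentropy between a one-hot label and a predicted probability vector."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.sum(y_true * np.log(y_pred)))

# True label: the digit 3, as a one-hot vector of length 10.
y_true = np.zeros(10)
y_true[3] = 1.0

# Hypothetical prediction that is confident and correct.
confident = np.full(10, 0.01)
confident[3] = 0.91

# Hypothetical prediction that spreads probability evenly over all digits.
unsure = np.full(10, 0.1)

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The confident, correct prediction gets a much smaller loss than the evenly spread one, which is the signal the optimizer relies on.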

In [0]:
network.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

Now we enter the "preprocessing step", in which we reshape the data into the shape the network expects and scale it so that all values fall in the [0, 1] interval. The images start out as 8-bit integers in the range 0 to 255, which is why the code below divides by 255.

In [0]:
# flatten each 28x28 image into a 784-element vector, giving a 2D array of shape (60000, 784). 
train_images = train_images.reshape((60000, 28 * 28))
# convert to a float32 array with values between 0 and 1. 
train_images = train_images.astype('float32') / 255

# do the same for testing images.
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
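The same two steps can be tried on a tiny stand-in batch to see the effect on shape, type, and value range. The random `fake_images` array below is made up for illustration and only mimics the format of the MNIST images.

```python
import numpy as np

# Hypothetical stand-in for 4 grayscale images with integer pixels in 0..255.
fake_images = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Step 1: flatten each 28x28 image into a vector of 784 values.
flat = fake_images.reshape((4, 28 * 28))

# Step 2: convert to float32 and scale into the [0, 1] interval.
scaled = flat.astype("float32") / 255

print(fake_images.shape, "->", scaled.shape, scaled.dtype)
print("min:", scaled.min(), "max:", scaled.max())
```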

Now that the two main steps after building the network (compiling and preprocessing) are done, one final step remains: preparing the labels. We categorically encode them, turning each digit label into a vector of 10 values with a 1 at the index of the digit and 0 everywhere else, as follows:

In [0]:
# import to_categorical from keras utility. 
from keras.utils import to_categorical

# assign to both training and testing labels. 
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
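For intuition, categorical (one-hot) encoding can be sketched by hand in NumPy. The helper `one_hot` and the three sample labels are made up for this example; in the notebook itself the encoding is done by to_categorical.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Encode integer labels as rows with a single 1 at the label's index."""
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Hypothetical labels for three images.
sample_labels = np.array([5, 0, 4])
encoded = one_hot(sample_labels)
print(encoded)

# argmax along each row recovers the original integer labels.
recovered = np.argmax(encoded, axis=1)
```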

We are now done with the previous steps and are ready for training! To do this, we use the *fit* method in Keras, essentially "*fit*ting" the model to its training data.

In [0]:
# Train the network on the training data; fit returns a History object that records the loss and metrics per epoch. 
history = network.fit(train_images, train_labels, epochs=5, batch_size=128)

In [0]:
print(history.history.keys())

Done! That is it for training. 
In a later thread, I will elaborate on what *training* involves and the ideas related to it. It is definitely important to understand what is going on under the hood. For now, I will summarize the *training (fit)* line of code: 
network.fit(train_images, train_labels, epochs=5, batch_size=128) 
The first two arguments are the **images** and their respective **labels**: (train_images, train_labels). Next comes the number of epochs. An **epoch** is one full pass through the training data, so five epochs here means the network iterates over the complete training set five times. The **batch size** is how many images the model processes at once ("Deep-learning models don’t process an entire dataset at once; rather, they break the data into small batches."-Deep Learning with Python, François Chollet). 
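The arithmetic behind epochs and batches for this particular fit call can be worked out directly. This is a small sketch that assumes the usual mini-batch setup, where the network is updated once per batch and the last batch of an epoch may be smaller than the rest.

```python
import math

num_samples = 60000   # training images
batch_size = 128
epochs = 5

# Number of batches needed to cover the training set once.
batches_per_epoch = math.ceil(num_samples / batch_size)

# The final batch holds whatever is left after the full batches.
last_batch_size = num_samples - (batches_per_epoch - 1) * batch_size

# Total batches processed over all epochs.
total_batches = batches_per_epoch * epochs

print(batches_per_epoch, last_batch_size, total_batches)  # 469 96 2345
```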


It is also important to note that during training, two values, loss and accuracy, are reported for each epoch. By the fifth and final epoch, this network reached a training accuracy of 0.9875, or 98.75%.

Although we already see the accuracy numbers during training, we now must perform the ***test*** step, because we need to see how the network performs on data that it has never seen before. This brings up the concept of **overfitting**, which we will define in depth later, in another thread. Briefly, overfitting occurs when the network performs significantly worse on the test data than on the training data. Thus, to see if the network overfits, we perform the *test* step. 

In [0]:
test_loss, test_acc = network.evaluate(test_images, test_labels)

In [0]:
print('Test accuracy: ', test_acc)

In [0]:
print('Test loss: ', test_loss)
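To see what the accuracy number means, here is a hedged sketch that computes it by hand from made-up predictions. It uses only 3 classes and 4 samples to stay readable; `predictions` and `one_hot_labels` are hypothetical and not produced by the network above.

```python
import numpy as np

# Hypothetical softmax outputs: one row per image, one column per class.
predictions = np.array([[0.1, 0.8, 0.1],
                        [0.7, 0.2, 0.1],
                        [0.2, 0.3, 0.5],
                        [0.6, 0.3, 0.1]])

# Hypothetical one-hot encoded true labels for the same images.
one_hot_labels = np.array([[0, 1, 0],
                           [1, 0, 0],
                           [0, 1, 0],
                           [1, 0, 0]])

# The predicted class is the column with the highest probability.
predicted_classes = np.argmax(predictions, axis=1)
true_classes = np.argmax(one_hot_labels, axis=1)

# Accuracy is the fraction of images whose predicted class matches the label.
accuracy = float(np.mean(predicted_classes == true_classes))
print(accuracy)  # 0.75
```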

Now we are done! The model overfits slightly: the test accuracy is about 98%, a little lower than the 98.75% training accuracy reached in the final epoch.

We can stop here, but for the purpose of further understanding and research, we can use some of Python's other libraries to help us visualize the results. 

Let's start by visualizing the training accuracy with **matplotlib** (**numpy** is imported as well, although the plot does not call it directly): 

In [0]:
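import matplotlib.pyplot as plt
import numpy
# the accuracy key is 'acc' in older Keras releases and 'accuracy' in newer ones
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
plt.plot(history.history[acc_key])
plt.title('Training Accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train'], loc='upper left')
plt.show()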

Next we can visualize the training loss: 

In [0]:
plt.plot(history.history['loss'])
plt.title('Training Loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train'], loc='upper left')
plt.show()

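At the moment, we cannot plot the test loss and accuracy per epoch, because *evaluate* only returns a single final value for each. What you can usually plot alongside the training curves is "*validation*" loss and accuracy, but for this model we did not set aside part of the data for validation. 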
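One common way to get validation data is to hold out a slice of the training set before fitting. The sketch below shows only the split, on small made-up arrays (`all_images`, `all_labels`, and the 100-sample hold-out size are assumptions for illustration); the held-out part could then be passed to fit through its `validation_data` argument.

```python
import numpy as np

# Hypothetical stand-ins shaped like the preprocessed data, but with only 600 samples.
all_images = np.random.rand(600, 28 * 28).astype("float32")
all_labels = np.eye(10, dtype="float32")[np.random.randint(0, 10, size=600)]

num_val = 100  # assumed hold-out size for this example

# The first num_val samples become the validation set; the rest stay for training.
val_images, partial_train_images = all_images[:num_val], all_images[num_val:]
val_labels, partial_train_labels = all_labels[:num_val], all_labels[num_val:]

print(partial_train_images.shape, val_images.shape)

# With a compiled Keras model, the call would then look something like:
# network.fit(partial_train_images, partial_train_labels, epochs=5,
#             batch_size=128, validation_data=(val_images, val_labels))
```

With validation data supplied, the history recorded by fit also gains validation entries that can be plotted next to the training curves.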