
In [None]:
# Loading the MNIST dataset in Keras

from tensorflow.keras.datasets import mnist

In [None]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [None]:
train_images.shape

(60000, 28, 28)

In [None]:
len(train_images)

60000

In [None]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [None]:
# Test data
test_images.shape

(10000, 28, 28)

In [None]:
len(test_images)

10000

In [None]:
test_labels

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

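We reshape the data into the shape the model expects and scale it so that all values fall in the [0, 1] interval. The training images start as a uint8 array of shape (60000, 28, 28) with values in [0, 255]; after this step they are a float32 array of shape (60000, 28 * 28) with values between 0 and 1.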

In [None]:
#preparing the image data

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255 
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255 
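
To see what the reshape and rescale steps do without downloading MNIST, here is a minimal sketch on a small synthetic batch. The array `fake_images` and its size are made up for illustration; only the shape of each image and the dtype mirror the real data.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 4 random grayscale "images", uint8 in [0, 255].
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a 784-element vector.
flat = fake_images.reshape((fake_images.shape[0], 28 * 28))

# Convert to float32 and scale from [0, 255] down to [0, 1].
scaled = flat.astype("float32") / 255

print(flat.shape)    # (4, 784)
print(scaled.dtype)  # float32
print(scaled.min() >= 0.0, scaled.max() <= 1.0)  # True True
```

Flattening discards the 2D layout of the pixels, which is acceptable here because a Dense layer treats its input as a plain vector.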


The workflow is as follows. First, we feed the training data (train_images and train_labels) to the neural network, which learns to associate images with labels. Then we ask the network to make predictions for test_images and check whether those predictions match test_labels.

In [None]:
# the network architecture

from tensorflow import keras
from tensorflow.keras import layers


In [None]:
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

To explain what is happening in this architecture: layers are the core building blocks of a neural network. Layers extract representations out of the data fed into them. A layer is essentially a filter for data: some data goes in, and it comes out in a more useful form. A deep learning model is like a sieve for data processing, made of a succession of increasingly refined data filters, aka layers.
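
As a rough illustration of a layer acting as a data filter, the sketch below shows what a `Dense` layer with a ReLU activation computes: `relu(inputs @ W + b)`. The weights are random placeholders rather than trained values, and `dense_relu` is a hypothetical helper written for this example, not part of Keras.

```python
import numpy as np

def dense_relu(inputs, weights, bias):
    """Forward pass of a fully connected layer followed by ReLU."""
    return np.maximum(inputs @ weights + bias, 0.0)

rng = np.random.default_rng(0)
batch = rng.random((2, 784)).astype("float32")  # two flattened "images"
weights = (rng.standard_normal((784, 512)) * 0.01).astype("float32")  # placeholder weights
bias = np.zeros(512, dtype="float32")

hidden = dense_relu(batch, weights, bias)
print(hidden.shape)  # (2, 512)
```

Each 784-value input comes out as a 512-value representation, and ReLU guarantees that none of those values is negative.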

In our model above, we have two Dense layers, which are densely connected (also called fully connected) neural layers. The second and last layer is a 10-way softmax classification layer: it returns an array of 10 probability scores that sum to 1. Each score is the probability that the current digit image belongs to one of our 10 digit classes.
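
To show how a softmax layer turns raw scores into probabilities, here is a minimal plain-Python sketch. The `logits` values are invented for illustration, and the `softmax` function is a simplified stand-in for what the Keras activation does.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for the 10 digit classes.
logits = [1.0, 2.0, 0.5, 4.0, 0.0, -1.0, 0.2, 3.0, 1.5, 0.1]
probs = softmax(logits)

print(len(probs))               # 10
print(probs.index(max(probs)))  # 3
```

The class with the largest raw score (index 3 here) also receives the largest probability, since softmax preserves the ordering of its inputs.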

To make the model ready for training, we need to pick three more things as part of the compilation step:

      1. An optimizer - the mechanism through which the model updates itself based on the training data it sees, so as to improve its performance.
      2. A loss function - how the model measures its performance on the training data, and thus how it steers itself in the right direction.
      3. Metrics to monitor during training and testing - here we only care about accuracy, the fraction of the images that were correctly classified.
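
To make the loss function and the metric from the list above concrete, here is a plain-Python sketch of what sparse categorical cross-entropy and accuracy measure. The function names and the toy predictions are hypothetical; the Keras implementations differ in details such as numerical safeguards.

```python
import math

def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(probability assigned to the true class)."""
    losses = [-math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class is the true label."""
    hits = sum(1 for p, y in zip(probs, labels) if p.index(max(p)) == y)
    return hits / len(labels)

# Toy 3-class predictions (made up) and their integer labels.
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.3, 0.4, 0.3]]
labels = [0, 1, 2]

acc = accuracy(probs, labels)                          # 2 of 3 correct
loss = sparse_categorical_crossentropy(probs, labels)
print(acc, loss)
```

The "sparse" variant takes integer labels such as `5` directly, which matches the format of train_labels, so no one-hot encoding is needed.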

In [None]:
# The compilation step

In [None]:
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

In [None]:
# Fitting the model 

In [None]:
model.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f5f15cbc410>
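
With `batch_size=128`, each epoch is split into mini-batches, and the model updates its weights once per batch. The arithmetic below works out how many updates that implies for the 60,000 training images. It assumes the final, smaller batch is kept rather than dropped, which is how `fit` behaves with NumPy inputs by default.

```python
import math

num_samples = 60000
batch_size = 128
epochs = 5

# Number of mini-batches (weight updates) in one pass over the training data.
steps_per_epoch = math.ceil(num_samples / batch_size)

# The last batch holds whatever is left after the full-size batches.
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)           # 469
print(last_batch_size)           # 96
print(steps_per_epoch * epochs)  # 2345
```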

Having trained the model, we can use it to predict class probabilities for new digits that the model hasn't seen yet: the test data.

In [None]:
test_digits = test_images[0 : 10]
predictions = model.predict(test_digits)
predictions[0]

array([4.9237603e-10, 1.9556004e-12, 2.2516696e-07, 7.3782699e-06,
       7.0213041e-12, 5.8821024e-09, 4.0760640e-15, 9.9999189e-01,
       5.6180347e-08, 4.6208572e-07], dtype=float32)

Each number at index i in that array corresponds to the probability that the digit image test_digits[0] belongs to class i.
This first test digit has the highest probability score (0.9999919, almost 1) at index 7, so according to our model, it must be a 7:


In [None]:
predictions[0].argmax()

7

In [None]:
predictions[0][7]

0.9999919

In [None]:
test_labels[0]

7

In [None]:
#Evaluating the model on new data

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'test_acc: {test_acc}')


test_acc: 0.9803000092506409
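
To put the test accuracy in concrete terms, the arithmetic below converts it into a count of misclassified images, using the 10,000 test images and the accuracy value printed above.

```python
test_acc = 0.9803000092506409  # value printed by the evaluation cell
num_test = 10000               # number of test images

correct = round(test_acc * num_test)
wrong = num_test - correct

print(correct)  # 9803
print(wrong)    # 197
```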


Comparing test_acc with the accuracy reported on the training data during fitting, we see a gap: the test accuracy is a bit lower. This gap between training accuracy and test accuracy is an example of overfitting, the tendency of models to perform worse on unseen data than on the data they were trained on.