### References

**Book:**
- Deep Learning with Python, Second Edition
  - Book by François Chollet
  - François Chollet is a French software engineer and artificial intelligence researcher currently working at Google. Chollet is the creator of the Keras deep-learning library, released in 2015, and a main contributor to the TensorFlow machine learning framework.


### Process Outline

![image.png](attachment:image.png)

### Code

In [18]:
# Import the MNIST dataset from TensorFlow
from tensorflow.keras.datasets import mnist

import numpy as np

from tensorflow import keras
from tensorflow.keras import layers

In [2]:
# load data
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [7]:
# Shape => samples, height, width
print(train_images.shape)
print(train_images.ndim)

(60000, 28, 28)
3


In [8]:
print(train_labels.shape)
print(train_labels.ndim)

(60000,)
1


In [5]:
test_images.shape

(10000, 28, 28)

In [6]:
test_labels.shape

(10000,)

In [9]:
# Looking at the first sample from the training set
train_images[0]

array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   3,
         18,  18,  18, 126, 136, 175,  26, 166, 255, 247, 127,   0,   0,
          0,   0],
       [  

In [11]:
# shape of each image (height, width)
print(train_images[0].shape)

(28, 28)


In [12]:
# Looking at the label of the first sample
print(train_labels[0])

5


In [17]:
# Number of unique labels in the dataset
print(np.unique(train_labels))
print(np.unique(test_labels))

[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]


In [19]:
# Creating the model
model = keras.Sequential(
    [
        # single hidden layer
        layers.Dense(512, activation="relu"),
        # output layer
        layers.Dense(10, activation="softmax"),
    ]
)
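
As a rough check on the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes each input is a flattened vector of 784 values (the 28 x 28 images are reshaped to that length later in this notebook); `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units


n_features = 28 * 28  # assumed input length once the images are flattened
hidden = dense_params(n_features, 512)  # hidden layer with 512 units
output = dense_params(512, 10)          # output layer with 10 units
total_params = hidden + output
print(hidden, output, total_params)
```

Once the model has been built (for example, after the first call to `fit`), `model.summary()` should report the same total.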




In [20]:
# Compile the model
model.compile(
    optimizer="rmsprop",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
    )
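
The `sparse_categorical_crossentropy` loss takes integer class labels (such as the digits 0 to 9 here) rather than one-hot vectors. A minimal NumPy sketch of the quantity it computes, the mean negative log of the probability assigned to the true class, is shown below. The function name `sparse_cce` and the toy values are illustrative assumptions, and the Keras implementation may differ in details such as numerical-stability handling.

```python
import numpy as np


def sparse_cce(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-probability assigned to the true class of each sample."""
    # Pick, for every row, the probability at the index given by its label.
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.log(true_class_probs).mean())


# Toy batch: two samples, three classes (made-up numbers).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = sparse_cce(probs, labels)
print(loss)
```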




In [21]:
# Preparing the data to feed into the model
train_images = train_images.reshape((60000, 28 * 28))

In [22]:
# Updated shape of train images
train_images.shape

(60000, 784)

In [24]:
# Looking at first sample image
train_images[0]

array([  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   3,  18,  18,  18,
       126, 136, 175,  26, 166, 255, 247, 127,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,  30,  36,  94, 154, 17

In [25]:
# Shape of the first sample image
train_images[0].shape

(784,)

So, we have flattened each 28 x 28 image into a vector of 784 values (28 * 28 = 784).
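
To see what the reshape does on a small scale, the sketch below flattens a toy batch of two 3 x 3 "images" with NumPy. Passing `-1` lets NumPy infer the number of samples, which avoids hard-coding a count such as 60000. The toy array is made up for illustration.

```python
import numpy as np

# Two fake 3 x 3 "images" holding the values 0..17.
toy_images = np.arange(18).reshape((2, 3, 3))

# -1 tells NumPy to infer the first dimension (the number of samples).
flat = toy_images.reshape((-1, 3 * 3))

print(flat.shape)
print(flat[0])  # the rows of the first image laid end to end
```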

In [26]:
# Scaling the values from the range [0, 255] to [0, 1] 
train_images = train_images.astype("float32") / 255

In [27]:
# Reshaping and Scaling the test images as well
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255
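
Since the same two steps are applied to both splits, they could be wrapped in a small helper. `preprocess` below is a hypothetical function, not part of Keras, and it assumes `uint8` pixel values in the range [0, 255]. Random data stands in for the real images.

```python
import numpy as np


def preprocess(images: np.ndarray) -> np.ndarray:
    """Flatten each image to a vector and scale pixel values to [0, 1]."""
    flat = images.reshape((images.shape[0], -1))
    return flat.astype("float32") / 255


# Random stand-in for a batch of four 28 x 28 grayscale images.
fake_images = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)
prepared = preprocess(fake_images)
print(prepared.shape, prepared.dtype)
```

With the real data, this would be called once on the raw training images and once on the raw test images.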

In [28]:
# Fitting the model to the training data
model.fit(train_images,
          train_labels,
          epochs=5,
          batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x22b621914b0>

So, the model reaches an accuracy of 98.84% on the training data.

In [30]:
# We can use the model to make predictions
prediction = model.predict(test_images[0:1])
print(prediction)

[[1.0433870e-07 1.5465635e-09 8.3975901e-06 9.4912873e-05 5.0498516e-12
  2.0779607e-08 1.1855208e-12 9.9989617e-01 9.0380517e-08 3.2861848e-07]]


Note: We use the slice `test_images[0:1]` rather than `test_images[0]` to extract the first test sample. The slice returns a 2D array of shape `(1, 784)`, which is what the `predict` method requires (a batch of samples), whereas `test_images[0]` returns a 1D array of shape `(784,)`.

In [32]:
# Returns 1D array
print(test_images[0].shape)

# Returns 2D array
print(test_images[0:1].shape)

(784,)
(1, 784)
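
Slicing is not the only way to keep the batch axis. As a sketch, `np.expand_dims` or indexing with `np.newaxis` also turns a single flattened sample into a batch of one. The zero-filled array below stands in for a real test image.

```python
import numpy as np

# Stand-in for one flattened test image.
sample = np.zeros(784, dtype="float32")

batch_a = np.expand_dims(sample, axis=0)  # add a leading batch axis
batch_b = sample[np.newaxis, :]           # equivalent indexing form

print(sample.shape, batch_a.shape, batch_b.shape)
```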


In [33]:
# We can look at the softmax output as probabilities
print(prediction[0])

[1.0433870e-07 1.5465635e-09 8.3975901e-06 9.4912873e-05 5.0498516e-12
 2.0779607e-08 1.1855208e-12 9.9989617e-01 9.0380517e-08 3.2861848e-07]
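
These values can be read as probabilities because softmax maps the raw scores of the output layer to non-negative numbers that sum to 1. A minimal NumPy sketch of that definition follows. The input scores are made up, and subtracting the maximum is a common numerical-stability trick rather than something taken from this notebook.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    """Turn a 1D vector of raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()


scores = np.array([1.0, 2.0, 0.5, 4.0])  # made-up raw scores
probs = softmax(scores)
print(probs, probs.sum())
```

The largest score always receives the largest probability, which is why taking the index of the maximum gives the predicted class.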


In [34]:
# We can fetch the highest probability
np.max(prediction[0])

0.99989617

In [35]:
# The index of the highest probability is the predicted label
np.argmax(prediction[0])

7

So, the predicted label is 7.

In [36]:
# We can check the actual label as well
test_labels[0]

7

We can see that the model prediction is correct.

We can also run the model on all of the test data and measure its accuracy.

In [37]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"test_acc: {test_acc}")
print(f"test_loss: {test_loss}")

test_acc: 0.9796000123023987
test_loss: 0.06911325454711914
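
The accuracy reported by `evaluate` is the fraction of samples whose highest-probability class matches the true label. The sketch below reproduces that calculation with NumPy on a made-up batch of predictions. With the real model, `probs` would be `model.predict(test_images)` and `labels` would be `test_labels`.

```python
import numpy as np

# Made-up softmax outputs for four samples and three classes.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])
labels = np.array([1, 0, 2, 2])

# Predicted class per sample: the index of the largest probability in each row.
predicted = np.argmax(probs, axis=1)

# Accuracy: the share of samples where prediction and label agree.
accuracy = float(np.mean(predicted == labels))
print(predicted, accuracy)
```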


The test-set accuracy turns out to be 97.96%. That’s quite a bit lower than the training-set accuracy (98.84%).

This gap between training accuracy and test accuracy is an example of overfitting: the model performs worse on new data than on the data it was trained on.
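
To put the gap in numbers, the accuracies quoted above can be turned into error rates. This is plain arithmetic on the reported figures (98.84% and 97.96%), not a new measurement.

```python
train_acc = 0.9884  # training accuracy reported above
test_acc = 0.9796   # test accuracy reported above

gap_points = (train_acc - test_acc) * 100  # gap in percentage points
train_err = 1 - train_acc                  # training error rate
test_err = 1 - test_acc                    # test error rate
error_ratio = test_err / train_err         # how much larger the test error is
test_mistakes = round(test_err * 10000)    # misclassified test images out of 10000

print(f"gap: {gap_points:.2f} percentage points")
print(f"error ratio: {error_ratio:.2f}")
print(f"misclassified test images: {test_mistakes}")
```

In other words, the error rate on unseen digits is roughly 1.8 times the training error rate, which corresponds to about 204 misclassified images out of 10,000.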