In [None]:
# Import Keras library
import keras
# Print the version of Keras to verify the installation
keras.__version__

'2.10.0'

The problem we are trying to solve here is to classify grayscale images of handwritten digits (28 pixels by 28 pixels) into their 10 categories (0 to 9). The dataset we will use is the MNIST dataset, a classic dataset in the machine learning community, which has been around for almost as long as the field itself and has been very intensively studied. It's a set of 60,000 training images, plus 10,000 test images, assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s. You can think of "solving" MNIST as the "Hello World" of deep learning -- it's what you do to verify that your algorithms are working as expected.

In [3]:
# Import the MNIST dataset from Keras
from keras.datasets import mnist
# Load the dataset into training and testing sets
# The data is split between train_images, train_labels, test_images, test_labels
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [5]:
# Check the shape of the training images (60000 images, each 28x28 pixels)
train_images.shape

(60000, 28, 28)

In [None]:
# Check the shape of the training labels (60000 labels)
train_labels.shape

(60000,)

In [None]:
# Check the shape of the test images (10000 images, each 28x28 pixels)
test_images.shape

(10000, 28, 28)

In [None]:
# Check the shape of the test labels (10000 labels)
test_labels.shape

(10000,)

In [None]:
# Get the number of training images
len(train_images)

60000

In [None]:
# Get the number of training labels
len(train_labels)

60000

In [None]:
# Get the number of test images
len(test_images)

10000

In [None]:
# Get the number of test labels
len(test_labels)

10000

In [None]:
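# Display the training images as a NumPy array
train_images

array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       ...,

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 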

In [None]:
# Display the training labels as a NumPy array
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [None]:
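# Display the test images as a NumPy array
test_images

array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       ...,

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 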

In [None]:
# Display the test labels as a NumPy array
test_labels

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

Our workflow will be as follows: first we will present our neural network with the training data, train_images and train_labels. The network will then learn to associate images and labels. Finally, we will ask the network to produce predictions for test_images, and we will verify whether these predictions match the labels from test_labels.

In [20]:
# Import the necessary modules from Keras
from keras import models
from keras import layers

In [None]:
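# Initialize a sequential model
network = models.Sequential()
# Add a densely-connected hidden layer with 512 units and ReLU activation;
# each input is a flattened 28*28 image
network.add(layers.Dense(512, activation='relu', input_shape=(28*28,)))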
# Add a densely-connected layer with 10 units and softmax activation
network.add(layers.Dense(10, activation='softmax'))

The core building block of neural networks is the "layer", a data-processing module which you can conceive of as a "filter" for data. Some data comes in, and comes out in a more useful form. Precisely, layers extract representations out of the data fed into them. Most of deep learning really consists of chaining together simple layers which will implement a form of progressive "data distillation". A deep learning model is like a sieve for data processing, made of a succession of increasingly refined data filters -- the "layers".


Here our network consists of a sequence of two Dense layers, which are densely-connected (also called "fully-connected") neural layers. The second (and last) layer is a 10-way "softmax" layer, which means it will return an array of 10 probability scores (summing to 1). Each score will be the probability that the current digit image belongs to one of our 10 digit classes.
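
To make the two-layer structure concrete, here is a minimal NumPy sketch of the same forward pass. It is not how Keras implements `Dense` internally; the weights are random stand-ins (a trained network would have learned them), and the helper names `dense`, `relu`, and `softmax` are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    # Affine transform performed by a fully-connected layer
    return x @ w + b

def relu(z):
    # Keep positive values, zero out the rest
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum before exponentiating for numerical stability
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Random stand-ins for the weights of the two layers (assumption, not trained values)
w1, b1 = rng.normal(scale=0.05, size=(28 * 28, 512)), np.zeros(512)
w2, b2 = rng.normal(scale=0.05, size=(512, 10)), np.zeros(10)

x = rng.random((1, 28 * 28))  # one fake flattened image with values in [0, 1)
scores = softmax(dense(relu(dense(x, w1, b1)), w2, b2))

n_params = w1.size + b1.size + w2.size + b2.size
print(scores.shape)  # (1, 10)
print(n_params)      # 407050
```

The 10 scores are non-negative and sum to 1, which is what lets us read them as class probabilities. The parameter count is (784 * 512 + 512) + (512 * 10 + 10).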

To make our network ready for training, we need to pick three more things, as part of the "compilation" step:

A loss function: this is how the network will be able to measure how good a job it is doing on its training data, and thus how it will be able to steer itself in the right direction.

An optimizer: this is the mechanism through which the network will update itself based on the data it sees and its loss function.

Metrics to monitor during training and testing. Here we will only care about accuracy (the fraction of the images that were correctly classified).
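
As an illustration of what "updating itself" means, here is a rough sketch of one common textbook form of the RMSprop update rule, applied to a toy one-parameter loss. The function name `rmsprop_step`, the hyperparameter values, and the toy loss are all assumptions for illustration; the exact update used inside Keras may differ in its details.

```python
import numpy as np

def rmsprop_step(w, grad, mean_sq, lr=0.01, rho=0.9, eps=1e-7):
    # Keep a running average of squared gradients...
    mean_sq = rho * mean_sq + (1.0 - rho) * grad ** 2
    # ...and scale the step by its square root
    w = w - lr * grad / (np.sqrt(mean_sq) + eps)
    return w, mean_sq

# Toy loss: loss(w) = w**2, whose gradient is 2 * w
w, mean_sq = 5.0, 0.0
start_loss = w ** 2
for _ in range(100):
    grad = 2.0 * w
    w, mean_sq = rmsprop_step(w, grad, mean_sq)
end_loss = w ** 2

print(end_loss < start_loss)  # True
```

Each step moves the parameter against the gradient of the loss, so the loss shrinks over the iterations.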

In [22]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

Here's the breakdown of why we use `categorical_crossentropy` together with a softmax output:

*   **Classification vs. Regression:**
    *   **Classification:** Assigning an input to a category. In this case, assigning a handwritten digit image to one of 10 categories (0-9).
    *   **Regression:** Predicting a continuous value. For example, predicting the price of a house.

*   **Why Categorical Crossentropy and Softmax?**
    *   The MNIST problem is treated as a multi-class classification problem. Even though the classes are numbers, the model is not trying to predict a continuous value representing the digit. Instead, it's trying to classify the image into one of the discrete digit categories.
    *   `categorical_crossentropy` is a loss function designed for multi-class classification. It compares the predicted probability distribution (output by softmax) with the true class label (represented as a one-hot encoded vector).
    *   `softmax` activation outputs a probability distribution over the classes: each of the 10 digits receives a probability, and the probabilities sum to 1.

*   **Why not Regression?**
    *   You *could* theoretically treat it as a regression problem where you try to predict a value between 0 and 9. However, this would imply an ordinal relationship between the digits (e.g., 4 is "closer" to 5 than it is to 0), which isn't necessarily true for handwritten digits. A poorly written 4 might look more like a 9 than a well-written 5.
    *   Classification allows the model to learn distinct features for each digit without imposing any artificial relationships between them.

*   **In summary:**
    *   We use categorical labels and `categorical_crossentropy` because we're solving a multi-class classification problem.
    *   This approach allows the model to learn distinct features for each digit and output a probability distribution over the possible digits.

The use of categorical labels isn't directly related to the images themselves or the need to detect edges. It's about how we frame the problem (classification) and the appropriate tools (softmax, categorical crossentropy) for that framing.
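
To see how the loss compares a predicted distribution with a one-hot label, here is a small NumPy sketch of the standard categorical crossentropy formula. The function below is our own illustration rather than the Keras implementation, and the clipping constant `eps` is an assumed value.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); eps is an illustrative choice
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# One-hot label for the digit 7
y_true = np.array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0.])

# A confident, correct prediction and a completely unsure one
confident = np.array([0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.91, 0.01, 0.01])
unsure = np.full(10, 0.1)

loss_confident = categorical_crossentropy(y_true, confident)  # -log(0.91), about 0.094
loss_unsure = categorical_crossentropy(y_true, unsure)        # -log(0.1), about 2.303
print(loss_confident < loss_unsure)  # True
```

Because the label is one-hot, only the probability assigned to the true class contributes to the loss: the more probability the model puts on the correct digit, the lower the loss.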


Before training, we will preprocess our data by reshaping it into the shape that the network expects, and scaling it so that all values are in the [0, 1] interval. Previously, our training images, for instance, were stored in an array of shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval. We transform it into a float32 array of shape (60000, 28 * 28) with values between 0 and 1.

In [25]:
train_images = train_images.reshape((60000, 28*28))
train_images = train_images.astype('float32')/ 255

test_images = test_images.reshape((10000, 28*28))
test_images = test_images.astype('float32')/255
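
The same transformation can be checked on a small synthetic array without loading MNIST. This is only a sketch: the random `images` array is a hypothetical stand-in for real data, and `-1` is used so that NumPy infers the number of samples instead of hard-coding it.

```python
import numpy as np

# Synthetic stand-in for a small batch of MNIST-like uint8 images (hypothetical data)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a 784-vector, then rescale from [0, 255] to [0, 1]
flat = images.reshape((-1, 28 * 28)).astype('float32') / 255

print(flat.shape, flat.dtype)  # (5, 784) float32
```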

We also need to categorically encode the labels.

`to_categorical`, imported from `keras.utils`, performs one-hot encoding of the labels. In classification problems, especially multi-class classification, it's common to convert integer labels into a binary matrix. Each row of the matrix represents a data point, and each column represents a class. A '1' in a column indicates that the data point belongs to that class, while '0' indicates it does not. This representation is suitable for training neural networks with categorical crossentropy loss.



In [26]:
from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
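
For intuition, one-hot encoding can be sketched in plain NumPy. The helper `one_hot` below is our own illustration of the idea, not the Keras function, and it assumes the labels are integers from 0 to `num_classes - 1`.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype='float32')[labels]

labels = np.array([5, 0, 4])
encoded = one_hot(labels)

print(encoded.shape)  # (3, 10)
print(encoded[0])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

Taking `argmax` along each row recovers the original integer labels.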

We are now ready to train our network, which in Keras is done via a call to the fit method of the network: we "fit" the model to its training data.

In [28]:
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x2941bd6e3d0>
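
As a quick piece of arithmetic on the call above: with 60,000 samples and a batch size of 128, each epoch is split into mini-batches, and the last one is smaller than the rest. The sketch below assumes the final partial batch is kept rather than dropped.

```python
import math

num_samples = 60000
batch_size = 128
epochs = 5

# Number of mini-batches per epoch; the last batch holds the leftover samples
steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)  # 469 96 2345
```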


Two quantities are being displayed during training: the "loss" of the network over the training data, and the accuracy of the network over the training data.

We quickly reach an accuracy of 0.989 (i.e. 98.9%) on the training data. Now let's check that our model performs well on the test set too:

In [29]:
test_loss, test_acc = network.evaluate(test_images, test_labels)



In [30]:
print('test_acc:', test_acc)

test_acc: 0.9822999835014343



Using the model to make predictions

In [34]:
test_digits = test_images[0:10]
predictions = network.predict(test_digits)
predictions[0]



array([5.4777773e-14, 8.7355895e-20, 4.1645019e-13, 1.1220120e-08,
       1.4126993e-20, 4.9030926e-16, 9.6802423e-25, 1.0000000e+00,
       2.3782659e-14, 1.4821464e-12], dtype=float32)

In [35]:
predictions[0].argmax()

7

In [36]:
predictions[0][7]

1.0

In [37]:
test_labels[0]

array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0.], dtype=float32)
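
The same comparison can be done for a whole batch at once by taking `argmax` along the class axis of both the predictions and the one-hot labels. The sketch below uses small made-up arrays with 3 classes as stand-ins for `predictions` and `test_labels`, so the numbers are purely illustrative.

```python
import numpy as np

# Hypothetical stand-ins for `predictions` and one-hot `test_labels` (3 classes, 4 samples)
predictions = np.array([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.6, 0.3, 0.1]])
one_hot_labels = np.array([[0., 1., 0.],
                           [1., 0., 0.],
                           [0., 0., 1.],
                           [0., 1., 0.]])

predicted_classes = predictions.argmax(axis=1)  # most probable class per sample
true_classes = one_hot_labels.argmax(axis=1)    # undo the one-hot encoding

# Fraction of samples where the predicted class matches the true class
accuracy = np.mean(predicted_classes == true_classes)
print(accuracy)  # 0.75
```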