## Setup

In [3]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.datasets import mnist

## Prepare the data

Load the MNIST data. `mnist.load_data()` returns two tuples (a tuple stores multiple items in a single variable), one for the training set and one for the test set.
X_train and X_test are the image data, and y_train and y_test are the corresponding labels.

In [6]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Reshape and normalize the data
Keras convolutional layers expect a 4D tensor with the shape (batch_size, height, width, channels).
This step adds the missing channel dimension to the data, which is 1 in this case because the images are grayscale.
For an RGB (Red, Green, Blue) image it would be 3.

In [6]:
X_train = X_train.reshape(60000, 28, 28, 1)
X_test = X_test.reshape(10000, 28, 28, 1)
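The reshape above hard-codes the number of images. As a minimal sketch on a dummy array (the name `images` and the batch of 5 are made up for illustration), the same channel dimension can be added without naming the batch size, either with `-1` in `reshape` or with `np.expand_dims`:

```python
import numpy as np

# Stand-in for a batch of grayscale images: 5 images of 28x28 pixels.
images = np.zeros((5, 28, 28), dtype=np.uint8)

# -1 asks NumPy to infer the batch size from the array itself.
with_channel = images.reshape(-1, 28, 28, 1)

# np.expand_dims appends the channel axis without naming any size.
also_with_channel = np.expand_dims(images, axis=-1)

print(with_channel.shape)       # (5, 28, 28, 1)
print(also_with_channel.shape)  # (5, 28, 28, 1)
```

Both forms leave the pixel values untouched; only the shape changes.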

Convert the pixel values to floats and scale them to the range 0 to 1, which is a common preprocessing step for image data.

In [6]:
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255  # pixels are 8-bit integers (0 to 255), so dividing by 255 maps them to [0, 1]
X_test /= 255
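The divisor 255 comes from the storage format: MNIST pixels are 8-bit unsigned integers, whose largest value is 255. A small sketch with made-up pixel values (the array `pixels` is illustrative, not taken from the dataset):

```python
import numpy as np

# Hypothetical pixel values covering the full 8-bit range.
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

print(np.iinfo(np.uint8).max)  # 255

# Cast to float first; the integer array cannot hold fractional values.
scaled = pixels.astype('float32') / 255
print(float(scaled.min()), float(scaled.max()))  # 0.0 1.0
```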

Convert the labels to categorical
This function converts each integer label to a one-hot encoded vector: all zeros except for a 1 at the index of the class.
This step is necessary because the categorical crossentropy loss used below compares the softmax output of the final layer,
a vector of 10 probabilities, against a target vector of the same shape.
The second argument, 10, is the number of classes (the digits 0-9).

In [6]:
from keras.utils import to_categorical
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
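To make the encoding concrete, here is a minimal NumPy sketch of what one-hot encoding produces. It is an illustration of the idea, not the Keras implementation, and the three labels are made up:

```python
import numpy as np

labels = np.array([5, 0, 4])  # hypothetical integer labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype='float32')[labels]

print(one_hot[0].tolist())  # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]

# argmax recovers the original integer labels.
print(one_hot.argmax(axis=1).tolist())  # [5, 0, 4]
```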

## Build the model

Create a sequential model
A sequential model is a linear stack of layers, where the output of one layer is the input of the next.

In [None]:
model = Sequential()

In [None]:
# Add a convolutional layer with 32 filters, a kernel size of 3x3, and a ReLU activation function
# The ReLU activation function is a simple function that takes the input of a neuron and returns it unchanged if it is positive,
# and returns 0 otherwise.
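As a quick illustration of ReLU, the following sketch applies it to a few made-up values with NumPy (the helper name `relu` is ours, not a Keras function):

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x): negatives become 0, positives pass through.
    return np.maximum(0, x)

values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(values).tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```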

In [None]:
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1))) # input is a 28x28 image with 1 color channel.

# Add a max pooling layer with a pool size of 2x2
# This layer takes the maximum over each 2x2 window of the input, halving each spatial dimension (rounding down).
model.add(MaxPooling2D(pool_size=(2, 2)))

# Add a convolutional layer with 64 filters, a kernel size of 3x3, and a ReLU activation function
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))

# Add a max pooling layer with a pool size of 2x2
model.add(MaxPooling2D(pool_size=(2, 2)))

# Flatten the output from the previous layers
model.add(Flatten())

# Add a fully connected layer with 128 units and a ReLU activation function
# Each of its 128 neurons is connected to every value in the flattened output of the previous layer.
model.add(Dense(128, activation='relu'))

# Add a final output layer with 10 units and a softmax activation function
# The softmax function converts the output of the final layer into a probability distribution over the 10 possible classes.
model.add(Dense(10, activation='softmax'))
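It can help to trace how the image size changes through the stack above. The sketch below does the arithmetic by hand, assuming the Keras defaults for these layers (no padding and stride 1 for `Conv2D`, and a pooling stride equal to the pool size); the helper names `conv_out` and `pool_out` are made up for this illustration. Running `model.summary()` should report the same shapes and parameter counts.

```python
def conv_out(size, kernel):
    # No padding, stride 1: the kernel must fit entirely inside the input.
    return size - kernel + 1

def pool_out(size, pool):
    # Non-overlapping windows; a leftover row or column is dropped.
    return size // pool

size = 28
size = conv_out(size, 3)   # 26
size = pool_out(size, 2)   # 13
size = conv_out(size, 3)   # 11
size = pool_out(size, 2)   # 5
flat = size * size * 64    # 1600 values per image after Flatten

# Parameters = weights + biases for each trainable layer.
conv1 = 3 * 3 * 1 * 32 + 32
conv2 = 3 * 3 * 32 * 64 + 64
dense1 = flat * 128 + 128
dense2 = 128 * 10 + 10
total = conv1 + conv2 + dense1 + dense2

print(flat, total)  # 1600 225034
```

Most of the parameters sit in the first fully connected layer, not in the convolutional layers.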

## Train the model

Compile the model with a categorical crossentropy loss function and an Adam optimizer
- loss: This argument specifies the loss function that the model minimizes during training. A loss function measures how far the model's predictions are from the true labels; the lower the value, the better the predictions.
- optimizer: This argument specifies the optimization algorithm that the model uses to update its weights during training.
- metrics: This argument specifies the metric(s) reported during training and evaluation. Metrics are only monitored; they do not affect the weight updates.
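To show how the softmax output and the one-hot labels meet in the loss, here is a simplified pure-Python sketch for a single sample with 3 classes. The scores are made up, and the real Keras loss additionally clips probabilities and averages over the batch, so treat this as an illustration of the formula only:

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs):
    # Only the true class contributes, since every other target entry is 0.
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

logits = [2.0, 1.0, 0.1]  # hypothetical raw scores for 3 classes
probs = softmax(logits)

loss_right = categorical_crossentropy([1, 0, 0], probs)  # true class has the highest score
loss_wrong = categorical_crossentropy([0, 0, 1], probs)  # true class has the lowest score

print(loss_right < loss_wrong)  # True
```

The loss is small when the model puts high probability on the true class and grows as that probability shrinks.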

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model on the training data
# X_train and y_train: These arguments specify the training data and labels. 
# X_train is the input data and y_train is the corresponding target data.
# epochs: This argument specifies the number of times the model should iterate over the entire training data.
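# batch_size: This argument specifies the number of samples the model processes before each weight update.
# With 32, the weights are updated after every 32 images instead of once per pass over the data.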
model.fit(X_train, y_train, epochs=10, batch_size=32)

# The model is trained on X_train with the corresponding y_train labels, using the categorical crossentropy loss function and the Adam optimizer, for 10 epochs with a batch size of 32.
# Keras reports the loss and the accuracy metric as training progresses.
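The relationship between epochs, batch size, and the number of weight updates is simple arithmetic. A short sketch using the values from this notebook:

```python
import math

num_samples = 60000  # size of the MNIST training set
batch_size = 32
epochs = 10

# The last batch may be smaller, so round up.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)  # 1875 18750
```

A larger batch size means fewer, smoother updates per epoch; a smaller one means more, noisier updates.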

## Evaluate the trained model

In [None]:
# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)
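For one-hot labels, accuracy is the fraction of samples where the highest-probability class matches the true class. A minimal NumPy sketch with made-up predictions for 3 samples and 3 classes (the arrays are illustrative, not model output):

```python
import numpy as np

# Hypothetical softmax outputs, one row per sample.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4]])

# Hypothetical one-hot labels for the same samples.
one_hot = np.array([[0, 1, 0],
                    [0, 0, 1],
                    [0, 0, 1]])

predicted = probs.argmax(axis=1)   # most likely class per sample
actual = one_hot.argmax(axis=1)    # true class per sample
accuracy = float((predicted == actual).mean())

print(predicted.tolist(), actual.tolist())  # [1, 0, 2] [1, 2, 2]
```

Here two of the three predictions are correct, so the accuracy is 2/3.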