### Import Libraries

In [26]:
```python
import numpy as np
import argparse
import cv2
from cnn.neural_network_prog import CNN
from keras.utils import to_categorical
from keras.optimizers import SGD
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
```


### Argument Parsing

1. **`argparse.ArgumentParser()`**:
   - In the standalone script, this creates the argument parser object that handles command-line arguments and stores it in the variable `ap`.

2. **`ap.add_argument()`**:
   - This function is used to specify which command-line options the program is willing to accept. Each `add_argument()` method defines a single argument:
     - `"-s", "--save_model"`: An integer flag that controls whether the model weights are saved after training. Any positive value saves them; the default of `-1` does not.
     - `"-l", "--load_model"`: An integer flag that controls whether pre-trained weights are loaded. A positive value loads the weights and skips training; the default of `-1` trains from scratch. With the checks used later in this notebook, a value of `0` does neither: no weights are loaded and no training runs.
     - `"-w", "--save_weights"`: A string giving the path of the weights file. The same path is used both for saving (with `-s`) and for loading (with `-l`). Under Keras 3, which the training log suggests this notebook uses, `save_weights()` requires a filename ending in `.weights.h5`.

3. **`vars(ap.parse_args())`**:
   - `parse_args()` parses the arguments given on the command line when the script runs (for example, `python CNN_MNIST.py -s 1 -w mnist_cnn.weights.h5` trains the model and then saves its weights). `vars()` converts the resulting namespace into a dictionary, `args`, whose keys match the argument names (`save_model`, `load_model`, `save_weights`). Inside a Jupyter notebook, a bare `parse_args()` would instead read the kernel's own command-line arguments, which is why the next cell builds `args` as a plain dictionary.

By using these arguments, the script can be configured to behave differently based on user input without changing the code, making it flexible for different training conditions or when deploying the model in various environments.
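A minimal sketch of the parser described above, assuming the three options exactly as listed; the help strings, the `build_parser` helper, and the `mnist_cnn.weights.h5` filename are illustrative, not taken from the original script. Passing an explicit list to `parse_args()` lets the same parser run inside a notebook.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: the three options described above (help text is illustrative)."""
    ap = argparse.ArgumentParser(description="Train or load a CNN on MNIST")
    ap.add_argument("-s", "--save_model", type=int, default=-1,
                    help="save weights after training if > 0")
    ap.add_argument("-l", "--load_model", type=int, default=-1,
                    help="load weights instead of training if > 0")
    ap.add_argument("-w", "--save_weights", type=str,
                    help="path of the weights file to save to or load from")
    return ap


# An explicit argument list stands in for the real command line
args = vars(build_parser().parse_args(["-s", "1", "-w", "mnist_cnn.weights.h5"]))
print(args)
# {'save_model': 1, 'load_model': -1, 'save_weights': 'mnist_cnn.weights.h5'}
```

Omitted options fall back to their defaults, so `save_weights` is `None` unless `-w` is given.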

In [27]:
```python
# Define default arguments or use a dictionary to simulate command-line argument parsing
args = {
    "save_model": -1,
    "load_model": -1,
    "save_weights": None
}

# If you have specific values you want to test, you can set them here
# For example:
# args["save_model"] = 1  # Simulate passing '-s 1'
# args["load_model"] = 1  # Simulate passing '-l 1' (load weights, skip training)
# args["save_weights"] = 'mnist_cnn.weights.h5'  # Simulate passing '-w mnist_cnn.weights.h5'
# (Keras 3's save_weights() requires a filename ending in '.weights.h5')
```


### Load and Prepare MNIST Data


1. **Loading the MNIST Dataset**:
    ```python
    print('Loading MNIST Dataset...')
    dataset = fetch_openml('mnist_784')
    ```
    - The `fetch_openml` function from `sklearn.datasets` downloads MNIST from OpenML, a public repository for sharing research datasets. `'mnist_784'` is the OpenML name of the dataset: 70,000 grayscale images, each 28x28 pixels flattened into a 784-element vector. In recent scikit-learn versions, `dataset.data` is returned as a pandas DataFrame by default (`as_frame=False` returns NumPy arrays instead).

2. **Reshaping the Data**:
    ```python
    mnist_data = np.array(dataset.data).reshape((dataset.data.shape[0], 28, 28))
    mnist_data = mnist_data[:, np.newaxis, :, :]
    ```
    - Each image arrives as a flat row of 784 pixel values. After the DataFrame is converted to a NumPy array, the first operation reshapes each row into the original 28x28 grid.
    - The second operation inserts a channel axis, changing the shape from `(n, 28, 28)` to `(n, 1, 28, 28)`; the single channel holds the grayscale intensity. This is the channels-first layout. Keras layers default to channels-last, `(n, 28, 28, 1)`, so `CNN.build` presumably configures its layers for channels-first input (its source is not shown here).

3. **Dividing the Data into Training and Testing Sets**:
    ```python
    train_img, test_img, train_labels, test_labels = train_test_split(mnist_data/255.0, dataset.target.astype("int"), test_size=0.1)
    ```
    - `train_test_split` from `sklearn.model_selection` shuffles the samples and splits them into training and testing sets. Without a fixed `random_state`, the split changes on every run.
    - `mnist_data/255.0` scales the pixel values from 0-255 to 0-1. Keeping inputs on a small, consistent scale generally makes gradient-based training more stable and faster to converge.
    - `dataset.target.astype("int")` converts the labels, which OpenML stores as strings, to integers so that they can be one-hot encoded for classification.
    - `test_size=0.1` specifies that 10% of the data should be reserved for testing, with the remaining 90% used for training.
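The shape changes and the split can be checked on synthetic data. This sketch uses random pixels in place of MNIST (an assumption, so it runs offline) and shows the channels-last alternative next to the notebook's channels-first layout. It also passes `random_state` and `stratify`, two optional `train_test_split` parameters the notebook does not use, to make the split reproducible and class-balanced.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for MNIST: 100 flattened 28x28 images with 0-255 pixels
rng = np.random.default_rng(0)
flat = rng.integers(0, 256, size=(100, 784), dtype=np.uint8)
labels = np.repeat(np.arange(10), 10).astype(str)   # string labels, as OpenML stores them

images = flat.reshape((flat.shape[0], 28, 28))      # (100, 28, 28)
channels_first = images[:, np.newaxis, :, :]         # (100, 1, 28, 28), as in this notebook
channels_last = images[..., np.newaxis]              # (100, 28, 28, 1), Keras' default layout

x = channels_first / 255.0                           # floats in [0, 1]
y = labels.astype(int)

# random_state makes the split reproducible; stratify keeps class proportions
# the same in both subsets
train_x, test_x, train_y, test_y = train_test_split(
    x, y, test_size=0.1, random_state=42, stratify=y
)
print(train_x.shape, test_x.shape)  # (90, 1, 28, 28) (10, 1, 28, 28)
```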



In [28]:
```python
print('Loading MNIST Dataset...')
dataset = fetch_openml('mnist_784')

# Convert DataFrame to NumPy array and reshape
mnist_data = np.array(dataset.data).reshape((dataset.data.shape[0], 28, 28))
mnist_data = mnist_data[:, np.newaxis, :, :]

# Divide data into testing and training sets.
train_img, test_img, train_labels, test_labels = train_test_split(mnist_data/255.0, dataset.target.astype("int"), test_size=0.1)
```

```
Loading MNIST Dataset...
```


### Transform Labels

In [29]:
```python
total_classes = 10  # 0 to 9 labels
train_labels = to_categorical(train_labels, total_classes)
test_labels = to_categorical(test_labels, total_classes)
```

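`to_categorical` turns each integer label into a one-hot row vector, the format that `categorical_crossentropy` expects. The NumPy one-liner below produces the same 0/1 layout and is only an illustrative sketch, not Keras's implementation; `argmax` inverts the encoding, which is how the prediction step recovers the true digit from `test_labels`.

```python
import numpy as np

labels = np.array([3, 0, 9, 3])
total_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(total_classes)[labels]
print(one_hot.shape)            # (4, 10)
print(one_hot.argmax(axis=1))   # [3 0 9 3]
```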
### Define and Compile Model


1. **Stochastic Gradient Descent (SGD) Optimizer**:
   - `sgd = SGD(learning_rate=0.01, decay=1e-6, momentum=0.9, nesterov=True)` configures the Stochastic Gradient Descent optimizer. The parameters are:
     - `learning_rate=0.01`: The step size, which controls how far the weights move in response to the estimated error at each update.
     - `decay=1e-6`: In older Keras versions this shrank the learning rate over time as `lr / (1 + decay * iterations)`. Keras 3 no longer supports the argument and ignores it (with a warning); the same behaviour requires a learning-rate schedule such as `keras.optimizers.schedules.InverseTimeDecay(initial_learning_rate=0.01, decay_steps=1, decay_rate=1e-6)`.
     - `momentum=0.9`: Momentum accumulates a running "velocity" from past gradients, which damps oscillations and speeds up progress along directions where the gradient is consistent.
     - `nesterov=True`: Nesterov momentum corrects the momentum step with a look-ahead gradient. In practice it often converges somewhat faster than classical momentum.

2. **CNN Model Building**:
   - `clf = CNN.build(width=28, height=28, depth=1, total_classes=10, Saved_Weights_Path=args["save_weights"] if args["load_model"] > 0 else None)`: This function call constructs the CNN architecture.
     - `width=28, height=28`: These parameters define the dimensions of the input images (28x28 pixels for MNIST).
     - `depth=1`: Indicates the number of color channels in the image. For MNIST, which are grayscale images, the depth is 1.
     - `total_classes=10`: Number of output classes. MNIST digits go from 0 to 9, so there are 10 classes.
     - `Saved_Weights_Path`: The path of the weights file to load. If the `load_model` argument is greater than 0, `args["save_weights"]` is passed so that pre-trained weights are loaded from that path; otherwise it is `None` and the network starts from freshly initialized weights.

3. **Compiling the Model**:
   - `clf.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])`: This line compiles the model for training.
     - `loss="categorical_crossentropy"`: This is the loss function used for a multi-class classification problem. It is suitable for cases where each target class label is provided in a one-hot encoded format.
     - `optimizer=sgd`: The optimizer we configured earlier is used to minimize the loss function.
     - `metrics=["accuracy"]`: Metrics to evaluate the model during training and testing. Here, accuracy is the proportion of correctly predicted labels to total predictions.
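The update rules behind `momentum`, `nesterov`, and the legacy `decay` setting can be sketched in NumPy. The momentum and Nesterov formulas follow the ones given in the Keras `SGD` documentation, and `inverse_time_lr` follows the old `decay` formula; the toy quadratic objective and both helper names are illustrative assumptions, not part of the notebook.

```python
import numpy as np


def inverse_time_lr(initial_lr: float, decay: float, step: int) -> float:
    """Legacy-style decay: lr = initial_lr / (1 + decay * step)."""
    return initial_lr / (1.0 + decay * step)


def sgd_step(w, velocity, grad, lr, momentum=0.9, nesterov=True):
    """One SGD update with (optionally Nesterov) momentum."""
    velocity = momentum * velocity - lr * grad
    if nesterov:
        # Look-ahead correction: apply momentum to the new velocity plus the gradient step
        w = w + momentum * velocity - lr * grad
    else:
        w = w + velocity
    return w, velocity


# Minimise f(w) = sum(w**2), whose gradient is 2 * w
w = np.array([5.0, -3.0])
velocity = np.zeros_like(w)
for step in range(200):
    lr = inverse_time_lr(0.01, 1e-6, step)
    w, velocity = sgd_step(w, velocity, 2 * w, lr)
print(w)  # close to [0, 0]
```

With `decay=1e-6`, the learning rate only halves after a million updates, so over the roughly 10,000 updates of a 20-epoch MNIST run its effect is small.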

### Summary:
This section sets up and prepares the neural network model for training by specifying the architecture, optimizer, and how the network should learn from the data. It bridges the model architecture with the actual training process, ensuring that the model is ready to fit and evaluate on the provided MNIST dataset.

In [30]:
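```python
from keras.optimizers.schedules import InverseTimeDecay
print('\nCompiling model...')
# Keras 3 ignores SGD's legacy `decay`; this schedule gives lr = 0.01 / (1 + 1e-6 * step)
lr_schedule = InverseTimeDecay(initial_learning_rate=0.01, decay_steps=1, decay_rate=1e-6)
sgd = SGD(learning_rate=lr_schedule, momentum=0.9, nesterov=True)
clf = CNN.build(width=28, height=28, depth=1, total_classes=10, Saved_Weights_Path=args["save_weights"] if args["load_model"] > 0 else None)
clf.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])
```

```
Compiling model...

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)
```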

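The `categorical_crossentropy` loss and `accuracy` metric passed to `compile()` can be computed by hand for a toy batch. This NumPy sketch assumes one-hot targets and probability outputs; the clipping constant `eps` and the helper names are illustrative choices, not Keras internals.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred)) for one-hot y_true."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


def accuracy(y_true, y_pred):
    """Fraction of rows whose highest-probability class matches the label."""
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))


# Two samples, three classes (toy numbers for illustration)
y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]])
print(categorical_crossentropy(y_true, y_pred))  # (-log 0.8 - log 0.3) / 2, about 0.714
print(accuracy(y_true, y_pred))                  # 0.5
```

Only the probability assigned to the true class enters the loss, so the confidently wrong second sample dominates it.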

### Train or Load Model




1. **Initialization of Training Parameters:**
   - `b_size = 128`: This sets the batch size to 128. In machine learning, especially in training deep neural networks, the batch size is the number of samples processed before the model is updated.
   - `num_epoch = 20`: This specifies that the model should go through 20 complete passes of the training dataset.
   - `verb = 1`: The verbosity level passed to Keras. With `verbose=1`, `fit()` shows a progress bar for each epoch; `2` prints one line per epoch and `0` prints nothing.

2. **Condition Check for Model Training:**
   - `if args["load_model"] < 0`: Training runs only when `load_model` is negative (the default `-1`), meaning the user did not ask for pre-trained weights. Any value of 0 or more skips both training and the evaluation that follows it.
   
3. **Model Training:**
   - `clf.fit(train_img, train_labels, batch_size=b_size, epochs=num_epoch, verbose=verb)`: This line initiates the training process. The method `fit` is used to train the model using the specified batch size, number of epochs, and verbosity. Here:
     - `train_img` and `train_labels` are the training data and labels, respectively.
     - `batch_size` dictates how many samples to work through before updating the internal model parameters.
     - `epochs` tells the model how many times to iterate over the entire dataset.
     - `verbose` controls the verbosity of the training process output.

4. **Model Evaluation:**
   - After training, the model's performance is evaluated on the test dataset using the `evaluate` method:
     - `loss, accuracy = clf.evaluate(test_img, test_labels, batch_size=128, verbose=1)`: This line calculates the model's loss and accuracy on the test dataset. Here, the test images (`test_img`) and labels (`test_labels`) are used to evaluate how well the model has learned and can generalize to new data.
     - The accuracy and loss are then printed to give an indication of model performance: `'Accuracy of Model: {:.2f}%'.format(accuracy * 100)` formats the accuracy as a percentage to make it more intuitive.
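The three `if` checks in this notebook can be summarized by a small helper. `run_plan` is hypothetical and not part of the original code; it only mirrors the conditions on `load_model` and `save_model`. Tabulating it makes the `load_model = 0` case visible: nothing is loaded and nothing is trained, so the network keeps its random initial weights.

```python
def run_plan(args: dict) -> dict:
    """Hypothetical helper: which steps the notebook's if-checks would perform."""
    return {
        "load_weights": args["load_model"] > 0,        # CNN.build(..., Saved_Weights_Path=...)
        "train_and_evaluate": args["load_model"] < 0,  # clf.fit(...) then clf.evaluate(...)
        "save_weights": args["save_model"] > 0,        # clf.save_weights(...)
    }


for load_model in (-1, 0, 1):
    print(load_model, run_plan({"load_model": load_model, "save_model": -1}))
```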



In [31]:
```python
b_size = 128  # Batch size
num_epoch = 20  # Number of epochs
verb = 1  # Verbose
if args["load_model"] < 0:
    print('\nTraining the Model...')
    clf.fit(train_img, train_labels, batch_size=b_size, epochs=num_epoch, verbose=verb)
    print('Evaluating Accuracy and Loss Function...')
    loss, accuracy = clf.evaluate(test_img, test_labels, batch_size=b_size, verbose=1)
    print('Accuracy of Model: {:.2f}%'.format(accuracy * 100))
```

```
Training the Model...
Epoch 1/20
493/493 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.4747 - loss: 1.6315
Epoch 2/20
493/493 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.9381 - loss: 0.2051
Epoch 3/20
493/493 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.9619 - loss: 0.1263
Epoch 4/20
493/493 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.9714 - loss: 0.0931
Epoch 5/20
493/493 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.9763 - loss: 0.0764
Epoch 6/20
493/493 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.9795 - loss: 0.0655
Epoch 7/20
493/493 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.9824 - loss: 0.0552
Epoch 8/20
493/493 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.9843 - loss: 0.0514
Epoch 9/20
```

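The `493/493` in the training log is the number of batches per epoch, and it follows directly from the split and the batch size. The arithmetic below mirrors how `train_test_split` sizes a float `test_size` (the test count is rounded up) and how Keras counts a final partial batch.

```python
import math

total_samples = 70_000   # images in OpenML's mnist_784
test_size = 0.1
batch_size = 128

n_test = math.ceil(test_size * total_samples)      # test count is rounded up
n_train = total_samples - n_test
steps_per_epoch = math.ceil(n_train / batch_size)  # the last batch is partial
print(n_train, n_test, steps_per_epoch)  # 63000 7000 493
```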
### Save Model


In [32]:
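```python
if args["save_model"] > 0:
    if not args["save_weights"]:
        raise ValueError('Set args["save_weights"] to a file path before saving')
    print('Saving weights to file...')
    # Under Keras 3 the filename must end in ".weights.h5"
    clf.save_weights(args["save_weights"], overwrite=True)
```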

### Predict and Display Results


1. **Random Selection of Test Images**:
   - The script uses `np.random.choice()` to pick 5 random indices from the test set, giving a small sample whose predictions can be inspected without processing the entire test set. By default `np.random.choice()` samples with replacement, so the same image can be picked twice; passing `replace=False` guarantees 5 distinct images.
   ```python
   for num in np.random.choice(np.arange(0, len(test_labels)), size=(5,)):
   ```

2. **Making Predictions**:
   - For each selected index, the trained CNN predicts the digit. `clf.predict()` is called on one test image at a time, so the image needs an explicit batch dimension.
   - `np.newaxis` is used here to add a batch dimension to the image, as Keras models expect input in batches—even if the batch size is 1.
   ```python
   probs = clf.predict(test_img[np.newaxis, num])
   ```

3. **Interpreting Predictions**:
   - The output of the model (`probs`) contains the probabilities of the image belonging to each class. The `argmax()` function is then used to find the index of the highest probability, which corresponds to the model's predicted label for the image.
   ```python
   prediction = probs.argmax(axis=1)
   ```

4. **Image Processing for Display**:
   - The test image was scaled to the 0-1 range during preprocessing, so it is multiplied by 255 and cast back to 8-bit unsigned integers with `astype("uint8")`; the `[0]` index selects its single channel, leaving a 28x28 array.
   - OpenCV's `cv2.merge([image] * 3)` stacks the grayscale channel three times to form a 3-channel (BGR) image, so that the predicted label can be drawn in color.
   - The image is then resized to a larger size (100x100 pixels in this case) to make it easier to view and to add text annotations visibly.
   ```python
   image = (test_img[num][0] * 255).astype("uint8")
   image = cv2.merge([image] * 3)
   image = cv2.resize(image, (100, 100), interpolation=cv2.INTER_LINEAR)
   ```

5. **Adding Text Annotations**:
   - The predicted label is drawn onto the image with OpenCV's `putText()`, which takes the text, position `(5, 20)`, font, scale `0.75`, color `(0, 255, 0)` (green in OpenCV's BGR order), and thickness `2`.
   - The prediction and the actual label (from `test_labels`) are printed to the console for reference.
   ```python
   cv2.putText(image, str(prediction[0]), (5, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)
   print('Predicted Label: {}, Actual Value: {}'.format(prediction[0], np.argmax(test_labels[num])))
   ```

6. **Displaying the Image**:
   - The commented-out `cv2.imshow()` and `cv2.waitKey()` calls open a native GUI window and wait for a key press. This works when the script runs locally, but it is unreliable or unavailable inside Jupyter notebooks and on headless machines, which is why the lines are commented out. In a notebook, `matplotlib.pyplot.imshow()` is a common alternative.

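The display pipeline can be sketched with NumPy alone. The random images and probabilities below are hypothetical stand-ins for `test_img` and the model's outputs, and `np.stack` replaces `cv2.merge` so the sketch runs without OpenCV. With the notebook's real model, a single call such as `clf.predict(test_img[picks])` would return all 5 probability rows at once instead of looping over images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 20 normalized 1x28x28 test images and the
# per-class probabilities a 10-class softmax model might return for them
test_img = rng.random((20, 1, 28, 28))
probs = rng.dirichlet(np.ones(10), size=20)

# 5 distinct indices (replace=False rules out repeats)
picks = rng.choice(len(test_img), size=5, replace=False)

# One argmax per row gives all 5 predicted labels at once
predictions = probs[picks].argmax(axis=1)

# Undo the /255.0 scaling; rounding before the cast guards against float error
gray = np.rint(test_img[picks[0], 0] * 255).astype(np.uint8)   # shape (28, 28)

# NumPy equivalent of cv2.merge([gray] * 3): repeat the channel on a last axis
color = np.stack([gray] * 3, axis=-1)                          # shape (28, 28, 3)
print(predictions, color.shape, color.dtype)
```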
In [33]:
```python
for num in np.random.choice(np.arange(0, len(test_labels)), size=(5,)):
    probs = clf.predict(test_img[np.newaxis, num])
    prediction = probs.argmax(axis=1)
    image = (test_img[num][0] * 255).astype("uint8")
    image = cv2.merge([image] * 3)
    image = cv2.resize(image, (100, 100), interpolation=cv2.INTER_LINEAR)
    cv2.putText(image, str(prediction[0]), (5, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)
    print('Predicted Label: {}, Actual Value: {}'.format(prediction[0], np.argmax(test_labels[num])))
    # cv2.imshow('Digits', image)
    # cv2.waitKey(0)
```

```
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 103ms/step
Predicted Label: 0, Actual Value: 0
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 25ms/step
Predicted Label: 4, Actual Value: 4
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 26ms/step
Predicted Label: 0, Actual Value: 0
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step
Predicted Label: 8, Actual Value: 8
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 25ms/step
Predicted Label: 2, Actual Value: 2
```
