## Dataset

We will use the `mnist` dataset from TensorFlow Datasets, a collection of handwritten digit images that we will use to train a digit classifier.

The `mnist` dataset has been separated into:
- 60,000 samples for training
- 10,000 samples for testing

## Feature

Each sample has two features: `image` and `label`.
- `image` has the class `Image` and the shape (`height`, `width`, `color_channel`), e.g., (28, 28, 1): 28 by 28 pixels with a single color channel, i.e., grayscale.
  - The `color_channel` will be 3 for a color image (one channel each for red, green, and blue).
- `label` has the class `ClassLabel` and is an integer from 0 to 9.
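
As a rough illustration of these shapes, the sketch below builds placeholder arrays with NumPy. The array names are made up for this example and the arrays hold zeros, not real MNIST data.

```python
import numpy as np

# Hypothetical placeholder images filled with zeros (not real MNIST data)
grayscale_image = np.zeros((28, 28, 1), dtype=np.uint8)  # one channel
color_image = np.zeros((28, 28, 3), dtype=np.uint8)      # red, green, blue

print(grayscale_image.shape)
print(color_image.shape)

# The last axis is the channel axis
num_gray_channels = grayscale_image.shape[-1]
num_color_channels = color_image.shape[-1]
```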

In [159]:
import pandas as pd
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
import numpy as np
import math
import matplotlib.pyplot as plt

In [None]:
# TONOTE: Use tfds.load('dataset_name') to load datasets from TensorFlow
data, metadata = tfds.load('mnist', as_supervised=True, with_info=True)
# TONOTE: `as_supervised` tells TensorFlow to load the dataset in a supervised format, meaning each data point is returned as a tuple (input, label). For example, (image, label).
# TONOTE: `with_info` includes metadata about the dataset, such as its description, version, and features. It’s like getting a user manual along with the dataset.

In [131]:
data



In [132]:
metadata



In [133]:
# Prepare the train and test data

data_train = data['train']
data_test = data['test']

In [134]:
data_train



In [135]:
metadata.features['label']



In [None]:
# `metadata.features['label']` is a ClassLabel, according to the tensorflow docs, it has an attribute called `.names` which returns the string names of the classes. Since the `num_classes=10`, the string name defaults to ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
class_names: list = metadata.features['label'].names

In [None]:
# TONOTE: Each pixel value ranges from 0 to 255 (one byte). It's a good idea to normalize the data before training, because neural networks generally train better when the input values are scaled to a small range.

# TONOTE: We will do this for both the training and testing data. To avoid repeating ourselves, we first create a function called `normalizer`, then use the dataset's `.map()` method to apply it to every element

def normalizer(images, labels):
    # TONOTE: The image data are integers from 0 to 255, and normalizing produces fractional values, so we convert the numbers to float32 first
    images = tf.cast(images, tf.float32)
    # To rescale the values from the range 0.0-255.0 to the range 0.0-1.0, we divide them by 255
    images = images / 255
    
    return images, labels

data_train = data_train.map(normalizer)
data_test = data_test.map(normalizer)

# TONOTE: Save data to cache to process faster from the second time on
data_train = data_train.cache()
data_test = data_test.cache()
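
The same cast-then-divide step can be tried on a tiny array. This is a minimal NumPy sketch, assuming a made-up 2x2 "image"; it mirrors what `normalizer` does but is not the `tf.data` pipeline itself.

```python
import numpy as np

# Hypothetical 2x2 "image" with raw pixel values in [0, 255]
raw_pixels = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Cast to float32 first, then scale into [0.0, 1.0]
scaled_pixels = raw_pixels.astype(np.float32) / 255

print(scaled_pixels.dtype)
print(scaled_pixels.min(), scaled_pixels.max())
```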

In [138]:
# Create a new dataset that contains the first element from `data_train` using tensorflow's take function
first_element = data_train.take(1)

# Iterate over the first element to inspect it
for images, labels in first_element:
    # print(images, labels)
    break
# TONOTE: The `for` loop seems to not do anything, but with it, the variables `images` and `labels` have been set and ready to be used.



In [139]:
# Plotting a sample image

plt.figure()
plt.imshow(images)
plt.show()



In [140]:
images.shape



In [141]:
# Replot as a grayscale image

plt.figure()
plt.imshow(images, cmap=plt.cm.binary) # TONOTE: The image has only one color channel (grayscale), but we still need to specify `cmap` here. If we don't, Matplotlib applies its default colormap (`viridis` unless configured otherwise) to the values instead of showing shades of gray.
plt.colorbar() # Here we see that the value is between 0 to 1, instead of 0 to 255
plt.show()



In [142]:
# Let's show the first 25 images
plt.figure(figsize=(10, 10))

for i, (images, labels) in enumerate(data_train.take(25)): # TONOTE: We use `.take(n)` to get the first `n` elements from data_train
    # TONOTE: Use subplot of matplotlib to return multi-plots by specifying the number of row, column, and index
    plt.subplot(5, 5, i+1)
    plt.imshow(images, cmap=plt.cm.binary)

plt.show()





## Creating the model to train our data

Our data is of the shape (28, 28, 1), which is a 3D tensor. We will flatten this to a 1D tensor of shape (784,).
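
The effect of flattening can be previewed with NumPy. This is only a sketch of the reshaping, assuming a made-up batch of two images; it is not how the Keras layer is implemented.

```python
import numpy as np

# Hypothetical batch of 2 images, each of shape (28, 28, 1)
batch = np.arange(2 * 28 * 28, dtype=np.float32).reshape(2, 28, 28, 1)

# Keep the batch axis and merge all remaining axes into one
flattened = batch.reshape(batch.shape[0], -1)

print(flattened.shape)
```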

In [None]:
# TONOTE: Flattening data which will be used as the input feature in the first layer
flatten_data = keras.layers.Flatten(input_shape=(28,28,1))
# TONOTE: The `keras.layers.Flatten` layer converts each (28, 28, 1) input into the shape (28*28,), i.e., (784,)



Create a model:

In [None]:
model = keras.Sequential([
    # TONOTE: This flatten layer is the first layer, which acts like the Input Layer in other models. However, since the name `Input Layer` is logically tied to `keras.layers.Input(...)`, and the use of `keras.layers.Input(...)` in the sequential model should be avoided, it's best not to call it `Input Layer` here.
    flatten_data, # first layer
    # Hidden layer
    keras.layers.Dense(1),
    # Output layer
    keras.layers.Dense(10, activation=tf.nn.softmax) # Because there are possible 10 answers, namely 0, 1, 2, ..., 9
])
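
To see what the softmax activation in the output layer does, here is a small NumPy sketch. The `softmax` helper and the raw scores are made up for illustration; Keras uses its own implementation.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores for the 10 digit classes
logits = np.array([1.0, 2.0, 0.5, 0.1, 3.0, 0.0, -1.0, 0.3, 0.2, 1.5])
probabilities = softmax(logits)

print(probabilities.sum())
print(np.argmax(probabilities))
```

The largest raw score gets the largest probability, which is why the index of the maximum can later be read as the predicted digit.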

In [None]:
# # TONOTE:
# keras.Sequential([
#     keras.layers.Flatten(input_shape=(28,28,1)),
#     keras.layers.Dense(1),
#     keras.layers.Dense(10, activation=tf.nn.softmax)
# ])

# # builds the same dense layers as the following, which expects input already flattened to (784,):

# keras.Sequential([
#     keras.layers.Dense(1, input_shape=(784,)),
#     keras.layers.Dense(10, activation=tf.nn.softmax)
# ])






Compile the model:

In [20]:
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy']
)

In [None]:
# TONOTE: 
# keras.losses.SparseCategoricalCrossentropy() is a loss function, specifically designed for multi-class classification problems.
# - It's a variant of the categorical cross-entropy loss function, but it's used when the labels are integers instead of one-hot encoded vectors. This means that instead of having a vector of probabilities for each class, you have a single integer representing the class label.

# - SparseCategoricalCrossentropy is a good choice when:

#   - We have a multi-class classification problem (e.g., image classification with multiple classes)
#   - Our labels are integers (not one-hot encoded)
# - It's not specific to image classification, but it is a popular choice for image classification tasks, including those that use convolutional neural networks (CNNs).

# However, if we're working with a binary classification problem (e.g., image classification with only two classes), we might want to use keras.losses.BinaryCrossentropy() instead.

# In general, the choice of loss function depends on the specific problem you're trying to solve, so it's always a good idea to consider the characteristics of your problem and choose the most suitable loss function.
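
To make the difference between integer labels and one-hot labels concrete, here is a minimal NumPy sketch of both forms of cross-entropy. All values are made up, and this is a hand-written calculation rather than the Keras implementation.

```python
import numpy as np

# Hypothetical predicted probabilities for 2 samples and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

# Sparse labels: one integer class index per sample
sparse_labels = np.array([0, 2])

# One-hot labels: the same information as 0/1 vectors
one_hot_labels = np.eye(3)[sparse_labels]

# Sparse form: pick the probability of the true class, then take -log
row_index = np.arange(len(sparse_labels))
sparse_loss = -np.log(probs[row_index, sparse_labels]).mean()

# One-hot form: sum of -y * log(p) over the classes
categorical_loss = -(one_hot_labels * np.log(probs)).sum(axis=1).mean()

print(sparse_loss, categorical_loss)
```

Both forms give the same number; the sparse form simply skips building the one-hot vectors.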

In [None]:
# TONOTE: We also see that we can choose our metrics while compiling the model: `metrics=['accuracy']`

In [149]:
# Check the length of data_train and data_test
print(f"Training Data Length: {len(data_train)}")
print(f"Testing Data Length: {len(data_test)}")



In [None]:
# TONOTE: With so much training data, we should prepare the input pipeline so that training runs faster and more efficiently. We set a batch size so the model is trained on batches of `n` samples at a time, instead of one sample at a time:

batch_size = 32

data_train = data_train.repeat().shuffle(60000).batch(batch_size)
data_test = data_test.batch(batch_size)  # TONOTE: data_test is only used for evaluation and prediction, so it doesn't need the `.repeat()` and `.shuffle()` methods

# TONOTE: A `tf.data` dataset is consumed like a stream. By default, it does not repeat or cycle once it has been read through. Calling `.repeat()` makes the dataset cycle indefinitely, so it can supply batches for as many epochs as we ask for.

# TONOTE: The shuffle buffer size should be at least the size of the dataset (60000 here) so that the whole dataset is shuffled uniformly.

# TONOTE: The `.batch()` method allows us to run the model in batch, instead of running one by one.
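
As a rough picture of what batching produces, the NumPy sketch below splits a small made-up dataset into consecutive batches. The `make_batches` helper is hypothetical and only imitates the shapes that `.batch()` yields when the last partial batch is kept.

```python
import numpy as np

def make_batches(samples: np.ndarray, batch_size: int) -> list:
    """Split samples into consecutive batches; the last one may be smaller."""
    return [samples[start:start + batch_size]
            for start in range(0, len(samples), batch_size)]

# Hypothetical dataset of 10 placeholder images of shape (28, 28, 1)
samples = np.zeros((10, 28, 28, 1), dtype=np.float32)
batches = make_batches(samples, batch_size=4)

batch_shapes = [batch.shape for batch in batches]
print(batch_shapes)
```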

Train the model:

In [None]:
# Show the sample image again

plt.figure()
plt.imshow(images, cmap=plt.cm.binary) # `images` still holds the value assigned by the most recent `for` loop over `data_train` above
plt.colorbar() # the values range from 0 to 1, instead of 0 to 255
plt.show()



In [None]:

history = model.fit(
    data_train, 
    epochs=10,
    steps_per_epoch=math.ceil(60000/batch_size)
)
# TONOTE: Because `data_train` is batched and now repeats indefinitely, we need to tell the model how many batches make up one epoch by specifying the `steps_per_epoch` value
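
The arithmetic behind `steps_per_epoch` can be checked by hand. The sketch below uses the dataset sizes and batch size from this notebook; the variable names other than `batch_size` are chosen for this example.

```python
import math

num_train_examples = 60000
num_test_examples = 10000
batch_size = 32

# Number of batches needed to cover each dataset once
steps_per_epoch = math.ceil(num_train_examples / batch_size)
test_batches = math.ceil(num_test_examples / batch_size)

# The test set does not divide evenly, so its last batch is smaller
last_test_batch_size = num_test_examples - (test_batches - 1) * batch_size

print(steps_per_epoch, test_batches, last_test_batch_size)
```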



In [None]:
# We see that, with a single hidden layer containing only one neuron, the accuracy of 42% is very low.

# Try adding more layers with more neurons and an activation function in the hidden layers
model = keras.Sequential([
    # First layer
    flatten_data,
    # Hidden layer
    keras.layers.Dense(50, activation=tf.nn.relu),
    keras.layers.Dense(50, activation=tf.nn.relu),
    # Output layer
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
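
One plausible reason the first model performs poorly is that all 784 inputs pass through a single hidden neuron. Counting trainable parameters makes the size difference visible. The `dense_params` helper below is made up for this sketch and assumes fully connected layers with one bias per unit.

```python
def dense_params(num_inputs: int, num_units: int) -> int:
    """Weights plus one bias per unit in a fully connected layer."""
    return num_inputs * num_units + num_units

# First model: 784 -> 1 -> 10
small_model_params = dense_params(784, 1) + dense_params(1, 10)

# Second model: 784 -> 50 -> 50 -> 10
larger_model_params = (dense_params(784, 50)
                       + dense_params(50, 50)
                       + dense_params(50, 10))

print(small_model_params, larger_model_params)
```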

In [152]:
# Recompile the model
model.compile(
    optimizer='adam',
    loss=keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy']
)

In [153]:
# Retrain the model
history = model.fit(
    data_train,
    epochs=10,
    steps_per_epoch=math.ceil(60000/batch_size)
)



Plot a grid with multiple predictions, respectively marking correct and incorrect ones as blue and red:

In [157]:
from matplotlib.container import BarContainer

# TONOTE: Get the first batch of test data for a quick inspection and make predictions, as testing the entire dataset is too time consuming
for images_test, labels_test in data_test.take(1):
    # TONOTE: Convert the images_test and labels_test from TensorFlow tensors into numpy arrays so that they can be easily used with Matplotlib
    images_test: np.ndarray = images_test.numpy()
    labels_test: np.ndarray = labels_test.numpy()
    # TONOTE: `images_test.shape` returns (32, 28, 28, 1): 32 is the batch_size, 28 is the height, 28 is the width, and 1 is the color channel (grayscale). The shape already matches the one expected by `model.predict`, so we don't need to reshape it.
    predictions: np.ndarray = model.predict(images_test)
    
# Function to plot individual images with their predictions
def plot_image(i: int, predictions: np.ndarray, true_labels: np.ndarray, images: np.ndarray) -> None:
    # Extract data for the specific index
    prediction, true_label, image = predictions[i], true_labels[i], images[i]
    prediction: np.ndarray
    true_label: np.int64
    image: np.ndarray
    
    # Remove grid and axis ticks for cleaner visualization
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    
    # Display the image (only first channel as it's grayscale)
    plt.imshow(image[...,0], cmap=plt.cm.binary)
    # TONOTE: Each image has the shape (28, 28, 1); indexing with the ellipsis, `image[...,0]`, reduces it to (28, 28). This conversion is not strictly necessary here, since Matplotlib plots the same picture for both shapes; it is included to show that the (28, 28) shape works as well.
    
    # TONOTE: Next, we need to get the predicted label. `prediction` is a 1D numpy array with the shape of (10,). The array may look like this: `[4.5158291e-12 2.1048035e-12 9.9999988e-01 8.6977693e-08 8.7193273e-12 2.4434133e-12 1.8026095e-11 1.8544542e-10 5.9155116e-08 7.4279107e-12]`, where each item is the predicted probability for one digit class (0 through 9). Therefore, to get the predicted label, we use `np.argmax` to return the index of the maximum value in the array
    predicted_label: np.int64 = np.argmax(prediction)
    
    # Set color for the predicted label: blue for correct predictions, red for incorrect
    color = 'blue' if predicted_label == true_label else 'red'
    
    print(f"Metadata: {metadata}")
    print(f"Metadata label: {metadata.features['label']}")
    print(f"Metadata label name: {metadata.features['label'].names}")
    
    # Add label showing: predicted digit, confidence percentage, and true digit
    ## TONOTE: This shows the predicted label, probability or percentage of the prediction, and the true label. By showing these values, the model's prediction can be easily evaluated.
    plt.xlabel("{} {:2.0f}% ({})".format(
        class_names[predicted_label],
        100*np.max(prediction),
        class_names[true_label]
    ), color=color)
    # TONOTE: The three placeholders, `{}`, `{:2.0f}`, and `({})`, match their corresponding values, namely `class_names[predicted_label]`, `100*np.max(prediction)`, and `class_names[true_label]`

# Function to plot probability distribution for each prediction
def plot_value_array(i: int, predictions: np.ndarray, true_labels: np.ndarray) -> None:
    prediction, true_label = predictions[i], true_labels[i]
    prediction: np.ndarray
    true_label: np.int64
    
    # Remove grid and axis ticks
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    
    # Create bar chart of probability distribution (0-9)
    graph: BarContainer = plt.bar(range(10), prediction, color="#777777")
    plt.ylim([0,1])  # Set y-axis limit to 0-1 for probabilities
    ## TONOTE: By default, Matplotlib can pad the data range with a small margin (5% with default settings) when it autoscales an axis. Setting `ylim([0,1])` fixes the y axis to exactly the range 0 to 1, no more and no less.
    
    predicted_label: np.int64 = np.argmax(prediction)
    
    # Highlight bars: red for prediction, blue for true label
    ## TONOTE: Since the labels range from 0 to 9, this is the same as x axis values. To color a bar in this case, we can use the `.set_color('some_color')` method of the bar graph on the specific bar identified by graph[index], like graph[predicted_label] or graph[true_label]
    graph[predicted_label].set_color('red')
    graph[true_label].set_color('blue')
    
# Set up grid dimensions for visualization
num_row = 5
num_column = 5
num_images = num_row * num_column

# Create figure with subplots for both images and their probability distributions
plt.figure(figsize=(2*2*num_column, 2*num_row))
for i in range(num_images):
    # TONOTE: Create subplot for image (2*i+1 for left column) by specifying the number of rows, columns, and index (in this case: 1, 3, 5, ...)
    plt.subplot(num_row, 2*num_column, 2*i+1)
    plot_image(i, predictions, labels_test, images_test)
    # TONOTE: Create subplot for probability distribution (2*i+2 for right column); The index in this case is 2, 4, 6, ...
    plt.subplot(num_row, 2*num_column, 2*i+2)
    plot_value_array(i, predictions, labels_test)
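
The label logic inside `plot_image` can be tried on its own, without any plotting. The sketch below reuses the sample probabilities quoted in the comment above; the `true_label` value is a made-up assumption for illustration.

```python
import numpy as np

# Probabilities quoted in the comment above, one per digit class 0-9
prediction = np.array([4.5158291e-12, 2.1048035e-12, 9.9999988e-01,
                       8.6977693e-08, 8.7193273e-12, 2.4434133e-12,
                       1.8026095e-11, 1.8544542e-10, 5.9155116e-08,
                       7.4279107e-12])
class_names = [str(digit) for digit in range(10)]
true_label = 2  # hypothetical true label for this sample

# Index of the largest probability is the predicted digit
predicted_label = int(np.argmax(prediction))

# Same format string as in plot_image: prediction, confidence, true label
label_text = "{} {:2.0f}% ({})".format(
    class_names[predicted_label],
    100 * np.max(prediction),
    class_names[true_label],
)
color = "blue" if predicted_label == true_label else "red"

print(label_text, color)
```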









# Others:

## Understand the Use of Ellipsis and ndarrays

In [52]:
images_test.shape



In [53]:
images_test[0]



In [58]:
images_test[0].shape



In [56]:
images_test[0][0]



In [59]:
images_test[0][0].shape



In [57]:
images_test[0][1]



In [55]:
images_test[0][...,0]



In [60]:
images_test[0][...,0].shape



In [61]:
images_test[0][...,0][0]



In [62]:
images_test[0][...,0][0].shape



In [66]:
# 4D array
array_4d = np.random.rand(2, 3, 4, 5)
print(array_4d.shape)  # Output: (2, 3, 4, 5)



In [None]:
[[[[1_1_1_1, 1_1_1_2, 1_1_1_3],
   [2_1_1_1, 2_1_1_2, 2_1_1_3]],
  
  [[1_1_2_1, 1_1_2_2, 1_1_2_3],
   [2_1_2_1, 2_1_2_2, 2_1_2_3]]],

 [[[1_2_1_1, 1_2_1_2, 1_2_1_3],
   [2_2_1_1, 2_2_1_2, 2_2_1_3]],

  [[1_2_2_1, 1_2_2_2, 1_2_2_3],
   [2_2_2_1, 2_2_2_2, 2_2_2_3]]]]

In [69]:
array_4d = np.random.rand(2, 3, 4, 5)
array_4d



In [72]:
np.random.seed(42)

array_4d_2 = np.random.rand(4, 2, 2, 3)
array_4d_2



In [None]:
array_4d_2[...,0]
# TONOTE: With the ellipsis notation [...,k], an array with the shape of (a_(1), a_(2), ..., a_(n-1), a_(n)) is reduced to the shape of (a_(1), a_(2), ..., a_(n-1)), where k is an integer index along the last axis with 0 ≤ k < a_(n).

# TONOTE: If k = 0, it takes the first element along the innermost (last) axis.
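
The shape rule can be verified on an array with known contents. This sketch uses a made-up array named `array` with the same shape as `array_4d_2` and compares the ellipsis form with explicit slices.

```python
import numpy as np

array = np.arange(4 * 2 * 2 * 3).reshape(4, 2, 2, 3)

last_axis_pick = array[..., 0]        # same as array[:, :, :, 0]
first_axis_pick = array[0, ...]       # same as array[0, :, :, :]
middle_axis_pick = array[:, :, 0, :]  # fixes the third axis only

print(last_axis_pick.shape)
print(first_axis_pick.shape)
print(middle_axis_pick.shape)
```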



In [75]:
array_4d_2[...,0].shape



In [76]:
array_4d_2[...,1]



In [77]:
array_4d_2[...,2]



In [78]:
array_4d_2[...,3]  # Raises IndexError: the last axis has size 3, so the largest valid index is 2



In [None]:
array_4d_2[0,...]

# TONOTE: In this case, a (4, 2, 2, 3) array is reduced to the shape (2, 2, 3), because we select only index 0 along the outermost (first) axis.



In [80]:
array_4d_2[0,...].shape



In [165]:
# TONOTE: To select an index along a middle axis, use `:` for the other axes
array_4d_2[:,:,0,:]



In [82]:
arr = np.array([[5,3,2,3],[4,8,2,6],[8,2,3,0]])
print(arr)

# Ellipsis literal
print(f"Ellipsis literal output:- {arr[...,1]}.")

# general slice notation
print(f"general slice notation output:- {arr[:,1]}")

# Python Ellipsis 
print(f"Python Ellipsis output:- {arr[Ellipsis, 1]}")



In [85]:
arr.shape



In [83]:
arr[...,0]



In [86]:
arr[...,0].shape == (3,)



In [102]:
my_images = np.random.randint(0, 3, size=(3, 3))
my_images



## Understanding the Default Behavior of Matplotlib's color map

In [None]:
# TONOTE: No colormap is set, so Matplotlib applies its default colormap (`viridis` unless configured otherwise)
plt.imshow(my_images)
plt.show()

# TONOTE: We see that, even though my_images has no color channel, Matplotlib still maps the values to colors
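
A quick way to check which colormap is applied when `cmap` is omitted is to read Matplotlib's built-in defaults. The sketch below also samples the two ends of the `binary` colormap; the variable names are chosen for this example, and the `Agg` backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Matplotlib's built-in default colormap for images
default_cmap_name = matplotlib.rcParamsDefault["image.cmap"]
print(default_cmap_name)

# Sample the two ends of the `binary` colormap as RGBA tuples
lowest_color = plt.cm.binary(0.0)   # color used for the smallest value
highest_color = plt.cm.binary(1.0)  # color used for the largest value
print(lowest_color, highest_color)
```

With `binary`, the smallest values are drawn in white and the largest in black, which is why the digits appear dark on a light background.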



In [98]:
np.random.seed(42)
my_images_2 = np.random.rand(3, 3)
my_images_2



In [None]:
plt.imshow(my_images, cmap=plt.cm.binary)
plt.show()

# TONOTE: We have to set the cmap=plt.cm.binary to get the grayscale graph



In [107]:
# Create a sample RGBA image
images_rgba = np.random.randint(0, 256, size=(256, 256, 4), dtype=np.uint8)

plt.imshow(images_rgba)
plt.show()



In [108]:
plt.imshow(images_rgba, cmap=plt.cm.binary)  # `cmap` is ignored for RGB(A) data, so the plot is unchanged
plt.show()



In [126]:
np.argmax(predictions[2])



# Detailed Key Learning Points

1. **Data Batch Processing**
   - When testing models, processing the entire dataset can be time-consuming
   - Using `.take(1)` helps us get a small batch for quick inspection
   - Example: `data_test.take(1)` gets first batch of 32 images

2. **Data Type Conversion**
   - TensorFlow tensors need conversion to NumPy arrays for Matplotlib compatibility
   - Using `.numpy()` converts TensorFlow tensors to NumPy arrays
   - Important for visualization and manipulation
   - Example: `images_test = images_test.numpy()`

3. **Image Shape Understanding**
   - Image shape: `(32, 28, 28, 1)`
     - 32: batch size (number of images)
     - 28: image height in pixels
     - 28: image width in pixels
     - 1: color channel (grayscale)

4. **Image Dimension Handling**
   - Ellipsis notation `image[...,0]` converts shape from `(28, 28, 1)` to `(28, 28)`
   - Both shapes work with Matplotlib
   - Example:
     ```python
     # Both work the same
     plt.imshow(image[...,0])  # Shape: (28, 28)
     plt.imshow(image)         # Shape: (28, 28, 1)
     ```

5. **Prediction Array Understanding**
   - Model outputs 1D array with 10 probabilities (0-9)
   - Example output:
     ```python
     [4.5e-12, 2.1e-12, 0.99999, 8.6e-08, 8.7e-12, 2.4e-12, 1.8e-11, 1.8e-10, 5.9e-08, 7.4e-12]
     ```
   - `np.argmax()` finds index with highest probability (predicted digit)

6. **Matplotlib Visualization Controls**
   - Y-axis limits control using `plt.ylim([0,1])`
   - Overrides the default padding that autoscaling adds around the data
   - Ensures probability visualization stays between 0 and 1

7. **Bar Chart Color Manipulation**
   - Bar graphs stored as `BarContainer` objects
   - Individual bars accessible by index
   - Color coding:
     - Red: predicted label
     - Blue: true label
   - Example: `graph[predicted_label].set_color('red')`

# Detailed Model Training Process

1. **Data Acquisition**
   - Import MNIST dataset using TensorFlow datasets
   - Dataset contains 70,000 grayscale images of handwritten digits
   - Split into training (60,000) and test (10,000) sets

2. **Data Preprocessing**
   - Normalize pixel values from 0-255 to 0-1 range
   - Cache the normalized datasets so later passes run faster
   - Repeat and shuffle the training data
   - Create batches for efficient processing

3. **Model Architecture**
   - Create sequential model
   - Add Flatten layer to convert 2D images to 1D arrays
   - Add Dense layers with appropriate activation functions
   - Configure output layer with 10 neurons (0-9 digits)

4. **Model Configuration**
   - Set loss function (`keras.losses.SparseCategoricalCrossentropy()`)
   - Choose optimizer (Adam)
   - Define metrics (accuracy)

5. **Training**
   - Feed training data through model
   - Adjust weights based on loss
   - Monitor accuracy improvements
   - Check predictions against test data afterwards

6. **Testing and Visualization**
   - Select batch of test images
   - Make predictions
   - Display results in grid format:
     - Left column: actual images with predictions
     - Right column: probability distributions
   - Color-code correct (blue) and incorrect (red) predictions

7. **Results Analysis**
   - Examine prediction accuracy
   - Analyze confidence levels
   - Identify patterns in misclassifications
   - Evaluate model performance metrics
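
The results-analysis step can be sketched with NumPy alone. The predictions and labels below are made up for illustration and stand in for the output of `model.predict` and the true test labels.

```python
import numpy as np

# Hypothetical predictions for 4 samples over 3 classes (each row sums to 1)
predictions = np.array([[0.8, 0.1, 0.1],
                        [0.2, 0.5, 0.3],
                        [0.3, 0.3, 0.4],
                        [0.6, 0.3, 0.1]])
true_labels = np.array([0, 1, 1, 0])  # hypothetical true labels

predicted_labels = np.argmax(predictions, axis=1)  # most likely class per row
confidences = np.max(predictions, axis=1)          # probability of that class

# Fraction of correct predictions and positions of the wrong ones
accuracy = np.mean(predicted_labels == true_labels)
misclassified = np.flatnonzero(predicted_labels != true_labels)

print(accuracy, misclassified, confidences[misclassified])
```

Looking at the confidence of the misclassified samples is one simple way to spot cases where the model was unsure.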

This process creates a complete pipeline from raw data to trained model with visual verification of results.