# Training Model:

## Introduction:
This notebook covers the character recognition model's training process. We'll go over how to load in images, preprocess datasets for model compatibility, and train/evaluate the model. 

The functions covered in this notebook are stored in the `image_operations.py`, `train_eval.py`, and `ocr_model.py` modules. Please note that the operations showcased here are modified for demonstration purposes and may differ from those in the provided modules.

# Import Packages:
The packages below are used to run this notebook.

In [1]:
## Import necessary packages.
import tensorflow as tf
import os
import sys
import numpy as np
import time

# Initialize Working Directory:
Make sure that the notebook's workspace is located in the root folder of this project so that it can have access to all the modules and datasets.

In [2]:
## Initialize the base directory to be in root folder.
base_dir = os.path.abspath(os.path.join('.', '..'))

# Import Modules:
We'll also import some modules needed for this notebook.

In [8]:
## Import the necessary modules in CustomOCR.
sys.path.append(base_dir)
import CustomOCR.utils.file_operations as fo
import CustomOCR.ocr_model as om

# Load in Training Dataset:
We'll load in the training dataset as well as the validation and testing datasets. The dataset split is as follows: training is 64% of the entire dataset, validation is 16% of the entire dataset, and testing is 20% of the entire dataset. To look more into the splitting process, please go to the `create_train_test` and `create_train_val` functions in the `image_operations` module. 
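
The percentages above are consistent with a two-stage split: hold out 20% for testing, then hold out 20% of the remainder for validation (0.8 × 0.8 = 0.64 and 0.8 × 0.2 = 0.16). The sketch below only illustrates that arithmetic. `two_stage_split` is a hypothetical helper, and whether `create_train_test` and `create_train_val` work exactly this way is an assumption.

```python
def two_stage_split(n_total, test_frac=0.20, val_frac=0.20):
    """Return (train, val, test) counts for a test split followed by a validation split.

    Hypothetical helper for illustration; not taken from the project's modules.
    """
    # First stage: hold out the test set from the full dataset.
    n_test = round(n_total * test_frac)
    n_remaining = n_total - n_test

    # Second stage: hold out the validation set from what is left.
    n_val = round(n_remaining * val_frac)
    n_train = n_remaining - n_val
    return n_train, n_val, n_test


n_train, n_val, n_test = two_stage_split(10_000)
print(n_train, n_val, n_test)  # 6400 1600 2000
```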

I've already split the dataset into train, validation, and test images. These datasets are available for both `case_sensitive` and `case_insensitive` classes. For this demonstration, we'll be using the `case_sensitive` dataset.

In [4]:
## Load in the training, validation, and testing datasets for case sensitive.
train_dict = fo.load_mat_data(os.path.join(base_dir, 'datasets', 'case_sensitive', 'train_aug_0.mat'))
val_dict = fo.load_mat_data(os.path.join(base_dir, 'datasets', 'case_sensitive', 'val.mat'))
test_dict = fo.load_mat_data(os.path.join(base_dir, 'datasets', 'case_sensitive', 'test.mat'))

# Modify Datasets for Model Training:
Right now, the training, validation, and testing datasets are each stored as a dictionary with two keys: `images` and `labels`. The `images` entry stores the images grouped by class, while the `labels` entry stores the unique labels covered in the dataset. The format is depicted below, where the labels are indexed from $0$ to $m$.

\begin{equation*}
Images = \left\{
\begin{matrix}
  \text{Label 0:} & [image_{00}, image_{01}, \ldots] \\
  \text{Label 1:} & [image_{10}, image_{11}, \ldots] \\
  \vdots \\
  \text{Label m:} & [image_{m0}, image_{m1}, \ldots]
\end{matrix}
\right\}, 
\:
Labels = \left\{
\begin{matrix}
  \text{Label 0} \\
  \text{Label 1} \\
  \vdots \\
  \text{Label m}
\end{matrix}
\right\}
\end{equation*}
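
To make the layout concrete, here is a toy dictionary with the same shape of structure, built from random data. The `(1, m)` object-array wrapper mirrors how the notebook later indexes `data_dict['images'][0][i]`. The 32×32 single-channel image size, the per-class counts, and the three example labels are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: three classes holding 3, 2, and 4 images respectively.
counts = [3, 2, 4]

# One row of per-class image arrays, matching the data_dict['images'][0][i] access pattern.
images = np.empty((1, len(counts)), dtype=object)
for i, n in enumerate(counts):
    # Each entry is an array of n images; 32x32x1 is an assumed image shape.
    images[0, i] = rng.random((n, 32, 32, 1)).astype(np.float32)

toy_dict = {'images': images, 'labels': np.array(['A', 'b', '0'])}

# Print each unique label next to the shape of its image array.
for i in range(toy_dict['images'].shape[1]):
    print(toy_dict['labels'][i], toy_dict['images'][0][i].shape)
```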

To train a model in TensorFlow, the `images` entry needs to be a single array in which every image, regardless of class, is stacked along the first axis. Furthermore, instead of a list of unique labels, we need an array that holds the label of each individual image in the images array. The two arrays for images ($\vec{v_{1}}$) and labels ($\vec{v_{2}}$) are shown below, where the training images are indexed from $0$ to $n$.

\begin{equation*}
\vec{v_{1}} = \begin{bmatrix}
                image_{0} \\
                image_{1} \\ 
                \vdots \\
                image_{n}
              \end{bmatrix}, 
\:
\vec{v_{2}} = \begin{bmatrix}
                label_{0} \\
                label_{1} \\
                \vdots \\
                label_{n}
              \end{bmatrix}
\end{equation*}

To achieve this, we can vertically stack all the images in the `images` entry using `np.vstack`. This creates a single array containing every image in the given dataset. For each class, we then create an array with one entry per image, filled with that class's index, using `np.full`. This operation is shown below using the `stack_imgs_labels` function.

In [5]:
## Flattens the images into one vector and create corresponding labels.
def stack_imgs_labels(data_dict):
    '''
    Creates a one dimensional vector for images and labels.

    Args:
        data_dict (dictionary): A dictionary containing images and labels.

    Returns:
        np.array, np.array: A flattened array containing images and an array of labels.
    '''
    ## Initialize empty list for labels.
    combined_labels = []
    
    ## For each array containing images in data_dict, create the same number of labels.
    for i in range(len(data_dict['images'][0])):
        ## Get the array of images for label i.
        images = data_dict['images'][0][i]

        ## Create array storing the same number of labels as images.
        labels_arr = np.full((images.shape[0], 1), i, dtype = np.float32)

        ## Append to list of labels.
        combined_labels.append(labels_arr)

    ## Return flattened array of images and labels.
    return np.vstack(data_dict['images'][0]), np.vstack(combined_labels)

In [6]:
## Apply the stack_imgs_labels function to the train, validate, and test datasets
train_ds, train_labels = stack_imgs_labels(train_dict)
val_ds, val_labels = stack_imgs_labels(val_dict)
test_ds, test_labels = stack_imgs_labels(test_dict)
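
As a quick sanity check of the stacking logic, the sketch below applies the same idea to toy data and confirms that images and labels stay aligned. It uses `np.repeat` as a compact alternative to the loop with `np.full`; the toy shapes are assumptions, and this is not the version stored in the project's modules.

```python
import numpy as np

# Toy per-class image arrays (assumed 32x32x1 images): 3, 2, and 4 images.
per_class = [
    np.zeros((3, 32, 32, 1), dtype=np.float32),
    np.zeros((2, 32, 32, 1), dtype=np.float32),
    np.zeros((4, 32, 32, 1), dtype=np.float32),
]

# Stack every class along the first axis, as np.vstack does in stack_imgs_labels.
stacked = np.vstack(per_class)

# Repeat each class index once per image in that class, then shape it as a column.
counts = [imgs.shape[0] for imgs in per_class]
labels = np.repeat(np.arange(len(per_class)), counts).reshape(-1, 1).astype(np.float32)

print(stacked.shape, labels.shape)  # (9, 32, 32, 1) (9, 1)
print(labels.ravel())               # [0. 0. 0. 1. 1. 2. 2. 2. 2.]
```

The number of rows in the two arrays must match, since row `k` of the label array is the class of image `k`.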

Before we train our model, we have to modify our labels to be compatible with the categorical cross entropy loss function. The dataset's labels are characters ("A", "b", "0", etc.), and `stack_imgs_labels` represents each one by its class index. To apply the loss function correctly, we need to one-hot encode these indices: each label becomes a vector in which the correct class is marked 1 and every other class is marked 0. The loss can then be measured against the model's per-class output, and its gradients computed for backpropagation. To do this, we'll use the `to_categorical` function in Keras.

In [7]:
## Format our labels to be one-hot encoded.
train_labels = tf.keras.utils.to_categorical(train_labels, num_classes = 62)
val_labels = tf.keras.utils.to_categorical(val_labels, num_classes = 62)
test_labels = tf.keras.utils.to_categorical(test_labels, num_classes = 62)
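
To show what the encoding produces, here is a NumPy-only sketch of one-hot encoding on a tiny example with four classes. `one_hot` is a hypothetical helper that imitates the result of `to_categorical`; it is not the Keras implementation. The value `num_classes = 62` used above presumably corresponds to 26 uppercase letters, 26 lowercase letters, and 10 digits.

```python
import numpy as np


def one_hot(indices, num_classes):
    """One-hot encode class indices (hypothetical helper, NumPy only)."""
    # Flatten to a 1-D integer vector so it can be used for row lookup.
    indices = np.asarray(indices).astype(int).ravel()
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[indices]


# Column of float class indices, like the output of stack_imgs_labels.
encoded = one_hot(np.array([[0.0], [2.0], [1.0]], dtype=np.float32), num_classes=4)
print(encoded)
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 1. 0. 0.]]
```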

# Training and Evaluating Model:

## Initializing and Compiling Model:
Now, we can start training the model. First, we'll initialize and compile our model with correct output classes. The default model uses categorical cross entropy loss function and the Adam optimizer. 
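
For reference, categorical cross entropy for one sample is $-\sum_{c} y_{c} \log p_{c}$, where $y$ is the one-hot label and $p$ is the predicted class distribution. The sketch below computes the mean over a small batch with NumPy. It illustrates the formula only; the clipping constant `eps` is an assumption, and this is not how Keras implements the loss internally.

```python
import numpy as np


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross entropy over a batch (illustrative sketch)."""
    # Clip predictions to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Only the log-probability of the correct class survives the product with y_true.
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(np.mean(per_sample))


y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype=np.float32)
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], dtype=np.float32)

loss = categorical_cross_entropy(y_true, y_pred)
print(f"{loss:.2f}")  # 0.29, i.e. (-ln 0.7 - ln 0.8) / 2
```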

In [11]:
## Create our model using the ocr_model module.
ocr_model = om.CustomOCRModel()
ocr_model.initialize_model()
train_model = ocr_model.generate_model()

## Display the model's architecture.
train_model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_12 (Conv2D)          (None, 32, 32, 64)        1664      
                                                                 
 batch_normalization_12 (Bat  (None, 32, 32, 64)       256       
 chNormalization)                                                
                                                                 
 leaky_re_lu_15 (LeakyReLU)  (None, 32, 32, 64)        0         
                                                                 
 max_pooling2d_6 (MaxPooling  (None, 16, 16, 64)       0         
 2D)                                                             
                                                                 
 conv2d_13 (Conv2D)          (None, 14, 14, 128)       73856     
                                                                 
 batch_normalization_13 (Bat  (None, 14, 14, 128)     
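
The parameter counts in the summary can be checked by hand. A `Conv2D` layer with a bias has `(kernel_h * kernel_w * in_channels + 1) * filters` parameters, and a `BatchNormalization` layer has four per channel. The kernel sizes and the single input channel used below are inferred from the printed counts and output shapes, not taken from `ocr_model.py`, so treat them as one consistent reading rather than the confirmed architecture.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Parameter count of a square-kernel Conv2D layer with bias."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


def batch_norm_params(channels):
    """Parameter count of BatchNormalization: gamma, beta, moving mean, moving variance."""
    return 4 * channels


# Assumed 5x5 kernel on a 1-channel input with 64 filters.
print(conv2d_params(5, 1, 64))    # 1664
# Batch normalization over 64 channels.
print(batch_norm_params(64))      # 256
# Assumed 3x3 kernel on 64 channels with 128 filters; an unpadded 3x3
# convolution also matches the 16 -> 14 change in spatial size.
print(conv2d_params(3, 64, 128))  # 73856
```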

## Training Model:
We'll now fit the model on the training dataset and monitor it on the validation dataset. For this demonstration, we'll train the model for up to 15 epochs with a batch size of 32 and implement early stopping with a patience of 3 epochs.

**NOTE:** The parameters for the actual training process are different. Please read the `MORE_INFO` section for more information.

In [12]:
## Define early stopping with patience of three epochs.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor = 'val_loss', 
    patience = 3, 
    restore_best_weights = True
)

## Start the training process.
print("Starting Model Training:\n")
start_time = time.time()
with tf.device('/GPU:0'):
    train_history = train_model.fit(
        train_ds, train_labels, 
        epochs = 15, 
        batch_size = 32, 
        shuffle = True, 
        validation_data = (val_ds, val_labels), 
        callbacks = [early_stopping]
    ) 
end_time = time.time()
print("Model Training Finished!\n\n")
print(f"Time: {end_time - start_time}")

Starting Model Training:

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Model Training Finished!


Time: 98.80996537208557
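
The run above ends before the final epoch, which is what early stopping is designed to allow. The plain-Python sketch below shows the basic patience rule: stop once the monitored validation loss has failed to improve for `patience` consecutive epochs, and remember which epoch was best. It is a simplified model of the behavior that ignores options such as `min_delta`, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-indexed, under a simple patience rule."""
    best_loss = float('inf')
    best_epoch = None
    wait = 0

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # Improvement: record it and reset the patience counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            # No improvement: count the epoch and stop when patience runs out.
            wait += 1
            if wait >= patience:
                return epoch, best_epoch

    # Patience never ran out, so training used every epoch.
    return len(val_losses), best_epoch


# Made-up validation losses: the best value is at epoch 3.
stop_epoch, best_epoch = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6])
print(stop_epoch, best_epoch)  # 6 3
```

With `restore_best_weights = True`, the model keeps the weights from the best epoch rather than those from the epoch at which training stopped.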


## Evaluating Model:
Using our testing dataset, we'll evaluate the trained model to see how accurately it classifies images that it hasn't seen during its training process.

In [13]:
## Evaluate trained model on testing dataset.
print("Starting Model Evaluation:\n")
with tf.device('/GPU:0'):
    test_loss, test_acc = train_model.evaluate(test_ds, test_labels, batch_size = 32)
## Print out evaluation metrics.
print("Model Evaluation Finished!\n")
print(f'Test Loss: {test_loss:.4f}')
print(f'Test Accuracy: {test_acc:.4f}')

Starting Model Evaluation:

Model Evaluation Finished!

Test Loss: 0.5597
Test Accuracy: 0.8546
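
For one-hot labels, the accuracy reported here is the fraction of images whose highest-probability class matches the true class. The NumPy sketch below computes that on a made-up batch of four predictions. It assumes the model was compiled with an accuracy metric, which the source does not show, and `categorical_accuracy` is a hypothetical helper rather than the Keras metric itself.

```python
import numpy as np


def categorical_accuracy(y_true_one_hot, y_pred_probs):
    """Fraction of samples whose argmax prediction matches the one-hot label."""
    true_classes = np.argmax(y_true_one_hot, axis=1)
    predicted_classes = np.argmax(y_pred_probs, axis=1)
    return float(np.mean(true_classes == predicted_classes))


# True classes 0, 1, 2, 1 as one-hot rows.
y_true = np.eye(3)[[0, 1, 2, 1]]
# Made-up predicted probabilities; the third sample is misclassified as class 0.
y_pred = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
    [0.1, 0.7, 0.2],
])

acc = categorical_accuracy(y_true, y_pred)
print(acc)  # 0.75
```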
