# 2. **Train DigitNet (Final Model using Transfer Learning)**

Here we train the final model using transfer learning. The idea of transfer learning is to take the knowledge gained from solving one problem and apply it to a similar, but different, problem. We can transfer the knowledge gathered from the MNIST dataset and apply it to our dataset. In this case, we take our model trained on the MNIST dataset, freeze all of its parameters, replace the output layer, and then train the model on the new dataset. Freezing the parameters of the old model ensures that only the new classification layer gets trained. An additional benefit is that training time drops significantly, because we only train the variables of the last classification layer and not the entire model.
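
To make the training-time argument concrete, the following minimal sketch counts the trainable parameters of a fully connected output layer. The helper `dense_param_count` and the feature width of 128 are hypothetical, chosen only for illustration; the actual width of the layer feeding DigitNet's output depends on the MNIST model.

```python
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_units + n_units


# Assumption: the frozen base ends in a 128-dimensional feature vector.
HYPOTHETICAL_FEATURES = 128
NUM_CLASSES = 9

trainable = dense_param_count(HYPOTHETICAL_FEATURES, NUM_CLASSES)
print(trainable)  # 128 * 9 weights + 9 biases = 1161
```

With every other layer frozen, these are the only values the optimizer updates.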

The dataset used is provided by [EmpanS](https://github.com/Empans). It contains about 1282 labeled images, which we can now use to train our final model.




In [None]:
# Install useful dependencies
!pip install numpy==1.18.5
!pip install matplotlib==3.2.2
!pip install tensorflow==2.3.0
!pip install opencv-python==4.1.2.30
!pip install scikit-learn==0.22.2

In [None]:
# Importing useful libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint
import os
import os.path as path
try:
    import cv2
except ImportError:
    from cv2 import cv2
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import zipfile

BATCH_SIZE = 32
NUM_CLASSES = 9
EPOCHS = 50
LR=1e-3

# Input image dimensions
img_rows, img_cols = 28, 28

In [None]:
!git clone https://github.com/EmpanS/Project-Sudoku

In [None]:
# First, extract the images from the zip file into a new 'dataset' folder.
with zipfile.ZipFile('/content/Project-Sudoku/docs/Train Models/Training images - final model.zip') as archive:
    archive.extractall(os.path.join(os.getcwd(), 'dataset'))


BASE_PATH = os.path.join(os.getcwd(), 'dataset')
# List the directory once so the count and the loop see the same files
image_files = sorted(os.listdir(BASE_PATH))
NUM_EXAMPLES = len(image_files)
DIM = 28

# Load all images and labels (they are split into training and test sets below)
X = np.zeros((NUM_EXAMPLES, DIM, DIM, 1))
y = np.zeros((NUM_EXAMPLES,))
for idx, image_path in enumerate(image_files):
    image = cv2.imread(os.path.join(BASE_PATH, image_path), cv2.IMREAD_GRAYSCALE)
    # Normalize pixel values to [0, 1] and add a channel dimension
    image = image / 255.0
    image = np.reshape(image, (DIM, DIM, 1))
    # label 0 represents number 1, ..., label 8 represents number 9
    label = int(image_path.split("__")[0]) - 1
    X[idx] = image
    y[idx] = label
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Convert class vectors to binary class matrices
y_train = tf.keras.utils.to_categorical(y_train, NUM_CLASSES)
y_test = tf.keras.utils.to_categorical(y_test, NUM_CLASSES)
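
The label convention used in the loading cell can be sketched in isolation. This assumes file names begin with the digit followed by a double underscore, as the `split("__")` call implies; the example file name `7__0012.png` and the helper names are hypothetical.

```python
NUM_CLASSES = 9


def label_from_filename(filename: str) -> int:
    """Map a file name like '7__0012.png' to a zero-based class label."""
    digit = int(filename.split("__")[0])
    if not 1 <= digit <= NUM_CLASSES:
        raise ValueError(f"unexpected digit {digit} in {filename!r}")
    return digit - 1


def one_hot(label: int, num_classes: int = NUM_CLASSES) -> list:
    """Plain-Python equivalent of a one-hot (binary class) vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


label = label_from_filename("7__0012.png")  # hypothetical file name
print(label)           # 6
print(one_hot(label))  # 1.0 at index 6, zeros elsewhere
```

Validating the digit range early makes a misnamed file fail loudly instead of producing an out-of-range label.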

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Now we load the model trained on the MNIST dataset, freeze all of its parameters, and remove its last layer. That model was trained on 10 digits (0-9), so we add a new output layer with 9 neurons, since we only want to predict the numbers 1-9.

In [None]:
# Load pre-trained model
model = load_model("/content/drive/My Drive/Sudoku Solver/BestDigitNet.pb")

# Remove last layer, set layers to un-trainable and add new output layer
model.pop()
for l in model.layers:
    l.trainable = False
model.add(Dense(NUM_CLASSES, activation='softmax', name="Output"))

In [None]:
# Save the model whenever the validation loss reaches a new best value
checkpoint = ModelCheckpoint(
    'BestDigitNetFinalModel.pb', monitor='val_loss', save_best_only=True, mode='auto')
# Stop training once the training loss has not improved for 3 epochs
earlystopper = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)
adam = Adam(learning_rate=LR)
model.compile(loss="categorical_crossentropy", optimizer=adam, metrics=["accuracy"])

In [None]:
# Fit the model
session = model.fit(X_train, y_train,
                    batch_size=BATCH_SIZE,
                    epochs=EPOCHS,
                    validation_data=(X_test, y_test),
                    callbacks=[checkpoint, earlystopper])

In [None]:
acc = session.history['accuracy']
val_acc = session.history['val_accuracy']

loss = session.history['loss']
val_loss = session.history['val_loss']

epochs_range = range(len(acc))

# Plot the accuracy
plt.figure(figsize=(15, 6))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

# Plot the loss
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

We can see above that the early stopper did not kick in: the validation loss decreased continuously throughout the training session, and we reach a validation accuracy of 100%. At first sight this might seem strange, but remember that we are training on computer-generated numbers, so the numbers are very similar, and the model will be used to predict computer-generated numbers.
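
For readers unfamiliar with patience-based early stopping, here is a simplified pure-Python sketch of the rule. It is not Keras's implementation; the function `stopping_epoch` and the loss values are made up for illustration.

```python
def stopping_epoch(losses, patience):
    """Return the zero-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the monitored loss last improved
    for epoch, loss in enumerate(losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Steadily decreasing loss: the stopper never triggers.
print(stopping_epoch([0.9, 0.5, 0.3, 0.2], patience=3))  # None
# Three epochs in a row without a new best: stop at epoch 4.
print(stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.81, 0.7], patience=3))  # 4
```

A monitored loss that keeps reaching new lows resets the counter every epoch, which is why the stopper stays silent in a run like the one above.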

Now we have our final model, ready to be used in the project to predict numbers! 
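
Since label 0 stands for the number 1, a prediction has to be shifted back before it is used. A minimal sketch, assuming the model outputs one probability per class; the helper `digit_from_probs` and the example vector are hypothetical:

```python
def digit_from_probs(probs):
    """Return the digit (1-9) whose class has the highest probability."""
    label = max(range(len(probs)), key=lambda i: probs[i])
    # label 0 represents number 1, ..., label 8 represents number 9
    return label + 1


example = [0.01, 0.02, 0.90, 0.01, 0.01, 0.02, 0.01, 0.01, 0.01]
print(digit_from_probs(example))  # 3
```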