# Training

We will train the network and save the model to disk. We will use this model in the `predict.ipynb` notebook.

**Note**:

We are using TensorFlow 2.0, which is currently in beta, so warnings are expected and harmless.

## import libraries and set constants

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

%load_ext autoreload
%autoreload 2
%matplotlib inline

import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime
import glob

import tensorflow as tf
keras = tf.keras

import bodestm.util as util
import gitignored.config as cfg

n_examples_per_class = 30
IMG_SIZE = 128
SHUFFLE_BUFFER_SIZE = 1000
BATCH_SIZE = 32

normalize = False
use_class_weight = True

## load classification

Load the classification of the training images into a training and an evaluation data frame. There should be one classification file per folder of training images. Each folder must have the same name as its classification file without the file extension, and the images in the folder must be named exactly as they are listed in the classification file. There can be multiple classification files and folders.

In [None]:
classification_files = glob.glob("gitignored/classified_images/*.txt")
pd.DataFrame({"classification_files": classification_files}).head()

The classes in the classification files are `[0,1,3,4,5,6]`. Note that there is no class `2`. This nomenclature is called `ext_label` below. The scientists use it for their checks and for further data analysis. We use our own nomenclature with the contiguous classes `[0,1,2,3,4,5]`: the classes `[3,4,5,6]` are shifted down by one to `[2,3,4,5]`, while `0` and `1` stay unchanged. Contiguous, zero-based labels are required by the sparse categorical cross-entropy loss used for training. We call this nomenclature `int_label` below.
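The relabeling described above can be sketched as a pair of lookup tables. This is only an illustration: the names `EXT_TO_INT`, `INT_TO_EXT`, `to_int_label` and `to_ext_label` are hypothetical, and the mapping actually used in this notebook lives in `bodestm.util`.

```python
# Hypothetical sketch of the label mapping; the real logic is in bodestm.util.
EXT_LABELS = [0, 1, 3, 4, 5, 6]  # labels as they appear in the classification files

# Enumerating the external labels yields contiguous internal labels 0..5.
EXT_TO_INT = {ext: i for i, ext in enumerate(EXT_LABELS)}
INT_TO_EXT = {i: ext for ext, i in EXT_TO_INT.items()}


def to_int_label(ext_label):
    """Map a label from the classification file to a contiguous label."""
    return EXT_TO_INT[ext_label]


def to_ext_label(int_label):
    """Map a contiguous label back to the scientists' nomenclature."""
    return INT_TO_EXT[int_label]
```

Using a dictionary instead of arithmetic makes an unexpected external label (for example `2`) fail loudly with a `KeyError` rather than being shifted silently.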

In [None]:
df = util.get_classification_df(classification_files)
print(len(df.index))
df.head()

Split the data set into a training and an evaluation data set. The evaluation data set will contain `n_examples_per_class` images of each class.

In [None]:
df_train, df_eval = util.split_data_set_in_train_and_eval(df, n_examples_per_class)
df_eval.head()
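For readers without access to `bodestm.util`, the following dependency-free sketch shows one way such a per-class hold-out split could work. It assumes that the evaluation images are drawn at random within each class; whether `util.split_data_set_in_train_and_eval` does exactly this is not confirmed here, and `split_per_class` is a hypothetical name.

```python
import random
from collections import defaultdict


def split_per_class(records, n_per_class, seed=0):
    """Hold out n_per_class records of every class.

    records: list of (image_name, int_label) tuples.
    Returns (train_records, eval_records).
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible

    # Group the records by their label.
    by_label = defaultdict(list)
    for record in records:
        by_label[record[1]].append(record)

    train, held_out = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        held_out.extend(group[:n_per_class])  # first n go to evaluation
        train.extend(group[n_per_class:])     # the rest is used for training
    return train, held_out
```

Holding out the same number of images per class keeps the evaluation set balanced even though the training set is not.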

In [None]:
df_eval.groupby("int_label").count()

Check the training data set.

In [None]:
df_train.head()

Check the balance of the training data set.

In [None]:
df_show_balance = df_train.groupby("int_label").count()
print(df_show_balance)

The training data set is heavily unbalanced. To make the predictions accurate for the minority classes as well, one can use a weighted loss function. If `use_class_weight` is `True`, we calculate a weight for each class.

If $w_i$ is the weight for class $c_i$ and there are $n_i$ images of that class, the weights are chosen such that

$w_i \cdot n_i = w_j \cdot n_j$

for every $i,j \in \{0,1,2,3,4,5\}$, so that every class carries the same total weight in the loss.
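The condition above fixes the weights only up to a common factor. A minimal sketch, assuming the normalization $w_i = N / (K \cdot n_i)$ with $N$ the total number of images and $K$ the number of classes, is shown here. The function name `balanced_class_weights` and the counts are made up for illustration; the normalization used by `util.get_class_weights_map` may differ.

```python
def balanced_class_weights(counts):
    """Compute class weights with w_i * n_i equal for all classes.

    counts: dict {int_label: n_i} with the number of images per class.
    Returns a dict {int_label: w_i}.
    """
    total = sum(counts.values())  # N
    n_classes = len(counts)       # K
    return {label: total / (n_classes * n) for label, n in counts.items()}


counts = {0: 600, 1: 300, 2: 100}  # made-up counts for illustration
weights = balanced_class_weights(counts)
```

With this normalization the weighted class sizes $w_i \cdot n_i$ sum to $N$, so the overall scale of the loss stays comparable to the unweighted case.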

In [None]:
if use_class_weight:
    class_weight = util.get_class_weights_map(df_show_balance)
else:
    class_weight = None
    
class_weight

## exporting the data to TFRecord files

In [None]:
%%time
print("exporting eval data set with {} records".format(len(df_eval)))
util.export_dataframe(df_eval, cfg.path_to_eval_data_set, verbose=True)

In [None]:
%%time
print("exporting training data set with {} records".format(len(df_train)))
util.export_dataframe(df_train, cfg.path_to_train_data_set)

## training of the model

Here we train the model. The training can be monitored with TensorBoard. After each epoch, Keras computes the loss and accuracy on the evaluation data set and saves the model weights if the evaluation loss is lower than the best value seen so far. After the training is done, the best weights are loaded and the model is saved to disk.
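The checkpointing rule can be illustrated without Keras. The sketch below mimics the "save only if better than the best so far" behaviour on a made-up list of evaluation losses; `best_epoch` is a hypothetical helper, not part of Keras or of this project.

```python
def best_epoch(val_losses):
    """Return (epoch_index, loss) of the epoch whose weights would be kept."""
    best_index, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:  # strictly better than the best value so far
            best_index, best_loss = epoch, loss  # here a checkpoint would be written
    return best_index, best_loss


val_losses = [0.9, 0.7, 0.8, 0.75, 0.6, 0.65]  # made-up values
print(best_epoch(val_losses))
```

Note that epoch 3 (loss 0.75) improves on the epoch before it (0.8) but not on the best value so far (0.7), so it would not overwrite the checkpoint.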

**Important:** Make sure to use the most recent `model.h5` file.

In [None]:
%%time

model = tf.keras.models.Sequential([
    # single-channel input; only the first layer needs an input_shape
    tf.keras.layers.BatchNormalization(input_shape=(IMG_SIZE, IMG_SIZE, 1)),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    # classification head
    tf.keras.layers.Flatten(),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(rate=0.2),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(rate=0.2),
    tf.keras.layers.Dense(6, activation='softmax', name='predictions')
])

# default values for adam optimizer
# keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
# see: https://keras.io/optimizers/
adam = tf.keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)

model.compile(optimizer=adam,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

log_dir = cfg.gitignored_path + "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
callbacks = [
    # write logs for TensorBoard
    tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1),
    # keep only the weights with the lowest evaluation loss
    tf.keras.callbacks.ModelCheckpoint(
        filepath=cfg.path_to_checkpoints,
        save_weights_only=True,
        save_best_only=True,
        verbose=1
    )
]

train_ds = tf.data.TFRecordDataset(cfg.path_to_train_data).map(util.parse_tfrecord).shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
eval_ds = tf.data.TFRecordDataset(cfg.path_to_eval_data).map(util.parse_tfrecord).shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)

history = model.fit(
    train_ds,
    validation_data=eval_ds,
    epochs=40,
    callbacks=callbacks,
    class_weight=class_weight
)

# load the weights of the best epoch (lowest evaluation loss)
model.load_weights(cfg.path_to_checkpoints)
model.save(cfg.gitignored_path + "model.h5")

## check the accuracy per class

Because we also want to classify the minority classes well, we have to check not only the overall accuracy but also the accuracy per class.

In [None]:
predictions = model.predict(
    tf.data.TFRecordDataset(cfg.path_to_eval_data).map(util.parse_tfrecord).batch(BATCH_SIZE),
    batch_size=None
)
print("shape of predictions: {}".format(predictions.shape))
print("successfully predicted {} images".format(len(predictions)))

In [None]:
ds = tf.data.TFRecordDataset(cfg.path_to_eval_data).map(util.parse_tfrecord)
# materialize the whole evaluation data set as a list of (image, label) records
parsed_records = list(ds)

In [None]:
# true labels, in the same order as the predictions
labels = [parsed_record[1].numpy() for parsed_record in parsed_records]
# predicted class and its probability for every image
class_predictions = np.argmax(predictions, axis=1)
max_probs = np.max(predictions, axis=1)

df_predictions = pd.DataFrame({
    "int_label": labels,
    "class_prediction": class_predictions,
    "0_prediction" : predictions.T[0],
    "1_prediction" : predictions.T[1],
    "2_prediction" : predictions.T[2],
    "3_prediction" : predictions.T[3],
    "4_prediction" : predictions.T[4],
    "5_prediction" : predictions.T[5],
    "prob": max_probs
})

df_predictions["ext_label"] = util.remap_cross_back(df_predictions.int_label)

df_predictions.head()

In [None]:
# accuracy per class: share of the images with true class i that were predicted as class i
for i in range(6):
    with_class_prediction = df_predictions[df_predictions["class_prediction"]==i]
    hits = len(with_class_prediction[with_class_prediction["int_label"]==i])
    all_of_class = len(df_predictions[df_predictions["int_label"]==i])
    print("accuracy for class {}: {}".format(i, hits/all_of_class))
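The loop above divides the hits by the number of images whose true label is `i`, which is the per-class recall. A dependency-free sketch of this quantity, together with the complementary per-class precision (hits divided by the number of images predicted as `i`), could look as follows. The function names and the example data are made up for illustration.

```python
def per_class_recall(labels, predicted):
    """Share of the images of each true class that were predicted correctly."""
    result = {}
    for c in sorted(set(labels)):
        total = sum(1 for y in labels if y == c)
        hits = sum(1 for y, p in zip(labels, predicted) if y == c and p == c)
        result[c] = hits / total
    return result


def per_class_precision(labels, predicted):
    """Share of the images predicted as each class that truly belong to it."""
    result = {}
    for c in sorted(set(labels)):
        total = sum(1 for p in predicted if p == c)
        hits = sum(1 for y, p in zip(labels, predicted) if y == c and p == c)
        # None marks a class that was never predicted.
        result[c] = hits / total if total else None
    return result


labels = [0, 0, 1, 1, 2, 2]     # made-up true labels
predicted = [0, 1, 1, 1, 2, 0]  # made-up predictions
recall = per_class_recall(labels, predicted)
precision = per_class_precision(labels, predicted)
```

Looking at both numbers helps to distinguish a class that is often missed (low recall) from a class that is predicted too often (low precision).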

## plot the certainty per class

We have to check the overall certainty distribution and the certainty distribution of the individual classes. If there is a correlation between wrongly classified images and a low certainty of the network, we could use this as a selector for candidates for a double check by humans.

In [None]:
# prob distribution
plt.hist(df_predictions.prob)
plt.show()

In [None]:
# prob distribution for single class
for cl in [0,1,2,3,4,5]:
    plt.hist(df_predictions[df_predictions["class_prediction"] == cl].prob)
    plt.title("certainty distribution for class {}".format(cl))
    plt.show()

## plot a few examples

We are plotting a few examples, so the scientists can do a brief manual sanity check.

In [None]:
# some samples
rows = 4
cols = 4

fig = plt.figure(figsize=(cols * 4, rows * 3))
for i, parsed_record in enumerate(parsed_records[:rows * cols]):
    ax = fig.add_subplot(rows, cols, i + 1)  # subplot indices start at 1
    img = np.squeeze(parsed_record[0])       # drop the channel axis for plotting
    int_label = df_predictions.int_label[i]
    ext_label = df_predictions.ext_label[i]
    ax.imshow(img)
    ax.set_title("label {} (int {})".format(ext_label, int_label))
    ax.set_xlabel("prediction (int): {}".format(np.argmax(predictions[i])))
    ax.set_xticks([])
    ax.set_yticks([])

fig.tight_layout()

## check the certainty distribution for wrongly classified images

Here we can check the hypothesis that there is a correlation between wrongly classified images and a low certainty of the network by plotting the certainty distribution for wrongly classified images.

In [None]:
df_wrongly_classified = df_predictions[df_predictions["class_prediction"] != df_predictions["int_label"]]
df_wrongly_classified.head()

In [None]:
plt.hist(df_wrongly_classified.prob)
plt.title("certainty distribution for wrongly classified images")
plt.show()
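If the histograms support the hypothesis, a certainty threshold could be used to select images for a manual double check. The following sketch is a hedged illustration of how such a threshold might be evaluated. The function `review_stats`, the threshold of 0.8 and the example values are assumptions, not results of this notebook.

```python
def review_stats(probs, is_correct, threshold):
    """Evaluate a certainty threshold for selecting images for manual review.

    probs: highest class probability per image.
    is_correct: True where the predicted class equals the label.
    """
    flagged = [p < threshold for p in probs]  # low certainty -> manual review
    n_wrong = sum(1 for c in is_correct if not c)
    n_right = len(is_correct) - n_wrong
    wrong_flagged = sum(1 for f, c in zip(flagged, is_correct) if f and not c)
    right_flagged = sum(1 for f, c in zip(flagged, is_correct) if f and c)
    return {
        "flagged": sum(flagged),
        # share of the misclassified images that the threshold catches
        "wrong_caught": wrong_flagged / n_wrong if n_wrong else None,
        # share of the correct images that are flagged unnecessarily
        "right_flagged": right_flagged / n_right if n_right else None,
    }


probs = [0.99, 0.95, 0.55, 0.60, 0.98, 0.70]          # made-up certainties
is_correct = [True, True, False, True, True, False]   # made-up outcomes
stats = review_stats(probs, is_correct, threshold=0.8)
```

In this notebook, `probs` would correspond to `df_predictions.prob` and `is_correct` to the comparison of `class_prediction` with `int_label`. A useful threshold catches most misclassified images while flagging few correct ones.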