# Prediction

In this notebook, we load the `model.h5` file produced by `train.ipynb` and use it to classify the input images. The images to classify are selected by the glob pattern stored in the `input_folder` variable. We save the classifications to a CSV file on disk. Finally, we plot a few examples so that the scientists can do a brief manual sanity check.

**Note**:

We are currently using TensorFlow 2.0, which is still in beta. Warnings are therefore expected and can be ignored.

## settings, imports and constants

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import os
import glob
import imageio
import tensorflow as tf
import tensorflow.keras as keras

import gitignored.remap as remap
import gitignored.config as cfg

import bodestm.util as util

input_folder = 'gitignored/to_predict/*.png'

## create data set from generator

In [None]:
# show which files will be classified (sorted so the order is reproducible)
content_of_inputfolder = sorted(glob.glob(input_folder))
pd.DataFrame({"input_files": content_of_inputfolder}).head()

In [None]:
ds = tf.data.Dataset.from_generator(
    generator=lambda: util.gen_filenames(content_of_inputfolder),
    output_types=tf.string,
    output_shapes=None
)
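
`tf.data.Dataset.from_generator` expects a callable that returns an iterable, and with `output_types=tf.string` each yielded element has to be a single string. The actual `util.gen_filenames` is not shown in this notebook, so the following is only a minimal sketch of what such a generator might look like; the function body and the example file names are assumptions.

```python
def gen_filenames(filenames):
    """Hypothetical stand-in for util.gen_filenames: yield one path per step."""
    for filename in filenames:
        yield filename


# Made-up file names, for illustration only.
example_files = ["a.png", "b.png"]
yielded = list(gen_filenames(example_files))
print(yielded)
```

Wrapping the call in a `lambda`, as done above, lets the dataset create a fresh generator each time it is iterated.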

## load the model from disk

It is crucial to pick the right model file. For example, if the model was trained in the cloud and this notebook is executed on a local system, the `model.h5` file has to be transferred to the local system first.
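
A small guard before loading can turn a missing or misplaced file into a clear error message. This is a sketch only: `require_model_file` is a hypothetical helper that is not part of the project, and the path in the demonstration is made up.

```python
import os


def require_model_file(path):
    """Hypothetical helper: return `path` if it is an existing file, else raise."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "model file not found: {} (was it copied from the training machine?)".format(path)
        )
    return path


# Demonstration with a path that is assumed not to exist.
try:
    require_model_file("does/not/exist/model.h5")
    found = True
except FileNotFoundError:
    found = False

print(found)
```

In this notebook, such a check could wrap the path that is passed to `keras.models.load_model`.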

In [None]:
model = keras.models.load_model(cfg.gitignored_path + "model.h5")

## prediction

In [None]:
predictions = model.predict(
    ds.map(util.parse_image).batch(32),
    # The docs say: do not specify `batch_size` if the data is in the form of a dataset.
    # Still, it must be set explicitly to None, even though the docs give None as the default. Otherwise there will be an error.
    batch_size=None
)
print("shape of predictions: {}".format(predictions.shape))
print('successfully predicted {} images'.format(len(predictions)))

In [None]:
# most likely class and its probability for every image (one row per image)
predicted_class = np.argmax(predictions, axis=1)
predicted_class_prob = np.max(predictions, axis=1)

df_predictions = pd.DataFrame({
    "file_name": content_of_inputfolder,
    "predicted_class_int": predicted_class,
    "predicted_class_prob": predicted_class_prob
})

df_predictions["predicted_class_ext"] = remap.remap_cross_back(df_predictions.predicted_class_int)

df_predictions.to_csv('gitignored/to_predict/predictions.csv')

df_predictions.head()
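
To illustrate what the class and confidence columns contain: assuming the model ends in a softmax layer (which this notebook does not show), each row of `predictions` holds one probability per class, and the predicted class is the index of the largest entry. A toy example with made-up numbers:

```python
import numpy as np

# Made-up softmax output for 2 images and 3 classes (illustration only).
toy_predictions = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
])

toy_class = np.argmax(toy_predictions, axis=1)  # index of the most likely class per row
toy_prob = np.max(toy_predictions, axis=1)      # probability of that class per row

print(toy_class.tolist())  # [1, 0]
print(toy_prob.tolist())   # [0.7, 0.8]
```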

## plot predicted images with classification

We plot a few images for a brief sanity check by the scientists. Each prediction is shown in both the external and the internal nomenclature, together with the network's confidence in the classification.

In [None]:
# plot at most 50 images
n = min(len(df_predictions), 50)

cols = 4
# ceiling division; at least one row so the figure height is never zero
rows = max(1, -(-n // cols))

fig = plt.figure(figsize=(cols * 4, rows * 3))

for i in range(n):
    # subplot indices are 1-based
    ax = fig.add_subplot(rows, cols, i + 1)
    img = imageio.imread(df_predictions.file_name[i])
    class_int = df_predictions.predicted_class_int[i]
    class_ext = df_predictions.predicted_class_ext[i]
    prob = df_predictions.predicted_class_prob[i]
    ax.imshow(img)
    ax.set_title("prediction: {} (int {})".format(class_ext, class_int))
    ax.set_xlabel("confidence {:.3f}".format(prob))
    ax.set_xticks([])
    ax.set_yticks([])

fig.tight_layout()