## Evaluating our model performance

This notebook is a quick evaluation of our model's performance. Note that if you trained the model yourself locally, there are likely to be small differences between the model from your training run and the model evaluated here.

In [1]:
# You probably don't need this.
# import t4
# t4.Bucket("s3://alpha-quilt-storage").fetch(
#     "aleksey/sagemaker/alekseylearn/alekseylearn-notebooks-build-8/output/model.tar.gz",
#     "../models/model.tar.gz"
# )
# !rm ../models/clf.h5 2>/dev/null
# !tar -xvf "../models/model.tar.gz"
# !mv clf.h5 ../models/clf.h5
# !mv history.json ../models/history.json

In [2]:
import json
with open('../models/history.json', 'r') as f:
    hist = json.load(f)

In [3]:
hist

{'val_loss': [0.6403966411467521,
  0.6127266787713573,
  0.5962540805339813,
  0.6470143996900127,
  0.5596090093735726,
  0.5438002696851405,
  0.536070837128547,
  0.5606116306397223,
  0.5101688513832707,
  0.5737233556086018,
  0.5323542361336995,
  0.5174307380953143,
  0.44509513435825226,
  0.5877856698728376,
  0.48688628788917293,
  0.5060960316076512,
  0.5278721720941605],
 'val_acc': [0.657258064516129,
  0.6471774193548387,
  0.7056451612903226,
  0.6491935483870968,
  0.7157258064516129,
  0.7012195126797126,
  0.7379032258064516,
  0.7278225806451613,
  0.7580645161290323,
  0.6995967741935484,
  0.7296747962633768,
  0.7681451612903226,
  0.7903225806451613,
  0.6935483870967742,
  0.7439516129032258,
  0.7276422759381737,
  0.7258064516129032],
 'loss': [0.6867040309229155,
  0.6651339889333199,
  0.6488449365328374,
  0.6274135245850972,
  0.6181225183477812,
  0.6131593026042554,
  0.6059242456533502,
  0.5957962876301633,
  0.5967469439742288,
  0.584502640095624,
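
The per-epoch lists in `hist` are easier to read once reduced to a single "best epoch" figure. The following is a minimal sketch, not part of the original notebook: the `val_loss` values are copied from the output above, and `best_epoch` is a hypothetical helper introduced only for illustration.

```python
# Validation loss per epoch, copied from the `hist` output above.
val_loss = [
    0.6403966411467521, 0.6127266787713573, 0.5962540805339813,
    0.6470143996900127, 0.5596090093735726, 0.5438002696851405,
    0.536070837128547, 0.5606116306397223, 0.5101688513832707,
    0.5737233556086018, 0.5323542361336995, 0.5174307380953143,
    0.44509513435825226, 0.5877856698728376, 0.48688628788917293,
    0.5060960316076512, 0.5278721720941605,
]


def best_epoch(values, mode="min"):
    """Return (1-based epoch, value) of the best entry in a metric list.

    `best_epoch` is a hypothetical helper, not a Keras function.
    Use mode="min" for losses and mode="max" for accuracies.
    """
    pick = min if mode == "min" else max
    index = pick(range(len(values)), key=lambda i: values[i])
    return index + 1, values[index]


epoch, loss = best_epoch(val_loss)
print(f"lowest val_loss {loss:.4f} at epoch {epoch} of {len(val_loss)}")
```

Validation loss bottoms out at epoch 13 and is higher in the four epochs that follow, so a model saved after the final epoch is not necessarily the best one seen during training.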


In [1]:
from keras.models import load_model
model = load_model('../models/clf.h5')

Using TensorFlow backend.


In [5]:
from keras.preprocessing.image import ImageDataGenerator

test_datagen = ImageDataGenerator(rescale=1./255)
validation_generator = test_datagen.flow_from_directory(
    '../data/images_cropped/quilt/open_images/',
    target_size=(128, 128),
    batch_size=16,
    class_mode='binary'
)

Found 2508 images belonging to 2 classes.


In [8]:
import pathlib

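# Count image files only; a bare rglob also matches the class directories themselves.
sample_size = sum(1 for p in pathlib.Path('../data/images_cropped/').rglob('*') if p.is_file())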
batch_size = 16

In [2]:
# We can't iterate automatically using `predict_generator` because it only returns classifications.
# It does not report the names of the images being classified. But we need to know these to compare
# against the ground truth. See the next code cell for the solution.

# y_pred = model.predict_generator(
#     validation_generator,
#     steps=sample_size // batch_size
# )

# ...

# Following code taken from https://stackoverflow.com/a/49977036/1993206
# import numpy as np
# from tqdm import tqdm_notebook

# y_pred = []
# image_names = []

# for i in tqdm_notebook(range(sample_size // batch_size)):
#     indices = next(validation_generator.index_generator)
#     images, labels = validation_generator._get_batches_of_transformed_samples(indices)
#     for index in indices:
#         image_name = validation_generator.filenames[index].split('/')[-1]
#         image_names.append(image_name)
#     y_pred += labels.astype(int).tolist()

# y_pred = np.array(y_pred)

# import pandas as pd
# X_meta = pd.read_csv('../data/training/X_meta.csv')


# y = (X_meta
#      .set_index('ImageCropURL')
#      .loc[image_names]
#      .LabelName
#      .map(lambda v: validation_generator.class_indices[v])
#      .values
#     )


# current classification report---two class stopped
# from sklearn.metrics import classification_report
# print(classification_report(y, y_pred))

In [3]:
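from keras.preprocessing.image import load_img
import os
import numpy as np
import pandas as pd
import skimage.transform
from tqdm import tqdm_notebook

# Image metadata, indexed below by ImageCropURL to look up each image's LabelName.
X_meta = pd.read_csv('../data/training/X_meta.csv')

# The following code generates a prediction vector for our input dataset, y_pred,
# so that we can compare it to our ground truth, y.
# This is a quick, manual approach; there are better ways.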

img_uris = []
y_pred = []

for img_uri in tqdm_notebook(os.listdir('../data/images_cropped/quilt/open_images/Hamburger/')):
    img = load_img('../data/images_cropped/quilt/open_images/Hamburger/' + img_uri)
    arr = skimage.transform.resize(np.array(img), (128, 128, 3))
    y_pred.append(model.predict_classes(arr[np.newaxis, :, :, :])[0][0])
    img_uris.append(img_uri)
    
for img_uri in tqdm_notebook(os.listdir('../data/images_cropped/quilt/open_images/Sandwich/')):
    img = load_img('../data/images_cropped/quilt/open_images/Sandwich/' + img_uri)
    arr = skimage.transform.resize(np.array(img), (128, 128, 3))
    y_pred.append(model.predict_classes(arr[np.newaxis, :, :, :])[0][0])
    img_uris.append(img_uri)
    
y_pred = np.array(y_pred)
y = (X_meta
 .set_index('ImageCropURL')
 .loc[img_uris]
 .LabelName
 .map(lambda v: validation_generator.class_indices[v])
 .values
)
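
The two loops above differ only in the class directory they read, and the ground truth label is already implied by that directory. As a hedged alternative to the `X_meta` lookup, the sketch below walks one subdirectory per class and records the label alongside each prediction. `collect_labels_and_predictions` and `predict_fn` are hypothetical names introduced for illustration. In the notebook, `predict_fn` would wrap the `load_img`, resize, and `model.predict_classes` steps, and `class_indices` would be `validation_generator.class_indices`.

```python
import os


def collect_labels_and_predictions(root, class_indices, predict_fn):
    """Pair every file under root/<class_name>/ with its label and prediction.

    root: directory containing one subdirectory per class.
    class_indices: mapping of class name to integer label.
    predict_fn: hypothetical callable that takes a file path and
        returns an integer class prediction.
    """
    names, y_true, y_pred = [], [], []
    # Visit classes in label order so the output order is deterministic.
    for class_name, label in sorted(class_indices.items(), key=lambda kv: kv[1]):
        class_dir = os.path.join(root, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            names.append(file_name)
            y_true.append(label)
            y_pred.append(predict_fn(os.path.join(class_dir, file_name)))
    return names, y_true, y_pred
```

This version trusts the directory layout rather than the metadata file, so it is only appropriate if the two are known to agree.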

In [47]:
from sklearn.metrics import classification_report
print(classification_report(y, y_pred))

              precision    recall  f1-score   support

           0       0.90      0.59      0.71      1399
           1       0.64      0.92      0.75      1109

   micro avg       0.73      0.73      0.73      2508
   macro avg       0.77      0.75      0.73      2508
weighted avg       0.78      0.73      0.73      2508
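
To make the relationship between the columns concrete, the sketch below recomputes the f1-score column from the precision and recall columns. It uses only the rounded values printed in the report above, so it can reproduce the report to two decimal places at best. `f1_score_from` is a hypothetical helper, not a scikit-learn function.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded per-class values read from the classification report above.
report = {
    0: {"precision": 0.90, "recall": 0.59},
    1: {"precision": 0.64, "recall": 0.92},
}

f1 = {label: f1_score_from(row["precision"], row["recall"])
      for label, row in report.items()}
# The macro average is the unweighted mean over classes.
macro_f1 = sum(f1.values()) / len(f1)

for label, value in f1.items():
    print(f"class {label}: f1 = {value:.2f}")
print(f"macro avg f1 = {macro_f1:.2f}")
```

The two classes reach similar f1-scores by opposite routes: class 0 has high precision and low recall, class 1 the reverse. In other words, the model leans towards predicting class 1.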



We could improve model accuracy further by removing depictions and (potentially) occluded images from the dataset. This would likely push average accuracy from the current 73% up to roughly 80%, the same baseline as for Cats versus Dogs in the article ["Building powerful image classification models using very little data"](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html). However, we haven't done that here; it is left as an exercise for the reader!