ML Course, Bogotá, Colombia  (&copy; Josh Bloom; June 2019)

In [1]:
%run ../talktools.py

## Classification with Keras

We just saw how to do regression problems with neural nets in `keras`. Let's now explore classification. Because we'll use this dataset later, let's introduce the [FashionMNIST](https://github.com/zalandoresearch/fashion-mnist#labels) dataset: 70k small (28$\times$28) images of 10 different types of clothing.

<img src="https://github.com/zalandoresearch/fashion-mnist/blob/master/doc/img/fashion-mnist-sprite.png?raw=true" width="80%">

Each training and test example is assigned to one of the following labels:

| Label | Description |
| --- | --- |
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |

TensorFlow has a simple method to get this data locally:

In [2]:
import datetime, os

import tensorflow.keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import backend as K
import tensorflow as tf

# Print keras version
print(tensorflow.keras.__version__)


In [None]:
fashion_mnist = tf.keras.datasets.fashion_mnist

(x_train, y_train),(x_test, y_test) = fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale the images to 0-1

In [None]:
x_train.shape

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

ind = 1
plt.axis('off')
plt.imshow(x_train[ind], cmap=plt.cm.gray_r, interpolation='nearest')

In [None]:
y_train[ind]

For now, let's treat every pixel as a separate input and build a `keras` NN to predict the clothing label.

In [None]:
input_shape = x_train[0].shape
input_shape

In [None]:
model = Sequential()
model.add(Flatten(input_shape=input_shape))
model.add(Dense(512, activation="relu"))
model.add(Dense(32, activation="relu"))
model.add(Dense(10, activation='softmax'))

In [None]:
model.summary()
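The parameter counts reported by `model.summary()` can be checked by hand: a `Dense` layer with $n_{in}$ inputs and $n_{out}$ units has $n_{in} \times n_{out}$ weights plus $n_{out}$ biases. Here is a minimal sketch, assuming the 28$\times$28 input and the layer widths defined above (the helper name `dense_params` is ours, not part of Keras):

```python
def dense_params(n_in, n_out):
    """Weights plus biases for a fully connected layer."""
    return n_in * n_out + n_out

n_pixels = 28 * 28            # Flatten has no parameters; it only reshapes to 784 values
layer_widths = [512, 32, 10]  # the three Dense layers in the model above

total = 0
n_in = n_pixels
for n_out in layer_widths:
    n_params = dense_params(n_in, n_out)
    print(f"Dense({n_out}): {n_params} parameters")
    total += n_params
    n_in = n_out  # the output of this layer feeds the next one

print(f"Total: {total} parameters")
```

Almost all of the parameters sit in the first `Dense` layer, because it is connected to every one of the 784 pixels.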

In [None]:
model.compile(optimizer='adam',
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

run_time_string = datetime.datetime.utcnow().isoformat(timespec='minutes')
# make sure the output directory exists, then define the path to save the model
os.makedirs('nn_results', exist_ok=True)
model_path = f'nn_results/colombia_nn_{run_time_string}.h5'
print(f"Training ... {model_path}")

logdir = os.path.join("nn_results", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy', factor=0.75,
                              patience=2, min_lr=1e-6, verbose=1, cooldown=0)

csv_logger = tf.keras.callbacks.CSVLogger(f'nn_results/training_{run_time_string}.log')

earlystop = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0.001,
                                             patience=3,
                                             verbose=1, mode='auto')

model_check = tf.keras.callbacks.ModelCheckpoint(model_path,
        monitor='val_accuracy', 
        save_best_only=True, 
        mode='max',
        verbose=1)

model.fit(x=x_train, 
          y=y_train, 
          epochs=20, 
          validation_data=(x_test, y_test), 
          callbacks=[tensorboard_callback, reduce_lr, csv_logger, earlystop, model_check])
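The `sparse_categorical_crossentropy` loss passed to `compile` works directly on integer labels such as those in `y_train`, so no one-hot encoding is needed. As a rough NumPy sketch (not the Keras implementation, and using made-up probabilities for three classes), the loss is the mean of $-\log p$ for the probability the model assigns to the true class. The function name below is ours, chosen to mirror the Keras loss string.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob):
    """Mean of -log(probability assigned to the true class)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    # pick, for each row, the probability in the column of the true label
    p_true = y_prob[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(p_true)))

# Toy example with 3 classes (made-up numbers, not model output)
y_true = [0, 2]
y_prob = [[0.7, 0.2, 0.1],
          [0.1, 0.1, 0.8]]

loss = sparse_categorical_crossentropy(y_true, y_prob)

# The same loss computed from one-hot labels, the form `categorical_crossentropy` expects
one_hot = np.eye(3)[y_true]
loss_one_hot = float(np.mean(-np.sum(one_hot * np.log(y_prob), axis=1)))

print(loss, loss_one_hot)
```

The two numbers agree; the sparse form simply skips building the one-hot matrix. This sketch does not guard against $\log 0$, which a production implementation would need to do.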

You'll notice above that the training accuracy is much higher than `val_accuracy`; that is, we overfit the training data. One way to help protect against this is to introduce `Dropout`, which randomly zeroes a fraction of a layer's outputs at each training step:

<img src="https://cdn-images-1.medium.com/max/1600/1*iWQzxhVlvadk6VAJjsgXgg.png">

Srivastava, Nitish, et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", JMLR 2014
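To make the idea concrete, here is a minimal NumPy sketch of what a dropout layer does to a batch of activations. It assumes the "inverted dropout" convention, in which surviving units are scaled by $1/(1-\text{rate})$ so the expected activation is unchanged. The function name `dropout` and the all-ones input are illustrative only, not the Keras internals.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations  # dropout is a no-op at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True for units that survive
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 32))  # stand-in for the output of a 32-unit layer

dropped = dropout(activations, rate=0.2, rng=rng)
print("fraction zeroed:", np.mean(dropped == 0))
print("mean activation:", dropped.mean())

same_at_inference = np.array_equal(
    dropout(activations, rate=0.2, rng=rng, training=False), activations)
print("unchanged at inference:", same_at_inference)
```

Because a different random subset of units is silenced at every step, the network cannot rely on any single unit, which tends to reduce overfitting.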

In [3]:
model = Sequential()
model.add(Flatten(input_shape=input_shape))
model.add(Dense(512, activation="relu"))
model.add(Dense(32, activation="relu"))
model.add(Dropout(0.2))  # 20% chance of dropping a node during training
model.add(Dense(10, activation='softmax'))


In [None]:
model.summary()

In [None]:
model.compile(optimizer='adam',
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

run_time_string = datetime.datetime.utcnow().isoformat(timespec='minutes')
# make sure the output directory exists, then define the path to save the model
os.makedirs('nn_results', exist_ok=True)
model_path = f'nn_results/colombia_nn_{run_time_string}.h5'
print(f"Training ... {model_path}")

logdir = os.path.join("nn_results", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy', factor=0.75,
                              patience=2, min_lr=1e-6, verbose=1, cooldown=0)

csv_logger = tf.keras.callbacks.CSVLogger(f'nn_results/training_{run_time_string}.log')

earlystop = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0.001,
                                             patience=3,
                                             verbose=1, mode='auto')

model_check = tf.keras.callbacks.ModelCheckpoint(model_path,
        monitor='val_accuracy', 
        save_best_only=True, 
        mode='max',
        verbose=1)

model.fit(x=x_train, 
          y=y_train, 
          epochs=20, 
          validation_data=(x_test, y_test), 
          callbacks=[tensorboard_callback, reduce_lr, csv_logger, earlystop, model_check])

Let's make some predictions

In [None]:
model.predict(x_test)
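Because the last layer uses a `softmax` activation, each row returned by `model.predict` holds 10 non-negative values that sum to 1, one per clothing class. The sketch below uses made-up scores for two images (not real model output) to show how such rows arise and how a class label and a confidence can be read off them:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up pre-activation scores for 2 images and 10 classes
logits = np.array([[0.1, 0.2, 3.0, 0.0, 1.5, -1.0, 0.7, -2.0, 0.3, 0.0],
                   [0.0, 0.0, 0.0, 0.0, 0.0,  0.5, 0.0,  2.5, 0.0, 4.0]])

probs = softmax(logits)
labels = probs.argmax(axis=1)   # most probable class per image
confidence = probs.max(axis=1)  # probability assigned to that class

print(probs.sum(axis=1))
print(labels, confidence)
```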

In [None]:
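import numpy as np

# `predict_classes` is not available in recent TensorFlow releases;
# take the argmax of the softmax output instead
y_pred = np.argmax(model.predict(x_test), axis=-1)
y_pred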

In [None]:
y_test

In [None]:
import numpy as np
import seaborn as sns
from sklearn.metrics import confusion_matrix

sns.set_context("poster")
conf_mat = confusion_matrix(y_test, y_pred)
# normalize each row so entries are fractions of the true class
conf_mat_normalized = conf_mat.astype('float') / conf_mat.sum(axis=1)[:, np.newaxis]
sns.heatmap(conf_mat_normalized)
plt.ylabel('True label')
plt.xlabel('Predicted label')

In [None]:
conf_mat
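A confusion matrix from `sklearn` has true labels along the rows and predicted labels along the columns, so the diagonal of the row-normalized matrix is the per-class recall, and the largest off-diagonal entry marks the most frequent mistake. The following sketch illustrates both on a made-up 3-class matrix (the numbers are invented, not results from the model above):

```python
import numpy as np

# Made-up 3-class confusion matrix: rows are true labels, columns are predictions
toy_conf_mat = np.array([[50,  2,  8],
                         [ 1, 58,  1],
                         [12,  3, 45]])

# Per-class recall: correct predictions divided by the number of true examples
recall = np.diag(toy_conf_mat) / toy_conf_mat.sum(axis=1)

# Largest off-diagonal entry: the most frequent (true, predicted) confusion
off_diag = toy_conf_mat.copy()
np.fill_diagonal(off_diag, 0)
true_cls, pred_cls = np.unravel_index(off_diag.argmax(), off_diag.shape)

print("recall per class:", recall.round(3))
print(f"most common mistake: true={true_cls} predicted as {pred_cls}")
```

The same two lines applied to `conf_mat` would show which pairs of clothing classes the network confuses most often.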

| Label | Description |
| --- | --- |
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |

In [None]:
lookup = {0: "T-shirt/top",
          1: "Trouser",
          2: "Pullover",
          3: "Dress",
          4: "Coat",
          5: "Sandal",
          6: "Shirt",
          7: "Sneaker",
          8: "Bag",
          9: "Ankle boot"}

In [None]:
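# collect (index, predicted label, true label) for every misclassified test image
# (`predict_classes` is not available in recent TensorFlow releases, so use argmax)
y_pred = np.argmax(model.predict(x_test), axis=-1)
ind_wrong = []
for i, (pred, actual) in enumerate(zip(y_pred, y_test)):
    if pred != actual:
        ind_wrong.append((i, pred, actual))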

In [None]:
ind = 100
plt.imshow(x_test[ind_wrong[ind][0]], cmap=plt.cm.gray_r, interpolation='nearest')
plt.axis("off")
plt.title(f"pred={lookup[ind_wrong[ind][1]]} true={lookup[ind_wrong[ind][2]]}")

How'd we do? The creators of the dataset maintain a website with results using `sklearn`:

http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/#