# Image classification with TensorFlow 

Based on an original notebook by the TensorFlow authors, licensed under Apache 2.0.

Use **Shift + Enter** to run each cell. When prompted, click **Run anyway**, then **Yes**. Try it on this cell.

## Overview

In this notebook, we're going to classify images of fossils (ammonites, fish, and trilobites) in fewer than 80 lines of code.

A 'notebook' is an interactive environment for writing and running code alongside notes. We're going to use some cutting-edge technology, right in your browser. We will see:

- A deep neural network in action.
- Google's TensorFlow deep learning library.

## Load the data

We'll begin by downloading the dataset. Run this cell:

In [None]:
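import requests
import numpy as np
from io import BytesIO

# Download the images (X) and the integer class labels (y) as .npy files.
X_ = requests.get("https://s3.amazonaws.com/agilegeo/geocomp/image_X.npy")
y_ = requests.get("https://s3.amazonaws.com/agilegeo/geocomp/integer_y.npy")

# Stop with a clear error if either download failed.
X_.raise_for_status()
y_.raise_for_status()

# Parse the downloaded bytes in memory, without saving them to disk.
X = np.load(BytesIO(X_.content))
y = np.load(BytesIO(y_.content))

print("Data loaded!")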
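The cell above fetches each `.npy` file over HTTP and parses the bytes in memory with `np.load(BytesIO(...))`. A minimal offline sketch of that parsing step follows; instead of downloading, it builds the bytes locally with `np.save`, and the label array is made up. The helpers `npy_bytes` and `load_npy` are hypothetical names used only for illustration.

```python
from io import BytesIO

import numpy as np

def npy_bytes(arr):
    """Hypothetical helper: serialize an array to .npy bytes, as a server would send them."""
    buf = BytesIO()
    np.save(buf, arr)
    return buf.getvalue()

def load_npy(content):
    """Hypothetical helper: parse raw .npy bytes (e.g. response.content) into an array."""
    return np.load(BytesIO(content))

# Made-up stand-in for a downloaded label file.
demo_labels = np.array([0, 1, 2, 1], dtype=np.int64)
restored = load_npy(npy_bytes(demo_labels))
print(restored, restored.dtype)
```

Because the `.npy` format stores the dtype and shape in its header, the array comes back exactly as it was saved.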

In [None]:
X.shape

In [None]:
from sklearn.model_selection import train_test_split

# Hold out 186 images (an absolute count, not a fraction) for validation.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=186)

X_val.shape
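Passing an integer as `test_size` sets the absolute number of held-out samples rather than a fraction. The split is random, so rerunning the cell changes which images are held out. `random_state` makes it reproducible, and `stratify` keeps the class proportions the same in both parts. A sketch on made-up data (the 900 samples and three balanced classes are assumptions, not this dataset's sizes):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up stand-in data: 900 tiny 'images' in 3 balanced classes.
rng = np.random.default_rng(0)
X_demo = rng.random((900, 8, 8, 1))
y_demo = np.repeat([0, 1, 2], 300)

Xd_train, Xd_val, yd_train, yd_val = train_test_split(
    X_demo, y_demo,
    test_size=186,      # an int means "this many samples", not a fraction
    random_state=42,    # make the split reproducible
    stratify=y_demo,    # keep class proportions equal in both parts
)

print(Xd_train.shape, Xd_val.shape)
print(np.bincount(yd_val))  # validation samples per class
```

The notebook's own cell sets neither option, so its validation set differs from run to run.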

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

# Show the first channel of the first training image.
plt.imshow(X_train[0, :, :, 0], cmap='gray')

In [None]:
y_val

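## Shallow learning model

As a baseline, let's first try a 'shallow' model: a support vector machine (SVM) classifier, which treats each image as one long vector of pixel values.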

In [None]:
X_train.shape, X_val.shape

In [None]:
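from sklearn.svm import SVC

# SVC expects a 2-D array, so flatten each image into one feature vector.
X_train_flat = X_train.reshape(len(X_train), -1)
X_val_flat = X_val.reshape(len(X_val), -1)

svc = SVC().fit(X_train_flat, y_train)
svc.score(X_val_flat, y_val)  # mean accuracy on the validation set

Evaluated on the validation set, this model reaches about 60% accuracy.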
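An SVM classifier in scikit-learn expects a 2-D `(n_samples, n_features)` array, so each image has to be flattened before fitting. A self-contained sketch of that flatten, fit, and score sequence on made-up random 'images' (the sizes and the brightness pattern are assumptions, chosen only so there is something to learn):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Made-up stand-in data: 60 random 16x16 single-channel 'images' in 3 classes.
rng = np.random.default_rng(0)
images_demo = rng.random((60, 16, 16, 1))
labels_demo = np.repeat([0, 1, 2], 20)
images_demo[labels_demo == 1] += 0.5   # make class 1 brighter, so there is a pattern

# Flatten each image into a single row of 16 * 16 * 1 = 256 features.
flat_demo = images_demo.reshape(len(images_demo), -1)
print(flat_demo.shape)

# Hold out some samples and measure accuracy on them, as for the real data.
Xa, Xb, ya, yb = train_test_split(flat_demo, labels_demo, test_size=15,
                                  random_state=0, stratify=labels_demo)
svc_demo = SVC().fit(Xa, ya)
acc_demo = svc_demo.score(Xb, yb)   # mean accuracy, between 0 and 1
print(acc_demo)
```

Flattening throws away the 2-D neighbourhood structure of the pixels, which is one reason a convolutional network can do better on images.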

## Deep learning model

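The following example uses a standard convolutional network (conv-net) with three convolutional layers. Each one is preceded by batch normalization and followed by max pooling and dropout; a small dense layer and a three-class softmax output complete the model.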

In [None]:
import tensorflow as tf

model = tf.keras.models.Sequential()

model.add(tf.keras.Input(shape=X_train.shape[1:]))  # one image: height x width x channels
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Conv2D(256, (5, 5), padding='same', activation='elu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(3, 3)))
model.add(tf.keras.layers.Dropout(0.25))

model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Conv2D(128, (5, 5), padding='same', activation='elu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2,2)))
model.add(tf.keras.layers.Dropout(0.25))

model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Conv2D(128, (5, 5), padding='same', activation='elu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(3, 3)))
model.add(tf.keras.layers.Dropout(0.25))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(32))
model.add(tf.keras.layers.Activation('elu'))
model.add(tf.keras.layers.Dropout(0.25))

model.add(tf.keras.layers.Dense(3))
model.add(tf.keras.layers.Activation('softmax'))

model.summary()
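`model.summary()` reports each layer's output shape. Those spatial sizes can be checked by hand: the `Conv2D` layers use `padding='same'` with stride 1, so only the three `MaxPooling2D` layers (default `'valid'` padding, strides defaulting to the pool size) shrink the image, each to `(n - pool) // stride + 1`. A sketch of that arithmetic; the helpers `pooled_size` and `feature_map_sizes` are hypothetical, and the input sizes 32 and 64 are assumptions, since the real size is `X_train.shape[1:3]`:

```python
def pooled_size(n, pool, stride=None):
    """Hypothetical helper: output length of 'valid' max pooling along one axis."""
    stride = stride or pool          # Keras defaults strides to pool_size
    if n < pool:
        raise ValueError(f"input of size {n} is smaller than the pool ({pool})")
    return (n - pool) // stride + 1

def feature_map_sizes(n):
    """Hypothetical helper: height/width after each pooling layer of the model above."""
    sizes = []
    # (pool_size, strides) of the three MaxPooling2D layers, in order.
    for pool, stride in [(3, 3), (3, 2), (3, 3)]:
        n = pooled_size(n, pool, stride)
        sizes.append(n)
    return sizes

for side in (32, 64):
    sizes = feature_map_sizes(side)
    flat_len = sizes[-1] ** 2 * 128   # last Conv2D has 128 filters
    print(side, sizes, flat_len)
```

For a 32 x 32 input the maps shrink to 10, 4, and then 1 pixel across, so `Flatten` passes 128 values to the dense layer; inputs much smaller than that would make the last pooling layer fail.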

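## Train the model

To begin training, compile the model with an optimizer, a loss function, and a metric. The labels are integers (0, 1, 2) rather than one-hot vectors, so we use sparse categorical cross-entropy as the loss and sparse categorical accuracy as the metric.

The training cell passes the `X_train` and `y_train` arrays directly to `model.fit()`. The next cell also defines a generator, `train_gen()`, as an alternative way to feed the data. The older `fit_generator()` method is deprecated; recent versions of `model.fit()` accept generators directly.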

In [None]:
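# Integer labels -> sparse categorical cross-entropy and accuracy.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.sparse_categorical_crossentropy,
    metrics=['sparse_categorical_accuracy']
)

def train_gen():
    """Yield the full training set as a single batch, forever.

    Not used below. To train with it instead of the arrays, call
    model.fit(train_gen(), steps_per_epoch=1, epochs=32).
    """
    while True:
        yield X_train, y_train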
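The compile cell above uses sparse categorical cross-entropy. For each image it is the negative log of the probability the model assigns to the true class, so integer labels can be used as they are; categorical cross-entropy gives the same number when the labels are one-hot encoded. A NumPy sketch with made-up probabilities, ignoring the small clipping Keras applies for numerical stability:

```python
import numpy as np

# Made-up softmax outputs for two images over the three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
int_labels = np.array([0, 1])        # integer labels, as in y_train

# Sparse form: pick out the probability of the true class, take -log.
sparse_loss = -np.log(probs[np.arange(len(int_labels)), int_labels])

# Equivalent dense form with one-hot labels (categorical cross-entropy).
one_hot = np.eye(3)[int_labels]
dense_loss = -np.sum(one_hot * np.log(probs), axis=1)

print(sparse_loss, sparse_loss.mean())   # Keras reports the batch mean
```

The sparse version skips building the one-hot matrix, which is why it pairs naturally with integer label files like `integer_y.npy`.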

In [None]:
X_train.shape

In [None]:
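model.fit(
    X_train, y_train,
    epochs=32,                        # passes over the training set
    batch_size=50,                    # images per gradient update
    validation_data=(X_val, y_val),
    validation_freq=32,               # validate only after the 32nd (last) epoch
)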
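With `batch_size=50`, `model.fit()` shuffles the training arrays each epoch (its `shuffle` argument defaults to `True`) and feeds them in mini-batches of 50, whereas `train_gen()` yields the whole training set at once. A minimal sketch of a shuffling mini-batch generator that could stand in for it; `batch_gen` and its parameters are hypothetical names, and the data are made up:

```python
import numpy as np

def batch_gen(X, y, batch_size=50, seed=0):
    """Hypothetical helper: yield (X_batch, y_batch) forever, reshuffling every epoch."""
    rng = np.random.default_rng(seed)
    n = len(X)
    while True:
        order = rng.permutation(n)                 # new random order each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]  # last batch may be smaller
            yield X[idx], y[idx]

# Made-up data: 120 samples -> batches of 50, 50 and 20 per epoch.
X_demo = np.arange(120 * 4, dtype=float).reshape(120, 4)
y_demo = np.arange(120) % 3
gen = batch_gen(X_demo, y_demo)
sizes = [len(next(gen)[0]) for _ in range(3)]
print(sizes)
```

In recent TensorFlow versions such a generator could be passed as `model.fit(batch_gen(X_train, y_train), steps_per_epoch=int(np.ceil(len(X_train) / 50)), epochs=32)`; because the generator never ends, `steps_per_epoch` tells Keras where each epoch stops.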

## Check the results (inference)

Now that training is done, let's see how well the model predicts the fossil types in the validation set.

In [None]:
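LABEL_NAMES = ['ammonites', 'fish', 'trilobites']  # in the order of the integer labels 0, 1, 2

# One row per image, holding the predicted probability of each class.
y_pred = model.predict(X_val)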
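`model.predict` returns one row of class probabilities per image, so the predicted class is the `argmax` of each row. A sketch of turning such probabilities into an accuracy and a confusion matrix; `summarize` is a hypothetical helper, and the labels and probabilities below are made up:

```python
import numpy as np

def summarize(y_true, y_prob, n_classes=3):
    """Hypothetical helper: accuracy and confusion matrix (rows: true, cols: predicted)."""
    y_hat = np.argmax(y_prob, axis=-1)          # most probable class per image
    accuracy = np.mean(y_hat == y_true)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_hat), 1)           # count each (true, predicted) pair
    return accuracy, cm

# Made-up labels and probabilities for four images.
y_true_demo = np.array([0, 1, 2, 2])
y_prob_demo = np.array([[0.8, 0.1, 0.1],
                        [0.2, 0.7, 0.1],
                        [0.1, 0.6, 0.3],
                        [0.1, 0.2, 0.7]])
acc, cm = summarize(y_true_demo, y_prob_demo)
print(acc)
print(cm)
```

With the notebook's own variables, `summarize(y_val, y_pred)` would give the validation accuracy and a confusion matrix whose rows and columns follow the order of `LABEL_NAMES`; `sklearn.metrics.confusion_matrix` computes the same table.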

In [None]:
%matplotlib inline
import random

import matplotlib.pyplot as plt
import numpy as np

def visualize(X_val, y_val, y_prob, cutoff=0.5, ncols=6, nrows=3, figsize=(12, 8), classes=None, shape=None):
    """
    Visualize some random samples from the prediction results.
    Colours: green for a correct prediction, red for a wrong one. If the
    predicted probability is not above the cutoff (default 0.5), the colour
    is muted: yellow for a correct prediction, light salmon for a wrong one.

    Args:
        X_val (ndarray): The validation images, n_samples x height x width x channels.
        y_val (ndarray): The integer validation labels, shape (n_samples,).
        y_prob (ndarray): The predicted probabilities, n_samples x n_classes.
        cutoff (float): The probability at or below which a prediction counts as 'uncertain'.
        ncols (int): How many plots across the grid.
        nrows (int): How many plots down the grid.
        figsize (tuple): Figure size in inches, as (width, height).
        classes (array-like): The class names, in label order. Inferred from y_val if None.
        shape (tuple): Shape of each instance, if it needs reshaping.
    """
    # Never sample more images than exist (e.g. when plotting only the errors).
    n = min(ncols * nrows, X_val.shape[0])
    idx = random.sample(range(X_val.shape[0]), n)
    sample = X_val[idx]

    if classes is None:
        classes = np.unique(y_val)
    else:
        # Turn integer labels into class names, to compare with the predictions.
        y_val = np.asarray(classes)[y_val]

    fig, axs = plt.subplots(figsize=figsize, ncols=ncols, nrows=nrows)
    axs = axs.ravel()

    for ax, img, actual, probs in zip(axs, sample, y_val[idx], y_prob[idx]):

        pred = classes[np.argmax(probs)]   # most probable class
        prob = np.max(probs)               # and its probability
        if shape is not None:
            img = img.reshape(shape)

        ax.imshow(np.squeeze(img), cmap='gray')
        ax.set_title(f"{pred} - {prob:.3f}\n[{actual}]")
        ax.set_xticks([])
        ax.set_yticks([])

        # Confident predictions get strong colours, uncertain ones muted colours.
        if prob > cutoff:
            c = 'limegreen' if (actual == pred) else 'red'
        else:
            c = 'y' if (actual == pred) else 'lightsalmon'

        for spine in ax.spines.values():
            spine.set_edgecolor(c)
            spine.set_linewidth(4)

    # Hide any grid cells left empty.
    for ax in axs[n:]:
        ax.axis('off')
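In `visualize()`, each border colour depends on two things: whether the prediction is correct, and whether its probability exceeds the cutoff. The same rule, pulled out into a small function so it can be checked on its own (`border_colour` is a hypothetical name; the notebook keeps this logic inline):

```python
def border_colour(correct, prob, cutoff=0.5):
    """Hypothetical helper: the border colour visualize() uses for one prediction."""
    if prob > cutoff:                          # confident prediction
        return 'limegreen' if correct else 'red'
    return 'y' if correct else 'lightsalmon'   # uncertain: muted colours

for correct, prob in [(True, 0.9), (False, 0.9), (True, 0.4), (False, 0.4)]:
    print(correct, prob, border_colour(correct, prob))
```

A probability exactly equal to the cutoff counts as uncertain, because the test is a strict `>`.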

In [None]:
visualize(X_val, y_val, y_pred, classes=LABEL_NAMES)

In [None]:
# Show only the incorrectly classified images.
wrong_idx = np.argmax(y_pred, axis=-1) != y_val
y_pred_ = y_pred[wrong_idx]
y_val_ = y_val[wrong_idx]
X_val_ = X_val[wrong_idx]

visualize(X_val_, y_val_, y_pred_, classes=LABEL_NAMES)
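The cell above builds a boolean mask, `True` wherever the most probable class differs from the true label, and uses it to select the misclassified images. The same mask can also count mistakes per true class with `np.bincount`. A sketch with made-up labels and probabilities:

```python
import numpy as np

# Made-up labels and predicted probabilities for six images.
y_true_demo = np.array([0, 0, 1, 2, 2, 2])
y_prob_demo = np.array([[0.9, 0.05, 0.05],
                        [0.3, 0.6, 0.1],
                        [0.2, 0.7, 0.1],
                        [0.1, 0.1, 0.8],
                        [0.6, 0.1, 0.3],
                        [0.2, 0.5, 0.3]])

# Boolean mask: True where the most probable class is not the true one.
wrong = np.argmax(y_prob_demo, axis=-1) != y_true_demo

# Count the mistakes for each true class (minlength keeps classes with none).
errors_per_class = np.bincount(y_true_demo[wrong], minlength=3)
for name, count in zip(['ammonites', 'fish', 'trilobites'], errors_per_class):
    print(f"{name}: {count} wrong")
```

Applied to `y_val[wrong_idx]`, the same `np.bincount` call would show which fossil type the model confuses most often.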