# Bird Classification with TensorFlow on Amazon SageMaker - Directly in your notebook

1. [Introduction](#Introduction)
2. [Data Preparation](#Data-Preparation)
3. [Train the model](#Train-the-model)
4. [Test the model](#Test-the-model)

## Introduction

Image classification is an increasingly popular machine learning technique, in which a trained model predicts which of several classes is represented by a particular image. This technique is useful across a wide variety of use cases from manufacturing quality control to medical diagnosis. To create an image classification solution, we need to acquire and process an image dataset, and train a model from that dataset. The trained model is then capable of identifying features and predicting which class an image belongs to. Finally, we can make predictions using the trained model against previously unseen images.

This notebook is an end-to-end example showing how to build an image classifier using TensorFlow and Keras directly in a Jupyter notebook hosted by Amazon SageMaker. This is an easy transition from traditional machine learning development you may already be doing on your laptop or on an Amazon EC2 instance. Subsequent notebooks in this workshop demonstrate how to take full advantage of SageMaker's training service, hosting service, and automatic model tuning. Note that for complex, large-scale machine learning models, training directly in a notebook can be cost-prohibitive.

For each of the labs in this workshop, we use a publicly available set of bird images based on the [Caltech Birds (CUB 200 2011)](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) dataset. We demonstrate transfer learning by leveraging pretrained ImageNet weights for a MobileNet V2 network architecture.

For a quick demonstration, pick a small handful of bird species (set `SAMPLE_ONLY = True` and choose a few classes / species). For a more complete model, you can train against all 200 bird species in the dataset. For anything more than a few classes, be sure to upgrade your notebook instance type to one of SageMaker's GPU instance types (e.g., ml.p2, ml.p3).

## Data Preparation

The [Caltech Birds (CUB 200 2011)](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) dataset contains 11,788 images across 200 bird species (the original technical report can be found [here](http://www.vision.caltech.edu/visipedia/papers/CUB_200_2011.pdf)).  Each species comes with around 60 images, with a typical size of about 350 pixels by 500 pixels.  Bounding boxes are provided, as are annotations of bird parts.  A recommended train/test split is given, but image size data is not.

![](./cub_200_2011_snapshot.png)

The dataset can be downloaded [here](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html). Note that the file size is around 1.2 GB, and can take many minutes to download depending on your network connection. As an alternative, we have included `CUB_MINI.tar` in this workshop repo. It contains images and metadata for 8 of the 200 species, and can be used effectively for testing and learning how to use SageMaker and TensorFlow together on computer vision use cases.

### Download and unpack the full dataset

Here we download the birds dataset from Caltech. You can do this once and keep the unpacked dataset in your notebook instance.

In [None]:
import os 
import urllib.request

def download(url):
    filename = url.split('/')[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)

In [None]:
# %%time
# download('http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz')

In [None]:
# %%time
# # Clean up prior version of the downloaded dataset if you are running this again
# !rm -rf CUB_200_2011  

# # Unpack and then remove the downloaded compressed tar file
# !gunzip -c ./CUB_200_2011.tgz | tar xopf - 
# !rm CUB_200_2011.tgz

### Unpack the small subset of 8 bird species (CUB_MINI.tar)
Here we unpack the small dataset, included with the repo.

In [None]:
!tar xvf CUB_MINI.tar
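If you would rather unpack the archive from Python than shell out to `tar`, the standard library's `tarfile` module can do the same job. The sketch below is illustrative only; `unpack_tar` is a hypothetical helper name and is not used elsewhere in this workshop.

```python
import os
import tarfile

def unpack_tar(archive_path, dest_dir='.'):
    """Extract every member of a tar archive into dest_dir and return the member names."""
    with tarfile.open(archive_path) as tar:   # default mode auto-detects compression
        names = tar.getnames()
        tar.extractall(path=dest_dir)
    return names

# Example (assumes CUB_MINI.tar sits next to the notebook)
if os.path.exists('CUB_MINI.tar'):
    unpack_tar('CUB_MINI.tar')
```

Only unpack archives from sources you trust, since extraction writes whatever paths the archive contains.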

### Set some parameters for the rest of the notebook to use
Here we define a few parameters that drive the rest of the notebook. For example, `SAMPLE_ONLY` defaults to `True`, which makes the notebook train on only a handful of species. Setting `SAMPLE_ONLY` to `False` makes the notebook work with all 200 bird species, which also requires downloading the full dataset and pointing `BASE_DIR` at it. Training on the full set is a more difficult challenge, and you will need to tune parameters and run more epochs.

An `EXCLUDE_IMAGE_LIST` is defined as a mechanism to address any corrupt images from the dataset and ensure they do not disrupt the process.

In [None]:
import pandas as pd
import json

import matplotlib.pyplot as plt
%matplotlib inline

# To speed up training and experimenting, you can use a small handful of species.
# To see the full list of the classes available, look at the content of CLASSES_FILE.
SAMPLE_ONLY  = True
CLASSES = [13, 17] #, 35, 36, 47, 68, 73, 87]

# Otherwise, you can use the full set of species
if (not SAMPLE_ONLY):
    CLASSES = []
    for c in range(200):
        CLASSES += [c + 1]

BASE_DIR   = 'CUB_MINI/' # or use 'CUB_200_2011/' if you downloaded the full dataset
IMAGES_DIR = BASE_DIR + 'images/'

CLASSES_FILE = BASE_DIR + 'classes.txt'
IMAGE_FILE   = BASE_DIR + 'images.txt'
LABEL_FILE   = BASE_DIR + 'image_class_labels.txt'

SPLIT_RATIOS = (0.7, 0.2, 0.1)

CLASS_COLS      = ['class_number','class_id']

EXCLUDE_IMAGE_LIST = ['087.Mallard/Mallard_0130_76836.jpg']
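The notebook does not say how the excluded image was identified. One possible way to look for candidates, sketched below under that caveat, is a cheap byte-level check: a complete JPEG file starts with the start-of-image marker `FF D8` and normally ends with the end-of-image marker `FF D9`. The helper names are hypothetical, and the check is only a heuristic (a file can pass it and still fail to decode, and some valid files carry trailing bytes).

```python
import os

JPEG_SOI = b'\xff\xd8'  # start-of-image marker
JPEG_EOI = b'\xff\xd9'  # end-of-image marker

def looks_like_complete_jpeg(path):
    """Heuristic: True if the file starts with SOI and ends with EOI."""
    if os.path.getsize(path) < 4:
        return False
    with open(path, 'rb') as f:
        head = f.read(2)
        f.seek(-2, os.SEEK_END)
        tail = f.read(2)
    return head == JPEG_SOI and tail == JPEG_EOI

def find_suspect_images(images_dir):
    """Return '<species>/<file>.jpg' paths under images_dir that fail the check."""
    suspects = []
    for root, _dirs, files in os.walk(images_dir):
        for name in files:
            if name.lower().endswith('.jpg'):
                full = os.path.join(root, name)
                if not looks_like_complete_jpeg(full):
                    rel = os.path.relpath(full, images_dir)
                    suspects.append(rel.replace(os.sep, '/'))
    return sorted(suspects)
```

The returned paths use the same `species/file.jpg` form as the entries in `EXCLUDE_IMAGE_LIST`, so any confirmed bad file could be appended to that list by hand.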

### Understand the dataset
Show the list of bird species or dataset classes.

In [None]:
classes_df = pd.read_csv(CLASSES_FILE, sep=' ', names=CLASS_COLS, header=None)
criteria = classes_df['class_number'].isin(CLASSES)
classes_df = classes_df[criteria]

class_name_list = sorted(classes_df['class_id'].unique().tolist())
print(class_name_list)

For each species, there are dozens of images of various shapes and sizes. By dividing the entire dataset into individual named (numbered) folders, the images are in effect labelled for supervised learning using image classification and object detection algorithms. 
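To make the folder-as-label convention concrete, here is a minimal sketch that recovers the class number and species name from a relative image path. The function name `parse_image_path` is hypothetical; the example path is the one listed in `EXCLUDE_IMAGE_LIST`.

```python
def parse_image_path(image_file_name):
    """Split '<NNN>.<Species_Name>/<file>' into (class_number, class_name, file_name)."""
    folder, file_name = image_file_name.split('/', 1)
    number, species = folder.split('.', 1)
    return int(number), species, file_name

print(parse_image_path('087.Mallard/Mallard_0130_76836.jpg'))
# (87, 'Mallard', 'Mallard_0130_76836.jpg')
```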

The following function displays a grid of thumbnail images for all the image files for a given species.

In [None]:
def show_species(species_id):
    _im_list = !ls $IMAGES_DIR/$species_id

    NUM_COLS = 4
    IM_COUNT = len(_im_list)

    print('Species ' + species_id + ' has ' + str(IM_COUNT) + ' images.')
    
    NUM_ROWS = int(IM_COUNT / NUM_COLS)
    if ((IM_COUNT % NUM_COLS) > 0):
        NUM_ROWS += 1

    # squeeze=False keeps axarr two-dimensional even when there is only one row
    fig, axarr = plt.subplots(NUM_ROWS, NUM_COLS, squeeze=False)
    fig.set_size_inches(12.0, 20.0, forward=True)

    curr_col = 0
    for curr_img in range(IM_COUNT):
        # build the path to the image file, then read the image
        f = IMAGES_DIR + species_id + '/' + _im_list[curr_img]
        a = plt.imread(f)

        # the grid is filled column by column: the row is the index modulo NUM_ROWS
        row = curr_img % NUM_ROWS
        # plot on the relevant subplot
        axarr[row, curr_col].imshow(a)
        if row == (NUM_ROWS - 1):
            # we have finished the current column, so move to the next one
            curr_col += 1

    fig.tight_layout()       
    plt.show()
        
    # Clean up
    plt.clf()
    plt.cla()
    plt.close()

In [None]:
show_species('017.Cardinal')

### Create train/val/test dataframes from our dataset
Here we split our dataset into training, testing, and validation datasets, each in their own Pandas dataframe.

In [None]:
def split_to_train_val_test(df, label_column, splits=(0.7, 0.2, 0.1), verbose=False):
    train_df, val_df, test_df = pd.DataFrame(), pd.DataFrame(), pd.DataFrame()

    labels = df[label_column].unique()
    for lbl in labels:
        lbl_df = df[df[label_column] == lbl]

        lbl_train_df        = lbl_df.sample(frac=splits[0])
        lbl_val_and_test_df = lbl_df.drop(lbl_train_df.index)
        lbl_test_df         = lbl_val_and_test_df.sample(frac=splits[2]/(splits[1] + splits[2]))
        lbl_val_df          = lbl_val_and_test_df.drop(lbl_test_df.index)

        if verbose:
            print('\n{}:\n---------\ntotal:{}\ntrain_df:{}\nval_df:{}\ntest_df:{}'.format(lbl,
                                                                        len(lbl_df), 
                                                                        len(lbl_train_df), 
                                                                        len(lbl_val_df), 
                                                                        len(lbl_test_df)))
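        # pd.concat works across pandas versions (DataFrame.append was removed in pandas 2.0)
        train_df = pd.concat([train_df, lbl_train_df])
        val_df   = pd.concat([val_df, lbl_val_df])
        test_df  = pd.concat([test_df, lbl_test_df])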

    # shuffle them on the way out using .sample(frac=1)
    return train_df.sample(frac=1), val_df.sample(frac=1), test_df.sample(frac=1)

def get_train_val_dataframes():
    images_df = pd.read_csv(IMAGE_FILE, sep=' ',
                            names=['image_pretty_name', 'image_file_name'],
                            header=None)
    image_class_labels_df = pd.read_csv(LABEL_FILE, sep=' ',
                                names=['image_pretty_name', 'orig_class_id'], header=None)

    # Merge the metadata into a single flat dataframe for easier processing
    full_df = pd.DataFrame(images_df)
    full_df = full_df[~full_df.image_file_name.isin(EXCLUDE_IMAGE_LIST)]

    full_df.reset_index(inplace=True, drop=True)
    full_df = pd.merge(full_df, image_class_labels_df, on='image_pretty_name')

    if SAMPLE_ONLY:
        # grab a small subset of species for testing
        criteria = full_df['orig_class_id'].isin(CLASSES)
        full_df = full_df[criteria]
        print('Using subset of total images based on sample class list. subtotal: {}'.format(full_df.shape[0]))

    unique_classes = full_df['orig_class_id'].drop_duplicates()
    sorted_unique_classes = sorted(unique_classes)
    id_to_one_based = {}
    i = 1
    for c in sorted_unique_classes:
        id_to_one_based[c] = str(i)
        i += 1

    full_df['class_id'] = full_df['orig_class_id'].map(id_to_one_based)
    full_df.reset_index(inplace=True, drop=True)

    def get_class_name(fn):
        return fn.split('/')[0]
    full_df['class_name'] = full_df['image_file_name'].apply(get_class_name)
    full_df = full_df.drop(['image_pretty_name'], axis=1)

    # split into training, validation, and test sets
    train_df, val_df, test_df = split_to_train_val_test(full_df, 'class_id', SPLIT_RATIOS)

    print('num images total: ' + str(images_df.shape[0]))
    print('\nnum train: ' + str(train_df.shape[0]))
    print('num val: ' + str(val_df.shape[0]))
    print('num test: ' + str(test_df.shape[0]))
    return train_df, val_df, test_df

In [None]:
train_df, val_df, test_df = get_train_val_dataframes()

In [None]:
train_df.head()
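The two-step sampling in `split_to_train_val_test` can be easier to follow with plain numbers. With `splits=(0.7, 0.2, 0.1)`, 70% of each class goes to training, and the test set then takes `0.1 / (0.2 + 0.1)`, or one third, of what remains. The sketch below mirrors that arithmetic with standard-library code only; `split_three_way` is a hypothetical name, and a class of 60 images is just a representative size.

```python
import random

def split_three_way(items, splits=(0.7, 0.2, 0.1), seed=0):
    """Split one class's items into train/val/test lists using two-step sampling."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)

    # Step 1: take the training fraction of the whole class
    n_train = round(len(shuffled) * splits[0])
    train, rest = shuffled[:n_train], shuffled[n_train:]

    # Step 2: the test share is expressed relative to the remainder
    test_frac = splits[2] / (splits[1] + splits[2])
    n_test = round(len(rest) * test_frac)
    test, val = rest[:n_test], rest[n_test:]
    return train, val, test

train, val, test = split_three_way(list(range(60)))
print(len(train), len(val), len(test))
# 42 12 6
```

Because the split is done per class, each species keeps roughly the same 70/20/10 proportions in all three sets.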

## Train the model 
In this section of the notebook, we train an image classification model to predict the bird species. In many cases, you can leverage a technique called [transfer learning](https://www.tensorflow.org/tutorials/images/transfer_learning), which uses pretrained models to dramatically simplify the process. Because you start from pretrained weights, highly accurate classification models can be built using relatively small datasets and very few epochs. In this notebook, we load a MobileNet V2 network with pretrained ImageNet weights through `tensorflow.keras.applications`.

In [None]:
HEIGHT     = 224
WIDTH      = 224
BATCH_SIZE = 16

In [None]:
import tensorflow as tf
print(tf.version.VERSION)
tf.__version__

In [None]:
from tensorflow.keras.preprocessing import image

In [None]:
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2, preprocess_input
LAST_FROZEN_LAYER = 20

from tensorflow.keras.preprocessing.image import ImageDataGenerator

### Prepare image data generators from our dataframes
In this section, we use TensorFlow's [ImageDataGenerator](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator) class to give a consistent way to access batches of our training, validation, and test images. TensorFlow training uses these generators to pull batches of images as it makes its way through each training epoch. The training generator also performs image augmentation to reduce overfitting: random adjustments are made to image brightness, rotation, shear, zoom, and horizontal and vertical position, and some images are flipped horizontally (for a bird facing left, this provides an equivalent image with the bird facing right).

In [None]:
train_datagen =  ImageDataGenerator(
      preprocessing_function=preprocess_input,
      rotation_range=60,
      brightness_range=(0.8, 1.0),
      width_shift_range=0.2,
      height_shift_range=0.2,
      shear_range=0.2,
      zoom_range=0.2,
      horizontal_flip=True,
      vertical_flip=False
    )
val_datagen  = ImageDataGenerator(preprocessing_function=preprocess_input)
test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

train_gen = train_datagen.flow_from_dataframe(train_df, directory=IMAGES_DIR,
                                              x_col='image_file_name', y_col='class_id',
                                              target_size=(HEIGHT, WIDTH), 
                                              batch_size=BATCH_SIZE)
# Validation and test images are only preprocessed, not augmented
val_gen = val_datagen.flow_from_dataframe(val_df, directory=IMAGES_DIR,
                                              x_col='image_file_name', y_col='class_id',
                                              target_size=(HEIGHT, WIDTH), 
                                              batch_size=BATCH_SIZE)
test_gen = test_datagen.flow_from_dataframe(test_df, directory=IMAGES_DIR,
                                              x_col='image_file_name', y_col='class_id',
                                              target_size=(HEIGHT, WIDTH), 
                                              batch_size=1,
                                              shuffle=False) # need predictable order for test
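One caveat worth checking before moving beyond a handful of species: `class_id` is stored as a string, and, as far as we understand the Keras documentation, `flow_from_dataframe` orders the classes it infers alphanumerically. String order differs from numeric order once there are ten or more classes, so the generator's class indices may no longer line up with `class_name_list`. The pure-Python sketch below only illustrates the ordering issue; zero-padding is one possible workaround and is an assumption, not part of the original notebook.

```python
# '1' .. '12', the kind of string ids produced by id_to_one_based
class_ids = [str(i) for i in range(1, 13)]

print(sorted(class_ids))
# ['1', '10', '11', '12', '2', '3', '4', '5', '6', '7', '8', '9']

def zero_pad(class_id, width=3):
    """Left-pad a numeric string so that string order matches numeric order."""
    return class_id.zfill(width)

padded = sorted(zero_pad(c) for c in class_ids)
print(padded[:4])
# ['001', '002', '003', '004']
```

With the default two species (or any set of nine or fewer), string and numeric order agree, so the issue does not show up in the quick demonstration.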

### Define the model

In [None]:
from tensorflow.keras.layers import Dense, Activation, Flatten, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Sequential, Model

In [None]:
base_model = MobileNetV2(weights='imagenet', 
                      include_top=False, 
                      input_shape=(HEIGHT, WIDTH, 3))

In [None]:
def build_finetune_model(base_model, dropout, fc_layers, num_classes):
    # Freeze all base layers
    for layer in base_model.layers:
        layer.trainable = False

    x = base_model.output
    x = Flatten()(x)
    for fc in fc_layers:
        x = Dense(fc, activation='relu')(x) 
        if (dropout != 0.0):
            x = Dropout(dropout)(x)

    # New softmax layer
    predictions = Dense(num_classes, activation='softmax', name='output')(x) 
    
    finetune_model = Model(inputs=base_model.input, outputs=predictions)

    return finetune_model

# Here we extend the base model with additional fully connected layers, dropout for avoiding
# overfitting to the training dataset, and a classification layer
num_classes = len(class_name_list)
model = build_finetune_model(base_model, 
                              dropout=0.5, 
                              fc_layers=[1024], 
                              num_classes=num_classes)

### Perform training and save the model

In [None]:
from tensorflow.keras.optimizers import SGD, RMSprop

NUM_EPOCHS = 10
INITIAL_EPOCHS = 2

num_train_images = len(train_gen.filepaths)
num_val_images   = len(val_gen.filepaths)

opt = RMSprop(lr=0.00001) # or Adam

model.compile(opt, loss='categorical_crossentropy', metrics=['accuracy'])

model.fit_generator(train_gen, epochs=INITIAL_EPOCHS, workers=8, 
                                   steps_per_epoch=num_train_images // BATCH_SIZE, 
                                   validation_data=val_gen, validation_steps=num_val_images // BATCH_SIZE,
                                   shuffle=True)

for layer in model.layers[LAST_FROZEN_LAYER:]:
    layer.trainable = True

model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit_generator(train_gen, epochs=NUM_EPOCHS, workers=8, 
                                   steps_per_epoch=num_train_images // BATCH_SIZE, 
                                   validation_data=val_gen, validation_steps=num_val_images // BATCH_SIZE,
                                   shuffle=True)
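The `steps_per_epoch` and `validation_steps` values above use integer division, so a trailing partial batch is not consumed within a given epoch. The small sketch below shows the arithmetic; the helper name and the count of 83 images are made up for illustration.

```python
import math

def steps_and_leftover(num_images, batch_size):
    """Return (floor_steps, ceil_steps, images_left_over_by_floor)."""
    floor_steps = num_images // batch_size
    ceil_steps = math.ceil(num_images / batch_size)
    leftover = num_images - floor_steps * batch_size
    return floor_steps, ceil_steps, leftover

print(steps_and_leftover(83, 16))
# (5, 6, 3)
```

Note that if a set holds fewer images than `BATCH_SIZE`, the floor version yields zero steps, which is worth checking when experimenting with very small class lists.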

In [None]:
if not os.path.exists('./checkpoints'):
    os.mkdir('./checkpoints')
model.save('./checkpoints/' + 'MobileNetV2' + '_bird_model_weights.h5')

In [None]:
model.summary()  # summary() prints directly and returns None

In [None]:
!ls ./checkpoints

### Plot accuracy and loss across epochs

In [None]:
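def plot_training(history):
    # The accuracy key is 'acc' in older TensorFlow releases and 'accuracy' in newer ones
    acc_key = 'acc' if 'acc' in history.history else 'accuracy'
    acc = history.history[acc_key]
    val_acc = history.history['val_' + acc_key]
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(len(acc))

    plt.plot(epochs, acc, 'r.', label='training')
    plt.plot(epochs, val_acc, 'r', label='validation')
    plt.title('Training and validation accuracy')
    plt.legend()

    # plt.figure()
    # plt.plot(epochs, loss, 'r.')
    # plt.plot(epochs, val_loss, 'r-')
    # plt.title('Training and validation loss')

    # Save before showing; after plt.show() the saved figure can come out blank
    plt.savefig('acc_vs_epochs.png')
    plt.show()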
    
plot_training(history)

### Calculate model metrics

In [None]:
eval_preds = model.evaluate_generator(test_gen, steps=test_df.shape[0])
print('Loss: {:.2f}, Accuracy: {:.2f}'.format(eval_preds[0], eval_preds[1]))

## Test the model

In [None]:
from numpy import argmax

In [None]:
from IPython.display import Image, display
def predict_bird_from_file(fn, verbose=True):
    img = image.load_img(fn, target_size=(HEIGHT, WIDTH))
    x = image.img_to_array(img)
    x = x.reshape((1,) + x.shape)
    x = preprocess_input(x)
    
    results = model.predict(x)
    predicted_class_idx = argmax(results)
    predicted_class = class_name_list[predicted_class_idx]
    confidence = results[0][predicted_class_idx]
    if verbose:
        display(img)
        print('Class: {}, confidence: {:.2f}'.format(predicted_class, confidence))
    del img, x
    return predicted_class_idx, confidence

In [None]:
fname = IMAGES_DIR + '/' + test_df.iloc[0]['image_file_name']
predict_bird_from_file(fname)
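The last step of `predict_bird_from_file` turns a row of softmax outputs into a class name and a confidence value. Stripped of the image handling, that step looks roughly like the sketch below. The probabilities are invented for illustration, and the two class names are assumed to be the folder names for the default `CLASSES = [13, 17]`.

```python
def top_prediction(probabilities, class_names):
    """Return (index, name, confidence) for the highest-probability class."""
    best_idx = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return best_idx, class_names[best_idx], probabilities[best_idx]

# Illustrative values only: a made-up softmax row for two species
idx, name, conf = top_prediction([0.08, 0.92], ['013.Bobolink', '017.Cardinal'])
print('Class: {}, confidence: {:.2f}'.format(name, conf))
# Class: 017.Cardinal, confidence: 0.92
```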

### Now take a look at how well the model performs against the validation dataset

In [None]:
i = 0
predictions = []
labels = []

val_gen.reset()

for inputs_batch, labels_batch in val_gen:
    preds = model.predict(inputs_batch)
    
    predictions[i * BATCH_SIZE : (i + 1) * BATCH_SIZE] = preds
    labels[i * BATCH_SIZE : (i + 1) * BATCH_SIZE] = labels_batch

    i += 1
    if i * BATCH_SIZE >= num_val_images:
        # the generator loops forever, so stop after one full pass
        break
        
print('predicted {} batches of size up to {} for total of {} images'.format(i, BATCH_SIZE, len(predictions)))

In [None]:
import numpy as np
predicted_classes = np.zeros(len(predictions), dtype=int)
actual_classes    = np.zeros(len(predictions), dtype=int)
for i in range(len(predictions)):
    predicted_classes[i] = predictions[i].argmax(axis=-1)
    actual_classes[i]    = argmax(labels[i])
predicted_classes = predicted_classes.tolist()
actual_classes    = actual_classes.tolist()

In [None]:
errors = np.where(np.asarray(predicted_classes) != np.asarray(actual_classes))[0]
print('Encountered {} incorrect predictions: {}'.format(len(errors), errors))

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import itertools

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.GnBu):
    plt.figure(figsize=(7,7))
    plt.grid(False)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), 
                                  range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.gca().set_xticklabels(class_name_list)
    plt.gca().set_yticklabels(class_name_list)
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

In [None]:
from sklearn.metrics import confusion_matrix
def create_and_plot_confusion_matrix(actual, predicted):
    cnf_matrix = confusion_matrix(actual, np.asarray(predicted),labels=range(len(class_name_list)))
    plot_confusion_matrix(cnf_matrix, classes=range(len(class_name_list)))

In [None]:
create_and_plot_confusion_matrix(actual_classes, predicted_classes)
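For readers new to confusion matrices, the counting that `sklearn.metrics.confusion_matrix` performs can be sketched in a few lines of plain Python. Rows correspond to the true class and columns to the predicted class, so correct predictions land on the diagonal. The function name and the toy labels are hypothetical.

```python
def confusion_counts(actual, predicted, num_classes):
    """Build a matrix where cm[a][p] counts samples with true class a predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        cm[a][p] += 1
    return cm

actual    = [0, 0, 1, 1, 1]
predicted = [0, 1, 1, 1, 0]
cm = confusion_counts(actual, predicted, num_classes=2)
print(cm)
# [[1, 1], [1, 2]]

# Accuracy is the sum of the diagonal divided by the number of samples
accuracy = sum(cm[i][i] for i in range(2)) / len(actual)
print(accuracy)
# 0.6
```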

### Assess prediction performance against validation and test datasets

In [None]:
from IPython.display import Image, display

# Iterate through entire dataframe, tracking predictions and accuracy. For mistakes, show the image, and the predicted and actual classes to help understand
# where the model may need additional tuning.

def test_image_df(df):
    print('Testing {} images'.format(df.shape[0]))
    num_errors = 0
    preds = []
    acts  = []
    for i in range(df.shape[0]):
        fname = df.iloc[i]['image_file_name']
        act   = int(df.iloc[i]['class_id']) - 1
        acts.append(act)
        pred, conf = predict_bird_from_file(IMAGES_DIR + '/' + fname, verbose=False)
        preds.append(pred)
        if (pred != act):
            num_errors += 1
            print('ERROR on image index {} -- Pred: {} {:.2f}, Actual: {}'.format(i, 
                                                                   class_name_list[pred], conf, 
                                                                   class_name_list[act]))
            img = Image(filename=f'{IMAGES_DIR}/{fname}', width=WIDTH, height=HEIGHT)
            display(img)
    return num_errors, preds, acts

In [None]:
num_images = train_df.shape[0]
num_errors, preds, acts = test_image_df(train_df)
print('\nAccuracy: {:.2f}, {}/{}'.format(1 - (num_errors/num_images), num_images - num_errors, num_images))

In [None]:
num_images = val_df.shape[0]
num_errors, preds, acts = test_image_df(val_df)
print('\nAccuracy: {:.2f}, {}/{}'.format(1 - (num_errors/num_images), num_images - num_errors, num_images))

In [None]:
create_and_plot_confusion_matrix(acts, preds)

In [None]:
num_images = test_df.shape[0]
num_errors, preds, acts = test_image_df(test_df)
print('\nAccuracy: {:.2f}, {}/{}'.format(1 - (num_errors/num_images), num_images - num_errors, num_images))

In [None]:
create_and_plot_confusion_matrix(acts, preds)

### Alternatively use the Keras predict_generator for dataset evaluation
This is convenient, but it doesn't easily give access to the individual prediction mistakes.

In [None]:
test_gen.reset()
test_preds = model.predict_generator(test_gen, steps=test_df.shape[0], verbose=0)

In [None]:
preds = np.argmax(test_preds, axis=1)
acts  = np.asarray(test_gen.classes)

In [None]:
create_and_plot_confusion_matrix(acts, preds)

### Test model against previously unseen images
Here we download images that the algorithm has not yet seen.

In [None]:
!wget -q -O northern-flicker-1.jpg https://upload.wikimedia.org/wikipedia/commons/5/5c/Northern_Flicker_%28Red-shafted%29.jpg
!wget -q -O northern-cardinal-1.jpg https://cdn.pixabay.com/photo/2013/03/19/04/42/bird-94957_960_720.jpg
!wget -q -O blue-jay-1.jpg https://cdn12.picryl.com/photo/2016/12/31/blue-jay-bird-feather-animals-b8ee04-1024.jpg
!wget -q -O blue-jay-2.jpg https://www.pennington.com/-/media/Images/Pennington-NA/US/blog/Wild-Bird/Blue-Jays/Blue-Jay-Eating-Peanuts.jpg
!wget -q -O hummingbird-1.jpg http://res.freestockphotos.biz/pictures/17/17875-hummingbird-close-up-pv.jpg
!wget -q -O northern-cardinal-2.jpg https://www.allaboutbirds.org/guide/assets/photo/63667291-480px.jpg
!wget -q -O american-goldfinch-1.jpg https://download.ams.birds.cornell.edu/api/v1/asset/59574291/medium
!wget -q -O purple-finch-1.jpg https://indianaaudubon.org/wp-content/uploads/2016/04/PurpleFinchRyanSanderson-e1463792335814.jpg
!wget -q -O purple-finch-2.jpg https://www.singing-wings-aviary.com/wp-content/uploads/2018/06/Purple-Finch.jpg
!wget -q -O mallard-1.jpg https://www.herefordshirewt.org/sites/default/files/styles/node_hero_default/public/2018-01/Mallard%20%C2%A9%20Mark%20Hamblin.jpg
!wget -q -O bobolink.jpg https://upload.wikimedia.org/wikipedia/commons/9/9a/Bobolink_at_Lake_Woodruff_-_Flickr_-_Andrea_Westmoreland_%281%29.jpg

In [None]:
predict_bird_from_file('northern-cardinal-2.jpg')
predict_bird_from_file('bobolink.jpg')