# Transfer Learning is for the Birds

Patrick Wagstrom <patrick@wagstrom.net>

June 2018

## Project Abstract

One of the most popular applications of deep learning is image classification. To get a truly great classifier, most applications require networks with dozens of layers that may take weeks to train on a distributed cluster of GPUs. This cost makes the prospect of an individual training a high-accuracy custom classifier daunting, at best. However, there is another way. Transfer learning with deep neural networks lets you take advantage of most of the feature detection inherent in a complex network without the extensive training times. In this talk I'll share the early stages of collecting data and training a custom classifier in a different domain: identification of common birds. It will cover the basics of neural networks and transfer learning, and show how to apply transfer learning to pre-trained image classification networks from Google to achieve high-precision classifiers with small amounts of training time.


## Project Background
This is a simple project that I'm using to try to build models that automatically classify the different birds that visit the feeders in my yard.

Useful references:
* https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10
* https://github.com/tzutalin/labelImg
* https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/#3
* https://github.com/datitran/raccoon_dataset

<img src="images/amazon deeplens.png" width="90%">

<center><img src="images/tasks_2x.png" width="20%"></center>

<center>
<p style="border: 1px solid #ccc; background-color: #ffe; font-size: 80%; padding: 1ex; text-align: center;">In the 60s, Marvin Minsky assigned a couple of undergrads to spend the summer programming a computer to use a camera to identify objects in a scene. He figured they'd have the problem solved by the end of the summer. Half a century later, we're still working on it.</p>
<p>From: https://xkcd.com/1425/</p>
</center>

# Four General Problems in AI

* Regression
* Clustering
* Dimensionality Reduction
* Classification

# Two General Data Strategies

* Supervised
* Unsupervised

# This Problem is Supervised Classification

We are trying to classify an image according to the species of bird(s) in the image.

It is likely that all of the feeder bird species in Connecticut have already been discovered, so we are working with an established taxonomy. Our classifications should fit into this established taxonomy, so we will use supervised (labeled) training data.

This is a perfect use case for a neural network.

# Inception

Google has been really good about releasing their models. One of the more robust ones is called "Inception", which would cost nearly $100,000 if I were to train it myself. I'm not *that interested* in seeing the birds at my feeders. Nor do I have the skill to build such a network on my own.

<img src="images/inception architecture.png" style="margin-left: auto; margin-right: auto;">
<!-- image from: https://hackathonprojects.files.wordpress.com/2016/09/74911-image03.png via https://hacktilldawn.com/2016/09/25/inception-modules-explained-and-implemented/ -->

Sometimes the art of discovering appropriate neural network topologies is called "graduate student descent". It's not trivial, and it's not always clear how to architect these networks.

# Quick Inception Recognizer Demo

# Generally Available Data Sets


Wah et al. compiled the [Caltech-UCSD Birds 200-2011 dataset](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html). It contains 200 different tagged bird species with bounding boxes! This should kickstart my model. Plus, it's well studied and people have built classifiers on the data before.

**Citation:** Wah C., Branson S., Welinder P., Perona P., Belongie S. “The Caltech-UCSD Birds-200-2011 Dataset.” Computation & Neural Systems Technical Report, CNS-TR-2011-001. [[PDF Download](http://www.vision.caltech.edu/visipedia/papers/CUB_200_2011.pdf)]


In [2]:
from tqdm import tqdm_notebook as tqdm
import os
from typing import Mapping, Any

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image
import numpy as np
import ipywidgets as widgets
from ipywidgets import interact

CUB_DATA_DIR = "CUB/CUB_200_2011"

test_train = pd.read_table(os.path.join(CUB_DATA_DIR, "train_test_split.txt"), names=["id", "set"], sep=" ")
image_classes = pd.read_table(os.path.join(CUB_DATA_DIR, "image_class_labels.txt"), names=["id", "class_id"], sep=" ")
classes = pd.read_table(os.path.join(CUB_DATA_DIR, "classes.txt"), names=["class_id", "class_name"], sep=" ")
images = pd.read_table(os.path.join(CUB_DATA_DIR, "images.txt"), names=["id", "filename"], sep=" ")
bounding_boxes = pd.read_table(os.path.join(CUB_DATA_DIR, "bounding_boxes.txt"), names=["id", "bb_x", "bb_y", "bb_width", "bb_height"], sep=" ")
images = images.merge(test_train, on="id")
images = images.merge(image_classes, on="id")
images = images.merge(bounding_boxes, on="id")
images = images.merge(classes, on="class_id")
images["filename"] = images["filename"].map(lambda x: os.path.join(CUB_DATA_DIR, "images", x))

In [5]:
def show_row(row: Mapping[str, Any]):
    im = np.array(Image.open(row["filename"]), dtype=np.uint8)

    # Create figure and axes
    fig,ax = plt.subplots(1)

    # Display the image
    ax.imshow(im)

    # Create a Rectangle patch
    rect = patches.Rectangle((row["bb_x"], row["bb_y"]),
                             row["bb_width"], row["bb_height"],
                             linewidth=3,edgecolor='r',facecolor='none')

    # Add the patch to the Axes
    ax.add_patch(rect)
    ax.set_axis_off()
    
    plt.title(row["class_name"])
    plt.show()

In [6]:
images.head()

| | id | filename | set | class_id | bb_x | bb_y | bb_width | bb_height | class_name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | CUB/CUB_200_2011/images/001.Black_footed_Albat... | 0 | 1 | 60.0 | 27.0 | 325.0 | 304.0 | 001.Black_footed_Albatross |
| 1 | 2 | CUB/CUB_200_2011/images/001.Black_footed_Albat... | 1 | 1 | 139.0 | 30.0 | 153.0 | 264.0 | 001.Black_footed_Albatross |
| 2 | 3 | CUB/CUB_200_2011/images/001.Black_footed_Albat... | 0 | 1 | 14.0 | 112.0 | 388.0 | 186.0 | 001.Black_footed_Albatross |
| 3 | 4 | CUB/CUB_200_2011/images/001.Black_footed_Albat... | 1 | 1 | 112.0 | 90.0 | 255.0 | 242.0 | 001.Black_footed_Albatross |
| 4 | 5 | CUB/CUB_200_2011/images/001.Black_footed_Albat... | 1 | 1 | 70.0 | 50.0 | 134.0 | 303.0 | 001.Black_footed_Albatross |


# Generally Available Data Sets

Using widgets in a Jupyter notebook, we can browse around through the images in the dataset. What we quickly see is that the CUB dataset isn't going to be super useful. Although there are good classifiers for it, I don't have many white pelicans or green kingfishers visiting my feeders in Connecticut.

In [11]:
def slider_plot(i):
    show_row(images.iloc[[i]].squeeze())

slider = interact(slider_plot, i=(0, images.shape[0]-1))


# Capturing Your Own Data

<center><video id="sampleMovie" src="data/video/IMG_6885.m4v" width="85%" controls style="margin-left: auto; margin-right: auto; margin-top: 2em;"></video></center>

# Capturing Your Own Data

One of the hardest things about doing machine learning is getting training data. There are entire companies built around making this process easier, such as Figure Eight. However, this is a small project, so we can _try_ to collect some of the data ourselves. In a completely unscientific manner, I took my iPhone and recorded video of my bird feeders whenever I saw birds at them.

On my iPhone I capture these videos at 1080p60. There's probably a case to be made for capturing at 4k30 (the highest resolution my iPhone 7 supports), since we're not going to label each and every frame: there's too much similarity between consecutive frames. The first step is to extract still frames from the video. To do this, let's use the old standby `ffmpeg` and convert the video into PNG files, which are lossless and so add no further compression artifacts. This script takes all of the `MOV` and `m4v` files in a directory and slices them up for you.

```bash
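# Keep every 50th frame of each video and write it out as a PNG.
for x in *.{MOV,m4v}; do
    [ -e "$x" ] || continue  # skip a pattern that matched no files
    BASENAME="${x%.*}"
    ffmpeg -i "$x" -vf "select=not(mod(n\,50))" \
           -vsync vfr -f image2 "${BASENAME}_%05d.png"
done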
```
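
As a rough check on how many stills this produces: the `select` filter keeps frames whose index `n` is a multiple of 50, so 1080p60 video yields about 1.2 frames per second. The helper below is only a sketch of that arithmetic. The function name and the 30-second clip length are made up for illustration, and it assumes a constant frame rate.

```python
import math

def frames_kept(duration_s: float, fps: int = 60, step: int = 50) -> int:
    """Estimate how many frames a select=not(mod(n, step)) filter keeps.

    ffmpeg numbers frames from n = 0, so frames 0, step, 2*step, ... pass.
    """
    total_frames = int(duration_s * fps)
    return math.ceil(total_frames / step)

# Hypothetical 30-second clip recorded at 60 frames per second.
print(frames_kept(30))
```

Raising or lowering the `50` in the shell script trades the number of images to label against how similar neighboring stills are.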

# Labeling the Data

Most image classifiers are _supervised_, which means I need to label my data. This is where the [labelImg](https://github.com/tzutalin/labelImg) project comes in handy by providing a GUI to quickly label data. I did most of this over about an hour.

<center>
<img src="images/labeling.png" width="70%">
</center>
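
By default labelImg saves one Pascal VOC-style XML file next to each image. The sketch below shows one way to read the labels and boxes back out with the standard library. The sample annotation is invented for illustration (file name, label, and coordinates are not from this project), and `read_boxes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Made-up annotation in the Pascal VOC layout; not a file from this project.
SAMPLE_XML = """
<annotation>
  <filename>IMG_0000_00001.png</filename>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>example_bird</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>180</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>
"""

# (label, xmin, ymin, xmax, ymax)
Box = Tuple[str, int, int, int, int]

def read_boxes(xml_text: str) -> List[Box]:
    """Return one (label, xmin, ymin, xmax, ymax) tuple per labeled object."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = [int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((label, *coords))
    return boxes

print(read_boxes(SAMPLE_XML))
```

Note that VOC boxes are stored as corner coordinates, whereas the CUB files used earlier store `x`, `y`, `width`, and `height`.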


# Splitting Test/Train

Typically you should split your data into something like a 70/30 train/test split. This gave me 149 training images and 77 test images, which is a pretty small dataset.

```bash
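# Move roughly 30% of the labeled images (and their XML labels) to ../test.
for x in *.png; do
  if [ $(( RANDOM % 10 )) -ge 7 ]; then  # 7, 8, or 9: three outcomes in ten
    BASENAME="${x%.*}"
    mv "$BASENAME.png" ../test
    mv "$BASENAME.xml" ../test
  fi
done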
```
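
Because the shell loop decides each file independently, the realized split only approximates 70/30 (77 of 226 images is about 34%) and changes on every run. If an exact, repeatable split matters, one option is to shuffle the basenames with a fixed seed. This is a sketch of that alternative, not what was used here: the function name, the seed, and the example basenames are all illustrative.

```python
import random
from typing import List, Tuple

def split_names(basenames: List[str], test_fraction: float = 0.3,
                seed: int = 42) -> Tuple[List[str], List[str]]:
    """Return (train, test) lists of basenames with a fixed-size test set."""
    shuffled = sorted(basenames)  # sort so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Illustrative basenames only; each one stands for a .png/.xml pair.
names = [f"frame_{i:05d}" for i in range(226)]
train, test = split_names(names)
print(len(train), len(test))
```

Each name in `test` would then have both its `.png` and `.xml` file moved, just as the shell loop does.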

# A Tiny Bit About Transfer Learning

Transfer learning allows us to take knowledge (i.e. parameters) from one problem and apply it to another, related problem. This is particularly useful when the related problem has very little training data, which is the case with my bird classifier.
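
To make the idea concrete, here is a toy NumPy sketch of the core pattern: keep the existing feature-extracting parameters frozen and train only a small new output layer on the new task. Everything in it is an assumption for illustration. The "pretrained" weights are random stand-ins rather than real Inception parameters, the data is synthetic, and this is not the training code used for the bird models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for weights learned on a large source task (hypothetical values;
# in practice these would come from a pretrained network such as Inception).
W_frozen = rng.normal(size=(8, 16))

def extract_features(x: np.ndarray) -> np.ndarray:
    """Frozen feature extractor: one ReLU layer whose weights are never updated."""
    return np.maximum(x @ W_frozen, 0.0)

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_head(feats, labels, n_classes, lr=0.1, steps=300):
    """Fit only a new softmax layer on top of the frozen features."""
    W_head = np.zeros((feats.shape[1], n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        probs = softmax(feats @ W_head)
        # Gradient of mean cross-entropy with respect to the head weights only.
        W_head -= lr * feats.T @ (probs - onehot) / len(labels)
    return W_head

# Tiny synthetic target task: 3 classes with 10 examples each.
labels = np.repeat(np.arange(3), 10)
centers = rng.normal(scale=3.0, size=(3, 8))
x = centers[labels] + rng.normal(scale=0.5, size=(30, 8))

frozen_before = W_frozen.copy()
W_head = train_head(extract_features(x), labels, n_classes=3)
predictions = softmax(extract_features(x) @ W_head).argmax(axis=1)
print("training accuracy:", (predictions == labels).mean())
```

Only `W_head` is updated, so the number of parameters being fit stays small, which is why this approach can work with only a couple of hundred labeled images.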

# Three Different Model Iterations

# Summary


Working from high-resolution images isn't necessarily better. Here it caused problems with computation time and detection. The model could only detect 300 objects in an image, and when looking for very small items in an image, this caused a lot of false positives.

Variation in the training data was also a problem. I only had a limited number of training videos, and most frames were similar to one another. I need an automated camera to capture more data.

Sometimes, you just need to sleep on it. Go back again later, look at your results, and you'll get insight into what went wrong.