# BirdID
#### Project #1
#### 4152 Computer Vision - Course Project - Micah Thomas

![BirdGif1](demo_assets/one.gif "one")

# Introduction

**BirdID is a machine learning project used to predict the species of local birds in my backyard.**

Team Members: Micah Thomas

# Problem & Motivation

This past summer, as the coronavirus pandemic forced people to stay home, I became more interested in outdoor hobbies, especially gardening. While working in my garden, I began to notice many different species of birds that I had never paid attention to before.

For my computer vision course project, I decided it would be a great chance to combine many of my interests and build an ML model for bird species classification.

Goals:
* Build a bird feeder and collect data of birds
* Train an ML model to accurately label bird species
* Predict bird species in real time

I began the project by first searching for datasets online:
* The Cornell Lab of Ornithology offers a free-to-use North America bird dataset, featuring 400 bird species and 48,000 photos with labels.
* The California Institute of Technology offers a free-to-use *mainly* North America bird dataset, featuring 200 species and 6,033 images with bounding boxes and labels.

However, starting this project in late Fall moving into Winter, I noticed there were only a handful of birds in my backyard. I began taking pictures and was able to identify the following 8 winter birds:

### Blue Jay

<img src="demo_assets/blue_jay.png" alt="Blue Jay" width="400"/>

### Brown-headed Nuthatch

<img src="demo_assets/brown_headed_nuthatch.jpg" alt="Brown-headed Nuthatch" width="400"/>

### Cardinal

<img src="demo_assets/cardinal.jpg" alt="Cardinal" width="400"/>

### Carolina Chickadee

<img src="demo_assets/carolina_chickadee.jpg" alt="Carolina Chickadee" width="400"/>

### Carolina Wren

<img src="demo_assets/carolina_wren.jpg" alt="Carolina Wren" width="400"/>

### Downy Woodpecker

<img src="demo_assets/downy_woodpecker.jpg" alt="Downy Woodpecker" width="400"/>

### Red-bellied Woodpecker

<img src="demo_assets/red_bellied_woodpecker.jpg" alt="Red Bellied Woodpecker" width="400"/>

### Tufted Titmouse

<img src="demo_assets/tufted_titmouse.jpg" alt="Tufted Titmouse" width="400"/>

***

These 8 birds were the only birds in my area, and some of these birds, such as the woodpeckers and the blue jays, are very rare to see. 

With only 8 birds visiting the backyard, I realized that training on 400 bird species would create a more general model, but I would rather limit the model to predicting only the 8 birds that I know are present.

Assuming a model that purely predicts classes randomly (i.e. no learning), the probability of guessing a bird correctly from 400 species is 0.25%, while the probability of guessing a bird correctly from 8 species is 12.5%.

This means that I could simply take the Caltech and Cornell datasets and select **only** those 8 birds, but that would leave me with between 18 and 180 pictures of each bird, and I wanted to train this model on far more data than that.

So, I decided to make this project a whole lot more difficult for me and to collect all of my data myself.

To predict the species of a bird, I knew I needed to set up an environment outside that could attract birds and photograph them in a consistent manner.

With this in mind, I decided to build a bird feeder and mount a camera that is always streaming to my computer. The goal now was to create an algorithm that would detect objects (hopefully birds) and then save continuous captures of each object to my local computer. I would then use those images to train a convolutional neural network using FastAI.

# Dataset

### Building the bird feeder

<img src="demo_assets/bird_feeder.gif" alt="Bird Feeder" width="600" align="left"/>

First, I bought a simple platform bird feeder from a store. I then heavily modified it. As you can see in the animation above, the roof was lifted and the central support beam in pink was shifted to the back. I then found a waterproof box and mounted it to the back of the bird feeder. A webcam was then placed inside the bird feeder and I ran a 180ft cable from the bird feeder to my upstairs computer.

A few notes about the bird feeder:

Before implementing the webcam and cable solution, I was originally using a Raspberry Pi and camera module to wirelessly stream video from the bird feeder to my computer. Although this solution was more compact, the frame rate drops caused by spotty network connectivity were horrible.

This prompted me to switch to a wired solution. The webcam is a standard USB webcam, and something I found interesting while building this is that a USB 2.0 cable can only carry data over a length of about 5 meters (16 ft), and USB 3.0 is actually slightly worse at about 3 meters (10 ft). There is a solution, though: it turns out you can send USB data over Ethernet cables. This works largely because the wire pairs inside the cable are twisted, which drastically reduces interference and noise.

With a Cat6 Ethernet cable, you can send USB data up to 200 ft, and my computer is about 172 feet away from the bird feeder.

<table>
   <tr>
      <td><img src="demo_assets/side_view_feeder.jpg" alt="Side View Bird Feeder" width="600"/></td>
      <td><img src="demo_assets/feeder_angle_back_view.jpg" alt="Back Angle View Bird Feeder" width="600"/></td>
   </tr>
</table>

So the images above show the final build. I know it's not pretty, and I was worried the birds were going to be scared of it, but thankfully they didn't seem to mind.

### Capturing data

You may be wondering why there is a large white wooden back drop behind the feeder, and that has everything to do with how my image capturing algorithm works.

To capture images of birds, I need the webcam to be continuously streaming footage to my computer. Whenever a bird comes to the feeder, I want the webcam to take a bunch of photos, but if no bird is present, I definitely don't want it to take photos.

So how do I detect the presence of birds without building a machine learning model? The answer is to simplify the problem. Instead of recognizing birds, I only need to recognize objects that were not present when the algorithm started. If you compare the current frame of the webcam to the first frame, the difference tells you what has changed. If there is no difference, there is no object. If there is a difference, there is likely an object.

<img src="demo_assets/diff_thresh_still.png" alt="Difference and Thresholding - Still" width="600"/>

<img src="demo_assets/diff_thresh.gif" alt="Difference and Thresholding - GIF" width="600"/>

* On the upper left you see the raw footage. The bounding boxes overlaid on it are actually drawn at a later step.

* On the upper right you see the grayscale footage, which is then, for lack of better words, subtracted from the first frame (also in grayscale).

* The difference is displayed in the lower left. For the most part it just looks like the ghost of a bird on a black background.

* On the lower right you see the threshold frame, which is made from the difference frame: pixels above a certain threshold value become pure white and everything else becomes pure black.
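The four panels correspond to a short pipeline: grayscale, absolute difference against the first frame, then a binary threshold. Below is a minimal NumPy sketch of that pipeline on a synthetic frame. The function names, the threshold of 30, and the channel-average grayscale are assumptions for illustration, not the values used in this project. In OpenCV the same steps are typically done with `cv2.cvtColor`, `cv2.absdiff`, and `cv2.threshold`.

```python
import numpy as np

def to_gray(frame):
    """Convert an HxWx3 uint8 colour frame to grayscale by averaging channels."""
    return frame.astype(np.float32).mean(axis=2).astype(np.uint8)

def difference_mask(first_gray, current_gray, thresh=30):
    """Absolute difference against the reference frame, then a binary threshold."""
    # Cast to a signed type first so the subtraction cannot wrap around.
    signed_diff = current_gray.astype(np.int16) - first_gray.astype(np.int16)
    diff = np.abs(signed_diff).astype(np.uint8)
    # Pixels that changed by more than `thresh` become white (255), the rest black (0).
    mask = np.where(diff > thresh, 255, 0).astype(np.uint8)
    return diff, mask

# Synthetic example: a light backdrop, then a dark "bird" enters the frame.
first = np.full((120, 160, 3), 230, dtype=np.uint8)
current = first.copy()
current[70:100, 60:100] = 40  # dark 30x40 blob near the bottom centre

diff, mask = difference_mask(to_gray(first), to_gray(current))
changed = int((mask == 255).sum())
print(changed, "changed pixels")  # 1200 changed pixels
```

A plain, evenly lit backdrop keeps the difference frame close to zero when nothing is there, which is why a fixed threshold can work at all.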

***

Purely in an effort to save time, I'm not going to dive deep into the object detection algorithm, but in short:

OpenCV makes it easy to work with something called a contour. A contour is an outline that represents the shape of something, and it is mathematically different from an edge.

So I apply OpenCV contour filtering to the threshold frame on the lower right, and when a certain number of contours are found, I draw a bounding box that spans the maximum height and width of the contours found in that frame. That box is what you see overlaid on the upper left frame.

At this point the algorithm is quite simple: if a bounding box exists, it's because an object is in the frame, so start taking a bunch of photos and save them locally.
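The capture decision can be sketched as follows. This is a simplified stand-in, not the project's script: it skips contour extraction entirely and takes the bounding box of all white pixels in the threshold frame, which gives the same overall box as the union of the contours. The `min_pixels` noise cutoff and the function names are assumptions. With OpenCV, the contours would normally come from `cv2.findContours` and their boxes from `cv2.boundingRect`.

```python
import numpy as np

def bounding_box(mask, min_pixels=200):
    """Return (x, y, w, h) around all white pixels, or None if too few changed."""
    ys, xs = np.nonzero(mask)
    if ys.size < min_pixels:  # too little change: treat it as noise
        return None
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    return int(x0), int(y0), int(x1 - x0 + 1), int(y1 - y0 + 1)

def should_capture(mask):
    """A photo is saved only while some object produces a bounding box."""
    return bounding_box(mask) is not None

# A threshold frame with one white blob, and an empty one.
with_object = np.zeros((120, 160), dtype=np.uint8)
with_object[70:100, 60:100] = 255
empty = np.zeros((120, 160), dtype=np.uint8)

print(bounding_box(with_object))  # (60, 70, 40, 30)
print(should_capture(empty))      # False
```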

When I first wrote this script I encountered a bunch of issues, mainly that the bird feeder is hung from a cable. So if any wind blows it even slightly, technically the current frame is different than the initial frame, so it will take photos. Furthermore, if a shadow appears on a tree in the background, it will take photos. 

After taking about 14,000 photos of nothingness, I placed the white board behind the bird feeder and reoriented the entire system so that the sun wouldn't cast shadows and this worked great. 

At this point I could now run the algorithm for hours at a time and it would capture photos of objects that entered the scene. 

Apart from a few squirrels, this system allowed me to collect over 60,000 images of birds, of which I used about 20,000.

### Verifying the Data

```python
from fastai.vision.all import *
path = Path(os.getcwd())

train_df = pd.read_csv(path/"train_df.csv")
test_df = pd.read_csv(path/"test_df.csv")
```

At this point, it's time to label my data. As mentioned earlier, I have 8 classes and **20,398** usable images. Unfortunately there is no shortcut here: I manually labeled all of these photos by dragging and dropping them into their corresponding folders.

I also took photos from the Caltech and Cornell datasets, but this only added about 60 images per bird species.

***

The first step here is to go through each folder (remember, each folder contains images related to that class) and create a legend that will inform FastAI of what images belong to what class.

This is what that looks like. Each row is an entry. The column `fname` is the name of the photo, which happens to be the time the photo was captured. The column `rpath` is the relative path of where the photo is located and the column `label` is the photo's corresponding label.

```python
train_df.head()
```

| Unnamed: 0 | fname | label | fpath | rpath |
|------------|-------|-------|-------|-------|
| 0 | 2021-11-07--10-28-16-767.jpg | Blue Jay | `C:\Users\micah\rig-uni\bird-ID-2\data\all_data\Blue Jay\2021-11-07--10-28-16-767.jpg` | `Blue Jay/2021-11-07--10-28-16-767.jpg` |
| 1 | 2021-11-07--10-28-17-311.jpg | Blue Jay | `C:\Users\micah\rig-uni\bird-ID-2\data\all_data\Blue Jay\2021-11-07--10-28-17-311.jpg` | `Blue Jay/2021-11-07--10-28-17-311.jpg` |
| 2 | 2021-11-08--11-58-29-841.jpg | Blue Jay | `C:\Users\micah\rig-uni\bird-ID-2\data\all_data\Blue Jay\2021-11-08--11-58-29-841.jpg` | `Blue Jay/2021-11-08--11-58-29-841.jpg` |
| 3 | 2021-11-08--11-58-30-925.jpg | Blue Jay | `C:\Users\micah\rig-uni\bird-ID-2\data\all_data\Blue Jay\2021-11-08--11-58-30-925.jpg` | `Blue Jay/2021-11-08--11-58-30-925.jpg` |
| 4 | 2021-11-08--11-58-31-465.jpg | Blue Jay | `C:\Users\micah\rig-uni\bird-ID-2\data\all_data\Blue Jay\2021-11-08--11-58-31-465.jpg` | `Blue Jay/2021-11-08--11-58-31-465.jpg` |


**Class Distributions**

| Index | Class                  | Count |
|-------|------------------------|-------|
| 0     | Blue Jay               | 92    |
| 1     | Brown-headed Nuthatch  | 353   |
| 2     | Cardinal               | 2519  |
| 3     | Carolina Chickadee     | 14675 |
| 4     | Carolina Wren          | 625   |
| 5     | Downy Woodpecker       | 221   |
| 6     | Red-bellied Woodpecker | 86    |
| 7     | Tufted Titmouse        | 1827  |

***

<img src="demo_assets/class_distrubutions.png" alt="Class Distributions" width="600" align="left"/>

As you can see, the dataset is highly imbalanced.

* **Blue Jay** accounts for **0.45%** of the total images.
* **Brown-headed Nuthatch** accounts for **1.73%** of the total images.
* **Cardinal** accounts for **12.35%** of the total images.
* **Carolina Chickadee** accounts for **71.94%** of the total images.
* **Carolina Wren** accounts for **3.06%** of the total images.
* **Downy Woodpecker** accounts for **1.08%** of the total images.
* **Red-bellied Woodpecker** accounts for **0.42%** of the total images.
* **Tufted Titmouse** accounts for **8.96%** of the total images.
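Each class's share can be recomputed directly from the counts in the Class Distributions table. The short sketch below does only that; the dictionary simply restates the table, and the variable names are illustrative.

```python
# Image counts per class, copied from the Class Distributions table.
counts = {
    "Blue Jay": 92,
    "Brown-headed Nuthatch": 353,
    "Cardinal": 2519,
    "Carolina Chickadee": 14675,
    "Carolina Wren": 625,
    "Downy Woodpecker": 221,
    "Red-bellied Woodpecker": 86,
    "Tufted Titmouse": 1827,
}

total = sum(counts.values())  # 20398 usable images
shares = {name: 100 * n / total for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name:<24}{pct:6.2f}%")
```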

***

If I were to train the model right now, it might learn something, but it is much more likely that it would simply predict Carolina Chickadee most of the time, since that class makes up nearly three quarters of the images. The dataset will therefore have to be over- and undersampled. But first, we need to split the dataset into a train and test set. The validation set will be created by FastAI itself.

### Test Train Split

```python
import math
import numpy as np
import pandas as pd

def test_train_split(df, test_split_percent, label_helper):
    # Build one frame per class; label_helper[i][0] is used as the class name.
    frames = []
    for i in range(len(label_helper)):
        frames.append(df.groupby('label').get_group(label_helper[i][0]).reset_index())
    test_frames = []
    train_frames = []
    for i in range(len(frames)):
        dff = frames[i]
        # Number of rows of this class that go to the test set.
        x = math.floor(dff.shape[0] * test_split_percent)
        # Pick x distinct rows at random; the rest stay in the training set.
        indices = np.random.choice(dff.index, x, replace=False)
        test_frames.append(dff.iloc[indices].reset_index().drop(['level_0', 'index'], axis=1))
        train_frames.append(dff.drop(indices).reset_index().drop(['level_0', 'index'], axis=1))
    test_df = pd.concat(test_frames)
    train_df = pd.concat(train_frames)
    return test_df, train_df
```

First, I wrote a function to split the dataset. You pass a `test_split_percent` (a fraction between 0 and 1), and the function randomly chooses that share of images from each class and moves them to a separate dataset. I chose 0.2, meaning 20 percent of each class becomes the test set.

```python
test_df.groupby(['label']).count()
```

| label                  | fname | fpath | rpath |
|------------------------|-------|-------|-------|
| Blue Jay               | 18    | 18    | 18    |
| Brown-headed Nuthatch  | 70    | 70    | 70    |
| Cardinal               | 503   | 503   | 503   |
| Carolina Chickadee     | 2935  | 2935  | 2935  |
| Carolina Wren          | 131   | 131   | 131   |
| Downy Woodpecker       | 44    | 44    | 44    |
| Red-bellied Woodpecker | 17    | 17    | 17    |
| Tufted Titmouse        | 365   | 365   | 365   |


Above is the test set. Notice that there is still a class imbalance here, but unlike in the training set, this is perfectly acceptable, since the test set should reflect how often each bird actually shows up at the feeder.

### Over & Under Sampling

```python
## Deletes 'remove_n' random rows
def undersample(df, count, target):
    remove_n = count - target
    drop_indices = np.random.choice(df.index, remove_n, replace=False)
    df_subset = df.drop(drop_indices)
    return df_subset

## Repeats the whole frame enough times to reach 'target', then trims the excess rows
def oversample(df, count, target):
    duplicate_n = math.ceil(target / count)
    df_over = pd.concat([df]*duplicate_n)
    over_count = df_over.shape[0]
    if over_count > target:
        remove_n = over_count - target
        df_over = df_over.iloc[:-remove_n]
    return df_over

## Brings every class in 'labels' to exactly 'target' rows
def over_under_sample(df, target, labels):
    frames = []
    for label in labels:
        dff = df[df.label == label].reset_index()
        dff_count = dff.shape[0]
        if dff_count > target:
            dff = undersample(dff, dff_count, target)
        elif dff_count < target:
            dff = oversample(dff, dff_count, target)
        frames.append(dff)
    balanced_df = pd.concat(frames)
    return balanced_df.reset_index().drop(['level_0', 'index'], axis=1)
```

Next, I wrote the dataset balancer. You can pass a number, `target`, and it will either oversample (duplicate images) or undersample (randomly delete images) until each class has exactly the same number of images.
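For comparison, the same balancing idea can be written more compactly with pandas' `DataFrame.sample`. This is an alternative sketch, not the function used in this project: the name `balance`, the `seed` parameter, and the toy data are made up for illustration.

```python
import pandas as pd

def balance(df, target, label_col="label", seed=0):
    """Resample every class to exactly `target` rows."""
    parts = []
    for _, group in df.groupby(label_col):
        # Sample with replacement only when the class has fewer rows than needed.
        needs_duplicates = len(group) < target
        parts.append(group.sample(n=target, replace=needs_duplicates, random_state=seed))
    return pd.concat(parts, ignore_index=True)

# Toy frame: 2 images of a rare class and 10 of a common one.
toy = pd.DataFrame({
    "fname": [f"img{i}.jpg" for i in range(12)],
    "label": ["Blue Jay"] * 2 + ["Carolina Chickadee"] * 10,
})

balanced = balance(toy, target=5)
print(balanced["label"].value_counts())
```

One difference in design: sampling with replacement does not guarantee that every original image of a small class appears at least once, whereas repeating the whole class and trimming does.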

```python
train_df.groupby(['label']).count()
```

| label                  | fname | fpath | rpath |
|------------------------|-------|-------|-------|
| Blue Jay               | 1000  | 1000  | 1000  |
| Brown-headed Nuthatch  | 1000  | 1000  | 1000  |
| Cardinal               | 1000  | 1000  | 1000  |
| Carolina Chickadee     | 1000  | 1000  | 1000  |
| Carolina Wren          | 1000  | 1000  | 1000  |
| Downy Woodpecker       | 1000  | 1000  | 1000  |
| Red-bellied Woodpecker | 1000  | 1000  | 1000  |
| Tufted Titmouse        | 1000  | 1000  | 1000  |


<img src="demo_assets/over_under_sample.png" alt="Over Under Sample" width="600" align="left"/>

After balancing the dataset, each class has 1000 samples and this is where we want to be. There is just one more thing we need to take care of before we can start training.

The class `Carolina Chickadee` was undersampled, so over 13,000 images are not being used but the 1000 images that are being used are entirely unique. 

The class `Blue Jay` was oversampled. It had only 92 images to begin with, 18 of which went to the test set, so the remaining 74 were duplicated until there were 1000. That means 926 of those images are not unique. If we trained the model on this, it could overfit, so the solution is to do some data augmentation.

### Data Augmentation and Transformation

Below is a sample of one batch of data after performing some data augmentations and transforms using FastAI. 

Each image has a random chance of having one or more transformations applied to it, and these transformations are:
* +- Contrast
* +- Saturation
* +- Brightness
* +- Zoom
* +- Rotation
* +- Warp

After these transformations, there are no longer exact duplicates in any class. Every image that the network trains on is different, even if the difference is slight.

Additionally, all images are resized to 240px by 320px, which is half their original resolution of 480px by 640px.

<img src="demo_assets/one_batch.png" alt="One Batch" width="800"/>

Here you can see some of these images are rotated and zoomed in.
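To make the effect of augmentation on duplicated images concrete, here is a small NumPy illustration. It is not FastAI's implementation: it only mimics a random brightness and contrast jitter plus the 2x downsize, and the function names and jitter ranges are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter(img, max_brightness=0.2, max_contrast=0.2):
    """Randomly scale contrast about the mean and shift brightness."""
    x = img.astype(np.float32) / 255.0
    contrast = 1.0 + rng.uniform(-max_contrast, max_contrast)
    brightness = rng.uniform(-max_brightness, max_brightness)
    x = (x - x.mean()) * contrast + x.mean() + brightness
    return (np.clip(x, 0.0, 1.0) * 255).astype(np.uint8)

def halve(img):
    """Downsample by 2 in each dimension by averaging 2x2 pixel blocks."""
    h, w, c = img.shape
    blocks = img[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2, c)
    return blocks.mean(axis=(1, 3)).astype(np.uint8)

# A random stand-in for one 480x640 capture.
image = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

# Two augmented copies of the same source image are no longer identical.
a, b = jitter(image), jitter(image)
small = halve(a)
print(small.shape)  # (240, 320, 3)
```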

# Methodology

The position of the birds will generally be in the same spot, the bottom center, but this is not always the case. Blue jays and Red-bellied Woodpeckers are so large that they barely fit inside the frame, and Carolina Wrens often get really close to the camera and appear dark. All of the birds move horizontally and it's important that their relative location is not the identifying feature that the model learns.

For image classification it is usually important to use convolutional neural networks, because a convolution detects the same feature wherever it appears in the image. This is in contrast to a traditional fully connected layer, where each pixel location is its own input to the model, so a feature learned in one position does not carry over to another.

I trained 15 models in total. The majority were pretrained ResNet models, but a few were built using PyTorch's `nn.Sequential` class. The model that performed best was an implementation of ResNet50, so I will focus on that.

***

This problem was solved using FastAI 2 and a pretrained convolutional model called **ResNet50**.

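ResNet50 gets its name from its 50 weighted layers: 49 convolution layers on the main path and 1 fully connected layer. It also has 1 max-pool layer and 1 average-pool layer.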

The convolution layers create feature maps that *hopefully* identify features such as lines and edges. These maps, however, also record where each feature was found, so the pooling layers are responsible for making the exact position of a feature matter much less.
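A tiny NumPy example shows what pooling does to position. It is a hand-written 2x2 max pool on a toy 4x4 feature map, not a layer taken from ResNet50, and the helper name is made up.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    blocks = feature_map[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

a = np.zeros((4, 4)); a[0, 0] = 1.0  # feature detected in the top-left corner
b = np.zeros((4, 4)); b[1, 1] = 1.0  # same feature, shifted within the same block
c = np.zeros((4, 4)); c[0, 2] = 1.0  # same feature, shifted into the next block

print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(c)))  # False
print(a.mean() == b.mean() == c.mean())                  # True
```

A single pooling layer only absorbs small shifts. Averaging over the entire map, as in the last line, removes the position altogether.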

The pretrained ResNet50 weights come from ImageNet, which has 1000 object categories, many of them animals. My belief is that the model therefore already has the ability to recognize many different features such as colors, lines, and angles. I can use transfer learning to retrain the model for my problem, which is classifying 8 species of birds.

<img src="demo_assets/resnet50.jpg" alt="Resnet50 Architecture" width="800"/>

> https://stackoverflow.com/questions/54943307/create-cnn-model-architecture-diagram-in-keras

> Optimized Deep Convolutional Neural Networks for Identification of Macular Diseases from Optical Coherence Tomography Images - Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/Left-ResNet50-architecture-Blocks-with-dotted-line-represents-modules-that-might-be_fig3_331364877 [accessed 27 Nov, 2021]

**Architecture**  
ResNet50

**Parameters**  
Total parameters: 21,816,128

**Hyperparameters**  
The learning rate is chosen with FastAI's `learner.lr_find()` method, which helps find the learning rate at which the loss is falling most steeply. On average, this value was `10E-4.5`. Training ran for `24` epochs over 4 training cycles.

**Loss Function**  
FastAI uses `FlattenedLoss of CrossEntropyLoss` as the loss function for this learner. Cross-entropy measures the difference between two probability distributions: the class probabilities output by the model and the true label.
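For a single image with one correct class, cross-entropy reduces to the negative log of the probability the model assigns to that class. The plain-Python sketch below works through this; the logits are made-up numbers and the helper names are illustrative.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])

confident_right = cross_entropy([5.0, 0.0, 0.0], target_index=0)
confident_wrong = cross_entropy([5.0, 0.0, 0.0], target_index=1)
uniform_guess = cross_entropy([0.0] * 8, target_index=3)

print(f"{confident_right:.3f} {confident_wrong:.3f} {uniform_guess:.3f}")
```

The last value equals `ln(8)`, about 2.079. That is the loss of a model that spreads its belief evenly over the 8 bird classes, which makes it a useful reference point when reading the training losses.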

**Performance Metric**  
I chose `Error Rate` as my metric, which is really just `1 - Accuracy`.

***

### Training

In the interest of saving time, I'm just going to quickly scroll through the training cycles. 

At a high level, there are 4 distinct training cycles using FastAI's `fit_one_cycle` and `fine_tune`.

***

##### Training Cycle 1

**Epochs: `5`**  
**Alpha: `10E-2.5`**  

<table>
   <tr>
      <td><img src="demo_assets/lrfind_1.png" alt="LR Find 1" width="600"/></td>
      <td><img src="demo_assets/lrresult_1.png" alt="LR Result 1" width="600"/></td>
   </tr>
</table>

<img src="demo_assets/cm_1.png" alt="Confusion Matrix 1" width="400" align="left"/>

***

##### Training Cycle 2

**Epochs: `8`**  
**Alpha: `10E-3`**  

<table>
   <tr>
      <td><img src="demo_assets/lrresult_2.png" alt="LR Result 2" width="600"/></td>
      <td><img src="demo_assets/cm_2.png" alt="Confusion Matrix 2" width="600"/></td>
   </tr>
</table>

***

##### Training Cycle 3

**Epochs: `3`**  
**Alpha: `10E-6`**  

<table>
   <tr>
      <td><img src="demo_assets/lrfind_3.png" alt="LR Find 3" width="600"/></td>
      <td><img src="demo_assets/lrres_3.png" alt="LR Result 3" width="600"/></td>
   </tr>
</table>

<img src="demo_assets/cmatrix_3.png" alt="Confusion Matrix 3" width="400" align="left"/>

***

##### Training Cycle 4

**Epochs: `6`**  
**Alpha: `10E-6.15`**  

<table>
   <tr>
      <td><img src="demo_assets/lrfind_4.png" alt="LR Find 4" width="600"/></td>
      <td><img src="demo_assets/lrres_4.png" alt="LR Result 4" width="600"/></td>
   </tr>
</table>

<img src="demo_assets/cm_4.png" alt="Confusion Matrix 4" width="400" align="left"/>

As you can see, by the fourth training cycle we have a `train_loss` of 0.027 and a `valid_loss` of 0.023. On our first training cycle `train_loss` was 0.81 and `valid_loss` was 0.69.

# Results

<img src="demo_assets/final_report.png" alt="Final Report" width="800" align="left"/>

**Precision:** ratio `tp / (tp + fp)`

**Recall:** ratio `tp / (tp + fn)`

**F1-score:** harmonic mean of precision and recall, 1 is best, 0 is worst.
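These three metrics can be computed from the true positive, false positive, and false negative counts of a single class. The counts in the example are hypothetical and are not taken from the report above.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical class: 90 correct hits, 10 false alarms, 30 misses.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# precision=0.900 recall=0.750 f1=0.818
```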

As you can see from the report generated by FastAI, the model has great results on the training and validation sets. It appears that the Tufted Titmouse is our worst performer, but it still performed very well.

### Making Predictions with the Test Set

To make sure the model is not overfit and to evaluate its performance before making live predictions, we must test the model by making predictions on the test set.

**Accuracy**: 96.72%  
**Score**: 1915/1980  

Out of 1980 images the model has not yet seen, it predicted 1915 of them correctly.
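The accuracy and the `Error Rate` metric follow directly from the score. A quick check, using only the two numbers reported above:

```python
correct, total = 1915, 1980

accuracy = correct / total
error_rate = 1 - accuracy  # the metric tracked during training

print(f"accuracy   = {accuracy:.2%}")    # accuracy   = 96.72%
print(f"error rate = {error_rate:.2%}")  # error rate = 3.28%
```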

### Live Predictions

<table>
   <tr>
      <td><img src="demo_assets/pred2.png" alt="Pred 2" width="600"/></td>
      <td><img src="demo_assets/pred3.png" alt="Pred 3" width="600"/></td>
   </tr>
</table>

<table>
   <tr>
      <td><img src="demo_assets/pred4.png" alt="Pred 4" width="600"/></td>
      <td><img src="demo_assets/pred5.png" alt="Pred 5" width="600"/></td>
   </tr>
</table>

<table>
   <tr>
      <td><img src="demo_assets/pred6.png" alt="Pred 6" width="600"/></td>
      <td><img src="demo_assets/pred7.png" alt="Pred 7" width="600"/></td>
   </tr>
</table>

<table>
   <tr>
      <td><img src="demo_assets/pred1.gif" alt="Pred 1" width="600"/></td>
      <td><img src="demo_assets/pred8.png" alt="Pred 8" width="600"/></td>
   </tr>
</table>

Eventually, squirrels started becoming aware of the bird feeder and became a problem. As you can see, the model is quite certain that this squirrel is a Tufted Titmouse, and honestly I can't blame it. Of the 8 birds it knows of, I too think that a squirrel looks most similar to a Tufted Titmouse.

<img src="demo_assets/sqi.gif" alt="Squirrel Trouble" width="600"/>

Because squirrels were using the white board to climb up to the feeder, I moved the bird feeder away from the board. It is encouraging to note that the model still performed well even when this familiar background was removed.

<img src="demo_assets/sanity_check.gif" alt="Sanity Check" width="600"/>

# Conclusion

To conclude, **BirdID** is a machine learning project used to predict the species of local birds in my backyard. The ML model was trained via transfer learning using the ResNet50 convolutional neural network with 8000 images of 8 local bird species, collected using object detection techniques.

The model is set up to easily accommodate more classes as I collect more data.

This was a difficult project, mainly because I chose to collect my own dataset, but I learned a lot about computer vision, machine learning, and birds in the process. I graduate in December and I plan to start working on BirdID version 2 in my free time.

# Q&A

Any questions?