# Backyard Bird Classifier

## 1. Business Understanding

Project FeederWatch, operated by the Cornell Lab of Ornithology and Birds Canada, is a community science program that encourages individuals across North America to monitor and report bird species visiting their feeders. The project has been running since 1987 and provides a valuable long-term dataset to scientists studying the distribution, abundance, and migration patterns of wintering bird species.

Participants submit bird counts from their homes, nature centers, schools, or community spaces on a flexible schedule between November and April. These data have been used in numerous scientific studies, revealing trends in species ranges, population shifts, and feeder use. However, because observers span all skill levels and many are amateur birders, identification errors are common. These errors introduce noise into the dataset, potentially obscuring long-term trends and biological signals.

To mitigate this limitation, this project proposes a computer vision approach to improve the consistency and accuracy of bird species identification from images. By training a machine learning model to classify feeder birds, we can provide an automated or semi-automated tool to assist human observers and validate their reports.

Cornell’s NABirds dataset contains over 48,000 annotated images of the 400 bird species commonly observed in North America, organized into 555 visual categories that include separate annotations for males, females, and juveniles. This project will subset that dataset using Cornell’s list of the 100 most common feeder birds and train a deep learning model to recognize them from photographs. The long-term goal is to support systems that can automatically identify feeder birds from camera traps or assist FeederWatch participants in double-checking their own identifications.

### Objective

Build and evaluate an image classification model that can:

- Automatically identify bird species from a curated list of the 100 most common feeder birds
- Achieve greater than 80% overall classification accuracy
- Provide per-class metrics such as precision, recall, and F1-score to assess individual species performance
- Incorporate interpretability tools (e.g., LIME and Grad-CAM) to explain model predictions
- Be exportable to lightweight formats (e.g., ONNX) for deployment on edge devices like the Raspberry Pi
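
The accuracy target and per-class metrics listed above can be computed directly from paired true and predicted labels. The following is a minimal sketch in plain Python, not the project's evaluation code: the function name `classification_metrics` and the toy labels are illustrative assumptions, and in practice a library routine such as scikit-learn's `classification_report` would likely be used instead.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return overall accuracy and per-class precision, recall, and F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one labeled example is required")

    true_pos = Counter()   # predicted class c and truly class c
    false_pos = Counter()  # predicted class c but truly another class
    false_neg = Counter()  # truly class c but predicted another class
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            true_pos[truth] += 1
        else:
            false_pos[pred] += 1
            false_neg[truth] += 1

    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp, fp, fn = true_pos[label], false_pos[label], false_neg[label]
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall,
                            "f1": f1, "support": tp + fn}

    accuracy = sum(true_pos.values()) / len(y_true)
    return accuracy, per_class


# Toy labels for illustration only; these are not project results.
y_true = ["Blue Jay", "Blue Jay", "House Finch", "House Finch", "Northern Cardinal"]
y_pred = ["Blue Jay", "House Finch", "House Finch", "House Finch", "Northern Cardinal"]

accuracy, per_class = classification_metrics(y_true, y_pred)
print(f"accuracy = {accuracy:.2f} (target: > 0.80)")
for label, m in per_class.items():
    print(f"{label}: P={m['precision']:.2f} R={m['recall']:.2f} "
          f"F1={m['f1']:.2f} n={m['support']}")
```

Reporting support alongside each score matters here because a 100-class feeder-bird subset is likely to be imbalanced, and a high overall accuracy can hide poor recall on rarely photographed species.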

### Data Sources

- NABirds Dataset: [https://dl.allaboutbirds.org/nabirds](https://dl.allaboutbirds.org/nabirds)
- FeederWatch Program: [https://feederwatch.org](https://feederwatch.org)
- 100 Common Feeder Birds: [https://feederwatch.org/learn/common-feeder-birds/](https://feederwatch.org/learn/common-feeder-birds/)

### Data Understanding

#### Dataset Access

This notebook uses a public subset of the NABirds dataset hosted on Kaggle:

**Dataset:** [Backyard Feeder Birds (NABirds Subset)](https://www.kaggle.com/datasets/jakemccaig/backyard-feeder-birds-nabirds-subset)  
**Owner:** [jakemccaig](https://www.kaggle.com/jakemccaig)  
**Structure:** ImageNet-style `train/`, `val/`, and `class_labels.txt`


```python
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```
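
A natural first check on an ImageNet-style layout is how many images each class contains. The sketch below assumes that `train/` and `val/` each hold one subfolder per class, which is the usual meaning of "ImageNet-style" but is not confirmed here. The `DATA_ROOT` path, the function name `count_images_per_class`, and the accepted file extensions are hypothetical and should be adjusted to the actual dataset.

```python
import os
from collections import Counter

# Assumed image extensions; extend if the dataset uses others.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def count_images_per_class(split_dir):
    """Count image files in each class subfolder of an ImageNet-style split."""
    counts = Counter()
    for entry in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, entry)
        if not os.path.isdir(class_dir):
            continue  # skip stray files such as class_labels.txt
        counts[entry] = sum(
            1 for name in os.listdir(class_dir)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


# Hypothetical location; adjust to wherever the dataset is mounted.
DATA_ROOT = "data"

for split in ("train", "val"):
    split_dir = os.path.join(DATA_ROOT, split)
    if not os.path.isdir(split_dir):
        print(f"{split}: directory not found at {split_dir}")
        continue
    counts = count_images_per_class(split_dir)
    print(f"{split}: {len(counts)} classes, {sum(counts.values())} images")
    # Show the five classes with the fewest images, smallest first.
    for name, n in counts.most_common()[:-6:-1]:
        print(f"  fewest images: {name} ({n})")
```

The returned `Counter` can be passed to `pd.Series(counts)` for plotting the class distribution with the libraries imported above.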

### Data Prep/Cleaning

### Modeling

### Evaluation/Interpretability

### Deployment