<h2><center> Image Classification of Birds using the iNaturalist Dataset</center></h2>
<h3><center>Springboard | Capstone 2: In-Depth Analysis/Modeling </center></h3>
<h4><center>By: Lauren Broussard</center></h4>

In [None]:
# !pip install tensorflow
# !pip install keras

First, we'll import the necessary packages. 

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split  # split into training and testing data
from sklearn.metrics import classification_report, confusion_matrix

import keras
from keras.callbacks import EarlyStopping
from keras.preprocessing.image import ImageDataGenerator  # image loading and augmentation
from keras.models import Sequential, load_model  # build and reload models
from keras.layers import Activation, Dense, Dropout, Flatten  # Dense adds fully connected layers
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD  # optimizer with a configurable learning rate

----

To keep run time manageable, we'll run the project end to end on the number of images entered below.

In [None]:
IMG_COUNT = int(input("Total Images To Use: ")) 

----

### PRE-PROCESSING

Since the computing power on my personal laptop is limited, I will run the project end to end with a smaller number of images, and will gradually increase that number over time.

To this end, we'll import our original file and grab a random subset of images to train, validate, and test with. Further, to balance our dataset, we'll take equal numbers of images that show birds and images that do not.
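
As a rough illustration of the balanced sampling described above, the sketch below draws the same number of rows from each class of a small made-up DataFrame. The toy data and `n_per_class` are assumptions for the example only; the column names mirror the ones used in this notebook.

```python
import pandas as pd

# Hypothetical toy annotations: 12 bird images and 8 non-bird images.
toy = pd.DataFrame({
    "file_name": [f"img_{i}.jpg" for i in range(20)],
    "is_bird": [1] * 12 + [0] * 8,
})

n_per_class = 4  # assumed value, stands in for half of IMG_COUNT

# Sample the same number of rows from each class, then stack the two groups.
balanced = pd.concat(
    [toy[toy.is_bird == label].sample(n=n_per_class, random_state=42)
     for label in (1, 0)],
    axis=0,
)

# Count rows per class to confirm the sample is balanced.
counts = balanced.is_bird.value_counts().to_dict()
print(counts)
```

Checking `value_counts()` on the sampled frame is a cheap way to confirm both classes are equally represented before splitting.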

In [None]:
# import clean annotation file
df = pd.read_csv('../data/interim/wildlife_interim.csv')

In [None]:
# get random sample of images -- half birds, and half not birds
df_bird = df[df.is_bird == 1].sample(n=IMG_COUNT // 2, random_state=42)  # half birds
df_no_bird = df[df.is_bird == 0].sample(n=IMG_COUNT // 2, random_state=42)  # half not birds
df = pd.concat([df_bird, df_no_bird], axis=0)  # merge two groups together
# flow_from_dataframe with class_mode="binary" expects string labels
df.is_bird = df.is_bird.astype(str)

In [None]:
# split dataframe into training, validation, and test dataframes:
# hold out 20% for validation, then 30% of the remainder for testing
df_train, df_val = train_test_split(df, test_size=0.2, random_state=42)
df_train, df_test = train_test_split(df_train, test_size=0.3, random_state=42)

print("")
print("Number of Training Images: ",len(df_train))
print("Number of Validation Images: ", len(df_val))
print("Number of Test Images: ", len(df_test),'\n')
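
Because the second split is applied to what remains after the first, the effective share of each subset differs from the raw `test_size` values. A quick arithmetic check, using only the two fractions from the cell above (the variable names here are made up for the example):

```python
# Fractions passed as test_size in the two train_test_split calls.
val_frac = 0.2           # first split: share held out for validation
test_frac_of_rest = 0.3  # second split: share of the remainder held out for testing

# Effective share of the full sample that ends up in each subset.
fractions = {
    "train": (1 - val_frac) * (1 - test_frac_of_rest),
    "test": (1 - val_frac) * test_frac_of_rest,
    "val": val_frac,
}

for name, frac in fractions.items():
    print(f"{name}: {frac:.0%}")
```

So roughly 56% of the sampled images are used for training, 24% for testing, and 20% for validation; the printed counts can differ from these shares by an image or so because of rounding.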

### SET UP 

We'll set some constant values for our model below. 

In [None]:
# Model configuration
img_dir = "../data/raw/" # where images are stored
img_width, img_height = 128,128 # resize images to account for smallest img size
batch_size = 128
no_epochs = 50
no_classes = 2 # choices of bird or not bird
patience = 7 # for Early Stopping callback 

print('img_dir = ', img_dir)
print('img_width, img_height = ',img_width, img_height)
print('batch_size = ', batch_size)
print('no_epochs = ', no_epochs)
print('no_classes = ', no_classes)
print('patience = ', patience, '\n')

# stop training once val_loss (the default monitor) has not improved for `patience` epochs
early_stopping_monitor = EarlyStopping(patience=patience) 
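
To make the role of `patience` concrete, here is a simplified pure-Python sketch of the idea behind early stopping. It is not Keras' actual implementation; `stopping_epoch` and the loss values are hypothetical and exist only for illustration.

```python
def stopping_epoch(val_losses, patience):
    """Return the zero-based epoch at which training would stop, or None.

    Simplified illustration: stop once the validation loss has failed to
    improve on its best value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss   # new best value: reset the counter
            wait = 0
        else:
            wait += 1     # another epoch without improvement
            if wait >= patience:
                return epoch
    return None           # training ran through every epoch


# Made-up validation losses: the best value is reached at epoch 2.
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71]
stopped_at = stopping_epoch(losses, patience=3)
never_stopped = stopping_epoch(losses, patience=7)
print(stopped_at, never_stopped)
```

A larger `patience` tolerates longer plateaus at the cost of extra epochs; with only 50 epochs configured, a value of 7 leaves room for several plateaus before training is cut short.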

#### Augment Images - Training

We'll apply random augmentation (horizontal flips, zooms, and width and height shifts) to the training images, which can help prevent overfitting. Pixel values are also rescaled to the range 0 to 1.
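
For intuition, the sketch below mimics two of these operations, rescaling and a horizontal flip, on a tiny made-up array using plain NumPy. It only illustrates the effect on pixel values and is not how `ImageDataGenerator` is implemented internally.

```python
import numpy as np

# Hypothetical 2x3 single-channel "image" with 8-bit pixel values.
image = np.array([[0, 128, 255],
                  [64, 32, 16]], dtype=np.uint8)

# Rescaling: multiply by 1/255 so values fall between 0 and 1.
rescaled = image * (1.0 / 255)

# Horizontal flip: reverse the column order of every row.
flipped = image[:, ::-1]

print(rescaled.round(3))
print(flipped)
```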

In [None]:
%%time
print("")
print("Augment Images - Training")
train_datagen = ImageDataGenerator(
                    rescale=1./255, 
                    horizontal_flip = True,    
                    zoom_range = 0.3,
                    width_shift_range = 0.15,
                    height_shift_range=0.15)


train_generator = train_datagen.flow_from_dataframe(
                    dataframe= df_train,
                    directory=img_dir,
                    x_col="file_name",
                    y_col="is_bird",
                    batch_size=batch_size,
                    shuffle=True,
                    class_mode="binary",    
                    target_size=(img_height,img_width))
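
With a fixed `batch_size`, the number of batches per epoch follows from the number of training images. A small sketch of that arithmetic, where `batches_per_epoch` is a hypothetical helper and 560 is an assumed image count:

```python
import math


def batches_per_epoch(n_images, batch_size):
    """Number of batches needed to see every image once.

    The final batch may hold fewer than batch_size images.
    """
    return math.ceil(n_images / batch_size)


# Assumed example: 560 training images with the batch size used above.
n_batches = batches_per_epoch(560, 128)
print(n_batches)
```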


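#### Augment Images - Validation and Testing

The validation images are only rescaled, not randomly augmented, so that the model is evaluated on unaltered images.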

In [None]:
%%time
print("")
print("Augment Images - Validation and Testing")
val_datagen = ImageDataGenerator(rescale=1./255)

valid_generator=val_datagen.flow_from_dataframe(
                    dataframe=df_val,
                    directory=img_dir,
                    x_col="file_name",
                    y_col="is_bird",
                    batch_size=batch_size,
                    class_mode="binary",    
                    target_size=(img_height,img_width))

#### Define Functions for Plotting:

In [None]:
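def plot_accuracy_and_loss(model):
    '''Create graphs of training and validation accuracy and loss.

    `model` is expected to be the History object returned by fit(),
    whose `history` attribute is a dict of per-epoch metrics.
    '''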
    
    acc = model.history['accuracy']
    val_acc = model.history['val_accuracy']

    loss = model.history['loss']
    val_loss = model.history['val_loss']

    plt.figure(figsize=(10, 10))
    plt.subplot(1, 2, 1)
    plt.plot(acc, label='Training Accuracy')
    plt.plot(val_acc, label='Validation Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Score')
    plt.legend(loc='lower right')
    plt.title('Training and Validation Accuracy')

    plt.subplot(1, 2, 2)
    plt.plot(loss, label='Training Loss')
    plt.plot(val_loss, label='Validation Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend(loc='upper right')
    plt.title('Training and Validation Loss')
    plt.show()
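
Alongside the plots, the same per-epoch metrics can be inspected numerically. The sketch below assumes a history dictionary with the keys used in the function above; `best_epoch` and the sample values are hypothetical.

```python
def best_epoch(history_dict, metric="val_loss"):
    """Return the zero-based epoch with the lowest value of `metric`."""
    values = history_dict[metric]
    return min(range(len(values)), key=values.__getitem__)


# Made-up metrics in the same shape as a Keras History.history dict.
fake_history = {
    "loss": [0.9, 0.6, 0.4, 0.3],
    "val_loss": [0.8, 0.6, 0.65, 0.7],
}

epoch = best_epoch(fake_history)
print("Lowest validation loss at epoch", epoch)
```

A widening gap between training and validation loss after that epoch, as in the made-up values here, is the pattern the loss plot is meant to reveal.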

----