# Bird Vocalization Classifier
A transfer learning approach to classifying bird species from audio spectrograms of their vocalizations.
 
![bird spectrograms](bird_spect.png)

## Abstract
Deep learning is an emerging field that has shown promising results for image classification. This study analyzes how well modern machine learning algorithms classify bird species from recorded calls. Bird songs are specific to each species and carry a distinct signature, and the classifiers leverage that signature. The data set was collected by bird enthusiasts and uploaded to xeno-canto.org, where the files are open to the public. Statistical analysis was performed on these calls, and predictions were made about which species produced each recording; the study was limited to two species, American Robins and Mourning Doves. An audio spectrogram served as the input image to the neural networks, while the raw audio signal was used for template matching. The transfer learning approach, which used the pre-existing neural network known as AlexNet, had the highest accuracy. The shallow neural net had a slightly lower accuracy, and the rudimentary Spectral Angle Mapper (SAM) classifier had the lowest. Each classification technique has associated trade-offs, which are explored in the conclusion of this study.
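
The Spectral Angle Mapper mentioned above compares two signals by the angle between them when each is treated as a vector; a smaller angle means a closer match. The study's SAM implementation is not shown in this notebook, so the following is only a minimal sketch of the standard spectral-angle formula. The names `spectral_angle` and `classify_by_angle` are hypothetical, and the toy vectors stand in for real templates.

```python
import math

def spectral_angle(a, b):
    """Angle in radians between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)

def classify_by_angle(sample, templates):
    """Return the label of the template with the smallest spectral angle."""
    return min(templates, key=lambda label: spectral_angle(sample, templates[label]))

# Toy vectors for illustration only; these are not real bird-call data
templates = {"class_a": [1.0, 0.0, 0.5], "class_b": [0.0, 1.0, 0.5]}
print(classify_by_angle([0.9, 0.1, 0.4], templates))
```

Because the angle ignores vector length, this kind of classifier is insensitive to overall signal amplitude, which is one reason it is often used as a simple baseline.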


In [1]:

from __future__ import absolute_import, division, print_function, unicode_literals

import os
import pathlib
import numpy as np
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import IPython.display as display
from PIL import Image

import tensorflow as tf
keras = tf.keras

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D


## Define AlexNet
![AlexNet](AlexNet-1.png)
Source: https://www.learnopencv.com/understanding-alexnet/


In [2]:

def compileAlexNet(numClasses,learningRate) :
    # Build and compile an AlexNet-style CNN for numClasses output classes.
    # Note: the original AlexNet uses a 5x5 kernel in the 2nd convolution and
    # 3x3 pooling windows; this variant uses 11x11 and 2x2 respectively.

    # Input shape (height, width, channels)
    IMG_SHAPE = (224, 224, 3)

    # Instantiate an empty model
    model = Sequential()

    # 1st Convolutional Layer
    model.add(Conv2D(filters=96, input_shape=IMG_SHAPE, kernel_size=(11,11), strides=(4,4), padding='valid'))
    model.add(Activation('relu'))
    # Max Pooling
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))

    # 2nd Convolutional Layer
    model.add(Conv2D(filters=256, kernel_size=(11,11), strides=(1,1), padding='valid'))
    model.add(Activation('relu'))
    # Max Pooling
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))

    # 3rd Convolutional Layer
    model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='valid'))
    model.add(Activation('relu'))

    # 4th Convolutional Layer
    model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='valid'))
    model.add(Activation('relu'))

    # 5th Convolutional Layer
    model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='valid'))
    model.add(Activation('relu'))
    # Max Pooling
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='valid'))

    # Flatten the feature maps before the fully connected layers
    model.add(Flatten())
    # 1st Fully Connected Layer (input size is inferred from Flatten)
    model.add(Dense(4096))
    model.add(Activation('relu'))
    model.add(Dropout(.4))

    # 2nd Fully Connected Layer
    model.add(Dense(4096))
    model.add(Activation('relu'))
    model.add(Dropout(.4))

    # Output Layer
    # Note: softmax is the conventional activation to pair with categorical
    # cross-entropy; sigmoid is kept so the model matches the reported runs.
    model.add(Dense(numClasses))
    model.add(Activation('sigmoid'))

    #model.summary()

    # Compile the model
    model.compile(loss      = keras.losses.categorical_crossentropy,
                  optimizer = keras.optimizers.Adam(lr=learningRate),
                  metrics   = ["accuracy"])

    return model
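
Because every convolution and pooling layer above uses `padding='valid'`, each one shrinks the feature map according to `floor((size - kernel) / stride) + 1`. The sketch below traces that arithmetic for a 224-pixel input, using the kernel sizes and strides from `compileAlexNet`. The helper name `valid_output_size` is hypothetical, and the calculation is plain Python that does not call Keras.

```python
def valid_output_size(size, kernel, stride):
    """Output length of a 'valid' (unpadded) convolution or pooling layer."""
    return (size - kernel) // stride + 1

# (name, kernel, stride) for each spatial layer in compileAlexNet, in order
layers = [
    ("conv1", 11, 4), ("pool1", 2, 2),
    ("conv2", 11, 1), ("pool2", 2, 2),
    ("conv3", 3, 1), ("conv4", 3, 1), ("conv5", 3, 1),
    ("pool3", 2, 2),
]

size = 224
trace = []
for name, kernel, stride in layers:
    size = valid_output_size(size, kernel, stride)
    trace.append((name, size))
    print(f"{name}: {size} x {size}")

# The last convolution has 256 filters, so Flatten sees size * size * 256 values
flattened = size * size * 256
print("flattened length:", flattened)
```

Under these settings the feature map shrinks to 1 x 1 before the dense layers, so the first 4096-unit layer receives 256 inputs.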
    

## Function for Defining Data Generators

In [3]:

def defineGenerators(dataSet, targSize, shear=0.2, zoom=0.2, horzFlip=True) :
    # Build the train/validation/test generators for the named data set.
    # targSize is [height, width, channels]; a channel count of 1 loads grayscale.

    if targSize[2] == 1 :
        colorType = "grayscale"
    else :
        colorType = "rgb"

    # Root folder holding the Train/, Valid/ and Test/ subfolders for this data set
    baseDir = "C:/Users/Nathan Kueterman/Documents/bird_GAN_python/2 Class/" + dataSet

    # Define Generators (augmentation is applied to the training images only)
    train_datagen = keras.preprocessing.image.ImageDataGenerator(
                    rescale=1./255,
                    shear_range=shear,
                    zoom_range=zoom,
                    horizontal_flip=horzFlip)

    valid_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

    test_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

    # Define Data Flow
    train_generator = train_datagen.flow_from_directory(
        directory=baseDir+"/Train/",
        target_size=(targSize[0:2]),
        color_mode=colorType,
        batch_size=32,
        class_mode="categorical",
        shuffle=True,
        seed=42)

    valid_generator = valid_datagen.flow_from_directory(
        directory=baseDir+"/Valid/",
        target_size=(targSize[0:2]),
        color_mode=colorType,
        batch_size=32,
        class_mode="categorical",
        shuffle=True,
        seed=42)

    # Test images are yielded one at a time, without labels and in file order,
    # so predictions line up with test_generator.classes
    test_generator = test_datagen.flow_from_directory(
        directory=baseDir+"/Test/",
        target_size=(targSize[0:2]),
        color_mode=colorType,
        batch_size=1,
        class_mode=None,
        shuffle=False,
        seed=42)

    return train_generator,valid_generator,test_generator
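
The generators above deliver training and validation images in batches of 32, and the training cells in this notebook derive the number of steps by floor division of the image count by the batch size. As a rough illustration in plain Python, the snippet below shows how many images that many steps cover, next to the ceiling-based step count that would cover every image. The helper name `steps_and_coverage` is hypothetical, and the image counts are the ones reported in the output of the real-data run.

```python
import math

def steps_and_coverage(num_images, batch_size):
    """Return (floor steps, images covered by those steps, ceil steps)."""
    floor_steps = num_images // batch_size
    covered = floor_steps * batch_size
    ceil_steps = math.ceil(num_images / batch_size)
    return floor_steps, covered, ceil_steps

# 918 training and 80 validation images with batch_size=32;
# 80 test images with batch_size=1
for name, n, batch in [("train", 918, 32), ("valid", 80, 32), ("test", 80, 1)]:
    floor_steps, covered, ceil_steps = steps_and_coverage(n, batch)
    print(f"{name}: {floor_steps} steps cover {covered}/{n} images; "
          f"{ceil_steps} steps would cover all")
```

With `batch_size=1`, as in the test generator, floor division loses nothing, so every test image receives a prediction.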


## Test Performance Using Only Real Data

In [4]:

# Train model - Only Real Data
train_generator,valid_generator,test_generator = defineGenerators("Gray", [224,224,3])

# Instantiate Model
model1 = compileAlexNet(train_generator.num_classes,1e-5)

# Train
# Number of full batches per epoch (floor division ignores any remainder)
STEP_SIZE_TRAIN = train_generator.n // train_generator.batch_size
STEP_SIZE_VALID = valid_generator.n // valid_generator.batch_size
model1.fit_generator(generator=train_generator,
                     steps_per_epoch=STEP_SIZE_TRAIN,
                     validation_data=valid_generator,
                     validation_steps=STEP_SIZE_VALID,
                     epochs=20)

model1.evaluate_generator(generator=valid_generator,
                          steps=STEP_SIZE_VALID)

# Predict on the test set. The test generator uses batch_size=1 and
# shuffle=False, so predictions stay aligned with test_generator.classes.
STEP_SIZE_TEST = test_generator.n // test_generator.batch_size
test_generator.reset()
pred = model1.predict_generator(test_generator,
                                steps=STEP_SIZE_TEST,
                                verbose=1)

# Pick the highest-scoring class for each test image
predicted_class_indices = np.argmax(pred, axis=1)

# Invert class_indices (name -> index) to map predicted indices back to class names
labels = train_generator.class_indices
labels = dict((v, k) for k, v in labels.items())

predictions = [labels[k] for k in predicted_class_indices]

# Display test results (rows: true class, columns: predicted class)
confMat1 = confusion_matrix(test_generator.classes, predicted_class_indices)
print("Confusion matrix:\n%s" % confMat1)


Found 918 images belonging to 2 classes.
Found 80 images belonging to 2 classes.
Found 80 images belonging to 2 classes.
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Confusion matrix:
[[36  4]
 [ 3 37]]
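
A confusion matrix from scikit-learn places true classes on rows and predicted classes on columns, so overall accuracy is the diagonal sum divided by the total, and per-class recall is each diagonal entry divided by its row sum. The following plain-Python sketch applies this to the matrix printed above. The function name `summarize` is hypothetical, and classes are referred to by index because the class folder names are not shown here.

```python
def summarize(conf_mat):
    """Return (accuracy, per-class recall, per-class precision) for a square matrix."""
    n = len(conf_mat)
    total = sum(sum(row) for row in conf_mat)
    correct = sum(conf_mat[i][i] for i in range(n))
    # Recall: correct predictions for a class over all true members (row sum)
    recall = [conf_mat[i][i] / sum(conf_mat[i]) for i in range(n)]
    # Precision: correct predictions for a class over all predicted members (column sum)
    precision = [conf_mat[i][i] / sum(row[i] for row in conf_mat) for i in range(n)]
    return correct / total, recall, precision

accuracy, recall, precision = summarize([[36, 4], [3, 37]])
print(f"accuracy:  {accuracy:.4f}")
print("recall:   ", [round(r, 3) for r in recall])
print("precision:", [round(p, 3) for p in precision])
```

For this run, 73 of the 80 test images are classified correctly.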


## Test Performance Using Only Synthetic Data
![gan spectrograms](robin_dove_comparison.png)
<center>Synthetic Spectrograms: American Robin (left) vs Mourning Dove (right)</center>


In [5]:

# Train model - Only Synthetic Data
train_generator,valid_generator,test_generator = defineGenerators("GAN", [224,224,3])

# Instantiate Model
model2 = compileAlexNet(train_generator.num_classes,1e-5)

# Train
STEP_SIZE_TRAIN = train_generator.n//train_generator.batch_size
STEP_SIZE_VALID = valid_generator.n//valid_generator.batch_size
model2.fit_generator(generator=train_generator,
                     steps_per_epoch=STEP_SIZE_TRAIN,
                     validation_data=valid_generator,
                     validation_steps=STEP_SIZE_VALID,
                     epochs=20)

model2.evaluate_generator(generator=valid_generator,
                          steps=STEP_SIZE_VALID)

STEP_SIZE_TEST = test_generator.n//test_generator.batch_size
test_generator.reset()
pred = model2.predict_generator(test_generator,
                                steps=STEP_SIZE_TEST,
                                verbose=1)

predicted_class_indices = np.argmax(pred,axis=1)

labels = (train_generator.class_indices)
labels = dict((v,k) for k,v in labels.items())

predictions = [labels[k] for k in predicted_class_indices]

# display test results
confMat2 = confusion_matrix(test_generator.classes, predicted_class_indices)
print("Confusion matrix:\n%s" % confMat2)


Found 1996 images belonging to 2 classes.
Found 80 images belonging to 2 classes.
Found 80 images belonging to 2 classes.
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Confusion matrix:
[[37  3]
 [ 3 37]]


## Test Performance Using Both Real and Synthetic Data

In [6]:

# Train model - Both Real and Synthetic Data
train_generator,valid_generator,test_generator = defineGenerators("Combined", [224,224,3])

# Instantiate Model
model3 = compileAlexNet(train_generator.num_classes,1e-5)

# Train
STEP_SIZE_TRAIN = train_generator.n//train_generator.batch_size
STEP_SIZE_VALID = valid_generator.n//valid_generator.batch_size
model3.fit_generator(generator=train_generator,
                     steps_per_epoch=STEP_SIZE_TRAIN,
                     validation_data=valid_generator,
                     validation_steps=STEP_SIZE_VALID,
                     epochs=20)

model3.evaluate_generator(generator=valid_generator,
                          steps=STEP_SIZE_VALID)

STEP_SIZE_TEST = test_generator.n//test_generator.batch_size
test_generator.reset()
pred = model3.predict_generator(test_generator,
                                steps=STEP_SIZE_TEST,
                                verbose=1)

predicted_class_indices = np.argmax(pred,axis=1)

labels = (train_generator.class_indices)
labels = dict((v,k) for k,v in labels.items())

predictions = [labels[k] for k in predicted_class_indices]

# display test results
confMat3 = confusion_matrix(test_generator.classes, predicted_class_indices)
print("Confusion matrix:\n%s" % confMat3)


Found 1759 images belonging to 2 classes.
Found 80 images belonging to 2 classes.
Found 80 images belonging to 2 classes.
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Confusion matrix:
[[40  0]
 [ 2 38]]
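
For a side-by-side view, the three confusion matrices reported in this notebook can be reduced to correct counts and test accuracy. The sketch below is plain Python with the matrices copied from the cell outputs; the dictionary keys are informal labels, not identifiers from the notebook.

```python
results = {
    "real only":      [[36, 4], [3, 37]],
    "synthetic only": [[37, 3], [3, 37]],
    "combined":       [[40, 0], [2, 38]],
}

summary = {}
for name, mat in results.items():
    total = sum(sum(row) for row in mat)
    correct = sum(mat[i][i] for i in range(len(mat)))
    summary[name] = (correct, total, correct / total)
    print(f"{name:15s} {correct}/{total} correct, accuracy {correct / total:.4f}")
```

Each test image is worth 1.25 percentage points on an 80-image test set, so the gap between the real-only and synthetic-only runs corresponds to a single image, while the combined run gets four to five more images right than either.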
