### Training a CNN to Identify Birdsong Snippets, February 28, 2019

This notebook contains code for designing and training a convolutional neural network (CNN) to identify snippets taken from birdsong samples. The trained model is then used to classify individual snippets and whole WAV files drawn from a training set, a validation set, and a test set. The notebook follows the process described in the book chapter "Using Neural Networks to Identify Bird Species from Birdsong Samples" by Russell Houpt, Mark Pearson, Paul Pearson, Taylor Rink, Sarah Seckler, Darin Stephenson, and Allison VanderStoep. This work was done at Hope College between 2016 and 2018.

First, we load some useful packages and options.

In [None]:
import numpy as np
np.set_printoptions(suppress = True, precision = 8)
import pandas as pd
import h5py
import random

df = pd.read_csv('fileListdf.csv')
speciesList = ["Bananaquit","Roadside Hawk","Green Violetear","Buff-breasted Wren"]
df_train = pd.read_csv('traindf.csv')
df_valid = pd.read_csv('validdf.csv')
df_test = pd.read_csv('testdf.csv')

import keras
from keras.models import Sequential
# Only the layers used below are imported; Conv2D and MaxPooling2D are
# available directly from keras.layers.
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
from keras.callbacks import History
from keras import regularizers

saveDirectory = '/home/stephenson/Documents/birdsongs/Publication Notebooks/'

The function <b>get_onehot</b> takes a snippet file id and returns the one-hot encoding of the correct species for that snippet.
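
As a minimal sketch of the one-hot idea, the example below replaces the CSV lookup with a small dictionary. The ids, the snippet name `"2-7"`, and the helper name `onehot_for` are hypothetical; the only thing taken from the notebook is the convention that the text before the first hyphen of a snippet name is the recording id.

```python
import numpy as np

species = ["Bananaquit", "Roadside Hawk", "Green Violetear", "Buff-breasted Wren"]

# Hypothetical stand-in for the recording table: recording id -> species name.
id_to_name = {1: "Roadside Hawk", 2: "Green Violetear"}

def onehot_for(snippet_id, species, id_to_name):
    # The text before the first '-' is taken to be the recording id.
    rec_id = int(snippet_id.split("-")[0])
    name = id_to_name[rec_id]
    # Row k of the identity matrix is the one-hot vector for species k.
    return np.identity(len(species), dtype=int)[species.index(name)]

print(onehot_for("2-7", species, id_to_name))  # [0 0 1 0]
```

The position of the single 1 matches the index of the species in the list, which is what lets `np.argmax` recover the species index later.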

The <b>generator</b> function yields randomly selected batches of snippets, together with their one-hot labels, for training and validation.
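
The batching pattern can be sketched on toy data as follows. This is an illustration under assumptions, not the notebook's generator: the dictionaries `data` and `labels` stand in for the HDF5 file and the label lookup, and the name `toy_generator` and the tiny array shapes are made up.

```python
import random
import numpy as np

def toy_generator(names, data, labels, batch_size):
    # Loop forever; each pass builds one batch by sampling names with replacement.
    while True:
        picks = [random.choice(names) for _ in range(batch_size)]
        features = np.stack([data[n] for n in picks])
        targets = np.stack([labels[n] for n in picks])
        yield features, targets

# Hypothetical in-memory stand-in for the snippet file: three tiny 2x3x1 arrays.
data = {
    "1-0": np.zeros((2, 3, 1)),
    "1-1": np.ones((2, 3, 1)),
    "2-0": np.full((2, 3, 1), 2.0),
}
labels = {
    "1-0": np.array([1, 0]),
    "1-1": np.array([1, 0]),
    "2-0": np.array([0, 1]),
}

gen = toy_generator(list(data), data, labels, batch_size=4)
x, y = next(gen)
print(x.shape, y.shape)  # (4, 2, 3, 1) (4, 2)
```

Because the generator never terminates, the number of batches drawn per epoch has to be set by the caller, which is the role of the step counts passed to the training call.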

In [None]:
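def get_onehot(fileid,slist=speciesList,df=df):
    # The text before the first '-' in a snippet name is the id of its source recording.
    idn = int(fileid.split('-')[0])
    name = df[df.id == idn]['name'].values[0]
    # Row k of the identity matrix is the one-hot vector for species k.
    return np.identity(len(slist))[slist.index(name)].astype(int)

def generator(filenames,datafile, batch_size,slist=speciesList):
    # Yield (features, labels) batches forever, sampling snippets with replacement.
    while True:
        # Allocate new arrays for every batch so that a batch already handed to
        # Keras is not overwritten while the next one is being filled.
        batch_features = np.zeros((batch_size, 64, 1323, 1))
        batch_labels = np.zeros((batch_size,len(slist)))
        for i in range(batch_size):
            index = random.randrange(filenames.shape[0])
            trainfile = filenames[index]
            batch_features[i] = np.array(datafile[trainfile]).reshape(64,1323,1)
            batch_labels[i] = get_onehot(trainfile, slist)
        yield (batch_features, batch_labels)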

        
h5_file = h5py.File(saveDirectory+"snippets.hdf5","r")
filesTrain = np.array([name for name in h5_file if h5_file[name].attrs['speciesName'] in speciesList 
                      and int(name.split('-')[0]) in df_train['id'].values])
filesValid = np.array([name for name in h5_file if h5_file[name].attrs['speciesName'] in speciesList 
                      and int(name.split('-')[0]) in df_valid['id'].values])
filesTest = np.array([name for name in h5_file if h5_file[name].attrs['speciesName'] in speciesList 
                      and int(name.split('-')[0]) in df_test['id'].values])

Below, we set up a convolutional neural network using Keras as a front end for TensorFlow. This simple CNN is a Keras sequential model, consisting of, in order,
<UL>
    <LI> a $9\times 9$ convolutional layer with 16 channels, 
    <LI> a $4\times 4$ max pooling layer, 
    <LI> a $3\times 3$ convolutional layer with 16 channels, 
    <LI> a $4\times 4$ max pooling layer, 
    <LI> a dense layer with 100 nodes,
    <LI> a second dense layer with 100 nodes, 
    <LI> an output layer with 4 nodes. 
        </UL>
Each convolutional and dense layer is followed by a ReLU nonlinearity, except for the output layer, which is followed by a softmax function. The output of the second pooling layer is flattened before it is passed to the first dense layer.
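
The layer shapes and parameter counts implied by this architecture can be worked out by hand. The sketch below does that arithmetic in plain Python, assuming a $64\times 1323\times 1$ input in channels-last order, `'same'` padding with stride 1 for the convolutions, and non-overlapping `'valid'` pooling; the helper names are hypothetical and no Keras calls are involved.

```python
def conv_same(shape, filters, kernel):
    # 'same' padding with stride 1 keeps height and width; only channels change.
    h, w, c = shape
    params = kernel * kernel * c * filters + filters  # weights plus biases
    return (h, w, filters), params

def max_pool(shape, pool):
    # 'valid' pooling with stride equal to the pool size floors each dimension.
    h, w, c = shape
    return (h // pool, w // pool, c)

def dense(units_in, units_out):
    # Fully connected layer: one weight per input-output pair plus one bias per output.
    return units_out, units_in * units_out + units_out

shape = (64, 1323, 1)
shape, p1 = conv_same(shape, 16, 9)     # (64, 1323, 16), 1312 parameters
shape = max_pool(shape, 4)              # (16, 330, 16)
shape, p2 = conv_same(shape, 16, 3)     # (16, 330, 16), 2320 parameters
shape = max_pool(shape, 4)              # (4, 82, 16)
flat = shape[0] * shape[1] * shape[2]   # 5248 values after flattening
n, p3 = dense(flat, 100)                # 524900 parameters
n, p4 = dense(n, 100)                   # 10100 parameters
n, p5 = dense(n, 4)                     # 404 parameters

total_params = p1 + p2 + p3 + p4 + p5
print(total_params)  # 539036
```

Under these assumptions, almost all of the parameters sit in the first dense layer, because it connects every one of the 5248 flattened values to each of its 100 nodes.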

In [None]:
history = History()
total = []
epochs = 3
train_batch=100
train_steps=1000
val_batch=100
val_steps=200

model = Sequential()

model.add(Conv2D(filters = 16, kernel_size = 9, 
                 strides = 1, padding='same',
                 activation = 'relu',use_bias=True,
                 kernel_initializer='random_uniform',
                 input_shape =(64,1323,1)))

model.add(MaxPooling2D(pool_size=4, strides=None, 
                       padding='valid', data_format=None))

model.add(Conv2D(filters = 16, kernel_size = 3, 
                 strides = 1, padding='same',
                 activation = 'relu',use_bias=True,
                 kernel_initializer='random_uniform'))

model.add(MaxPooling2D(pool_size=4, strides=None, 
                       padding='valid', data_format=None))

model.add(Flatten())

model.add(Dense(100, activation = 'relu'))

model.add(Dense(100, activation = 'relu'))


model.add(Dense(len(speciesList), activation = 'softmax'))
# The string 'adadelta' selects the Adadelta optimizer with its default settings.
model.compile(optimizer = 'adadelta',
              loss ='categorical_crossentropy',
              metrics =['accuracy'])
model.summary()

The following code trains the model, records the training accuracy and loss for each epoch, and saves the trained model to disk.

In [None]:
hist=model.fit_generator(generator(filenames=filesTrain,
                                   datafile=h5_file,batch_size=train_batch), 
                         steps_per_epoch=train_steps, epochs=epochs,
                         callbacks=[history],
                         validation_data=generator(filenames=filesValid, 
                                                   datafile=h5_file,
                                                   batch_size=val_batch),
                         validation_steps = val_steps)
total.append(hist.history)

# Older Keras releases record accuracy under 'acc'; newer ones use 'accuracy'.
acc_key = 'acc' if 'acc' in total[0] else 'accuracy'
train_accuracy     =     np.asarray(total[0][acc_key])
train_loss         =     np.asarray(total[0]['loss'])

model.save("model.h5")

The code below uses the trained model to predict the identity of each snippet in the training, validation, or test set (whichever is selected by <b>mode</b>), and then uses snippet voting to predict the identity of each WAV file in that set.
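
Snippet voting can be illustrated on made-up predictions. In the sketch below, each recording is assigned the class that received the most snippet votes (a plurality vote, with ties going to the lowest class index because of how `np.argmax` breaks ties). The `recordings` list and the helper name `vote` are hypothetical, and no trained model is involved.

```python
import numpy as np

n_species = 4

def vote(snippet_predictions, n_classes):
    # Count how many snippets were assigned to each class.
    counts = np.bincount(snippet_predictions, minlength=n_classes)
    # The class with the most votes wins; ties go to the lowest index.
    return counts, int(np.argmax(counts))

# Hypothetical recordings: (true class, predicted class of each snippet).
recordings = [(0, [0, 0, 2, 0]), (1, [1, 3, 3]), (2, [2, 2, 1, 2, 2])]

# Rows are true classes, columns are predicted classes.
snippet_conf = np.zeros((n_species, n_species), dtype=int)
wav_conf = np.zeros((n_species, n_species), dtype=int)
for true_class, preds in recordings:
    counts, winner = vote(np.array(preds), n_species)
    snippet_conf[true_class] += counts
    wav_conf[true_class, winner] += 1

# Correct predictions lie on the diagonal of each confusion matrix.
print(np.trace(snippet_conf), "of", snippet_conf.sum(), "snippets correct")    # 8 of 12
print(np.trace(wav_conf), "of", wav_conf.sum(), "recordings correct")          # 2 of 3
```

In this toy data the first recording has one misclassified snippet but is still identified correctly, which shows how voting can absorb individual snippet errors.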

In [None]:
model=keras.models.load_model("model.h5")

# mode = "Training Set"
# mode = "Validation Set"
mode = "Test Set"

if mode == "Training Set":
    info = df_train[['id','filename','name']].values
    files = filesTrain
elif mode == "Validation Set":
    info = df_valid[['id','filename','name']].values
    files = filesValid
elif mode == "Test Set":
    info = df_test[['id','filename','name']].values
    files = filesTest


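d = {}  # per-recording vote counts, keyed by recording id
# Confusion matrices: rows are true species, columns are predicted species.
snippet_conf = np.zeros((len(speciesList),len(speciesList))).astype(int)
species_conf = np.zeros((len(speciesList),len(speciesList))).astype(int)
for sample in info:
    idn=sample[0]
    files_idn = [x for x in files if int(x.split('-')[0])==idn]
    if len(files_idn) == 0:
        continue  # no snippets are stored for this recording
    d[idn] = np.zeros(len(speciesList)).astype(int)
    cor = get_onehot(files_idn[0])
    cornum = np.argmax(cor)
    for file in files_idn:
        # Each snippet casts one vote for its most probable species.
        f = model.predict(np.array(h5_file[file]).reshape(1,64,1323,1))
        d[idn][np.argmax(f)]+=1
    snippet_conf[cornum,:] += d[idn]
    # The WAV file is assigned the species with the most snippet votes.
    species_conf[cornum,np.argmax(d[idn])] += 1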
# Correct predictions lie on the diagonal of each confusion matrix.
# The names wav_correct and wav_total avoid overwriting the list `total`
# that holds the training histories.
wav_correct = np.sum(np.diag(species_conf))
wav_total =  np.sum(species_conf)
snip_correct = np.sum(np.diag(snippet_conf))
snip_total = np.sum(snippet_conf)
print(mode+" Accuracy:")
print("Snippets: "+str(snip_correct)+" correct out of "+str(snip_total)
      +". "+str(np.round(100.0*snip_correct/snip_total,1))+" percent correct.")
print("WAV files: "+str(wav_correct)+" correct out of "+str(wav_total)+". "
      +str(np.round(100.0*wav_correct/wav_total,1))+" percent correct.")
print("Snippet Confusion Matrix:")
print(snippet_conf)
print("WAV File Confusion Matrix:")
print(species_conf)