Bird Classifier by Call or Song

It's widely agreed that there are between 9,000 and 10,000 bird species. Some sources double that number (sources).

When outdoors, a birder is often more likely to hear a bird than to see it. This makes identifying bird species difficult in rural and urban areas alike, since visually locating a bird can be hard. However, by using a bird's call or song we can identify the species and decide whether we wish to pursue visual identification. If sound can be used to classify bird species, researchers and amateur birders alike can get a better idea of which species are present and, ideally, map their locations.

Automated classification allows hobbyists to easily retrieve information on the bird species they hear. Additionally, automated recordings could be stored and compared with future recordings to help determine whether a species' population has declined.

Overview Process Flow

The data was gathered from the xeno-canto website.
The top 40 species were determined, and their landing pages were scraped to obtain MP3 URLs.
The MP3 audio files were scraped and stored on AWS S3.
The librosa library was used to transform each sound into a spectrogram.
The spectrograms were processed with a CNN built in Keras to classify bird songs or calls.

Data Sources

http://www.xeno-canto.org/

Data

40 classes of bird species.
33,567 separate audio files.
85 GB of audio data.
The data was split 60/40 into training and validation sets; all input images were 138 x 138 x 1 pixels.
- 20,156 training samples
- 13,438 validation samples
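The README does not say how the 60/40 split was drawn. As a minimal sketch, assuming a per-class (stratified) split is wanted so that each of the 40 species keeps the same 60/40 ratio, the sample indices could be partitioned with NumPy like this. `stratified_split` is a hypothetical helper, not code from this project.

```python
import numpy as np


def stratified_split(labels, train_frac=0.6, seed=0):
    """Split sample indices so every class is divided train_frac / (1 - train_frac).

    Hypothetical helper: returns (train_idx, val_idx) as NumPy index arrays.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_parts, val_parts = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)  # indices belonging to this class
        rng.shuffle(idx)                     # shuffle within the class only
        n_train = int(round(train_frac * len(idx)))
        train_parts.append(idx[:n_train])
        val_parts.append(idx[n_train:])
    return np.concatenate(train_parts), np.concatenate(val_parts)


# Toy example: 40 species with 10 recordings each.
labels = np.repeat(np.arange(40), 10)
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 240 160
```

Splitting inside each class keeps rare species represented in both sets, which a single random shuffle of all 33,567 files does not guarantee.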

Features

A challenge with using sound is that a CNN takes an image as its input.

Sound can be visualized with a spectrogram, a two-dimensional representation of sound. The horizontal axis is time, and the vertical axis is frequency (Hz), or pitch if you prefer. The darkness of a point indicates the loudness of the sound in decibels.
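To make the axes concrete, here is a minimal NumPy sketch of a short-time Fourier transform turned into a decibel spectrogram: each column is one windowed slice of audio (time), each row is a frequency bin, and each value is loudness in dB. `stft_db`, the window length of 512 and the hop of 128 are illustrative assumptions. librosa provides `librosa.stft` and `librosa.amplitude_to_db` for this; they additionally centre (pad) the frames by default and `amplitude_to_db` clips to an 80 dB range, so their output will not match this sketch exactly.

```python
import numpy as np


def stft_db(signal, n_fft=512, hop_length=128):
    """Magnitude STFT in decibels: rows are frequency bins, columns are time frames."""
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    # Each column is one windowed slice of the signal (the "short-time" part).
    frames = np.stack(
        [signal[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=0))  # shape (n_fft // 2 + 1, n_frames)
    # 20 * log10(amplitude), referenced to the loudest point, so 0 dB is the maximum.
    return 20.0 * np.log10(np.maximum(magnitude, 1e-10) / magnitude.max())


sr = 22050                               # sample rate in Hz (assumed)
t = np.arange(sr) / sr                   # one second of samples
tone = np.sin(2 * np.pi * 2000.0 * t)    # synthetic 2 kHz "whistle"
spec = stft_db(tone)
peak_bin = spec.mean(axis=1).argmax()    # loudest frequency row on average
print(spec.shape)                        # (257, 169)
print(peak_bin * sr / 512)               # centre frequency of that row, near 2000 Hz
```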

Here is an example:

[Figure: example spectrogram]

With spectrograms chosen as the input, my next step was deciding which features to extract from the audio and place into the spectrogram.

After investigating several libraries for audio feature sampling and extraction, I decided to use librosa. It's well documented and written in Python, so it fits well with my project. The drawback was processing speed: it is slower than some other libraries because it is written entirely in Python.

For features, I could use the low or percussive sounds, the high-frequency spectrum, the Mel-frequency cepstrum, or the STFT.

The STFT (short-time Fourier transform) was chosen because it yielded the best results on a well-balanced test/validation set of around 1,000 samples in 5 classes. The accuracy of this baseline was 38%.

Convolutional Neural Network

A multi-layer CNN with PReLU activations, one hidden Dense layer, and a sigmoid output layer.
The input was a 138 x 138 x 1 grayscale spectrogram.
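The README does not describe how a variable-length spectrogram became a fixed 138 x 138 x 1 image. One plausible approach, sketched below under that assumption, is to resample both axes to 138 points, min-max scale the values to [0, 1] like a grayscale image, and add a channel axis. `to_cnn_input` and the nearest-neighbour resampling are hypothetical choices, not the project's preprocessing.

```python
import numpy as np

TARGET = 138  # input height and width stated in the README


def to_cnn_input(spec_db, size=TARGET):
    """Resize a 2-D dB spectrogram to size x size x 1 and scale it to [0, 1]."""
    spec_db = np.asarray(spec_db, dtype=np.float32)
    # Nearest-neighbour resampling along both axes (assumption; any resize would do).
    rows = np.linspace(0, spec_db.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, spec_db.shape[1] - 1, size).round().astype(int)
    resized = spec_db[np.ix_(rows, cols)]
    # Min-max scale so the spectrogram behaves like a grayscale picture.
    lo, hi = resized.min(), resized.max()
    scaled = (resized - lo) / (hi - lo) if hi > lo else np.zeros_like(resized)
    return scaled[..., np.newaxis]  # add the single grayscale channel


spec = np.random.default_rng(0).normal(size=(257, 169))  # stand-in spectrogram
x = to_cnn_input(spec)
print(x.shape)  # (138, 138, 1)
```

A batch for Keras would then be `np.stack([to_cnn_input(s) for s in spectrograms])`, with shape `(N, 138, 138, 1)`.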

Problems

Getting data: I was able to scrape the MP3 files and then convert them to spectrograms. After some experimenting, the STFT was found to give the best results of the features tried.

Preprocessing: with limited time, I had to decide how to preprocess the audio to present the best data to the models.

Now that the project is over and is just mine, I want to redo the preprocessing and see whether I can get better results. I believe changing the preprocessing will lead to better results.

Model Architecture

The final model was a CNN implemented in Keras, arrived at after testing various architectures and activation functions. The activation that led to the best results was PReLU.
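PReLU behaves like ReLU for positive inputs but multiplies negative inputs by a learned slope instead of zeroing them. The 591,872 parameters listed for p_re_lu_1 in the model summary equal 136 x 136 x 32, which matches Keras's `PReLU` layer learning a separate slope for every element of its input by default. Passing `shared_axes=[1, 2]` to `keras.layers.PReLU` would share each slope across height and width, leaving 32 parameters for that layer; whether that changes accuracy here is untested. The NumPy function below is a hypothetical illustration of the activation, and 0.25 is an arbitrary example slope.

```python
import numpy as np


def prelu(x, alpha):
    """Parametric ReLU: f(x) = x for x > 0, alpha * x otherwise (alpha is learned)."""
    return np.where(x > 0, x, alpha * x)


# Negative inputs are scaled by alpha; positive inputs pass through unchanged.
print(prelu(np.array([-2.0, 0.0, 3.0]), 0.25))

# With one alpha per input element, a PReLU after an output of
# 136 x 136 x 32 stores this many trainable parameters:
print(int(np.prod((136, 136, 32))))  # 591872
```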

I have since redone the model; it is simpler and runs much faster. I added regularization, which helped keep the model from overfitting and resulted in better validation accuracy. The total number of epochs was also reduced, since the model trains faster.
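The README does not name the regularization, but the model summary in the Results section contains four Dropout layers. Below is a minimal NumPy sketch of inverted dropout, the form Keras's `Dropout` layer applies during training: a random fraction of activations is zeroed and the survivors are scaled up so the expected value is unchanged. `inverted_dropout` is a hypothetical helper, and the rate of 0.25 is an assumption, since the summary does not record the rates used.

```python
import numpy as np


def inverted_dropout(x, rate, rng, training=True):
    """Randomly zero a fraction `rate` of activations during training.

    Survivors are scaled by 1 / (1 - rate) so the expected activation is
    unchanged, which is why nothing needs rescaling at inference time.
    """
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
acts = np.ones((1000, 1000))
dropped = inverted_dropout(acts, rate=0.25, rng=rng)  # 0.25 is a hypothetical rate
print(float((dropped == 0).mean()))  # fraction zeroed, close to 0.25
print(float(dropped.mean()))         # mean activation stays close to 1.0
```

Because a different random subset is dropped on every step, no single unit can be relied on, which discourages the network from memorizing the training spectrograms.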

The overall Keras CNN model summary appears in the Results section below.

Results

With the newer, faster model, after 18 epochs:
model.evaluate_generator(X_test_gen, steps=40)
[3.426939141750336, 0.61171875]  (loss, accuracy)

That is 61% accuracy from a much faster model.
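With `steps=40`, Keras draws 40 batches from the generator, so the reported accuracy covers 40 x batch_size samples rather than necessarily the whole validation set. The batch size is not stated; assuming 32, that is 1,280 samples, and 0.61171875 is exactly 783/1280, which is consistent with (but does not prove) that assumption. The arithmetic, including how many steps would cover all 13,438 validation samples once, is sketched below; `BATCH_SIZE` and both helpers are hypothetical.

```python
import math
from fractions import Fraction

BATCH_SIZE = 32             # assumption: not stated in the README
VALIDATION_SAMPLES = 13_438


def samples_seen(steps, batch_size):
    """Samples consumed when a generator yields `steps` full batches."""
    return steps * batch_size


def steps_for_full_pass(num_samples, batch_size):
    """Steps needed to visit every sample once (the last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


print(samples_seen(40, BATCH_SIZE))                         # 1280
print(steps_for_full_pass(VALIDATION_SAMPLES, BATCH_SIZE))  # 420
print(Fraction("0.61171875") == Fraction(783, 1280))        # True
```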

model.summary()

Layer (type)                    Output Shape            Param #
conv2d_1 (Conv2D)               (None, 136, 136, 32)    320
p_re_lu_1 (PReLU)               (None, 136, 136, 32)    591872
max_pooling2d_1 (MaxPooling2D)  (None, 68, 68, 32)      0
dropout_1 (Dropout)             (None, 68, 68, 32)      0
conv2d_2 (Conv2D)               (None, 66, 66, 64)      18496
p_re_lu_2 (PReLU)               (None, 66, 66, 64)      278784
conv2d_3 (Conv2D)               (None, 64, 64, 64)      36928
p_re_lu_3 (PReLU)               (None, 64, 64, 64)      262144
max_pooling2d_2 (MaxPooling2D)  (None, 32, 32, 64)      0
dropout_2 (Dropout)             (None, 32, 32, 64)      0
conv2d_4 (Conv2D)               (None, 30, 30, 128)     73856
p_re_lu_4 (PReLU)               (None, 30, 30, 128)     115200
conv2d_5 (Conv2D)               (None, 28, 28, 128)     147584
p_re_lu_5 (PReLU)               (None, 28, 28, 128)     100352
max_pooling2d_3 (MaxPooling2D)  (None, 14, 14, 128)     0
dropout_3 (Dropout)             (None, 14, 14, 128)     0
flatten_1 (Flatten)             (None, 25088)           0
dense_1 (Dense)                 (None, 128)             3211392
p_re_lu_6 (PReLU)               (None, 128)             128
dropout_4 (Dropout)             (None, 128)             0
dense_2 (Dense)                 (None, 40)              5160
activation_1 (Activation)       (None, 40)              0

Total params: 4,842,216
Trainable params: 4,842,216
Non-trainable params: 0
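The shapes and parameter counts in this summary can be checked by hand. The sketch below recomputes them in plain Python, assuming 3 x 3 kernels with "valid" padding and 2 x 2 max pooling. These settings are inferred from the shapes (138 to 136, 136 to 68) and from conv2d_1's 320 parameters (3 * 3 * 1 * 32 weights plus 32 biases), not read from the project's code; `conv_valid` and `summarize` are hypothetical helpers.

```python
def conv_valid(h, w, c_in, filters, k=3):
    """k x k Conv2D with 'valid' padding: each spatial side shrinks by k - 1."""
    params = k * k * c_in * filters + filters  # weights + one bias per filter
    return (h - k + 1, w - k + 1, filters), params


def summarize(input_shape=(138, 138, 1), num_classes=40, dense_units=128):
    """Return ([(layer, output_shape, params), ...], total_params)."""
    h, w, c = input_shape
    rows = []
    # (filters, followed by MaxPooling2D + Dropout?) for the five conv layers
    for filters, pool in [(32, True), (64, False), (64, True), (128, False), (128, True)]:
        (h, w, c), params = conv_valid(h, w, c, filters)
        rows.append(("Conv2D", (h, w, c), params))
        rows.append(("PReLU", (h, w, c), h * w * c))  # one alpha per element
        if pool:
            h, w = h // 2, w // 2
            rows.append(("MaxPooling2D", (h, w, c), 0))
            rows.append(("Dropout", (h, w, c), 0))
    flat = h * w * c
    rows.append(("Flatten", (flat,), 0))
    rows.append(("Dense", (dense_units,), flat * dense_units + dense_units))
    rows.append(("PReLU", (dense_units,), dense_units))
    rows.append(("Dropout", (dense_units,), 0))
    rows.append(("Dense", (num_classes,), dense_units * num_classes + num_classes))
    rows.append(("Activation", (num_classes,), 0))
    return rows, sum(p for _, _, p in rows)


rows, total = summarize()
for name, shape, params in rows:
    print(f"{name:<14}{str(shape):<18}{params:>10,}")
print(f"Total params: {total:,}")  # Total params: 4,842,216
```

The recount shows where the weights live: the first Dense layer holds 3,211,392 of the 4,842,216 parameters, and the per-element PReLU layers account for most of the rest.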

With the original model (bird_model_orig):

Using a sample set of 33,567 audio files

60/40 split of the data; all input images were 138 x 138 x 1 pixels
- 20,156 training samples
- 13,438 validation samples

Accuracy: 63.5%

Future Plans

A website
Smartphone app
Free access to use the service and upload recordings, adding to the data store

Repository: lippertr/bird-call-classifier