Polyphonic Bird Sound Event Detection

This repository contains the data and code for reproducing the results reported in "Polyphonic Bird Sound Event Detection With Convolutional Recurrent Neural Networks".

The network architecture was inspired by sednet.
The dataset is called UvABirds and contains 3200+ annotations of bird songs.

Download links

annotations: Google drive

Species       Number of songs annotated
Chiffchaff    883
Great tit     796
Blackbird     566
Warbler       503
Wren          500
Total         3248

audio files: Google drive

trained network: Google drive
This network achieves a frame-wise F-score of 0.94 and an error rate of 0.11 on three concurrent sound events.
The network's input feature hyperparameters are as follows:
sequence length = 512 frames (approximately 12 seconds)
sample rate = 44100 Hz
STFT window length = 512 samples
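
For readers unfamiliar with the two reported metrics, the sketch below shows one way to compute a frame-wise F-score and error rate from binary activity matrices (one row per frame, one column per species). It assumes the definitions commonly used in polyphonic sound event detection, where the error rate counts substitutions, deletions and insertions per frame and divides by the number of reference events. The function name `framewise_scores` is hypothetical, and this is not necessarily the evaluation code used for the numbers above.

```python
def framewise_scores(reference, prediction):
    """Return (f_score, error_rate) for binary activity matrices.

    Both arguments are sequences of frames; each frame is a sequence
    of 0/1 values, one per class. Assumed metric definitions, not
    taken from this repository.
    """
    tp = n_ref = n_pred = 0
    substitutions = deletions = insertions = 0

    for ref_frame, pred_frame in zip(reference, prediction):
        ref_frame = [bool(v) for v in ref_frame]
        pred_frame = [bool(v) for v in pred_frame]

        frame_tp = sum(r and p for r, p in zip(ref_frame, pred_frame))
        frame_fn = sum(ref_frame) - frame_tp   # active in reference, missed
        frame_fp = sum(pred_frame) - frame_tp  # predicted, not in reference

        tp += frame_tp
        n_ref += sum(ref_frame)
        n_pred += sum(pred_frame)

        # A miss paired with a false alarm in the same frame is a substitution.
        substitutions += min(frame_fn, frame_fp)
        deletions += max(0, frame_fn - frame_fp)
        insertions += max(0, frame_fp - frame_fn)

    f_score = 2 * tp / (n_ref + n_pred) if n_ref + n_pred else 0.0
    error_rate = (substitutions + deletions + insertions) / n_ref if n_ref else 0.0
    return f_score, error_rate
```

A higher F-score is better, while a lower error rate is better. Unlike the F-score, the error rate can exceed 1 when a system produces many insertions.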

An audible example of the network's predictions can be found here (YouTube).

How to train

Install the libraries listed in requirements.txt. When using GPUs, make sure to install the CUDA and cuDNN versions that match TensorFlow 1.13.1 (CUDA 10.0 and cuDNN 7.4 for the official GPU build).

Create feature files by running:

python3 --audios_directory ~/audio_dir_here \
                         --annotations_directory ~/annotation_dir_here \
                         --output_directory ~/output_dir_here \
                         --sequence_length 512 \
                         --sample_rate 44100 \
                         --window_length 512 \
                         --hop_length 256
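
To illustrate how the sample rate and hop length relate annotation times to feature frames, here is a minimal sketch that turns (start, end, class) annotations, given in seconds, into a frame-wise label matrix. It is not the repository's feature script: the function name `annotations_to_frame_labels`, the class names and their order are all assumptions made for this example.

```python
import math

# Assumed label names and column order, for illustration only.
CLASSES = ["chiffchaff", "great tit", "blackbird", "warbler", "wren"]


def annotations_to_frame_labels(annotations, n_frames,
                                sample_rate=44100, hop_length=256,
                                classes=CLASSES):
    """Build an n_frames x len(classes) matrix of 0/1 activity labels.

    annotations: iterable of (start_seconds, end_seconds, class_name).
    """
    frames_per_second = sample_rate / hop_length
    labels = [[0] * len(classes) for _ in range(n_frames)]

    for start, end, name in annotations:
        # Mark every frame that overlaps the annotated interval,
        # clipped to the bounds of the sequence.
        first = max(0, math.floor(start * frames_per_second))
        last = min(n_frames, math.ceil(end * frames_per_second))
        column = classes.index(name)
        for frame in range(first, last):
            labels[frame][column] = 1

    return labels


labels = annotations_to_frame_labels(
    [(0.0, 1.0, "wren"), (0.5, 2.0, "blackbird")], n_frames=512
)
```

Because overlapping annotations simply set several columns in the same row, this representation handles concurrent songs without extra bookkeeping.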

Then, set the feature directory in and train the network by running: python3

Model inference

To use a trained model, prepare samples with the SEDgenerator and feed them to the model:
prediction = model.predict(sample)
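
The prediction still has to be turned into sound events. The sketch below shows one possible post-processing step, assuming that the prediction for a single sequence is a (frames, classes) matrix of per-class probabilities. The function name `predictions_to_events`, the 0.5 threshold and the default hop length are assumptions for illustration, not part of this repository.

```python
def predictions_to_events(probabilities, class_names, threshold=0.5,
                          sample_rate=44100, hop_length=256):
    """Convert frame-wise probabilities into (start, end, class) events.

    probabilities: one row per frame, one value per class.
    Times are returned in seconds.
    """
    seconds_per_frame = hop_length / sample_rate
    n_frames = len(probabilities)
    events = []

    for column, name in enumerate(class_names):
        start_frame = None
        for frame, row in enumerate(probabilities):
            active = row[column] >= threshold
            if active and start_frame is None:
                start_frame = frame  # event onset
            elif not active and start_frame is not None:
                events.append((start_frame * seconds_per_frame,
                               frame * seconds_per_frame, name))
                start_frame = None
        if start_frame is not None:
            # Event is still active at the end of the sequence.
            events.append((start_frame * seconds_per_frame,
                           n_frames * seconds_per_frame, name))

    return sorted(events)
```

If the model returns a batch of sequences, pass one sequence at a time, for example `predictions_to_events(prediction[0], class_names)`, where `class_names` lists the species in the order used during training.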

How to create your own dataset

If you wish to create a large dataset and require a fast annotating interface with collaboration capabilities, consider using CrowdCurio's annotator frontend. If you cannot create a backend for this frontend, consider dynilib's dynitag, which is both a CrowdCurio frontend and a backend running in Docker. For this work I used dynitag. Note that getting dynitag running requires some tinkering.

If this all seems like too much trouble, one can use the audio editor Audacity, which can annotate recordings and export these annotations to the start, end, class format used in this work (see