This repository contains the data and code for reproducing the results reported in "Polyphonic Bird Sound Event Detection With Convolutional Recurrent Neural Networks".
The network architecture was inspired by sednet.
The dataset is called UvABirds and contains 3200+ annotations of bird songs.
annotations: Google drive
Species | Number of songs annotated |
---|---|
Chiffchaff | 883 |
Great tit | 796 |
Blackbird | 566 |
Warbler | 503 |
Wren | 500 |
Total | 3248 |
audio files: Google drive
trained network: Google drive
This network achieves a frame-wise F-score of 0.94 and an error rate of 0.11 on three concurrent sound events.
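For readers unfamiliar with these two metrics, the following is a minimal sketch of how a frame-wise F-score and error rate are commonly computed for polyphonic sound event detection. It assumes the usual definitions (error rate as substitutions plus deletions plus insertions, divided by the number of active reference events) and binary arrays of shape (frames, classes). The function name `frame_metrics` is hypothetical and this is not necessarily the evaluation code used to produce the numbers above.

```python
import numpy as np


def frame_metrics(reference, estimate):
    """Frame-wise F-score and error rate for binary (frames, classes) arrays.

    Hypothetical helper; assumes the common polyphonic SED definitions.
    """
    reference = np.asarray(reference, dtype=bool)
    estimate = np.asarray(estimate, dtype=bool)

    true_positives = np.sum(reference & estimate)
    false_pos_per_frame = np.sum(~reference & estimate, axis=1)
    false_neg_per_frame = np.sum(reference & ~estimate, axis=1)

    precision = true_positives / max(np.sum(estimate), 1)
    recall = true_positives / max(np.sum(reference), 1)
    if precision + recall == 0:
        f_score = 0.0
    else:
        f_score = 2 * precision * recall / (precision + recall)

    # Within one frame, a miss paired with a false alarm counts as one substitution.
    substitutions = np.minimum(false_pos_per_frame, false_neg_per_frame)
    deletions = np.maximum(0, false_neg_per_frame - false_pos_per_frame)
    insertions = np.maximum(0, false_pos_per_frame - false_neg_per_frame)

    n_active = max(np.sum(reference), 1)
    error_rate = (substitutions.sum() + deletions.sum() + insertions.sum()) / n_active
    return float(f_score), float(error_rate)
```

Both inputs are expected to be thresholded already, so network outputs would first need to be converted to 0/1 activity per frame and class.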
The network's input feature hyperparameters are as follows:
sequence length = 512 frames (approximately 12 seconds)
sample rate = 44100 Hz
STFT window length = 512 samples
An audible example of the network's predictions can be found here (YouTube).
Install the libraries listed in requirements.txt. When using GPUs, make sure to use the CUDA and cuDNN versions that match TensorFlow 1.13.1.
Create feature files by running:
python3 features.py --audios_directory ~/audio_dir_here \
--annotations_directory ~/annotation_dir_here \
--output_directory ~/output_dir_here \
--sequence_length 512 \
--sample_rate 44100 \
--window_length 512 \
--hop_length 256
Then set the feature directory in train_sed.py and train the network by running:
python3 train_sed.py
To use a trained model, prepare samples with the SEDgenerator and feed them to the model:
prediction = model.predict(sample)
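The raw prediction is typically an array of per-frame class probabilities, and it is often more convenient to work with (start, end, class) events. The sketch below shows one way to do that conversion. It assumes a single sequence of shape (frames, classes), so a batched output would need to be indexed first; the function name `predictions_to_events`, the 0.5 threshold, and the `frame_seconds` argument are illustrative assumptions rather than part of this repository.

```python
import numpy as np


def predictions_to_events(prediction, class_names, frame_seconds, threshold=0.5):
    """Turn (frames, classes) probabilities into sorted (start, end, class) events.

    Hypothetical helper; the output shape and threshold are assumptions.
    """
    active = np.asarray(prediction) >= threshold
    events = []
    for class_index, name in enumerate(class_names):
        # Pad with False so every active run has an explicit start and end.
        padded = np.concatenate(([False], active[:, class_index], [False]))
        changes = np.flatnonzero(padded[1:] != padded[:-1])
        # Changes alternate between run starts and (exclusive) run ends.
        for start, end in zip(changes[::2], changes[1::2]):
            events.append((float(start * frame_seconds), float(end * frame_seconds), name))
    return sorted(events)
```

Here `frame_seconds` is the duration represented by one output frame, which depends on the feature settings used when the model was trained.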
If you wish to create a large dataset and require a fast annotating interface with collaboration capabilities, consider using CrowdCurio's annotator frontend. If you cannot create a backend for this frontend, consider dynilib's dynitag, which is both a CrowdCurio frontend and a backend running in Docker. For this work I used dynitag. Note that getting dynitag running requires some tinkering.
If this all seems like too much trouble, you can use the audio editor Audacity, which can annotate recordings and export the annotations in the start, end, class format used in this work (see features.py).
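As an illustration of the Audacity route, a label track exported from Audacity is a plain text file with one tab-separated line per label: start time, end time, label text. The following minimal sketch reads such text into (start, end, class) tuples. The function name `read_audacity_labels` is hypothetical, and the exact file layout that features.py expects is not shown here, so treat this as a starting point rather than a drop-in converter.

```python
def read_audacity_labels(text):
    """Parse the text of an Audacity label export into (start, end, class) tuples.

    Hypothetical helper; times are in seconds, as written by Audacity.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip blank lines and the optional frequency-range lines,
        # which start with a backslash.
        if len(fields) < 3 or fields[0].startswith("\\"):
            continue
        rows.append((float(fields[0]), float(fields[1]), fields[2].strip()))
    return rows
```

A typical use would be `read_audacity_labels(open("labels.txt").read())`, where `labels.txt` is a placeholder file name.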