Polyphonic Bird Sound Event Detection

This repository contains the data and code for reproducing the results reported in "Polyphonic Bird Sound Event Detection With Convolutional Recurrent Neural Networks".

The network architecture was inspired by sednet.
The dataset is called UvABirds and contains 3200+ annotations of bird songs.

Download links

annotations: Google drive

Species	Number of songs annotated
Chiffchaff	883
Great tit	796
Blackbird	566
Warbler	503
Wren	500
Total	3248

audio files: Google drive

trained network: Google drive
This network achieves a frame-wise F-score of 0.94 and an error rate of 0.11 on three concurrent sound events.
The network's input feature hyperparameters are as follows:
sequence length = 512 frames (approximately 12 seconds)
sample rate = 44100 Hz
STFT window length = 512 samples
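
For readers unfamiliar with the two scores quoted for the trained network, the sketch below computes a frame-wise F-score and error rate from binary activity matrices. It assumes the definitions commonly used in polyphonic sound event detection evaluation (per-frame substitutions, deletions and insertions); the function name and input layout are illustrative, and the repository's own metrics.py may differ in detail.

```python
def frame_metrics(reference, predicted):
    """Frame-wise F-score and error rate.

    reference, predicted: lists of frames, each frame a list with one
    0/1 activity value per class (hypothetical layout for this sketch).
    """
    tp = fp = fn = 0
    substitutions = deletions = insertions = 0
    n_reference = 0
    for ref_frame, pred_frame in zip(reference, predicted):
        pairs = list(zip(ref_frame, pred_frame))
        frame_tp = sum(1 for r, p in pairs if r and p)
        frame_fp = sum(1 for r, p in pairs if not r and p)
        frame_fn = sum(1 for r, p in pairs if r and not p)
        tp += frame_tp
        fp += frame_fp
        fn += frame_fn
        # A miss paired with a false alarm in the same frame is a substitution.
        substitutions += min(frame_fp, frame_fn)
        deletions += max(0, frame_fn - frame_fp)
        insertions += max(0, frame_fp - frame_fn)
        n_reference += sum(ref_frame)
    denominator = 2 * tp + fp + fn
    f_score = 2 * tp / denominator if denominator else 0.0
    error_rate = (
        (substitutions + deletions + insertions) / n_reference
        if n_reference
        else 0.0
    )
    return f_score, error_rate
```

A higher F-score and a lower error rate are better; an error rate can exceed 1 when the prediction contains many insertions.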

An audible example of the network's predictions can be found here (YouTube).

How to train

Install the libraries listed in requirements.txt. When using GPUs, make sure to install the CUDA and cuDNN versions that match TensorFlow 1.13.1.

Create feature files by running:

python3 features.py --audios_directory ~/audio_dir_here \
                    --annotations_directory ~/annotation_dir_here \
                    --output_directory ~/output_dir_here \
                    --sequence_length 512 \
                    --sample_rate 44100 \
                    --window_length 512 \
                    --hop_length 256

Then, set the feature directory in train_sed.py and train the network by running: python3 train_sed.py.
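
To illustrate what the label side of the feature step involves, the sketch below turns (start, end, class) annotations into a frame-wise multi-label matrix. It is only an illustration under assumptions: the function name, the rounding convention and the list-of-lists output are hypothetical, and features.py may build its targets differently. The default sample rate and hop length are the values passed to features.py above.

```python
def annotations_to_frame_labels(annotations, classes, n_frames,
                                sample_rate=44100, hop_length=256):
    """Mark every frame that overlaps an annotated song with a 1.

    annotations: iterable of (start_seconds, end_seconds, class_name).
    classes: ordered list of class names; defines the column order.
    """
    class_index = {name: i for i, name in enumerate(classes)}
    labels = [[0] * len(classes) for _ in range(n_frames)]
    for start, end, name in annotations:
        # Work in whole samples to avoid floating-point frame boundaries.
        start_sample = int(round(start * sample_rate))
        end_sample = int(round(end * sample_rate))
        first = max(0, start_sample // hop_length)
        last = min(n_frames, -(-end_sample // hop_length))  # ceiling division
        for frame in range(first, last):
            labels[frame][class_index[name]] = 1
    return labels
```

Because each class has its own column, overlapping songs from different species simply activate several columns in the same frame, which is what makes the target polyphonic.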

Model inference

To use a trained model, prepare samples with the SEDgenerator and pass them to the model:
prediction = model.predict(sample)
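
The prediction is easier to inspect once it is converted back into timed events. The sketch below assumes that the output for one sequence is a frames-by-classes grid of probabilities between 0 and 1; that layout, the 0.5 threshold and the function name are assumptions made for illustration, not something the repository documents here.

```python
def predictions_to_events(probabilities, classes, threshold=0.5,
                          sample_rate=44100, hop_length=256):
    """Threshold frame-wise probabilities and merge runs into events.

    probabilities: list of frames, each a list with one value per class
    (assumed layout of a single predicted sequence).
    Returns a sorted list of (start_seconds, end_seconds, class_name).
    """
    frame_duration = hop_length / sample_rate
    events = []
    for column, name in enumerate(classes):
        onset = None
        for frame, frame_probs in enumerate(probabilities):
            active = frame_probs[column] >= threshold
            if active and onset is None:
                onset = frame  # a new event starts here
            elif not active and onset is not None:
                events.append((onset * frame_duration,
                               frame * frame_duration, name))
                onset = None
        if onset is not None:
            # The event is still active at the end of the sequence.
            events.append((onset * frame_duration,
                           len(probabilities) * frame_duration, name))
    return sorted(events)
```

The resulting tuples use the same start, end, class order as the annotations, so they can be compared with the ground truth directly.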

How to create your own dataset

If you wish to create a large dataset and require a fast annotating interface with collaboration capabilities, consider using CrowdCurio's annotator frontend. If you cannot create a backend for this frontend, consider dynilib's dynitag, which is both a CrowdCurio frontend and a backend running in Docker. For this work I used dynitag. Note that getting dynitag running requires some tinkering.

If this all seems like too much trouble, one can use the audio editor Audacity, which can annotate recordings and export these annotations to the start, end, class format used in this work (see features.py).
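
As a rough illustration of the Audacity route, the sketch below reads an exported label track into (start, end, class) tuples. Audacity writes one label per line as tab-separated start time, end time and label text; whether this matches the files expected by features.py exactly is an assumption, and the function name is hypothetical.

```python
def parse_audacity_labels(text):
    """Parse the text of an Audacity label export.

    Returns a list of (start_seconds, end_seconds, label) tuples.
    """
    annotations = []
    for line in text.splitlines():
        fields = line.strip().split("\t")
        # Skip blank lines and the backslash-prefixed frequency-range
        # lines that Audacity adds for spectral selections.
        if len(fields) < 3 or line.startswith("\\"):
            continue
        annotations.append((float(fields[0]), float(fields[1]), fields[2]))
    return annotations
```

For example, `parse_audacity_labels(open("labels.txt").read())` would load a label file named labels.txt, where the file name is a placeholder.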
