
Bioacoustic Source Separator

Devdoot Chatterjee edited this page Sep 12, 2022 · 1 revision

GSoC 2022 Final Report

Objective-

The aim of this project was to build a robust bioacoustic source separator to separate Southern Resident killer whale vocalizations from other background noises in hydrophone recordings. I explored two major open-source libraries for audio source separation- Spleeter and Zero-shot audio source separator.
Here is a link to my blog post.

Source separation models-

Spleeter-

Spleeter is an open-source audio source separation library built around a U-Net architecture. The U-Net takes the spectrogram of the original audio as input and performs a series of 2-D convolutions and deconvolutions, finally outputting a mask which, when multiplied with the original spectrogram, yields the spectrogram of the isolated source. One can easily separate orca vocalizations from any hydrophone recording using the GUI.
Additionally, one can use this Python module to create an instance of the class ‘Separator’. It takes the waveform of the original audio as input, and its member function ‘return_source_directory’ returns a dictionary mapping source names to the corresponding waveforms as NumPy arrays.
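As a toy sketch of the masking step described above (the array shapes, mask values, and the source names ‘orca’/‘background’ are invented for illustration, not taken from Spleeter's actual output):

```python
import numpy as np

rng = np.random.default_rng(0)
mixture_spec = rng.random((128, 64))    # |STFT| of the hydrophone mixture (freq x time)
predicted_mask = rng.random((128, 64))  # stand-in for the U-Net's soft mask in [0, 1]

# Multiplying the mask by the mixture spectrogram isolates the target source;
# the complementary mask (1 - mask) captures everything else.
sources = {
    "orca": predicted_mask * mixture_spec,
    "background": (1.0 - predicted_mask) * mixture_spec,
}

# Complementary soft masks partition the mixture's energy, so the
# separated spectrograms sum back to the original mixture.
reconstructed = sources["orca"] + sources["background"]
```

A dictionary like `sources` above mirrors the kind of name-to-array mapping the separator returns, with spectrograms standing in for waveforms.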

Zero-shot audio separator-

Unlike Spleeter, the Zero-shot source separator is a query-based U-Net model: in addition to the mixture, it takes a query file (an example of the audio source of interest) as input. The zero-shot model was also tested, but it was outperformed by the Spleeter model; one can still use it to separate orca vocals. The ‘instance’ function defined in this Python module takes the paths to the pre-trained zero-shot model checkpoint and the pre-trained HTS-AT model checkpoint, and returns a dictionary in the same format as the Spleeter model's.
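To give an intuition for query conditioning, here is a deliberately simplified NumPy toy (the real model uses learned query embeddings from HTS-AT, not this averaging heuristic; every shape and variable name below is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
mixture_spec = rng.random((128, 64))  # freq x time spectrogram of the mixture
query_spec = rng.random((128, 32))    # spectrogram of an example of the target source

# Toy conditioning: summarize the query as an average frequency profile,
# then use it to weight the mixture's frequency bins toward the target.
query_profile = query_spec.mean(axis=1, keepdims=True)  # shape (128, 1)
weights = query_profile / query_profile.max()           # normalized to [0, 1]
target_spec = weights * mixture_spec                    # query-weighted estimate
```

The point is only that the query changes what the model extracts: a different query profile yields a different mask over the same mixture, which is what makes the approach "zero-shot" with respect to new source types.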

Separation dataset-

The dataset used to train the model was extracted from Orcasound’s PodCast rounds. The sound separation dataset was generated by randomly overlapping the extracted orca vocalizations with background noise from sea waves, ships, boats, etc.
Additionally, I used Cosmos DB’s API to fetch recordings tagged as ‘squeaky’ (i.e., that sound like orca vocals) to make the model more robust. One can create their own ‘separation’ dataset from their own data using this module; it provides a class named ‘DataGenerator’ that generates a ‘separation’ dataset suitable for training the Spleeter model.
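The random-overlap step described above can be sketched as follows (a minimal illustration, not the actual DataGenerator code; the sample rate, clip lengths, and random signals are placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
sr = 16_000  # placeholder sample rate

vocalization = rng.standard_normal(sr * 2)  # 2 s stand-in for a clean orca call
background = rng.standard_normal(sr * 5)    # 5 s stand-in for wave/ship noise

# Overlay the call at a random offset inside the background clip;
# the clean call serves as the training target, the sum as the input mixture.
offset = rng.integers(0, len(background) - len(vocalization))
mixture = background.copy()
mixture[offset:offset + len(vocalization)] += vocalization
```

Repeating this with varied calls, noise clips, and offsets yields (mixture, target) pairs of the kind a separation model such as Spleeter is trained on.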

Future Work-

  • Build a web app for real-time implementation of the audio source separator.
  • Work on BioCPPNet- a lightweight Deep Learning architecture optimized for bioacoustic source separation.