In [1]:
## Opensoundscape imports
from opensoundscape.annotations import BoxedAnnotations
from opensoundscape.torch.models.cnn import CNN

# general purpose packages
import pandas as pd
import numpy as np
from pathlib import Path
import re # for regex matching of annotation and audio files
import random 

random.seed(0)
np.random.seed(0)

# Turn annotated audio into labels we can use to train a model

If you have listened to some of your field recordings and annotated them for your sounds of interest, you can use those annotations as training data for a classifier in OpenSoundscape. This notebook shows the data processing steps that turn audio annotations into the format OpenSoundscape uses for training models. In this example, we use a set of recordings that were annotated using the software Raven Pro:

<i>An annotated set of audio recordings of Eastern North American birds containing frequency, time, and species information. </i><br>
Lauren M. Chronister,  Tessa A. Rhinehart,  Aidan Place,  Justin Kitzes <br>
https://doi.org/10.1002/ecy.3329 


## Download instructions
Download the datasets to your current working directory and unzip them. You can either download both `annotation_Files.zip` and `wav_Files.zip` from the URL below or run the cell that follows it.

https://datadryad.org/stash/dataset/doi:10.5061/dryad.d2547d81z

In [2]:
!wget -O annotation_Files.zip https://datadryad.org/stash/downloads/file_stream/641805
!wget -O wav_Files.zip https://datadryad.org/stash/downloads/file_stream/641808
!unzip annotation_Files.zip
!unzip wav_Files.zip


## Data munging ##
The cells below read in the Raven annotation files and use them to create the dataframes we need for the training and validation sets. We will turn each annotation file into a dataframe of one-hot labels, one row per 3-second clip: a label is 1 if the species is present in that clip and 0 if it is not.

In [2]:
dataset_path = Path(".").resolve() # the dataset is expected in the current working directory
selections = list(dataset_path.glob("*/*selections.txt")) # make a list of all of the selection table files

In [3]:
# example to show what a raven annotation file looks like. This is how our data is stored
pd.read_csv(selections[0], sep="\t").head()

,Selection,View,Channel,Begin Time (s),End Time (s),Low Freq (Hz),High Freq (Hz),Species
0,1,Spectrogram 1,1,0.24588,0.69425,1950.0,9600.0,NOCA
1,2,Spectrogram 1,1,0.250701,1.374037,5512.5,8887.5,BWWA
2,3,Spectrogram 1,1,0.597827,0.906382,3112.5,4968.7,EATO
3,4,Spectrogram 1,1,0.887098,1.123335,2700.0,3768.7,EATO
4,5,Spectrogram 1,1,1.340289,2.632366,2400.0,9281.2,EATO


In [4]:
# example to show what the one-hot labels look like.
BoxedAnnotations.from_raven_file(selections[0], annotation_column = "Species").one_hot_clip_labels(
    full_duration=300, # The duration of the entire audio file
    clip_duration=3,
    clip_overlap=0,
    min_label_overlap=0.25).head()

start_time,end_time,NOCA,BWWA,EATO,COYE,BCCH
0,3,1.0,1.0,1.0,0.0,0.0
3,6,1.0,0.0,1.0,0.0,0.0
6,9,1.0,1.0,1.0,1.0,0.0
9,12,1.0,0.0,1.0,0.0,0.0
12,15,1.0,1.0,1.0,0.0,0.0
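
To make the role of `min_label_overlap` more concrete, here is a minimal pandas-only sketch of the labelling rule, assuming a clip gets a 1 for a species when at least one annotation of that species overlaps the clip by at least `min_label_overlap` seconds. The helper `one_hot_clip_labels_sketch` and the toy annotations are hypothetical and written for illustration; this is not OpenSoundscape's implementation, and it ignores `clip_overlap` and partial final clips.

```python
import pandas as pd

def one_hot_clip_labels_sketch(annotations, full_duration, clip_duration, min_label_overlap):
    """Hypothetical helper: label each fixed-length clip with 1.0 for every
    species whose annotations overlap it by at least `min_label_overlap` seconds."""
    classes = sorted(annotations["Species"].unique())
    rows = []
    start = 0.0
    while start + clip_duration <= full_duration:
        end = start + clip_duration
        # seconds of overlap between each annotation and this clip
        # (negative when the annotation lies entirely outside the clip)
        overlap = (annotations["End Time (s)"].clip(upper=end)
                   - annotations["Begin Time (s)"].clip(lower=start))
        present = set(annotations.loc[overlap >= min_label_overlap, "Species"])
        rows.append({"start_time": start, "end_time": end,
                     **{c: float(c in present) for c in classes}})
        start = end
    return pd.DataFrame(rows).set_index(["start_time", "end_time"])

# Made-up annotations, not taken from the dataset
toy_annotations = pd.DataFrame({
    "Begin Time (s)": [0.5, 2.9, 5.9],
    "End Time (s)":   [1.0, 3.4, 6.1],
    "Species":        ["NOCA", "EATO", "BWWA"],
})

labels = one_hot_clip_labels_sketch(
    toy_annotations, full_duration=9, clip_duration=3, min_label_overlap=0.25
)
print(labels)
```

In this toy example the EATO annotation straddles the 3-second boundary and is only counted in the clip where it overlaps by more than 0.25 seconds, while the BWWA annotation never reaches the threshold in any clip.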


Now we repeat this for every selection table. We will train a model on the files in the folders `Recording_1`, `Recording_2` and `Recording_3` and test its performance on the recordings in the folder `Recording_4`, so we keep everything from `Recording_4` out of the training set.

In [5]:
%%capture --no-stdout --no-display

training_files = [] # container for each selection table
test_files = []
for file in selections:
    annot_object = BoxedAnnotations.from_raven_file(file, annotation_column = "Species")
    truth_df = annot_object.one_hot_clip_labels(
        full_duration=300, # The duration of the entire audio file
        clip_duration=3,
        clip_overlap=0,
        min_label_overlap=0.25)
    truth_df["file"] = str(file) # add a column to keep track of which file this came from
    if "Recording_4" in str(file): # hold out Recording_4 as our test set
        test_files.append(truth_df)
    else:
        training_files.append(truth_df)

training_set = pd.concat(training_files)
test_set = pd.concat(test_files)
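
Because each selection table only lists the species annotated in that file, the per-file label tables can have different columns. The following sketch, using two made-up label tables, illustrates how `pd.concat` then fills the missing species with NaN, which is why the NaNs are replaced with 0 in a later step.

```python
import pandas as pd

# Two toy label tables with different species columns (made-up values)
idx = pd.MultiIndex.from_tuples([(0, 3), (3, 6)], names=["start_time", "end_time"])
file_a = pd.DataFrame({"NOCA": [1.0, 0.0], "EATO": [0.0, 1.0]}, index=idx)
file_b = pd.DataFrame({"NOCA": [0.0, 1.0], "BCCH": [1.0, 1.0]}, index=idx)

# concat keeps the union of the columns and inserts NaN where a file lacks a species
combined = pd.concat([file_a, file_b])
print(combined.isna().sum())  # number of NaN entries per species column

# treating a missing species as absent is only safe for fully annotated recordings
filled = combined.fillna(0)
print(filled)
```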

To get the dataframes into the format OpenSoundscape expects, we need to give them a multi-index of (audio_filepath, start_time, end_time).

In [7]:
def process_dataframe(df):
    """
    Helper function to turn our dataframes into the right format needed for training in opensoundscape.
    They must have a multi-index of (audio_filepath, start_time, end_time)
    """
    df = df.fillna(0) # this dataset was fully annotated, so a NaN for a species means the species is absent
    df["file"] = [re.sub(r'Table.*', "wav", x) for x in df["file"]] # make the audio file name from the selection table filename
    
    # ensure the dataframe has a multi-index of (audio_filepath, start_time, end_time)
    df = df.set_index('file', append=True, inplace=False)
    df = df.reorder_levels(['file', 'start_time', 'end_time'])
    return df

training_set = process_dataframe(training_set)
test_set = process_dataframe(test_set)
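
As a quick illustration of the two steps inside `process_dataframe`, the sketch below applies the same regex substitution and index reordering to a toy dataframe. The selection table path is a hypothetical example of the naming pattern, not a path taken from the dataset.

```python
import re
import pandas as pd

# Hypothetical selection table path following the "<name>.Table.1.selections.txt" pattern
selection_path = "Recording_1/Recording_1_Segment_01.Table.1.selections.txt"

# everything from "Table" to the end of the string is replaced by "wav"
audio_path = re.sub(r"Table.*", "wav", selection_path)
print(audio_path)

# toy label table indexed by (start_time, end_time), with a "file" column
idx = pd.MultiIndex.from_tuples([(0, 3), (3, 6)], names=["start_time", "end_time"])
toy = pd.DataFrame({"NOCA": [1.0, 0.0], "file": audio_path}, index=idx)

# move "file" into the index and put it first
toy = toy.set_index("file", append=True).reorder_levels(["file", "start_time", "end_time"])
print(toy.index.names)
```

Note that the pattern matches the first occurrence of `Table` anywhere in the string, so it would also rewrite a path whose folder name contains `Table`; for such paths a stricter pattern would be needed.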

In [8]:
training_set.to_csv("training_set.csv")
test_set.to_csv("test_set.csv")
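
When the saved tables are read back in, the multi-index has to be restored explicitly, because a CSV file stores the index levels as ordinary columns. A minimal sketch with a toy table and a temporary file (the file name `toy_set.csv` and the values are made up):

```python
import tempfile
from pathlib import Path

import pandas as pd

# toy label table in the (file, start_time, end_time) multi-index format
idx = pd.MultiIndex.from_tuples(
    [("a.wav", 0.0, 3.0), ("a.wav", 3.0, 6.0)],
    names=["file", "start_time", "end_time"],
)
toy = pd.DataFrame({"NOCA": [1.0, 0.0], "EATO": [0.0, 1.0]}, index=idx)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "toy_set.csv"
    toy.to_csv(csv_path)
    # the first three columns of the CSV are the index levels
    restored = pd.read_csv(csv_path, index_col=[0, 1, 2])

print(restored.index.names)
```

The same `index_col=[0, 1, 2]` argument should work for `training_set.csv` and `test_set.csv`, assuming they were written with the three-level index shown above.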