# Train a CNN to detect bird vocalizations

This notebook demonstrates how to train a CNN deep learning model with OpenSoundscape. We will train the CNN to recognize bird vocalizations in spectrogram representations of audio data.

The notebook has two sections:
1. [Prepare CNN training data from Raven-annotated audio](#prep)
2. [Train a CNN](#train)

The subsequent notebook evaluates the performance of the CNN.

#### package imports

In [1]:
# OpenSoundscape imports
from opensoundscape import BoxedAnnotations, CNN

# general purpose packages
import pandas as pd
import numpy as np
from pathlib import Path
import re # for regex matching of annotation and audio files
import random 
from glob import glob

random.seed(0)
np.random.seed(0)

from matplotlib import pyplot as plt
plt.rcParams['figure.figsize']=[10,3] #set default graphic size
%config InlineBackend.figure_format = 'retina'

<a id="prep"></a>

## Step 1: Prepare CNN training data from Raven-annotated audio
If you have listened to some of your field recordings and annotated them for the presence of your sounds of interest, you can use those annotations to train a classifier in OpenSoundscape. This notebook shows the data processing steps that turn audio annotations into the label format OpenSoundscape uses for model training. In this example, we use a set of recordings that were annotated with the software Raven Pro 1.6.4 (Bioacoustics Research Program 2022):

<i>An annotated set of audio recordings of Eastern North American birds containing frequency, time, and species information. </i><br>
Lauren M. Chronister,  Tessa A. Rhinehart,  Aidan Place,  Justin Kitzes <br>
https://doi.org/10.1002/ecy.3329 


### Download instructions
Download the datasets to your current working directory and unzip them. You can do so by running the cell below, OR by:
- downloading and unzipping both `annotation_Files.zip` and `mp3_Files.zip` from https://datadryad.org/stash/dataset/doi:10.5061/dryad.d2547d81z
- moving the unzipped annotation files into `./resources/02/annotated_data/Annotation_Files/` and the unzipped audio files into `./resources/02/annotated_data/Recordings/` (subfolders of the current folder), which is where the code below looks for them

In [None]:
!wget -O annotation_Files.zip https://datadryad.org/stash/downloads/file_stream/641805
!wget -O wav_Files.zip https://datadryad.org/stash/downloads/file_stream/641808
!unzip annotation_Files.zip -d ./resources/02/annotated_data/Annotation_Files
!unzip wav_Files.zip -d ./resources/02/annotated_data/Recordings


### Load Raven annotations and create label dataframes
The cells below show the data-munging process: reading in Raven files and using them to create dataframes for the training and test sets. We will turn the annotation files into a dataframe of one-hot labels for each 3-second interval: a label is 1 if a species is present in that interval of audio and 0 if it is not.

In [26]:
# path to the folder where the dataset was downloaded and unzipped
dataset_path = Path("./resources/02/annotated_data/").resolve() 

# make a list of all of the selection table files
selections = glob(f"{dataset_path}/Annotation_Files/*/*.txt")

# each audio file has the same base name as its selection table, but is stored
# under Recordings/ instead of Annotation_Files/ and ends in .mp3
audio_files = [f.replace('Annotation_Files','Recordings').replace('.Table.1.selections.txt','.mp3') for f in selections]
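Before loading the annotations, it can help to confirm that every selection table has a matching audio file. The sketch below reuses the same string replacement as the cell above; the helper names `selection_to_audio_path` and `find_missing_audio`, and the example file name, are hypothetical and only illustrate the assumed naming convention.

```python
from pathlib import Path


def selection_to_audio_path(selection_path):
    """Map a Raven selection table path to its expected audio path (same rule as the cell above)."""
    return selection_path.replace("Annotation_Files", "Recordings").replace(
        ".Table.1.selections.txt", ".mp3"
    )


def find_missing_audio(selection_paths):
    """Return the selection tables whose expected audio file does not exist on disk."""
    return [s for s in selection_paths if not Path(selection_to_audio_path(s)).exists()]


# hypothetical file name, used only to show the path mapping
example = "resources/02/annotated_data/Annotation_Files/Recording_1/Recording_1_Segment_01.Table.1.selections.txt"
print(selection_to_audio_path(example))
# resources/02/annotated_data/Recordings/Recording_1/Recording_1_Segment_01.mp3
```

In the notebook, `find_missing_audio(selections)` should return an empty list if the download and unzip steps worked.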

### Loading Raven annotations
The `BoxedAnnotations` class stores frequency-time annotations in a table. It can parse and load Raven-formatted selection tables with the `from_raven_files()` method. We pass the method a list of Raven files and the corresponding list of audio files.


In [28]:
all_annotations = BoxedAnnotations.from_raven_files(selections,audio_files)
all_annotations.df.head(2)

| | audio_file | raven_file | annotation | start_time | end_time | low_f | high_f | View | Selection | Channel |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /Users/SML161/demos-for-opso/resources/02/anno... | /Users/SML161/demos-for-opso/resources/02/anno... | BTNW | 0.913636 | 2.202273 | 4635.1 | 7439.0 | Spectrogram 1 | 1 | 1 |
| 1 | /Users/SML161/demos-for-opso/resources/02/anno... | /Users/SML161/demos-for-opso/resources/02/anno... | EATO | 2.236363 | 2.693182 | 3051.9 | 4101.0 | Spectrogram 1 | 2 | 1 |


This table contains one row per annotation created in Raven Pro.
We can easily convert this annotation format to a table of 0 (absent) or 1 (present) labels for a series of time-regions in each audio file. Each class will be a separate column. We can specify a list of classes or let the function automatically create one class for each unique annotation in the Raven selection tables. 

Here, we need to make some choices: first, how many seconds is each audio "clip" that we want to generate a label for (clip_duration), and how many seconds of overlap should there be between consecutive clips (clip_overlap)? Here, we'll choose 3 second clips with zero overlap. 
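To make `clip_duration` and `clip_overlap` concrete, here is a minimal sketch of how consecutive clip windows could be laid out over one audio file. It illustrates the idea under our own assumptions rather than reproducing OpenSoundscape's implementation: `clip_windows` is a hypothetical helper, and it assumes an incomplete final clip at the end of the file is discarded.

```python
def clip_windows(file_duration, clip_duration, clip_overlap=0.0):
    """Return (start, end) times of full-length clips in a file.

    Consecutive clips start (clip_duration - clip_overlap) seconds apart.
    Assumption: a final clip that would run past the end of the file is dropped.
    """
    step = clip_duration - clip_overlap
    if step <= 0:
        raise ValueError("clip_overlap must be smaller than clip_duration")
    windows = []
    i = 0
    while i * step + clip_duration <= file_duration:
        start = i * step
        windows.append((start, start + clip_duration))
        i += 1
    return windows


print(clip_windows(10, clip_duration=3, clip_overlap=0))  # [(0, 3), (3, 6), (6, 9)]
print(clip_windows(10, clip_duration=3, clip_overlap=1))  # [(0, 3), (2, 5), (4, 7), (6, 9)]
```

With zero overlap, each second of audio falls in at most one clip; adding overlap produces more clips per file at the cost of labeling some audio more than once.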

Second, how much does an annotation need to overlap with a clip for us to consider the annotation to apply to the clip (min_label_overlap)? For example, if an annotation spans 1-3.02 seconds, we might not want to consider it a part of a clip that spans 3-6 seconds, since only 0.02 seconds of that annotation overlap with the clip. Here, we'll choose a min_label_overlap of 0.25 seconds. 
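The overlap rule above can be sketched as a single comparison. `annotation_applies` is a hypothetical helper that mirrors the rule as described here; OpenSoundscape's `one_hot_clip_labels` may apply additional criteria.

```python
def annotation_applies(ann_start, ann_end, clip_start, clip_end, min_label_overlap=0.25):
    """Return True if an annotation overlaps a clip by at least min_label_overlap seconds."""
    # length of the intersection of the two time intervals (negative if they are disjoint)
    overlap = min(ann_end, clip_end) - max(ann_start, clip_start)
    return overlap >= min_label_overlap


# the example from the text: an annotation spanning 1-3.02 seconds
print(annotation_applies(1.0, 3.02, 0.0, 3.0))  # True: 2 seconds of overlap
print(annotation_applies(1.0, 3.02, 3.0, 6.0))  # False: only 0.02 seconds of overlap
```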

In [9]:
%%capture cap --no-stdout
clip_labels = all_annotations.one_hot_clip_labels(
    clip_duration=3,
    clip_overlap=0,
    min_label_overlap=0.25)

### choose classes

Let's choose 7 classes from the annotated data and train our CNN to recognize vocalizations of these species. The annotations in this dataset use four-letter "Alpha codes" for each bird species:

- NOCA: Northern Cardinal
- EATO: Eastern Towhee
- SCTA: Scarlet Tanager
- BAWW: Black-and-white Warbler
- BCCH: Black-capped Chickadee
- AMCR: American Crow
- NOFL: Northern Flicker

In [14]:
species_of_interest = ["NOCA", "EATO", "SCTA", "BAWW", "BCCH", "AMCR", "NOFL"]
clip_labels = clip_labels[species_of_interest]

### split annotated data into training and validation sets

Our plan is to train a machine learning model on the files in folders `Recording_1`, `Recording_2` and `Recording_3` and test its performance on recordings in the folder `Recording_4`. Let's separate the labels into two sets called `training_set` and `test_set`. We'll use the training set to train the CNN, and the test set to check how it performs on data that it has not seen during training.


In [16]:
# select all files from Recording_4 as a test set
mask = clip_labels.reset_index()['file'].apply(lambda x: 'Recording_4' in x).values
test_set = clip_labels[mask]

# all other files will be used as a training set
training_set = clip_labels.drop(test_set.index)

# Save .csv tables of the training and test sets for use in training a model
# training_set.to_csv("./resources/03/training_set.csv")
# test_set.to_csv("./resources/03/test_set.csv")
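Splitting by recording folder, rather than by individual clip, keeps clips from the same audio file out of both sets. The sketch below performs the same kind of split on a small toy table, reading the `file` level of the index directly and checking that no file appears in both sets. The file names are hypothetical, and the `toy_` variable names avoid overwriting the notebook's real `training_set` and `test_set`.

```python
import numpy as np
import pandas as pd

# toy one-hot labels indexed like clip_labels: (file, start_time, end_time); file names are hypothetical
toy_labels = pd.DataFrame(
    {"NOCA": [1, 0, 0, 1], "EATO": [0, 1, 1, 0]},
    index=pd.MultiIndex.from_tuples(
        [
            ("data/Recording_1/a.mp3", 0.0, 3.0),
            ("data/Recording_2/b.mp3", 0.0, 3.0),
            ("data/Recording_4/c.mp3", 0.0, 3.0),
            ("data/Recording_4/c.mp3", 3.0, 6.0),
        ],
        names=["file", "start_time", "end_time"],
    ),
)

# boolean mask built from the "file" level of the index
files = toy_labels.index.get_level_values("file")
is_test = np.array(["Recording_4" in f for f in files])

toy_test = toy_labels[is_test]
toy_train = toy_labels[~is_test]

# sanity check: no audio file contributes clips to both sets
shared_files = set(toy_train.index.get_level_values("file")) & set(
    toy_test.index.get_level_values("file")
)
print(len(toy_train), len(toy_test), shared_files)  # 2 2 set()
```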

Alternatively, load the training and test sets from saved CSV files:

In [13]:
# training_set = pd.read_csv('./resources/03/training_set.csv',index_col=[0,1,2])
# test_set = pd.read_csv('./resources/03/test_set.csv',index_col=[0,1,2])
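The `index_col=[0,1,2]` argument restores the three index levels (`file`, `start_time`, `end_time`) that `to_csv` writes as the first three columns. A quick round trip on a toy table, using an in-memory buffer instead of a file, illustrates this; the file name is hypothetical.

```python
import io

import pandas as pd

toy_labels = pd.DataFrame(
    {"NOCA": [1, 0], "EATO": [0, 1]},
    index=pd.MultiIndex.from_tuples(
        [("data/Recording_1/a.mp3", 0.0, 3.0), ("data/Recording_1/a.mp3", 3.0, 6.0)],
        names=["file", "start_time", "end_time"],
    ),
)

buffer = io.StringIO()
toy_labels.to_csv(buffer)  # index levels are written as the first three columns
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=[0, 1, 2])  # rebuild the 3-level index

print(list(restored.index.names))
print(restored.equals(toy_labels))
```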

<a id="train"></a>
## Step 2: Train the CNN
Now that we have prepared and split our labeled data into training and testing sets, we can train a CNN to recognize the labeled classes. 

In [17]:
from sklearn.model_selection import train_test_split

# Split our training data into training and validation sets
train_df, valid_df = train_test_split(training_set, test_size=0.1, random_state=0)

### Resample data for even class representation

Before training, we will balance the number of samples of each class in the training set. This helps the model learn all of the classes, rather than paying too much attention to the classes with the most labeled annotations. 
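To illustrate the idea behind balancing, the sketch below draws a fixed number of positive samples for each class from a toy multi-hot table, sampling with replacement when a class has too few positives. It is a simplified stand-in written for illustration (`upsample_per_class` is hypothetical), not the implementation of OpenSoundscape's `resample`, which has its own options.

```python
import pandas as pd


def upsample_per_class(labels, n_samples_per_class, random_state=0):
    """Simplified sketch: draw n_samples_per_class rows in which each class is present.

    Classes with fewer positives are sampled with replacement (rows repeat);
    classes with more positives are subsampled without replacement.
    """
    parts = []
    for cls in labels.columns:
        positives = labels[labels[cls] == 1]
        if positives.empty:
            raise ValueError(f"class {cls} has no positive samples to resample")
        replace = len(positives) < n_samples_per_class
        parts.append(
            positives.sample(n=n_samples_per_class, replace=replace, random_state=random_state)
        )
    return pd.concat(parts)


# toy multi-hot labels: NOCA is present in 3 clips, EATO in 2
toy_labels = pd.DataFrame(
    {"NOCA": [1, 1, 1, 0], "EATO": [0, 0, 1, 1]},
    index=["clip1", "clip2", "clip3", "clip4"],
)
balanced = upsample_per_class(toy_labels, n_samples_per_class=4)
print(len(balanced))  # 8: 4 rows drawn for each of the 2 classes
print(balanced.sum())  # per-class positive counts; each is at least 4
```

Because labels are multi-hot, a row drawn for one class can also be positive for another, so per-class counts after balancing can exceed `n_samples_per_class`.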

In [18]:
from opensoundscape.data_selection import resample

# upsample (repeat samples) so that all classes have 800 samples
balanced_train_df = resample(train_df,n_samples_per_class=800,random_state=0)

Create the model object. We're using a resnet34 architecture CNN.

In [19]:
# create a CNN object designed to recognize 3-second samples
# using the resnet34 architecture; single_target=False allows a sample to have multiple classes
model = CNN('resnet34', classes=species_of_interest, sample_duration=3.0, single_target=False)

Move the model to a GPU, if one is available, to accelerate training.

In [20]:
import torch
if torch.backends.mps.is_available():
    model.device='mps' #Apple Silicon
elif torch.cuda.is_available():
    model.device='cuda' #CUDA GPU  
print(f'model.device is: {model.device}')

model.device is: mps


#### initialize Weights and Biases logging session

Note: to use wandb logging, you will need to create an account on the [wandb website](https://wandb.ai/). The first time you use wandb, you'll be asked for an authentication key which can be found in your wandb profile. 

In [19]:
import wandb
try:
    wandb_session = wandb.init(
        entity='entity_name', #replace with your entity/group name
        project='opensoundscape training demo',
        name='Notebook 02: Train CNN',
    )
except Exception: #if wandb.init fails, don't use wandb logging
    print('failed to create wandb session. wandb session will be None')
    wandb_session = None

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: samlapp (kitzeslab). Use `wandb login --relogin` to force relogin


### Train
Train the model for 30 epochs.

We use default training parameters, but many aspects of CNN training can be customized (see [this tutorial](http://opensoundscape.org/en/latest/tutorials/cnn_training_advanced.html) for examples).

Training can be slow, and its speed depends on your computer's hardware. If you wish to skip this step, you can load the model that this cell would create by uncommenting and running the subsequent cell instead.

In [20]:
%%capture --no-stdout --no-display

model.train(
    balanced_train_df, 
    valid_df, 
    epochs = 30, 
    batch_size= 64, 
    log_interval=100, #log progress every 100 batches
    num_workers = 32, #32 parallelized cpu tasks for preprocessing
    wandb_session=wandb_session,
    save_interval = 10, #save checkpoint every 10 epochs
    save_path = './resources/02/' #location to save checkpoints
)


Training Epoch 0
Epoch: 0 [batch 0/88, 0.00%]
	DistLoss: 0.794
Metrics:
	MAP: 0.488

Validation.
Metrics:
	MAP: 0.476

Training Epoch 1
Epoch: 1 [batch 0/88, 0.00%]
	DistLoss: 0.376
Metrics:
	MAP: 0.752

Validation.
Metrics:
	MAP: 0.626

Training Epoch 2
Epoch: 2 [batch 0/88, 0.00%]
	DistLoss: 0.278
Metrics:
	MAP: 0.852

Validation.
Metrics:
	MAP: 0.757

Training Epoch 3
Epoch: 3 [batch 0/88, 0.00%]
	DistLoss: 0.246
Metrics:
	MAP: 0.902

Validation.
Metrics:
	MAP: 0.745

Training Epoch 4
Epoch: 4 [batch 0/88, 0.00%]
	DistLoss: 0.216
Metrics:
	MAP: 0.931

Validation.
Metrics:
	MAP: 0.806

Training Epoch 5
Epoch: 5 [batch 0/88, 0.00%]
	DistLoss: 0.175
Metrics:
	MAP: 0.950

Validation.
Metrics:
	MAP: 0.856

Training Epoch 6
Epoch: 6 [batch 0/88, 0.00%]
	DistLoss: 0.154
Metrics:
	MAP: 0.957

Validation.
Metrics:
	MAP: 0.842

Training Epoch 7
Epoch: 7 [batch 0/88, 0.00%]
	DistLoss: 0.135
Metrics:
	MAP: 0.966
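The `88` in the training log is the number of batches per epoch. As a quick consistency check, assuming `resample` returns `n_samples_per_class` rows for each of the 7 classes and the final partial batch is kept rather than dropped, the arithmetic works out as follows:

```python
import math

n_classes = 7  # species_of_interest
n_samples_per_class = 800  # value passed to resample
batch_size = 64  # value passed to model.train

# assumption: the balanced table has one row per (class, drawn sample) pair
n_rows = n_classes * n_samples_per_class
# a partial final batch still counts as a batch
batches_per_epoch = math.ceil(n_rows / batch_size)
print(n_rows, batches_per_epoch)  # 5600 88
```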

As training progresses, performance metrics will be plotted to the wandb logging platform and visible on this run's web page. Once this cell completes, you have trained the CNN.

This [wandb web page](https://wandb.ai/kitzeslab/opensoundscape%20training%20demo/runs/w1xyk7zr/workspace?workspace=user-samlapp) shows the content logged to wandb when this notebook was run by the Kitzes Lab. By default, OpenSoundscape + WandB integration creates several pages with information about the model:
- Overview: hyperparameters, run description, and hardware available during the run
- Charts: "Samples" panel with audio and images of preprocessed samples (useful for checking that your preprocessing performs as expected and your labels are correct)
- Charts: graphs of each class's performance metrics over training time
- Model: summary of model architecture
- Logs: standard output of training script
- System: computational performance metrics including memory, CPU use, etc.

When training several models and comparing performance, the "Project" page of WandB provides comparisons of metrics and hyperparameters across training runs.

In the next notebook, we will use the CNN to predict on the test set and evaluate its performance. 