# Train a CNN

Convolutional neural networks (CNNs) are popular tools for developing automated machine learning classifiers on images or image-like samples. By converting audio into a two-dimensional frequency vs. time representation such as a spectrogram, we can generate image-like samples that can be used to train CNNs. 
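OpenSoundscape creates spectrograms for you during training, but the idea is simple enough to sketch. The function below is a minimal short-time Fourier transform in NumPy, not OpenSoundscape's implementation. It slices the signal into overlapping frames, tapers each frame with a Hann window, and stacks each frame's power spectrum as one column. The window length (512 samples), hop (256 samples), and test tone are arbitrary illustrative choices.

```python
import numpy as np

def simple_spectrogram(samples, window_len=512, hop=256):
    """Return a (frequency bins x time frames) power spectrogram.

    Minimal STFT sketch: overlapping frames, Hann taper, squared magnitude
    of each frame's real FFT.
    """
    window = np.hanning(window_len)
    n_frames = 1 + (len(samples) - window_len) // hop
    frames = np.stack([samples[i * hop : i * hop + window_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # shape: (frames, freq bins)
    return power.T  # rows = frequency, columns = time

# One second of a 1 kHz tone sampled at 22,050 Hz
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)

spec = simple_spectrogram(tone)
print(spec.shape)  # (257, 85)
```

The 1 kHz tone appears as a bright horizontal band near row 23, since each row spans 22050 / 512 ≈ 43 Hz.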

This tutorial demonstrates the basic use of OpenSoundscape's `preprocessors` and `cnn` modules for training CNNs and making predictions using CNNs.

Under the hood, OpenSoundscape uses PyTorch for machine learning tasks. By using the class `opensoundscape.ml.cnn.CNN`, you can train and predict with PyTorch's powerful CNN architectures in just a few lines of code.

## Run this tutorial

This tutorial is more than a reference! It's a Jupyter Notebook which you can run and modify on Google Colab or your own computer.

|Link to tutorial|How to run tutorial|
| :- | :- |
| [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kitzeslab/opensoundscape/blob/master/docs/tutorials/train_cnn.ipynb) | The link opens the tutorial in Google Colab. Uncomment the "installation" line in the first cell to install OpenSoundscape. |
| [![Download via DownGit](https://img.shields.io/badge/GitHub-Download-teal?logo=github)](https://minhaskamal.github.io/DownGit/#/home?url=https://github.com/kitzeslab/opensoundscape/blob/master/docs/tutorials/train_cnn.ipynb) | The link downloads the tutorial file to your computer. Follow the [Jupyter installation instructions](https://opensoundscape.org/en/latest/installation/jupyter.html), then open the tutorial file in Jupyter. |

In [None]:
# Uncomment the next line to install OpenSoundscape for use in Google Colab
#!pip install opensoundscape

## Setup

### Import needed packages

In [14]:
# the cnn module provides classes for training/predicting with various types of CNNs
from opensoundscape import CNN

#other utilities and packages
import torch
import pandas as pd
from pathlib import Path
import numpy as np
import random
from glob import glob
import sklearn.model_selection #provides train_test_split, used below

#set up plotting
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize']=[15,5] #for large visuals
%config InlineBackend.figure_format = 'retina'

### Set random seeds

Set manual seeds for PyTorch, Python, and NumPy. These make the training results reproducible across runs (though some GPU operations can still introduce small run-to-run differences). You probably don't want to do this when you actually train your model, but it's useful for debugging.

In [2]:
torch.manual_seed(0)
random.seed(0)
np.random.seed(0)
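A seed makes a pseudo-random number generator replay the same sequence of numbers. The sketch below demonstrates this with Python's built-in `random` module; `torch.manual_seed` and `np.random.seed` do the same for PyTorch's and NumPy's own generators, which is why all three are seeded above. `draw` is a hypothetical helper used only for illustration.

```python
import random

def draw(seed, n=3):
    """Seed Python's global RNG, then draw n pseudo-random floats."""
    random.seed(seed)
    return [random.random() for _ in range(n)]

# Re-seeding with the same value replays the exact same sequence...
print(draw(0) == draw(0))  # True
# ...while a different seed produces a different sequence.
print(draw(0) != draw(1))  # True
```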

### Download files

Training a machine learning model requires some pre-labeled data. These data, in the form of audio recordings or spectrograms, are labeled with whether or not they contain the sound of the species of interest. 

These data can be obtained from online databases such as Xeno-Canto.org, or by labeling one's own autonomous recording unit (ARU) data using a program like Cornell's Raven sound analysis software. In this example we use a set of avian soundscape recordings that were annotated using the software Raven Pro 1.6.4 (Bioacoustics Research Program 2022):

<blockquote><i>An annotated set of audio recordings of Eastern North American birds containing frequency, time, and species information. </i> Lauren M. Chronister,  Tessa A. Rhinehart,  Aidan Place,  Justin Kitzes.
https://doi.org/10.1002/ecy.3329 
</blockquote>

These are the same data that are used by the annotation and preprocessing tutorials, so you can skip this step if you've already downloaded them there.

Download the datasets to your current working directory and unzip them. You can do so by running the cell below, OR by:
- downloading and unzipping both `annotation_Files.zip` and `mp3_Files.zip` from https://datadryad.org/stash/dataset/doi:10.5061/dryad.d2547d81z
- moving the unzipped contents of each into the folders `./annotated_data/Annotation_Files/` and `./annotated_data/Recordings/`, respectively (subfolders of the current folder)

In [3]:
# Note: the "!" preceding each line below allows us to run bash commands in a Jupyter notebook
# If you are not running this code in a notebook, input these commands into your terminal instead
!wget -O annotation_Files.zip https://datadryad.org/stash/downloads/file_stream/641805
!wget -O mp3_Files.zip https://datadryad.org/stash/downloads/file_stream/641807
!unzip annotation_Files.zip -d ./annotated_data/Annotation_Files
!unzip mp3_Files.zip -d ./annotated_data/Recordings

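If `wget` or `unzip` are not available on your system (for example, on some Windows setups), the sketch below performs the same download-and-extract steps with Python's standard library (`urllib.request` and `zipfile`). The helper names `unzip_to` and `download_and_unzip` are hypothetical, and the download calls are commented out so nothing is fetched until you enable them. The destinations match the `./annotated_data/` folder that the rest of this tutorial reads from.

```python
import urllib.request
import zipfile
from pathlib import Path

def unzip_to(zip_path, dest):
    """Extract every member of zip_path into the folder dest (created if needed)."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return dest

def download_and_unzip(url, zip_path, dest):
    """Download url to zip_path (like `wget -O`), then extract it (like `unzip -d`)."""
    urllib.request.urlretrieve(url, zip_path)  # follows HTTP redirects
    return unzip_to(zip_path, dest)

# Same Dryad sources as the shell commands above:
# download_and_unzip("https://datadryad.org/stash/downloads/file_stream/641805",
#                    "annotation_Files.zip", "./annotated_data/Annotation_Files")
# download_and_unzip("https://datadryad.org/stash/downloads/file_stream/641807",
#                    "mp3_Files.zip", "./annotated_data/Recordings")
```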

## Prepare audio data

To prepare audio data for machine learning, we need to convert our annotated data into clip-level labels.

These steps are covered in depth in other tutorials, so we'll just set our clip labels up quickly for this example.

First, get exactly matched lists of audio files and their corresponding selection files:

In [4]:
# Path to the folder where the dataset was downloaded and unzipped
dataset_path = Path("./annotated_data/")

# Make a list of all of the selection table files
selection_files = glob(f"{dataset_path}/Annotation_Files/*/*.txt")

# Create a list of audio files, one corresponding to each Raven file
# (Audio files have the same names as selection files with a different extension)
audio_files = [f.replace('Annotation_Files','Recordings').replace('.Table.1.selections.txt','.mp3') for f in selection_files]
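Because each audio path is derived from its selection-file path by string replacement, the two lists stay paired even though `glob` returns files in arbitrary order. The sketch below wraps that mapping in a hypothetical helper, `selection_to_audio_path`, and adds a hypothetical `missing_files` check. Running such a check before building annotations is worthwhile, since a typo in either suffix would silently produce paths that do not exist. The example filename assumes the naming pattern implied by the replacement above.

```python
from pathlib import Path

def selection_to_audio_path(selection_path):
    """Map a Raven selection-table path to its matching .mp3 path.

    Mirrors the replacements used above: the Annotation_Files folder becomes
    Recordings, and the selection-table suffix becomes .mp3.
    """
    return (selection_path
            .replace("Annotation_Files", "Recordings")
            .replace(".Table.1.selections.txt", ".mp3"))

def missing_files(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]

example = "annotated_data/Annotation_Files/Recording_1/Recording_1_Segment_07.Table.1.selections.txt"
print(selection_to_audio_path(example))
# annotated_data/Recordings/Recording_1/Recording_1_Segment_07.mp3
```

Running `missing_files(audio_files)` should return an empty list if every selection table has a matching recording.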

Next, convert the selection files and audio files to a `BoxedAnnotations` object, which contains the time, frequency, and label information for all annotations for every recording in the dataset.

In [7]:
from opensoundscape.annotations import BoxedAnnotations
# Load every Raven selection table (paired with its audio file) into one BoxedAnnotations object
annotations = BoxedAnnotations.from_raven_files(
    selection_files,
    audio_files)



In [8]:
# Parameters to use for label creation
clip_duration = 3
clip_overlap = 0
min_label_overlap = 0.25
species_of_interest = ["NOCA", "EATO", "SCTA", "BAWW", "BCCH", "AMCR", "NOFL"]

# Create dataframe of one-hot labels
clip_labels = annotations.one_hot_clip_labels(
    clip_duration = clip_duration, 
    clip_overlap = clip_overlap,
    min_label_overlap = min_label_overlap,
    class_subset = species_of_interest # You can comment this line out if you want to include all species.
)
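The sketch below illustrates, in plain Python, roughly what clip labeling does: it cuts a recording into fixed-length clips and marks a class as present in a clip when an annotation of that class overlaps the clip by at least `min_label_overlap` seconds. This is a simplified, hypothetical reimplementation, not OpenSoundscape's code. It only applies the `min_label_overlap` rule (the real method supports more options), and dropping a final partial clip is an assumption of this sketch.

```python
def clip_labels_sketch(annotations, file_duration, classes,
                       clip_duration=3.0, clip_overlap=0.0, min_label_overlap=0.25):
    """Return a list of (start, end, {class: 0/1}) rows for one recording.

    annotations: list of (start_time, end_time, label) tuples in seconds.
    Assumes clip_overlap < clip_duration; a final partial clip is dropped.
    """
    step = clip_duration - clip_overlap
    rows = []
    start = 0.0
    while start + clip_duration <= file_duration:
        end = start + clip_duration
        labels = {c: 0 for c in classes}
        for a_start, a_end, label in annotations:
            # Seconds of this annotation that fall inside the clip (negative if none)
            overlap = min(end, a_end) - max(start, a_start)
            if label in labels and overlap >= min_label_overlap:
                labels[label] = 1
        rows.append((start, end, labels))
        start += step
    return rows

example_annotations = [  # hypothetical (start, end, label) boxes in seconds
    (1.0, 2.5, "NOCA"),
    (2.9, 4.0, "EATO"),  # only 0.1 s falls inside the first clip
    (5.5, 7.0, "NOCA"),
]
for start, end, labels in clip_labels_sketch(example_annotations, 9.0, ["NOCA", "EATO"]):
    print(start, end, labels)
# 0.0 3.0 {'NOCA': 1, 'EATO': 0}
# 3.0 6.0 {'NOCA': 1, 'EATO': 1}
# 6.0 9.0 {'NOCA': 1, 'EATO': 0}
```

The EATO box overlaps the first clip by only 0.1 s, below the 0.25 s threshold, so only the second clip is labeled EATO.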

In [9]:
clip_labels.head()

| file | start_time | end_time | NOCA | EATO | SCTA | BAWW | BCCH | AMCR | NOFL |
| :- | :- | :- | :- | :- | :- | :- | :- | :- | :- |
| annotated_data/Recordings/Recording_1/Recording_1_Segment_07.mp3 | 0.0 | 3.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| annotated_data/Recordings/Recording_1/Recording_1_Segment_07.mp3 | 3.0 | 6.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| annotated_data/Recordings/Recording_1/Recording_1_Segment_07.mp3 | 6.0 | 9.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| annotated_data/Recordings/Recording_1/Recording_1_Segment_07.mp3 | 9.0 | 12.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| annotated_data/Recordings/Recording_1/Recording_1_Segment_07.mp3 | 12.0 | 15.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


## Create train, validation, and test datasets

To train and test a model, we use three datasets:
* The **training dataset** is used to fit your machine learning model to the audio data. 
* The **validation dataset** is a held-out dataset that is used to select hyperparameters (e.g. how many epochs to train for) during training.
* The **test dataset** is another held-out dataset that we use to check how the model performs on data that were not available at all during training. 

While both the training and validation datasets are used while training the model, the test dataset is not touched until the model is fully trained.

The training and validation datasets may be gathered from the same source as each other. In contrast, the test dataset is often gathered from a different source to assess whether the model's performance generalizes to a real-world problem. For example, training and validation data might be drawn from an online database like Xeno-Canto, whereas the test data come from your own field recordings.

### Create a test dataset

We'll separate the test dataset first. For a good assessment of the model's generalization, we want the test set to be independent of the training and validation datasets. For example, we don't want to use clips from the same source recording in the training dataset and the test dataset.

For this example, we'll use the recordings in the folders `Recording_1`, `Recording_2` and `Recording_3` as our training and validation data, and use the recordings in folder `Recording_4` as our test data. 

In [11]:
# Select all files from Recording_4 as a test set
mask = clip_labels.reset_index()['file'].apply(lambda x: 'Recording_4' in x).values
test_set = clip_labels[mask]

# All other files will be used for training and validation
train_and_val_set = clip_labels.drop(test_set.index)

# Save .csv tables of the train/validation and test sets to keep a record of them
train_and_val_set.to_csv("./annotated_data/train_and_val_set.csv")
test_set.to_csv("./annotated_data/test_set.csv")
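Holding out `Recording_4` by hand works here because there are only four recordings. The same rule (never let clips from one recording land in both sets) can be written as a grouped split. In the hypothetical helper below, the parent folder of each clip (e.g. `Recording_1`) serves as its group ID, and whole groups are assigned to the test set at random; scikit-learn's `GroupShuffleSplit` implements the same idea for arrays and dataframes. Note that the training/validation split later in this tutorial is random at the clip level, so grouping there as well could make the validation score a stricter estimate of generalization.

```python
import random
from collections import defaultdict

def split_by_recording(clip_paths, n_test_groups=1, seed=0):
    """Hold out whole recordings so no recording contributes clips to both sets.

    clip_paths: paths like ".../Recordings/Recording_1/clip.mp3"; the parent
    folder name is treated as the recording (group) ID.
    """
    groups = defaultdict(list)
    for path in clip_paths:
        group_id = path.replace("\\", "/").split("/")[-2]
        groups[group_id].append(path)
    group_ids = sorted(groups)               # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(group_ids)
    test_ids = set(group_ids[:n_test_groups])
    train = [p for g in group_ids if g not in test_ids for p in groups[g]]
    test = [p for g in group_ids if g in test_ids for p in groups[g]]
    return train, test

# Hypothetical clip paths: 4 recordings x 3 clips each
clips = [f"annotated_data/Recordings/Recording_{r}/clip_{i}.mp3"
         for r in range(1, 5) for i in range(3)]
train_clips, test_clips = split_by_recording(clips)
print(len(train_clips), len(test_clips))  # 9 3
```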

If you want, you can reload the training/validation and test sets from these saved CSV files.

In [12]:
train_and_val_set = pd.read_csv('./annotated_data/train_and_val_set.csv',index_col=[0,1,2])
test_set = pd.read_csv('./annotated_data/test_set.csv',index_col=[0,1,2])

### Split training and validation datasets

Now, separate the remaining non-test data into training and validation datasets.

The idea of keeping a separate validation dataset is that, throughout training, we can 'peek' at the performance on the validation set to choose hyperparameters. (This is in contrast to the test dataset, which we will not look at until we've finished training our model.)

One important hyperparameter is the number of **epochs** to train for, since training for too long can lead to overfitting. Each epoch includes one round of fitting on every training sample.

If a model's performance on the training dataset continues to improve as it trains, but its performance on the validation dataset plateaus or worsens, this could indicate that the model is **overfitting** to the training dataset: learning information specific to those particular samples instead of gaining the ability to generalize to new data.
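One common way to use the validation set to choose the number of epochs is early stopping: track a validation score after every epoch, remember the best epoch, and stop once the score has not improved for a set number of epochs (the *patience*). The sketch below applies that rule to a list of scores. The function name, the patience value, and the scores are hypothetical, not results from this tutorial.

```python
def best_epoch(val_scores, patience=3):
    """Pick the epoch with the highest validation score, stopping early.

    val_scores: one validation metric per epoch (higher is better).
    Stops once `patience` epochs have passed without beating the best score.
    Returns (best_epoch_index, epoch_index_where_training_stopped).
    """
    best_idx, best = 0, float("-inf")
    for epoch, score in enumerate(val_scores):
        if score > best:
            best_idx, best = epoch, score
        elif epoch - best_idx >= patience:
            return best_idx, epoch
    return best_idx, len(val_scores) - 1

# Hypothetical validation scores, one per epoch
val_scores = [0.60, 0.70, 0.78, 0.80, 0.79, 0.80, 0.77, 0.76]
print(best_epoch(val_scores, patience=3))  # (3, 6)
```

Here the score peaks at epoch 3, merely ties it at epoch 5, and training stops at epoch 6; the weights saved at epoch 3 would be the ones to keep.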

In [17]:
# Split our training data into training and validation sets
train_df, valid_df = sklearn.model_selection.train_test_split(train_and_val_set, test_size=0.1, random_state=0)

### Resample data for even class representation

Before training, we will balance the number of samples of each class in the training set. This helps the model learn all of the classes, rather than paying too much attention to the classes with the most labeled annotations. 

In [18]:
from opensoundscape.data_selection import resample

# resample so that each class has 800 samples: classes with fewer are upsampled (samples repeated),
# and classes with more are downsampled
balanced_train_df = resample(train_df,n_samples_per_class=800,random_state=0)
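The sketch below shows the balancing idea in plain Python for multi-hot labels. For each class, it draws exactly `n_per_class` of the clips labeled with that class, sampling without replacement when there are enough clips and with replacement (repeating clips) when there are too few. It is a hypothetical helper, not OpenSoundscape's `resample`, whose exact behavior (for example, how it treats clips with no labels) may differ.

```python
import random

def balance_classes(rows, n_per_class, seed=0):
    """Draw exactly n_per_class positive rows for each class.

    rows: list of (clip_id, {class: 0/1}) pairs; every dict has every class.
    Downsamples (no replacement) or upsamples (with replacement) per class.
    """
    rng = random.Random(seed)
    classes = sorted({c for _, labels in rows for c in labels})
    balanced = []
    for c in classes:
        positives = [r for r in rows if r[1][c] == 1]
        if not positives:
            continue  # nothing to draw from for this class
        if len(positives) >= n_per_class:
            balanced.extend(rng.sample(positives, n_per_class))
        else:
            balanced.extend(rng.choices(positives, k=n_per_class))
    return balanced

# Hypothetical data: 5 NOCA clips, 1 EATO clip
rows = [(f"noca_{i}", {"NOCA": 1, "EATO": 0}) for i in range(5)]
rows.append(("eato_0", {"NOCA": 0, "EATO": 1}))
balanced = balance_classes(rows, n_per_class=3)
print(sum(r[1]["NOCA"] for r in balanced), sum(r[1]["EATO"] for r in balanced))  # 3 3
```

Because labels are multi-hot, a clip that contains two species can be drawn once for each of them in this sketch, so with real data a class's final count can exceed `n_per_class`.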

## Set up model

Now we create a model object. We have to select several parameters when creating this object: its `architecture`, `classes`, and `sample_duration`. 

Some additional parameters can also be changed at this step, which we cover in a later tutorial ("Customize CNN training"). 

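One optional parameter is quite important: whether the model should be trained to detect only one class per clip (`single_target=True`) or any number of classes per clip (`single_target=False`). Because a single clip in this dataset can contain several species, we'll set `single_target=False` explicitly when creating the model.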
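The difference between the two settings is usually expressed through the output activation. With a single target, class scores are passed through a softmax so they compete and sum to 1; with multiple targets, each score goes through an independent sigmoid so several classes can be likely at once. The sketch below applies both to the same hypothetical raw outputs; it illustrates the general technique and is not meant to describe OpenSoundscape's internals.

```python
import math

def softmax(logits):
    """Single-target: scores compete and sum to 1 (one class per clip)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Multi-target: each class gets an independent score between 0 and 1."""
    return [1 / (1 + math.exp(-x)) for x in logits]

logits = [2.0, 2.0, -3.0]  # hypothetical raw outputs for 3 classes
print(softmax(logits))     # the two strong classes must split the probability
print(sigmoid(logits))     # both strong classes can score high at once
```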

### Choose architecture

The `architecture` is the particular design of the CNN. This option can either be a string matching one of the architectures included in OpenSoundscape, or a custom PyTorch model object.

See which architectures can be specified by name (as a string) in OpenSoundscape:

In [25]:
from opensoundscape.ml import cnn_architectures
cnn_architectures.list_architectures()

['resnet18',
 'resnet34',
 'resnet50',
 'resnet101',
 'resnet152',
 'alexnet',
 'vgg11_bn',
 'squeezenet1_0',
 'densenet121',
 'inception_v3',
 'efficientnet_b0',
 'efficientnet_b4',
 'efficientnet_widese_b0',
 'efficientnet_widese_b4']

We will choose `resnet34` for this example. For more information on choosing architectures, see the "Customize CNN training" tutorial.

In [34]:
architecture = 'resnet34'

### Create CNN object

Now, create a CNN object with this architecture, the classes we put into the dataframe above, and the same sample duration as we selected above.

The first time you run this code for a particular architecture, OpenSoundscape will download that architecture's pretrained weights.

In [35]:
# Create a CNN object designed to recognize 3-second samples
from opensoundscape import CNN

# Can use this code to get your classes, if needed
class_list = list(train_df.columns)

model = CNN(
    architecture = architecture,
    classes = class_list,
    sample_duration = clip_duration, #3s, selected above
    single_target = False, #a clip can contain multiple species
)

### Set up GPU

If a GPU is available on your computer, these steps will move the model to the GPU.

In [36]:
import torch
if torch.backends.mps.is_available():
    model.device='mps' #Apple Silicon
elif torch.cuda.is_available():
    model.device='cuda' #CUDA GPU  
print(f'model.device is: {model.device}')

model.device is: cpu


### Set up model logging

While this step is optional, it is very helpful for model training. In this step, we set up model logging on a service called **Weights & Biases** (AKA `wandb`). 

Weights & Biases is a free website you can use to monitor model training. It is integrated with OpenSoundscape to include helpful functions such as checking on your model's training progress in real time, visualizing the spectrograms created for training your model, comparing multiple tries at training the same model, and more. For more information, check out this [blog post](https://wandb.ai/wandb_fc/repo-spotlight/reports/Community-Spotlight-OpenSoundscape--Vmlldzo0MDcwMTI4). 

To use `wandb` logging, you will need to create an account on the [Weights and Biases website](https://wandb.ai/). The first time you use `wandb`, you'll be asked for an authentication key which can be found in your `wandb` profile. 

In [37]:
import wandb
try:
    wandb_session = wandb.init(
        entity='entity_name', #replace with your entity/group name
        project='opensoundscape training demo',
        name='Notebook 02: Train CNN',
    )
except Exception: #if wandb.init fails, don't use wandb logging
    print('failed to create wandb session. wandb session will be None')
    wandb_session = None

wandb: Currently logged in as: rhine3. Use `wandb login --relogin` to force relogin
wandb: ERROR Error while calling W&B API: entity entity_name not found during upsertBucket (<Response [404]>)
wandb: ERROR It appears that you do not have permission to access the requested resource. Please reach out to the project owner to grant you access. If you have the correct permissions, verify that there are no issues with your networking setup.(Error 404: Not Found)
failed to create wandb session. wandb session will be None


## Train the CNN

Finally, train the CNN for 30 epochs. Each **epoch** is one pass through all of the samples in the training dataset, plus running predictions on the validation dataset.

The model processes the training samples in smaller groups called **batches**: it predicts on every sample in a batch, then updates its weights based on those predictions. Larger batches can increase training speed, but require more memory. If you get an out-of-memory error, try reducing the batch size.
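The arithmetic behind batches is simple: an epoch over N samples with batch size B takes `math.ceil(N / B)` weight updates, and only the final batch can be smaller than B. The sample count below is hypothetical, while the batch size and epoch count match the training call in this tutorial. Some data loaders drop the final partial batch instead; this sketch keeps it.

```python
import math

def iter_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples (the last may be smaller)."""
    for i in range(0, len(samples), batch_size):
        yield samples[i:i + batch_size]

n_samples, batch_size, epochs = 1000, 64, 30  # n_samples is hypothetical
sizes = [len(b) for b in iter_batches(list(range(n_samples)), batch_size)]
print(len(sizes), sizes[-1])                       # 16 40
print(math.ceil(n_samples / batch_size) * epochs)  # 480 weight updates in total
```

Halving the batch size roughly halves the memory needed per step but doubles the number of weight updates per epoch.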

We use default training parameters, but many aspects of CNN training can be customized (see the "Customize CNN training" tutorial for examples).

In [39]:
checkpoint_folder = Path("model_training_checkpoints")
checkpoint_folder.mkdir(exist_ok=True) #don't fail if the folder already exists

In [40]:
%%capture --no-stdout --no-display

model.train(
    balanced_train_df, 
    valid_df, 
    epochs = 30, 
    batch_size = 64, 
    log_interval = 100, #log progress every 100 batches
    num_workers = 32, #parallel CPU processes for preprocessing; use at most your number of CPU cores
    wandb_session = wandb_session,
    save_interval = 10, #save checkpoint every 10 epochs
    save_path = checkpoint_folder #location to save checkpoints
)



**Clean up:** Run the following cell to delete the files created in this tutorial. However, these files are used in other tutorials, so you may wish not to delete them just yet.

In [41]:
import shutil
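shutil.rmtree('./annotated_data', ignore_errors=True)
shutil.rmtree('./wandb', ignore_errors=True) #may not exist if wandb was never initialized
shutil.rmtree('./model_training_checkpoints', ignore_errors=True)
Path('annotation_Files.zip').unlink(missing_ok=True)
Path('mp3_Files.zip').unlink(missing_ok=True)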