# Customizing CNN Training
This notebook demonstrates how to use the classes in `opensoundscape.torch.models.cnn` to:

- schedule the learning rate decay

- choose from various architectures

- use strategic sampling for imbalanced training data

- train on spectrograms with a bandpassed frequency range

Model training is slow, so rather than demonstrating how these options affect training, most examples in this notebook either don't call `.train()` or "train" the model for 0 epochs, purely for demonstration.

For introductory demos (basic training, prediction, saving/loading models), see the basic training and prediction notebook tutorial (cnn.ipynb). 

In [1]:
from opensoundscape.preprocess.preprocessors import BasePreprocessor, AudioToSpectrogramPreprocessor, CnnPreprocessor
from opensoundscape.torch.models.cnn import PytorchModel, Resnet18Multiclass, Resnet18Binary, InceptionV3
from opensoundscape.helpers import run_command

import torch
import pandas as pd
from pathlib import Path
import numpy as np
import random

from matplotlib import pyplot as plt
plt.rcParams['figure.figsize']=[15,5] #for big visuals
%config InlineBackend.figure_format = 'retina'

## Prepare audio data

### Download labeled audio files

The Kitzes Lab has created a small labeled dataset of short clips of American Woodcock vocalizations. You have two options for obtaining the folder of data, called `woodcock_labeled_data`:

1. Run the following cell to download this small dataset. These commands require you to have `curl` and `tar` installed on your computer, as they will download and unzip a compressed file in `.tar.gz` format. 

2. Download a `.zip` version of the files by clicking [here](https://pitt.box.com/shared/static/m0cmzebkr5qc49q9egxnrwwp50wi8zu5.zip). You will have to unzip this folder and place the unzipped folder in the same folder that this notebook is in.

If you already have these files, you can skip or comment out this cell.

In [2]:
commands = [
    "curl -L https://pitt.box.com/shared/static/79fi7d715dulcldsy6uogz02rsn5uesd.gz -o ./woodcock_labeled_data.tar.gz",
    "tar -xzf woodcock_labeled_data.tar.gz", # Unzip the downloaded tar.gz file
    "rm woodcock_labeled_data.tar.gz" # Remove the file after its contents are unzipped
]
for command in commands:
    run_command(command)

### Create one-hot labels
See the basic training and prediction notebook for more details.

In [3]:
labels = pd.read_csv(Path("woodcock_labeled_data/woodcock_labels.csv"))
labels.filename = ['./woodcock_labeled_data/'+f for f in labels.filename]

labels['negative']=[0 if label=='present' else 1 for label in labels['woodcock']]
labels['positive']=[1 if label=='present' else 0 for label in labels['woodcock']]
classes=['negative','positive']
labels = labels.set_index('filename')[classes]
labels.head()

| filename | negative | positive |
|---|---|---|
| ./woodcock_labeled_data/d4c40b6066b489518f8da83af1ee4984.wav | 0 | 1 |
| ./woodcock_labeled_data/e84a4b60a4f2d049d73162ee99a7ead8.wav | 1 | 0 |
| ./woodcock_labeled_data/79678c979ebb880d5ed6d56f26ba69ff.wav | 0 | 1 |
| ./woodcock_labeled_data/49890077267b569e142440fa39b3041c.wav | 0 | 1 |
| ./woodcock_labeled_data/0c453a87185d8c7ce05c5c5ac5d525dc.wav | 0 | 1 |
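The list comprehensions above can also be written as vectorized pandas operations. Below is a minimal sketch on a hypothetical three-row table laid out like `woodcock_labels.csv` (the filenames are made up); like the original code, it treats any value other than `'present'` as a negative.

```python
import pandas as pd

# Hypothetical toy labels with the same layout as woodcock_labels.csv:
# one row per file and a 'woodcock' column containing 'present' or 'absent'
labels = pd.DataFrame({
    "filename": ["a.wav", "b.wav", "c.wav"],
    "woodcock": ["present", "absent", "present"],
})

# Vectorized equivalent of the list comprehensions:
# 1 in 'positive' where the bird is present, and 'negative' is its complement
labels["positive"] = (labels["woodcock"] == "present").astype(int)
labels["negative"] = 1 - labels["positive"]

classes = ["negative", "positive"]
one_hot = labels.set_index("filename")[classes]
print(one_hot)
```

Each row has exactly one 1, which is what makes these labels "one-hot".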


### Split into train and validation sets

In [4]:
from sklearn.model_selection import train_test_split
train_df,valid_df = train_test_split(labels,test_size=0.2,random_state=0)
# check the training-set size (multi-class training needs at least a few samples for each batch)
len(train_df)

23

### Create Preprocessors

In [5]:
from opensoundscape.preprocess.preprocessors import CnnPreprocessor

train_dataset = CnnPreprocessor(train_df, overlay_df=train_df)

valid_dataset = CnnPreprocessor(valid_df,overlay_df=valid_df,return_labels=True)

## Model training parameters
Let's take a peek at the current optimizer parameters, which are stored in a dictionary:

In [6]:
model = Resnet18Binary(classes)
model.optimizer_params

created PytorchModel model object with 2 classes


{'feature': {'params': <generator object Module.parameters at 0x14699a850>,
  'lr': 0.001,
  'momentum': 0.9,
  'weight_decay': 0.0005},
 'classifier': {'params': <generator object Module.parameters at 0x14699aa50>,
  'lr': 0.01,
  'momentum': 0.9,
  'weight_decay': 0.0005}}

### Learning Rates
The learning rate scales how much the model changes its weights at each optimization step, that is, each time it computes the loss and its gradient on a batch.

In Resnet18Multiclass and Resnet18Binary, we can set the learning rates for the feature extraction and classification blocks of the network separately.

For example, we can specify a relatively fast learning rate for the feature extractor and a slower one for the classifier, the reverse of the defaults shown above (though this might not be a good idea in practice).

In [7]:
model = Resnet18Binary(classes)
model.optimizer_params['feature']['lr']=0.01
model.optimizer_params['classifier']['lr']=0.001

created PytorchModel model object with 2 classes
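To make the per-block learning rates concrete, here is a minimal pure-Python sketch of a single SGD-with-momentum update, assuming the optimizer follows the update rule of PyTorch's `torch.optim.SGD` (no dampening, no Nesterov momentum). The dictionary keys mirror `model.optimizer_params` without the `params` generators; the weight and gradient values are made up for illustration.

```python
# Hypothetical mirror of model.optimizer_params after the cell above (minus 'params')
optimizer_params = {
    "feature": {"lr": 0.01, "momentum": 0.9, "weight_decay": 0.0005},
    "classifier": {"lr": 0.001, "momentum": 0.9, "weight_decay": 0.0005},
}

def sgd_step(weight, grad, velocity, lr, momentum, weight_decay):
    """One SGD-with-momentum update in the form used by torch.optim.SGD
    (assuming no dampening and no Nesterov momentum)."""
    d_p = grad + weight_decay * weight      # weight decay adds wd * w to the gradient
    velocity = momentum * velocity + d_p    # update the momentum buffer
    return weight - lr * velocity, velocity

# Apply the same starting weight and gradient to each block and compare the steps
new_weights = {}
for group, p in optimizer_params.items():
    w, _ = sgd_step(weight=1.0, grad=0.5, velocity=0.0,
                    lr=p["lr"], momentum=p["momentum"],
                    weight_decay=p["weight_decay"])
    new_weights[group] = w
    print(f"{group}: weight moved by {1.0 - w:.7f}")
```

Given identical gradients, the feature block's step is ten times larger than the classifier's, simply because its learning rate is ten times larger.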


### Weight decay
Weight decay performs L2 regularization: it gives the model an incentive to keep its weights small rather than large. Here we raise it from the default of 0.0005 (shown above) to 0.001 for both blocks.

In [8]:
model.optimizer_params['feature']['weight_decay']=0.001
model.optimizer_params['classifier']['weight_decay']=0.001
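One way to see what weight decay does: in plain SGD without momentum, if the loss gradient were zero, every update would multiply each weight by `(1 - lr * weight_decay)`, so weights shrink geometrically toward zero. The sketch below checks this, and also confirms numerically that adding `weight_decay * w` to the gradient is the same as adding an L2 penalty `(weight_decay / 2) * w**2` to the loss. The values are illustrative, and ignoring momentum is a simplification.

```python
def decay_only(w0, lr, weight_decay, steps):
    """Plain SGD (no momentum) with a zero loss gradient: only weight decay acts."""
    w = w0
    for _ in range(steps):
        grad = 0.0                              # pretend the loss surface is flat
        w = w - lr * (grad + weight_decay * w)  # weight decay term pulls w toward 0
    return w

def l2_penalty(w, weight_decay):
    """L2 penalty whose gradient is weight_decay * w."""
    return 0.5 * weight_decay * w ** 2

w0, lr, wd, steps = 2.0, 0.01, 0.001, 1000

# Iterated updates match the closed form w0 * (1 - lr * wd) ** steps
w_final = decay_only(w0, lr, wd, steps)
closed_form = w0 * (1 - lr * wd) ** steps
print(w_final, closed_form)

# Central finite difference of the penalty recovers the wd * w gradient term
eps = 1e-6
numerical_grad = (l2_penalty(w0 + eps, wd) - l2_penalty(w0 - eps, wd)) / (2 * eps)
print(numerical_grad, wd * w0)
```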

### Learning rate schedule
It's often helpful to decrease the learning rate over the course of training. By default, the learning rates are multiplied by 0.7 once every 10 epochs. 

Let's modify that for a very fast training schedule, where we want to multiply the learning rates by 0.1 every epoch. 

In [9]:
model.lr_cooling_factor = 0.1
model.lr_update_interval = 1
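The sketch below compares the default schedule with the fast one set above. It assumes the model applies a step decay equivalent to PyTorch's `torch.optim.lr_scheduler.StepLR` with `step_size=lr_update_interval` and `gamma=lr_cooling_factor`; that mapping is our assumption, not something verified against the opensoundscape source, and `scheduled_lr` is a hypothetical helper.

```python
def scheduled_lr(base_lr, epoch, cooling_factor=0.7, update_interval=10):
    """Learning rate after `epoch` completed epochs under a step schedule:
    multiply by cooling_factor once every update_interval epochs (assumed StepLR-like)."""
    return base_lr * cooling_factor ** (epoch // update_interval)

# Default schedule (x0.7 every 10 epochs) vs. the fast schedule (x0.1 every epoch)
for epoch in [0, 1, 9, 10, 20]:
    default = scheduled_lr(0.01, epoch)
    fast = scheduled_lr(0.01, epoch, cooling_factor=0.1, update_interval=1)
    print(f"epoch {epoch:2d}: default lr={default:.6f}  fast lr={fast:.2e}")
```

With a cooling factor of 0.1 every epoch, the learning rate drops by three orders of magnitude within three epochs, so this schedule only makes sense for very short training runs.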

## Changing network architecture
We can swap out the model's CNN architecture as well, or design a custom one. 

A few architectures have custom classes built around them in the `cnn` module (Resnet and Inception, specifically). 

Here we show how to build a model from any of the architectures in `opensoundscape.torch.architectures.cnn_architectures`.

### Building a model with a Pytorch stock architecture
PyTorch (through the torchvision package) provides various well-known CNN architectures out of the box.

The `cnn_architectures` module provides helper functions to generate various CNN architectures in PyTorch. Calling a function such as `alexnet()` returns a CNN architecture that we can use to instantiate a `PytorchModel`.

We can optionally:

- use pretrained weights provided by PyTorch (trained on ImageNet); this often speeds up training significantly and is on by default

- freeze the feature extractor if we only want to train the final classification layer of the network but not modify any other weights

Here we'll set `use_pretrained=False` to avoid downloading all of the weights for AlexNet. 

In [10]:
from opensoundscape.torch.architectures.cnn_architectures import alexnet

#initialize the AlexNet architecture
arch = alexnet(num_classes=2,freeze_feature_extractor=False,use_pretrained=False)

#generate a model object with this architecture
model = PytorchModel(architecture=arch, classes=['negative','positive'])

created PytorchModel model object with 2 classes


### Changing the architecture of an existing model
Even after initializing a model with an architecture, we can change it by replacing the model's `.network`.

In [11]:
from opensoundscape.torch.architectures.cnn_architectures import densenet121

# initialize the DenseNet-121 architecture
arch = densenet121(num_classes=2,freeze_feature_extractor=False,use_pretrained=False)

# replace the alexnet architecture with the densenet architecture
model.network = arch

### Inception V3 architecture
The Inception architecture requires slightly different training and preprocessing, because

1) the input image shape must be 299x299

2) during training, Inception's forward pass returns both the main output and an auxiliary output

The `InceptionV3` class in `cnn` handles the necessary modifications to training and prediction for you, but you need to make sure your Preprocessor produces images of the correct shape. Here's an example:

In [13]:
from opensoundscape.torch.models.cnn import InceptionV3

#generate an Inception model
model = InceptionV3(classes=['negative','positive'],use_pretrained=False)

# create a (shuffled) copy of the training dataset, leaving train_dataset unchanged
inception_dataset = train_dataset.sample(frac=1)

#modify the preprocessor to give 299x299 image shape
inception_dataset.actions.to_img.set(shape=[299,299])

#train and validate for 1 epoch
#note that Inception will complain if batch_size=1
model.train(inception_dataset,inception_dataset,epochs=1,batch_size=4)

#predict
preds, _, _ = model.predict(inception_dataset)



created PytorchModel model object with 2 classes
	Jacc: 0.500 Hamm: 0.500 DistLoss: 1.063

Validation.
(23, 2)
	 Precision: 0.391304347826087
	 Recall: 0.5
	 F1: 0.4390243902439025
Saving weights, metrics, and train/valid scores.
Saving to epoch-0.model
Updating best model
Saving to best.model

Best Model Appears at Epoch 0 with F1 0.439.
(23, 2)


### Use a custom-built architecture

You can also build a custom architecture and initialize a PytorchModel model with it, or replace a model's `.network` with your custom architecture. 

For example, we can use the `architectures.resnet` module to build the ResNet50 architecture. This is just for demonstration; we could also create this architecture with the `cnn_architectures` module.

In [14]:
# import a module that builds ResNet architecture from scratch
from opensoundscape.torch.architectures.resnet import ResNetArchitecture

#initialize the ResNet50 architecture
net = ResNetArchitecture(
    num_cls=2,
    weights_init='ImageNet',
    num_layers=50,
)

#generate a regular resnet18 object
model = Resnet18Multiclass(classes=['negative','positive'])

#replace the model's network with the ResNet50 architecture
model.network = net

print('number of layers:')
print(model.network.num_layers)

created PytorchModel model object with 2 classes
number of layers:
50


## Sampling for imbalanced training data
The imbalanced data sampler helps ensure that a single batch contains only a few classes during training, and that the classes receive approximately equal representation within the batch. This is useful for _imbalanced_ training data, where some classes have far fewer training samples than others.

In [15]:
model = Resnet18Binary(classes)
model.sampler = 'imbalanced' #default is None

#...you can now train your model as normal
model.train(train_dataset,valid_dataset,epochs=0)

#once we run train(), we can see that the train_loader is using an ImbalancedDatasetSampler
print('sampler:')
model.train_loader.sampler

created PytorchModel model object with 2 classes

Best Model Appears at Epoch 0 with F1 0.000.
sampler:


<opensoundscape.torch.sampling.ImbalancedDatasetSampler at 0x14c4c0190>
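A common way to build this kind of sampler is to draw training samples with replacement, weighting each sample by the inverse of its class frequency so that every class carries the same total weight. The pure-Python sketch below illustrates that general idea on a hypothetical 90/10 dataset; it is an assumption about the technique, not opensoundscape's `ImbalancedDatasetSampler` implementation, and `inverse_frequency_sample` is a hypothetical helper.

```python
import random
from collections import Counter

def inverse_frequency_sample(labels, num_samples, seed=0):
    """Draw sample indices with replacement, weighting each sample by
    1 / (number of samples in its class)."""
    counts = Counter(labels)
    weights = [1.0 / counts[label] for label in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)

# Hypothetical imbalanced dataset: 90 negatives and 10 positives
labels = ["negative"] * 90 + ["positive"] * 10
drawn = inverse_frequency_sample(labels, num_samples=10_000)
drawn_counts = Counter(labels[i] for i in drawn)
print(drawn_counts)
```

Each class ends up drawn about half the time, even though positives make up only 10% of the data. Because sampling is with replacement, each rare-class sample is repeated many times, which can make overfitting to those samples more likely.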

## Training with custom preprocessors
The preprocessing notebook tutorial gives in-depth descriptions of how to customize your preprocessing pipeline.

Here, we'll just give a quick example of tweaking the preprocessing pipeline: providing the CNN with a bandpassed spectrogram object instead of the full frequency range. 

### Bandpassed spectrograms

In [16]:
model = Resnet18Binary(classes)

# turn on the bandpass action of the datasets
train_dataset.actions.bandpass.on()
valid_dataset.actions.bandpass.on()

#specify the min and max frequencies for the bandpass action
train_dataset.actions.bandpass.set(min_f=3000,max_f=5000)
valid_dataset.actions.bandpass.set(min_f=3000,max_f=5000)

#now we can train and validate on the bandpassed spectrograms
#don't forget that you'll need to apply the same bandpass actions to 
#any datasets that you use for predicting on new audio files 
model.train(train_dataset,valid_dataset,epochs=0)

created PytorchModel model object with 2 classes

Best Model Appears at Epoch 0 with F1 0.000.
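Conceptually, bandpassing a spectrogram keeps only the frequency rows that fall inside `[min_f, max_f]`. The NumPy sketch below shows that idea on a random array standing in for a spectrogram; the bin count, the 11025 Hz upper frequency, and the helper name `bandpass_spectrogram` are hypothetical, and we assume (without checking the library source) that the `bandpass` action trims rows in a similar way.

```python
import numpy as np

def bandpass_spectrogram(spec, frequencies, min_f, max_f):
    """Keep only the rows of `spec` whose frequency lies in [min_f, max_f].

    spec: 2D array shaped (n_frequencies, n_times)
    frequencies: 1D array giving the frequency (Hz) of each row
    """
    keep = (frequencies >= min_f) & (frequencies <= max_f)
    return spec[keep, :], frequencies[keep]

# Hypothetical spectrogram: 257 frequency bins from 0 to 11025 Hz, 100 time windows
frequencies = np.linspace(0, 11025, 257)
spec = np.random.default_rng(0).random((257, 100))

band, band_freqs = bandpass_spectrogram(spec, frequencies, min_f=3000, max_f=5000)
print(spec.shape, "->", band.shape)
print(band_freqs.min(), band_freqs.max())
```

Only the frequency axis shrinks; the time axis is untouched. This is also why the same bandpass settings must be applied at prediction time: a model trained on the 3000-5000 Hz band has never seen the rest of the spectrum.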


### Clean up
Remove the files created by this notebook.

In [29]:
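from glob import glob
from pathlib import Path

# this folder is not created in this notebook; if it doesn't exist, the loop does nothing
for p in glob('./multilabel_train/*'):
    Path(p).unlink()
# delete the downloaded audio files and labels (the empty folder itself is left in place)
for p in glob('./woodcock_labeled_data/*'):
    Path(p).unlink()
# delete model files saved during training (e.g., epoch-0.model, best.model)
for p in glob('./*.model'):
    Path(p).unlink()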