## Overview
This notebook illustrates usage of the Tweetynet convolutional neural network for birdsong syllable classification, along with its supporting software. The network processes magnitude spectrogram windows of raw audio recordings. It extracts visual features and downsamples the input via 2D convolution and pooling operations, then passes the downsampled activation maps to an LSTM, which feeds a linear readout unit that classifies each discrete time step as belonging either to one of the pre-specified syllable types or to a period of non-singing. In this notebook we illustrate how to generate magnitude spectrograms for a variety of parameterizations, generate labelvectors from song annotation files, create datasets for training and evaluation, and build and train a model.

This software's unique contribution lies in its support for "wideband" spectral input to the Tweetynet neural network. More concretely, our implementation of Tweetynet can process a 3D input spectrogram window whose depth is created by stacking copies of the same window computed with different FFT parameterizations. Tweetynet applies a distinct layer 1 convolution and pooling operation to each channel, so that every channel is featurized and downsampled to matching dimensions before the second convolutional layer. In addition, the provided tooling for spectrogram and labelvector generation is easily parameterized with wideband input in mind.

## Getting Started
Before starting, visit and review ```/parameters/params.py```. This file specifies key parameters of this notebook and its supporting software. At a minimum, you will need to hardcode the following named directory parameters:
- ```audio_dir_path```
- ```annot_dir_path```
- ```spect_dir_path```
- ```windowed_spects_dir_path```
- ```windowed_labelvecs_dir_path```
- ```uncut_spects_dir_path```
- ```uncut_labelvecs_dir_path```

As you work your way through this notebook, consider revisiting ```/parameters/params.py``` along with the other ```_params.py``` files.

## Setup Directories
Create the necessary directories if they do not already exist, or remove old files from existing directories. Ensure you have already assigned the named directory parameters listed above.

In [1]:
from parameters.params import (
    windowed_labelvecs_dir_path,
    uncut_labelvecs_dir_path,
    windowed_spects_dir_path,
    uncut_spects_dir_path,
)
dirs = [
    windowed_spects_dir_path,
    windowed_labelvecs_dir_path,
    uncut_labelvecs_dir_path,
    uncut_spects_dir_path,
]
from src.utilities import setup_directories
setup_directories(dirs)
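The notebook does not show how `setup_directories` works. The following is a minimal sketch of what such a helper might do, assuming it creates missing directories and empties existing ones. The name `setup_directories_sketch` is hypothetical and this is not the project's implementation.

```python
import shutil
from pathlib import Path


def setup_directories_sketch(dirs):
    """Hypothetical stand-in for src.utilities.setup_directories.

    Assumes the real helper (a) creates any directory that is missing and
    (b) empties directories that already exist. Not the project's code.
    """
    for d in dirs:
        path = Path(d)
        if path.exists():
            # Remove stale outputs (files and subdirectories) from a previous run.
            for child in path.iterdir():
                if child.is_dir():
                    shutil.rmtree(child)
                else:
                    child.unlink()
        else:
            # parents=True also creates any missing intermediate directories.
            path.mkdir(parents=True)
```

Clearing old outputs matters here because later steps read every file in the windowed directories, so leftovers from a different parameterization could silently mix into the datasets.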

## Write Spectrograms
For each provided audio file we compute and save to disk a full-duration spectrogram as well as consecutive overlapping windows. Short windows are useful for training. Full-duration spectrograms can be used for reference and/or model evaluation on a song-by-song basis. As discussed, this version of Tweetynet supports processing multiple input spectrograms as "channels". Accordingly, from a single audio file it is easy to generate multiple spectrograms using different parameterizations, each with different output dimensions. Our spectrogram generation procedure includes small quirks so that output images have an identical number of pixels across different FFT setups. For example, given spectrograms A, B, and C produced with FFT sizes 256, 512, and 1024, the height (frequency dimension) of C will be twice that of B and four times that of A. The inverse relationship holds for the time dimension: C will be half as wide as B and a quarter as wide as A, so all three contain the same number of pixels.
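One way to obtain equal pixel counts across FFT sizes is to scale the hop length with the FFT size and drop the Nyquist bin, so the time dimension shrinks exactly as the frequency dimension grows. The sketch below illustrates that idea with plain NumPy. The function name, the Hann window, and the `hop = n_fft // 4` choice are assumptions for illustration; we do not know the exact quirks `SpectWriter` uses.

```python
import numpy as np


def magnitude_spectrogram(signal, n_fft, hop_divisor=4):
    """Hypothetical sketch, not SpectWriter's actual implementation.

    Assumptions: the hop length scales with the FFT size
    (hop = n_fft // hop_divisor) and the Nyquist bin is dropped so the
    height is exactly n_fft // 2.
    """
    hop = n_fft // hop_divisor
    n_frames = len(signal) // hop
    # Zero-pad the end so every frame has a full n_fft samples.
    padded = np.pad(signal, (0, n_fft))
    window = np.hanning(n_fft)
    frames = np.stack(
        [padded[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # rfft yields n_fft // 2 + 1 bins; keep the first n_fft // 2.
    mags = np.abs(np.fft.rfft(frames, axis=1))[:, : n_fft // 2]
    return mags.T  # shape: (frequency, time)


rng = np.random.default_rng(0)
audio = rng.standard_normal(2**14)  # random stand-in for a real recording
spects = {n: magnitude_spectrogram(audio, n) for n in (256, 512, 1024)}
for n, s in spects.items():
    print(n, s.shape, s.size)
```

With a 16384-sample input this gives shapes (128, 256), (256, 128), and (512, 64): each doubling of the FFT size doubles the height and halves the width, so every spectrogram holds 32768 pixels.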

NOTE: In addition to the spectrograms and spectrogram windows themselves, this method writes other metadata to disk with each record. Details follow.


In [2]:
from src.spect_writer import SpectWriter
from parameters.spect_writer_params import spect_writer_params
spect_writer = SpectWriter(**spect_writer_params)
spect_writer.write()

## Determine Actual Spectrogram Window Sizes 
In order to instantiate our network, we need to provide the input spectrogram shapes. Here, we use a utility function to load a sample set of spectrogram windows, extract their shapes, and save them to the 'network_params' dictionary.

In [3]:
from src.utilities import get_spect_window_shapes
from parameters.params import windowed_spects_dir_path, spect_file_fmt
from parameters.net_params import network_params
n_ffts = network_params["n_ffts"]
network_input_shapes = get_spect_window_shapes(windowed_spects_dir_path, spect_file_fmt, n_ffts)
network_params["input_shapes"] = network_input_shapes

## Create Network and Run a Sample 'forward()' Call to Compute Labelvec Length
To create label vectors from audio file annotations, we need to know the network's output size for the predetermined input sizes. Instantiate the network with these parameters and save the labelvec length for later use. NOTE: intuitively, we would expect the length of the labelvector to equal the width of the input spectrogram in the horizontal (time) dimension; however, as discussed, this software supports processing multiple spectrogram "channels" with different aspect ratios (time-frequency resolutions). Typically, we would not downsample in the time dimension, but in some cases we might want to hit specific dimensions across all input channels. To handle these cases, we run a test input through the network on instantiation and confirm the labelvector size.

In [4]:
from src.network import MultiChannelTweetynet
tweetynet = MultiChannelTweetynet(**network_params)
labelvec_len = tweetynet.labelvec_len
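Running a test input through the network is the robust way to obtain the labelvec length, but the layer 1 part of that computation can be checked by hand. The sketch below uses the layer 1 max-pool kernels shown in the printed architecture in the next section. The window shapes are hypothetical examples, and it assumes `PadSame` keeps the convolution output the same size as its input.

```python
# Layer 1 max-pool kernels copied from the printed architecture; stride == kernel.
POOL_KERNELS = {256: (8, 4), 512: (16, 2), 1024: (32, 1)}


def pooled_shape(input_shape, kernel):
    """Output (height, width) of PadSame -> Conv2d(stride 1) -> MaxPool2d.

    Assumes PadSame keeps the conv output at the input size, so only the
    pooling changes the shape. With stride == kernel and ceil_mode=False,
    MaxPool2d gives floor((H - k) / k) + 1 == H // k for H >= k.
    """
    (h, w), (kh, kw) = input_shape, kernel
    return (h // kh, w // kw)


# Hypothetical (frequency, time) window shapes; the real ones come from
# get_spect_window_shapes.
input_shapes = {256: (128, 352), 512: (256, 176), 1024: (512, 88)}

out = {n: pooled_shape(input_shapes[n], POOL_KERNELS[n]) for n in input_shapes}
print(out)
# Every channel lands on the same (16, 88) grid. Concatenating the three
# 32-channel maps would give 3 * 32 = 96 channels, which appears consistent
# with the Conv2d(96 ... input of cnn2 (an inference, not confirmed).
```

Layers after `cnn1_layers` may change the time dimension further, which is why the notebook measures `labelvec_len` with a forward pass rather than relying on this arithmetic alone.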

## View Tweetynet Architecture

In [5]:
print(tweetynet)

MultiChannelTweetynet(
  (cnn1_layers): ModuleDict(
    (256): DataParallel(
      (module): Sequential(
        (0): PadSame()
        (1): Conv2d(1, 32, kernel_size=(8, 4), stride=(1, 1))
        (2): ReLU(inplace=True)
        (3): MaxPool2d(kernel_size=(8, 4), stride=(8, 4), padding=0, dilation=1, ceil_mode=False)
      )
    )
    (512): DataParallel(
      (module): Sequential(
        (0): PadSame()
        (1): Conv2d(1, 32, kernel_size=(16, 2), stride=(1, 1))
        (2): ReLU(inplace=True)
        (3): MaxPool2d(kernel_size=(16, 2), stride=(16, 2), padding=0, dilation=1, ceil_mode=False)
      )
    )
    (1024): DataParallel(
      (module): Sequential(
        (0): PadSame()
        (1): Conv2d(1, 32, kernel_size=(32, 1), stride=(1, 1))
        (2): ReLU(inplace=True)
        (3): MaxPool2d(kernel_size=(32, 1), stride=(32, 1), padding=0, dilation=1, ceil_mode=False)
      )
    )
  )
  (cnn2): DataParallel(
    (module): Sequential(
      (0): PadSame()
      (1): Conv2d(96

## Write Labelvectors
For training and evaluation, each spectrogram and spectrogram window needs a labelvector. This method writes labelvectors to disk both for complete spectrograms and for shorter training windows.


NOTE: For complete spectrograms we write two versions of the labelvector to disk. Details follow.

In [6]:
from src.labelvec_writer import LabelVecWriter
from parameters.labelvec_writer_params import labelvec_writer_params
labelvec_writer_params["labelvec_len"] = labelvec_len
labelvec_writer = LabelVecWriter(**labelvec_writer_params)
labelvec_writer.write()
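The notebook does not show how annotations become labelvectors. As a rough illustration of the idea, the sketch below assigns each of `labelvec_len` evenly spaced time bins the class of the syllable whose onset/offset interval contains the bin's centre, and a silence class otherwise. The `(onset, offset, label)` annotation format, the `label_map`, and the bin-centre rule are all assumptions, not `LabelVecWriter`'s actual logic.

```python
import numpy as np


def annotations_to_labelvec(segments, duration, labelvec_len, label_map, silence=0):
    """Hypothetical sketch of frame-level labelling; not LabelVecWriter's code.

    segments: iterable of (onset_s, offset_s, label) tuples (assumed format).
    Each of the labelvec_len time bins evenly spans the clip; a bin gets a
    syllable's integer class if its centre falls inside that syllable,
    otherwise the silence class.
    """
    bin_width = duration / labelvec_len
    centres = (np.arange(labelvec_len) + 0.5) * bin_width
    labelvec = np.full(labelvec_len, silence, dtype=np.int64)
    for onset, offset, label in segments:
        inside = (centres >= onset) & (centres < offset)
        labelvec[inside] = label_map[label]
    return labelvec


label_map = {"a": 1, "b": 2}  # hypothetical syllable classes; 0 = no song
segments = [(0.10, 0.30, "a"), (0.50, 0.60, "b")]
print(annotations_to_labelvec(segments, duration=1.0, labelvec_len=10, label_map=label_map))
```

Because the vector length is `labelvec_len` rather than the spectrogram width, the same annotation file can be mapped onto whatever time resolution the network's output has.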

## Create Training and Evaluation Datasets
Try indexing into ```train_dataset``` and ```eval_dataset``` to inspect how training and evaluation samples are structured.

In [3]:

from src.dataset import EvalDataset, TrainDataset
from parameters.dataset_params import eval_dataset_params, train_dataset_params
train_dataset = TrainDataset(**train_dataset_params)
eval_dataset = EvalDataset(**eval_dataset_params)
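For readers unfamiliar with how such datasets plug into PyTorch: `DataLoader` accepts map-style datasets, i.e. objects implementing `__len__` and `__getitem__`. The sketch below is a hypothetical in-memory stand-in, assuming each sample is a dict of spectrogram windows keyed by FFT size plus one integer labelvector; the real `TrainDataset` may load from disk and structure samples differently.

```python
import numpy as np


class WindowDatasetSketch:
    """Hypothetical in-memory stand-in for TrainDataset; not the project's class.

    Implements the map-style protocol (__len__ and __getitem__) that
    torch.utils.data.DataLoader consumes. Each sample is assumed to be a
    dict of windows keyed by n_fft plus one integer labelvector.
    """

    def __init__(self, spect_windows, labelvecs):
        # spect_windows: list of {n_fft: 2D array}; labelvecs: list of 1D arrays.
        assert len(spect_windows) == len(labelvecs)
        self.spect_windows = spect_windows
        self.labelvecs = labelvecs

    def __len__(self):
        return len(self.labelvecs)

    def __getitem__(self, idx):
        # Add a leading channel axis so each window fits a Conv2d(1, ...) input.
        spects = {n: w[np.newaxis, ...].astype(np.float32)
                  for n, w in self.spect_windows[idx].items()}
        return spects, self.labelvecs[idx]


shapes = {256: (128, 352), 512: (256, 176), 1024: (512, 88)}  # hypothetical
windows = [{n: np.zeros(s) for n, s in shapes.items()} for _ in range(4)]
labels = [np.zeros(88, dtype=np.int64) for _ in range(4)]  # hypothetical length
ds = WindowDatasetSketch(windows, labels)
spects, labelvec = ds[0]
print(len(ds), {n: x.shape for n, x in spects.items()}, labelvec.shape)
```

When batched by `DataLoader`'s default collation, dict-valued samples are collated key by key, so each FFT size yields its own batched tensor.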


## Load Training Parameters, Instantiate DataLoaders

In [4]:
from parameters.train_params import device, num_epochs, train_batch_size, eval_batch_size, num_workers, eval_step, lr
from torch.utils.data import DataLoader
train_data = DataLoader(train_dataset, batch_size=train_batch_size, num_workers=num_workers, shuffle=True)
eval_data = DataLoader(eval_dataset, batch_size=eval_batch_size, num_workers=num_workers, shuffle=False)   

## Instantiate Model
Note that ```model_params``` below includes the network (```tweetynet```) that was set up at an earlier step in this notebook.

In [5]:
from src.model import TweetynetModel
model_params = {
    "network": tweetynet,
    "train_data": train_data,
    "eval_data": eval_data,
    "device": device,
    "lr": lr,
}
model = TweetynetModel(**model_params)

## Start Train Loop

In [6]:
model.train(eval_step=eval_step, num_epochs=num_epochs)

---------------- EPOCH 1 


[0.6184006211180124, 0.5958751393534002, 0.6606280193236715, 0.6065217391304348, 0.625, 0.5795932678821879, 0.5750679347826086, 0.5591032608695652]


0.6025237478074852
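The log does not label its numbers. One checkable fact is that the single value printed last equals the arithmetic mean of the eight listed values. What each value measures is not stated; one plausible reading, assumed here and not confirmed by the notebook, is a frame-level accuracy per evaluation batch, sketched with a hypothetical `frame_accuracy` helper.

```python
import numpy as np


def frame_accuracy(logits, labelvec):
    """Fraction of time bins whose argmax class matches the label (assumed metric).

    logits: (n_classes, n_bins) scores; labelvec: (n_bins,) integer classes.
    """
    return float(np.mean(np.argmax(logits, axis=0) == labelvec))


# Values copied from the EPOCH 1 log above.
logged = [0.6184006211180124, 0.5958751393534002, 0.6606280193236715,
          0.6065217391304348, 0.625, 0.5795932678821879,
          0.5750679347826086, 0.5591032608695652]
# Matches the single summary value printed at the end of the log.
print(np.mean(logged))
```

If the listed values are per-batch scores, note that a plain mean weights every batch equally, so a smaller final batch counts as much as a full one.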