# Machine learning: prediction

Algorithms trained with machine learning can be used to predict the identity of sounds in recordings. For instance, an algorithm trained to detect the sound of a Wood Thrush can be used to find where Wood Thrushes vocalize in a set of autonomous recordings.

The Kitzes Lab, the developers of OpenSoundscape, pre-trained a series of [baseline machine learning models](https://pitt.box.com/s/a6jeamnew098vp5a9a7m1h9j5rce6t6y) that can be used to predict the presence of [506 species of common North American birds](https://pitt.app.box.com/s/d0snd1tyilscksbxc36q2slz6s4aa2ag). These are our "beta" models and are for demonstration purposes only, not for research use. We hope to make our more accurate models available soon. 

If you are interested in using these machine learning models for research, please contact us at the [Kitzes Lab](https://kitzeslab.org).

This tutorial downloads an example model and demonstrates how to use it to predict the identity of birds in recordings. To download the tutorial as a Jupyter Notebook and run it on your own computer, click the "Edit on GitHub" button at the top right of the tutorial. You will have to [install OpenSoundscape](installation.html#installation) to use the tutorial.

### Import relevant packages
Import the following modules to run a pre-trained machine learning classifier. First, from OpenSoundscape we will need three classes (`Audio`, `SingleTargetAudioDataset`, and `SplitterDataset`) and three functions (`run_command`, `lowercase_annotations`, and `predict`).

```python
from opensoundscape.audio import Audio
from opensoundscape.datasets import SingleTargetAudioDataset, SplitterDataset
from opensoundscape.helpers import run_command
from opensoundscape.raven import lowercase_annotations
from opensoundscape.torch.predict import predict
```

Import the following machine learning-related modules. OpenSoundscape uses PyTorch to do machine learning.

```python
import torch
import torch.nn
import torchvision.models
import torch.utils.data
```

Lastly, import a few miscellaneous modules and functions.

```python
import yaml
import os.path
import pandas as pd
from pathlib import Path
from math import floor
```

## Prepare model

To use the model, it must be downloaded onto your computer and loaded with the same specifications it was created with.

### Download model
Download the example model for Wood Thrush, *Hylocichla mustelina*. First, create a folder called `"prediction_example"` to store the model and its data in.

```python
folder_name = "prediction_example"
folder_path = Path(folder_name)
# Create the folder only if it does not exist yet
folder_path.mkdir(exist_ok=True)
```

Next, download the model from Box using the following code. Alternatively, you can download the model from [this webpage](https://pitt.app.box.com/s/dslgslmag7y8ojqxv28mwhbnt7irpgeo). If you do, move it into the `"prediction_example"` folder and make sure it is named `"hylocichla-mustelina-epoch-4.model"`.

```python
def download_from_box(link, name):
    # curl -L follows redirects; quoting the output path guards against spaces
    run_command(f'curl -L {link} -o "./{name}"')
```

```python
model_filename = folder_path.joinpath("hylocichla-mustelina-epoch-4.model")
download_from_box(
    link="https://pitt.box.com/shared/static/dslgslmag7y8ojqxv28mwhbnt7irpgeo.model",
    name=model_filename,
)
```
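If `curl` is not available on your system, the file can also be downloaded in pure Python with the standard library's `urllib.request.urlretrieve`, which follows standard HTTP redirects. The sketch below assumes nothing beyond the standard library; `download_if_missing` is a hypothetical helper, not part of OpenSoundscape. It also skips files that already exist, so re-running the notebook does not download them again.

```python
import urllib.request
from pathlib import Path


def download_if_missing(url, destination):
    """Download `url` to `destination` unless that file already exists.

    Hypothetical helper: a pure-Python alternative to calling curl.
    Returns the destination as a Path.
    """
    destination = Path(destination)
    if destination.exists():
        # Already downloaded: skip the network request
        return destination
    # Make sure the target folder exists before writing into it
    destination.parent.mkdir(parents=True, exist_ok=True)
    # urlretrieve uses urllib's default opener, which follows HTTP redirects
    urllib.request.urlretrieve(url, str(destination))
    return destination
```

For example, `download_if_missing(link, model_filename)` with the Box link above would fetch the model once and reuse the local copy afterwards.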

### Load model
At its core, a machine learning model consists of two things: its architecture and its weights. 

The architecture is the structure of the model, which here is a convolutional neural network (CNN). CNNs are a family of algorithms especially suited to extracting and interpreting features from images (here, spectrograms), such as combinations of lines, dots, and edges. This model uses `resnet18`, a CNN from the ResNet family. After feature extraction, the CNN's features are passed to a classifier, which decides how to weight each feature in predicting the final class identity. The model was trained with a `Linear` classifier.

Create the architecture of the model. First, designate the model as a `resnet18` CNN.

```python
# Build the resnet18 architecture without loading any pre-trained weights
model = torchvision.models.resnet18(pretrained=False)
```

Then, replace the model's final `fc` layer. "FC" stands for "fully connected". To set up the proper architecture, we need to specify the correct number of input features, the number of output features, and the classifier type.

The number of input features to the FC layer equals the number of features the convolutional neural network extracts and passes to it, available as `model.fc.in_features` (512 for `resnet18`).

```python
num_cnn_features = model.fc.in_features
```

The models were trained to predict two classes (species present and species absent), so the number of output features of the FC layer is 2.

```python
num_classes = 2
```

Finally, the classifier is a `torch.nn.Linear` layer.

```python
# Replace the final layer with a linear classifier that outputs one score per class
model.fc = torch.nn.Linear(
    in_features=num_cnn_features,
    out_features=num_classes,
)
```
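A `torch.nn.Linear` layer computes `y = x @ W.T + b`: each of the `num_classes` outputs is a weighted sum of the input features plus a bias, so for this model `W` has shape 2 × 512. The pure-Python sketch below illustrates that computation with three made-up features instead of 512; the numbers are illustrative only and are not taken from the trained model, and `linear` is a hypothetical helper rather than a PyTorch function.

```python
def linear(x, weight, bias):
    """Compute y = x @ weight.T + bias for a single input vector.

    x: list of in_features floats
    weight: out_features rows, each holding in_features floats
    bias: list of out_features floats
    """
    if any(len(row) != len(x) for row in weight):
        raise ValueError("each weight row must have one entry per input feature")
    # One weighted sum per output class, plus that class's bias
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


# Toy example: 3 input features (resnet18 provides 512) and 2 output classes
features = [1.0, 0.5, -2.0]
weight = [
    [0.2, -0.4, 0.1],    # row 0: weights for class 0 ("absent")
    [-0.3, 0.6, 0.05],   # row 1: weights for class 1 ("present")
]
bias = [0.1, -0.1]
scores = linear(features, weight, bias)  # approximately [-0.1, -0.2]
```

Training adjusts the entries of `weight` and `bias` (along with the CNN's own weights) so that the correct class tends to receive the higher score.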

The weights are distinct from the architecture: the architecture is decided by humans, while the weights are learned during training. The model file you downloaded contains the weights.

First, use `torch.load` to get the model weights from the filename.

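```python
# map_location="cpu" lets weights saved on a GPU load on a CPU-only machine
weights = torch.load(model_filename, map_location="cpu")
```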

Now, load the weights into the architecture we have created. After this the model is almost ready to use.

```python
model.load_state_dict(weights)
```

```
<All keys matched successfully>
```
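`load_state_dict` pairs each entry in the loaded weights with the model parameter of the same name. By default (`strict=True`) it raises an error if any name is missing or unexpected, and the `<All keys matched successfully>` message is how its return value displays when both lists are empty. PyTorch also checks that tensor shapes agree, which the sketch below does not. This pure-Python sketch illustrates only the name comparison; `compare_state_dict_keys` is a hypothetical helper, and the key names are a small subset of those in a `resnet18` state dict.

```python
def compare_state_dict_keys(model_keys, loaded_keys):
    """Return (missing, unexpected) parameter names, each sorted.

    missing: names the model expects but the loaded weights lack
    unexpected: names in the loaded weights that the model has no slot for
    """
    model_keys, loaded_keys = set(model_keys), set(loaded_keys)
    missing = sorted(model_keys - loaded_keys)
    unexpected = sorted(loaded_keys - model_keys)
    return missing, unexpected


# A few real resnet18 parameter names; a full state dict contains many more
model_keys = ["conv1.weight", "bn1.weight", "bn1.bias", "fc.weight", "fc.bias"]

# Weights saved from the same architecture: every name lines up
matched = compare_state_dict_keys(model_keys, list(model_keys))
# matched == ([], [])

# Weights saved from a model whose final layer used a different name
other_keys = ["conv1.weight", "bn1.weight", "bn1.bias", "classifier.weight", "classifier.bias"]
mismatched = compare_state_dict_keys(model_keys, other_keys)
# mismatched == (['fc.bias', 'fc.weight'], ['classifier.bias', 'classifier.weight'])
```

This is why the architecture must be rebuilt exactly as it was during training before the weights can be loaded.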

## Prepare prediction files

To use the model, we need to download and prepare a set of recordings. The model was trained to make predictions on spectrograms of 5-second clips, so we will have to split the recordings into 5-second segments and transform them into spectrograms.

### Download data
As example data, we have provided a 1-minute soundscape that contains Wood Thrush vocalizations.

The following code downloads this audio file into the `"prediction_example"` folder created above. If you prefer, you can also download this file at [this link](https://pitt.box.com/shared/z73eked7quh1t2pp93axzrrpq6wwydx0). Make sure to move it into the `"prediction_example"` folder and ensure that it is named `"1min.wav"`.

```python
data_filename = folder_path.joinpath("1min.wav")
download_from_box(
    link="https://pitt.box.com/shared/static/z73eked7quh1t2pp93axzrrpq6wwydx0.wav",
    name=data_filename,
)
```

### Split data
The example soundscape must be split into clips of the same length as those the model was trained on. In this case, the clips should be 5 seconds long.

First, create a directory in which to save split files.

```python
split_directory = folder_path.joinpath("split_files")
split_directory.mkdir(exist_ok=True)
```

Load the file and inspect its number of samples and sample rate.

```python
base_file = Audio.from_file(data_filename)
base_file
```

```
<Audio(samples=(1920000,), sample_rate=32000)>
```
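Dividing the number of samples by the sample rate gives the duration: 1,920,000 / 32,000 = 60 seconds, so non-overlapping 5-second clips yield 60 / 5 = 12 clips. The hypothetical helper below sketches that arithmetic in pure Python. It is not how `SplitterDataset` is implemented; in particular, how it handles a final clip shorter than 5 seconds is an assumption made for illustration.

```python
def clip_bounds(total_seconds, duration=5.0, overlap=0.0, include_last_segment=True):
    """List (start, end) times in seconds for fixed-length clips.

    Hypothetical helper illustrating the arithmetic only. A final clip
    shorter than `duration` is kept only if include_last_segment is True.
    """
    if not 0 <= overlap < duration:
        raise ValueError("overlap must be >= 0 and smaller than duration")
    step = duration - overlap  # distance between consecutive clip starts
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + duration, total_seconds)
        if end - start < duration and not include_last_segment:
            break  # drop the short trailing clip
        bounds.append((start, end))
        start += step
    return bounds


# Values from the Audio object above
num_samples = 1_920_000
sample_rate = 32_000
total_seconds = num_samples / sample_rate  # 60.0
clips = clip_bounds(total_seconds, duration=5, overlap=0)
# 12 clips: (0.0, 5.0), (5.0, 10.0), ..., (55.0, 60.0)
```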

Then split the file into 5-second clips with `SplitterDataset`. Each clip is saved to `split_directory`, and one row per clip, recording its source file, start time, end time, and destination, is written to `prediction_files.csv`.

```python
# Split the one-minute file into non-overlapping 5-second clips
files_to_split = [data_filename]
dataset = SplitterDataset(
    files_to_split,
    overlap=0,
    duration=5,
    output_directory=split_directory,
    include_last_segment=True,
)

dataloader = torch.utils.data.DataLoader(
    dataset,
    batch_size=1,
    shuffle=False,
    collate_fn=SplitterDataset.collate_fn,
)

# Iterating over the dataloader writes the clips to disk and yields
# one comma-separated row per clip
results_csv = folder_path.joinpath("prediction_files.csv")
with open(results_csv, "w") as f:
    f.write("Source,Begin (s),End (s),Destination\n")
    for batch in dataloader:
        for row in batch:
            f.write(f"{row}\n")
```
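The split clips get opaque file names (as seen in the prediction results below), so `prediction_files.csv` is the record of when each clip occurs in the original recording. The sketch below reads it with Python's `csv` module, assuming each row is comma-separated in the order of the header written above. `clip_times_by_destination` is a hypothetical helper, and the example rows (`clip_a.wav`, `clip_b.wav`) are made up for illustration.

```python
import csv
import io


def clip_times_by_destination(csv_file):
    """Map each clip's Destination path to its (begin, end) times in seconds.

    Assumes comma-separated rows under the header
    Source,Begin (s),End (s),Destination
    """
    reader = csv.DictReader(csv_file)
    return {
        row["Destination"]: (float(row["Begin (s)"]), float(row["End (s)"]))
        for row in reader
    }


# Made-up rows in the assumed format
example = io.StringIO(
    "Source,Begin (s),End (s),Destination\n"
    "prediction_example/1min.wav,0.0,5.0,prediction_example/split_files/clip_a.wav\n"
    "prediction_example/1min.wav,5.0,10.0,prediction_example/split_files/clip_b.wav\n"
)
times = clip_times_by_destination(example)
# times["prediction_example/split_files/clip_b.wav"] == (5.0, 10.0)
```

With the real file, you would pass an open handle instead, e.g. `with open(results_csv, newline="") as f: times = clip_times_by_destination(f)`, and use `times` to look up when each clip occurs in the original recording.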

### Create a Dataset

Create a dataset from the split clips. First, list the clips in a DataFrame. Then create a dictionary that maps numeric labels to class names: 1 means a Wood Thrush is present; 0 means a Wood Thrush is absent.

```python
# Collect the paths of all split clips in a one-column DataFrame
files_to_analyze = list(split_directory.glob("*.wav"))
sample_df = pd.DataFrame(columns=["file"], data=files_to_analyze)
```

```python
label_dict = {0: "hylocichla-mustelina-absent", 1: "hylocichla-mustelina-present"}
test_dataset = SingleTargetAudioDataset(
    sample_df,
    filename_column="file",
    label_dict=label_dict,
)
```

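## Use model on prediction files

Put the model in evaluation mode, which switches layers such as batch normalization to their inference behavior, then use `predict` to score each clip. The result is a DataFrame with one row per clip and one column per class.

```python
model.eval()
prediction_df = predict(model, test_dataset, label_dict=label_dict)
prediction_df
```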

| file | hylocichla-mustelina-absent | hylocichla-mustelina-present |
|---|---|---|
| prediction_example/split_files/bc645003351149f4a7e2c7109b22afc1.wav | 0.816133 | -0.90332 |
| prediction_example/split_files/e36a0f200cdf42a23d49e78445121387.wav | 1.480433 | -0.927409 |
| prediction_example/split_files/4940c91a1837410240042cf55ccad568.wav | 1.940377 | -1.725088 |
| prediction_example/split_files/cfc05bd9e1b97eebdca3badc288de0cd.wav | 2.629047 | -1.988923 |
| prediction_example/split_files/32747f95e81ee34c56ed177c4f7e7df5.wav | 2.513747 | -2.366485 |
| prediction_example/split_files/369134205221b5a25fac0e264d0a1482.wav | 2.351259 | -1.628652 |
| prediction_example/split_files/f3d6aeabe7725f649dc56d6db04aa83f.wav | 1.570931 | -1.124706 |
| prediction_example/split_files/54534197c0768b6bb2a9305013e8c1af.wav | 1.744635 | -1.055664 |
| prediction_example/split_files/e0c2d4aed1d79d4a6194be948d3292da.wav | 1.315882 | -1.407135 |
| prediction_example/split_files/9d276a5dd54b631c4aa63da407a1225d.wav | 1.766514 | -1.096341 |
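The scores in the table are raw model outputs rather than probabilities (several are negative). A softmax converts each row into probabilities that sum to 1. The sketch below applies one in pure Python to the first row; `softmax` is written here for illustration and is not an OpenSoundscape function. Because softmax preserves the order of the scores, the class with the higher raw score always gets the higher probability, so every clip in the table is assigned to `hylocichla-mustelina-absent`.

```python
import math


def softmax(scores):
    """Convert a list of raw scores (logits) into probabilities summing to 1."""
    m = max(scores)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


label_dict = {0: "hylocichla-mustelina-absent", 1: "hylocichla-mustelina-present"}

# Scores for the first clip in the table: [absent, present]
first_clip = [0.816133, -0.90332]
probs = softmax(first_clip)  # probs[0] is roughly 0.85
predicted = label_dict[probs.index(max(probs))]  # "hylocichla-mustelina-absent"
```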


This command "cleans up" by deleting all the downloaded files and results.

```python
import shutil
shutil.rmtree(folder_path)
```