## Use the CNN to detect Rana sierrae vocalizations

This notebook uses the CNN trained in `03_train_cnn.ipynb` to detect Rana sierrae vocalizations in audio recordings.

Note that the GitHub repository contains neither the full (very large) table of CNN outputs nor the full original audio
dataset, so it does not include the files needed to reproduce the full prediction outputs.

Instead, this notebook demonstrates the prediction process by generating CNN scores on the validation set.


This notebook is part of a series of notebooks and scripts in the [repository](https://github.com/kitzeslab/rana-sierrae-cnn):

- `01_explore_annotated_data.ipynb` Explore annotated dataset of Rana sierrae call types

- `02_prep_training_data.ipynb` Prepare annotated files for training a CNN machine learning model

- `03_train_cnn.ipynb` Train a CNN to recognize Rana sierrae vocalizations

- `04_cnn_prediction.ipynb` Use the CNN to detect Rana sierrae in audio recordings

- `05_cnn_validation.ipynb` Analyze the accuracy and performance of the CNN

- `06_aggregate_scores.py` Aggregate scores from CNN prediction across dates and times of day

- `07_explore_results.ipynb` Analyze temporal patterns of vocal activity using the CNN detections


### Imports

In [1]:
from opensoundscape.torch.models.cnn import load_model
import wandb
from pathlib import Path
from datetime import datetime
import pandas as pd
from torch import softmax, tensor

### Predict on validation set audio files

Load the model and the list of audio clips to predict on.

In [2]:
# choose output location for saving .csv files of CNN output scores
score_save_dir = './resources/'

# load the OpenSoundscape CNN model object
model = load_model('./resources/rana_sierrae_cnn.model')

# choose audio files to run prediction on
validation_df = pd.read_csv('./resources/validation_set.csv').set_index(['file','start_time','end_time'])

Run prediction on the validation clips.

In [3]:
# generate predictions using batch size 1024
# select a smaller batch size if you run into memory issues
# wandb_session=None disables Weights & Biases logging for this run
scores = model.predict(validation_df, num_workers=12, batch_size=1024, wandb_session=None)



### Save validation scores along with labels

In [4]:
# compute the softmax across the two classes and keep the rana_sierrae column (index 0)
scores['softmax'] = softmax(tensor(scores[['rana_sierrae', 'negative']].values), 1)[:, 0].numpy()

# save scores to file along with labels
# (the assignment aligns on the shared file/start_time/end_time index)
validation_df['score'] = scores['softmax']
validation_df[['rana_sierrae', 'score']].to_csv('./resources/validation_labels_and_scores.csv')
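
The softmax step above can be illustrated without the model. The following is a minimal sketch, assuming the `rana_sierrae` and `negative` columns hold raw (un-normalized) model outputs; the function names and example values are hypothetical and not part of the repository. It shows that a two-class softmax score depends only on the difference between the two outputs, which is why it collapses to a single value between 0 and 1.

```python
import math


def softmax_positive(pos_logit, neg_logit):
    """Softmax probability of the first of two classes.

    Subtracting the larger logit before exponentiating avoids overflow
    and does not change the result.
    """
    m = max(pos_logit, neg_logit)
    e_pos = math.exp(pos_logit - m)
    e_neg = math.exp(neg_logit - m)
    return e_pos / (e_pos + e_neg)


def sigmoid(x):
    """Logistic function, used here only to check the two-class identity."""
    return 1.0 / (1.0 + math.exp(-x))


# hypothetical raw outputs for three clips: (rana_sierrae, negative)
example_logits = [(2.0, -1.0), (0.0, 0.0), (-3.0, 1.5)]
example_scores = [softmax_positive(p, n) for p, n in example_logits]

# for two classes, softmax of the first class equals sigmoid(pos - neg)
for (p, n), s in zip(example_logits, example_scores):
    assert abs(s - sigmoid(p - n)) < 1e-12
```

Equal outputs for both classes give a score of exactly 0.5, so a threshold of 0.5 on the softmax score corresponds to asking which of the two outputs is larger.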

### Check validation metrics

In [5]:
from sklearn.metrics import precision_recall_curve, average_precision_score, roc_auc_score, auc

In [6]:
# precision (p) and recall (r) at each score threshold (t)
p, r, t = precision_recall_curve(validation_df['rana_sierrae'], validation_df['score'])
print(f"Average precision: {average_precision_score(validation_df['rana_sierrae'], validation_df['score'])}")
print(f"Area under ROC curve: {roc_auc_score(validation_df['rana_sierrae'], validation_df['score'])}")
print(f"Area under P-R curve: {auc(r, p)}")

Average precision: 0.9187549944446871
Area under ROC curve: 0.951881297624947
Area under P-R curve: 0.9186412082012038
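
The average precision and the area under the P-R curve differ slightly because they integrate the same curve in different ways: scikit-learn's `average_precision_score` uses a step-wise sum, while `auc(r, p)` applies the trapezoidal rule. The following is a minimal pure-Python sketch of the step-wise sum, assuming binary 0/1 labels and no tied scores; the function name and toy data are hypothetical and are not drawn from the validation set.

```python
def average_precision(labels, scores):
    """Step-wise average precision: sum of (R_n - R_{n-1}) * P_n.

    Assumes labels are 0/1, at least one label is 1, and no two
    scores are tied.
    """
    # rank clips from highest to lowest score
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)

    true_positives = 0
    prev_recall = 0.0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        precision = true_positives / rank
        recall = true_positives / n_pos
        # recall only increases when a positive clip is reached,
        # so negative clips contribute nothing to the sum
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# toy example: 3 positive clips and 2 negative clips
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
toy_ap = average_precision(toy_labels, toy_scores)  # 1/3 + 2/9 + 1/4 = 29/36
```

With a perfect ranking (all positive clips scored above all negative clips) the sum equals 1.0.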
