## Transform annotations to a CNN-usable format

The 2024 annotations are in boxed format, as annotated in Raven. Here we create a DataFrame with a "multi index" of "file,start_time,end_time" and one column for each target class, with a 1 for presence and a 0 for absence of the sound in a particular time segment of an audio file.

In [1]:
# OpenSoundscape imports
from opensoundscape import Audio, Spectrogram
from opensoundscape.annotations import BoxedAnnotations

# General purpose packages
import numpy as np
import pandas as pd
from glob import glob
from pathlib import Path

from matplotlib import pyplot as plt
plt.rcParams['figure.figsize']=[15,5] # for big visuals
%config InlineBackend.figure_format = 'retina'

In [None]:
import os

os.getcwd()

'/Users/SML161/ecco28_rasi_dialects/rasi_active_learning/1_prep_labels'

## Load multiple Raven annotation tables 
- we need to pair all audio files with their Raven annotation files
- and create a df of labels corresponding to short segments of each audio file

In [3]:
# Set directory to where the dataset is downloaded
dataset_path = "../data/"

In [56]:
# Get a list of all the selection table files using glob, which finds all files matching the "wildcard" pattern
selections = glob(f"{dataset_path}/2024/raven_boxed_annotations_2024/*.txt")

In [None]:
# create a list of audio files, one corresponding to each Raven file
# (Audio files have the same names as selection files with a different extension)
audio_files = [
    f.replace(
        "raven_boxed_annotations_2024", "2_annotate_stratified_v2/audio_clips"
    ).replace(".Table.1.selections.txt", ".WAV")
    for f in selections
]
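
Because the audio paths are derived by string replacement rather than read from disk, a typo in a folder name or extension would go unnoticed until loading fails. The sketch below checks a list of derived paths against the file system. `find_missing` is a hypothetical helper written for this illustration, not part of OpenSoundscape, and the demo file names are made up.

```python
from pathlib import Path
import tempfile


def find_missing(paths):
    """Return the paths that do not exist on disk (hypothetical helper)."""
    return [p for p in paths if not Path(p).exists()]


# Demo with a temporary folder and made-up file names
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "clip_a.WAV"
    present.touch()  # this file exists
    absent = Path(tmp) / "clip_b.WAV"  # this one is never created
    demo_missing = find_missing([str(present), str(absent)])

print(len(demo_missing))  # number of derived paths with no file on disk
```

In this notebook the call would be `find_missing(audio_files)`; an empty list means every selection table has a matching audio file.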

In [None]:
audio_annotation = pd.DataFrame(
    {"audio_file": audio_files, "annotation_file": selections}
)
audio_annotation.to_csv("../annotations/audio_annotation_pairs.csv", index=False)

In [59]:
all_annotations = BoxedAnnotations.from_raven_files(
    selections, annotation_column="Annotation", audio_files=audio_files
)
all_annotations.df.head(2)

Unnamed: 0,audio_file,annotation_file,annotation,start_time,end_time,low_f,high_f,Delta Time (s),Notes,Channel,Delta Freq (Hz),Avg Power Density (dB FS/Hz),Selection,View
0,../data//2024/2_annotate_stratified_v2/audio_c...,../data//2024/raven_boxed_annotations_2024/041...,RASI_?,2.224992,2.591836,344.554,827.958,0.3668,,1,483.404,-71.0,1,Spectrogram 1
1,../data//2024/2_annotate_stratified_v2/audio_c...,../data//2024/raven_boxed_annotations_2024/041...,RASI_A,4.545682,4.904551,390.837,941.095,0.3589,,1,550.258,-72.29,2,Spectrogram 1


In [None]:
all_annotations = all_annotations.convert_labels({"RASI_? ": "RASI_?", "u": "U"})

In [61]:
all_annotations.df.annotation.unique()

array(['RASI_?', 'RASI_A', 'RASI_B', 'RASI_C', 'RASI_D', 'RASI_E', 'U',
       nan], dtype=object)

In [62]:
# How many annotations do we have?
all_annotations.df.annotation.value_counts()

annotation
RASI_?    762
RASI_A    258
RASI_E    253
RASI_C    189
RASI_B     79
RASI_D     75
U          53
Name: count, dtype: int64

## Format annotations for machine learning

To use annotations to train or validate machine learning models, we usually want to split the audio into short audio clips rather than keep it as a long file

We can easily convert this annotation format to a table of 0 (absent) or 1 (present) labels for a series of time-regions or "clips" within each audio file. Each class will be a separate column

### What is multi-hot encoding?
Files in this format are sometimes called "multi-hot" encoded labels. This ML term refers to a way to format a table of labels in which:
- each row represents a single sample, like a 5s long clip
- each column represents a single possible class (e.g. one of multiple species)
- A "0" in a row and column means that in that sample, the class is not present
- A "1" is "hot", meaning that in that sample, the class is present
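
As a small illustration, the table below is built by hand with pandas. The file names and the classes `frog` and `bird` are made up for the example.

```python
import pandas as pd

# Each row is one clip, identified by (file, start_time, end_time)
index = pd.MultiIndex.from_tuples(
    [
        ("rec1.wav", 0.0, 5.0),
        ("rec1.wav", 5.0, 10.0),
        ("rec2.wav", 0.0, 5.0),
    ],
    names=["file", "start_time", "end_time"],
)

# Each column is one class; 1 = present in the clip, 0 = absent
toy_labels = pd.DataFrame(
    {"frog": [1, 0, 1], "bird": [1, 0, 0]},
    index=index,
)

# A clip can have several classes "hot" at once, or none at all
classes_per_clip = toy_labels.sum(axis=1)
print(toy_labels)
print(classes_per_clip.tolist())  # → [2, 0, 1]
```

Unlike one-hot encoding, where exactly one class is marked per sample, a multi-hot row may contain any number of 1s.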

## Create a multi-hot encoded dataframe

### Choose clip parameters
- How many seconds long is each audio "clip" that we want to generate a label for? (clip_duration)
- How many seconds of overlap should there be between consecutive clips? (clip_overlap)
- How much does an annotation need to overlap with a clip for us to consider the annotation to apply to the clip? (min_label_overlap)
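
To make the three parameters concrete, here is a minimal pure-Python sketch of how clip windows and overlap-based labels could be computed. It assumes `min_label_overlap` is measured in seconds of overlap. `clip_windows` and `overlap_seconds` are hypothetical helpers written for this illustration; they are not OpenSoundscape's implementation.

```python
def clip_windows(total_duration, clip_duration, clip_overlap=0.0):
    """List (start, end) times of full-length clips covering a recording."""
    step = clip_duration - clip_overlap
    if step <= 0:
        raise ValueError("clip_overlap must be smaller than clip_duration")
    windows = []
    start = 0.0
    while start + clip_duration <= total_duration:
        windows.append((start, start + clip_duration))
        start += step
    return windows


def overlap_seconds(a_start, a_end, b_start, b_end):
    """Seconds shared by two time intervals (0 if they do not overlap)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


# A 10 s recording cut into 2 s clips with no overlap
windows = clip_windows(10, clip_duration=2, clip_overlap=0)

# One annotation box spanning roughly 2.22 s to 2.59 s
ann_start, ann_end = 2.224992, 2.591836
min_label_overlap = 0.2

# 1 if the annotation overlaps the clip by at least min_label_overlap seconds
labels = [
    int(overlap_seconds(ann_start, ann_end, s, e) >= min_label_overlap)
    for s, e in windows
]
print(windows)
print(labels)  # → [0, 1, 0, 0, 0]
```

Under this assumption, an annotation that extends less than 0.2 s into a neighboring clip does not mark that clip as present.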

In [63]:
clip_duration = 2
clip_overlap = 0
min_label_overlap = 0.2

In [64]:
# select subset of classes
# call_types = ["RASI_A", "RASI_B", "RASI_C", "RASI_D", "RASI_E"]

In [65]:
labels_df = all_annotations.clip_labels(
    clip_duration=clip_duration,
    clip_overlap=clip_overlap,
    min_label_overlap=min_label_overlap,
    # class_subset=call_types,  # optional: uncomment to restrict labels to a subset of classes
).astype(int)
labels_df.head()

file,start_time,end_time,RASI_?,RASI_A,RASI_B,RASI_C,RASI_D,RASI_E,U
../data//2024/2_annotate_stratified_v2/audio_clips/0418_MSD-1743_20240622_130000.WAV,0.0,2.0,0,0,0,0,0,0,0
../data//2024/2_annotate_stratified_v2/audio_clips/0418_MSD-1743_20240622_130000.WAV,2.0,4.0,1,0,0,0,0,0,0
../data//2024/2_annotate_stratified_v2/audio_clips/0418_MSD-1743_20240622_130000.WAV,4.0,6.0,0,1,0,0,0,0,0
../data//2024/2_annotate_stratified_v2/audio_clips/0418_MSD-1743_20240622_130000.WAV,6.0,8.0,0,0,0,0,0,0,0
../data//2024/2_annotate_stratified_v2/audio_clips/0418_MSD-1743_20240622_130000.WAV,8.0,10.0,0,0,0,0,0,0,0


In [None]:
# max(axis=1) takes the maximum across the selected columns within each row,
# so a row is flagged if either "RASI_?" or "U" is present
rows_containing_uncertain = labels_df[["RASI_?", "U"]].max(axis=1).astype(bool)

# keep only rows without uncertain labels (~ inverts True to False and vice versa)
selection_boolean_mask = ~rows_containing_uncertain
labels_no_uncertain = labels_df[selection_boolean_mask].copy()

In [67]:
labels_no_uncertain.sum(0), labels_no_uncertain.shape, labels_df.shape

(RASI_?      0
 RASI_A    210
 RASI_B     70
 RASI_C    116
 RASI_D     61
 RASI_E    227
 U           0
 dtype: int64,
 (3281, 7),
 (3975, 7))

In [None]:
# creating column to group together A,B,D,E call types
labels_no_uncertain["RASI_main"] = labels_no_uncertain[
    ["RASI_A", "RASI_B", "RASI_D", "RASI_E"]
].max(axis=1)
labels_no_uncertain[["RASI_C", "RASI_main"]].to_csv(
    "../annotations/rasi_2024_2s_labels.csv"
)
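
When the saved labels are loaded again, the three index levels come back as ordinary columns unless they are restored explicitly. Below is a minimal sketch of the round trip, using a made-up table and a temporary file rather than the real label file.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up labels with the same index structure as the saved table
index = pd.MultiIndex.from_tuples(
    [("rec1.wav", 0.0, 2.0), ("rec1.wav", 2.0, 4.0)],
    names=["file", "start_time", "end_time"],
)
toy = pd.DataFrame({"RASI_C": [0, 1], "RASI_main": [1, 0]}, index=index)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "toy_labels.csv"
    toy.to_csv(csv_path)
    # Restore the (file, start_time, end_time) multi-index on load
    reloaded = pd.read_csv(csv_path, index_col=["file", "start_time", "end_time"])

print(reloaded.index.names)
```

The same `index_col` argument should apply when reading `rasi_2024_2s_labels.csv`, since it is written with the same three index levels.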

## Repeating above procedure for 2022 data

In [69]:
# Set directory to where the dataset is downloaded
dataset_path = "../data/"

In [None]:
# Get a list of all the selection table files using glob, which finds all files matching the "wildcard" pattern
selections_22 = glob(
    f"{dataset_path}/Rana sierrae annotated aquatic soundscapes 2022/raven_selection_tables/*.txt"
)


# create a list of audio files, one corresponding to each Raven file
# (Audio files have the same names as selection files with a different extension)
audio_files_22 = [
    f.replace("raven_selection_tables", "clips").replace(
        ".Table.1.selections.txt", ".wav"
    )
    for f in selections_22
]

In [None]:
# Saving table with corresponding names of audio and annotations
audio_annotation_22 = pd.DataFrame(
    {"audio_file": audio_files_22, "annotation_file": selections_22}
)
audio_annotation_22.to_csv("../annotations/audio_annotation_pairs_22.csv", index=False)

In [72]:
all_annotations_22 = BoxedAnnotations.from_raven_files(
    selections_22, annotation_column="Annotation", audio_files=audio_files_22
)
all_annotations_22.df.head(2)


Unnamed: 0,audio_file,annotation_file,annotation,start_time,end_time,low_f,high_f,Selection,View,Channel
0,../data//Rana sierrae annotated aquatic sounds...,../data//Rana sierrae annotated aquatic sounds...,V,2.211668,2.926931,460.7,1193.7,1,Spectrogram 1,1
1,../data//Rana sierrae annotated aquatic sounds...,../data//Rana sierrae annotated aquatic sounds...,C,4.54097,4.785179,358.2,597.0,7,Spectrogram 1,1


In [73]:
# How many annotations do we have?
all_annotations_22.df.annotation.value_counts()

annotation
A     762
V     205
C     130
X     121
D     103
B      34
U       6
?       2
A       1
c       1
Name: count, dtype: int64

In [74]:
all_annotations_22.df.annotation.unique()

array(['V', 'C', 'X', 'A', 'D', nan, 'A ', 'B', '?', 'U', 'c'],
      dtype=object)

In [None]:
all_annotations_22 = all_annotations_22.convert_labels(
    {"U": "?", "A ": "A", "c": "C", "X": "?"}
)
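
Variants such as `'A '` and `'c'` differ from their canonical labels only by whitespace or case. One way to spot them before writing the conversion dictionary is to compare each label with a normalized form. The sketch below uses plain Python on the label list shown in the output above; `suggest_conversions` is a hypothetical helper, and treating upper case as canonical is an assumption.

```python
def suggest_conversions(labels):
    """Map labels that differ from a stripped, upper-cased form to that form."""
    conversions = {}
    for label in labels:
        if not isinstance(label, str):
            continue  # skip missing values such as NaN
        normalized = label.strip().upper()
        if normalized != label:
            conversions[label] = normalized
    return conversions


demo_labels = ["V", "C", "X", "A", "D", float("nan"), "A ", "B", "?", "U", "c"]
suggested = suggest_conversions(demo_labels)
print(suggested)  # → {'A ': 'A', 'c': 'C'}
```

Merging distinct classes, such as mapping `X` and `U` to `?`, remains a manual decision that a check like this cannot make.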

In [76]:
clip_duration = 2
clip_overlap = 0
min_label_overlap = 0.2

In [77]:
labels_df_22 = all_annotations_22.clip_labels(
    clip_duration=clip_duration,
    clip_overlap=clip_overlap,
    min_label_overlap=min_label_overlap,
    # class_subset=call_types,  # optional: uncomment to restrict labels to a subset of classes
).astype(int)
labels_df_22.head()

file,start_time,end_time,V,C,?,A,D,B
../data//Rana sierrae annotated aquatic soundscapes 2022/clips/sine2022a_MSD-0558_20220622_081500_0-10s.wav,0.0,2.0,0,0,0,0,0,0
../data//Rana sierrae annotated aquatic soundscapes 2022/clips/sine2022a_MSD-0558_20220622_081500_0-10s.wav,2.0,4.0,1,0,0,0,0,0
../data//Rana sierrae annotated aquatic soundscapes 2022/clips/sine2022a_MSD-0558_20220622_081500_0-10s.wav,4.0,6.0,0,0,0,0,0,0
../data//Rana sierrae annotated aquatic soundscapes 2022/clips/sine2022a_MSD-0558_20220622_081500_0-10s.wav,6.0,8.0,0,0,0,0,0,0
../data//Rana sierrae annotated aquatic soundscapes 2022/clips/sine2022a_MSD-0558_20220622_081500_0-10s.wav,8.0,10.0,0,0,0,0,0,0


In [None]:
# max(axis=1) takes the maximum across the selected columns within each row
# (in this case only "?", which now also covers the former "U" and "X" labels)
rows_containing_uncertain_22 = labels_df_22[["?"]].max(axis=1).astype(bool)

# keep only rows without uncertain labels (~ inverts True to False and vice versa)
selection_boolean_mask_22 = ~rows_containing_uncertain_22
labels_no_uncertain_22 = labels_df_22[selection_boolean_mask_22].copy()

In [79]:
labels_no_uncertain_22.sum(0)

V    206
C    127
?      0
A    738
D     71
B     36
dtype: int64

In [None]:
# creating column to group together A, B, D, V call types
labels_no_uncertain_22["RASI_main"] = labels_no_uncertain_22[["A", "B", "D", "V"]].max(
    axis=1
)
labels_no_uncertain_22["RASI_C"] = labels_no_uncertain_22[["C"]].max(axis=1)
labels_no_uncertain_22[["RASI_C", "RASI_main"]].to_csv(
    "../annotations/rasi_2022_2s_labels.csv"
)