# Acoustic signature challenge

This notebook introduces concepts that may be useful for participants in Helsing's acoustic signature challenge.

It aims to provide practical information on how to
- visualise and process acoustic data;
- create datasets for machine learning;
- train a simple classifier to detect drones and helicopters in audio data.

### Table of contents
* [Processing audio signals](#processing-audio-signals)
    * [Visualising audio signals](#visualising-audio-signals)
    * [Extracting audio features](#extracting-audio-features)
* [Baseline classification](#baseline-classification)
    * [Creating datasets](#creating-datasets)
    * [Training a simple classifier](#training-a-simple-classifier)
    * [Evaluating on the validation set](#evaluating-on-the-validation-set)

## Processing audio signals
### Visualising audio signals

Audio waveforms are time series, so they are fully described by the raw sample values together with the rate at which they were sampled. To keep these two quantities together throughout our analysis, we provide a custom `AudioWaveform` dataclass in `base.py`:

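```python
from __future__ import annotations  # Allows the class to reference itself in annotations.

from dataclasses import dataclass
from pathlib import Path

import soundfile as sf
import torch
from torch import Tensor


@dataclass(frozen=True)
class AudioWaveform:
    data: Tensor  # Audio samples; 1-D of shape (num_samples,) for mono files.
    sample_rate: float  # Sampling rate in Hz.

    @classmethod
    def load(cls, path: Path) -> AudioWaveform:
        data, samplerate = sf.read(path)
        return cls(torch.as_tensor(data, dtype=torch.float32), samplerate)

    @property
    def duration(self) -> float:
        # Duration in seconds: number of samples divided by the sampling rate.
        return self.data.shape[-1] / self.sample_rate
```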
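To see why the samples and the sample rate belong together, here is a minimal NumPy-only analogue built from a synthetic tone. `SimpleWaveform` and `make_tone` are hypothetical helpers for illustration, not the `base.py` class: the same 16,000 samples last 2 s when read at 8 kHz, but only 1 s if they are mistakenly paired with a 16 kHz rate.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class SimpleWaveform:
    """Hypothetical NumPy-only stand-in for AudioWaveform (mono audio only)."""

    data: np.ndarray  # shape (num_samples,)
    sample_rate: float  # samples per second (Hz)

    @property
    def duration(self) -> float:
        # Duration in seconds = number of samples / samples per second.
        return self.data.shape[-1] / self.sample_rate


def make_tone(freq_hz: float, num_samples: int, sample_rate: float) -> SimpleWaveform:
    """Synthesise a pure sine tone with the given number of samples."""
    t = np.arange(num_samples) / sample_rate  # sample times in seconds
    return SimpleWaveform(np.sin(2 * np.pi * freq_hz * t).astype(np.float32), sample_rate)


tone = make_tone(440.0, num_samples=16_000, sample_rate=8_000)
# Same samples, wrong sample rate: the duration halves (and every frequency doubles).
mislabelled = SimpleWaveform(tone.data, sample_rate=16_000)
print(tone.duration, mislabelled.duration)  # 2.0 1.0
```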

With this representation in mind, we can start looking at example waveforms from our dataset.

```python
import matplotlib.pyplot as plt

from hs_hackathon_drone_acoustics import EXAMPLES_DIR
from hs_hackathon_drone_acoustics.base import AudioWaveform  # Custom waveform representation.
from hs_hackathon_drone_acoustics.plot import plot_waveform  # Custom function used to plot waveforms.

# Sort the files so that the plots appear in a reproducible order.
example_files = sorted(EXAMPLES_DIR.glob("*.wav"))
fig, axes = plt.subplots(1, len(example_files), figsize=(16, 3))
for ifile, example_file in enumerate(example_files):
    waveform = AudioWaveform.load(example_file)
    plot_waveform(waveform, axis=axes[ifile])
    axes[ifile].set_title(example_file.name)
```

From these examples, we can already see differences between the classes: the helicopter and drone examples appear to contain white-noise-like content throughout, while the background example has periods of silence.
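One way to quantify the "periods of silence" observation is to split the signal into short frames, compute each frame's RMS level in decibels, and count the frames that fall well below the loudest one. The sketch below is a hypothetical feature (`silent_fraction`, with an assumed frame length of 1024 samples and an assumed threshold of -40 dB relative to the loudest frame), not something provided by the package, demonstrated on synthetic noise.

```python
import numpy as np


def frame_rms_db(x: np.ndarray, frame_length: int = 1024) -> np.ndarray:
    """RMS level of non-overlapping frames, in dB relative to the loudest frame."""
    num_frames = len(x) // frame_length  # drop the incomplete final frame
    frames = x[: num_frames * frame_length].reshape(num_frames, frame_length)
    rms = np.sqrt(np.mean(frames**2, axis=1))
    eps = 1e-12  # avoid log(0) for digital silence
    return 20 * np.log10((rms + eps) / (rms.max() + eps))


def silent_fraction(x: np.ndarray, threshold_db: float = -40.0, frame_length: int = 1024) -> float:
    """Hypothetical feature: fraction of frames quieter than threshold_db (relative level)."""
    levels = frame_rms_db(x, frame_length)
    return float(np.mean(levels < threshold_db))


rng = np.random.default_rng(0)
noise = rng.normal(scale=0.1, size=8 * 1024)
# Synthetic clip: noise, followed by an equally long stretch of near-silence.
clip = np.concatenate([noise, 1e-4 * rng.normal(size=8 * 1024)])
print(silent_fraction(noise), silent_fraction(clip))  # 0.0 0.5
```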


To verify this observation, it is useful to plot how a waveform's frequency content varies over time, in a representation known as a **spectrogram**. A spectrogram is obtained by taking the magnitude of the [Short-Time Fourier Transform (STFT)](https://en.wikipedia.org/wiki/Short-time_Fourier_transform). Code to plot a spectrogram from a waveform is provided in the `hs_hackathon_drone_acoustics.plot.plot_spectrogram` function.

```python
from hs_hackathon_drone_acoustics.plot import plot_spectrogram  # Custom function to plot spectrograms.

fig, axes = plt.subplots(1, len(example_files), figsize=(16, 3))
for ifile, example_file in enumerate(example_files):
    waveform = AudioWaveform.load(example_file)
    plot_spectrogram(waveform, axis=axes[ifile])
    axes[ifile].set_title(example_file.name)
    # Zoom in on the low-frequency end of the y-axis (units are set by plot_spectrogram).
    axes[ifile].set_ylim([0, 4])
```
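The STFT slides a window along the signal and takes the Fourier transform of each windowed frame; the spectrogram keeps only the magnitudes. The NumPy sketch below illustrates this with a Hann window and assumed values `n_fft=256` and `hop_length=64`. It is not the implementation behind `plot_spectrogram`, whose window, frame sizes, and scaling may differ. On a pure 1 kHz tone, the strongest frequency bin lands at 1 kHz, as expected.

```python
import numpy as np


def magnitude_spectrogram(x: np.ndarray, n_fft: int = 256, hop_length: int = 64) -> np.ndarray:
    """|STFT| with a Hann window; returns shape (n_fft // 2 + 1, num_frames)."""
    window = np.hanning(n_fft)
    num_frames = 1 + (len(x) - n_fft) // hop_length  # no padding: frames lie fully inside x
    frames = np.stack([x[i * hop_length : i * hop_length + n_fft] * window for i in range(num_frames)])
    # rfft keeps only the non-negative frequencies, which suffice for real-valued input.
    return np.abs(np.fft.rfft(frames, axis=1)).T


sample_rate = 8_000
t = np.arange(sample_rate) / sample_rate  # 1 s of audio
tone = np.sin(2 * np.pi * 1_000 * t)  # 1 kHz tone
spec = magnitude_spectrogram(tone)
freqs = np.fft.rfftfreq(256, d=1 / sample_rate)  # frequency of each row, in Hz
print(spec.shape, freqs[spec.mean(axis=1).argmax()])  # (129, 122) 1000.0
```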

### Extracting audio features

These examples suggest that the classes have different waveform characteristics. For classification, it may be useful to extract waveform features that capture and summarise the characteristics distinguishing each class. For example, we may want to extract the waveform's energy.

```python
import numpy as np


def extract_waveform_energy(waveform: AudioWaveform) -> float:
    # Signal energy: the sum of squared samples (i.e. the squared L2 norm of the waveform).
    samples = waveform.data.numpy()
    energy = float(np.sum(np.square(samples, dtype=np.float64)))
    return energy


for example_file in example_files:
    waveform = AudioWaveform.load(example_file)
    energy = extract_waveform_energy(waveform)
    print(f"Energy in {example_file.name}: {energy}.")
```
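Because energy sums over every sample, it grows with clip length. If clips in the dataset differ in duration, a length-normalised alternative such as the root-mean-square (RMS) amplitude, `sqrt(energy / num_samples)`, may compare clips more fairly. The sketch below uses hypothetical helpers on plain NumPy arrays: repeating a clip doubles its energy but leaves its RMS unchanged.

```python
import numpy as np


def energy(x: np.ndarray) -> float:
    """Signal energy: sum of squared samples (grows with clip length)."""
    return float(np.sum(np.square(x, dtype=np.float64)))


def rms(x: np.ndarray) -> float:
    """Root-mean-square amplitude: sqrt(energy / number of samples)."""
    return float(np.sqrt(energy(x) / len(x)))


rng = np.random.default_rng(0)
short_clip = rng.normal(scale=0.5, size=16_000)
long_clip = np.tile(short_clip, 2)  # same content, twice as long
print(round(energy(long_clip) / energy(short_clip), 6), round(rms(long_clip) / rms(short_clip), 6))  # 2.0 1.0
```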

Another very common set of audio features is the [Mel-frequency cepstral coefficients (MFCCs)](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum). They summarise the shape of the signal's spectrum on a perceptually motivated (mel) frequency scale, using far fewer values per time frame than the spectrogram above.

```python
import librosa
from numpy.typing import NDArray


def extract_mean_mfccs(waveform: AudioWaveform, n_mfcc: int = 20) -> NDArray[np.floating]:
    # librosa returns an array of shape (n_mfcc, num_frames); average over time (frames).
    mfccs = librosa.feature.mfcc(y=waveform.data.numpy(), sr=waveform.sample_rate, n_mfcc=n_mfcc)
    mean_mfccs = np.mean(mfccs, axis=1)
    return mean_mfccs


for example_file in example_files:
    waveform = AudioWaveform.load(example_file)
    mfccs = extract_mean_mfccs(waveform)
    print(f"Mean MFCC[0:5] for {example_file.name}: {mfccs[0:5]}.")
```
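MFCCs are typically computed in four steps: a power spectrogram, a bank of triangular filters spaced evenly on the mel scale, a logarithm, and a discrete cosine transform (DCT-II) that decorrelates the log filterbank energies. The NumPy sketch below follows those steps under simplifying assumptions (HTK mel formula, natural logarithm, hypothetical function names and frame sizes), so its values will not match `librosa.feature.mfcc`, which by default uses the Slaney mel scale and decibel scaling.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)  # HTK mel formula


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: float) -> np.ndarray:
    """Triangular filters equally spaced on the mel scale; shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # n_mels + 2 edge frequencies: filter m rises from edge m to m + 1, then falls to m + 2.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, centre, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (centre - lo)
        falling = (hi - fft_freqs) / (hi - centre)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def dct_ii_matrix(n_out: int, n_in: int) -> np.ndarray:
    """First n_out rows of the orthonormal DCT-II basis for length-n_in inputs."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)  # orthonormal scaling of the constant (k = 0) row
    return basis


def mfcc_sketch(x: np.ndarray, sample_rate: float, n_mfcc: int = 20, n_mels: int = 40,
                n_fft: int = 512, hop_length: int = 256) -> np.ndarray:
    """Returns coefficients of shape (n_mfcc, num_frames)."""
    window = np.hanning(n_fft)
    num_frames = 1 + (len(x) - n_fft) // hop_length
    frames = np.stack([x[i * hop_length : i * hop_length + n_fft] * window for i in range(num_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (num_frames, n_fft // 2 + 1)
    mel_energies = power @ mel_filterbank(n_mels, n_fft, sample_rate).T  # (num_frames, n_mels)
    log_mel = np.log(mel_energies + 1e-10)  # small offset avoids log(0)
    return (log_mel @ dct_ii_matrix(n_mfcc, n_mels).T).T


rng = np.random.default_rng(0)
coeffs = mfcc_sketch(rng.normal(size=16_000), sample_rate=16_000)
print(coeffs.shape, coeffs.mean(axis=1).shape)  # (20, 61) (20,)
```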

Such features can be used to classify audio waveforms. For the selected examples, the energy of the background clip is lower than that of the helicopter clip, which is itself lower than that of the drone clip; this ordering may not hold across the broader dataset. In the next section, we show how features can be extracted from entire datasets and used to train a simple classifier.
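An energy ordering like this suggests a very simple rule: assign each clip to the class whose mean feature value is closest. The sketch below implements such a nearest-centroid classifier in NumPy. The feature values and class indices are made up for illustration, not taken from the dataset, and the rule only works when the classes are well separated along the chosen features.

```python
import numpy as np


def fit_centroids(features: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Mean feature vector per class; features has shape (num_samples, num_features)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])


def predict_nearest_centroid(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the closest class centroid (Euclidean distance) for each sample."""
    distances = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    return distances.argmin(axis=1)


# Made-up 1-D "energy" values for hypothetical classes 0, 1 and 2.
train_x = np.array([[1.0], [1.5], [10.0], [12.0], [30.0], [34.0]])
train_y = np.array([0, 0, 1, 1, 2, 2])
centroids = fit_centroids(train_x, train_y, num_classes=3)
preds = predict_nearest_centroid(np.array([[2.0], [15.0], [40.0]]), centroids)
print(preds)  # [0 1 2]
```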

## Baseline classification

### Creating datasets

To manage the dataset structure, we provide a custom `AudioDataset` class in `base.py`, which loads waveforms from files and associates each one with its label. Before running the next cell, make sure you have followed the data download guide in the "Phase 1: Model Training" section of the README.

```python
import random

from hs_hackathon_drone_acoustics import CLASSES, RAW_DATA_DIR
from hs_hackathon_drone_acoustics.base import AudioDataset

TRAIN_PATH = RAW_DATA_DIR / "train"
VAL_PATH = RAW_DATA_DIR / "val"

train_dataset = AudioDataset(root_dir=TRAIN_PATH)
val_dataset = AudioDataset(root_dir=VAL_PATH)

# Let's plot some examples from the training dataset.
fig, axes = plt.subplots(1, 3, figsize=(16, 3))
for ax in axes:
    # randrange excludes the upper bound, so idx is always a valid index.
    idx = random.randrange(len(train_dataset))
    waveform, label = train_dataset[idx]
    plot_spectrogram(waveform, axis=ax)
    ax.set_title(f"Train item {idx}, class = {CLASSES[label]}")
```
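We do not reproduce `AudioDataset` here, but a common way to associate audio files with labels is to store each class in its own subdirectory. The sketch below assumes exactly that layout (`root_dir/<class name>/*.wav`), which may not be what `AudioDataset` expects; `index_audio_files` is a hypothetical helper returning `(path, label)` pairs, with each label being the index of the class name in the class list.

```python
import tempfile
from pathlib import Path


def index_audio_files(root_dir: Path, classes: list[str]) -> list[tuple[Path, int]]:
    """Hypothetical: pair each WAV file with the index of its class subdirectory."""
    items = []
    for label, class_name in enumerate(classes):
        class_dir = root_dir / class_name
        if not class_dir.is_dir():
            continue  # tolerate classes with no files
        # Sorting makes the order (and hence item indices) reproducible.
        for path in sorted(class_dir.glob("*.wav")):
            items.append((path, label))
    return items


# Usage on a throwaway directory tree with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["background/a.wav", "drone/b.wav", "drone/c.wav"]:
        (root / name).parent.mkdir(parents=True, exist_ok=True)
        (root / name).touch()
    items = index_audio_files(root, ["background", "drone", "helicopter"])
    print([(p.name, label) for p, label in items])  # [('a.wav', 0), ('b.wav', 1), ('c.wav', 1)]
```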

### Training a simple classifier

In this section, we use `sklearn` to set up a linear classifier trained with [stochastic gradient descent (SGD)](https://en.wikipedia.org/wiki/Stochastic_gradient_descent); with its default hinge loss, `SGDClassifier` fits a linear support vector machine. The classifier learns to separate training samples from our three classes (background, drone, and helicopter) using the energy feature explored above.
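To make "stochastic gradient descent" concrete: the model computes a linear score per class, and each update nudges the weights along the negative gradient of the loss on a single randomly chosen sample. `SGDClassifier` uses the hinge loss by default; the NumPy sketch below swaps in the softmax cross-entropy loss (multinomial logistic regression) because its gradient is easy to write out. The function names, learning rate, epoch count, and synthetic clusters are all illustrative assumptions.

```python
import numpy as np


def sgd_softmax_regression(x: np.ndarray, y: np.ndarray, num_classes: int,
                           lr: float = 0.1, epochs: int = 20, seed: int = 0):
    """Multinomial logistic regression trained with plain per-sample SGD."""
    rng = np.random.default_rng(seed)
    weights = np.zeros((x.shape[1], num_classes))
    bias = np.zeros(num_classes)
    for _ in range(epochs):
        for i in rng.permutation(len(x)):  # visit samples in a new random order each epoch
            scores = x[i] @ weights + bias
            probs = np.exp(scores - scores.max())  # subtract the max for numerical stability
            probs /= probs.sum()
            grad = probs.copy()
            grad[y[i]] -= 1.0  # d(cross-entropy)/d(scores) = probs - one_hot(y)
            weights -= lr * np.outer(x[i], grad)
            bias -= lr * grad
    return weights, bias


def predict_linear(x: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    return (x @ weights + bias).argmax(axis=1)


rng = np.random.default_rng(1)
centres = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]])  # one made-up cluster per class
y = np.repeat(np.arange(3), 50)
x = centres[y] + 0.5 * rng.normal(size=(150, 2))
weights, bias = sgd_softmax_regression(x, y, num_classes=3)
train_acc = np.mean(predict_linear(x, weights, bias) == y)
print(f"training accuracy: {train_acc:.2f}")
```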

To simplify the data processing pipeline, `feature_extractors.py` provides feature extractors (such as `EnergyFeatureExtractor`) and a `FeatureExtractorPipeline` that turns them into a step of an `sklearn` pipeline.
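An intermediate step of an `sklearn` pipeline is expected to provide `fit` and `transform`, so wrapping per-waveform feature functions is straightforward. The sketch below is a hypothetical duck-typed transformer, not the actual `FeatureExtractorPipeline` (whose interface may differ): it applies a list of feature functions to each input and stacks the results into the `(num_samples, num_features)` matrix that `StandardScaler` and `SGDClassifier` expect. For full `sklearn` compatibility (e.g. `clone` and `set_params`), one would normally also inherit from `sklearn.base.BaseEstimator` and `TransformerMixin`.

```python
from typing import Callable, Sequence

import numpy as np


class FunctionFeatureTransformer:
    """Hypothetical transformer: one row per input, one column per feature function."""

    def __init__(self, feature_fns: Sequence[Callable[[np.ndarray], float]]):
        self.feature_fns = list(feature_fns)

    def fit(self, waveforms, y=None):
        return self  # stateless: nothing to learn from the training data

    def transform(self, waveforms) -> np.ndarray:
        return np.array([[fn(w) for fn in self.feature_fns] for w in waveforms], dtype=np.float64)

    def fit_transform(self, waveforms, y=None) -> np.ndarray:
        return self.fit(waveforms, y).transform(waveforms)


def energy(x: np.ndarray) -> float:
    return float(np.sum(np.square(x, dtype=np.float64)))


def peak_amplitude(x: np.ndarray) -> float:
    return float(np.max(np.abs(x)))


clips = [np.array([0.5, -0.5]), np.array([1.0, 0.0, -2.0])]  # clips may differ in length
features = FunctionFeatureTransformer([energy, peak_amplitude]).fit_transform(clips)
print(features.tolist())  # [[0.5, 0.5], [5.0, 2.0]]
```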

```python
import warnings

from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

from hs_hackathon_drone_acoustics.feature_extractors import EnergyFeatureExtractor, FeatureExtractorPipeline

# Suppress all warnings to keep the output short; remove this line when debugging.
warnings.filterwarnings("ignore")

model = make_pipeline(
    FeatureExtractorPipeline(EnergyFeatureExtractor()),  # Extract features
    StandardScaler(),  # Normalise each feature to zero mean and unit variance
    SGDClassifier(max_iter=1000, tol=1e-3, random_state=0),  # Linear classifier trained with SGD
)

# Indexing with [:] returns all waveforms and their labels.
train_waveforms, train_labels = train_dataset[:]
print("Fitting model on the training set ...")
model.fit(train_waveforms, train_labels)
print("Done!")
```
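`StandardScaler` rescales each feature to zero mean and unit variance, `z = (x - mean) / std`, with the mean and standard deviation estimated on the training data only; the same statistics are then reused on new data, so nothing about the validation set leaks into training. The NumPy sketch below mimics that behaviour with hypothetical helper names. Like `sklearn`, it leaves constant features unscaled instead of dividing by zero.

```python
import numpy as np


def fit_standardiser(train_features: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-feature mean and standard deviation, estimated on training data only."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)  # population standard deviation (ddof=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return mean, std


def standardise(features: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    return (features - mean) / std


train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
val = np.array([[7.0, 10.0]])
mean, std = fit_standardiser(train)  # statistics come from the training set only
train_z = standardise(train, mean, std)
val_z = standardise(val, mean, std)  # validation data reuses the training statistics
print(train_z.mean(axis=0), val_z)
```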

### Evaluating on the validation set

We now want to evaluate the performance of our classifier. `metrics.py` provides evaluation functions which compute
- the cross-entropy loss between ground-truth labels and model predictions;
- the model accuracy;
- the model confusion matrix.
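For reference, the three metrics can be computed from predicted class probabilities as follows. This NumPy sketch is not the `metrics.py` implementation: its conventions (inputs are probabilities rather than logits, probabilities are clipped away from zero, and confusion-matrix rows are indexed by the true class) are assumptions that `metrics.py` may not share. The probabilities and targets are made up.

```python
import numpy as np


def cross_entropy(probs: np.ndarray, targets: np.ndarray, eps: float = 1e-12) -> float:
    """Mean of -log p(true class); clipping keeps the loss finite when p is 0."""
    p_true = probs[np.arange(len(targets)), targets]
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))


def accuracy(probs: np.ndarray, targets: np.ndarray) -> float:
    return float(np.mean(probs.argmax(axis=1) == targets))


def confusion_matrix(probs: np.ndarray, targets: np.ndarray, num_classes: int) -> np.ndarray:
    """Entry [i, j] counts samples of true class i predicted as class j."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(matrix, (targets, probs.argmax(axis=1)), 1)
    return matrix


probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]])
targets = np.array([0, 1, 2, 2])
print(round(cross_entropy(probs, targets), 4), accuracy(probs, targets))  # 0.5017 0.75
print(confusion_matrix(probs, targets, num_classes=3).tolist())  # [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
```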

We should calculate these metrics both on the training set and on a set of unseen data, the validation set. Performance on the training set indicates how well the model fits the available data, whereas performance on the validation set indicates how well the model generalises to unseen samples. A large gap between the two is a sign of overfitting.

```python
import torch
from sklearn.pipeline import Pipeline
from torch import Tensor

from hs_hackathon_drone_acoustics.metrics import evaluate, get_confusion_matrix_str


def predict_and_evaluate(model: Pipeline, dataset: AudioDataset) -> tuple[float, float, Tensor]:
    waveforms, targets = dataset[:]
    all_preds = model.predict(waveforms)
    # SGDClassifier with its default hinge loss has no predict_proba, so we one-hot encode
    # the hard predictions to obtain "probabilities" from which a loss can be computed.
    all_probas = torch.nn.functional.one_hot(
        torch.as_tensor(all_preds).long(), num_classes=len(CLASSES)
    ).float()
    return evaluate(all_probas, torch.as_tensor(targets))


train_loss, train_accuracy, train_confusion_matrix = predict_and_evaluate(model, train_dataset)
val_loss, val_accuracy, val_confusion_matrix = predict_and_evaluate(model, val_dataset)

print(f"Training loss: {train_loss:.3f}")
print(f"Validation loss: {val_loss:.3f}")
print(f"Training accuracy: {train_accuracy:.3f}")
print(f"Validation accuracy: {val_accuracy:.3f}")
print(f"Training confusion matrix: \n{get_confusion_matrix_str(train_confusion_matrix)}")
print(f"Validation confusion matrix: \n{get_confusion_matrix_str(val_confusion_matrix)}")
```
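Because `predict_and_evaluate` one-hot encodes hard predictions, the cross-entropy cannot distinguish confident predictions from hesitant ones. Assuming the loss is computed as `-log p` on these probabilities with some clipping constant `eps` (we have not checked how `metrics.py` handles zeros), every correct sample contributes 0 and every mistake contributes `-log(eps)`, so the loss is just a rescaled error rate. The sketch below illustrates this with made-up predictions. To obtain genuine probabilities, `SGDClassifier` can instead be trained with `loss="log_loss"`, which enables `predict_proba`.

```python
import numpy as np

eps = 1e-7  # assumed clipping constant; the value used in metrics.py (if any) may differ
num_classes = 3
preds = np.array([0, 1, 1, 2, 0])  # made-up hard predictions
targets = np.array([0, 1, 2, 2, 1])  # made-up ground truth: 2 of 5 predictions are wrong

one_hot = np.eye(num_classes)[preds]  # same encoding as predict_and_evaluate
p_true = np.clip(one_hot[np.arange(len(targets)), targets], eps, 1.0)
loss = -np.mean(np.log(p_true))
error_rate = np.mean(preds != targets)
print(loss, error_rate * -np.log(eps))  # the two values coincide (up to rounding)
```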