# Examples of using some of Brandon's signal processing algorithms

This notebook contains examples showing how to use some of Brandon's algorithms.  These include:
- **Signal strength feature computation** -- These are like a spectrogram, but drastically reduce the effects of background noise and of differences in microphone gain / sensitivity.  (Note: "signal strength features" is just a name Brandon made up to refer to them.)
- **Auto-segmentation** -- This tries to locate the beginning and ending of all the sounds in a recording that stick out relative to the background noise.
- **Pitch tracking** -- This tries to track a dominant pitch across the length of a segment.  Primarily intended for use with chirp sounds.
- Utilities and methods to use the above to extract features for segments, reduce their dimensionality, cluster sound types, etc.

Note that while it may appear that there is a lot of code in this notebook, most of it is plotting code.  Generally, the first few lines of each code cell contain all the relevant code, and the remainder of the code in the cell (below a comment saying `Visualization code below`) is just to create and format plots to visualize the results.

# If running in SageMaker, set up Python environment / install dependencies

In [None]:
import os
import sys
import importlib.util

def is_package_installed(package_name):
    if package_name in sys.modules:
        return True
    else:
        return importlib.util.find_spec(package_name) is not None

running_on_aws = True if os.environ.get("AWS_DEFAULT_REGION") else False
if not running_on_aws:
    print("Not running on AWS -- Assume that the Python environment is already set up.")
else:
    print("Running on AWS -- Making sure needed packages are installed...")
    if is_package_installed("soundfile"):
        print("  Soundfile already installed.")
    else:
        print("\n\nInstalling soundfile...")
        #%pip install pysoundfile
        %conda install -c conda-forge pysoundfile
    if is_package_installed("librosa"):
        print("  Librosa already installed.")
    else:
        print("\n\nInstalling librosa (this may take a while)...")
        #%pip install librosa
        %conda install -c conda-forge librosa
    if is_package_installed("audiot"):
        print("  AudioT package already installed.")
    else:
        print("\n\nInstalling AudioT package in development / editable mode...")
        # Use the relative path to the folder containing setup.py, which is the parent folder ../ in this case
        %pip install -e ../

### <font color='red'>**If any new packages were installed above, restart the kernel before running the remainder of this notebook**</font>

# Imports / Setup

In [None]:
import numpy as np
import numpy.random
import scipy as sp
import scipy.signal  # Explicitly load the signal submodule so that sp.signal is available
from pathlib import Path
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.cluster import MiniBatchKMeans

from audiot.audio_signal import AudioSignal
from audiot.spectral_features import resize_matrix_by_averaging
from audiot.signal_processing.functions import calc_signal_strength_features, compute_pitch_upsweep_downsweep, compute_chirp_features_for_segments
from audiot.signal_processing.auto_segmenter import AutoSegmenter
from audiot.signal_processing.pitch_tracker import PitchTracker

# Show matplotlib plots inline in Jupyter notebooks without having to call show()
%matplotlib inline

# Import for audio player control
import IPython.display as ipd

# Parameters

Set parameters for the size of the figures (adjust this if needed for your monitor / resolution), the segment of the signal we'll be zooming in on, and for spectrogram computation.

In [None]:
# Adjust the figure size as desired so that the plots fit your monitor / screen resolution
figure_size = (30, 4)
small_figure_size = (figure_size[0]/3, figure_size[1])

# Construct the path to the test file
project_folder = Path("").absolute().parents[0]
file_path = project_folder / "test_data" / "TRF0_mic14_2020-12-17_01.20.00.flac"

# Define the start and end time for the segment we'll be zooming in on
segment_of_interest_start_time = 5
segment_of_interest_end_time = 10
fig_xlim = (segment_of_interest_start_time, segment_of_interest_end_time)

# Parameters for short-time Fourier transforms (STFTs) -- for spectrogram plots
fft_n_samples_per_window = 256
fft_n_overlap = fft_n_samples_per_window // 2  # 50% overlap, as an integer number of samples
fft_nfft = int(2 ** np.ceil(np.log2(fft_n_samples_per_window)))  # FFT length: next power of two >= window length

# Signal strength features

These features represent an estimate of how much of the energy in each cell of the spectrogram comes from a foreground sound versus from background noise.  The values always fall in the range [0.0, 1.0], where a value of 0.0 indicates all background noise and a value of 1.0 indicates all foreground sound with zero background noise.  The estimate is weighted towards classifying most things as background noise, so only sounds that stick out significantly show up while almost everything else gets zeroed out.  For example, a value of 0.3 indicates that 30% of the energy at that spot in the spectrogram was estimated to come from a foreground sound, with the remaining 70% coming from background noise.
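
To make the meaning of these values concrete, here is a toy sketch that produces a map with the same interpretation (the fraction of each cell's energy attributed to foreground). It is *not* the algorithm used by `calc_signal_strength_features()`: the per-band median background estimate, the `margin` factor that biases the result towards background, and the name `toy_foreground_fraction` are all assumptions made for illustration.

```python
import numpy as np

def toy_foreground_fraction(power, margin=2.0):
    """Toy map of the fraction of each cell's energy attributed to foreground.

    power:  non-negative array of shape (n_freq_bins, n_time_windows), e.g. a power spectrogram.
    margin: assumed factor (>= 1) that inflates the background estimate, biasing the
            result towards treating energy as background noise.
    """
    power = np.asarray(power, dtype=float)
    # Assumption: the background level of each frequency band is its median over time
    background = margin * np.median(power, axis=1, keepdims=True)
    # Energy above the background estimate is treated as foreground (never negative)
    foreground = np.clip(power - background, 0.0, None)
    # Foreground fraction in [0, 1]; cells with zero energy are reported as 0
    return np.divide(foreground, power, out=np.zeros_like(power), where=power > 0)

# Synthetic example: stationary noise plus one loud, narrowband sound
rng = np.random.default_rng(0)
power = rng.exponential(1.0, size=(64, 200))
power[20, 100:110] += 50.0
fraction = toy_foreground_fraction(power)
print(fraction.min(), fraction.max())
```

Because the foreground part can never exceed the total energy in a cell, the result always stays within [0.0, 1.0], matching the range described above.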

Note that the features are returned as a `SpectralFeatures` object instead of as an `AudioFeatures` object.  These are similar classes that I'll probably want to merge (or inherit one from the other), but I didn't want to remove or change the `AudioFeatures` class in the middle of a practicum cohort because it might break people's code.  The `SpectralFeatures` objects can be converted to `AudioFeatures` via the `SpectralFeatures.as_audio_features()` method.

Below, we compute and plot both a spectrogram of the signal and the signal strength features for comparison.  An audio control is also embedded in the output that you can use to listen to the marked segment of the recording.

In [None]:
# Read in the audio file
audio_signal = AudioSignal.from_file(file_path)
# Compute the signal strength features
signal_strength_features = calc_signal_strength_features(audio_signal)

# --- Visualization code below ---------------------------------------------------------------------

# Compute a spectrogram for the audio recording (for visual comparison)
(spectrogram_frequencies, spectrogram_times, spectrogram_magnitude) = sp.signal.spectrogram(
    audio_signal.signal[:, 0],
    fs=audio_signal.sample_rate,
    nperseg=fft_n_samples_per_window,
    noverlap=fft_n_overlap,
    nfft=fft_nfft,
)
spectrogram = np.log(spectrogram_magnitude + np.finfo(float).eps)  # Log scale for display; eps avoids log(0) = -inf

# Plot the spectrogram
fig = plt.figure(figsize=figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    spectrogram,
    aspect="auto",
    origin="lower",
    extent=[
        spectrogram_times[0],
        spectrogram_times[-1],
        spectrogram_frequencies[0],
        spectrogram_frequencies[-1],
    ],
)
# Mark the segment we'll be zooming in on with vertical, red lines
axes.vlines([segment_of_interest_start_time, segment_of_interest_end_time], 0, 8000, color="r")
axes.set_xlabel("Time (seconds)")
axes.set_ylabel("Frequency (Hz)")
axes.set_title("Spectrogram")

# Plot the signal strength features
fig = plt.figure(figsize=figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    signal_strength_features.features,
    aspect="auto",
    origin="lower",
    extent=[
        signal_strength_features.time_axis[0],
        signal_strength_features.time_axis[-1],
        signal_strength_features.frequency_axis[0],
        signal_strength_features.frequency_axis[-1],
    ],
)
# Mark the segment we'll be zooming in on with vertical, red lines
axes.vlines([segment_of_interest_start_time, segment_of_interest_end_time], 0, 8000, color="r")
axes.set_xlabel("Time (seconds)")
axes.set_ylabel("Frequency (Hz)")
axes.set_title("Signal strength features")

# Add a control to listen to the segment we zoomed in on
print("Play the marked segment:")
start_sample = int(segment_of_interest_start_time * audio_signal.sample_rate)
end_sample = int(segment_of_interest_end_time * audio_signal.sample_rate)
ipd.Audio(
    audio_signal.signal[start_sample:end_sample, 0],  # Same channel (0) as used for the spectrogram above
    rate=int(audio_signal.sample_rate),
)

### Zoom in for better view

Here, we make the same plots as above, but adjust the x-axis limits to zoom in on the marked segment so it's easier to see detail.

In [None]:
# --- Visualization code below ---------------------------------------------------------------------

# Plot the spectrogram zoomed in on the segment
fig = plt.figure(figsize=figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    spectrogram,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        spectrogram_times[0],
        spectrogram_times[-1],
        spectrogram_frequencies[0],
        spectrogram_frequencies[-1],
    ],
)
axes.set_xlabel("Time (seconds)")
axes.set_ylabel("Frequency (Hz)")
axes.set_xlim(fig_xlim)

# Plot the signal strength features zoomed in on the segment
fig = plt.figure(figsize=figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    signal_strength_features.features,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        signal_strength_features.time_axis[0],
        signal_strength_features.time_axis[-1],
        signal_strength_features.frequency_axis[0],
        signal_strength_features.frequency_axis[-1],
    ],
)
axes.set_xlabel("Time (seconds)")
axes.set_ylabel("Frequency (Hz)")
axes.set_xlim(fig_xlim)

# Auto-segmentation

## Segmenting an AudioSignal directly

If you just want to segment a signal and don't need the signal strength features as well, you can pass the audio signal directly to the AutoSegmenter and get the segments out in units of seconds.

The segments returned are a list of 3-tuples, where each tuple specifies that segment's start time, end time, and average signal strength:  `[(seg0_start_seconds, seg0_end_seconds, seg0_strength), (seg1_start_seconds, seg1_end_seconds, seg1_strength), ... ]`.

The code below segments the signal and then plots the spectrogram, marking the beginning and end of each detected segment with purple and red lines (respectively).

In [None]:
segmenter = AutoSegmenter.get_default_segmenter()
segment_time_list = segmenter.segment_signal(audio_signal)
segment_start_times = [seg[0] for seg in segment_time_list]
segment_end_times = [seg[1] for seg in segment_time_list]

print(f"Number of segments detected = {len(segment_time_list)}")

# --- Visualization code below ---------------------------------------------------------------------

# Plot the spectrogram of the file
fig = plt.figure(figsize=figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    spectrogram,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        spectrogram_times[0],
        spectrogram_times[-1],
        spectrogram_frequencies[0],
        spectrogram_frequencies[-1],
    ],
)
# Mark the segment start times with purple lines
axes.vlines(segment_start_times, 0, 8000, color="m")
# Mark the segment end times with red lines
axes.vlines(segment_end_times, 0, 8000, color="r")
axes.set_xlim([0, audio_signal.duration])
axes.set_xlabel("Time (seconds)")
axes.set_ylabel("Frequency (Hz)")
axes.set_title("Spectrogram with detected segments marked")

# Plot the spectrogram again, zoomed in on the segment of interest
fig = plt.figure(figsize=figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    spectrogram,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        spectrogram_times[0],
        spectrogram_times[-1],
        spectrogram_frequencies[0],
        spectrogram_frequencies[-1],
    ],
)
# Mark the segment start times with purple lines
axes.vlines(segment_start_times, 0, 8000, color="m")
# Mark the segment end times with red lines
axes.vlines(segment_end_times, 0, 8000, color="r")
axes.set_xlim(fig_xlim)
axes.set_xlabel("Time (seconds)")
axes.set_ylabel("Frequency (Hz)")
axes.set_title("Spectrogram with detected segments marked (zoomed in)")
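
Because `segment_time_list` is plain Python data, it can be post-processed with ordinary list operations. The sketch below uses a small hypothetical segment list and illustrative thresholds (`min_duration` and `min_strength` are not parameters of `AutoSegmenter`) to compute segment durations and drop segments that are too short or too weak. In the notebook you would pass `segment_time_list` instead of the made-up list.

```python
def filter_segments(segments, min_duration=0.1, min_strength=0.2):
    """Keep segments at least min_duration seconds long and at least min_strength strong.

    The thresholds are illustrative, not values used by AutoSegmenter.
    """
    return [
        (start, end, strength)
        for (start, end, strength) in segments
        if (end - start) >= min_duration and strength >= min_strength
    ]

# Hypothetical segments in the (start_seconds, end_seconds, strength) format
segments = [(0.50, 0.62, 0.41), (1.10, 1.15, 0.08), (2.00, 2.90, 0.73)]

durations = [end - start for (start, end, _) in segments]
kept = filter_segments(segments)
print(f"durations = {[round(d, 3) for d in durations]}")
print(f"kept {len(kept)} of {len(segments)} segments")
```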

## Segmenting signal strength features

If you do want to use the signal strength features in your processing, it will probably be more convenient to run the segmentation on the signal strength features themselves.  In this case, the start and end of each segment will be returned as indexes along the x-axis of the features array instead of being returned as time values.  This will make it easier to slice out sections of the features corresponding to detected segments since the indexes can be used directly to do the slicing.  Note that the detected segments should be equivalent to the results from running on the signal directly.

So in this case, the returned segments will be a list of 3-tuples in the form:   `[(seg0_start_index, seg0_end_index, seg0_strength), (seg1_start_index, seg1_end_index, seg1_strength), ... ]`.

The code below demonstrates running segmentation directly on the features, and then plots the results using feature sample indexes along the x-axis instead of time.  It then demonstrates how to slice out the features for one of the segments (randomly selected each time the cell is run) and plots those as well.

In [None]:
# Run segmentation on the signal strength features
segment_index_list = segmenter.segment_signal_strength_features(signal_strength_features)
segment_start_indexes = [s[0] for s in segment_index_list]
segment_end_indexes = [s[1] for s in segment_index_list]

print(f"Number of segments detected = {len(segment_index_list)}")

# Select a segment to look at
segment_index = np.random.randint(len(segment_index_list))   # Randomly pick the segment (most are chirps)
#segment_index = 5    # A chirp
#segment_index = 92   # Mechanical noise with a chirp at the end
#segment_index = 94   # Mechanical noise

# Slice out the features for the selected segment
segment = segment_index_list[segment_index]
segment_features = signal_strength_features.features[:, segment[0]:segment[1]]

# --- Visualization code below ---------------------------------------------------------------------

# Plot the results, using sample indexes on the x axis instead of time
fig = plt.figure(figsize=figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    signal_strength_features.features,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        0,
        signal_strength_features.n_samples,
        signal_strength_features.frequency_axis[0],
        signal_strength_features.frequency_axis[-1],
    ],
)
axes.vlines(segment_start_indexes, 0, 8000, color="m")
axes.vlines(segment_end_indexes, 0, 8000, color="r")
axes.set_xlabel("Sample index")
axes.set_ylabel("Frequency (Hz)")
axes.set_title("Signal strength features with detected segments marked")

# Plot the results again, zoomed in, using sample indexes on the x axis instead of time
fig = plt.figure(figsize=figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    signal_strength_features.features,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        0,
        signal_strength_features.n_samples,
        signal_strength_features.frequency_axis[0],
        signal_strength_features.frequency_axis[-1],
    ],
)
axes.vlines(segment_start_indexes, 0, 8000, color="m")
axes.vlines(segment_end_indexes, 0, 8000, color="r")
# Determine the indexes to zoom in on to get the same zoom as above
fig_xlim_idx = (sum(signal_strength_features.time_axis < fig_xlim[0]), sum(signal_strength_features.time_axis <= fig_xlim[1]))
axes.set_xlim(fig_xlim_idx)
axes.set_xlabel("Sample index")
axes.set_ylabel("Frequency (Hz)")
axes.set_title("Signal strength features with detected segments marked (zoomed in)")

# Plot the features for the randomly selected segment
fig = plt.figure(figsize=small_figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    segment_features,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        0,
        segment_features.shape[1],
        signal_strength_features.frequency_axis[0],
        signal_strength_features.frequency_axis[-1],
    ],
)
axes.set_xlabel("Sample index")
axes.set_ylabel("Frequency (Hz)")
axes.set_title(f"Features for segment {segment_index}")
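
The index-based segments slice directly into the features array. The self-contained sketch below uses a synthetic features matrix and a hypothetical segment list (`toy_features` and `toy_segments` are made-up names); like the cell above, it treats each end index as exclusive, following standard Python slicing. Note that the sliced-out widths differ from segment to segment, which is the variable-length problem addressed later in this notebook.

```python
import numpy as np

# Synthetic stand-in for signal_strength_features.features: (n_freq_bins, n_time_windows)
toy_features = np.arange(5 * 20, dtype=float).reshape(5, 20)

# Hypothetical segments in the (start_index, end_index, strength) format
toy_segments = [(2, 5, 0.3), (8, 15, 0.6), (17, 19, 0.2)]

# Slice out each segment, treating end_index as exclusive like the notebook cell above
segment_slices = [toy_features[:, start:end] for (start, end, _) in toy_segments]
widths = [s.shape[1] for s in segment_slices]
print(f"widths of the sliced segments: {widths}")
```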


### Converting segments from sample indexes to time values

If you need starting and ending times (in seconds) for the segments in addition to the starting and ending sample indexes, the `AutoSegmenter` class also has a `convert_segments_from_indexes_to_seconds()` function that can be used to do the conversion.  The use of this function is demonstrated below.  Note that the x-axis in the resulting plot is now in units of time (seconds) instead of sample index values.

In [None]:
segment_time_list = segmenter.convert_segments_from_indexes_to_seconds(segment_index_list, signal_strength_features)
segment_start_times = [s[0] for s in segment_time_list]
segment_end_times = [s[1] for s in segment_time_list]

# --- Visualization code below ---------------------------------------------------------------------

# Plot the features and mark the segments, using units of time on the x-axis
fig = plt.figure(figsize=figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    signal_strength_features.features,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        signal_strength_features.time_axis[0],
        signal_strength_features.time_axis[-1],
        signal_strength_features.frequency_axis[0],
        signal_strength_features.frequency_axis[-1],
    ],
)
# Mark the start and end of each segment (using units of time)
axes.vlines(segment_start_times, 0, 8000, color="m")
axes.vlines(segment_end_times, 0, 8000, color="r")
# Zoom in the same as for previous plots
axes.set_xlim(fig_xlim)
axes.set_xlabel("Time (seconds)")
axes.set_ylabel("Frequency (Hz)")
axes.set_title("Signal strength features with detected segments marked (time axis)")
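
If you ever need to do this conversion without the `AutoSegmenter` helper, a minimal sketch follows. It assumes that `time_axis` is uniformly spaced and that end indexes are exclusive, so a segment ending at index `end` ends one hop after the time of its last window. Those conventions (and the helper name `indexes_to_seconds`) are assumptions; `convert_segments_from_indexes_to_seconds()` may, for example, reference window centers or edges differently, so prefer it when it is available.

```python
import numpy as np

def indexes_to_seconds(segment_index_list, time_axis):
    """Convert (start_index, end_index, strength) tuples to (start_s, end_s, strength).

    Assumptions: time_axis is uniformly spaced, and end_index is exclusive, so a
    segment ends one hop after the time of its last window.
    """
    time_axis = np.asarray(time_axis, dtype=float)
    hop = time_axis[1] - time_axis[0]
    return [
        (float(time_axis[0] + start * hop), float(time_axis[0] + end * hop), strength)
        for (start, end, strength) in segment_index_list
    ]

# Synthetic time axis: window times 5 ms, 15 ms, 25 ms, ... (10 ms hop)
toy_time_axis = 0.005 + 0.01 * np.arange(100)
converted = indexes_to_seconds([(10, 20, 0.5), (0, 1, 0.1)], toy_time_axis)
print(converted)
```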


# Pitch tracking

The `PitchTracker` class contains a `track_pitch_across_segment()` method for tracking a strong frequency peak (e.g. a chirp sound) across a segment.  Note that if there is no strong frequency peak in the segment, it will still generate a track across the entire segment, fitting it to whatever peaks it finds as best it can.

The `track_pitch_across_segment()` method returns a tuple containing two lists, `(pitch_indexes, peak_strengths)` (stored as `pitch_indexes, peak_strength` in the code below).  The `pitch_indexes` list contains the index (along the frequency axis of the features) of the detected pitch for each time window in the segment, and the `peak_strengths` list contains a number for each time window representing the relative strength of the detected frequency peak.  This value will generally be larger when there is more energy at the detected pitch and less energy in the frequency bands somewhat above and below it.

The code below demonstrates running the pitch tracking on the extracted segment features, and visualizes the result.

In [None]:
# Run pitch tracking on features we sliced out for the segment we randomly selected above
pitch_tracker = PitchTracker()
pitch_indexes, peak_strength = pitch_tracker.track_pitch_across_segment(segment_features)
# Convert the pitch indexes to the corresponding frequency values (in Hz)
pitch_frequencies = pitch_tracker.pitch_indexes_to_frequencies(pitch_indexes, signal_strength_features.frequency_axis)
print(f"pitch_indexes = {pitch_indexes}")
print(f"pitch_frequencies = {pitch_frequencies}")

# --- Visualization code below ---------------------------------------------------------------------

# Plot the features with the pitch track drawn on top
fig = plt.figure(figsize=small_figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    segment_features,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        0,
        segment_features.shape[1],
        signal_strength_features.frequency_axis[0],
        signal_strength_features.frequency_axis[-1],
    ],
)
# Draw a skinny line showing the tracked pitch (add 0.5 to center each point horizontally on the features heatmap)
axes.plot(np.arange(len(pitch_frequencies)) + 0.5, pitch_frequencies, color="r", linewidth=1)
# Draw a very fat, mostly transparent line to highlight the pitch track / make it more visible
axes.plot(np.arange(len(pitch_frequencies)) + 0.5, pitch_frequencies, color="r", linewidth=40, alpha=0.1)
axes.set_xlabel("Time window index")
axes.set_ylabel("Frequency (Hz)")
axes.set_title(f"Pitch tracked across segment {segment_index}")

# Plot the peak_strength
fig = plt.figure(figsize=small_figure_size)
axes = fig.add_subplot(1,1,1)
# Add 0.5 to the x-coordinates here to line it up better with the above plot
axes.plot(np.arange(len(pitch_frequencies)) + 0.5, peak_strength)
axes.set_xlim((0, len(pitch_frequencies)))
axes.set_xlabel("Time window index")
axes.set_ylabel("Peak strength")
axes.set_title("Peak strength")
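
As a rough illustration of what the two returned lists mean, the sketch below picks the strongest frequency bin in each time window and scores it by how much it exceeds the bins `offset` positions above and below. This is a deliberately simplified stand-in: unlike `track_pitch_across_segment()`, which fits a track across the whole segment, it treats every window independently. The helper name `toy_track_pitch`, the `offset` parameter, and the plain `frequency_axis[pitch_indexes]` lookup (used in place of `pitch_indexes_to_frequencies()`) are all assumptions.

```python
import numpy as np

def toy_track_pitch(segment_features, offset=3):
    """Pick the strongest bin in each window and score it against its neighborhood.

    Simplified stand-in: every time window is handled independently.
    offset: assumed distance (in bins) to the neighboring bands used for comparison.
    """
    n_bins, n_windows = segment_features.shape
    pitch_indexes = np.argmax(segment_features, axis=0)
    peak_strengths = np.empty(n_windows)
    for t, p in enumerate(pitch_indexes):
        # Values `offset` bins above and below the peak, clipped to the valid bin range
        above = segment_features[min(p + offset, n_bins - 1), t]
        below = segment_features[max(p - offset, 0), t]
        # Larger when the peak is strong and its neighboring bands are weak
        peak_strengths[t] = segment_features[p, t] - 0.5 * (above + below)
    return pitch_indexes, peak_strengths

# Synthetic upsweep: one bright bin per window, rising two bins per window
toy_segment = np.zeros((32, 10))
toy_segment[5 + 2 * np.arange(10), np.arange(10)] = 1.0
toy_frequency_axis = np.linspace(0.0, 8000.0, 32)

toy_pitch_indexes, toy_peak_strengths = toy_track_pitch(toy_segment)
# Assumed conversion: plain lookup of each index on the frequency axis
toy_pitch_frequencies = toy_frequency_axis[toy_pitch_indexes]
print(toy_pitch_indexes)
```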

### Compute total upsweep and downsweep of the pitch track

I think that the total amounts that the pitch track rises and falls in frequency (Hz) could be useful features in characterizing bird chirp sounds.  I've arbitrarily decided to call these values "upsweep" and "downsweep", and have provided a simple utility function that can compute them from the list of tracked pitches.  Other values like the minimum pitch, maximum pitch, and median pitch might also be useful features, but can easily be extracted using built-in numpy functions.

In [None]:
total_upsweep, total_downsweep = compute_pitch_upsweep_downsweep(pitch_frequencies)
print(f"total_upsweep   = {total_upsweep:8.2f} Hz  (total amount the pitch rose over the segment)")
print(f"total_downsweep = {total_downsweep:8.2f} Hz  (total amount the pitch fell over the segment)")
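
One natural reading of "total amount the pitch rose / fell" is the sum of the positive and negative frame-to-frame steps of the pitch track. The sketch below implements that reading, along with the minimum, maximum, and median pitch mentioned above. The helper name `upsweep_downsweep` is hypothetical, and `compute_pitch_upsweep_downsweep()` may differ in details (for example, any smoothing of the track).

```python
import numpy as np

def upsweep_downsweep(pitch_frequencies):
    """Return (total rise, total fall) in Hz along a pitch track."""
    steps = np.diff(np.asarray(pitch_frequencies, dtype=float))
    total_upsweep = float(steps[steps > 0].sum())
    total_downsweep = float(np.abs(steps[steps < 0]).sum())
    return total_upsweep, total_downsweep

toy_track = [2000.0, 2500.0, 2300.0, 3100.0, 3100.0]
up, down = upsweep_downsweep(toy_track)
print(up, down)  # 1300.0 200.0

# Other simple pitch features via built-in NumPy functions
print(np.min(toy_track), np.max(toy_track), np.median(toy_track))
```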

# Segment-level features and dimensionality reduction

## Dimensionality reduction along the frequency axis

The signal strength features shown above tend to be very high dimensional (in the same way that spectrograms and images are high dimensional), and thus might not be suitable for some types of ML algorithms.  For log mel energy features, the dimensionality along the frequency axis is reduced by applying a triangular mel filter bank to the spectrogram values (a matrix multiplication).  Since the signal strength features have the same format as a spectrogram, mel filter banks can be applied to them in the same way to reduce the dimensionality of the frequency axis.

The `SpectralFeatures` class contains an `apply_mel_filter_bank()` method that can be used to do this, as shown below:


In [None]:
n_mels = 6 # Desired dimensionality along the frequency axis
min_frequency = 1000  # Start of frequency range covered by the filter bank
max_frequency = 6000  # End of frequency range covered by the filter bank
mel_signal_strength_features = signal_strength_features.apply_mel_filter_bank(n_mels, min_frequency, max_frequency)

# --- Visualization code below ---------------------------------------------------------------------

# Plot the mel signal strength features, using units of time on the x-axis
fig = plt.figure(figsize=figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    mel_signal_strength_features.features,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        mel_signal_strength_features.time_axis[0],
        mel_signal_strength_features.time_axis[-1],
        0,
        n_mels,
    ],
)
# Manually set up the y-axis labels since the frequency spacing is non-linear (add 0.5 to center the labels vertically on each row)
axes.set_yticks(ticks=np.arange(n_mels) + 0.5)
axes.set_yticklabels([f"{freq:.0f}" for freq in mel_signal_strength_features.frequency_axis])
# Zoom in the same as for previous plots
axes.set_xlim(fig_xlim)
axes.set_xlabel("Time (seconds)")
axes.set_ylabel("Center Frequency (Hz)")
axes.set_title("Mel signal strength features")
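
For reference, the matrix-multiplication view of a mel filter bank can be sketched with NumPy alone. This builds `n_mels` triangular filters spaced evenly on the mel scale between `min_frequency` and `max_frequency` and applies them with `@`. The HTK-style mel formula (`2595 * log10(1 + f / 700)`), the lack of any area normalization, and the helper names are assumptions; `apply_mel_filter_bank()` may use a different mel formula or normalization.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def triangular_mel_filter_bank(frequency_axis, n_mels, min_frequency, max_frequency):
    """Return (bank, center_frequencies); bank has shape (n_mels, len(frequency_axis))."""
    frequency_axis = np.asarray(frequency_axis, dtype=float)
    # n_mels + 2 points evenly spaced in mel: filter i spans points i..i+2 and peaks at i+1
    hz_points = mel_to_hz(np.linspace(hz_to_mel(min_frequency), hz_to_mel(max_frequency), n_mels + 2))
    bank = np.zeros((n_mels, frequency_axis.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (frequency_axis - left) / (center - left)
        falling = (right - frequency_axis) / (right - center)
        # Triangle: 0 outside [left, right], rising to 1 at the center, then falling back to 0
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank, hz_points[1:-1]

# Synthetic spectrogram-like features: 129 frequency bins from 0 to 8000 Hz, 50 time windows
toy_frequency_axis = np.linspace(0.0, 8000.0, 129)
toy_features = np.random.default_rng(0).random((129, 50))

bank, center_frequencies = triangular_mel_filter_bank(toy_frequency_axis, n_mels=6, min_frequency=1000, max_frequency=6000)
mel_features = bank @ toy_features  # (6, 129) @ (129, 50) -> (6, 50)
print(mel_features.shape, np.round(center_frequencies))
```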

Another approach is to take various summary statistics (mean, variance, support, median, max, etc.) over each column of the signal strength features, reducing the dimensionality along the frequency axis to the number of statistics taken.  Since these summary statistics depend on the distribution of values in a column but not on the vertical locations of those values, this approach may be particularly attractive for sound types that can shift up or down in frequency.

In [None]:
# Compute some summary statistics along the columns
column_statistics = np.vstack((
    np.mean(signal_strength_features.features, axis=0),
    np.var(signal_strength_features.features, axis=0),
    np.max(signal_strength_features.features, axis=0),
    np.median(signal_strength_features.features, axis=0),
))

# --- Visualization code below ---------------------------------------------------------------------

# Plot the column statistics
fig = plt.figure(figsize=figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    column_statistics ** (1/3),  # Cube root compresses the dynamic range so small values remain visible
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        signal_strength_features.time_axis[0],
        signal_strength_features.time_axis[-1],
        0,
        column_statistics.shape[0],
    ],
)
# Manually set up the y-axis labels to label each statistic type (add 0.5 to center the labels vertically on each row)
axes.set_yticks(ticks=np.arange(column_statistics.shape[0]) + 0.5)
axes.set_yticklabels(["mean", "variance", "max", "median"])
# Zoom in the same as for previous plots
axes.set_xlim(fig_xlim)
axes.set_xlabel("Time (seconds)")
axes.set_ylabel("Column statistics")
axes.set_title("Column statistic features")
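
The shift-invariance argument above can be checked numerically. Rolling every column of a synthetic feature matrix by 10 bins only permutes the values within each column, so the column statistics are unchanged even though the matrix itself is not. One caveat: `np.roll` wraps content around, whereas a real pitch shift pushes content past the edge of the frequency range, so in practice the invariance is only approximate. The helper name `compute_column_statistics` is hypothetical.

```python
import numpy as np

def compute_column_statistics(matrix):
    # Same four statistics as the cell above, one row per statistic
    return np.vstack((
        np.mean(matrix, axis=0),
        np.var(matrix, axis=0),
        np.max(matrix, axis=0),
        np.median(matrix, axis=0),
    ))

toy_features = np.random.default_rng(1).random((64, 30))
# Shift every column up by 10 frequency bins (np.roll wraps content around the top edge)
shifted = np.roll(toy_features, shift=10, axis=0)

print(np.allclose(toy_features, shifted))                                                  # False
print(np.allclose(compute_column_statistics(toy_features), compute_column_statistics(shifted)))  # True
```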

## Segment-level features / dimensionality reduction along the time axis

Assuming the auto-segmentation works well, a good approach would be to use it as the first step in processing, and then try to classify and/or analyze the types of sound present in each detected segment.  The above code demonstrates how to slice out signal strength features for specific segments.  However, one problem with this is that the sliced out features will have different numbers of dimensions along the time axis depending on how long each detected segment is.  There will be more features (more columns of data) for long segments, and fewer features for short segments.  But most ML algorithms / models require the dimensionality of the input features to be the same for all samples.

One approach for dealing with this is to use summary statistics to squash the variable-length time axis into a fixed number of dimensions.  For instance, you could split each segment into thirds and then compute the mean value of each feature over each third, yielding the same number of features regardless of how long the segment is.  The `resize_matrix_by_averaging()` function from the `audiot.spectral_features` module can be used to do this.  The code below demonstrates this functionality using the features previously sliced out for a single segment:

In [None]:
# Break the segment into desired_n_columns time ranges and average the signal strength features over each chunk
desired_n_columns = 3
resized_segment_features = resize_matrix_by_averaging(segment_features, (segment_features.shape[0], desired_n_columns))

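# Look up the segment's start and end times (in seconds) to label the x-axis of the plots
# (segment_time_list was converted from segment_index_list above, so the same segment_index applies)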
segment_start_time = segment_time_list[segment_index][0]
segment_end_time = segment_time_list[segment_index][1]

# --- Visualization code below ---------------------------------------------------------------------

# Plot the original features for the segment for comparison
fig = plt.figure(figsize=small_figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    segment_features,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        segment_start_time,
        segment_end_time,
        signal_strength_features.frequency_axis[0],
        signal_strength_features.frequency_axis[-1],
    ],
)
axes.set_xlabel("Time (seconds)")
axes.set_ylabel("Frequency (Hz)")
axes.set_title(f"Original signal strength features for segment {segment_index}")

# Plot the resized features for the segment
fig = plt.figure(figsize=small_figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    resized_segment_features,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        segment_start_time,
        segment_end_time,
        signal_strength_features.frequency_axis[0],
        signal_strength_features.frequency_axis[-1],
    ],
)
axes.set_xlabel("Time (seconds)")
axes.set_ylabel("Frequency (Hz)")
axes.set_title(f"Signal strength features for segment {segment_index}, resized to have {desired_n_columns} columns")
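
To show what averaging into time chunks does, here is a stand-in written with `np.array_split`. It is not `resize_matrix_by_averaging()`: when the number of columns does not divide evenly, `np.array_split` simply gives the first chunks one extra column, whereas the library function may handle the remainder differently. The helper name `average_into_columns` is hypothetical.

```python
import numpy as np

def average_into_columns(matrix, n_columns):
    """Average a (rows, cols) matrix over n_columns contiguous chunks of columns."""
    if matrix.shape[1] < n_columns:
        raise ValueError("need at least one column per chunk")
    # np.array_split gives the first (cols % n_columns) chunks one extra column
    chunks = np.array_split(matrix, n_columns, axis=1)
    return np.column_stack([chunk.mean(axis=1) for chunk in chunks])

toy_segment = np.arange(14, dtype=float).reshape(2, 7)  # 2 frequency bins x 7 time windows
print(average_into_columns(toy_segment, 3))
```

With 7 columns split 3 ways, the chunks cover columns 0-2, 3-4, and 5-6, so every row ends up with exactly 3 values no matter how long the segment was.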


The `resize_matrix_by_averaging()` function can also resize along the vertical (frequency) axis.  This is like using a rectangular filter bank with evenly spaced, non-overlapping rectangular filters (as opposed to a mel filter bank, which uses mel-scale spaced, overlapping triangular filters).  The code below demonstrates this:

In [None]:
# Break the segment into desired_n_rows frequency ranges and desired_n_columns time ranges, averaging the signal strength features over each chunk
desired_n_rows = 16
resized_segment_features_2 = resize_matrix_by_averaging(segment_features, (desired_n_rows, desired_n_columns))

# --- Visualization code below ---------------------------------------------------------------------

# Plot the features for the segment, resized along both axes
fig = plt.figure(figsize=small_figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    resized_segment_features_2,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        segment_start_time,
        segment_end_time,
        signal_strength_features.frequency_axis[0],
        signal_strength_features.frequency_axis[-1],
    ],
)
axes.set_xlabel("Time (seconds)")
axes.set_ylabel("Frequency (Hz)")
axes.set_title(f"Signal strength features for segment {segment_index}, resized to have {desired_n_rows} rows and {desired_n_columns} columns")
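
The rectangular-filter-bank analogy can be verified directly when the number of frequency bins divides evenly by the number of rows: averaging blocks of adjacent rows gives the same result as multiplying by a matrix whose rows have equal weights over non-overlapping bands. The sizes here are arbitrary, chosen only for the demonstration.

```python
import numpy as np

n_freq_bins, n_rows = 12, 4
band_width = n_freq_bins // n_rows
toy_segment = np.random.default_rng(2).random((n_freq_bins, 5))

# Averaging each group of band_width adjacent rows...
block_means = toy_segment.reshape(n_rows, band_width, -1).mean(axis=1)

# ...equals multiplying by a rectangular filter bank: each row of the bank has equal
# weights over one non-overlapping band of frequency bins and zeros elsewhere
rect_bank = np.kron(np.eye(n_rows), np.full((1, band_width), 1.0 / band_width))
print(rect_bank.shape)                                  # (4, 12)
print(np.allclose(rect_bank @ toy_segment, block_means))  # True
```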

## Extracting features for all segments in a file

The `SpectralFeatures` class contains an `extract_grid_features_for_segments()` method that extracts the slices of features for all the segments in the recording and resizes them all to the same dimensionality (as above).  This method strings out (`ravel()`s) the grid of features for each segment into a column vector, and then stacks the column vectors side by side to return a matrix containing the features for all the segments (one column per segment).  The `ravel()` ordering is such that the feature index cycles through the time windows first and the frequency bands second.  So with `desired_n_columns=3`, feature indexes 0-2 are the features for the first, second, and third time windows of the lowest frequency band, indexes 3-5 are the three time windows of the second frequency band, and so forth.

In [None]:
desired_n_rows = 6
desired_n_columns = 3

# Average each segment's features into a desired_n_rows x desired_n_columns grid and return them as a matrix:
features_for_all_segments = signal_strength_features.extract_grid_features_for_segments(segment_index_list, n_freq_divisions=desired_n_rows, n_time_divisions=desired_n_columns)
print(f"features_for_all_segments.shape = {features_for_all_segments.shape}")

# --- Visualization code below ---------------------------------------------------------------------

# Plot the features for each segment
fig = plt.figure(figsize=small_figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    features_for_all_segments,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        0,
        features_for_all_segments.shape[1],
        0,
        features_for_all_segments.shape[0],
    ],
)
axes.set_xlabel("Segment index")
axes.set_ylabel("Feature index")
axes.set_title("Features extracted for each of the segments in the entire recording")
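
The feature-index ordering described above is what NumPy's default row-major (C-order) `ravel()` produces, assuming the method uses that default as the description implies. The sketch labels each grid cell as `10 * band + window` so the ordering is easy to read, and shows the index formula `band * n_cols + window` for locating a particular (frequency band, time window) feature. The helper name `feature_index` is hypothetical.

```python
import numpy as np

n_rows, n_cols = 6, 3  # frequency bands x time windows
# Label each cell as 10 * band + window so the flattened order is easy to read
grid = 10 * np.arange(n_rows)[:, None] + np.arange(n_cols)[None, :]
vector = grid.ravel()  # C order: the time-window index varies fastest
print(vector[:6].tolist())  # [0, 1, 2, 10, 11, 12]

def feature_index(band, window, n_cols=n_cols):
    """Position of the (frequency band, time window) feature in the raveled vector."""
    return band * n_cols + window

# Stacking one vector per segment side by side gives an (n_rows * n_cols, n_segments) matrix
all_segments = np.column_stack([vector, vector + 100])
print(all_segments.shape)  # (18, 2)
```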

If you prefer to use a mel filter bank to reduce dimensionality along the frequency axis, apply the `SpectralFeatures.apply_mel_filter_bank()` function first.  Then pass the number of rows those features already have (equal to the `n_mels` parameter used for the mel filter bank) as the `n_freq_divisions` parameter, so that no additional resizing is done along the frequency axis.  This is demonstrated below, using the previously computed `mel_signal_strength_features` (which already had `apply_mel_filter_bank()` run on it):

In [None]:
# Average each segment's features into an n_mels x desired_n_columns grid and return them as a matrix:
mel_features_for_all_segments = mel_signal_strength_features.extract_grid_features_for_segments(segment_index_list, n_freq_divisions=n_mels, n_time_divisions=desired_n_columns)
print(f"mel_features_for_all_segments.shape = {mel_features_for_all_segments.shape}")

# --- Visualization code below ---------------------------------------------------------------------

# Plot the features for each segment
fig = plt.figure(figsize=small_figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    mel_features_for_all_segments,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        0,
        mel_features_for_all_segments.shape[1],
        0,
        mel_features_for_all_segments.shape[0],
    ],
)
axes.set_xlabel("Segment index")
axes.set_ylabel("Feature index")
axes.set_title(f"Features extracted for each of the segments in the entire recording (using a mel filter bank)")
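
As background on what a mel filter bank does to the frequency axis, here is a minimal, generic sketch. It assumes the HTK-style mel formula and triangular filters applied as a matrix multiply over `(n_freq_bins, n_frames)` features. `SpectralFeatures.apply_mel_filter_bank()` may differ in filter shape, normalization, and frequency range, and the helper names below are hypothetical.

```python
import numpy as np


def hz_to_mel(f_hz):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def triangular_mel_filter_bank(n_mels, bin_freqs_hz):
    """Hypothetical helper: (n_mels, n_bins) matrix of triangular filters evenly spaced on the mel scale."""
    bin_freqs_hz = np.asarray(bin_freqs_hz, dtype=float)
    mel_points = np.linspace(hz_to_mel(bin_freqs_hz[0]), hz_to_mel(bin_freqs_hz[-1]), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    filters = np.zeros((n_mels, bin_freqs_hz.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (bin_freqs_hz - left) / (center - left)
        falling = (right - bin_freqs_hz) / (right - center)
        # Triangle peaking at 1 at the center frequency, zero outside [left, right]
        filters[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return filters


n_freq_bins, n_frames, n_mels = 257, 40, 16
bin_freqs_hz = np.linspace(0, 8000, n_freq_bins)
features = np.random.default_rng(0).random((n_freq_bins, n_frames))  # synthetic features

mel_fb = triangular_mel_filter_bank(n_mels, bin_freqs_hz)
mel_features = mel_fb @ features  # each mel row is a weighted sum of nearby frequency bins
print(mel_features.shape)  # (16, 40)
```

Since the result already has `n_mels` rows, averaging it into `n_mels` frequency bands leaves the frequency axis unchanged, which is why `n_freq_divisions=n_mels` is passed in the cell above.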

## Computing the chirp features Brandon provided to his teams for chirp detection

The features Brandon has provided to his teams (as CSV files) for training chirp detection models were computed by taking summary statistics over each column (i.e., along the frequency axis) of the signal strength features, and then averaging those statistics over three time windows covering each segment.  The `audiot.signal_processing.functions.py` file contains a `compute_chirp_features_for_segments()` function that computes these same features and returns them as a Pandas DataFrame.  Note that a `signal_strength` value and information about each segment (duration, start time, and end time) are also included in the returned DataFrame, as seen below.  The column names have numbers appended to them indicating which time window they came from (e.g., `mean_0` is the mean over the first third of the segment, `mean_1` is the mean over the middle third, and `mean_2` is the mean over the last third).

The motivation for using summary statistics along the columns is that they should be relatively invariant to the frequency of the chirps, and thus detectors based on them will hopefully work well even as the pitch of the chirps changes over time due to the chickens growing.
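The computation described above can be sketched roughly as follows. This is a hypothetical illustration, not `compute_chirp_features_for_segments()` itself: only `mean` is confirmed as one of the statistics by the column names above, while `std` and `max` are assumed here purely for illustration, and the `signal_strength` and segment timing columns are omitted. The last lines check the motivation: circularly shifting the frequency axis (a crude stand-in for a pitch change) leaves per-column statistics unchanged.

```python
import numpy as np
import pandas as pd


def chirp_like_features(segment_features, n_time_windows=3):
    """Hypothetical sketch: per-frame statistics over frequency, averaged within time windows.
    segment_features has shape (n_freq_bins, n_frames); std and max are assumed statistics."""
    # One value per frame (column), computed along the frequency axis
    per_frame = {
        "mean": segment_features.mean(axis=0),
        "std": segment_features.std(axis=0),
        "max": segment_features.max(axis=0),
    }
    row = {}
    for name, values in per_frame.items():
        # Average each statistic over roughly equal thirds of the segment
        for w, window_values in enumerate(np.array_split(values, n_time_windows)):
            row[f"{name}_{w}"] = window_values.mean()
    return row


rng = np.random.default_rng(1)
segments = [rng.random((64, n_frames)) for n_frames in (9, 15)]  # synthetic segments
df = pd.DataFrame([chirp_like_features(s) for s in segments])
print(list(df.columns))
# ['mean_0', 'mean_1', 'mean_2', 'std_0', 'std_1', 'std_2', 'max_0', 'max_1', 'max_2']

# Shifting the "pitch" (rolling the frequency axis) leaves the statistics unchanged
original = np.array(list(chirp_like_features(segments[0]).values()))
shifted = np.array(list(chirp_like_features(np.roll(segments[0], 5, axis=0)).values()))
print(np.allclose(original, shifted))  # True
```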

In [None]:
# Compute the same features that Brandon provided to his teams (in CSV files) for chirp detection
chirp_features = compute_chirp_features_for_segments(signal_strength_features, segment_index_list)

# Display the resulting DataFrame containing the features
chirp_features

# Clustering

The features extracted for segments can also potentially be passed to clustering algorithms (or other unsupervised algorithms) to try to help separate out the different types of sounds within the data.  The example code below runs clustering on just the segments for this one file, but the same approach can be applied to large numbers of segments extracted from large numbers of files.
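One detail worth noting is the `.T` in the clustering code. `extract_grid_features_for_segments()` returns one column per segment, while scikit-learn estimators expect one row per sample, so the matrix is transposed before fitting. A minimal sketch on synthetic data (two made-up, well-separated groups of segments standing in for real features; `n_init` and `random_state` are added here only to make the result reproducible):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
n_features = 18
# Synthetic stand-in for the feature matrix: shape (n_features, n_segments), one column per segment
group_a = rng.normal(0.0, 0.1, size=(n_features, 20))
group_b = rng.normal(1.0, 0.1, size=(n_features, 25))
features = np.hstack([group_a, group_b])

# scikit-learn expects (n_samples, n_features), so transpose to make each segment a row
kmeans = MiniBatchKMeans(n_clusters=2, batch_size=2048, n_init=3, random_state=0).fit(features.T)
labels = kmeans.labels_
print(kmeans.cluster_centers_.shape)  # (2, 18): one centroid row per cluster
```

For segments pooled from many files that do not fit in memory at once, `MiniBatchKMeans.partial_fit()` can be called on one batch of segments at a time instead of `fit()`.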

In [None]:
# Cluster the features previously extracted for all the segments
n_clusters = 4
kmeans = MiniBatchKMeans(n_clusters=n_clusters, batch_size=2048).fit(features_for_all_segments.T)
print(f"Segment labels from training = {kmeans.labels_}")
print(f"Re-predicted segment labels  = {kmeans.predict(features_for_all_segments.T)}")

# --- Visualization code below ---------------------------------------------------------------------

# Plot the cluster centroids
fig = plt.figure(figsize=small_figure_size)
axes = fig.add_subplot(1, 1, 1)
axes.imshow(
    kmeans.cluster_centers_.T,
    aspect="auto",
    origin="lower",
    interpolation="none",
    extent=[
        0,
        n_clusters,
        0,
        features_for_all_segments.shape[0],
    ],
)
axes.set_xlabel("Cluster index")
axes.set_ylabel("Feature index")
axes.set_title(f"Cluster centroids")

# Plot the features for each cluster
for cluster_idx in range(n_clusters):
    fig = plt.figure(figsize=small_figure_size)
    axes = fig.add_subplot(1, 1, 1)
    cluster_features = features_for_all_segments[:,kmeans.labels_ == cluster_idx]
    axes.imshow(
        cluster_features,
        aspect="auto",
        origin="lower",
        interpolation="none",
        extent=[
            0,
            cluster_features.shape[1],
            0,
            cluster_features.shape[0],
        ],
    )
    axes.set_xlabel("Segment index")
    axes.set_ylabel("Feature index")
    axes.set_title(f"Features for each of the segments in cluster {cluster_idx}")