# Matching times of tokens and corresponding measures

A common task is to extract acoustic and other measurements from a time series analysis that correspond to labelled tokens (e.g. phones, clusters, words, phrases, contexts) in an annotation file. This notebook illustrates how to perform this task.

In [1]:
import sys
import numpy as np
import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt
from audiolabel import read_label
from phonlab.utils import match_tokentimes

## Example data

Let's start with a spectrogram, which consists of a series of spectral slices calculated at regular timesteps, with each slice consisting of a number of frequency bins. Each datapoint in the spectrogram is the magnitude of a frequency bin in its spectral slice.

Spectral analysis typically results in a 2d array in which the times of the spectral slices (of which there are 200 here) are along the rows (axis 0), and the frequencies (of which there are 100 here) are along the columns (axis 1). For `imshow` we flip the axes with `.T` (an alias of `transpose()`) so that time is on the x-axis (columns) and frequencies are on the y-axis (rows).

The axis labels show the integer index of the values in each dimension, which range from 0–199 for times and 0–99 for frequencies. **These indexes are not the times and frequencies themselves; they are simply the locations of the spectral slices/frequencies in the ordered axis.**

In [None]:
d = np.load('../resource/spectrogram/spec.npz')
spec = d['spec']
print(f'Spectrogram shape: {spec.shape}')
plt.imshow(spec.T, origin='lower', cmap='gray_r')

Individual spectral slices can be selected by indexing `spec` to return a specific row. The resulting plot shows the frequency bins on the x-axis and their magnitudes on the y-axis. The bin with the maximum magnitude is marked with a red vertical line.

In [None]:
specidx = 26  # index of spectral slice to plot, any integer from 0 to 199
specslice = spec[specidx,:]
idxmax = np.argmax(specslice)
print(f'Index of max freq in spectral slice {specidx}: {idxmax}')
plt.plot(specslice.T)
plt.axvline(x=idxmax, color='r')

The spectral analysis will also include 1d arrays that list the times of the spectral slices and the frequencies of the bins. `spectimes` has a time value (seconds) for every spectral slice of `spec`, and `freqs` has a frequency value (Hz) for every frequency bin of `spec`. The lengths of these arrays match the lengths of the corresponding axes of `spec`.

In [None]:
spectimes = d['spectimes']
freqs = d['freqs']
print(f'Shape of spec: {spec.shape}.\nLength of spectimes: {len(spectimes)}.\nLength of freqs: {len(freqs)}.')
print(f'\nspectimes values {spectimes}')
print(f'\nfreqs values {freqs}')
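
As a minimal sketch of how an index maps to a physical value, the following uses synthetic stand-ins for `spec`, `spectimes`, and `freqs`. The `_demo` names, the 5 ms timestep, and the 0 to 5000 Hz range are assumptions made for illustration and are not the contents of `spec.npz`.

```python
import numpy as np

# Synthetic stand-ins (made-up values, not the data in spec.npz).
spectimes_demo = np.arange(200) * 0.005     # 200 slices, assumed 5 ms apart
freqs_demo = np.linspace(0, 5000, 100)      # 100 bins, assumed 0 to 5000 Hz
rng = np.random.default_rng(0)
spec_demo = rng.random((200, 100))          # rows = times, columns = frequencies

slice_idx = 26                              # integer position along the time axis
peak_bin = np.argmax(spec_demo[slice_idx])  # integer position along the frequency axis

# Look up the physical values that belong to those integer positions.
peak_time = spectimes_demo[slice_idx]       # seconds
peak_freq = freqs_demo[peak_bin]            # Hz
print(f'Slice {slice_idx} is at {peak_time:.3f} s; its peak bin {peak_bin} is at {peak_freq:.1f} Hz')
```

The same lookup pattern applies to the real arrays: index `spectimes` with a slice index and `freqs` with a bin index.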

We can use `spectimes` and `freqs` to redefine the axis labels displayed by `imshow`.

In [None]:
plt.imshow(
    spec.T,
    origin='lower',
    cmap='gray_r',
    extent=[spectimes[0], spectimes[-1], freqs[0], freqs[-1]],
    aspect=0.0015
)

## Token annotations

Now consider a set of annotations of the audio file that was used to generate the spectrogram. You can load these into a dataframe and correlate them with spectral slices. The start and end times of each annotation are in the `t1` and `t2` columns, and the `label` column contains the content. In the second step the midpoint of each annotation is calculated and added as the `midpt` column.

In [None]:
[tg] = read_label('../resource/spectrogram/spec.tg', ftype='praat')
tg['midpt'] = (tg['t1'] + tg['t2']) / 2
tg

From these annotations you can select the rows that are the tokens you are interested in processing. Here are all the 'V' tokens.

In [None]:
tok = tg[tg['label'] == 'V']
tok

## Matching times



The next step is to select spectral slices from the spectrogram that correspond to the token times. One way to do this is to find the spectral slice closest to the midpoint of each token. The `match_tokentimes` function compares two time arrays. For every value of the first array, it returns the index of the closest match in the second array. In this example, the indexes of the spectral slices that most closely match the token midpoints are stored in `tidx`.

In [None]:
tidx = match_tokentimes(tok['midpt'], spectimes)
tidx
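
The idea of closest-match lookup can be sketched in plain NumPy. The `nearest_index` helper below is a hypothetical stand-in written for illustration; it is not the `phonlab` implementation of `match_tokentimes`, which may handle ties, sorting, or tolerances differently. The time values are made up.

```python
import numpy as np

def nearest_index(query_times, ref_times):
    """Hypothetical helper: for each query time, return the index of the
    closest value in ref_times. Not the phonlab implementation."""
    query = np.asarray(query_times, dtype=float)
    ref = np.asarray(ref_times, dtype=float)
    # Broadcast to get |query - ref| for every (query, ref) pair,
    # then take the position of the smallest distance along the ref axis.
    dist = np.abs(query[..., np.newaxis] - ref)
    return dist.argmin(axis=-1)

ref_demo = np.array([0.00, 0.01, 0.02, 0.03, 0.04])  # made-up slice times (s)
mid_demo = np.array([0.012, 0.036])                  # made-up token midpoints (s)
idx_demo = nearest_index(mid_demo, ref_demo)
print(idx_demo)             # indexes into ref_demo
print(ref_demo[idx_demo])   # matched slice times
```

Because the distance array is built by broadcasting, this sketch also accepts a 2d array of query times and returns an index array of the same shape.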

If you use `tidx` to select the times of the spectral slices from `spectimes`, you find values that are close to the midpoints found in the `midpt` column of `tok`.

In [None]:
spectimes[tidx]

To select the spectral slices, use `tidx` on `spec`. The result is three token rows of 100 frequency bins.

In [None]:
spec[tidx].shape

### Multiple measures per token

In many cases you will want to find measures at multiple times per token, for example at the beginning of the token, 25% of the way through the token, 50%, 75%, and at the end. To do this, first use [`np.linspace`](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html) to create five equally-spaced values from the start of each token to the end.
The return value is a 2d array that has a row for each token, and the times are arranged in the columns. The `axis=1` parameter ensures that `linspace` returns the times in the correct shape.

In [None]:
toktimes = np.linspace(tok['t1'], tok['t2'], num=5, axis=1)
toktimes
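
To make the effect of `axis=1` concrete, here is a small sketch with made-up start and end times for two tokens. The `_demo` arrays are assumptions for illustration only.

```python
import numpy as np

t1_demo = np.array([0.10, 0.50])   # made-up token start times (s)
t2_demo = np.array([0.30, 0.90])   # made-up token end times (s)

# axis=1 puts the five samples along the columns: one row per token.
times_demo = np.linspace(t1_demo, t2_demo, num=5, axis=1)
print(times_demo.shape)

# With the default axis=0 the samples run down the rows instead,
# so each token occupies a column.
times_default = np.linspace(t1_demo, t2_demo, num=5)
print(times_default.shape)
```

The two results hold the same values; one is the transpose of the other.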

The `match_tokentimes` function accepts a 2d first parameter, where the tokens are arranged along the first axis, as they are in `toktimes`.

In [None]:
tidx = match_tokentimes(toktimes, spectimes)

In [None]:
spec[tidx].reshape(len(toktimes), -1).shape
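
The shapes involved in this step can be sketched with synthetic data. Indexing a 2d array with a 2d integer array yields a 3d result (tokens × timepoints × frequency bins), and `reshape` then flattens each token's slices into a single row. The `_demo` arrays and index values below are made up for illustration.

```python
import numpy as np

# Synthetic spectrogram: 200 slices of 100 bins (made-up values).
spec_demo = np.arange(200 * 100).reshape(200, 100)

# Made-up slice indexes: 3 tokens, 5 timepoints per token.
idx_demo = np.array([
    [10, 12, 14, 16, 18],
    [50, 55, 60, 65, 70],
    [100, 110, 120, 130, 140],
])

picked = spec_demo[idx_demo]              # one spectral slice per index
print(picked.shape)                       # tokens x timepoints x frequency bins

flat = picked.reshape(len(idx_demo), -1)  # one row per token, slices concatenated
print(flat.shape)
```

In the flattened form, the slice for timepoint `j` of a token occupies columns `j * 100` through `j * 100 + 99` of that token's row.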

In [None]:
mytoken = tidx[0]
mytoken

In [None]:
plt.imshow(
    spec[mytoken].T,
    origin='lower',
    cmap='gray_r',
    extent=[spectimes[mytoken][0], spectimes[mytoken][-1], freqs[0], freqs[-1]],
    aspect=0.0015
)