## Requirements

Run the following setup script first.

In [None]:
from pathlib import Path

in_colab = 'google.colab' in str(get_ipython())

if not in_colab:
    print('This notebook is only meant to be run in Google Colab.')
elif not Path('/content/data').is_dir():  # the setup only needs to run once
    !wget -q -O /content/setup.sh https://raw.githubusercontent.com/solita/ivves-machine-spraak/main/setup.sh
    !bash /content/setup.sh

# Exploratory Analysis of Machine Audio Data

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile
from math import ceil
import librosa, librosa.display

import modules.utils as utl

data_folder = Path('/content/data/converted/')
sample_rate, samples, names = utl.load_data(data_folder, channel=0)
print(names)

Some of the audio clips in our dataset are stereo. The `channel` argument of `utl.load_data` controls how these are converted to mono for further processing: `channel=0` or `channel=1` keeps only the left or right channel respectively, while `channel=-1` takes the mean of the two. For our data, averaging the channels is dangerous because the two signals are almost exactly out of phase, so their mean largely cancels out:

In [None]:
begin, window = 0, 5000
tmp_rate, tmp_sample = wavfile.read(data_folder / 'ZOOM0005_Tr34.WAV')
plt.figure(figsize=(8,3))
y=tmp_sample[begin:begin+window,:]
plt.plot(utl.times_like(y), y, alpha=0.5)
plt.title('Snapshot of the waveform from a stereo audio clip.')
plt.show()
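The cancellation can be reproduced on synthetic data. The following is a minimal sketch, assuming an idealised stereo clip whose right channel is the exact negative of the left; the sample rate, tone frequency and the `rms` helper are illustrative choices and are not taken from the dataset or from `modules.utils`.

```python
import numpy as np

sr = 8000                                # hypothetical sample rate in Hz
t = np.arange(sr) / sr                   # one second of sample times
left = np.sin(2 * np.pi * 440 * t)       # 440 Hz test tone
right = -left                            # right channel in perfect antiphase

# Shape (n_samples, 2), the layout wavfile.read returns for stereo files
stereo = np.stack([left, right], axis=1)

mono_mean = stereo.mean(axis=1)          # averaging the two channels
mono_left = stereo[:, 0]                 # keeping only the left channel

def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(x ** 2))

print(f'RMS of the channel mean: {rms(mono_mean):.3f}')
print(f'RMS of the left channel: {rms(mono_left):.3f}')
```

The averaged signal is identically zero, while the single channel keeps the full level of the tone. Real recordings are rarely in perfect antiphase, but the closer they are to it, the more of the signal is lost by averaging.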

## Waves

In [None]:
fig, axs = plt.subplots(4, 2, figsize=(14,8))
fig.tight_layout(pad=1.5, rect=[0, 0.03, 1, 0.95])

xs = [utl.times_like(s, sample_rate) for s in samples]

for n, ax in enumerate(axs.flat):
    ax.plot(xs[n], samples[n])
    ax.set_title(names[n])
plt.show()

## Spectrograms

Spectrograms are typically constructed by taking the Fourier transform of the input signal over short, sliding windows and plotting the magnitude of the resulting frequency coefficients against time as a heatmap. See <https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.spectrogram.html> for more details.

If you are interested in the mathematical formulation of a spectrogram and the (discrete) Fourier transform, then a short explanation is available [here](https://www.princeton.edu/~cuff/ele201/files/spectrogram.pdf).
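To make the construction concrete, here is a minimal NumPy sketch of a windowed Fourier transform. It is not how `utl.plot_spec` or `scipy.signal.spectrogram` is implemented; the function name `simple_spectrogram`, the Hann window and the values of `n_fft` and `hop` are assumptions chosen for illustration.

```python
import numpy as np

def simple_spectrogram(x, sr, n_fft=256, hop=128):
    """Power spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Cut the signal into overlapping frames and taper each one
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # (n_frames, n_fft // 2 + 1)
    freqs = np.fft.rfftfreq(n_fft, d=1 / sr)               # bin centres in Hz
    times = (np.arange(n_frames) * hop + n_fft / 2) / sr   # frame centres in seconds
    return freqs, times, power.T                           # rows: frequency, columns: time

sr = 8000
t = np.arange(2 * sr) / sr
# Test signal: 500 Hz during the first second, 1500 Hz during the second
x = np.where(t < 1.0,
             np.sin(2 * np.pi * 500 * t),
             np.sin(2 * np.pi * 1500 * t))

freqs, times, spec = simple_spectrogram(x, sr)
dominant = freqs[spec.argmax(axis=0)]    # strongest frequency in each frame
print(dominant[0], dominant[-1])
```

Each column of `spec` is the spectrum of one short window, so the dominant frequency moves from 500 Hz to 1500 Hz as the frames pass the one-second mark. Plotting `spec` in decibels with `plt.pcolormesh(times, freqs, ...)` would give the familiar heatmap.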

In [None]:
show_sample = samples[7]
names[7]

In [None]:
utl.plot_spec(show_sample, sample_rate)

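We can also look at spectrograms of successive differences of the input signal. Note that `np.diff(x, n=k)` computes the k-th order difference (differencing applied k times) rather than a lag-k difference or an autocorrelation. Each panel therefore shows the signal after one more round of differencing, which progressively emphasises the high frequencies.

In [None]:
fig, axs = plt.subplots(3, 2, figsize=(16,10), sharex=True)
for i, ax in enumerate(axs.flat):
    # n=i+1 is the order of the difference, not a lag
    utl.plot_spec(np.diff(show_sample, n=i+1), sample_rate, ax=ax)
    ax.set_xlabel(''); ax.set_ylabel('')
    ax.set_title(f'difference order = {i+1}')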
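A small check of what `np.diff` computes may help when reading the panels above: its `n` argument is the order of the difference, which is not the same as a lag. The toy array and the `autocorr` helper below are illustrative only and are not part of `modules.utils`.

```python
import numpy as np

x = np.array([1, 4, 9, 16, 25, 36])     # toy signal: the first six squares

second_order = np.diff(x, n=2)           # difference of the differences
lag_two = x[2:] - x[:-2]                 # samples two steps apart

print(second_order)   # [2 2 2 2]
print(lag_two)        # [ 8 12 16 20]

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# A sine with a period of 100 samples is strongly anti-correlated
# with itself at half a period (lag 50).
s = np.sin(2 * np.pi * np.arange(1000) / 100)
print(round(autocorr(s, 50), 2))
```

If the autocorrelation itself is of interest, a helper along the lines of `autocorr` evaluated over a range of lags is closer to that goal than repeated differencing.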

## Other spectral plots


### Periodogram

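The periodogram estimates the power spectrum of the whole clip at once, discarding all time information. It is not very useful for this particular data. <https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.periodogram.html>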

In [None]:
freqs, spec = signal.periodogram(show_sample, sample_rate, window='flattop', scaling='spectrum')
fig = plt.figure(figsize=(18, 5))
ax = fig.add_subplot(111)
p = ax.semilogy(freqs, spec)
ax.set_xlabel('Frequency (Hz)')
ax.set_ylabel('Power')
plt.show()

In [None]:
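# Welch's method averages periodograms of overlapping segments,
# which reduces the variance of the estimate
freqs, spec = signal.welch(show_sample, sample_rate, scaling='spectrum')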
fig = plt.figure(figsize=(18, 5))
ax = fig.add_subplot(111)
p = ax.semilogy(freqs, spec)
ax.set_xlabel('Frequency (Hz)')
ax.set_ylabel('Power')
plt.show()
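The difference between the two estimators is easier to see on a signal with a known spectrum. The sketch below uses a synthetic 1 kHz tone buried in white noise; the sample rate, noise level and `nperseg=1024` are assumptions made for the example and are not tuned to the machine audio.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
sr = 8000
t = np.arange(4 * sr) / sr
x = np.sin(2 * np.pi * 1000 * t) + rng.normal(scale=1.0, size=t.size)

f_per, p_per = signal.periodogram(x, sr, scaling='density')
f_wel, p_wel = signal.welch(x, sr, nperseg=1024, scaling='density')

# Both estimators should place the peak at the tone frequency
peak_per = f_per[p_per.argmax()]
peak_wel = f_wel[p_wel.argmax()]

# Relative spread of the noise floor in a band away from the tone
band_per = p_per[(f_per > 2000) & (f_per < 3500)]
band_wel = p_wel[(f_wel > 2000) & (f_wel < 3500)]
rel_spread_per = band_per.std() / band_per.mean()
rel_spread_wel = band_wel.std() / band_wel.mean()

print(f'peak: periodogram {peak_per:.1f} Hz, Welch {peak_wel:.1f} Hz')
print(f'noise-floor spread: periodogram {rel_spread_per:.2f}, Welch {rel_spread_wel:.2f}')
```

Both estimates find the tone, but the Welch noise floor is much smoother because it averages many segments. The price is coarser frequency resolution, set by `nperseg`.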

### Mel Spectrogram
<https://librosa.org/doc/main/generated/librosa.feature.melspectrogram.html#librosa.feature.melspectrogram>

In [None]:
lib_sample = show_sample.astype(np.float32)

In [None]:
mel = librosa.feature.melspectrogram(y=lib_sample, sr=sample_rate)
mel_dB = librosa.power_to_db(mel, ref=np.max)
fig, ax = plt.subplots(figsize=(18,5))
# specshow assumes sr=22050 unless told otherwise, which would mislabel both axes
img = librosa.display.specshow(mel_dB, sr=sample_rate, y_axis='mel', x_axis='time', ax=ax)
fig.colorbar(img, ax=ax)
plt.show()

### MFCC
<https://librosa.org/doc/main/generated/librosa.feature.mfcc.html#librosa.feature.mfcc>

In [None]:
mfcc = librosa.feature.mfcc(y=lib_sample, sr=sample_rate, hop_length=2**12, dct_type=2)
fig, ax = plt.subplots(figsize=(18,5))
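# pass the same sr and hop_length as above so that the time axis is correct
img = librosa.display.specshow(mfcc, sr=sample_rate, hop_length=2**12, x_axis='time', ax=ax)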
fig.colorbar(img, ax=ax)
plt.show()

### RMS Energy
<https://librosa.org/doc/main/generated/librosa.feature.rms.html#librosa.feature.rms>

In [None]:
rms = librosa.feature.rms(y=lib_sample)
fig, ax = plt.subplots(figsize=(18,5))
ax.semilogy(librosa.times_like(rms, sr=sample_rate), rms[0])
plt.show()

### Spectral Centroid
<https://librosa.org/doc/main/generated/librosa.feature.spectral_centroid.html#librosa.feature.spectral_centroid>

In [None]:
cent = librosa.feature.spectral_centroid(y=lib_sample, sr=sample_rate)
S, phase = librosa.magphase(librosa.stft(y=lib_sample))
fig, ax = plt.subplots(figsize=(18,5))
librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max), sr=sample_rate,
                         y_axis='log', x_axis='time', ax=ax)
ax.plot(librosa.times_like(cent, sr=sample_rate), cent.T, label='Spectral centroid', color='w')
ax.legend(loc='upper right')
plt.show()
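The spectral centroid of a frame is the magnitude-weighted mean of its frequencies. As a minimal NumPy sketch of that definition for a single frame (the two-tone test signal and the frame length are assumptions chosen so that the answer is easy to verify by hand):

```python
import numpy as np

sr = 8000
n_fft = 1024
t = np.arange(n_fft) / sr
# Two tones of equal amplitude, both falling exactly on FFT bins
frame = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 1500 * t)

mag = np.abs(np.fft.rfft(frame))              # magnitude spectrum of the frame
freqs = np.fft.rfftfreq(n_fft, d=1 / sr)      # bin centres in Hz

# Centroid: mean frequency, weighted by spectral magnitude
centroid = np.sum(freqs * mag) / np.sum(mag)
print(f'{centroid:.1f} Hz')
```

With equal energy at 500 Hz and 1500 Hz, the centroid lands midway at 1000 Hz, a frequency at which the signal itself has no energy. This is worth keeping in mind when interpreting the white curve in the plot above.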

### Spectral Contrast
<https://librosa.org/doc/main/generated/librosa.feature.spectral_contrast.html#librosa.feature.spectral_contrast>

In [None]:
S = np.abs(librosa.stft(lib_sample))
contrast = librosa.feature.spectral_contrast(S=S, sr=sample_rate)
fig, ax = plt.subplots(figsize=(18,5))
img = librosa.display.specshow(contrast, sr=sample_rate, x_axis='time', ax=ax)
fig.colorbar(img, ax=ax)
ax.set_ylabel('Frequency bands')
plt.show()

### Zero-crossing rate (ZCR)

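ZCR measures how often the input signal changes sign, so it is a rough proxy for the dominant frequency of the signal (equivalently, it is inversely related to the average wavelength). This is one possible feature we can use. The open question is the appropriate window length for computing the rate: a longer window gives a smoother curve but blurs short-lived changes, while a shorter window is more precise in time but noisier.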

In [None]:
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(16,7))
ax1.plot(show_sample)
ax2.plot(utl.zero_cross_rate(show_sample, window=2000))
fig.suptitle(f'Audio signal and the corresponding ZCR for part of {names[7]}')
plt.show()
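`utl.zero_cross_rate` lives in the project's `modules.utils` and its implementation is not shown here. The following is a minimal NumPy sketch of one way a windowed ZCR could be computed, under the hypothetical name `zero_cross_rate_sketch`; the test tones and their phase offset are likewise assumptions made for the example.

```python
import numpy as np

def zero_cross_rate_sketch(x, window=2000):
    """Fraction of sign changes per sample, averaged over a sliding window."""
    signs = np.signbit(x)
    crossings = (signs[1:] != signs[:-1]).astype(float)   # 1.0 where the sign flips
    kernel = np.ones(window) / window                      # moving-average filter
    return np.convolve(crossings, kernel, mode='valid')

sr = 8000
t = np.arange(sr) / sr
# The phase offset keeps samples from landing exactly on zero
low = np.sin(2 * np.pi * 100 * t + 0.3)
high = np.sin(2 * np.pi * 1000 * t + 0.3)

zcr_low = zero_cross_rate_sketch(low, window=2000).mean()
zcr_high = zero_cross_rate_sketch(high, window=2000).mean()

# A pure tone of frequency f crosses zero 2 * f times per second,
# so ZCR * sr / 2 is a rough frequency estimate.
est_low = zcr_low * sr / 2
est_high = zcr_high * sr / 2
print(f'{est_low:.0f} Hz, {est_high:.0f} Hz')
```

For pure tones the estimate recovers the tone frequency. Shrinking `window` makes the curve respond faster to changes but also makes it jump by larger steps, which is the smoothness versus precision trade-off mentioned above.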