Sascha Spors,
Professorship Signal Theory and Digital Signal Processing,
Institute of Communications Engineering (INT),
Faculty of Computer Science and Electrical Engineering (IEF),
University of Rostock,
Germany

# Data Driven Audio Signal Processing - A Tutorial with Computational Examples

Master Course #24512

- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture
- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise

Feel free to contact the lecturer: frank.schultz@uni-rostock.de

# Exercise 2: Audio Features

## Objectives

A vast number of audio features and measures can be extracted from audio files and subsequently used for human inspection and/or machine learning (ML) applications. Here we get to know some of them and interpret them ourselves on some nice audio data. Once we have an impression of the information that we as humans can draw from these features, we can make well-founded decisions on how and when to apply them for ML.

We will deal with
- Loudness and true peak measures
- Sample Histogram
- STFT power spectrum
- Spectral Centroid
- Periodogram

## Special Python Packages
- in this exercise we use the `pyloudnorm` package from https://github.com/csteinmetz1/pyloudnorm, which can be installed with `pip install pyloudnorm`
- in this exercise we also use the `librosa` package (see https://librosa.org/doc/latest/index.html), which can be installed with `pip install librosa`

## Some Initial Python Stuff

In [None]:
import librosa
import matplotlib.pyplot as plt
import numpy as np
import os
import pyloudnorm as pyln
import warnings
from matplotlib.pyplot import get_cmap  # matplotlib.cm.get_cmap was removed in matplotlib 3.9
from matplotlib.colors import BoundaryNorm
from scipy.io import wavfile
from scipy.signal import resample, stft, welch
from scipy.signal.windows import kaiser

import librosa.display  # matplotlib dependencies?, therefore import afterwards

CI_flag = True

In [None]:
# generate some audio data, in case the audio_ex02 folder is empty
if True:
    os.makedirs("audio_ex02", exist_ok=True)  # create the folder if it is missing
    fs = 44100
    if CI_flag:
        N = fs * 2
    else:
        N = fs * 30
    k = np.arange(N)

    fdes = 441 * 3
    x = 0.5 * np.sin(2 * np.pi * fdes / fs * k)
    x = np.tile(x, (2, 1)).T
    print(x.shape)
    wavfile.write("audio_ex02/sine.wav", fs, x.astype(np.float32))

    x = np.random.randn(N, 2)
    x /= np.max(np.abs(x))
    x *= 0.5
    print(x.shape)
    wavfile.write("audio_ex02/noise.wav", fs, x.astype(np.float32))

In [None]:
def set_cmap(xmax, dbrange):
    """cmap for STFT 20log10(|X[t,f]|) in dB."""
    col_tick = np.linspace(xmax - dbrange, xmax, dbrange, endpoint=True)
    cmap = get_cmap("magma").copy()
    cmap.set_over("C3")
    cmap.set_under("C7")
    norm = BoundaryNorm(col_tick, cmap.N)
    return cmap, norm

In [None]:
warnings.filterwarnings("ignore")  # annoying scipy.io.wavfile.read warnings

N_OS = 8  # oversample factor for true peak
dbrange = 72  # colormap dB range for STFT surface plots

ch = 1  # 0...left channel, 1...right channel for stereo file

folder = "audio_ex02/"
files = sorted(os.listdir(folder))  # get file names in this folder
print(files)

# for debug tests choose certain wav-files
# files = [files[1]]
# print(files)

## Audio Load with Librosa vs. Scipy I/O

In [None]:
for i in files:
    if i[-4:] == ".wav":  # consider only wav files
        # make sure to check the defaults, they might not be convenient for us
        xlib, fslib = librosa.load(
            folder + i, mono=False, sr=None, dtype="double"
        )
        fssci, xsci = wavfile.read(folder + i)

        # we prefer the shape (number of samples, number of channels)
        xlib = xlib.T
        print("audio load x librosa == scipy.io:", np.allclose(xlib, xsci))
        print("audio load fs librosa == scipy.io:", np.allclose(fslib, fssci))
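
The comparison above can fail for integer PCM files: as far as we know, `scipy.io.wavfile.read` returns the raw integer samples for 16-bit PCM, whereas `librosa.load` returns floats scaled to the range [-1, 1). The following minimal sketch shows the conversion, assuming 16-bit PCM data. The helper `pcm16_to_float` is hypothetical and not part of either library.

```python
import numpy as np


def pcm16_to_float(x_int16):
    """Scale 16-bit PCM samples to floats in [-1, 1) (hypothetical helper)."""
    return x_int16.astype(np.float64) / 2**15


# a few int16 samples, as we would expect them from a 16-bit PCM wav file
x_int = np.array([-32768, -16384, 0, 16384, 32767], dtype=np.int16)
x_float = pcm16_to_float(x_int)
print(x_float)
```

The float32 files generated at the beginning of this notebook are not affected, since both loaders return floating point data for them.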

## Calculate BS.1770 Loudness, True Peak and Histogram

In [None]:
for i in files:
    if i[-4:] == ".wav":  # consider only wav files
        fs, x = wavfile.read(folder + i)
        x = x.astype(np.float64)  # integer PCM data would break the in-place scaling below

        # true peak in decibel true peak (dBTP)
        xr = resample(x, N_OS * x.shape[0])
        dbtp = 20 * np.log10(np.max(np.abs(xr)))

        # normalize to desired dBTP
        dbtp_des = 0.0
        x *= 10 ** ((dbtp_des - dbtp) / 20)

        # check dBTP
        xr = resample(x, N_OS * x.shape[0])
        dbtp = 20 * np.log10(np.max(np.abs(xr)))

        # measure loudness in
        # 'loudness units relative to full scale' (LUFS)
        # according to https://www.itu.int/rec/R-REC-BS.1770
        meter = pyln.Meter(fs)
        lufs = meter.integrated_loudness(x)

        msg = (
            f"{'file: ' + i:<30}"
            f"{'dBTP: '} {dbtp:+4.2f} \t"
            f"{'LUFS: '} {lufs:+4.2f}"
        )
        print(msg)

        plt.figure()
        plt.hist(x, range=(-1, 1), bins="auto", density=True)
        plt.title(
            f"{i}" f"{', dBTP:'} {dbtp:+4.2f}" f"{', LUFS:'} {lufs:+4.2f}"
        )
        plt.xlabel("sample value")
        plt.ylabel("density-like occurrence")
        plt.savefig(folder + i[:-4] + "_LUFS_hist.png")
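
To see why the true peak is measured on an oversampled signal, we might look at a small sketch with assumed parameters (64 samples, a sine at a quarter of the sampling frequency, phase offset of pi/4). All samples of this sine have the magnitude 0.7071, although the underlying continuous-time sine reaches 1 between the samples.

```python
import numpy as np
from scipy.signal import resample

N = 64  # number of samples, holds 16 full periods of the sine
N_OS = 8  # oversampling factor, same value as used above
k = np.arange(N)

# sine at fs/4 with a phase offset of pi/4: every sample hits +-0.7071,
# the peaks of the underlying continuous-time sine lie between the samples
x = np.sin(2 * np.pi * 0.25 * k + np.pi / 4)

sample_peak_db = 20 * np.log10(np.max(np.abs(x)))
xr = resample(x, N_OS * N)  # FFT-based resampling, suits this periodic signal
true_peak_db = 20 * np.log10(np.max(np.abs(xr)))

print(f"sample peak: {sample_peak_db:+.2f} dB")
print(f"true peak:   {true_peak_db:+.2f} dBTP")
```

The sample peak underestimates the true peak by about 3 dB here. A signal that looks safe on sample level can therefore still clip after D/A conversion or resampling.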

## Calculate and Plot STFT Power Spectrum

In [None]:
if CI_flag:
    nperseg, nfft = 2**10, 2**10
else:
    nperseg, nfft = 2**14, 2**18

for i in files:
    if i[-4:] == ".wav":  # consider only wav files
        fs, x = wavfile.read(folder + i)

        f, t, X = stft(
            x[:, ch], fs, window="hamming", nperseg=nperseg, nfft=nfft
        )
        Xmaxdb = 10 * np.log10(np.max(np.abs(X) ** 2))

        cbticks = np.arange(Xmaxdb - dbrange, Xmaxdb + 6, 6)
        cmap, norm = set_cmap(Xmaxdb, dbrange)
        fig, ax = plt.subplots(figsize=(6, 5), nrows=1, ncols=1)
        srf = ax.pcolormesh(
            t, f, 10 * np.log10(np.abs(X) ** 2), cmap=cmap, norm=norm
        )
        cax = plt.colorbar(srf, cmap=cmap, norm=norm, ticks=cbticks)
        ax.set_ylim(2e1, 2e4)
        plt.yscale("log")
        ax.set_title("STFT Magnitude: " + i)
        ax.set_ylabel("f / Hz")
        ax.set_xlabel("t / s")
        cax.ax.set_xlabel("dB")
        plt.savefig(folder + i[:-4] + "_STFT.png")
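
To interpret the dB values of the colorbar, we might check the STFT level of a known signal. The sketch below uses the sine from `sine.wav` (amplitude 0.5, 1323 Hz) and an assumed segment length of 1000 samples, so that the sine frequency falls exactly onto a DFT bin. As far as we know, `scipy.signal.stft` uses the `'spectrum'` scaling by default, so a sine with amplitude A should show up with magnitude A/2 in the one-sided spectrum.

```python
import numpy as np
from scipy.signal import stft

fs = 44100  # sampling frequency in Hz
amp, f0 = 0.5, 1323  # amplitude and frequency of the sine, as generated above
nperseg = 1000  # assumed segment length, yields fs / nperseg = 44.1 Hz bin spacing
k = np.arange(fs)  # one second of signal
x = amp * np.sin(2 * np.pi * f0 / fs * k)

f, t, X = stft(x, fs, window="hamming", nperseg=nperseg)

frame = X[:, X.shape[1] // 2]  # frame from the middle, unaffected by edge padding
kmax = np.argmax(np.abs(frame))  # bin with the largest magnitude
peak_db = 10 * np.log10(np.abs(frame[kmax]) ** 2)
print(f"peak at {f[kmax]:.1f} Hz with {peak_db:.2f} dB")
```

The peak should appear at 1323 Hz with about -12 dB, which corresponds to the magnitude 0.25. For the segment lengths used above the sine does not hit a bin exactly, so slightly lower peak values are to be expected there.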

## Modified Periodogram

We estimate the power spectral density.

We use the Welch method with a Kaiser-Bessel window (in contrast to a Hann or a Hamming window, the Kaiser-Bessel window can be conveniently parametrized by `beta`).

Since each segment is windowed in the Welch method, we obtain an (averaged) modified periodogram.

In [None]:
if CI_flag:
    nperseg, nfft = 2**10, 2**10
else:
    nperseg, nfft = 2**11, 2**18

kaiser_beta = 5

for i in files:
    if i[-4:] == ".wav":  # consider only wav files
        fs, x = wavfile.read(folder + i)
        f, Pxx_den = welch(
            x[:, ch],
            fs,
            nperseg=nperseg,
            nfft=nfft,
            window=kaiser(nperseg, beta=kaiser_beta),
        )
        Pxxdb = 10 * np.log10(Pxx_den)
        Pxxdbmax = np.max(Pxxdb)

        fig, ax = plt.subplots(figsize=(6, 4), nrows=1, ncols=1)
        plt.semilogx(f, Pxxdb)
        plt.ylim(Pxxdbmax - dbrange + 6, Pxxdbmax + 6)
        plt.xlim(2e1, 2e4)
        plt.xlabel("frequency in Hz")
        plt.ylabel("Kaiser-Bessel Windowed Welch Estimate of PSD in dB")
        plt.title(i)
        plt.grid(True)
        plt.savefig(folder + i[:-4] + "_PSD.png")
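
A quick plausibility check for a PSD estimate is to integrate it over frequency, which should approximately yield the signal power. The sketch below does this for white noise with assumed parameters (standard deviation 0.1, fixed random seed, two seconds of signal) and the same Kaiser-Bessel window as above.

```python
import numpy as np
from scipy.signal import welch
from scipy.signal.windows import kaiser

rng = np.random.default_rng(1234)  # fixed seed, assumed for reproducibility
fs = 44100  # sampling frequency in Hz
nperseg = 2**10
x = 0.1 * rng.standard_normal(2 * fs)  # white noise with standard deviation 0.1

f, Pxx_den = welch(x, fs, nperseg=nperseg, window=kaiser(nperseg, beta=5))

df = f[1] - f[0]  # frequency resolution in Hz
power_from_psd = np.sum(Pxx_den) * df  # integrate the PSD over frequency
power_from_signal = np.var(x)  # power of the (nearly) zero-mean signal
print(power_from_psd, power_from_signal)
```

Both numbers should be close to 0.01, i.e. the variance of the noise. Small deviations are expected, since the Welch method is an estimator and removes the mean of each segment by default.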

## Spectral Centroid Using Librosa

We should **carefully check** if the **defaults** of functions suit our needs!!!

For example, by default `librosa.load` resamples all loaded audio to 22050 Hz and mixes it down to mono. We might not find this very useful.

So make sure that the correct sampling frequency is passed to all functions that rely on it, otherwise very strange results can occur!

STFT has many parameters, so we might also check these in detail. In the example below we can live with the defaults, just changing to the Hamming window.

**Task@Home**: How do we set up the STFT here so that we get precisely the same output as above, where we used `scipy.signal.stft`?

In [None]:
print("librosa version:", librosa.__version__)
# https://librosa.org/doc/latest/generated/librosa.feature.spectral_centroid.html?highlight=spectral%20centroid#librosa.feature.spectral_centroid

# librosa.amplitude_to_db(S, ref=np.max) normalizes the maximum to 0 dB
Xmaxdb = 0
cbticks = np.arange(Xmaxdb - dbrange, Xmaxdb + 6, 6)
cmap, norm = set_cmap(Xmaxdb, dbrange)

for i in files:
    if i[-4:] == ".wav":  # consider only wav files
        x, sr = librosa.load(folder + i, mono=False, sr=None, dtype="double")
        x = x.T
        y = x[:, ch]

        cent = librosa.feature.spectral_centroid(y=y, sr=sr, window="hamming")
        t_cent = librosa.times_like(cent, sr=sr)
        S, _ = librosa.magphase(librosa.stft(y=y, window="hamming"))
        fig, ax = plt.subplots(figsize=(6, 5), nrows=1, ncols=1)
        srf = librosa.display.specshow(
            librosa.amplitude_to_db(S, ref=np.max),
            sr=sr,
            y_axis="log",
            x_axis="time",
            ax=ax,
            cmap=cmap,
            norm=norm,
        )
        ax.plot(t_cent, cent.T, label="spectral centroid", color="white")
        cax = plt.colorbar(srf, cmap=cmap, norm=norm, ticks=cbticks)
        ax.legend(loc="upper right")
        ax.set(
            title="log power spectrogram: " + i, xlabel="t / s", ylabel="f / Hz"
        )
        cax.ax.set_xlabel("dB")
        plt.savefig(folder + i[:-4] + "_SpectralCentroid.png")

## Spectral Centroid Manually on DFT Data

In [None]:
fs, x = wavfile.read(folder + "noise.wav")
x = x[:, ch]
nsig = x.shape[0]
nfft = int(2 ** np.ceil(np.log2(nsig)))  # next power of two, hence even
X = np.fft.fft(x, n=nfft)

df = fs / nfft  # DFT frequency resolution
k = np.arange(nfft // 2 + 1)  # DFT frequency index vector for base band
f = k * df  # frequency vector from 0 Hz to fs/2
Xonesided = X[0 : nfft // 2 + 1]  # DFT spectrum from 0 Hz to fs/2

# calc spectral centroid
moment = 1
norm = np.sum(np.abs(Xonesided) ** moment)
spectral_centroid_k = np.inner(np.abs(Xonesided) ** moment, k) / norm
spectral_centroid_f = np.inner(np.abs(Xonesided) ** moment, f) / norm

print("spectral centroid is at DFT bin: ", spectral_centroid_k)
print("spectral centroid is at frequency: ", spectral_centroid_f, "Hz")
print(np.isclose(spectral_centroid_k * df, spectral_centroid_f))
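
The spectral centroid of noise is hard to verify by hand, so we might test the formula with a signal whose centroid is known. The sketch below assumes two sines at 1000 Hz and 3000 Hz with the amplitudes 1 and 3, and a signal length of exactly one second, such that both frequencies fall onto DFT bins. The helper `spectral_centroid` is hypothetical and only wraps the calculation from above.

```python
import numpy as np


def spectral_centroid(x, fs, moment=1):
    """Spectral centroid in Hz from the one-sided DFT (hypothetical helper)."""
    X = np.abs(np.fft.rfft(x)) ** moment  # weights from 0 Hz to fs/2
    f = np.fft.rfftfreq(x.shape[0], d=1 / fs)  # matching frequency vector in Hz
    return np.inner(X, f) / np.sum(X)


fs = 8000  # assumed sampling frequency in Hz
k = np.arange(fs)  # one second, so every integer frequency hits a DFT bin
x = 1 * np.sin(2 * np.pi * 1000 / fs * k) + 3 * np.sin(2 * np.pi * 3000 / fs * k)

print(spectral_centroid(x, fs, moment=1))  # magnitude weighting
print(spectral_centroid(x, fs, moment=2))  # power weighting
```

With magnitude weighting we expect (1 * 1000 + 3 * 3000) / 4 = 2500 Hz, with power weighting (1 * 1000 + 9 * 3000) / 10 = 2800 Hz. The choice of `moment` thus shifts the centroid towards the stronger spectral components.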

## Copyright

- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)
- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)
- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)
- feel free to use the notebooks for your own purposes
- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock* ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year.