Sascha Spors,
Professorship Signal Theory and Digital Signal Processing,
Institute of Communications Engineering (INT),
Faculty of Computer Science and Electrical Engineering (IEF),
University of Rostock,
Germany

# Data Driven Audio Signal Processing - A Tutorial with Computational Examples

Winter Semester 2022/23 (Master Course #24512)

- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture
- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise

Feel free to contact the lecturer: frank.schultz@uni-rostock.de

# Exercise 3: Audio Features

## Objectives

- RMS, peak value and crest factor as frequently used, very simple features
- DFT symmetry/redundancy for real valued audio signals
- DFT frequency axis scaling
- DFT graphical resolution vs. leakage effect


In [None]:
import numpy as np
import matplotlib.pyplot as plt


def get_rms_peak_crest(x):
    xPeak = np.max(np.abs(x))
    xRMS2 = x.T @ x / x.size
    xRMS = np.sqrt(xRMS2)  # root mean square/Effektivwert
    # 1**2 indicates reference of 1 -> squared:
    dBRMS = 10*np.log10(xRMS**2 / 1**2)
    # 1**2 indicates reference of 1 -> squared:
    dBPeak = 10*np.log10(xPeak**2 / 1**2)
    CrestFactor_dB = dBPeak - dBRMS
    print('xRMS**2 = ', xRMS2)
    print('xRMS = ', xRMS)
    print('xPeak = ', xPeak)
    print(dBRMS, 'dB_RMS')
    print(dBPeak, 'dBPeak')
    print('CrestFactor_dB = ', CrestFactor_dB)

## Root Mean Square (RMS), Peak Value and Crest Factor
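
For reference, the features computed by `get_rms_peak_crest()` can be written as follows for a real valued signal $x[k]$ of length $N$. This only restates the code; the symbol names ($x_\mathrm{Peak}$, $L_\mathrm{RMS}$, $\mathrm{CF}_\mathrm{dB}$) are notation assumed here, and the dB reference value is 1 as in the code.

$$x_\mathrm{Peak} = \max_k |x[k]|, \qquad x_\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{k=0}^{N-1} x^2[k]}$$

$$L_\mathrm{Peak} = 10\log_{10}\frac{x_\mathrm{Peak}^2}{1^2}, \qquad L_\mathrm{RMS} = 10\log_{10}\frac{x_\mathrm{RMS}^2}{1^2}, \qquad \mathrm{CF}_\mathrm{dB} = L_\mathrm{Peak} - L_\mathrm{RMS} = 20\log_{10}\frac{x_\mathrm{Peak}}{x_\mathrm{RMS}}$$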

In [None]:
# DFT eigensignal -> audio features from get_rms_peak_crest() match the ideal cosine values
N = 32
k = np.arange(N)
mu = 2
x = 1 * np.cos(2*np.pi/N*mu * k)
get_rms_peak_crest(x)
plt.stem(k, x, basefmt='C0:', linefmt='C0:', markerfmt='C0o')
plt.xlabel('k')
plt.ylabel('x[k]')
plt.grid()
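
The "ideal cosine values" mentioned in the cell above are RMS $= 1/\sqrt{2}$ and a crest factor of $10\log_{10}(2) \approx 3.01$ dB for unit amplitude. A minimal sketch that checks this numerically, assuming the same `N` and `mu` as above (the variable names `x_rms`, `x_peak`, `crest_db` are chosen here for illustration):

```python
import numpy as np

N = 32
k = np.arange(N)
mu = 2  # integer DFT index -> whole number of periods within the block
x = np.cos(2*np.pi/N*mu * k)

x_rms = np.sqrt(np.mean(x**2))
x_peak = np.max(np.abs(x))
crest_db = 20*np.log10(x_peak / x_rms)

print('xRMS =', x_rms, '| ideal 1/sqrt(2) =', 1/np.sqrt(2))
print('crest factor / dB =', crest_db, '| ideal 10*log10(2) =', 10*np.log10(2))
```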

In [None]:
# NOT a DFT eigensignal -> audio features from get_rms_peak_crest() do NOT match the ideal cosine values
# for the signal block itself, these feature values are correct,
# but we probably need statistical signal processing methods
# to estimate robust features, e.g. the variance of a random process
# for that see https://github.com/spatialaudio/digital-signal-processing-lecture
N = 32
k = np.arange(N)
mu = 2.15
x = 1 * np.cos(2*np.pi/N*mu * k)
get_rms_peak_crest(x)
plt.stem(k, x, basefmt='C0:', linefmt='C0:', markerfmt='C0o')
plt.xlabel('k')
plt.ylabel('x[k]')
plt.grid()
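
To get a feeling for how strongly the block-wise RMS depends on the frequency index, the following sketch scans `mu` between two neighbouring DFT eigenfrequencies. It is only an illustration under the assumption of the same `N = 32` as above; the helper `block_rms` and the grid `mu_grid` are hypothetical names introduced here.

```python
import numpy as np

N = 32
k = np.arange(N)


def block_rms(mu):
    """RMS of one block of a unit-amplitude cosine with (possibly non-integer) index mu."""
    x = np.cos(2*np.pi/N*mu * k)
    return np.sqrt(np.mean(x**2))


mu_grid = np.linspace(2, 3, 21)  # includes mu = 2, 2.15 and 3
rms_grid = np.array([block_rms(mu) for mu in mu_grid])

for mu, rms in zip(mu_grid, rms_grid):
    print('mu = %4.2f, xRMS = %6.4f, deviation from 1/sqrt(2) = %+7.4f'
          % (mu, rms, rms - 1/np.sqrt(2)))
```

Only for integer `mu` does the block contain a whole number of periods, so only there does the block RMS coincide with the ideal value.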

In [None]:
# a rectangular signal (constant magnitude |x[k]|) has RMS=Peak, and therefore 0 dB crest factor
# all signals with varying magnitude exhibit a higher crest factor, indicating RMS<Peak
N = 16
k = np.arange(2*N)
mu = 1
x = 1 * np.sin(2*np.pi/N*mu * k)
x[x >= 0] = 1
x[x < 0] = -1
get_rms_peak_crest(x)
plt.stem(k, x, basefmt='C0:', linefmt='C0:', markerfmt='C0o')
plt.xlabel('k')
plt.ylabel('x[k]')
plt.grid()
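
A crest factor of 0 dB requires that every sample has the same magnitude, since only then the mean of $x^2[k]$ equals the squared peak. The sketch below illustrates this with three test signals; `crest_factor_db` is a hypothetical helper written for this illustration, not part of the exercise code.

```python
import numpy as np


def crest_factor_db(x):
    """Crest factor in dB (hypothetical helper for this illustration)."""
    x = np.asarray(x, dtype=float)
    peak = np.max(np.abs(x))
    rms = np.sqrt(np.mean(x**2))
    return 20*np.log10(peak / rms)


rng = np.random.default_rng(1234)
x_dc = 0.5 * np.ones(32)                     # constant signal
x_pm = rng.choice([-1.0, 1.0], size=32)      # random +-1 sequence
x_sin = np.sin(2*np.pi/32 * np.arange(32))   # one sine period, varying magnitude

cf_dc = crest_factor_db(x_dc)
cf_pm = crest_factor_db(x_pm)
cf_sin = crest_factor_db(x_sin)
print('DC:', cf_dc, 'dB | random +-1:', cf_pm, 'dB | sine:', cf_sin, 'dB')
```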

In [None]:
# noise signal -> amplitude values are drawn from a normal distribution
# in this example we use standard normal with mean=0 and standard deviation=1
N = 2**11
k = np.arange(N)
rng = np.random.default_rng(1234)
mean, stdev = 0, 1
x = rng.normal(mean, stdev, N)
get_rms_peak_crest(x)
#plt.stem(k,x, basefmt='C0:', linefmt='C0:', markerfmt='C0o')
plt.plot(k, x, 'C0o:', ms=3)
plt.xlabel('k')
plt.ylabel('x[k]')
plt.grid()
# for this type of zero-mean ('mean free') noise the crest factor is typically
# in the range of 11-13 dB
# the actual value depends strongly on the highest peak
# that occurs in the signal (note: high signal amplitudes occur
# very rarely in the Gaussian probability density function)
# rather than on the more robust RMS estimate
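
The remark that the peak value is less robust than the RMS estimate can be checked by repeating the experiment for many noise realizations. This is a rough sketch; the number of realizations `n_trials` is an arbitrary choice made here.

```python
import numpy as np

N = 2**11
n_trials = 200  # assumption: number of independent noise realizations
rng = np.random.default_rng(1234)
x = rng.normal(0, 1, (n_trials, N))  # one realization per row

rms_db = 20*np.log10(np.sqrt(np.mean(x**2, axis=1)))
peak_db = 20*np.log10(np.max(np.abs(x), axis=1))
crest_db = peak_db - rms_db

print('std of RMS level / dB  =', np.std(rms_db))
print('std of peak level / dB =', np.std(peak_db))
print('crest factor / dB: min, median, max =',
      np.min(crest_db), np.median(crest_db), np.max(crest_db))
```

The spread of the peak level across realizations is expected to be clearly larger than the spread of the RMS level, which is why the crest factor of such noise varies from block to block.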

In [None]:
# in statistical signal processing xRMS**2 is known as quadratic mean, cf.
# https://github.com/spatialaudio/digital-signal-processing-exercises/blob/master/random_signals/ensemble_averages.ipynb
# in the special case of mean=0, xRMS**2 is equal to the (biased) variance (estimator)
# so we create a mean free signal
x1 = x - np.mean(x)
# and check
get_rms_peak_crest(x1)
print('var = ', np.var(x1, ddof=0))
print('only here xRMS**2 == var')
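
For a signal that is not mean free, the quadratic mean splits into the (biased) variance plus the squared linear mean. A short sketch of this relation, using a noise signal with a deliberately non-zero mean of 0.5 (a value assumed here for illustration):

```python
import numpy as np

rng = np.random.default_rng(1234)
x = rng.normal(0.5, 1, 2**11)  # non-zero mean on purpose

quad_mean = np.mean(x**2)      # xRMS**2
lin_mean = np.mean(x)
var = np.var(x, ddof=0)        # biased variance estimator

print('xRMS**2       =', quad_mean)
print('var + mean**2 =', var + lin_mean**2)
print('var           =', var)
```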

## DFT Symmetry for Real Signals

In [None]:
def dft_sym_plot():
    if np.mod(N, 2) == 0:
        print('even N = ', N)
    else:
        print('odd N = ', N)
    print(N//2+1, 'unique DFT bins')
    k = np.arange(N)
    mu = k
    mu_base = np.arange(N//2+1)
    rng = np.random.default_rng(1)
    mean, stdev = 0, 1
    x = rng.normal(mean, stdev, N)
    X = np.fft.fft(x)/N

    # power is equally stored in base band and mirrored spectrum
    X_mag_base = 2*X[0:N//2+1]
    # mean has only one bin, so magnitude was already correct
    X_mag_base[0] /= 2
    if np.mod(N, 2) == 0:  # for even DFT sizes, fs/2 frequency has only one bin
        X_mag_base[N//2] /= 2  # so magnitude was already correct there also

    plt.figure(figsize=(12, 4))
    plt.subplot(1, 2, 1)
    plt.stem(mu, np.abs(X), basefmt='C0:', linefmt='C0:',
             markerfmt='C0o', label='full DFT')
    plt.stem(mu[0:N//2+1], np.abs(X[0:N//2+1]), basefmt='C0:', linefmt='C0:',
             markerfmt='C1o', label='base band for real valued signal')
    plt.plot([N/2, N/2], [0, 0.5], 'C7', label='axial symmetry line')
    plt.xlim(-1, N)
    plt.ylim(0, 0.5)
    plt.xlabel(r'$\mu$')
    plt.ylabel(r'$|X[\mu]| / N$')
    plt.legend(loc='upper right')
    plt.grid()

    plt.subplot(1, 2, 2)
    plt.stem(mu_base, np.abs(X_mag_base), basefmt='C3:',
             linefmt='C3:', markerfmt='C3o')
    plt.xlim(-1, N)
    plt.ylim(0, 0.5)
    plt.xlabel(r'$\mu$')
    plt.ylabel(r'base band magnitude')
    plt.grid()

In [None]:
N = 2**4
dft_sym_plot()
# the DFT frequency index mu = N/2 = 8 belongs to half of the sampling frequency
# this also corresponds to DTFT frequency pi

In [None]:
N = 2**4+1
dft_sym_plot()
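
The symmetry seen in the plots is the conjugate symmetry $X[N-\mu] = X^*[\mu]$ of the DFT of a real valued signal. The sketch below checks it numerically for an even and an odd DFT size and compares with `np.fft.rfft`, which returns only the `N//2+1` unique bins. The function `check_real_dft_symmetry` is a hypothetical helper written for this check.

```python
import numpy as np


def check_real_dft_symmetry(N, seed=1):
    """Return (conjugate symmetric?, number of rfft bins, rfft equals base band?)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0, 1, N)
    X = np.fft.fft(x)
    mu = np.arange(1, N)
    is_symmetric = np.allclose(X[N - mu], np.conj(X[mu]))
    X_half = np.fft.rfft(x)  # only the unique bins for real input
    matches_fft = np.allclose(X_half, X[:N//2 + 1])
    return is_symmetric, X_half.size, matches_fft


for N in (2**4, 2**4 + 1):
    print('N =', N, '->', check_real_dft_symmetry(N))
```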

## DFT Frequency Axis Lin/Log and Leakage Effect Visualization

In [None]:
fs = 48000  # Hz, typical audio sampling freq
N = fs
k = np.arange(N)
df = fs / N
print('df', df)
mu = np.arange(N)  # DFT frequency index
Om_mu = 2*np.pi/N * mu  # DTFT frequencies
f = df * mu  # physical frequency
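
NumPy also provides ready-made frequency axes. As a sketch, the following compares the manual vector `f = df * mu` with `np.fft.fftfreq` (which labels the upper half of the bins with negative frequencies) and `np.fft.rfftfreq` (base band from 0 to fs/2 only). The names `f_np` and `f_r` are chosen here.

```python
import numpy as np

fs = 48000  # Hz
N = fs
f = fs / N * np.arange(N)            # manual axis, 0 ... fs - df

f_np = np.fft.fftfreq(N, d=1/fs)     # 0 ... fs/2 - df, then -fs/2 ... -df
f_r = np.fft.rfftfreq(N, d=1/fs)     # 0 ... fs/2, N//2+1 values

print('fftfreq first/last:', f_np[0], f_np[-1])
print('rfftfreq first/last:', f_r[0], f_r[-1])
# the negative frequencies are the same bins shifted by fs
print('same bins modulo fs:', np.allclose(np.mod(f_np, fs), f))
```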

In [None]:
fsine = 1000.5
x = np.cos(2*np.pi*fsine/fs * k)
X = np.fft.fft(x)

In [None]:
plt.figure(figsize=(10, 6))
plt.subplot(2, 1, 1)
# sloppy version of 2/N scaling!, see correct base band magnitude handling for DC, fs/2 above
plt.plot(f, 10*np.log10(np.abs(2/N*X)**2))
# for real valued signals we typically plot only up to half of the sampling frequency
plt.xlim(0, fs/2)
plt.xlabel('f / Hz')
plt.ylabel('dB (relative to sine magnitude)')
plt.grid()

plt.subplot(2, 1, 2)
# sloppy version of 2/N scaling!, see correct base band magnitude handling for DC, fs/2 above
plt.semilogx(f, 10*np.log10(np.abs(2/N*X)**2))
plt.xlim(1, fs/2)
plt.xlabel('f / Hz')
plt.ylabel('dB (relative to sine magnitude)')
plt.grid()

We actually see the leakage effect of the rectangular window for the worst case, i.e. the signal frequency lies exactly midway between two DFT eigenfrequencies.
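
In this worst case the largest DFT bin no longer reaches the sine amplitude: for the rectangular window the level drops to about $20\log_{10}(2/\pi) \approx -3.9$ dB. A minimal sketch that compares an on-bin frequency with the half-bin case used above; the helper `peak_level_db` is a name introduced here, and `np.fft.rfft` is used instead of the full FFT for brevity.

```python
import numpy as np

fs = 48000
N = fs
k = np.arange(N)


def peak_level_db(fsine):
    """Level of the largest base band DFT bin in dB relative to a unit sine."""
    x = np.cos(2*np.pi*fsine/fs * k)
    X = np.fft.rfft(x)
    return 20*np.log10(np.max(np.abs(2/N*X)))


on_bin = peak_level_db(1000.0)    # exactly on a DFT eigenfrequency
half_bin = peak_level_db(1000.5)  # midway between two eigenfrequencies

print('on-bin peak level / dB   =', on_bin)
print('half-bin peak level / dB =', half_bin)
print('20*log10(2/pi) / dB      =', 20*np.log10(2/np.pi))
```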

Due to the high DFT resolution and the insufficient graphical resolution, the zeros of the spectrum (leading to $-\infty$ dB), which here lie between the DFT bins, are not seen in the plot.

To overcome this, we could perform
- either a DFT->DTFT interpolation (the neat academic way!)
- or zeropadding, which also increases the graphical resolution of the spectrum. This is basically a numerical DFT->DTFT interpolation, where we should know well what we are doing. Zeropadding does **not** add information to the spectrum, it is all in the original DFT data!

We might want to check https://github.com/spatialaudio/digital-signal-processing-exercises/blob/master/dft/dft_to_dtft_interpolation.ipynb

So, let us perform zeropadding.

We should pay attention that the frequency vector needs recalculation, but the 2/N scaling is the same as above, since we did not add new spectral information and thus added no power.

In [None]:
z = np.zeros(2**16)
xz = np.append(x, z)
Xz = np.fft.fft(xz)

Nz = xz.size
dfz = fs / Nz
print('dfz', dfz)
muz = np.arange(Nz)  # DFT frequency index
Om_muz = 2*np.pi/Nz * muz  # DTFT frequencies
fz = dfz * muz  # physical frequency

plt.figure(figsize=(10, 6))
plt.subplot(2, 1, 1)
# sloppy version of 2/N scaling!, see correct base band magnitude handling for DC, fs/2 above
plt.plot(fz, 10*np.log10(np.abs(2/N*Xz)**2))
# for real valued signals we typically plot only up to half of the sampling frequency
plt.xlim(0, fs/2)
plt.xlabel('f / Hz')
plt.ylabel('dB (relative to sine magnitude)')
plt.grid()

plt.subplot(2, 1, 2)
# sloppy version of 2/N scaling!, see correct base band magnitude handling for DC, fs/2 above
plt.semilogx(fz, 10*np.log10(np.abs(2/N*Xz)**2))
plt.xlim(1, fs/2)
plt.xlabel('f / Hz')
plt.ylabel('dB (relative to sine magnitude)')
plt.grid()
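
That zeropadding only interpolates the spectrum can be made explicit with a small example. The sketch below assumes a short block of `N = 16` samples and an integer padding factor `pad = 4` (both chosen here for readability, not taken from the cell above). It shows that appending zeros is equivalent to the `n` argument of `np.fft.fft`, that every `pad`-th bin of the zeropadded DFT equals the original DFT, and that the original samples are recovered from the zeropadded spectrum.

```python
import numpy as np

N = 16
pad = 4  # assumption: zeropadding to pad*N samples
k = np.arange(N)
x = np.cos(2*np.pi/N*2.5 * k)  # frequency midway between two DFT bins

X = np.fft.fft(x)

xz = np.append(x, np.zeros((pad - 1)*N))
Xz = np.fft.fft(xz)            # explicit zeropadding
Xz_n = np.fft.fft(x, n=pad*N)  # same result via the n argument

same_result = np.allclose(Xz, Xz_n)
original_bins_kept = np.allclose(Xz[::pad], X)
x_back = np.fft.ifft(Xz)[:N].real

print('append zeros == fft(x, n):', same_result)
print('every pad-th bin == original DFT:', original_bins_kept)
print('original samples recovered:', np.allclose(x_back, x))
```

The additional bins lie on the DTFT of the windowed signal between the original DFT bins, so they refine the picture without carrying new information.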

## Copyright

- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)
- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)
- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)
- feel free to use the notebooks for your own purposes
- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock* ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year.