# 1. Signals & Sounds, and the Uncertainty Principle

In this first module, we get familiar with the tools and perform some time-frequency analysis. We will _hear_ the Uncertainty "Principle" at work.

In [1]:
# standard python libs for math, signal processing and plotting
import numpy as np
import scipy.signal as ss
import matplotlib.pyplot as plt
from math import log10

# python libs for the notebook interaction
from ipywidgets import interact
import ipywidgets as widgets

# custom modules, to be installed by hand on the anaconda prompt
# with `pip install tftb simpleaudio`
import simpleaudio as sa   # generates sounds
from tftb.processing import Spectrogram   # computes a time-frequency "map" of a signal

## The Fourier Transform

### Theory recap

The Fourier Transform $S(f)$ of a time-dependent signal $s(t)$ is defined as:

$$S(f) = \int_{-\infty}^{+\infty}s(t)e^{-\imath 2\pi ft}dt$$

The Fourier Transform operator $F$ is a linear operator with several interesting properties. Let's recall a few.

1) Transforming a monochromatic signal of frequency $f_0$ (including $f_0 = 0$, i.e. a constant function) yields a Dirac delta distribution centered at $f_0$, and conversely transforming a Dirac delta yields a constant:

$$s(t) = e^{\imath 2\pi f_0 t} \Rightarrow S(f) = \delta(f - f_0)$$

$$s(t) = \delta(t) \Rightarrow S(f) = 1$$
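A discrete analogue of this property can be checked with NumPy's FFT. The sketch below is only illustrative: the sampling rate, the number of samples and the tone frequency are arbitrary choices, with the tone placed exactly on a DFT bin so that no leakage occurs.

```python
import numpy as np

fs = 1000          # sampling rate in Hz (assumed for this example)
n = 1000           # number of samples, i.e. one second of signal
f0 = 50            # tone frequency in Hz, chosen to fall exactly on a DFT bin
t = np.arange(n) / fs

# complex monochromatic signal and its normalized discrete spectrum
s = np.exp(2j * np.pi * f0 * t)
S = np.fft.fft(s) / n
freqs = np.fft.fftfreq(n, 1 / fs)

# all the energy sits in a single bin, the discrete counterpart of a delta
peak_freq = freqs[np.argmax(np.abs(S))]
leakage = np.sum(np.abs(S) ** 2) - np.max(np.abs(S)) ** 2

# a discrete impulse at t = 0 transforms to a constant
delta = np.zeros(n)
delta[0] = 1.0
D = np.fft.fft(delta)
```

Here `peak_freq` is the frequency of the only non-negligible bin, `leakage` measures the energy found elsewhere, and `D` is flat across all frequencies.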

2) Transforming a Gaussian signal yields another Gaussian, i.e. the Gaussian shape is preserved by the Fourier operator (for $\sigma^2 = 1/2\pi$ the Gaussian is an exact _eigenfunction_):

$$s(t) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-t^2 / 2\sigma^2} \Rightarrow S(f) = e^{-2\pi^2\sigma^2 f^2}$$

Note how the transformed Gaussian has a standard deviation proportional to $\sigma^{-1}$, namely $\sigma_f = 1/(2\pi\sigma)$.

More generally, the _Hermite functions_ form an orthonormal basis of eigenfunctions of the Fourier operator. Remember the solutions of the Harmonic Oscillator?
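The Gaussian pair can be verified numerically by approximating the Fourier integral with a Riemann sum. Because the Gaussian above has unit area, its transform equals 1 at $f = 0$ and should follow $e^{-2\pi^2\sigma^2 f^2}$. The value of `sigma`, the integration step and the frequency grid are assumptions made for this sketch only.

```python
import numpy as np

sigma = 0.01                      # time-domain standard deviation in seconds (assumed)
dt = 1e-4                         # integration step in seconds
t = np.arange(-0.2, 0.2, dt)      # spans +/- 20 sigma, so the tails are negligible
s = np.exp(-t**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

f = np.linspace(-60, 60, 121)     # test frequencies in Hz
# Riemann-sum approximation of the Fourier integral at each test frequency
S_num = np.array([np.sum(s * np.exp(-2j * np.pi * fk * t)) * dt for fk in f])
S_ref = np.exp(-2 * np.pi**2 * sigma**2 * f**2)

max_err = np.max(np.abs(S_num - S_ref))
sigma_f = 1 / (2 * np.pi * sigma)  # standard deviation of S(f) in Hz
```

With these values `sigma_f` is about 15.9 Hz: a Gaussian lasting a few tens of milliseconds occupies a few tens of hertz.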

3) The _convolution_ of two signals gets transformed to the product of their transforms, and vice versa (the same holds for the _cross-correlation_, up to a complex conjugation of one of the two transforms):

$$c(t) = (s_1 \ast s_2)(t) = \int_{-\infty}^{+\infty}s_1(\tau) \cdot s_2(t-\tau)d\tau \Rightarrow C(f) = S_1(f) \cdot S_2(f)$$
$$p(t) = s_1(t) \cdot s_2(t) \Rightarrow P(f) = (S_1 \ast S_2)(f)$$
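The convolution theorem has a direct discrete counterpart, which the following sketch checks on two random test signals (the signals and their length are arbitrary assumptions). The FFT implements a circular convolution, so both inputs are zero-padded to the full output length to recover the linear one.

```python
import numpy as np

rng = np.random.default_rng(0)
s1 = rng.standard_normal(64)
s2 = rng.standard_normal(64)

# linear convolution computed directly in the time domain
c_direct = np.convolve(s1, s2)

# same result through the frequency domain; zero-padding to the full
# output length avoids the wrap-around of circular convolution
n = s1.size + s2.size - 1
c_fft = np.fft.ifft(np.fft.fft(s1, n) * np.fft.fft(s2, n)).real
```

The two arrays `c_direct` and `c_fft` agree up to floating-point rounding.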

3.1) It follows that the _autocorrelation_ of a signal, i.e. the convolution of $s(t)$ with $\overline{s(-t)}$, is transformed to the _power spectrum_ of that signal:

$$r(t) = \int_{-\infty}^{+\infty}s(\tau)\cdot \overline{s(\tau-t)}d\tau \Rightarrow R(f) = S(f) \cdot \overline{S(f)} = |S(f)|^2$$
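This relation between autocorrelation and power spectrum can be checked on a sampled signal. The sketch below uses the circular (periodic) autocorrelation with the lag convention $r[k] = \sum_m s[m]\,\overline{s[m-k]}$, which is the form that matches the DFT exactly; the random complex test signal is an assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 128
s = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# circular autocorrelation r[k] = sum_m s[m] * conj(s[m - k]), indices modulo n
# (np.roll(s, k)[m] is s[m - k])
r = np.array([np.sum(s * np.conj(np.roll(s, k))) for k in range(n)])

R = np.fft.fft(r)                    # transform of the autocorrelation
power = np.abs(np.fft.fft(s)) ** 2   # power spectrum of the signal
```

`R` comes out real and non-negative (up to rounding) and coincides with `power`.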

3.2) It also follows that the Dirac delta is the identity element for the $\ast$ operator. Therefore, a monochromatic signal enveloped by a *window* signal gets transformed to the Fourier transform of the window, shifted by the signal frequency:

$$s(t) = w(t) \cdot e^{\imath 2\pi f_{0}t} \Rightarrow S(f) = W(f) \ast \delta(f - f_0) = W(f - f_0)$$
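In the discrete setting the same shift shows up as a rotation of the DFT bins. The following sketch assumes a Hann window and a tone placed exactly on bin `k0`; both are illustrative choices.

```python
import numpy as np

n = 1024
k0 = 100                               # tone frequency expressed as a DFT bin index
m = np.arange(n)
w = np.hanning(n)                      # window w(t)
s = w * np.exp(2j * np.pi * k0 * m / n)

W = np.fft.fft(w)                      # spectrum of the window alone
S = np.fft.fft(s)                      # spectrum of the windowed tone

# modulation by the tone shifts the window spectrum by k0 bins
shifted_W = np.roll(W, k0)
```

`S` equals `shifted_W`: the spectrum of the windowed tone is the spectrum of the window, re-centered on the tone frequency.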

4) Given $\tilde{s}(t) := s(-t)$, the following applies:

$$F^2(s) = \tilde{s}, \quad F^3(S) = s, \quad F^4(s) = s$$

That is, $F^3$ is equivalent to the Inverse Fourier Transform. The latter can be directly defined as:

$$s(t) = \int_{-\infty}^{+\infty}S(f)e^{\imath 2\pi ft}df$$
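These iterated-transform identities also hold for the unitary DFT, which makes them easy to test. In the sketch below the helper `F` is a name introduced for illustration only, and time reversal is understood modulo the signal length.

```python
import numpy as np

def F(x):
    """Unitary discrete Fourier transform (hypothetical helper for this example)."""
    return np.fft.fft(x, norm="ortho")

rng = np.random.default_rng(2)
s = rng.standard_normal(16)

F2 = F(F(s))                        # two applications reverse the time axis
F4 = F(F(F2))                       # four applications give back the signal
reversed_s = np.roll(s[::-1], 1)    # s[-m] with indices taken modulo the length

S = F(s)
F3_of_S = F(F(F(S)))                # three applications undo one transform
inverse_of_S = np.fft.ifft(S, norm="ortho")
```

`F2` matches `reversed_s`, `F4` matches `s`, and `F3_of_S` matches the inverse transform `inverse_of_S`.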


### Testing live

Let's explore some of those properties live.

The code below defines a function to plot a given signal both in the time domain and in the frequency domain. The latter is a log/log plot of the _power spectrum_ of the signal, as this is the closest picture of our ears' perception.

The signal samples are stored as *numpy* arrays.

In [2]:
samples = 65536
def plottimefreq(s, duration, small=False):
    # plot the time series of the signal s
    # create an empty figure; the panels are added one by one with plt.subplot
    plt.figure(figsize=(25, 5))
    ax = plt.subplot(1, 3 - int(small), 1)
    plt.plot(np.real(s))
    plt.xlim(0)
    ax.set_xlabel('t [ms]')
    maxx = int(duration*10 + 1)*100
    ax.set_xticks(np.arange(0, maxx*fs/1000, maxx*fs/10000, dtype=int))
    ax.set_xticklabels(np.arange(0, maxx, maxx/10, dtype=int))
    plt.grid(which='major')
    plt.title("Wave packet")

    # also compute and plot power spectrum
    ax = plt.subplot(1, 3 - int(small), 2)
    s = np.pad(s, (0, samples-s.size), mode='constant')
    W = np.abs(np.fft.fft(s) ** 2)
    f = np.fft.fftfreq(s.size, 1/fs)
    plt.plot(f, W)
    plt.xlim(20, 10000)
    #formatter = LogFormatter(labelOnlyBase=False, minor_thresholds=(1, 0.1))
    #ax.get_xaxis().set_minor_formatter(formatter)
    ax.set_xlabel('f [Hz]')
    plt.ylim(1E-4)
    plt.xscale('log')
    plt.yscale('log')
    plt.grid(which='both')
    plt.title("Power spectrum (log/log)")
    plt.show()

The interactive code below creates simple audio signals, plays them, and shows the plots produced by the function above.

In [3]:
fs = 44100
@interact(f=widgets.FloatLogSlider(min=log10(20), max=log10(20000), continuous_update=False),
          duration=widgets.FloatSlider(min=0.01, max=1.48, value=0.6, step=0.01, continuous_update=False),
          play=widgets.Checkbox(description='play')
         )
def playwavelet(f, duration, play):
    t = np.linspace(0, duration, int(duration * fs), False)

    # generate the fundamental wave
    s = np.sin(2 * np.pi * f * t)

    # play it
    if play:
        playable = s * (2**15 - 1) / np.max(np.abs(s))
        # stop any ongoing play
        #sa.stop_all()
        # convert to 16-bit data and play
        sa.play_buffer(playable.astype(np.int16), 1, 2, fs)        
        
    plottimefreq(s, duration)


### The Uncertainty Principle in action

Consider the second moment of a square-integrable function $s$, normalized so that $\int|s(x)|^2dx = 1$ and centered at zero:

$$\sigma^2(s) = \int_{-\infty}^{+\infty}x^2 |s(x)|^2dx$$

Evaluating it both in the time domain and in the frequency domain, and using the Cauchy-Schwarz inequality, it can be proven that:

$$\sigma^2(s) \cdot \sigma^2(S) \ge \frac{1}{16\pi^2}$$

The equality holds for a Gaussian:

$$\sigma_t \cdot \sigma_f = \frac{1}{4\pi}$$

where $\sigma_t = \sqrt{\sigma^2(s)}$ is the time spread of a Gaussian signal and $\sigma_f = \sqrt{\sigma^2(S)}$ is the frequency spread of its Fourier transform.
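The bound can be explored numerically by estimating both spreads from sampled signals. The helper `spread_product` below is a hypothetical name introduced for this sketch; it normalizes $|s|^2$ and $|S|^2$ to unit total and removes their means before taking second moments. The time step, the pulse widths and the choice of a Hann-shaped pulse as a comparison are assumptions of the example.

```python
import numpy as np

def spread_product(s, dt):
    """Return sigma_t * sigma_f for a sampled signal s with time step dt."""
    n = s.size
    t = (np.arange(n) - n // 2) * dt
    f = np.fft.fftfreq(n, dt)
    p_t = np.abs(s) ** 2
    p_f = np.abs(np.fft.fft(s)) ** 2
    p_t = p_t / p_t.sum()          # normalize both densities to unit total
    p_f = p_f / p_f.sum()
    mean_t = np.sum(t * p_t)
    mean_f = np.sum(f * p_f)
    sigma_t = np.sqrt(np.sum((t - mean_t) ** 2 * p_t))
    sigma_f = np.sqrt(np.sum((f - mean_f) ** 2 * p_f))
    return sigma_t * sigma_f

dt = 1e-3                                   # time step in seconds
n = 4096
t = (np.arange(n) - n // 2) * dt
gauss = np.exp(-t**2 / (2 * 0.05**2))       # Gaussian pulse
hann = np.zeros(n)
hann[n // 2 - 256 : n // 2 + 256] = np.hanning(512)   # Hann-shaped pulse

bound = 1 / (4 * np.pi)
gauss_product = spread_product(gauss, dt)
hann_product = spread_product(hann, dt)
```

The Gaussian pulse sits on the bound, while the Hann-shaped pulse lands a few percent above it.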

Compare this with the *Heisenberg Uncertainty Principle*:

$$\sigma_x \cdot \sigma_p \ge \frac{h}{4\pi} = \frac{\hbar}{2}$$

In the following, we plot a single-frequency wavelet, its power spectrum and the "Short-Time Fourier Transform" in the time-frequency plane, and compare their time vs. frequency spreads:

* by varying the duration of the wavelet
* by varying the enveloping or windowing signal (*rectangular* i.e. no window, *Hann*, *Hamming*, or *Gaussian*)
* by varying the fraction of the signal that is smoothed by the window

We can "see" how a short time spread yields a large frequency spread, and we can "hear" how the sound progressively becomes a *tic* with no clear pitch!
Also, note how the absence of a windowing signal produces a *click* at the beginning and at the end of the signal, whereas the smoothest sound comes with the Gaussian windowing.

In [4]:
fs = 44100
@interact(f=widgets.FloatLogSlider(min=log10(20), max=log10(20000), continuous_update=False),
          duration=widgets.FloatSlider(min=0.01, max=1.48, value=0.6, step=0.01, continuous_update=False),
          window=widgets.RadioButtons(options=['rect', 'hann', 'hamming', 'gaussian']),
          gauss_stdev=widgets.FloatSlider(min=100, max=8000, value=5000, continuous_update=False),
          win_frac=widgets.FloatSlider(min=0.01, max=1.00, value=1.00, step=0.01, continuous_update=False),
          play=widgets.Checkbox(description='play')
         )
def playwavelet(f, duration, window, gauss_stdev, win_frac, play):
    t = np.linspace(0, duration, int(duration * fs), False)

    # generate the fundamental wave
    s = np.sin(2 * np.pi * f * t)
    # use a window to smooth begin and end
    if window == 'hann':
        w = np.hanning(int(s.size * win_frac))
    elif window == 'hamming':
        w = np.hamming(int(s.size * win_frac))
    elif window == 'gaussian':
        # window functions live in scipy.signal.windows in current SciPy releases
        w = ss.windows.gaussian(int(s.size * win_frac), duration*win_frac*gauss_stdev)
    if window != 'rect':
        # apply the window at the ramp up and ramp down of the signal
        for i in range(int(w.size/2)):
            s[i] *= w[i]
            s[s.size-int(w.size/2)+i] *= w[int(w.size/2)+i]

    # play it
    if play:
        playable = s * (2**15 - 1) / np.max(np.abs(s))
        # stop any ongoing play
        #sa.stop_all()
        # convert to 16-bit data and play
        sa.play_buffer(playable.astype(np.int16), 1, 2, fs)        
        
    plottimefreq(s, duration)


### Time-Frequency Analysis

Time-frequency analyses are particularly suitable for fast-changing signals (e.g. chirps), where the spectral analysis over the full time range does not accurately represent the perceived effect. In such cases, the energy distribution over the $t$-$f$ plane accounts for the time-varying frequency content of the signal.

Such energy distribution can be obtained for instance with the *spectrogram*, defined from the *Short-Time Fourier Transform*:

$$ S_{st}(\tau, f) = \int_{-\infty}^{+\infty}s(t)w(t-\tau)e^{-\imath 2\pi ft}dt$$

Where $w(t)$ is a windowing function, typically Hamming or Gaussian. The spectrogram is defined as $|S_{st}(\tau, f)|^2$.
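A bare-bones discretization of this definition slides a window along the signal and takes one FFT per position. The sketch below is not the `tftb` implementation: `stft_spectrogram`, the chirp test signal, the window length and the hop size are all assumptions made for illustration. Each frame uses time relative to its own start, which changes the phase of the STFT but not the spectrogram $|S_{st}|^2$.

```python
import numpy as np

def stft_spectrogram(s, w, hop):
    """Return |S_st|^2 with one row per window position and one column per frequency bin."""
    frames = np.array([s[i:i + w.size] * w
                       for i in range(0, s.size - w.size + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

fs = 8000                                   # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                      # one second of signal
# linear chirp with instantaneous frequency 500 + 1500 * t, in Hz
s = np.sin(2 * np.pi * (500 * t + 750 * t**2))

w = np.hamming(256)                         # analysis window w(t)
spec = stft_spectrogram(s, w, hop=128)
freqs = np.fft.rfftfreq(w.size, 1 / fs)
ridge = freqs[np.argmax(spec, axis=1)]      # dominant frequency in each frame
```

The `ridge` array follows the chirp from roughly 500 Hz up to roughly 2000 Hz, something the power spectrum of the whole signal cannot show. A shorter window would track the sweep more finely in time at the price of coarser frequency bins.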

The Uncertainty Principle can be seen at play in the time-frequency plane, as the "area" occupied by a signal cannot be arbitrarily small: a signal can either be localized over the $t$ axis or over the $f$ axis, not both. Again, the maximum localization along both axes, i.e. the minimal area, is achieved with a Gaussian signal.

In [5]:
@interact(f=widgets.FloatLogSlider(min=log10(100), max=log10(1000), continuous_update=False),
          durationms=widgets.IntSlider(min=1, max=500, value=100, step=1, continuous_update=False),
          window=widgets.RadioButtons(options=['rect', 'hann', 'hamming', 'gaussian']),
          win_frac=widgets.FloatSlider(min=0.01, max=1.00, value=1.00, step=0.01, continuous_update=False),
         )
def playwavelet(f, durationms, window, win_frac):
    t = np.linspace(0, durationms/1000, int(durationms/1000 * fs), False)

    # generate the fundamental wave
    s = 1000 * np.sin(2 * np.pi * f * t)
    # use a window to smooth begin and end
    if window == 'hann':
        w = np.hanning(int(s.size * win_frac))
    elif window == 'hamming':
        w = np.hamming(int(s.size * win_frac))
    elif window == 'gaussian':
        # window functions live in scipy.signal.windows in current SciPy releases
        w = ss.windows.gaussian(int(s.size * win_frac), durationms*win_frac*4)
    if window != 'rect':
        # apply the window at the ramp up and ramp down of the signal
        for i in range(int(w.size/2)):
            s[i] *= w[i]
            s[s.size-int(w.size/2)+i] *= w[int(w.size/2)+i]
    
    # pad the signal, resample to speed up computation, and compute the spectrogram
    # the window function is 200 times shorter than the input signal to accurately "explore" it
    padding = int((samples/2-s.size)/2)
    sp = Spectrogram(ss.resample(np.pad(s, (padding, padding), mode='constant'), 2048), 
                     fwindow=np.hamming(int(s.size/200)+1))   # ss.gaussian(int(s.size/200)+1, 3) as an alternative
    sp.run()
    sp.plot(kind='contour', scale='log', threshold=0.01)

    # add the usual time domain and frequency domain plots for reference
    plottimefreq(s, durationms/1000, small=True)

