<div align="right"><i>COM418 - Computers and Music</i></div>
<div align="right"><a href="https://people.epfl.ch/paolo.prandoni">Paolo Prandoni</a>, <a href="https://www.epfl.ch/labs/lcav/">LCAV, EPFL</a></div>

<p style="font-size: 30pt; font-weight: bold; color: #B51F1F;">From vintage videogames to one-bit audio and sigma-delta</p>

In [None]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as sp
from scipy.io import wavfile
import IPython

In [None]:
plt.rcParams["figure.figsize"] = (14,4)

In [None]:
def stem(x, color='tab:blue', **kwargs):
    # stem plot with a chosen color for markers, stems and baseline
    # (use_line_collection was deprecated in Matplotlib 3.6 and later removed; line collections are the default)
    markerline, stemlines, baseline = plt.stem(x, basefmt=color, **kwargs)
    markerline.set_color(color)
    stemlines.set_color(color)

import ipywidgets as widgets
from IPython.display import display

def play(clip, rate):
    display(IPython.display.Audio(clip, rate=rate))

def multiplay(clips, rate, title=None):
    # display several audio players side by side; rate can be a single value or one value per clip
    outs = [widgets.Output() for c in clips]
    rate = [rate] * len(clips) if isinstance(rate, (int, float)) else rate
    for ix, clip in enumerate(clips):
        with outs[ix]:
            print(title[ix] if title is not None else "")
            display(IPython.display.Audio(clip, rate=rate[ix]))
    return widgets.HBox(outs)

In [None]:
def quantize(x, M):
    # quantize x (assumed in [-1, 1]) to M levels; M == 0 means no quantization
    if M == 0:
        return x
    elif M % 2 == 0:
        # using a mid-riser quantizer
        M = M / 2
        k = np.floor(x * M)
        k = np.maximum(np.minimum(k, M-1), -M)
        return (k + 0.5) / M
    else:
        # using a deadzone quantizer
        k = np.round(np.abs(x) * M / 2)
        k = np.minimum((M - 1) / 2, k)
        return (np.sign(x) * k / M * 2 )

# One-Bit Audio

Why encode audio at one bit per sample?
 * ADC cheaper since quantizer only has 1 threshold 
 * DAC much cheaper, since it's just a simple lowpass filter (many DACs in smartphones, tablets and PCs use conversion to 1-bit)
 * easy to transmit since no framing required
 
but also...
 * sometimes two-level audio is all you have!

# Early video games

<img src="img/pong.jpg" alt="pong" style="float: right; width: 200px; margin: 00px 30px;"/>

 * early arcade video games had no sound 
 * Pong, in 1972, added some simple beeps when the ball hit the walls
 * the first game with continuous sound was Space Invaders (1978)

In the early days, all developers had was square waves.

## Square Waves

The cheapest way to create sound is to drive a loudspeaker directly 
with an I/O data line of the CPU, i.e., to play square waves.

This was still the case in some popular home computers in the 1980s.

![spectrum](img/spectrum.png)

### Pitched square waves

First approach: threshold a sine wave

$$
    x[n] = \begin{cases} 
        +1 & \mbox{if $\sin(\omega_0 n) \ge 0$} \\ 
        -1 & \mbox{if $\sin(\omega_0 n) \lt 0$} 
        \end{cases} 
$$ 

In [None]:
def square_wave_naive(w, N):
    return np.where(np.sin(np.arange(0, N) * w) >= 0, 1, -1)

In [None]:
w, N = .13 * np.pi, 100
plt.plot(np.sin(np.arange(0, N) * w), 'g', linewidth=3, label="sine")
stem(square_wave_naive(w, 100), label="thresholded sine")
plt.legend(loc="lower left")
plt.ylim(-1.2, 1.2);

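Problem: the _duty cycle_ varies over time, because when the period is not an integer number of samples the length of the runs of $+1$ and $-1$ changes from one period to the next. This leads to audio artefacts, and it gets worse at low sampling frequencies.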

This is exactly the problem highlighted by Tsividis' paradox!
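To make the varying duty cycle concrete, here is a minimal sketch (assuming NumPy; `run_lengths` is a hypothetical helper, not part of the notebook). It measures the lengths of the runs of $+1$ samples in the thresholded sine for $\omega_0 = 0.13\pi$, i.e., a period of about 15.4 samples. Half a period is not an integer number of samples, so the runs do not all have the same length:

```python
import numpy as np

def square_wave_naive(w, N):
    # same thresholded sine as in the notebook
    return np.where(np.sin(np.arange(0, N) * w) >= 0, 1, -1)

def run_lengths(x, value=1):
    # lengths of the maximal runs of consecutive samples equal to `value`
    lengths, count = [], 0
    for v in x:
        if v == value:
            count += 1
        elif count > 0:
            lengths.append(count)
            count = 0
    if count > 0:
        lengths.append(count)
    return lengths

runs = run_lengths(square_wave_naive(0.13 * np.pi, 1000))
# ignore the first and last runs, which may be cut by the signal boundaries
print(sorted(set(runs[1:-1])))  # more than one run length: the duty cycle is not constant
```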

Here's a "cheap" way that avoids trigonometric functions:
 * period $p = 2\pi / \omega_0 \in \mathbb{R}$
 * approximate period $P = \mathrm{round}(p) \in \mathbb{N}$

$$ 
    x[n] = \begin{cases} 
        +1 & \mbox{if $(n \bmod P) \lt P/2$} \\ 
        -1 & \mbox{otherwise} 
        \end{cases} 
$$ 

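This approximation causes _detuning_, which gets worse as the frequency increases: rounding the period introduces an error of up to half a sample, and that error is a larger fraction of a shorter period.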
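A short sketch can estimate the size of the detuning. It assumes 12-tone equal temperament, where 100 cents make a semitone, and `detuning_cents` is a hypothetical helper. The ideal period $p = F_s/f$ is rounded to $P$, so the frequency actually played is $F_s/P$:

```python
import numpy as np

def detuning_cents(f, fs):
    # pitch error, in cents, of the "cheap" square wave whose period is round(fs / f)
    p = fs / f                    # ideal period in samples (real-valued)
    P = np.round(p)               # period actually used (integer)
    return 1200 * np.log2(p / P)  # played frequency fs/P vs. f; positive means too sharp

for fs in (8000, 24000, 48000):
    # A4 (440 Hz) and B5 (about 987.77 Hz, the highest note of the tune)
    print(fs, [round(float(detuning_cents(f, fs)), 1) for f in (440.0, 987.77)])
```

At 8 kHz, an A4 ends up sharp by more than a sixth of a semitone, which is clearly audible.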

In [None]:
def square_wave_cheap(w, N):
    p = np.round(2 * np.pi / w)
    return np.where((np.arange(0, N) % p) >= (p/2), -1, 1)

In [None]:
# we plot the correct square wave in red to show the detuning of the "cheap" version
plt.plot(np.sin(np.arange(0, N) * w), 'g', linewidth=3, label="sine")
plt.plot(square_wave_naive(w, 100), 'red', label="thresholded sine")
stem(square_wave_cheap(w, 100), label="approximate square wave")
plt.legend(loc="lower left")
plt.ylim(-1.2, 1.2);

## A blast from the past

Here's a simple synthesizer:

In [None]:
def note_to_freq(note):
    # note name and octave to frequency
    C4 = 261.63
    SEMITONES = {'C': 0, 'C#': 1, 'Db': 1, 'D': 2, 'D#': 3, 'Eb': 3, 'E': 4, 'F': 5, 'F#': 6, 'Gb': 6, 
                 'G': 7, 'G#': 8, 'Ab': 8, 'A': 9, 'A#': 10, 'Bb': 10, 'B': 11}  
    try:
        s = SEMITONES[note[:-1]]
        octave = int(note[-1]) - 4        
        return C4 * (2 ** octave) * (2.0 ** (s / 12.0))
    except KeyError:
        return 0

In [None]:
def play_notes(melody, time_scale=1, rate=32000, wave_engine=square_wave_cheap):
    # melody is a tuple of pairs, each pair containing the pitch and the duration
    #  of each note; time_scale gives the base length of a note of unit duration 
    s = []
    for note in melody:
        f = 2 * np.pi * note_to_freq(note[0]) / float(rate)
        N = int(note[1] * rate * time_scale)
        s = np.concatenate((s, wave_engine(f, N) if f > 0 else np.ones(N)))
    return s

And a simple tune, using the "cheap" square wave:

In [None]:
tune = (('B4', 2), ('B5', 2), ('F#5', 2), ('D#5', 2), ('B5', 1), ('F#5', 3), ('D#5', 4), 
        ('C5', 2), ('C6', 2), ('G5', 2),  ('E5', 2),  ('C6', 1), ('G5', 3),  ('E5', 4),
        ('B4', 2), ('B5', 2), ('F#5', 2), ('D#5', 2), ('B5', 1), ('F#5', 3), ('D#5', 4), 
        ('D#5', 1), ('E5', 1), ('F5', 2), ('F5', 1), ('F#5', 1), ('G5', 2), ('G5', 1), 
        ('G#5', 1), ('A5', 2), ('B5', 4))

SF = 24000
jingle = play_notes(tune, time_scale=0.06, rate=SF)
play(jingle, rate=SF)

![pacman](img/pacman.gif)

If we lower the sampling rate we can hear the detuning effect more prominently (this should sound familiar to those who played with early video games):

In [None]:
SF=8000
play(play_notes(tune, time_scale=0.06, rate=SF), rate=SF)

What about the "expensive" square wave? 

In [None]:
SF=8000
play(play_notes(tune, time_scale=0.06, wave_engine=square_wave_naive, rate=SF), rate=SF)

Pitches are now correct, but we hear artefacts due to the varying duty cycles.

We can improve the sound by increasing the sampling rate:

In [None]:
SF=48000
play(play_notes(tune, time_scale=0.06, wave_engine=square_wave_naive, rate=SF), rate=SF)

**Exercise**: write a routine that generates the "best" discrete-time square wave for any frequency in $[-\pi, \pi]$

## Polyphony

Let's recover the actual full PacMan experience:

In [None]:
tune_bass = (('B2', 6), ('B3', 2), ('B2', 6), ('B3', 2), ('C3', 6), ('C4', 2), ('C3', 6), ('C4', 2), 
             ('B2', 6), ('B3', 2), ('B2', 6), ('B3', 2), 
             ('F#3', 4), ('G#3', 4), ('A#3', 4), ('B3', 4))

SF = 24000
pacman = jingle + play_notes(tune_bass, time_scale=0.06, rate=SF)
play(pacman, rate=SF)

### Adding two-level signals produces more than 2 levels...

In [None]:
plt.plot(pacman[480:900])
plt.ylim(-2.2, 2.2);

In [None]:
def distinct_values(x):
    # sorted list of the distinct sample values, as plain Python floats
    return sorted(set(float(k) for k in x))


def is_two_level(x):
    v = distinct_values(x)
    print('the signal is', 'two-level;' if len(v) == 2 else 'NOT two-level;', 'values:', v)

In [None]:
# the monophonic tune is two-valued...
is_two_level(jingle)
# but the polyphonic tune is not!
is_two_level(pacman)

### Can we just re-quantize to one bit?

In [None]:
sq = np.where(pacman >= 0, 1, -1)
print(distinct_values(sq))
play(sq, rate=SF)

Another vintage sound! But not something we want to hear all the time...

(For the record: the original PacMan arcade machine actually used a custom Namco sound chip providing three independent wavetable synthesis channels; it did not use one-bit waveforms.)

# Pulse modulation

To truly achieve polyphony with a two-level signal we need to borrow an idea from dithering.

## Duty cycle of a square wave

Periodic two-level signal:
 * period of $P$ samples
 * $C$ samples equal to $+1$ ($P-C$ equal to $-1$)
 * duty cycle
   $$
       D = \frac{C}{P}, \qquad 0 \leq D \leq 1
   $$

In [None]:
def square_wave_pwm(P, C, N=1):
    # build a variable duty cycle square wave for pulse width modulation
    period = -np.ones(P)
    period[:C] = 1
    sw = np.tile(period, N)
    return sw

In [None]:
def show_sqw(P, C, N=6, sw=square_wave_pwm):
    stem(sw(P, C, N))
    plt.ylim(-1.2, 1.2);
    plt.xlim(-2, P * N)
    plt.title(f'duty cycle: {C}/{P}')

In [None]:
for n, p in enumerate(((4, 2), (4, 3), (5, 3))):
    plt.subplot(1, 3, n+1)    
    show_sqw(*p)

Key fact: if $x[n]$ is a $P$-periodic square wave with duty cycle $D$, its average over any period is

$$
    \bar{x} = \frac{1}{P}\sum_{n=k}^{k+P-1} x[n] = \frac{C - (P-C)}{P} = 2D -1
$$

 * for a balanced square wave, $D=1/2$, so the period average is zero
 * in general, $0 \leq D \leq 1 \Rightarrow -1 \leq \bar{x} \leq +1$
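Here is a quick numerical check of the key fact. The sketch assumes NumPy and repeats the notebook's `square_wave_pwm` construction. It also shows that the $P+1$ choices $C = 0, \ldots, P$ give $P+1$ distinct averages, evenly spaced by $2/P$:

```python
import numpy as np

def square_wave_pwm(P, C, N=1):
    # same construction as in the notebook: C samples at +1, then P - C samples at -1
    period = -np.ones(P)
    period[:C] = 1
    return np.tile(period, N)

P = 5
averages = [float(square_wave_pwm(P, C).mean()) for C in range(P + 1)]
# each average should equal 2D - 1 with D = C / P
print(averages)
```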

## Lowpass-filtering square waves

Let's try to filter square waves with different duty cycles with a simple lowpass filter

In [None]:
for n, p in enumerate(((4, 2), (4, 3), (5, 3))):
    N, alpha = 200, 0.98
    x = square_wave_pwm(*p, 50)
    y = sp.lfilter([1 - alpha], [1, -alpha], x)
    # average value
    x_bar = 2.0 * p[1] / p[0] - 1
    plt.subplot(1, 3, n+1)    
    plt.plot(x, 'green', y, 'red', [0, N], [x_bar, x_bar], 'blue')
    plt.ylim(-1.2, 1.2);
    plt.title(f'duty cycle: {p[1]}/{p[0]}, avg = {x_bar:.2f}')

We can thus "approximate" any value between -1 and 1 provided:
 * we produce a **fast** square wave with the right duty cycle
 * we lowpass the result
 
A square wave with period $P$ can be used to approximate $P+1$ different values, one for each $C = 0, 1, \ldots, P$.

### Who does the filtering?

Three choices:
 * use an explicit lowpass
 * exploit the fact that loudspeakers have a natural lowpass response
 * exploit the fact that we can't hear above 20 kHz

### How "fast" should the square wave be?

The fast square wave introduces high-frequency content and we don't want to hear that.

For a sampling frequency $F_s$:
 * a square wave of period $P$ will have its first spectral line at  $f_1 = F_s/P$, regardless of duty cycle
 * $f_1$ should be in the stopband of the lowpass or outside of hearing range
 * this limits the maximum period (and thus the range of values we can approximate)
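As a back-of-the-envelope sketch, assume that the first spectral line must land at or above an audibility limit of 20 kHz; `max_period` is a hypothetical helper. The largest usable period is then $\lfloor F_s / 20~\mathrm{kHz} \rfloor$:

```python
def max_period(fs, f_min):
    # largest integer period P whose first spectral line fs / P is still >= f_min
    return int(fs // f_min)

print(max_period(96000, 20000))   # 4, so at most 5 distinct values
print(max_period(48000, 20000))   # 2, so at most 3 distinct values
```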

### Implicit oversampling

Replacing a sample value by $N$ periods of $P$ samples corresponds to an oversampling factor $K = NP$.

### Pulse width vs pulse density modulation

Since we are using lowpass filtering to obtain an average value, the order of positive and negative values in a period is irrelevant.
 * when all the values with the same sign are grouped together: pulse **width** modulation
 * when they alternate as much as possible: pulse **density** modulation
 
In audio we use pulse density because it pushes more of the spurious energy toward higher frequencies, where it is easier to filter out.
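A DFT of a single period lets us check this claim. The sketch assumes NumPy, and `pwm_period` and `pdm_period` are hypothetical helpers that mirror the notebook's two constructions for $P=7$, $C=3$. Both patterns carry the same total non-DC energy, but PDM puts much less of it on the lowest spectral line $F_s/P$:

```python
import numpy as np

def pwm_period(P, C):
    # pulse width: all +1 samples grouped at the start of the period
    period = -np.ones(P)
    period[:C] = 1
    return period

def pdm_period(P, C):
    # pulse density: +1 samples spread as evenly as possible across the period
    period = -np.ones(P)
    for k in range(C):
        period[int(k * P / C)] = 1
    return period

P, C = 7, 3
X_pwm = np.abs(np.fft.fft(pwm_period(P, C)))
X_pdm = np.abs(np.fft.fft(pdm_period(P, C)))
# DFT bin k corresponds to frequency k * Fs / P; bin 1 is the lowest spurious line
print(X_pwm[1], X_pdm[1])
# same number of +1 and -1 samples, hence the same non-DC energy (Parseval)
print(np.sum(X_pwm[1:] ** 2), np.sum(X_pdm[1:] ** 2))
```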


In [None]:
def square_wave_pdm(P, C, N=1):
    # build a variable duty cycle square wave for pulse density modulation
    period = -np.ones(P)
    ix = 0
    for n in range(0, C):
        period[int(ix)] = 1
        ix += P / float(C)
    sw = np.tile(period, N)
    return sw

In [None]:
plt.subplot(1, 2, 1)    
show_sqw(7, 3, sw=square_wave_pwm)
plt.subplot(1, 2, 2)    
show_sqw(7, 3, sw=square_wave_pdm)

## PacMan, again

Goal: encode the two-voice PacMan tune, without an explicit lowpass.
 * we need to encode three values, so $P \ge 2$
 * maximum sampling frequency on soundcard is $F_\max=96~\mathrm{kHz}$
 * if $P=2$, $f_1 = F_\max/2 = 48~\mathrm{kHz}$, outside of hearing range
 * original sampling frequency $F_s =24~\mathrm{kHz}$ 
 * we pick $N=2$: oversampling $K = NP = 4$
 * $(NP)F_s = 96~\mathrm{kHz} = F_\max$: perfect!

With $P=2$, three duty cycles possible:
 * $[+1, +1] \rightarrow +1$
 * $[+1, -1] \rightarrow 0$
 * $[-1, -1] \rightarrow -1$
 

In [None]:
pacman1bit = np.ones(len(pacman) * 4)
for k in range(0, len(pacman)):
    if abs(pacman[k]) < 0.5: # floating tolerance
        pacman1bit[k*4:(k+1)*4] = [1, -1, 1, -1]
    else:
        pacman1bit[k*4:(k+1)*4] = np.sign(pacman[k]) * np.ones(4)
        
is_two_level(pacman1bit)

In [None]:
a = 600
b = 700
plt.plot(np.arange(a,b), pacman[a:b], 'blue', 
         np.arange(a*4,b*4)/4, pacman1bit[a*4:b*4], 'red')
plt.ylim(-2.2, 2.2);

In [None]:
play(pacman1bit, rate=96000)

In [None]:
multiplay([pacman1bit, pacman], rate=[96000, 24000], title=['1 bit @ 96 kHz', '2 bits @ 24 kHz'])

## The inaudibility of the modulation artefacts

Most of the spectral content introduced by the pulse modulation lies above the hearing range:
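One way to put a number on "most" is to measure the fraction of energy above a cutoff. In this minimal sketch, NumPy is assumed, `energy_fraction_above` is a hypothetical helper and 20 kHz is taken as the audibility limit. A zero sample encoded as $[+1, -1, +1, -1]$ at 96 kHz puts all of its energy at 48 kHz:

```python
import numpy as np

def energy_fraction_above(x, fs, f_cut):
    # fraction of the (non-DC) energy of x located strictly above f_cut Hz
    X = np.abs(np.fft.rfft(x - np.mean(x))) ** 2
    f = np.fft.rfftfreq(len(x), d=1 / fs)
    return X[f > f_cut].sum() / X.sum()

alternating = np.tile([1.0, -1.0], 4800)                  # encodes a run of zeros
tone = np.sin(2 * np.pi * 1000 * np.arange(9600) / 96000)  # an audible 1 kHz tone
print(energy_fraction_above(alternating, 96000, 20000))
print(energy_fraction_above(tone, 96000, 20000))
```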

In [None]:
L = len(pacman1bit)
plt.plot(np.linspace(0, 48000, L//2), np.abs(np.fft.fft(pacman1bit / 2)[:L//2]))
M = len(pacman)
plt.plot(np.linspace(0, 12000, M//2), np.abs(np.fft.fft(pacman)[:M//2]));

## Four-Part Harmony

Generalization to $M$ voices:
 * the sum of $M$ two-level signals spans $M+1$ amplitude levels
 * the level values are $-M, -M+2, \ldots, M-2, M$, i.e., $M - 2k$ for $k = 0, \ldots, M$ (even integers if $M$ is even, odd integers if $M$ is odd)
 * minimum period for the fast square wave $P \ge M$
 * minimum oversampling factor is $M$
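A quick enumeration shows the level set. This plain-Python sketch uses $M=3$, chosen only for illustration. For each level $x$ it also computes the number $C = (x+M)/2$ of $+1$ samples per period $P=M$, so that the period average $2C/M - 1$ equals the normalized level $x/M$:

```python
import itertools
import math

M = 3
# all possible sums of M two-level (+1/-1) signals at a given instant
levels = sorted({sum(c) for c in itertools.product((-1, 1), repeat=M)})
print(levels)   # [-3, -1, 1, 3]

# number of +1 samples in a period of P = M samples for each level
counts = [(x + M) // 2 for x in levels]
print(counts)   # [0, 1, 2, 3]

# the period average 2C/M - 1 reproduces the normalized level x/M
print(all(math.isclose(2 * c / M - 1, x / M) for c, x in zip(counts, levels)))
```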

Let's try with this piece, synthesized at 24 kHz and converted to one-bit at 96 kHz

![Bist Du bei Mir](img/bdbm.png)

In [None]:
bdbm_1 = (('Bb4', 4), ('Eb5', 6), ('F5', 2), ('D5', 8), (' ', 4),  
          ('Eb5', 4), ('Ab4', 4), ('Ab4', 4), ('Ab4', 8), ('G4', 4), 
          (' ', 2), ('Bb4', 2), ('D5', 2), ('Bb4', 2), ('A4', 2), ('Bb4', 2), 
          ('F4', 2), ('Bb4', 2), ('D5', 2), ('Bb4', 2), ('A4', 2), ('Bb4', 2), 
          ('Eb4', 4), ('C5', 6), ('D5', 1), ('Eb5', 1),
          ('D5', 3), ('C5', 1), ('Bb4', 3), ('C5', 1), ('F4', 3), ('A4', 1), 
          ('Bb4', 12), )

bdbm_2 = (('G4', 4), ('G4', 4), ('A4', 4), ('Bb4', 8), (' ', 4), ('Bb4', 4), 
          ('F4', 4), ('F4', 4), ('F4', 8), (' ', 4), ('G4', 12), ('F4', 12), 
          (' ', 12), (' ', 8), ('Eb4', 4), (' ', 12),)  

bdbm_3 = ((' ', 12), ('F4', 8), (' ', 4), ('Eb4', 4), ('F4', 4), ('C3', 4), 
          ('Bb3', 4), ('D4', 4), ('Eb4', 4), (' ', 12), ('D4', 12), ('Bb3', 4), 
          ('F4', 4), ('A4', 4), ('Bb4', 4), ('G4', 4), ('Eb4', 4), ('D4', 12),  )

bdbm_4 = (('Eb3', 4), ('C3', 4), ('F3', 4), ('Bb2', 4), ('Bb3', 4), ('Ab3', 4), 
          ('G3', 4), ('F3', 4), ('Eb3', 4), ('D3', 4), ('Bb2', 4), ('Eb2', 4), 
          ('E2', 4), ('E2', 4), ('E2', 4), ('F2', 4), ('F2', 4), ('F2', 4), 
          ('G2', 4), ('A2', 4), ('F2', 4), ('Bb2', 4), ('Eb2', 4), ('F2', 4), ('Bb2', 12), )

In [None]:
SF=24000
s  = play_notes(bdbm_1, time_scale=0.2, rate=SF)
s += play_notes(bdbm_2, time_scale=0.2, rate=SF)
s += play_notes(bdbm_3, time_scale=0.2, rate=SF)
s += play_notes(bdbm_4, time_scale=0.2, rate=SF)

In [None]:
print(distinct_values(s))
play(s, rate=SF)

There's a bit of detuning due to the approximate pitch of the square waves, but let's live with that.

In [None]:
def pdm(waveform, rate):
    # convert a sum of two-level signals into a one-bit PDM signal at a higher rate
    MAX_RATE = 96000
    values = distinct_values(waveform)
    # number of voices inferred from the number of levels (assumes all levels occur)
    voices = P = len(values) - 1
    assert rate * voices <= MAX_RATE, 'conversion to PDM requires too large a sampling rate'
    # the sum of M two-level signals takes values M - 2k: same parity as M, magnitude <= M
    assert all((v + voices) % 2 == 0 and abs(v) <= voices for v in values), \
        f'set of sample values not compatible with the sum of {voices} two-level signals'
    s = np.zeros(len(waveform) * voices)
    # replace each sample with one period of a square wave with the appropriate duty cycle:
    # target average x[n]/P = 2D - 1  =>  D = (x[n]/P + 1)/2  =>  C = D*P = (x[n] + P)/2
    for n in range(0, len(waveform)):
        s[n*P:(n+1)*P] = square_wave_pdm(P, int((waveform[n] + P) / 2))
    return s, rate * voices

In [None]:
sd, drate = pdm(s, SF)

is_two_level(sd)

play(sd, rate=drate)    

In [None]:
a = 150200
b = 150300
plt.plot(np.arange(a,b), s[a:b], 'blue', np.arange(a*4,b*4)/4, sd[a*4:b*4], 'red')
plt.ylim(-4.2, 4.2);

As the number of voices grows:
 * $P$ grows with the number of voices
 * $NP \le F_\max/F_s$
 * in the limit, $N=1$
 * too few samples for the lowpass filtering to be effective
 
We can add an explicit lowpass to improve the sound (at the cost of losing some high end):

In [None]:
b, a = sp.butter(8, 0.15)
IPython.display.Audio(sp.lfilter(b, a, sd), rate=drate)    

# The road to sigma-delta

Can we use pulse density modulation to encode an arbitrary audio signal at one bit per sample?

Yes, but we need an extra ingredient called **feedback**. Let's see why.

## Our test signal

<img width="200" style="float: right;" src="img/sob.jpg">


Let's use an excerpt from Bach's first Brandenburg concerto, as performed by [Wendy Carlos](https://en.wikipedia.org/wiki/Switched-On_Brandenburgs).

Audio has been converted to 16-bit PCM mono at 8 kHz, so that we can use a large oversampling factor later.

In [None]:
from scipy.io import wavfile

bc_sf, bc = wavfile.read('snd/brand1.wav')
# let's make it zero mean and full scale
bc = np.array(bc, dtype=float)
bc = bc - np.mean(bc)
bc = bc / np.max(np.abs(bc))
play(bc, rate=bc_sf)

As a reference point, let's hear what happens if we simply requantize the original signal to one bit per sample: it sounds awful, as expected of a signal with an SNR of about 6 dB.

In [None]:
bc_1b = quantize(bc, 2)
play(bc_1b, rate=bc_sf)

## Simple things that don't really work
 
 * oversampled AD
 * naive PDM

But both methods point us in the right direction!

### Oversampled AD
<img width="600" style="float: right;" src="img/oversampled.png">

In standard oversampled AD:
 * oversample by $K$: signal spectral support shrinks by $K$
 * quantization noise assumed white, PSD independent of $K$
 * filter out-of-band noise
 * downsample by $K$
 
Ideal gain:
 * $\mathrm{SNR}_\mathrm{OS} = \mathrm{SNR} + 3\log_2 K ~~\mathrm{dB}$
 * equivalent resolution: $R + \log_4 K$ bits 

To obtain 16-bit quality from a 1-bit quantizer we would need $K \ge 4^{15} > 10^{9}$: way too much!
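Both numbers follow from the rule of $\log_4 K$ extra bits. Here is a small plain-Python sketch; the helper names are hypothetical:

```python
import math

def oversampled_gain_bits(K):
    # ideal resolution gain of plain oversampling: 3 dB (half a bit) per doubling of K
    return math.log(K, 4)

def required_oversampling(extra_bits):
    # oversampling factor needed to gain `extra_bits` bits with plain oversampling
    return 4 ** extra_bits

print(round(oversampled_gain_bits(12), 2))   # 1.79
print(required_oversampling(15))             # 1073741824, more than 10**9
```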

We can verify that oversampling does help, but not enough:

In [None]:
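def interpolate(x, K):
    # upsample by K (zero insertion) and lowpass; the gain K compensates for the inserted zeros
    # (second-order sections are numerically safer than (b, a) coefficients for a 10th-order filter)
    sos = sp.butter(10, 1/K, output='sos')
    return K * sp.sosfilt(sos, np.kron(x, np.r_[1, np.zeros(K-1)]))

def decimate(x, K):
    # lowpass, then keep one sample every K
    return sp.sosfilt(sp.butter(10, 1/K, output='sos'), x)[::K]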

In [None]:
K = 12
test = decimate(quantize(interpolate(bc, K), 2), K)
multiplay((bc_1b, test, quantize(bc, 2 ** 2)), rate=bc_sf, 
          title=('1-bit requantization', f'1 bit via {K}-times oversampling', '2-bit requantization'))

Although the total data rate is the same as that of 12-bit PCM at the original sampling rate, the audio quality is only comparable to that of a 2-bit signal: as predicted by the theory, oversampling by 12 buys less than two extra bits ($\log_4 12 \approx 1.8$).

Nevertheless, the idea of filtering out-of-band quantization noise remains valid!

### Direct PDM encoding

Naive approach: replace each sample in the original signal with $P$ alternating samples with the appropriate duty cycle.

 * $R$ bits per sample $\Rightarrow$ $2^R$ possible values
 * minimum period $P \ge 2^R - 1$ (a period of $P$ samples encodes $P+1$ values)
 * oversampling factor at least $2^R - 1$ (65535 for 16-bit audio): way too much!
 
Nevertheless, the idea behind PDM remains valid!

## Introducing feedback: delta modulation
<img src="img/deltamod.png" width="500" style="float: right; margin: 0px 0px;" >

PDM worked well when there were only a few values to encode. Idea: encode the difference (aka the **delta**) between successive samples.

 * keep running sum of encoder's output via an integrator $H(z)$
 * compute the difference with the current input sample
 * output positive or negative value $\pm\tau$ to drive sum in the right direction (**feedback**)
 * if deltas are small, the system will be able to _track_ the input

### The discrete-time integrator

The delta modulator (and, later, the sigma-delta encoder) feeds back to its input the running sum of all past outputs:

 * feedback signal at time $n$ is $r[n] = \sum_{m=-\infty}^{n-1} y[m]$
 * ideal integrator with delay (delay needed for realizability)
 * transfer function: $\displaystyle H(z) = \frac{z^{-1}}{1-z^{-1}}$
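The integrator can be checked directly with `scipy.signal.lfilter`, whose coefficient vectors are the numerator and denominator of $H(z)$ in powers of $z^{-1}$. In this sketch the test sequence `y` is arbitrary and zero initial conditions are assumed. The filter output at time $n$ is the sum of $y[0], \ldots, y[n-1]$:

```python
import numpy as np
import scipy.signal as sp

y = np.array([1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
# H(z) = z^{-1} / (1 - z^{-1}): numerator [0, 1], denominator [1, -1]
r = sp.lfilter([0, 1], [1, -1], y)
print(r)   # values: 0, 1, 0, -1, 0, 1
# same as the running sum of all *past* outputs
past_sum = np.concatenate(([0.0], np.cumsum(y)[:-1]))
print(np.allclose(r, past_sum))
```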

### Implementation and testing

In [None]:
def delta_mod(x, step):
    ret, acc = np.zeros(len(x)), 0
    for n in range(0, len(x)):
        ret[n] = step if x[n] - acc >= 0 else -step
        acc += ret[n]
    return ret

def delta_demod(y):
    return np.cumsum(y)

In [None]:
def show_delta_mod(x, step, K=1):
    y = delta_mod(x, step)
    x_hat = delta_demod(y)
    x_f = sp.lfilter(*sp.butter(8, 0.5 / K), x_hat)
    plt.plot(x, 'C0', label='input')
    plt.plot(y, 'C3', label='1-bit output')
    plt.plot(x_hat, 'C1', label='integrated output');        
    plt.plot(x_f, 'C2', label='filtered output')
    plt.plot(x, 'C0', linewidth=3)
    plt.legend();    

In [None]:
A = 0.95
x_slow = A * np.sin(2 * np.pi * 0.005 * np.arange(0, 250))
show_delta_mod(x_slow, step=0.1)

### The bandwidth problem

Unfortunately, performance is frequency-dependent. As the input signal changes faster, the difference between successive samples becomes larger than the step size and tracking fails (this is known as _slope overload_).

In [None]:
x_fast = A * np.sin(2 * np.pi * 0.05 * np.arange(0, 100))
show_delta_mod(x_fast, step=0.1)

### Smoothing things out

How do we make the intersample differences small? Oversampling!
 * everything is a straight line if you look close enough
 * the higher the oversampling, the smaller the deltas
 * just remember to filter afterwards
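A simple sketch with NumPy makes the tracking condition explicit; `max_delta` is a hypothetical helper. Tracking needs the step to keep up with the largest jump between successive samples, and failing that condition is often called slope overload. For a sinusoid $A\sin(\omega n)$ that jump is at most $2A\sin(\omega/2) \le A\omega$. Oversampling by $K$ divides $\omega$ by $K$, so $K \ge A\omega/\text{step}$ is enough:

```python
import numpy as np

def max_delta(x):
    # largest jump between successive samples
    return np.max(np.abs(np.diff(x)))

A, step = 0.95, 0.1
x_slow = A * np.sin(2 * np.pi * 0.005 * np.arange(0, 250))
x_fast = A * np.sin(2 * np.pi * 0.05 * np.arange(0, 100))
for name, x in (('slow', x_slow), ('fast', x_fast)):
    print(name, round(float(max_delta(x)), 3), 'overload' if max_delta(x) > step else 'ok')

# smallest oversampling factor that brings A * w / K below the step for the fast sine
w_fast = 2 * np.pi * 0.05
K_min = int(np.ceil(A * w_fast / step))
print(K_min)
```

This is a worst-case bound for a single sinusoid. The notebook's choice of $K=10$ is comfortably above it.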

In [None]:
K = 10
x_fast_oversampled = interpolate(x_fast, K)
show_delta_mod(x_fast_oversampled, step=0.1, K=K)

### The step size problem

Delta modulators are also sensitive to signal amplitude: performance degrades when the amplitude is comparable to the step size.

In [None]:
show_delta_mod(0.12 * x_fast_oversampled[:500], step=0.1, K=10)

We can fix _this_ case by reducing the step size:

In [None]:
show_delta_mod(0.12 * x_fast_oversampled, step=0.01, K=10)

but in so doing we're back to square one when signals are full-range: the step is now too small to track fast signals effectively.

In [None]:
show_delta_mod(x_fast_oversampled, step=0.01, K=10)

### The fundamental problem with delta modulation

 * delta operation is equivalent to _differentiation_
 * feedback loop is tracking input's _slope_ , not its amplitude
 * the error on DC (constant) inputs has the same magnitude as the step size
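Here is a minimal sketch of the DC problem. It assumes NumPy and repeats the notebook's `delta_mod` so that it runs on its own, with an arbitrary constant input of 0.03 and a step of 0.1. The integrated output never settles: it keeps toggling between 0 and the step size, so the reconstruction error stays comparable to the step:

```python
import numpy as np

def delta_mod(x, step):
    # same delta modulator as in the notebook
    ret, acc = np.zeros(len(x)), 0
    for n in range(len(x)):
        ret[n] = step if x[n] - acc >= 0 else -step
        acc += ret[n]
    return ret

step = 0.1
x_dc = 0.03 * np.ones(200)                # constant input, well below the step size
x_hat = np.cumsum(delta_mod(x_dc, step))  # integrated output
err = x_hat - x_dc
print(sorted(set(np.round(x_hat, 12))), np.max(np.abs(err)))
```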

### The solution: add another integrator
<img src="img/sddi.png" width="500" style="float: right; margin: 0px 0px;" >

 * loop tracks the _derivative of the integral_ $\Rightarrow$ loop tracks amplitude
 * step size equal to max input amplitude
 * local _average_ of output tracks local average of input: recover input by lowpass filtering 


Intuition in time domain:
 * oversampling (mandatory!) $\Rightarrow$ input locally "flat": over a window of $N$ samples it is approximately a constant $x_0$, with $|x_0| \le 1$
 * over that window the integrated input grows by $\approx N x_0$
 * the encoder adjusts the output's "local" duty cycle so that $\bar{x} = 2D-1 = x_0$
 * lowpass filtering (and decimation) recovers the input
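This intuition can be checked on a constant input with a first-order sigma-delta loop, the same one the notebook implements in the next section. The sketch assumes NumPy, and the value $x_0 = 0.3$ is arbitrary. The fraction of $+1$ output samples settles on the duty cycle $D$ for which $2D - 1 = x_0$:

```python
import numpy as np

def sigma_delta(x):
    # first-order sigma-delta: integrate the input-output difference, output its sign
    ret, acc = np.zeros(len(x)), 0
    for n in range(len(x)):
        ret[n] = 1 if acc >= 0 else -1
        acc += x[n] - ret[n]
    return ret

x0 = 0.3
y = sigma_delta(x0 * np.ones(10000))
D = np.mean(y == 1)          # fraction of +1 samples, i.e., the duty cycle
print(D, 2 * D - 1)          # 2D - 1 should be very close to x0
```

Because the accumulator stays bounded, the average of the output differs from $x_0$ by at most a few units divided by the number of samples.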

# Sigma-Delta encoding
<img src="img/sigmadelta.png" width="500" style="float: right; margin: 0px 0px;" >


Setup: 
 * analog input $F_s$-bandlimited 
 * oversample at $KF_s$

At each step:
 * compute running sum of encoder's output 
 * compute the difference (the **delta**) with the running sum of the input (the **sigma**)
 * output a _full scale_ value with the sign of the difference
 
Since the difference of two running sums is the running sum of the difference, the encoder needs only a single integrator.
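To see the equivalence at work, the sketch below writes the steps above literally, with separate running sums of input and output. `sigma_delta_two_sums` is a hypothetical name, and NumPy is assumed. It compares this version with a single-integrator loop. The input is restricted to multiples of 1/8 so that every sum is exact in floating point and the two versions can be compared sample by sample:

```python
import numpy as np

def sigma_delta_single(x):
    # single integrator: running sum of the difference
    ret, acc = np.zeros(len(x)), 0
    for n in range(len(x)):
        ret[n] = 1 if acc >= 0 else -1
        acc += x[n] - ret[n]
    return ret

def sigma_delta_two_sums(x):
    # two integrators: running sum of the input (sigma) minus running sum of the output
    ret, sum_x, sum_y = np.zeros(len(x)), 0, 0
    for n in range(len(x)):
        ret[n] = 1 if sum_x - sum_y >= 0 else -1
        sum_x += x[n]
        sum_y += ret[n]
    return ret

x = np.round(8 * 0.9 * np.sin(2 * np.pi * 0.01 * np.arange(500))) / 8
print(np.array_equal(sigma_delta_single(x), sigma_delta_two_sums(x)))   # True
```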

## Implementation

Super easy:

In [None]:
def sigma_delta(x):
    ret, acc = np.zeros(len(x)), 0
    for n in range(0, len(x)):
        ret[n] = 1 if acc >= 0 else -1
        acc += x[n] - ret[n]
    return ret

We can verify that the sigma-delta encoder easily handles the signals that were problematic for the delta modulator:

In [None]:
def show_sigmadelta_mod(x, K=1):
    y = sigma_delta(x)
    x_f = sp.lfilter(*sp.butter(8, 0.5 / K), y)
    plt.plot(x, 'C0', label='input')
    plt.plot(y, 'C3', alpha=0.4, label='1-bit output')
    plt.plot(x_f, 'C2', label='filtered output')
    plt.plot(x, 'C0', linewidth=3)
    plt.legend();        

In [None]:
K = 10
show_sigmadelta_mod(interpolate(x_slow, K), K=K)

In [None]:
show_sigmadelta_mod(interpolate(x_fast, K), K=K)

In [None]:
show_sigmadelta_mod(0.12 * interpolate(x_fast, K), K=K)

## Let's hear some music

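Let's use $K=12$ and lowpass-filter the one-bit sequence with a 4 kHz cutoff before playing it (in `sp.butter` the cutoff is relative to the Nyquist frequency, here $12 \times 8~\mathrm{kHz}/2 = 48~\mathrm{kHz}$, hence `8000/96000`):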

In [None]:
K = 12
bc_sd1 = sigma_delta(interpolate(bc, K))
is_two_level(bc_sd1)

In [None]:
multiplay((sp.lfilter(*sp.butter(8, 8000/96000), bc_sd1), quantize(bc, 2 ** 4)), rate=(K * bc_sf, bc_sf), title=('sigma-delta', '4-bit requantization'))    

## Noise shaping

To quantify the performance of sigma-delta, let's look at it in the frequency domain.

### Linearized model

Exact analysis is too complicated so we use a linearized model:
 * replace the nonlinear quantizer with an additive, independent white-noise source $e[n]$
 * output quantization noise $\eta[n] = \hat{x}_\mathrm{1B}[n] - x[n]$ 
 
<br/>
<img src="img/sigmadelta.png" width="500" style="float: right; margin: 0px 0px;" >
<br/>
<img width="500" style="float: right;" src="img/sigmadeltalinearized.png">


### Overall SNR

 * assume $x[n] \sim \mathcal{U}[-1, 1]$;  $\sigma_x^2 = 1/3$
 * $e[n]$ models the error of the one-bit quantizer; assume $|e[n]| \le 1$, uniformly distributed
 * assuming $e[n]$ white, $\sigma_e^2 = 1/3$
 
$$
    \mathrm{SNR} = 0~\mathrm{dB}
$$
 
But:
 * we are interested in the SNR only over $[-F_s/2, F_s/2]$
 * if the _output_ noise $\eta[n]$ is not white, we can hope that most of it is outside the audio band
 


 

Transfer functions

\begin{align*}
    \hat{X}_\mathrm{1B}(z) &= \frac{H(z)}{1+H(z)}X_K(z) + \frac{1}{1+H(z)}E(z) \\ \\
         &= F(z)X_K(z) + G(z)E(z) 
\end{align*}

Using $H(z) = z^{-1}/(1-z^{-1})$:
 * signal transfer function: $\displaystyle F(z) = z^{-1}$
 * noise transfer function:  $\displaystyle G(z) = 1 - z^{-1}$

Noise shaping magnitude response: $ |G(e^{j\omega})| = 2|\sin(\omega/2)| $

Power spectral density of quantization noise $P_\eta(e^{j\omega}) = 4\sigma_e^2\, |\sin(\omega/2)|^2$
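The algebra can be double-checked numerically by evaluating the loop on the unit circle. This sketch assumes NumPy, and the frequency grid is arbitrary (it avoids the integrator pole at $\omega = 0$). It confirms $F(z) = z^{-1}$, $G(z) = 1 - z^{-1}$ and $|G(e^{j\omega})| = 2|\sin(\omega/2)|$:

```python
import numpy as np

w = np.linspace(0.01, np.pi, 1000)   # upper half of the unit circle, excluding w = 0
z = np.exp(1j * w)
H = (1 / z) / (1 - 1 / z)            # H(z) = z^{-1} / (1 - z^{-1})
F = H / (1 + H)                      # signal transfer function
G = 1 / (1 + H)                      # noise transfer function
print(np.allclose(F, 1 / z), np.allclose(G, 1 - 1 / z))
print(np.allclose(np.abs(G), 2 * np.abs(np.sin(w / 2))))
```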

In [None]:
plt.axvspan(0, 1 / K, color='C2', alpha=0.5)
w = np.linspace(0, 1, 200)
plt.plot(w, 2 * np.sin(np.pi * w / 2))
plt.xticks([0, 1 / K, 1], ['0', r'$\pi/K$', r'$\pi$']);

Noise power in audio band
$$
    \sigma_{a}^2 = \frac{\sigma_e^2}{2\pi} \int_{-\pi/K}^{\pi/K} |G(e^{j\omega})|^2 d\omega = \sigma_e^2 \left(\frac{2}{K} - \frac{2}{\pi}\sin(\pi/K)\right) \approx  \sigma_e^2 \frac{\pi^2}{3K^3}
$$

$$
    \mathrm{SNR}_\mathrm{SD1} = 10\log_{10}\frac{\sigma_x^2}{\sigma_a^2} \approx -5.17 + 9\log_2 K ~~\mathrm{dB}
$$

 * 1.5 bits for every doubling of the oversampling factor
 * for $K=12$, $\mathrm{SNR}_\mathrm{SD1} \approx 27.2~\mathrm{dB}$, equivalent to 4.5bps
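The next sketch is a numerical sanity check of the in-band noise formula and of the $K=12$ figure. It assumes NumPy, the white-noise linearized model above and $\sigma_x^2 = \sigma_e^2 = 1/3$, and it approximates the integral with a simple midpoint rule:

```python
import numpy as np

def inband_noise_ratio(K, n=100000):
    # sigma_a^2 / sigma_e^2 = (1 / 2pi) * integral of |G|^2 = 4 sin^2(w/2) over [-pi/K, pi/K],
    # approximated with the midpoint rule on n subintervals
    a = np.pi / K
    w = -a + (np.arange(n) + 0.5) * (2 * a / n)
    return (2 * a / (2 * np.pi)) * np.mean(4 * np.sin(w / 2) ** 2)

K = 12
closed_form = 2 / K - 2 / np.pi * np.sin(np.pi / K)
approx = np.pi ** 2 / (3 * K ** 3)
# with sigma_x^2 = sigma_e^2 the SNR is just the inverse of the ratio above
snr_db = 10 * np.log10(1 / inband_noise_ratio(K))
print(inband_noise_ratio(K), closed_form, approx)
print(round(float(snr_db), 1))   # 27.2
```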

## Higher-order sigma-delta
<img src="img/sigmadeltasecond.png" width="600" style="float: right; margin: 0px 0px;" >

 * use $L$ integrators in the loop
 * signal transfer function unchanged: $\displaystyle F(z) = z^{-1}$
 * noise transfer function:  $\displaystyle G(z) = (1 - z^{-1})^L$

### Higher-order noise shaping

In [None]:
plt.axvspan(0, 1 / K, color='C2', alpha=0.5)
w = np.linspace(0, 1, 200)
for n, order in enumerate(['first', 'second', 'third']):
    plt.plot(w, (2 * np.sin(np.pi * w / 2)) ** (n+1), label=f'{order} order sigma-delta')
plt.xticks([0, 1 / K, 1], ['0', r'$\pi/K$', r'$\pi$']);
plt.legend();

### Second-order example

 * $\mathrm{SNR}_\mathrm{SD2} \approx -12.9 + 15\log_2 K ~~\mathrm{dB}$
 * for $K=12$, $\mathrm{SNR}_\mathrm{SD2} \approx 41~\mathrm{dB}$, equivalent to 6.8bps
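The first- and second-order figures come from the small-$\omega$ approximation $|G(e^{j\omega})|^2 = (2\sin(\omega/2))^{2L} \approx \omega^{2L}$. With it, the in-band noise becomes $\sigma_e^2\, \pi^{2L} / ((2L+1) K^{2L+1})$. The sketch below computes the resulting SNR for general $L$. It assumes NumPy, white quantization noise and $\sigma_x^2 = \sigma_e^2$, and `snr_sd_db` is a hypothetical helper:

```python
import numpy as np

def snr_sd_db(K, L):
    # approximate SNR of an L-th order loop with G(z) = (1 - z^{-1})^L, for large K
    return 10 * np.log10((2 * L + 1) * K ** (2 * L + 1) / np.pi ** (2 * L))

for L in (1, 2, 3):
    print(L, round(float(snr_sd_db(12, L)), 1))
```

This ignores the stability problems of higher-order loops, which in practice limit the usable input range.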

In [None]:
def sigma_delta2(x):
    ret, acc = np.zeros(len(x)), np.zeros(2)
    for n in range(0, len(x)):
        ret[n] = 1 if acc[1] >= 0 else -1
        acc[0] += (x[n] - ret[n])        
        acc[1] += (acc[0] - ret[n])        
    return ret

In [None]:
K = 12
bc_sd2 = sigma_delta2(interpolate(bc, K))

In [None]:
multiplay((sp.lfilter(*sp.butter(8, 8000/96000), bc_sd2), quantize(bc, 2 ** 6)), rate=(K * bc_sf, bc_sf), title=('sigma-delta', '6-bit requantization'))    

In [None]:
play(decimate(bc_sd2, K), rate=bc_sf)

## Super Audio CD

[Super Audio CD](https://en.wikipedia.org/wiki/Super_Audio_CD) format:
 * 64-times oversampling with respect to the CD rate of 44.1 kHz, i.e., $F_s = 2822.4~\mathrm{kHz}$
 * fifth-order sigma-delta
 * 5.6 Mb/s data rate (stereo) 
 * SNR = 120 dB
 * 20 Hz to 50 kHz effective bandwidth
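The sampling rate and the data rate agree with each other. They come from 64 times the CD rate, at one bit per sample, over two channels (plain arithmetic):

```python
fs_cd = 44100                   # CD sampling rate, Hz
oversampling = 64
fs_1bit = oversampling * fs_cd  # one-bit sampling rate, Hz
bitrate = fs_1bit * 1 * 2       # 1 bit per sample, 2 channels, in bit/s
print(fs_1bit, bitrate)         # 2822400 5644800
```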


# ADC and DAC

Sigma-delta is the de facto technology in consumer-grade AD and DA converters. Here is why:

## ADC 
<img src="img/adc.png" width="600" style="float: right; margin: 0px 0px;" >

Acquire the signal via sigma-delta and then convert to multi-bit. Advantages:
 * anti-alias analog filter need not be sharp
 * sigma-delta quantizer easy and cheap
 * downsampling cheap in discrete time


## DAC
<img src="img/dac.png" width="600" style="float: right; margin: 0px 0px;" >

Convert digital multi-bit signal to oversampled one-bit prior to interpolation. Advantages:
 * upsampling easy in discrete time
 * digital sigma-delta modulator easy 
 * extremely cheap analog converter (two-level zero-order hold)
 
However, despite the upsampling, the analog lowpass needs to be sharp, because noise shaping pushes a large amount of quantization noise just above the audio band. A common solution is to use a multi-bit sigma-delta modulator.

# Things we left out

 * what about dithering in sigma-delta?
 * limit cycles and parasitic tones
 * slew rate effects (as in the delta modulator)
 * stability of higher-order loops