# Rollers Pitch-Shift Implementation

The "Rollers" pitch-shifting algorithm is based on narrow subband frequency shifting.
To keep latency low, an IIR filter bank is used.

The original IIR filter bank implementation in Java uses Butterworth bandpass filters with crossovers at -12dB.

Things to figure out:

* scipy filter design

* How to design a Butterworth filter bank

* Frequency shifting

* Putting it all together

Note: Keras GPU Filter Bank


## Scipy filter design

analog filter design:

In [None]:
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
import IPython.display as ipd
from scipy.io import wavfile

plt.rcParams['figure.figsize'] = [15, 3]

# filter coeffs
b, a = signal.butter(4, 100, 'low', analog=True)
print('b:', b, 'a:', a)
# frequency response
w, h = signal.freqs(b, a)

plt.semilogx(w, 20 * np.log10(abs(h)))
plt.title('Butterworth analog filter frequency response')
plt.xlabel('Frequency [radians / second]')
plt.ylabel('Amplitude [dB]')
plt.margins(0, 0.1)
plt.grid(which='both', axis='both')
plt.axvline(100, color='green') # cutoff frequency
plt.show()

digital filter design:

Default output format is ‘ba’ for backwards compatibility, but ‘sos’ should be used for general-purpose filtering.

In [None]:
fs = 44100
fc = 100
# normalized cutoff frequency
wc = fc / (fs / 2)
b, a = signal.butter(4, wc, 'low', analog=False)
print('b:', b, 'a:', a)

w, h = signal.freqz(b, a)
plt.semilogx(w, 20 * np.log10(abs(h)))
plt.title('Butterworth digital filter frequency response')
plt.xlabel('Normalized frequency [radians / sample] (pi is the Nyquist frequency)')
plt.ylabel('Amplitude [dB]')
plt.axvline(wc*np.pi, color='green') # cutoff frequency
plt.show()
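
The manual normalization above and the `fs` argument of `signal.butter` should describe the same filter. The following minimal sketch checks this for the low-pass used here; the variable names `sos_norm` and `sos_hz` are only illustrative.

```python
import numpy as np
from scipy import signal

fs = 44100  # sampling rate in Hz
fc = 100    # cutoff frequency in Hz

# manual normalization: 1.0 corresponds to the Nyquist frequency fs / 2
sos_norm = signal.butter(4, fc / (fs / 2), 'low', output='sos')

# same design, but the cutoff is given in Hz and scipy normalizes it
sos_hz = signal.butter(4, fc, 'low', fs=fs, output='sos')

print('same coefficients:', np.allclose(sos_norm, sos_hz))
print('sections x coefficients:', sos_hz.shape)
```

Passing `fs` keeps all frequencies in Hz, which avoids mixing up normalized and absolute values later on.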

Now we use the recommended _second-order sections_ format when filtering, to avoid numerical error with transfer function (ba) format.

In [None]:
b, a = signal.butter(4, fc, 'low', fs=fs, output='ba')
sos = signal.butter(4, fc, 'low', fs=fs, output='sos')

w, h = signal.freqz(b, a)
plt.semilogx(w, 20*np.log10(np.abs(h)))
plt.show()

## Filtering

In [None]:
x = signal.unit_impulse(1024)

y_tf  = signal.lfilter(b, a, x) # ba format
y_sos = signal.sosfilt(sos, x)  # sos format

plt.plot(y_tf, 'r', label='TF')
plt.plot(y_sos, '--k', label='SOS')
plt.legend(loc='best')
plt.show()

In [None]:
from scipy.fft import rfft, rfftfreq
freq = rfftfreq(x.size, 1 / fs)
plt.semilogx(freq, 20*np.log10(np.abs(rfft(y_sos))))
plt.show()
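
For the 4th-order low-pass above, both formats give practically the same result. The difference is expected to show up for narrow, low-frequency, high-order filters, which is the situation in a filter bank. The sketch below uses an assumed stress case (8th-order prototype, 30 to 40 Hz band) and compares the largest pole radius of both representations; a stable filter needs all poles inside the unit circle.

```python
import numpy as np
from scipy import signal

fs = 44100
band = [30, 40]  # Hz, assumed stress case: narrow band at low frequency
order = 8        # assumed prototype order (the bandpass then has 16 poles)

b, a = signal.butter(order, band, 'band', fs=fs, output='ba')
sos = signal.butter(order, band, 'band', fs=fs, output='sos')

# poles recovered from the expanded denominator polynomial (ba format)
r_ba = np.max(np.abs(np.roots(a)))

# poles recovered section by section (sos format)
_, p_sos, _ = signal.sos2zpk(sos)
r_sos = np.max(np.abs(p_sos))

print('max pole radius (ba): ', r_ba)
print('max pole radius (sos):', r_sos)

# the sos impulse response should stay finite
x = signal.unit_impulse(fs)
y_sos = signal.sosfilt(sos, x)
print('sos impulse response finite:', np.all(np.isfinite(y_sos)))
```

If the `ba` radius comes out at or above 1 while the `sos` radius stays below 1, the expanded polynomial has lost the pole positions to rounding error.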

### How to design a band pass filter

In [None]:
def butter_bp(lowcut, highcut, fs, order=4, t='sos'):
    f_nyq = 0.5 * fs
    low = lowcut / f_nyq
    high = highcut / f_nyq
    return signal.butter(order, [low, high], btype='band', output=t)

In [None]:
lowcut = 500
highcut = 1000

b, a = butter_bp(lowcut, highcut, fs, order=5, t='ba')

# plot
w, h = signal.freqz(b, a)
plt.semilogx((fs * 0.5 / np.pi) * w[1:], 20*np.log10(np.abs(h[1:])))
plt.ylim((-40, 5))
plt.show()
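
To check the design numerically instead of reading it off the plot, the response can be evaluated at the two band edges only. This is a minimal sketch that calls `signal.butter` directly with `fs` instead of the `butter_bp` helper; a Butterworth bandpass should be about 3 dB down at both edges and close to 0 dB near the geometric mean of the edges.

```python
import numpy as np
from scipy import signal

fs = 44100
lowcut, highcut = 500, 1000

sos = signal.butter(4, [lowcut, highcut], btype='band', fs=fs, output='sos')

# a bandpass built from an order-N prototype has 2N poles, i.e. N sections
print('second-order sections:', sos.shape[0])

# evaluate the response at the band edges and at their geometric mean (in Hz)
f_eval = np.array([lowcut, np.sqrt(lowcut * highcut), highcut])
_, h = signal.sosfreqz(sos, worN=f_eval, fs=fs)
gain_db = 20 * np.log10(np.abs(h))

for f, g in zip(f_eval, gain_db):
    print('%8.1f Hz: %6.2f dB' % (f, g))
```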

## Filter Bank Design

Let's design a constant Q Butterworth bandpass filter bank.
There are different possible center frequency spacings:

* [Third-Octave Filter Banks](https://ccrma.stanford.edu/realsimple/aud_fb/Third_Octave_Filter_Banks.html)

* [ERB Filter Bank](https://ccrma.stanford.edu/realsimple/aud_fb/Equivalent_Rectangular_Bandwidth_ERB.html)

* [Mel Scale](https://labrosa.ee.columbia.edu/doc/HTKBook21/node54.html)

* Bark Scale
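
As a point of comparison for the spacings listed above, the sketch below computes mel-spaced center frequencies with the common formula $m = 2595 \log_{10}(1 + f/700)$. The frequency range (`f_min`, `f_max`) and the helper names are assumptions for illustration; the notebook itself continues with third-octave spacing.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to mel."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Convert mel back to frequency in Hz."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

n = 28                        # number of bands
f_min, f_max = 30.0, 16000.0  # assumed range in Hz

# equal steps on the mel axis, mapped back to Hz
f_mel = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n))

# ratio between neighbouring center frequencies:
# constant (2 ** (1 / 3)) for third-octave spacing, varying for mel spacing
ratio = f_mel[1:] / f_mel[:-1]
print('mel center frequencies [Hz]:', np.round(f_mel, 1))
print('neighbour ratio: first %.2f, last %.2f' % (ratio[0], ratio[-1]))
```

Mel spacing is close to linear at low frequencies and close to logarithmic at high frequencies, so such a bank is not constant Q.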

Let's start with a **third-octave filter bank**:

In [None]:
fs = 44100
n = 28

# third-octave filter bank
freq_offset = 2
k = np.arange(n+2) - n // 2 - freq_offset

# center frequencies are defined relative to a bandpass with center frequency at 1kHz
f_cs = np.power(2, k / 3) * 1000
print('f_cs:', f_cs)

f_chs = [] # high cutoff frequencies
f_cls = [] # low cutoff frequencies
filters = []
for k in range(1, f_cs.size-1):
    f_chs.append(np.sqrt(f_cs[k] * f_cs[k+1]))
    f_cls.append(np.sqrt(f_cs[k-1] * f_cs[k]))
    
for k in range(f_cs.size-2):
    sos = butter_bp(f_cls[k], f_chs[k], fs, order=4, t='sos')
    filters.append(sos)
    
# plot
for sos in filters:
    w, h = signal.sosfreqz(sos, worN=10000)
    plt.semilogx((fs * 0.5 / np.pi) * w[1:], 20*np.log10(np.abs(h[1:])))
    plt.ylim((-100, 5))
    plt.xlim((10, 20000))
    plt.ylabel('H [dB]')
    plt.xlabel('f [Hz]')
    plt.title('third-octave filter bank')
plt.show()

Note that no filter is built for the lowest and the highest center frequency in `f_cs`; they are only used to compute the band edges of their neighbours.
So there are $n$ filters but $n+2$ center frequencies.
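
The band-edge computation can also be written without loops. The sketch below uses the same `n` and `freq_offset` as above and checks that the geometric mean of two neighbouring center frequencies is the same as moving a sixth of an octave away from the center, so adjacent bands share their edges. The names `f_lo`, `f_hi` and `f_c` are illustrative.

```python
import numpy as np

n = 28
freq_offset = 2
k = np.arange(n + 2) - n // 2 - freq_offset

# n + 2 center frequencies, a third of an octave apart, relative to 1 kHz
f_cs = 1000 * 2.0 ** (k / 3)

# the n inner center frequencies get a filter
f_c = f_cs[1:-1]
# band edges: geometric mean with the lower / upper neighbour
f_lo = np.sqrt(f_cs[:-2] * f_cs[1:-1])
f_hi = np.sqrt(f_cs[1:-1] * f_cs[2:])

print('number of filters:', f_c.size)
print('edges are f_c * 2**(-1/6) and f_c * 2**(+1/6):',
      np.allclose(f_lo, f_c * 2 ** (-1 / 6)),
      np.allclose(f_hi, f_c * 2 ** (1 / 6)))
print('adjacent bands share an edge:', np.allclose(f_hi[:-1], f_lo[1:]))
```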

Let's test the filter bank with white noise.

In [None]:
noise = np.random.normal(0, 1, 1*fs)

def plot_spec(sig, fs, name=""):
    freq = rfftfreq(sig.size, 1 / fs)
    plt.semilogx(freq, 20*np.log10(np.abs(rfft(sig))))
    plt.title(name)
    plt.show()

plot_spec(noise, fs, "white noise spectrum")

In [None]:
num = 15
print("f_c:", f_cs[num+1])
sos = filters[num]
filtered_noise = signal.sosfilt(sos, noise)

plot_spec(filtered_noise, fs, "filtered noise spectrum")

Now we apply the filter bank and reconstruct the input signal.

In [None]:
in_sig = noise

# generate filtered signals
filt_sigs = []
for sos in filters:
    filt_sigs.append(signal.sosfilt(sos, in_sig))

plot_spec(filt_sigs[15], fs, "filtered noise spectrum")

Now we would do some kind of processing of the individual bands...

In [None]:
# and then add them together
out_sig = np.zeros(filt_sigs[0].size)
for filt_sig in filt_sigs:
    out_sig += filt_sig

plot_spec(out_sig, fs, "reconstructed signal spectrum")

So the filter bank works in the audio range.
We lose information below roughly 28 Hz and above roughly 18 kHz, the outer band edges of this bank, but that is totally fine.
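
Because filtering and summing are linear, the reconstruction can also be judged without a noise signal by adding up the complex frequency responses of all bands. The following sketch rebuilds the same third-octave bank in a self-contained way and prints the outer band edges and the range of the summed gain well inside the bank. The evaluation grid and the `inside` mask are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import signal

fs = 44100
n = 28
k = np.arange(n + 2) - n // 2 - 2
f_cs = 1000 * 2.0 ** (k / 3)
f_lo = np.sqrt(f_cs[:-2] * f_cs[1:-1])  # low cutoff of each band
f_hi = np.sqrt(f_cs[1:-1] * f_cs[2:])   # high cutoff of each band

filters = [signal.butter(4, [lo, hi], btype='band', fs=fs, output='sos')
           for lo, hi in zip(f_lo, f_hi)]

# log-spaced evaluation grid in Hz
f_eval = np.geomspace(10, 20000, 2000)

# response of "filter every band, then add the outputs"
h_sum = np.zeros(f_eval.size, dtype=complex)
for sos in filters:
    _, h = signal.sosfreqz(sos, worN=f_eval, fs=fs)
    h_sum += h
mag_db = 20 * np.log10(np.abs(h_sum))

# look only at frequencies well inside the bank (arbitrary margin)
inside = (f_eval > 2 * f_lo[0]) & (f_eval < 0.5 * f_hi[-1])
print('outer band edges [Hz]: %.1f, %.1f' % (f_lo[0], f_hi[-1]))
print('summed gain inside the bank [dB]: min %.1f, max %.1f'
      % (mag_db[inside].min(), mag_db[inside].max()))
```

The spread between the minimum and maximum gain is a measure of the ripple that the bank adds even when the bands are not processed at all.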

This is only a prototype of the filter bank we would actually use; more bands would be needed for better audio quality.
We will stick with it for now for convenience.

Let's check the audio quality of just the filter bank with an audio signal:

In [None]:
path = "../../samples/Toms_diner.wav"
# note: the filters above were designed for fs = 44100 Hz, so the file should use that rate
fs, audio = wavfile.read(path)

# plot and play
plt.plot(audio)
plt.title("original")
ipd.Audio(audio, rate=fs)

In [None]:
# separation
filt_sigs = []
for sos in filters:
    filt_sigs.append(signal.sosfilt(sos, audio))

# reconstruction
out_sig = np.zeros(filt_sigs[0].size)
for filt_sig in filt_sigs:
    out_sig += filt_sig

# plot and play
plt.plot(out_sig)
plt.title("reconstructed")
ipd.Audio(out_sig, rate=fs)

The reconstructed signal sounds fine, but the _downward chirp artifact_ mentioned in the Rollers paper is audible.
This is a result of the resonance of the filters.
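
One way to make this plausible is to look at how long the individual bands take to respond to an impulse. In a constant-Q bank the low bands are narrower in Hz, so they would be expected to ring up and decay more slowly than the high bands, which spreads a click into a sweep from high to low frequencies. The sketch below is only an illustration of that reasoning, not an analysis from the paper; `band_delay` is a hypothetical helper that reports when the envelope of a band's impulse response peaks.

```python
import numpy as np
from scipy import signal

fs = 44100
x = signal.unit_impulse(fs)  # 1 s long unit impulse

def band_delay(f_c):
    """Time in seconds at which the impulse-response envelope
    of one third-octave band around f_c reaches its maximum."""
    edges = [f_c * 2 ** (-1 / 6), f_c * 2 ** (1 / 6)]
    sos = signal.butter(4, edges, btype='band', fs=fs, output='sos')
    h = signal.sosfilt(sos, x)
    # envelope = magnitude of the analytic signal
    envelope = np.abs(signal.hilbert(h))
    return np.argmax(envelope) / fs

for f_c in (100, 1000, 10000):
    print('%5d Hz band peaks after %.4f s' % (f_c, band_delay(f_c)))
```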

So the audio quality of the prototype third-octave filter bank using 28 bands is OK.

## Frequency Shifting

The pitch shifting in the Rollers algorithm is done by _frequency shifting_ the bands of the filter bank.
The frequency shifting is done with _single sideband modulation_.
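
A minimal sketch of single sideband modulation follows. It builds the analytic signal with `scipy.signal.hilbert`, which is FFT based and works on the whole signal at once; this is an assumption made for illustration and not necessarily how the Rollers implementation obtains the analytic signal, since a low-latency version would need a different approach. The function name `freq_shift` is hypothetical.

```python
import numpy as np
from scipy import signal
from scipy.fft import rfft, rfftfreq

def freq_shift(x, f_shift, fs):
    """Shift all frequencies of the real signal x by f_shift Hz
    using single sideband modulation."""
    t = np.arange(x.size) / fs
    analytic = signal.hilbert(x)  # x + j * Hilbert transform of x
    # multiplying by a complex exponential moves the one-sided spectrum
    return np.real(analytic * np.exp(2j * np.pi * f_shift * t))

fs = 44100
t = np.arange(fs) / fs            # 1 s
x = np.sin(2 * np.pi * 1000 * t)  # 1 kHz test tone
y = freq_shift(x, 100, fs)        # shift up by 100 Hz

freq = rfftfreq(y.size, 1 / fs)
peak_in = freq[np.argmax(np.abs(rfft(x)))]
peak_out = freq[np.argmax(np.abs(rfft(y)))]
print('peak before shift:', peak_in, 'Hz')
print('peak after shift: ', peak_out, 'Hz')
```

Note that a frequency shift adds the same offset in Hz to every component, whereas a pitch shift scales all frequencies by a ratio. Applying the shift per narrow band, with a band-dependent offset, is what lets frequency shifting approximate a pitch shift.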