# <center> Sound, Speech and Music Processing</center>
## <center> Introduction to the Phase Vocoder</center>      

In [None]:
%matplotlib inline

import math
import time
import numpy as np
import matplotlib.pyplot as plt

from scipy import signal
from scipy.io import wavfile

import IPython.display as ipd

**NOTE:** *The next two cells are only needed to download the sample file. Ignore them if you are going to work with your own audio files.*

In [None]:
!pip install wget

In [None]:
import wget

### How to run the notebook

The notebook can be downloaded and run locally on a computer.

Alternatively, it can be run on Google Colab using the following link:
<table align="center">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/mrocamora/audio-dsp/blob/main/notebooks/SSMP-intro_phase_vocoder.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

###  Introduction

This exercise serves as an introduction to the phase vocoder algorithm. An audio signal is analyzed using the short-time Fourier transform (STFT) and then reconstructed by computing the inverse DFT of each frame and combining the frames with the overlap-add (OLA) method.
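
SciPy already ships an STFT/ISTFT pair, which can serve as an independent reference for the analysis/resynthesis loop built in this notebook. The sketch below only checks that a round trip through `scipy.signal.stft` and `scipy.signal.istft` reproduces the input. The helper name `scipy_roundtrip_error` and the demo signal are illustrative assumptions, and the names are chosen so that they do not overwrite the notebook's `x`, `fs`, `L` or `R`.

```python
import numpy as np
from scipy import signal

def scipy_roundtrip_error(x, fs, L=2048, R=512, win='hann'):
    """Hypothetical helper: max abs error of a SciPy STFT -> ISTFT round trip."""
    noverlap = L - R  # SciPy parametrizes the hop size through the overlap
    _, _, Zxx = signal.stft(x, fs=fs, window=win, nperseg=L, noverlap=noverlap)
    _, x_rec = signal.istft(Zxx, fs=fs, window=win, nperseg=L, noverlap=noverlap)
    # istft can return a few extra samples coming from the zero padding at the end
    return np.max(np.abs(x - x_rec[:x.size]))

# demo on a synthetic 1 s, 440 Hz sinusoid (values chosen only for illustration)
fs_demo = 44100
t_demo = np.arange(fs_demo) / fs_demo
x_demo = 0.5 * np.sin(2 * np.pi * 440 * t_demo)
print(scipy_roundtrip_error(x_demo, fs_demo))
```

An error close to machine precision means the window and hop size allow perfect reconstruction. The hand-written `analysis_STFT`/`synthesis_STFT` pair below should behave similarly in the fully overlapped part of the signal.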

Next, the audio signal to be processed is loaded.

In [None]:
# download audio file
wget.download('https://github.com/mrocamora/audio-dsp/blob/main/audio/singing_voice.wav?raw=true')

In [None]:
# read the audio file to process (from https://openairlib.net/)
filename = 'singing_voice.wav'
# filename = 'trumpet.wav'

# read audio file
fs, x = wavfile.read(filename)

# normalize maximum (absolute) amplitude
x = x / np.max(abs(x)) * 0.9
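
The cell above assumes a mono file. If you use your own audio (see the note at the top), keep in mind that `wavfile.read` returns a 2D array of shape `(num_samples, num_channels)` for multichannel files, usually with an integer sample type. The following sketch shows one way to convert such data to a normalized mono float signal. `to_mono_float` is a hypothetical helper, and averaging the channels is just one possible downmix.

```python
import numpy as np

def to_mono_float(x, peak=0.9):
    """Hypothetical helper: mono float64 signal normalized to the given peak."""
    # convert first, so that integer samples cannot overflow when averaging
    x = np.asarray(x, dtype=np.float64)
    # multichannel data has shape (num_samples, num_channels): average the channels
    if x.ndim == 2:
        x = x.mean(axis=1)
    max_abs = np.max(np.abs(x))
    # avoid dividing by zero for a silent signal
    if max_abs > 0:
        x = x / max_abs * peak
    return x
```

In the cell above, `x = to_mono_float(x)` would then replace the normalization line.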

In [None]:
# time corresponding to the audio signal
time_x = np.arange(0, x.size)/fs

# plot the audio signal waveform
plt.figure(figsize=(12,6))
ax1 = plt.subplot(2, 1, 1)
plt.plot(time_x, x)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')

In [None]:
ipd.Audio(x, rate=fs)

###   Part 1

In this first part we analyze the effect of the window and the conditions that the window and the hop size must satisfy for the **overlap-add** method to work.

Complete the code below by following these steps.

1. Properly accumulate the analysis windows according to the **overlap-add** algorithm.
2. Calculate the amplitude scaling factor $C$, for the case in which the overlapping of the windows is constant.
3. Change the time decimation factor (the hop size $R$) and analyze the results. In particular, try $R = \frac{1}{4}L$ and $R = \frac{3}{4}L$.
4. Analyze the results for other smoothing windows (e.g., Hamming, Blackman) using the same decimation factors.
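
Step 2 can be reasoned about with a short derivation. This sketch assumes the periodic Hann window, which is what `signal.windows.get_window('hann', L)` returns by default. If the overlap-add of the windows shifted by $R$ samples is constant, that constant equals its average over one hop. This gives a general expression for $C$:

$$\sum_{r} w[n - rR] = C \quad\Longrightarrow\quad C = \frac{1}{R}\sum_{n=0}^{L-1} w[n].$$

For the periodic Hann window $w[n] = \frac{1}{2} - \frac{1}{2}\cos\left(\frac{2\pi n}{L}\right)$ the cosine sums to zero over one period, so

$$\sum_{n=0}^{L-1} w[n] = \frac{L}{2}, \qquad C = \frac{L}{2R} = \begin{cases} 1, & R = \frac{L}{2},\\[2pt] 2, & R = \frac{L}{4}.\end{cases}$$

For $R = \frac{3}{4}L$ the formula still returns a number ($C = \frac{2}{3}$), but the overlap-add ripples, so no single scaling factor can make it flat.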


In [None]:
# length of the input signal
M = x.size

# length of the analysis window in samples
L = 2048

# hop size in samples.
R = int(L/2)

# total number of analysis frames
num_frames = int(np.floor((M-L)/R))

# analysis window
window = signal.windows.get_window('hann', L)

# overlap-add (OLA) of the analysis windows
olawin = np.zeros((num_frames-1)*R+L)

# for each analysis frame
for ind in range(num_frames):
    
    # initial index of current window
    n_ini = ind * R
    
    # overlap-add the window
    # olawin[??] =

# compute the amplitude scaling factor
# C = 


print("C = ", C)
print("max(olawin) = ", max(olawin))

# plot the analysis window
plt.figure(figsize=(12,6))
ax1 = plt.subplot(2, 1, 1)
plt.plot(window, 'r')
plt.ylabel('Amplitude')
plt.xlabel('Time (samples)')
plt.title('Analysis window')

# plot the overlap-add of the analysis windows
plt.figure(figsize=(12,6))
ax1 = plt.subplot(2, 1, 1)
plt.plot(olawin)
plt.xlabel('Time (samples)')
plt.ylabel('Amplitude')
plt.title('Overlap-add of the analysis windows')
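
To check your Part 1 observations, the following standalone sketch overlap-adds each window for the three hop sizes suggested above. For each case it reports $C = \frac{1}{R}\sum_n w[n]$, the range of the overlap-add away from the edges, and the result of `scipy.signal.check_COLA` (SciPy's constant-overlap-add test). The helper names `ola_of_windows` and `summarize_ola` are hypothetical, and the loop is one possible way to do the accumulation, not necessarily the intended solution.

```python
import numpy as np
from scipy import signal

def ola_of_windows(window, R, num_frames):
    """Hypothetical helper: overlap-add num_frames copies of `window`, R samples apart."""
    L = window.size
    ola = np.zeros((num_frames - 1) * R + L)
    for ind in range(num_frames):
        n_ini = ind * R
        ola[n_ini:n_ini + L] += window
    return ola

def summarize_ola(L=2048, windows=('hann', 'hamming', 'blackman'), num_frames=20):
    """For each window and hop size, return (name, R, C, min, max, is_cola)."""
    rows = []
    for win_name in windows:
        w = signal.windows.get_window(win_name, L)
        for R in (L // 4, L // 2, 3 * L // 4):
            ola = ola_of_windows(w, R, num_frames)
            # skip the ramps at both ends, where fewer windows overlap
            steady = ola[L:-L]
            # scaling factor = average of the overlap-add over one hop
            C = np.sum(w) / R
            # SciPy's own constant-overlap-add check (noverlap = L - R)
            is_cola = bool(signal.check_COLA(w, L, L - R))
            rows.append((win_name, R, C, steady.min(), steady.max(), is_cola))
    return rows

for name, R_hop, C_hop, lo, hi, is_cola in summarize_ola():
    print(f"{name:8s} R={R_hop:4d}  C={C_hop:.3f}  OLA in [{lo:.3f}, {hi:.3f}]  COLA={is_cola}")
```

When `COLA=True`, the printed minimum and maximum coincide with `C`. Otherwise the overlap-add ripples and dividing by any constant leaves an amplitude modulation in the output.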


### Part 2

The following code implements the analysis stage of the **phase-vocoder** algorithm, i.e. an STFT.

Complete the code below by following these steps.

1. Calculate the DFT of each signal frame after applying the smoothing (analysis) window.
2. Calculate the frequency value of each bin in radians.
3. Calculate the time instant (in samples) of each frame.
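
For reference, the three quantities can be written as follows. This is a sketch using $N = L$ DFT points, as in the code, with $r$ the frame index. Time-stamping each frame at its center follows the comment in the code; the frame start $rR$ would also work if used consistently.

$$X_r[k] = \sum_{n=0}^{L-1} w[n]\,x[rR + n]\,e^{-j2\pi kn/N}, \qquad k = 0, \dots, N-1$$

$$\omega_k = \frac{2\pi k}{N}, \qquad t_r = rR + \frac{L}{2}$$

The conversion $f_k = \omega_k f_s / (2\pi)$ used in Part 3 then gives the bin frequencies in Hz.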


In [None]:
def analysis_STFT(x, L=2048, R=256, win='hann'):
    """ compute the analysis phase of the phase vocoder, i.e. the STFT of the input audio signal
    
    Parameters
    ----------
    x : numpy array
        input audio signal (mono) as a numpy 1D array.
    L : int
        window length in samples.
    R : int
        hop size in samples.
    win : string
          window type as defined in scipy.signal.windows.    
        
    Returns
    -------
    X_stft : numpy array
             STFT of x as a numpy 2D array.
    omega_stft : numpy array
                 frequency values in radians.
    samps_stft : numpy array
                 time sample at the center of each frame.

    """
    
    # length of the input signal
    M = x.size
    
    # number of points to compute the DFT (FFT)
    N = L
    
    # analysis window
    window = signal.windows.get_window(win, L)
   
    # total number of analysis frames
    num_frames = int(np.floor((M - L) / R))

    # initialize stft
    X_stft = np.zeros((N, num_frames), dtype = complex)
    
    # process each frame
    for ind in range(num_frames):

        # initial and ending points of the frame
        n_ini = int(ind * R)
        n_end = n_ini + L

        # signal frame
        # xr = 

        # save DFT of the signal frame
        # X_stft[:, ind] = 
        
    # frequency values in radians    
    # omega_stft = 

    # time sample at the center of each frame
    # samps_stft = 
 
    return X_stft, omega_stft, samps_stft
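
If you want something to compare your `analysis_STFT` against, the following standalone sketch computes the same three outputs with `np.fft.fft`. It is one possible implementation, not the notebook's official solution. The name `stft_sketch` is hypothetical, and frames are time-stamped at their center (an assumed convention).

```python
import numpy as np
from scipy import signal

def stft_sketch(x, L=2048, R=256, win='hann'):
    """Hypothetical reference STFT: one windowed FFT per hop, N = L (no zero-padding)."""
    N = L
    window = signal.windows.get_window(win, L)
    # same frame count as the notebook; floor((M - L) / R) + 1 would keep the last full frame
    num_frames = int(np.floor((x.size - L) / R))
    X = np.zeros((N, num_frames), dtype=complex)
    for ind in range(num_frames):
        n_ini = ind * R
        # DFT of the windowed frame
        X[:, ind] = np.fft.fft(x[n_ini:n_ini + L] * window, N)
    # bin frequencies in radians per sample
    omega = 2 * np.pi * np.arange(N) / N
    # center of each frame, in samples
    samps = np.arange(num_frames) * R + L / 2
    return X, omega, samps

# usage: a cosine placed exactly on bin 16 should peak at bin 16 (illustrative values)
fs_demo, L_demo, R_demo = 8000, 256, 64
n_demo = np.arange(4000)
X_demo, omega_demo, samps_demo = stft_sketch(np.cos(2 * np.pi * 16 * n_demo / L_demo), L_demo, R_demo)
print(np.argmax(np.abs(X_demo[:L_demo // 2 + 1, 0])), omega_demo[16] * fs_demo / (2 * np.pi))
```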

### Part 3

Once the implementation of the `analysis_STFT` function is complete, follow the steps below.

1. Run the `analysis_STFT` function for different values of $L$ and $R$ and observe the effect on the spectrogram.
2. What does the $L$ parameter control? What does the $R$ parameter control?
3. What relationship must hold between $L$ and $R$? Why?
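
To reason about these questions numerically, it helps to translate $L$ and $R$ into physical units. The sketch below does this arithmetic. `stft_resolution` is a hypothetical helper, and the 44100 Hz rate in the usage line is an assumption; use the `fs` returned by `wavfile.read` for the actual file.

```python
def stft_resolution(L, R, fs, N=None):
    """Hypothetical helper: express STFT parameters in physical units."""
    N = L if N is None else N  # number of DFT points (N = L in this notebook)
    return {
        'bin_spacing_hz': fs / N,               # distance between DFT bins
        'window_duration_ms': 1000 * L / fs,    # time span of each frame
        'hop_duration_ms': 1000 * R / fs,       # time step between frames
        'overlap_percent': 100 * (1 - R / L),   # fraction of each frame shared with the next
    }

# usage with an assumed sampling rate of 44100 Hz
for key, value in stft_resolution(2048, 256, 44100).items():
    print(f"{key}: {value:.3f}")
```

The bin spacing is not the whole story. The frequency resolution you can actually achieve also depends on the window's main-lobe width, which for the Hann window is four bins from null to null.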

In [None]:
# window length in samples
L = 2048
# hop size in samples
R = 256

# compute STFT
X_stft, omega_stft, samps_stft = analysis_STFT(x, L, R, win='hann')

# max frequency index
ind_fmax = int(X_stft.shape[0]/2)+1
# frequency values (Hz)
stft_freqs = omega_stft[:ind_fmax]*fs/(2*np.pi)
# time values of the stft
stft_time = samps_stft/fs

plt.figure(figsize=(12,8))
ax1 = plt.subplot(2, 1, 1)
# a small offset avoids log10(0) = -inf in digitally silent frames
plt.pcolormesh(stft_time, stft_freqs, 20*np.log10(np.abs(X_stft[:ind_fmax, :]) + 1e-12), cmap='jet', shading='auto')
plt.ylabel('Frequency (Hz)')
plt.xlabel('Time (s)')


### Part 4

The following code implements the synthesis stage of the STFT.

Complete the code below by following these steps.

1. Reconstruct each signal frame by computing the inverse DFT of its spectrum.
2. Accumulate successive frames according to the **overlap-add** method.
3. Scale the resulting signal by the window compensation factor $C$.
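
The synthesis steps can be summarized as follows. In this sketch $R_s$ is the synthesis hop, which equals $R$ for plain reconstruction. Each frame is brought back to the time domain with the inverse DFT, and the frames are overlap-added and scaled by $C$:

$$y_r[n] = \frac{1}{N}\sum_{k=0}^{N-1} X_r[k]\,e^{j2\pi kn/N}, \qquad y[n] = \frac{1}{C}\sum_{r} y_r[n - rR_s]$$

Without modification and with $R_s = R$, we have $y_r[n] = w[n]\,x[rR + n]$, so $y[n] = x[n]\,\frac{1}{C}\sum_r w[n - rR] = x[n]$ wherever the window overlap is complete and constant. Since $x$ is real, any imaginary part left in $y_r$ is numerical round-off and can be discarded.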

In [None]:
def synthesis_STFT(X_stft, L=2048, R=256, win='hann'):
    """ compute the synthesis using the IFFT of each frame combined with overlap-add
    
    Parameters
    ----------
    X_stft : numpy array
             STFT of x as a numpy 2D array.
    L : int
        window length in samples.
    R : int
        hop size in samples.
    win : string
          window type as defined in scipy.signal.windows.    
        
    Returns
    -------
    x : numpy array
        output audio signal (mono) as a numpy 1D array.
        
    """
    
    # number of frequency bins
    N = X_stft.shape[0]

    # synthesis window (same type and length as the analysis window)
    window = signal.windows.get_window(win, L)

    # total number of frames
    num_frames = X_stft.shape[1]

    # initialize output signal in the time domain
    y = np.zeros(num_frames * R + L)
    
    # process each frame
    for ind in range(num_frames):

        # reconstructed signal frame
        # yr = 

        # initial and ending points of the frame
        # n_ini = 
        # n_end = 

        # overlap-add the signal frame
        # y[n_ini:n_end] = 
        
    # compute the amplitude scaling factor
    # C = 
    
    # compensate the amplitude scaling factor
    y /= C
    
    return y
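
The following self-contained sketch runs a full analysis/synthesis round trip, so you can check that the overlap-add reconstruction is exact where the window overlap is complete. As in Part 4, frames are not re-windowed at synthesis, and the output is only divided by $C$. The names `stft_frames`, `ola_synthesis` and `check_roundtrip` are hypothetical. The test signal is white noise generated inside the function, so the notebook's `x`, `L` and `R` are left untouched.

```python
import numpy as np
from scipy import signal

def stft_frames(x, L, R, win='hann'):
    """Hypothetical analysis helper: windowed FFT of each frame (N = L)."""
    window = signal.windows.get_window(win, L)
    num_frames = int(np.floor((x.size - L) / R))
    X = np.zeros((L, num_frames), dtype=complex)
    for ind in range(num_frames):
        X[:, ind] = np.fft.fft(x[ind * R:ind * R + L] * window)
    return X

def ola_synthesis(X, L, R, win='hann'):
    """Hypothetical synthesis helper: inverse FFT of each frame + overlap-add."""
    window = signal.windows.get_window(win, L)
    num_frames = X.shape[1]
    y = np.zeros(num_frames * R + L)
    for ind in range(num_frames):
        # for a real input the imaginary part is only numerical round-off
        yr = np.real(np.fft.ifft(X[:, ind]))
        n_ini = ind * R
        y[n_ini:n_ini + L] += yr
    # amplitude scaling factor of the window overlap-add (assumes constant overlap)
    C = np.sum(window) / R
    return y / C

def check_roundtrip(L=2048, R=256, num_samples=20000, seed=1):
    """Max abs reconstruction error where all overlapping windows are present."""
    x_noise = np.random.default_rng(seed).standard_normal(num_samples)
    X = stft_frames(x_noise, L, R)
    y = ola_synthesis(X, L, R)
    F = X.shape[1]
    lo, hi = L - R, F * R
    return np.max(np.abs(y[lo:hi] - x_noise[lo:hi]))

print(check_roundtrip())            # Hann, 87.5 % overlap: error near machine precision
print(check_roundtrip(1024, 768))   # Hann, 25 % overlap: overlap-add is not constant
```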

### Part 5

Once the implementation of the `synthesis_STFT` function is complete, follow the steps below.

1. Run the `analysis_STFT` function and then the `synthesis_STFT` function, using $L=2048$ and $R=256$ in both.
2. Evaluate the reconstruction visually (waveform) and aurally.
3. Run `analysis_STFT` with $L=2048$ and $R=256$, then run `synthesis_STFT` with $L=2048$ and $R=512$.
4. Before listening to the output, predict what kind of time-scale modification you expect: will the signal be shortened (time compression) or lengthened (time stretching)?
5. Evaluate the reconstruction visually (waveform) and aurally.
6. Repeat steps 3 to 5 using $L=2048$ and a synthesis hop size of $R=128$.
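
To check your prediction in step 4, you can compute the output length from the frame count. The sketch below assumes the same frame count and output-buffer size as `analysis_STFT` and `synthesis_STFT`. `stretch_summary` is a hypothetical helper, and the 5 s at 44100 Hz input in the usage lines is only an example.

```python
def stretch_summary(M, L, Ra, Rs):
    """Hypothetical helper: frames, output length and stretch factor for
    analysis hop Ra followed by overlap-add synthesis hop Rs."""
    num_frames = (M - L) // Ra          # frame count used by analysis_STFT
    out_len = num_frames * Rs + L       # buffer size allocated in synthesis_STFT
    return num_frames, out_len, Rs / Ra

# usage: a hypothetical 5 s input at 44100 Hz
M_demo = 5 * 44100
print(stretch_summary(M_demo, 2048, 256, 512))
print(stretch_summary(M_demo, 2048, 256, 128))
```

A factor $R_s/R_a > 1$ lengthens the signal and a factor below 1 shortens it. Because the frame phases are not adjusted here, consecutive frames no longer line up in phase, which typically produces audible artifacts. Handling this is the job of the phase-processing step of a full phase vocoder.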

In [None]:
# window length in samples
L = 2048
# analysis hop size in samples
R = 256

# compute STFT
X_stft, omega_stft, samps_stft = analysis_STFT(x, L, R, win='hann')

# synthesis hop size in samples (change to 512 or 128 for steps 3 to 6)
R = 256

# compute the synthesis from the STFT
y = synthesis_STFT(X_stft, L, R, win='hann')

In [None]:
# time corresponding to the audio signal
time_y = np.arange(0, y.size)/fs

# plot the audio signal waveform
plt.figure(figsize=(12,6))
ax1 = plt.subplot(2, 1, 1)
plt.plot(time_x, x)
plt.ylabel('Amplitude')
ax1 = plt.subplot(2, 1, 2)
plt.plot(time_y, y)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
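
Besides comparing the waveforms by eye, a signal-to-noise ratio gives a single number for the reconstruction quality. This only makes sense when the analysis and synthesis hops are equal. The sketch below is a hypothetical helper, not part of the original exercise. For the identity case, comparing over `start=L - R` and `stop=X_stft.shape[1] * R` restricts the measure to the region where the window overlap is complete.

```python
import numpy as np

def reconstruction_snr_db(x, y, start=0, stop=None):
    """Hypothetical helper: SNR (dB) of y as a reconstruction of x over [start, stop)."""
    # the signals may have different lengths; compare only the common part
    stop = min(x.size, y.size) if stop is None else stop
    ref = x[start:stop]
    err = ref - y[start:stop]
    noise_power = np.sum(err ** 2)
    # perfect reconstruction: infinite SNR
    if noise_power == 0:
        return np.inf
    return 10 * np.log10(np.sum(ref ** 2) / noise_power)
```

A correct implementation with a COLA window should give a very high SNR, limited only by floating-point round-off.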

In [None]:
ipd.Audio(x, rate=fs)

In [None]:
ipd.Audio(y, rate=fs)