#  <center> MUSIC INFORMATION RETRIEVAL</center>
## <center> Mel-filterbank</center>      

In [None]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

from scipy.io import wavfile

import IPython.display as ipd

**NOTE:** *The following cell is needed to download example audio files.*

In [None]:
!pip install wget

In [None]:
import wget

**NOTE:** *The following cell installs [librosa](https://librosa.org/).*

In [None]:
!pip install librosa

In [None]:
import librosa
import librosa.display

In [None]:
librosa.__version__

### About this notebook

We will study filter banks used to simulate the **frequency selectivity of the auditory system**, in particular its **non-linear distribution** of center frequencies and its **variable bandwidth**. This type of filter bank is frequently used as a first stage in audio processing tasks, because it yields a compact, perceptually motivated representation of the audio signal.

The task is to study the **mel-scale filter bank** implemented in the library [librosa](https://librosa.org/), analyzing its parameters and the role each one plays in the design. The filter bank is then applied to an audio signal, and the effect of the parameter values on the resulting spectral representation is analyzed.

### How to run the notebook
You can download the notebook and run it locally on your computer.

You can also run it in Google Colab by using the following link.

<table align="center">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/mrocamora/mir_course/blob/main/notebooks/MIR_course-mel_filterbank_example.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

### Get an audio file

In [None]:
# download audio file to use
wget.download('https://github.com/mrocamora/mir_course/blob/main/audio/superstition.wav?raw=true')

In [None]:
# read the audio file (by default, librosa.load resamples to 22050 Hz and mixes down to mono)
filename = 'superstition.wav'
y, sr = librosa.load(filename)

# play audio
ipd.Audio(y, rate=sr)

In [None]:
# plot audio signal
plt.figure(figsize=(12, 4))
librosa.display.waveshow(y, sr=sr)
plt.title('audio waveform')
plt.tight_layout()

### Part 1. Mel-filterbank

In what follows, a mel-scale filter bank is designed using [librosa](https://librosa.org/). Study the parameters that the design function (`librosa.filters.mel`) receives, analyze the result, and answer the following questions. It may be helpful to change the number of filters in the bank.

   1. How are the center frequencies of the filters distributed?
   2. What shape does the frequency response of each filter have?
   3. How does the bandwidth of the filters vary as the frequency increases?
   4. In what frequency regions does the filter bank have more frequency resolution?
   5. How does the gain of the filters vary with frequency? What type of normalization does it correspond to?

The following code defines the parameters of the filterbank.

In [None]:
# number of DFT points
n_fft = 2048

# number of mel-frequency bands
n_mels = 128

# maximum frequency for the analysis, in Hz
fmax = 4000
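
As a quick check of what these values imply, the sketch below computes the spacing between DFT bins and how many bins fall at or below `fmax`. It assumes a sampling rate of 22050 Hz, which is the default of `librosa.load` when no `sr` is given; the variable names (`sr_assumed`, `bin_spacing`, `n_bins_below_fmax`) are illustrative and not part of the notebook.

```python
import numpy as np

sr_assumed = 22050   # assumption: default sampling rate of librosa.load
n_fft = 2048         # number of DFT points, as above
fmax = 4000          # maximum analysis frequency in Hz, as above

# spacing between adjacent DFT bins, in Hz
bin_spacing = sr_assumed / n_fft

# frequency of each non-negative DFT bin, from 0 up to sr/2
bin_freqs = np.arange(n_fft // 2 + 1) * bin_spacing

# number of DFT bins at or below fmax
n_bins_below_fmax = int(np.sum(bin_freqs <= fmax))

print(f"bin spacing: {bin_spacing:.2f} Hz")
print(f"bins in [0, fmax]: {n_bins_below_fmax} of {bin_freqs.size}")
```

Comparing the number of DFT bins below `fmax` with `n_mels` gives a rough idea of how many bins each mel filter has to combine on average.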

Next, the filter bank is constructed, and the center frequencies and frequency-response magnitudes of its filters are plotted.

In [None]:
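# compute the mel filter bank: one row per filter, one column per DFT bin
melfb = librosa.filters.mel(sr=sr, n_fft=n_fft, fmax=fmax, n_mels=n_mels)
# frequency of each DFT bin, using the actual sampling rate of the signal
freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)

plt.figure(figsize=(12, 6))
# left: filter bank as an image (filter index vs. frequency)
plt.subplot(1, 2, 1)
librosa.display.specshow(melfb, x_axis='linear', sr=sr)
plt.xlim([0, fmax])
plt.ylabel('Mel filter')
plt.title('Mel filter bank')
# right: frequency response of each filter as a curve
plt.subplot(1, 2, 2)
plt.plot(freqs, melfb.T)
plt.title('Mel filter frequency responses')
plt.xlabel('Frequency [Hz]')
plt.ylabel('Weight')
plt.xlim([0, fmax])
plt.tight_layout()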
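
One way to explore questions 1, 3 and 5 numerically is to rebuild a small triangular filter bank by hand. The following is a minimal NumPy sketch, not librosa's implementation. It assumes the HTK mel formula $m = 2595 \log_{10}(1 + f/700)$ (librosa's default is the Slaney variant; the HTK one is selected with `htk=True`), and a 2/bandwidth area normalization in the spirit of `norm='slaney'`. The helper names `hz_to_mel_htk`, `mel_to_hz_htk` and `triangular_mel_bank` are hypothetical, and the sampling rate of 22050 Hz is an assumption.

```python
import numpy as np

def hz_to_mel_htk(f):
    """Hz -> mel using the HTK formula (assumed here; not librosa's default)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz_htk(m):
    """mel -> Hz, inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def triangular_mel_bank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Build a (n_mels, 1 + n_fft//2) bank of area-normalized triangles."""
    if fmax is None:
        fmax = sr / 2
    # n_mels + 2 band edges, equally spaced on the mel axis
    mel_edges = np.linspace(hz_to_mel_htk(fmin), hz_to_mel_htk(fmax), n_mels + 2)
    edges_hz = mel_to_hz_htk(mel_edges)
    bin_freqs = np.arange(n_fft // 2 + 1) * sr / n_fft

    bank = np.zeros((n_mels, bin_freqs.size))
    for i in range(n_mels):
        lo, ctr, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - lo) / (ctr - lo)    # slope up from lo to ctr
        falling = (hi - bin_freqs) / (hi - ctr)   # slope down from ctr to hi
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))

    # scale each triangle by 2 / bandwidth so that all have roughly equal area
    bank *= (2.0 / (edges_hz[2:] - edges_hz[:-2]))[:, np.newaxis]
    return bank, edges_hz

# small bank so the printed table stays readable
bank, edges_hz = triangular_mel_bank(sr=22050, n_fft=2048, n_mels=10, fmax=4000)
centers = edges_hz[1:-1]
bandwidths = edges_hz[2:] - edges_hz[:-2]

for c, bw, peak in zip(centers, bandwidths, bank.max(axis=1)):
    print(f"center {c:8.1f} Hz   bandwidth {bw:8.1f} Hz   peak gain {peak:.5f}")
```

The printed table can be compared with the plots above: the spacing of the centers, the growth of the bandwidths, and the change in peak gain are the quantities the questions ask about.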

### Part 2. Mel-spectrogram
In the next cell, a bank of mel filters with the same characteristics is applied to the spectrogram of the audio signal, to produce a mel spectrogram. Change the filter bank parameters and compare the original spectrogram and the mel spectrogram.

In particular, consider the following questions.

   1. What is the frequency resolution of the original spectrogram?
   2. In what frequency range does the mel spectrogram have the most resolution?

In [None]:
# 1. compute the power spectrogram from the STFT
Y = librosa.stft(y, win_length=1024, hop_length=512, n_fft=n_fft, window='hann')
S = np.abs(Y)**2

# 2. apply the mel filter bank to combine FFT bins into mel-frequency bins
# (sr is passed explicitly so the filter bank matches the signal's sampling rate)
M = librosa.feature.melspectrogram(S=S, sr=sr, n_mels=n_mels, fmax=fmax)

# 3. apply log to convert power to dB
M_log = librosa.power_to_db(M)
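
Steps 2 and 3 amount to a matrix product followed by a logarithm. The sketch below illustrates this with synthetic data and NumPy only. The rectangular `bank_toy` is a stand-in, not a mel filter bank, and `power_to_db_sketch` is a hypothetical helper that mimics what `librosa.power_to_db` does with its default arguments as I understand them (`ref=1.0`, `amin=1e-10`, `top_db=80.0`).

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_frames, n_bands = 1025, 50, 8

# synthetic power spectrogram, standing in for S = |STFT|**2
S_toy = rng.random((n_bins, n_frames)) ** 2

# toy filter bank: each band averages a contiguous block of DFT bins
bank_toy = np.zeros((n_bands, n_bins))
edges = np.linspace(0, n_bins, n_bands + 1).astype(int)
for b in range(n_bands):
    bank_toy[b, edges[b]:edges[b + 1]] = 1.0 / (edges[b + 1] - edges[b])

# step 2: every band is a weighted sum of DFT bins, for every frame
M_toy = bank_toy @ S_toy   # shape (n_bands, n_frames)

def power_to_db_sketch(P, amin=1e-10, top_db=80.0):
    """Convert power to dB relative to 1.0, limiting the dynamic range."""
    log_p = 10.0 * np.log10(np.maximum(amin, P))
    return np.maximum(log_p, log_p.max() - top_db)

# step 3: logarithmic compression
M_toy_db = power_to_db_sketch(M_toy)

print(M_toy.shape, M_toy_db.shape)
```

With the real data, the same product would use `melfb` and `S` from the cells above, provided both were computed with the same `n_fft`.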

In [None]:
# plot spectrogram and mel-spectrogram
plt.figure(figsize=(12, 8))
plt.subplot(2, 1, 1)
# linear-frequency power spectrogram in dB, shown up to fmax
librosa.display.specshow(librosa.power_to_db(S), sr=sr, hop_length=512,
                         x_axis='time', y_coords=freqs, y_axis='linear')
plt.ylim([0, fmax])
plt.title('spectrogram')
plt.subplot(2, 1, 2)
librosa.display.specshow(M_log, x_axis='time', y_axis='mel', sr=sr, fmax=fmax)
plt.title('mel-spectrogram')
plt.tight_layout()