# <center>MUSIC INFORMATION RETRIEVAL</center>
## <center>Mel-frequency cepstral coefficients (MFCCs)</center>      

In [None]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

import librosa
import librosa.display

import IPython.display as ipd

librosa.__version__

**NOTE:** *The following cell is needed to download example audio files.*

In [None]:
!pip install wget

In [None]:
import wget

### About this notebook

We will walk through the computation of Mel-frequency cepstral coefficients (MFCCs). We will also verify that these features capture the overall shape of the spectrum by reconstructing an audio signal from the MFCCs and comparing it with the original.

### How to run the notebook
You can download the notebook and run it locally on your computer.

You can also run it in Google Colab by using the following link.

<table align="center">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/mrocamora/mir_course/blob/main/notebooks/MIR_course-MFCC_example.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

### Part 1 - computing MFCCs
The following steps are needed to compute the MFCCs.

1. Compute a power spectrogram
2. Apply a Mel-filterbank to get a Mel-spectrogram
3. Apply log to convert power to dB
4. Compute the Discrete Cosine Transform (DCT)

The code also computes and plots the Mel filter bank.
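
To make the four steps concrete, here is a minimal NumPy-only sketch of the same pipeline applied to a synthetic tone. It is an illustration under stated assumptions, not a replacement for the librosa calls used in this notebook: the helper names (`hz_to_mel`, `mel_to_hz`, `mel_filterbank`, `dct_ii`) are hypothetical, the mel scale uses the HTK-style formula, the triangular filters are not area-normalized, frames are not centered or padded, and no dynamic-range clipping is applied in the dB step. The values it produces will therefore differ numerically from librosa's defaults.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (an assumption of this sketch)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmax):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii(x, n_out):
    """Orthonormal DCT-II along axis 0, keeping the first n_out coefficients."""
    n = x.shape[0]
    k = np.arange(n_out)[:, None]
    t = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * t + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return basis @ x

# illustrative parameters (not the ones used in the notebook cells)
sr, n_fft, hop, n_mels, n_mfcc, fmax = 22050, 2048, 512, 40, 13, 4000
y = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)  # one-second synthetic tone

# 1. power spectrogram: Hann-windowed frames, FFT, squared magnitude
window = np.hanning(n_fft)
n_frames = 1 + (len(y) - n_fft) // hop
frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1)
S = np.abs(np.fft.rfft(frames, axis=0)) ** 2

# 2. mel spectrogram: combine FFT bins into mel bands
M = mel_filterbank(sr, n_fft, n_mels, fmax) @ S

# 3. log compression: power to dB, with a floor to avoid log(0)
M_log = 10.0 * np.log10(np.maximum(M, 1e-10))

# 4. DCT across the mel bands, keeping the first n_mfcc coefficients
mfccs = dct_ii(M_log, n_mfcc)
print(mfccs.shape)  # (n_mfcc, n_frames)
```

Each column of `mfccs` summarizes one frame: the low-order coefficients describe slow variations of the log-mel spectrum across bands, which is why they are read as a description of the spectral shape.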

In [None]:
# download audio file
wget.download('https://github.com/mrocamora/mir_course/blob/main/audio/superstition.wav?raw=true')

In [None]:
# read the audio file
filename = 'superstition.wav'

y, sr = librosa.load(filename)

In [None]:
# plot audio signal
plt.figure(figsize=(12,8))
ax1 = plt.subplot(2, 1, 1)
librosa.display.waveshow(y, sr=sr)
plt.title('audio waveform')
plt.tight_layout()

In [None]:
ipd.Audio(y, rate=sr)

In [None]:
# 1. Compute power spectrogram from STFT
n_fft = 2048
Y = librosa.stft(y, win_length=1024, hop_length=512, n_fft=n_fft, window='hann')
S = np.abs(Y)**2

# 2. apply mel-filterbank to combine FFT bins into Mel-frequency bins
# number of mel-frequency bands
n_mels = 128
# maximum frequency for the analysis
fmax = 4000 
# compute mel-spectrogram
M = librosa.feature.melspectrogram(S=S, sr=sr, n_mels=n_mels, fmax=fmax)

# 3. apply log to convert power to dB
M_log = librosa.power_to_db(M)

# 4. apply DCT and return first n_mfcc coefficients
# number of MFCC coefficients 
n_mfcc = 20
# compute MFCCs from mel-spectrogram
mfccs = librosa.feature.mfcc(S=M_log, n_mfcc=n_mfcc)

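# NOTE: librosa.feature.mfcc can run all of the steps above directly from the signal;
# the analysis parameters must be passed explicitly to match the step-by-step result
# mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, n_fft=n_fft, hop_length=512, win_length=1024, n_mels=n_mels, fmax=fmax)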

In [None]:
# compute and plot the Mel filter bank
melfb = librosa.filters.mel(sr=sr, n_fft=n_fft, fmax=fmax, n_mels=n_mels)
freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
librosa.display.specshow(melfb, x_axis='linear')
plt.xlim([0, fmax])
plt.ylabel('Mel filter')
plt.title('Mel filter bank')
plt.subplot(1, 2, 2)
plt.plot(freqs, melfb.T)
plt.title('Mel filter bank')
plt.xlabel('Frequency [Hz]')
plt.xlim([0, fmax])
plt.tight_layout()

In [None]:
# plot mel-spectrogram and MFCCs
ind_max = np.argmax(freqs > fmax)
plt.figure(figsize=(12, 8))
plt.subplot(2, 1, 1)
librosa.display.specshow(librosa.power_to_db(S[:ind_max, :]), y_coords=freqs[:ind_max],
                         y_axis='linear')
plt.title('spectrogram')
plt.subplot(2, 1, 2)
librosa.display.specshow(M_log, x_axis='time', y_axis='mel', sr=sr, fmax=fmax)
plt.title('mel-spectrogram')
plt.tight_layout()

In [None]:
# plot MFCCs
ind = [4, 14]

plt.figure(figsize=(12, 8))
plt.subplot(2, 1, 1)
librosa.display.specshow(mfccs, x_axis='time')
plt.title('MFCC (coefficients ' + str(0) + ' to ' + str(n_mfcc - 1) + ')')
plt.tight_layout()

plt.subplot(2, 1, 2)
librosa.display.specshow(mfccs[ind[0]:ind[1], :], x_axis='time')
plt.title('MFCC (coefficients ' + str(ind[0]) + ' to ' + str(ind[1] - 1) + ')')
plt.tight_layout()

### Part 2 - audio signal reconstruction from the MFCCs

Finally, the process is inverted in order to reconstruct an audio signal from the MFCCs.

To do that, we follow these steps. 

1. First, we get the Mel-spectrogram by applying the inverse DCT and inverting the logarithm. 
2. Then, we change the frequency mapping from Mel to linear, in order to get the traditional spectrogram. 
3. Finally, an audio signal is obtained from the spectrogram, using a fast implementation of the Griffin-Lim algorithm [1][2].

The reconstructed audio signal shows that the MFCCs capture an estimate of the spectral envelope of the original signal: the coarse spectral shape is kept, while fine spectral detail is lost.

[1] D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. ASSP, vol.32, no.2, pp.236–243, Apr. 1984.

[2] N. Perraudin, P. Balazs, and P. L. Søndergaard, “A fast Griffin-Lim algorithm,” IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 1–4, Oct. 2013.
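
The first step, and the reason the result is a spectral envelope, can be illustrated on a single frame. The sketch below is a NumPy-only illustration with made-up data: `frame_db` is a hypothetical log-mel frame built from a slow envelope plus a fast ripple standing in for harmonic detail, and `dct_matrix` is a helper written for this example. Keeping only the first `n_mfcc` DCT coefficients and inverting the transform should recover something close to the slow envelope, with the ripple largely removed.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of shape (n, n); its transpose is its inverse."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * t + 1) / (2 * n))
    D[0] /= np.sqrt(2.0)
    return D

n_mels, n_mfcc = 128, 20
band = np.arange(n_mels)

# hypothetical log-mel frame in dB: slow envelope plus fast ripple
envelope = -20.0 - 30.0 * band / n_mels + 10.0 * np.cos(2 * np.pi * band / n_mels)
ripple = 6.0 * np.cos(2 * np.pi * 20 * band / n_mels)
frame_db = envelope + ripple

D = dct_matrix(n_mels)
mfcc = (D @ frame_db)[:n_mfcc]             # forward DCT, truncated to n_mfcc coefficients
smoothed_db = D[:n_mfcc].T @ mfcc          # inverse DCT with the discarded coefficients set to zero
mel_power = 10.0 ** (smoothed_db / 10.0)   # invert the dB mapping to get mel power

# RMS distance (in dB) of the reconstruction to the envelope and to the full frame
err_env = np.sqrt(np.mean((smoothed_db - envelope) ** 2))
err_full = np.sqrt(np.mean((smoothed_db - frame_db) ** 2))
print(err_env, err_full)
```

In this toy setting the reconstruction sits much closer to the envelope than to the original frame, which mirrors what is heard in the reconstructed audio: the timbre-like spectral shape survives, while pitch-related detail does not.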

In [None]:
# 1. Invert Mel-frequency cepstral coefficients to approximate a Mel power spectrogram.
# Inverse DCT is applied to the MFCCs followed by dB to power spectrum mapping. 
W = librosa.feature.inverse.mfcc_to_mel(mfccs, n_mels=n_mels)

# 2. Approximate STFT magnitude from a Mel power spectrogram.
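# The Mel filter bank parameters must match the ones used in the analysis (fmax in particular).
X = librosa.feature.inverse.mel_to_stft(W, sr=sr, n_fft=n_fft, fmax=fmax)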

# 3. Approximate magnitude spectrogram inversion using the “fast” Griffin-Lim algorithm. 
x = librosa.griffinlim(X, n_fft=n_fft, hop_length=512, win_length=1024)
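
For intuition about step 3, the following is a minimal NumPy-only sketch of the classic Griffin-Lim iteration [1], not the accelerated variant [2] used by `librosa.griffinlim`. The `stft`, `istft` and `griffin_lim` helpers are hypothetical simplifications written for this example (no frame centering, Hann window, weighted overlap-add), and the test signal is synthetic. The idea is to alternate between imposing the known magnitude and re-estimating the phase from the resynthesized signal.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns an array of shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, n_fft=512, hop=128):
    """Weighted overlap-add inverse of stft()."""
    window = np.hanning(n_fft)
    n_frames = X.shape[0]
    length = n_fft + hop * (n_frames - 1)
    x = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    for i in range(n_frames):
        x[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    return x / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=50, n_fft=512, hop=128, seed=0):
    """Estimate a signal whose STFT magnitude approximates mag."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    errors = []
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)   # impose the target magnitude, resynthesize
        X = stft(x, n_fft, hop)              # re-analyze the resynthesized signal
        errors.append(np.linalg.norm(np.abs(X) - mag) / np.linalg.norm(mag))
        phase = np.exp(1j * np.angle(X))     # keep only the new phase estimate
    return istft(mag * phase, n_fft, hop), errors

# synthetic test signal: two sinusoids, 8192 samples
sr = 8000
t = np.arange(8192) / sr
y = np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 1200.0 * t)

mag = np.abs(stft(y))            # the phase is discarded here
x_hat, errors = griffin_lim(mag)
print(errors[0], errors[-1])     # relative magnitude error at first and last iteration
```

The relative error between the target magnitude and the magnitude of the resynthesized signal is expected to shrink over the iterations. The recovered waveform generally does not match the original sample by sample, since only the magnitude is constrained.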

In [None]:
# plot audio signal
plt.figure(figsize=(12,8))
ax1 = plt.subplot(2, 1, 1)
librosa.display.waveshow(x, sr=sr)
plt.title('audio waveform')
ax2 = plt.subplot(2, 1, 2)
librosa.display.specshow(librosa.power_to_db(X[:ind_max, :]**2), y_coords=freqs[:ind_max],
                         y_axis='linear', x_axis='time')
plt.title('spectrogram')
plt.tight_layout()

In [None]:
ipd.Audio(x, rate=sr)