`Description of the MFCC (Mel-Frequency Cepstral Coefficients) extraction technique:`

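`Preprocessing:` The audio signal is split into short, overlapping frames (in speech processing, commonly 20 to 40 ms long), short enough that the signal is roughly stationary within each frame. A window function such as the Hann or Hamming window is then applied to each frame. Tapering each frame toward zero at its edges reduces spectral leakage, so the spectrum computed in the next step reflects the frame's actual frequency content more accurately.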
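A minimal NumPy sketch of framing and windowing follows. The frame length (512 samples), hop length (256 samples) and 22050 Hz sample rate are illustrative assumptions, not values taken from this notebook, and unlike librosa the sketch does not pad the signal, so frame counts will differ slightly from librosa's.

```python
import numpy as np


def frame_and_window(signal, frame_length=512, hop_length=256):
    """Split a 1-D signal into overlapping frames and taper each with a Hann window.

    Assumes len(signal) >= frame_length; no padding is applied.
    """
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    # Row i of `idx` holds the sample indices covered by frame i
    idx = (np.arange(frame_length)[None, :]
           + hop_length * np.arange(n_frames)[:, None])
    frames = signal[idx]
    # np.hanning is zero at both ends, so each frame fades in and out
    return frames * np.hanning(frame_length)


sr = 22050                       # sample rate in Hz (assumed)
t = np.arange(sr) / sr           # one second of time stamps
signal = np.sin(2 * np.pi * 440 * t)
frames = frame_and_window(signal)
print(frames.shape)              # (85, 512)
```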

`Spectrum Calculation:` The Fast Fourier Transform (FFT) is applied to each windowed frame, and the squared magnitude of the result gives that frame's power spectrum, which describes how the frame's signal energy is distributed across frequencies.
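The sketch below, assuming a 512-point FFT and a 22050 Hz sample rate, computes the power spectrum of one windowed frame containing a 1000 Hz tone with `np.fft.rfft`. The strongest bin should land within one bin width (sr / n_fft, about 43 Hz) of the tone.

```python
import numpy as np


def power_spectrum(frames, n_fft=512):
    """Power spectrum of each row of `frames` (shape: n_frames x frame_length)."""
    # rfft returns only the non-negative frequency bins: n_fft // 2 + 1 of them
    spectrum = np.fft.rfft(frames, n=n_fft, axis=-1)
    # Power is the squared magnitude of each complex FFT coefficient
    return np.abs(spectrum) ** 2


sr, n_fft = 22050, 512
t = np.arange(n_fft) / sr
frame = np.sin(2 * np.pi * 1000 * t) * np.hanning(n_fft)   # one windowed frame
power = power_spectrum(frame[None, :], n_fft)
freqs = np.fft.rfftfreq(n_fft, d=1 / sr)                     # bin frequencies in Hz
peak_freq = freqs[np.argmax(power[0])]
print(power.shape)   # (1, 257)
print(peak_freq)     # within one bin width of 1000 Hz
```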

`Mel Filterbank:` The power spectrum is passed through a bank of overlapping triangular filters known as the Mel filterbank. The filters are spaced evenly on the Mel scale, a perceptual scale that approximates how the human auditory system perceives pitch, so they are narrow and closely spaced at low frequencies and progressively wider at high frequencies. Each filter sums the weighted power within its band, yielding one energy value per filter.
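As a hedged illustration, here is a NumPy construction of a triangular Mel filterbank. It uses the HTK-style formula mel = 2595 log10(1 + f/700); librosa's default mel filters use the Slaney variant of the scale and normalize each filter's area, so these weights will not match `librosa.filters.mel` exactly. The 40 filters, 512-point FFT and 22050 Hz rate are assumed values.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel formula (one of several mel-scale variants)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Return an (n_mels, n_fft // 2 + 1) matrix of triangular filter weights."""
    # n_mels + 2 edge points spaced evenly on the mel scale, from 0 Hz to Nyquist
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fbank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m:m + 3]
        # Weight rises linearly from `left` to `center`, then falls to `right`
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fbank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


fbank = mel_filterbank(n_mels=40, n_fft=512, sr=22050)
print(fbank.shape)          # (40, 257)

# Applying it: (n_frames, 257) power spectra -> (n_frames, 40) mel energies
power = np.ones((3, 257))   # placeholder power spectra
mel_energies = power @ fbank.T
print(mel_energies.shape)   # (3, 40)
```

The last filter spans many more FFT bins than the first, which is the widening with frequency described above.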

`Log Compression:` The logarithm of the filterbank energies is calculated to compress the dynamic range. This mimics the logarithmic perception of loudness by the human auditory system.
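A minimal sketch of the log step, expressed in decibels (10 log10, the convention behind librosa's `power_to_db`). The small floor `eps` that guards against log(0) on silent frames is an arbitrary assumption here.

```python
import numpy as np


def log_compress(mel_energies, eps=1e-10):
    """Convert filterbank energies to decibels: 10 * log10(energy)."""
    # Clamp to eps so that zero energy maps to a finite value instead of -inf
    return 10.0 * np.log10(np.maximum(mel_energies, eps))


energies = np.array([1e-6, 1.0, 1e2, 0.0])
print(log_compress(energies))   # approximately [-60, 0, 20, -100]
```

Energies spanning eight orders of magnitude (1e-6 to 1e2) collapse into an 80 dB range, which is the dynamic-range compression this step is meant to provide.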

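`Discrete Cosine Transform (DCT):` A Discrete Cosine Transform is applied to the log filterbank energies of each frame. Because neighboring Mel filters overlap, these energies are strongly correlated; the DCT decorrelates them and concentrates the smooth overall shape of the spectrum (its envelope) into the first few coefficients, giving a compact representation of the spectral envelope of the audio signal.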
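The DCT step can be sketched in plain NumPy as an orthonormal DCT-II, the same convention as SciPy's type-2 DCT with `norm='ortho'`, which is librosa's default for `mfcc`. The check on a flat log-spectrum illustrates the compaction: a spectrum with no shape at all puts all of its energy in coefficient 0. The 40-band input size is an assumption.

```python
import numpy as np


def dct_ii(x):
    """Orthonormal DCT-II along the last axis of `x`."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]          # output coefficient index
    i = np.arange(n)[None, :]          # input sample index
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))   # basis[k, i]
    # Orthonormal scaling: sqrt(1/n) for k = 0, sqrt(2/n) otherwise
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


flat = np.ones(40)                     # a flat log-mel spectrum (40 bands)
coeffs = dct_ii(flat)
print(np.round(coeffs[:4], 6))         # only coeffs[0] (= sqrt(40)) is non-zero

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 40))           # five frames of log-mel energies
# Orthonormal transform: per-frame energy is preserved
print(np.allclose((x ** 2).sum(axis=1), (dct_ii(x) ** 2).sum(axis=1)))   # True
```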

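`Feature Extraction:` The resulting DCT coefficients are the MFCCs. Typically only the first 12 to 13 coefficients are kept: the low-order coefficients describe the overall spectral envelope, which carries most of the information relevant to tasks such as speech recognition, while the higher-order ones describe fine spectral detail. More can be retained when needed, as with `n_mfcc=40` in the code below.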
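To illustrate why keeping only the low-order coefficients retains the spectral envelope, this hypothetical example builds a 40-band log-mel vector from a smooth envelope plus a fast ripple, keeps the first 13 DCT coefficients, and inverts the transform. The synthetic signal and the choice of 13 coefficients are assumptions for demonstration; the orthonormal DCT matrix is built by hand so the example needs only NumPy.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix: coeffs = D @ x, and x = D.T @ coeffs."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    d[0] /= np.sqrt(2.0)               # row 0 scaled to sqrt(1/n)
    return d


n_bands, n_keep = 40, 13
D = dct_matrix(n_bands)
bands = np.arange(n_bands)

# Smooth envelope: a constant plus the slowest DCT cosine (k = 1)
envelope = 5.0 + 3.0 * np.cos(np.pi * (2 * bands + 1) / (2 * n_bands))
ripple = 0.5 * (-1.0) ** bands         # fine, band-to-band detail
log_mel = envelope + ripple

mfcc = D @ log_mel                     # all 40 cepstral coefficients
mfcc_kept = mfcc[:n_keep]              # keep only the low-order ones

# Invert using only the kept coefficients (the rest treated as zero)
smoothed = D[:n_keep].T @ mfcc_kept

print(np.linalg.norm(smoothed - envelope))   # small: the envelope survives
print(np.linalg.norm(smoothed - log_mel))    # large: the ripple is dropped
```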

In [7]:
import librosa
import librosa.display
import IPython.display as ipd
import os
import numpy as np

In [8]:
for file in os.listdir('audio data/'):
    print(file)

test3.wav
test1.wav
test4.wav
test2.wav


In [23]:
ipd.Audio('audio data/test5.wav')


In [24]:
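def feature_extraction(file_path):
    # Load the audio as a mono float array; librosa resamples to its default 22050 Hz.
    # res_type='kaiser_fast' picks a faster, lower-quality resampler (resampy backend).
    x, sample_rate = librosa.load(file_path, res_type='kaiser_fast')
    # Compute 40 MFCCs per frame: an array of shape (40, n_frames)
    mfccs = librosa.feature.mfcc(y=x, sr=sample_rate, n_mfcc=40)
    # Average over frames (the time axis) to get one 40-value vector per file
    mfcc = np.mean(mfccs.T, axis=0)
    return mfcc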
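`librosa.feature.mfcc` returns a matrix of shape (n_mfcc, n_frames), so clips of different lengths yield matrices of different widths. The sketch below uses random placeholder matrices instead of real audio (an assumption for illustration) to show that averaging over frames, as `feature_extraction` does, gives every clip the same 40-value vector, and that `np.mean(m.T, axis=0)` is the same as `np.mean(m, axis=1)`.

```python
import numpy as np

rng = np.random.default_rng(42)
# Placeholder MFCC matrices (n_mfcc=40) for a short and a long clip
short_clip = rng.normal(size=(40, 50))
long_clip = rng.normal(size=(40, 400))

summaries = []
for m in (short_clip, long_clip):
    summary = np.mean(m.T, axis=0)          # same expression as in feature_extraction
    assert np.allclose(summary, np.mean(m, axis=1))
    summaries.append(summary)
    print(m.shape, '->', summary.shape)     # (40, n_frames) -> (40,)
```

The fixed length makes the vectors easy to compare or stack, at the cost of discarding how the coefficients change over time within each clip.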

In [25]:
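features = {}
directory = 'audio data/'
for audio in os.listdir(directory):
    # Skip non-audio entries such as hidden system files
    if not audio.lower().endswith('.wav'):
        continue
    audio_path = os.path.join(directory, audio)
    # Map each file path to its 40-value mean-MFCC vector
    features[audio_path] = feature_extraction(audio_path)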


In [26]:
# audio_path still holds the last path visited by the loop above (os.listdir order is arbitrary)
features[audio_path]

array([-4.7075894e+02,  9.0860954e+01,  4.5120220e+00, -6.1391897e+00,
       -1.7515783e+00,  1.3309403e+01, -1.8820751e+01,  7.8064570e+00,
       -1.3678240e+01,  3.6249387e+00, -1.3324854e+01, -3.3700056e+00,
       -4.8521972e+00,  1.5972348e+00,  1.1914641e+00,  5.0755329e+00,
       -4.6084862e+00,  1.3988231e+01, -2.4377925e+00,  7.0714647e-01,
        1.4936173e+00,  1.7917719e+00, -1.3505411e+00,  4.7431974e+00,
       -5.0176950e+00,  5.1214480e+00, -4.4428739e-01,  2.2390256e+00,
       -9.6838605e-01,  2.6494498e+00,  3.5272431e+00,  7.1048231e+00,
        8.9582357e+00,  7.7176499e+00,  8.0310383e+00,  6.1320596e+00,
        7.3456974e+00,  7.6280413e+00,  8.8363714e+00,  9.9106035e+00],
      dtype=float32)