### **Intro**
We start by preprocessing the data for an emotion classifier: a model that takes an audio clip and labels it with an emotion.

In [None]:
# Libraries to install
!python -m pip install librosa
# !python -m pip install matplotlib==3.3.4
# !python -m pip install datasets[audio]

In [31]:
import os
import librosa
import matplotlib.pyplot as plt
import librosa.display as display
import pandas as pd

# Let's analyze some of our audio files
audio_folder_name = os.path.join("data", "CREMA")  # avoids the invalid "\C" escape and works on any OS
audio_files = os.listdir(audio_folder_name)

for file_name in audio_files[:5]:
    file_path = os.path.join(audio_folder_name, file_name)
    y, sr = librosa.load(file_path, sr=None)

    # Basic info: duration, sample count, and sampling rate
    print('Length of the CREMA audio files: {:.2f} seconds'.format(len(y) / sr))
    print('Number of samples: {}'.format(len(y)))
    print('Sampling rate: {} Hz'.format(sr))

    # Extracting features
    mfccs = librosa.feature.mfcc(y=y, sr=sr)
    spectral_centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    spectral_contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    spectral_rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
    zero_crossing_rate = librosa.feature.zero_crossing_rate(y)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    rms_energy = librosa.feature.rms(y=y)

    # Displaying some of the extracted features
    print("MFCCs:", mfccs)
    print("Spectral Centroid:", spectral_centroid)
    print("Chroma:", chroma)
    print("Spectral Contrast:", spectral_contrast)
    print("Spectral Rolloff:", spectral_rolloff)
    print("Zero-Crossing Rate:", zero_crossing_rate)
    print("Tempo:", tempo)
    print("RMS Energy:", rms_energy)
    print()

    # Optional: inspect the MFCC matrix as a DataFrame
    # (rows are coefficients, columns are frames)
    # df = pd.DataFrame(mfccs)
    # print(df.head())

Length of the CREMA audio files: 2.28 seconds
Number of samples: 36409
Sampling rate: 16000 Hz
MFCCs: [[-589.87604   -507.21228   -465.82687   ... -451.92017   -457.81427
  -491.41537  ]
 [  47.88035    106.786606   109.389404  ...  112.87541    110.077614
   107.29699  ]
 [  34.01563     33.81172     34.28376   ...   31.895073    25.271503
    27.211044 ]
 ...
 [  -1.3662558   -3.268082    -3.9222825 ...   -3.2544682   -2.8602471
    -2.1259599]
 [  -2.631751    -8.814243    -5.8102837 ...   -4.3985515   -9.950932
    -8.635396 ]
 [  -4.891833    -8.876873    -7.909541  ...   -8.232867    -4.9326334
    -8.461327 ]]
Spectral Centroid: [[1319.68038722 1173.84607922 1190.91817953 1236.86489799 1107.05204106
  1149.2114114  1127.55069251 1067.40546312 1014.1991914  1139.58793025
  1111.63064588 1038.30963576 2641.60362845 2060.85947242 1809.86920294
  1622.92763157 1385.80735335 1122.0516368   978.25309712 1240.26969354
  2379.00859533 2918.66930332 1897.33390363 1443.68146459 1278.50835
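
The numbers printed for the first clip can be checked by hand. The sketch below redoes the duration calculation from the loop and estimates how many frames (columns) each feature matrix has. It assumes librosa's default `hop_length=512` with centered framing, since the calls above do not override those settings.

```python
# Worked arithmetic for the first CREMA clip printed above.
n_samples = 36409  # "Number of samples" from the output
sr = 16000         # "Sampling rate" from the output, in Hz

# Duration is the sample count divided by the samples per second.
duration_s = n_samples / sr

# Assumption: librosa defaults (hop_length=512, center=True), which give
# one frame per full hop plus one.
hop_length = 512
n_frames = 1 + n_samples // hop_length

print(f"Duration: {duration_s:.2f} s")               # Duration: 2.28 s
print(f"Frames per feature matrix: {n_frames}")      # Frames per feature matrix: 72
```

Under these assumptions every frame-based feature above (MFCCs, spectral centroid, chroma, and so on) should have 72 columns for this clip, so longer or shorter clips will produce matrices of different widths.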

### About Datasets
**TESS**: Sampling rate of 24 414 Hz, audio files are between 1 and 2 seconds long

**SAVEE**: Sampling rate of 48 000 Hz, audio files are between 3 and 4 seconds long

**RAVDESS**: Sampling rate of 48 000 Hz, audio files are also between 3 and 4 seconds long

**CREMA**: Sampling rate of 16 000 Hz, audio files are about 1 to 2 seconds long, although the first clip loaded above runs 2.28 seconds
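
Because the four datasets use different sampling rates, features extracted from them are only comparable once the audio shares one rate. The following is a minimal sketch of one way to do that, not necessarily the approach used later in this notebook. `TARGET_SR`, `resampled_length`, and `load_at_common_rate` are hypothetical names, and picking 16 000 Hz (the lowest rate listed) is an assumption. The only librosa feature relied on is that `librosa.load` resamples to the `sr` it is given.

```python
import math

# Sampling rates listed above, in Hz.
DATASET_SR = {"TESS": 24_414, "SAVEE": 48_000, "RAVDESS": 48_000, "CREMA": 16_000}

# Assumption: use the lowest rate so no dataset has to be upsampled.
TARGET_SR = min(DATASET_SR.values())


def resampled_length(n_samples, orig_sr, target_sr=TARGET_SR):
    """Approximate sample count after resampling (rounded up)."""
    return math.ceil(n_samples * target_sr / orig_sr)


def load_at_common_rate(file_path, target_sr=TARGET_SR):
    """Load one file and let librosa resample it to target_sr."""
    import librosa  # imported here so the helper above runs without librosa

    y, sr = librosa.load(file_path, sr=target_sr)
    return y, sr


# One second of audio from each dataset, expressed at the common rate.
for name, rate in DATASET_SR.items():
    print(name, rate, "->", resampled_length(rate, rate), "samples per second")
```

Passing `sr=None`, as the loop earlier in this notebook does, keeps each file's native rate instead, which is useful for inspection but leaves the datasets inconsistent with one another.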