# 1 Introduction

## 1.1 Mel-frequency cepstrum <a href='https://en.wikipedia.org/wiki/Mel-frequency_cepstrum'>Link</a>

In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
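
As a rough illustration of that definition, the sketch below takes one frame's mel-band energies, log-compresses them, and applies a cosine transform. It uses an orthonormal DCT-II, which is a common choice. It assumes the mel-band energies are already available. `dct2_ortho` and `cepstrum_from_mel_energies` are hypothetical helpers, not librosa functions. This sketch uses the natural log, whereas librosa's MFCCs are computed from a decibel-scaled mel spectrogram.

```python
import numpy as np


def dct2_ortho(x: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II of a 1-D array (hypothetical helper)."""
    n = x.shape[0]
    k = np.arange(n)[:, None]            # output (cepstral) index
    i = np.arange(n)[None, :]            # input (mel band) index
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)          # orthonormal scaling for k = 0
    return scale * (basis @ x)


def cepstrum_from_mel_energies(mel_energies, n_coeffs=13, eps=1e-10):
    """Hypothetical helper: log of mel-band energies, then a cosine transform."""
    log_energies = np.log(np.asarray(mel_energies, dtype=float) + eps)  # eps avoids log(0)
    return dct2_ortho(log_energies)[:n_coeffs]


# A perfectly flat spectrum over 40 mel bands (every log energy is 1)
coeffs = cepstrum_from_mel_energies(np.full(40, np.e), n_coeffs=13)
print(coeffs[:3])  # coeffs[0] ≈ sqrt(40) ≈ 6.3246; the rest are ≈ 0
```

A flat spectrum lands entirely in coefficient 0. This matches the usual reading of the 0th coefficient as overall log energy, while the higher coefficients describe the shape of the spectral envelope.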

Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC.[1] They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum"). The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC, the frequency bands are equally spaced on the mel scale. This approximates the human auditory system's response more closely than the linearly spaced frequency bands used in the normal cepstrum. This frequency warping can allow for a better representation of sound, for example in audio compression.
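
To make "equally spaced on the mel scale" concrete, the sketch below places band edges uniformly in mel and converts them back to Hz. It assumes the HTK-style formula $m = 2595 \log_{10}(1 + f/700)$. librosa's default (`htk=False`) uses the Slaney variant instead, which is linear below 1 kHz, so its exact numbers differ. The helper names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel conversion (one of several mel-scale conventions)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_band_edges(fmin, fmax, n_bands):
    """Band edges in Hz, equally spaced on the mel scale.

    n_bands triangular filters need n_bands + 2 edge points.
    """
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_bands + 2)
    return mel_to_hz(mels)


edges = mel_band_edges(0.0, 8000.0, 10)
print(np.round(np.diff(edges), 1))  # band widths in Hz, growing with frequency
```

Each band is wider in Hz than the one below it. This mirrors the ear's coarser frequency resolution at high frequencies, whereas a linearly spaced filterbank would give every band the same width.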

## 1.2 Chroma <a href='https://en.wikipedia.org/wiki/Chroma_feature'>Link</a>

In music, the term chroma feature or chromagram closely relates to the twelve different pitch classes. Chroma-based features, also referred to as "pitch class profiles", are a powerful tool for analyzing music whose pitches can be meaningfully categorized (often into twelve categories) and whose tuning approximates the equal-tempered scale. One main property of chroma features is that they capture the harmonic and melodic characteristics of music while being robust to changes in timbre and instrumentation.
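
The core idea can be sketched in a few lines. The sketch maps each frequency to its nearest equal-tempered pitch class and sums the energy per class, so notes an octave apart land in the same bin. It assumes a reference of A4 = 440 Hz, and the function names are hypothetical. Real implementations such as `librosa.feature.chroma_stft` use weighted chroma filter banks and can estimate tuning, so this is only an illustration.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, a4=440.0):
    """Nearest equal-tempered pitch class (0 = C, 9 = A) of a positive frequency."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / a4))
    return (semitones_from_a4 + 9) % 12  # A sits 9 semitones above C


def chroma_vector(freqs, energies, a4=440.0):
    """Fold (frequency, energy) pairs into 12 pitch-class bins, scaled so the max is 1."""
    chroma = [0.0] * 12
    for f, e in zip(freqs, energies):
        if f > 0:  # skip the DC component, which has no pitch
            chroma[pitch_class(f, a4)] += e
    peak = max(chroma)
    return [c / peak for c in chroma] if peak > 0 else chroma


# C4, E4, G4 and C5: a C major triad with the root doubled an octave higher
triad = chroma_vector([261.63, 329.63, 392.00, 523.25], [1.0, 1.0, 1.0, 1.0])
print(dict(zip(PITCH_CLASSES, triad)))  # C -> 1.0, E -> 0.5, G -> 0.5, all others 0.0
```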

## 1.3 Mel Spectrogram Frequency <a href='https://librosa.org/doc/0.7.2/generated/librosa.feature.melspectrogram.html'>Link</a>

`librosa.feature.melspectrogram` computes a mel-scaled spectrogram in one of two ways.

If a spectrogram input `S` is provided, it is mapped directly onto the mel basis `mel_f` by `mel_f.dot(S)`.

If a time-series input `y` (with sample rate `sr`) is provided, its magnitude spectrogram `S` is computed first and then mapped onto the mel scale by `mel_f.dot(S**power)`. The default `power=2` squares the magnitudes, so the mel filters are applied to a power spectrogram.

# 2 Data Cleaning

## 2.1 Importing Libraries

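```python
import librosa                   # audio analysis and feature extraction
import soundfile                 # reading audio files
import os, glob, pickle          # file paths, file-pattern matching, serialization
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
```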

## 2.2 Feature Extraction

### Steps
1. Define a function `extract_feature` to extract the MFCC, chroma, and mel features from a sound file. It takes four parameters: the file name and three Boolean flags, one for each feature:

   - `mfcc`: Mel Frequency Cepstral Coefficients, which represent the short-term power spectrum of a sound
   - `chroma`: the 12 different pitch classes
   - `mel`: Mel Spectrogram Frequency (the mel-scaled spectrogram)

2. Open the sound file with `soundfile.SoundFile` in a `with` statement so that it is closed automatically once we are done. Read the samples into `X` and get the sample rate. If `chroma` is True, also compute the magnitude of the Short-Time Fourier Transform (STFT) of `X`.

3. Initialize `result` as an empty NumPy array. Then, for each of the three features whose flag is True, call the corresponding function from `librosa.feature` (e.g. `librosa.feature.mfcc` for MFCCs). Average its output over the time frames, giving one mean value per coefficient. Append these values to `result` with `np.hstack()`, which stacks arrays in sequence horizontally; for 1-D arrays it simply concatenates them end to end. Finally, return `result`.