# Feeling Fine 

## Data pre-processing 
We first need to define some functions to pre-process the data. This involves converting each audio file into something the computer can actually work with: numerical values. We are going to use the librosa library, which provides predefined feature-extraction functions.

We can extract the following information:
<img src="https://i.ibb.co/sbmCxfK/Screenshot-2020-11-30-at-13-54-26.png" alt="Table of function" width="600"/>

So, according to my research:
* Chroma : relates to the 12 pitch classes of the musical octave (C, C#, D and so on up to B). We will compute it from the short-time Fourier transform (STFT) of the sound files. <img src="https://upload.wikimedia.org/wikipedia/commons/2/25/ChromaFeatureCmajorScaleScoreAudioColor.png" alt="(Image of the 12 different pitches)" width="300"/>
* Mel spectrogram : a spectrogram whose frequency axis has been mapped onto the Mel scale (check the notebook for more info)
    * Mel scale : a non-linear transformation of frequency that approximates how humans perceive pitch, so that equal distances on the scale sound roughly equally far apart
    * Spectrogram : the way we plot audio over time; the y axis is frequency (hertz), the x axis is time, and the color usually represents loudness in decibels
* Mel Frequency Cepstral Coefficients (MFCC) : a compact description of the overall shape of the spectrum (loosely similar to edges in photos), obtained by taking the discrete cosine transform of the log of the Mel-scaled spectrum
* Spectral Centroid : the center of mass of the spectrum (often interpreted as the brightness of the sound)
* Spectral Bandwidth : how widely the spectrum is spread around the spectral centroid (a magnitude-weighted deviation of the frequencies from the centroid)
* Spectral Contrast : the difference between the peaks and the valleys of the spectrum, computed separately in several frequency sub-bands
* Roll-Off Frequency : the frequency below which a given fraction of the total spectral energy lies (librosa uses 85% by default)
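To make some of these definitions concrete, here is a minimal NumPy sketch that computes the spectral centroid, spectral bandwidth and roll-off frequency for a single frame of a magnitude spectrum, plus a Hz-to-Mel conversion. It is written to follow the definitions given in the librosa documentation (bandwidth with p = 2, roll-off at 85%), but it is an illustration, not librosa's own code. The function names are hypothetical, and the Mel conversion uses the HTK formula, whereas librosa defaults to the slightly different Slaney variant.

```python
import numpy as np


def spectral_centroid(freqs, spectrum):
    """Magnitude-weighted mean frequency (the 'center of mass') of one frame."""
    weights = spectrum / spectrum.sum()
    return float(np.sum(freqs * weights))


def spectral_bandwidth(freqs, spectrum, p=2):
    """Weighted p-th order deviation of the frequencies from the centroid."""
    weights = spectrum / spectrum.sum()
    centroid = np.sum(freqs * weights)
    return float(np.sum(weights * np.abs(freqs - centroid) ** p) ** (1.0 / p))


def spectral_rolloff(freqs, spectrum, roll_percent=0.85):
    """Lowest frequency at which the cumulative spectral sum reaches roll_percent of the total."""
    cumulative = np.cumsum(spectrum)
    threshold = roll_percent * cumulative[-1]
    # argmax returns the first index where the condition is True
    return float(freqs[np.argmax(cumulative >= threshold)])


def hz_to_mel_htk(freq_hz):
    """HTK-style Mel scale: close to linear below ~1 kHz, logarithmic above."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)


# Toy frame: two equally strong peaks at 1000 Hz and 3000 Hz
freqs = np.array([0.0, 1000.0, 2000.0, 3000.0, 4000.0])
spectrum = np.array([0.0, 1.0, 0.0, 1.0, 0.0])

print(spectral_centroid(freqs, spectrum))   # 2000.0
print(spectral_bandwidth(freqs, spectrum))  # 1000.0
print(spectral_rolloff(freqs, spectrum))    # 3000.0
print(hz_to_mel_htk([700.0, 1000.0]))
```

With two equal peaks the centroid lands halfway between them, the bandwidth equals the distance from the centroid to either peak, and the roll-off is the upper peak, because only once it is included does the cumulative sum pass 85% of the total.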


Now that we have gone into what we can extract from the sound waves in a bit more detail, I'll briefly explain the thought process behind my selection. I'm using chroma since it measures pitch, and MFCC because it's a feature of sound that the model should be able to use well. I'll also include the spectral centroid, spectral bandwidth and spectral contrast to try to capture variation in frequency, based on the idea that people's voices crack more or less depending on their emotions (although I am aware this might cause some overfitting in the model). I'll also include the Mel spectrogram. Finally, I'll include the roll-off frequency under the assumption that, even if I start a sentence with a lot of energy, my emotions determine how fast I speak, and my speaking speed affects the frequency of my voice (talking slower usually produces a lower sound); the roll-off frequency might help capture this (once again, this might lead to overfitting).

```python
import librosa                                             # Audio analysis and feature extraction
import soundfile                                           # Read the audio files
import os, glob                                            # Deal with files and paths
import numpy as np                                         # Array manipulation
from sklearn.model_selection import train_test_split       # Split data into training and test sets
from sklearn.neural_network import MLPClassifier           # The ANN (multi-layer perceptron) model
from sklearn.metrics import accuracy_score                 # Used to measure the accuracy of our model
```

The function we are going to define takes in a file name and a set of boolean flags (which features to include in the extraction), and returns a 1-D NumPy array containing the time-averaged value of each selected feature.

Flag names :
* chroma - Chroma computed from the short-time Fourier transform (pitch)
* mfcc - Mel Frequency Cepstral Coefficients
* mel - Mel spectrogram
* spec_centroid - Spectral Centroid 
* spec_bandwidth - Spectral Bandwidth 
* spec_contrast - Spectral Contrast 
* roll_off - Roll-Off Frequency 

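This function goes through each flag and, if it is set, computes that feature frame by frame, averages it over time (one mean per feature row) and appends the result to the output array.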
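To see how the averaging and stacking fit together without needing any audio, the sketch below uses random arrays as stand-ins for librosa's outputs (each librosa feature is a matrix of shape `(n_rows, n_frames)`) and applies the same `np.mean(... .T, axis=0)` and `np.hstack` steps. The row counts are assumptions based on the library defaults we believe apply here: 12 chroma bins, the 40 MFCCs requested in the code, 128 Mel bands, 1 row each for centroid, bandwidth and roll-off, and 6 + 1 = 7 rows for spectral contrast. The `feature_rows` dictionary and `summarize` helper are hypothetical.

```python
import numpy as np

# Assumed row counts for each feature matrix (librosa defaults, except n_mfcc=40,
# which extract_feature sets explicitly)
feature_rows = {
    "chroma": 12,
    "mfcc": 40,
    "mel": 128,
    "spec_centroid": 1,
    "spec_bandwidth": 1,
    "spec_contrast": 7,
    "roll_off": 1,
}


def summarize(n_frames, seed=0):
    """Average each (n_rows, n_frames) matrix over time and concatenate the results."""
    rng = np.random.default_rng(seed)
    extracted_features = np.array([])
    for n_rows in feature_rows.values():
        # Random stand-in for a librosa feature matrix: one column per frame
        matrix = rng.random((n_rows, n_frames))
        # Transposing and averaging over axis 0 averages over the frames
        per_row_mean = np.mean(matrix.T, axis=0)
        extracted_features = np.hstack((extracted_features, per_row_mean))
    return extracted_features


short_clip = summarize(n_frames=50)
long_clip = summarize(n_frames=400)
print(short_clip.shape, long_clip.shape)  # (190,) (190,)
```

Because the time axis is averaged away, clips of different lengths end up as vectors of the same length, which is what allows them to be stacked into one matrix for the classifier. The trade-off is that any information about how a feature changes over the course of a clip is lost.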

```python
def extract_feature(file_name, chroma, mfcc, mel, spec_centroid,
                    spec_bandwidth, spec_contrast, roll_off):
    '''
    Extracting features
    Params : file_name (str), chroma (bool), mfcc (bool), mel (bool), spec_centroid (bool),
             spec_bandwidth (bool), spec_contrast (bool), roll_off (bool)
    Returns : 1-D np.ndarray of the selected features, each averaged over time
    '''
    with soundfile.SoundFile(file_name) as sound_file:
        raw_audio = sound_file.read(dtype="float32")
        sample_rate = sound_file.samplerate
    # Down-mix multi-channel recordings to mono so every librosa call gets a 1-D signal
    if raw_audio.ndim > 1:
        raw_audio = raw_audio.mean(axis=1)
    extracted_features = np.array([])
    # Magnitude spectrogram, reused by chroma_stft
    stft = np.abs(librosa.stft(raw_audio))
    # Every librosa feature is shaped (n_rows, n_frames); .T followed by mean(axis=0)
    # averages over the frames, leaving one value per row
    if chroma:
        chroma_feat = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
        extracted_features = np.hstack((extracted_features, chroma_feat))
    if mfcc:
        mfccs = np.mean(librosa.feature.mfcc(y=raw_audio, sr=sample_rate, n_mfcc=40).T, axis=0)
        extracted_features = np.hstack((extracted_features, mfccs))
    if mel:
        # y must be passed by keyword in recent librosa versions
        mel_feat = np.mean(librosa.feature.melspectrogram(y=raw_audio, sr=sample_rate).T, axis=0)
        extracted_features = np.hstack((extracted_features, mel_feat))
    if spec_centroid:
        centroid = np.mean(librosa.feature.spectral_centroid(y=raw_audio, sr=sample_rate).T, axis=0)
        extracted_features = np.hstack((extracted_features, centroid))
    if spec_bandwidth:
        bandwidth = np.mean(librosa.feature.spectral_bandwidth(y=raw_audio, sr=sample_rate).T, axis=0)
        extracted_features = np.hstack((extracted_features, bandwidth))
    if spec_contrast:
        contrast = np.mean(librosa.feature.spectral_contrast(y=raw_audio, sr=sample_rate).T, axis=0)
        extracted_features = np.hstack((extracted_features, contrast))
    if roll_off:
        rolloff = np.mean(librosa.feature.spectral_rolloff(y=raw_audio, sr=sample_rate).T, axis=0)
        extracted_features = np.hstack((extracted_features, rolloff))
    return extracted_features
```

### Loading Data Set

In this section we will load the data set and split it into training and testing sets.

```python
# A dictionary of all the emotions we can measure, keyed by the emotion code in the file name
emotions = {
  '01':'neutral',    # file name XX-XX-01 = neutral
  '02':'calm',       # file name XX-XX-02 = calm
  '03':'happy',      # file name XX-XX-03 = happy
  '04':'sad',        # file name XX-XX-04 = sad
  '05':'angry',      # file name XX-XX-05 = angry
  '06':'fearful',    # file name XX-XX-06 = fearful
  '07':'disgust',    # file name XX-XX-07 = disgust
  '08':'surprised'   # file name XX-XX-08 = surprised
}
```

The dictionary above maps the two-digit emotion code in each file name to an emotion label. When going through the data set, we load each file, read its emotion code from the file name and extract its features. We then split the results into training data and test data, with a parameter (`test_size`) that sets the fraction of the data used for testing.
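As a small illustration of the lookup, the snippet below parses a file name that follows the dash-separated pattern assumed by the dictionary comments (emotion code in the third field) and maps it to a label. The example path and the helper `emotion_from_filename` are hypothetical, made up for this sketch.

```python
import os

emotions = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised',
}


def emotion_from_filename(path):
    """Return the emotion label encoded in the third dash-separated field of a file name."""
    file_name = os.path.basename(path)  # drop the directory part
    code = file_name.split("-")[2]      # third field, e.g. "06"
    return emotions[code]


# Hypothetical path following the XX-XX-NN-... naming pattern
print(emotion_from_filename("../data/Actor_12/03-01-06-01-02-01-12.wav"))  # fearful
```

A file name with fewer than three dash-separated fields would raise an `IndexError` here, and an unknown code would raise a `KeyError`, so it can be worth checking that the glob pattern only matches the audio files you expect.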

```python
# Load the files, extract the features, and split them into training and test sets
def load_data(test_size=0.25):
    x, y = [], []
    # glob returns files in arbitrary order; sorting makes the random_state split reproducible
    for file in sorted(glob.glob("../data/Actor_*/*.wav")):
        file_name = os.path.basename(file)
        # The third dash-separated field of the file name is the emotion code
        emotion = emotions[file_name.split("-")[2]]
        feature = extract_feature(file, chroma=True, mfcc=True, mel=True,
                                  spec_centroid=True, spec_bandwidth=True,
                                  spec_contrast=True, roll_off=True)
        x.append(feature)
        y.append(emotion)
    return train_test_split(np.array(x), y, test_size=test_size, random_state=9)
```

```python
# Get the training and testing data
x_train, x_test, y_train, y_test = load_data()
```
