<a href="https://colab.research.google.com/github/puneat/Audio_Sentiment/blob/puneet/Loading_Audio_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**In this notebook, we load the data into dataframes with three columns: the source dataset, the emotion label, and the path to each audio file. Data from all sources is then combined into a single CSV file (one per gender).**

In [1]:
from google.colab import drive
drive.mount('/gdrive', force_remount=True)

Mounted at /gdrive


In [2]:
# Import libraries 
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from matplotlib.pyplot import specgram
import pandas as pd
import glob 
from sklearn.metrics import confusion_matrix
import IPython.display as ipd  # To play sound in the notebook
import os
import sys
import warnings
# ignore warnings 
if not sys.warnoptions:
    warnings.simplefilter("ignore")
warnings.filterwarnings("ignore", category=DeprecationWarning) 

In [45]:
TESS = "/gdrive/My Drive/Audio_files/female_final/TESS_ALL/"
RAV = "/gdrive/My Drive/Audio_files/Female_audio/RAVDESS/"
SAVEE = "/gdrive/My Drive/Audio_files/male_final/SAVEE/"
CREMA = "/gdrive/My Drive/Audio_files/CREMA/"

# Run one example 
dir_list = os.listdir(CREMA)
dir_list[0:5]

['1079_TIE_SAD_XX.wav',
 '1080_DFA_DIS_XX.wav',
 '1080_DFA_SAD_XX.wav',
 '1080_DFA_NEU_XX.wav',
 '1079_TIE_ANG_XX.wav']

<a id="savee"></a>
##  <center> 1. SAVEE dataset </center>
The audio files are named so that the prefix letters identify the emotion class:
- 'a' = 'anger'
- 'd' = 'disgust'
- 'f' = 'fear'
- 'h' = 'happiness'
- 'n' = 'neutral'
- 'sa' = 'sadness'
- 'su' = 'surprise' 

The original source has 4 folders, each representing a speaker, but I've bundled all of them into a single folder, so the first 2 letters of each filename are the speaker's initials. E.g. 'DC_d03.wav' is the 3rd disgust sentence uttered by the speaker DC. It's worth noting that all the speakers are male. This isn't an issue, as we'll balance it out with the TESS dataset, which is female only.
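As a rough illustration of the naming scheme described above, the sketch below parses a bundled SAVEE filename with a regular expression. The helper name `parse_savee` and the `SAVEE_CODES` dictionary are hypothetical and not part of the original notebook; the pattern assumes every file follows the `<initials>_<code><2-digit number>.wav` form of the 'DC_d03.wav' example.

```python
import re

# Hypothetical lookup table built from the prefix list above.
SAVEE_CODES = {
    'a': 'angry', 'd': 'disgust', 'f': 'fear', 'h': 'happy',
    'n': 'neutral', 'sa': 'sad', 'su': 'surprise',
}

def parse_savee(filename):
    """Return (speaker, label, sentence_number), or None if the name does not match."""
    # Assumed pattern: two upper-case initials, '_', 1-2 letter code, 2 digits, '.wav'
    m = re.fullmatch(r'([A-Z]{2})_([a-z]{1,2})(\d{2})\.wav', filename)
    if m is None or m.group(2) not in SAVEE_CODES:
        return None
    speaker, code, number = m.groups()
    return speaker, 'male_' + SAVEE_CODES[code], int(number)

print(parse_savee('DC_d03.wav'))   # ('DC', 'male_disgust', 3)
print(parse_savee('DC_sa12.wav'))  # ('DC', 'male_sad', 12)
```

Matching the whole name, rather than slicing at fixed offsets, means one- and two-letter emotion codes are handled by the same rule.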

In [32]:
# Get the data location for SAVEE
dir_list = os.listdir(SAVEE)

# parse the filename to get the emotions
emotion=[]
path = []
for i in dir_list:
    # i[-8:-6] holds the two characters before the 2-digit sentence number,
    # e.g. '_d' in 'DC_d03.wav' or 'sa' in 'DC_sa03.wav'
    if i[-8:-6]=='_a':
        emotion.append('male_angry')
    elif i[-8:-6]=='_d':
        emotion.append('male_disgust')
    elif i[-8:-6]=='_f':
        emotion.append('male_fear')
    elif i[-8:-6]=='_h':
        emotion.append('male_happy')
    elif i[-8:-6]=='_n':
        emotion.append('male_neutral')
    elif i[-8:-6]=='sa':
        emotion.append('male_sad')
    elif i[-8:-6]=='su':
        emotion.append('male_surprise')
    else:
        emotion.append('male_error') 
    path.append(SAVEE + i)
    
# Now check out the label count distribution 
SAVEE_df = pd.DataFrame(emotion, columns = ['labels'])
SAVEE_df['source'] = 'SAVEE'
SAVEE_df = pd.concat([SAVEE_df, pd.DataFrame(path, columns = ['path'])], axis = 1)
SAVEE_df.labels.value_counts()

male_neutral     120
male_happy        60
male_surprise     60
male_sad          60
male_fear         60
male_angry        60
male_disgust      60
Name: labels, dtype: int64

<a id="ravdess"></a>
## <center>2. RAVDESS dataset</center>

Each RAVDESS filename consists of seven two-digit identifiers separated by hyphens, in this order:

- Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
- Vocal channel (01 = speech, 02 = song).
- Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
- Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
- Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
- Repetition (01 = 1st repetition, 02 = 2nd repetition).
- Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).

So, here's an example of an audio filename. 
_02-01-06-01-02-01-12.mp4_

This means the metadata for the audio file is:
- Video-only (02)
- Speech (01)
- Fearful (06)
- Normal intensity (01)
- Statement "dogs" (02)
- 1st Repetition (01)
- 12th Actor (12) - Female (as the actor ID number is even)
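The decoding of the example filename above can be sketched in a few lines. The function name `decode_ravdess` and the lookup dictionaries are hypothetical helpers written for illustration; they simply restate the identifier list and assume the filename has exactly seven hyphen-separated fields.

```python
# Hypothetical lookup tables restating the identifier list above.
MODALITY = {1: 'full-AV', 2: 'video-only', 3: 'audio-only'}
CHANNEL = {1: 'speech', 2: 'song'}
EMOTION = {1: 'neutral', 2: 'calm', 3: 'happy', 4: 'sad',
           5: 'angry', 6: 'fearful', 7: 'disgust', 8: 'surprised'}
INTENSITY = {1: 'normal', 2: 'strong'}

def decode_ravdess(filename):
    """Split a RAVDESS filename into its seven identifiers."""
    stem = filename.rsplit('.', 1)[0]             # drop the extension
    fields = [int(p) for p in stem.split('-')]    # seven two-digit codes
    modality, channel, emotion, intensity, statement, repetition, actor = fields
    return {
        'modality': MODALITY[modality],
        'vocal_channel': CHANNEL[channel],
        'emotion': EMOTION[emotion],
        'intensity': INTENSITY[intensity],
        'statement': statement,
        'repetition': repetition,
        'actor': actor,
        # Odd-numbered actors are male, even-numbered actors are female.
        'gender': 'female' if actor % 2 == 0 else 'male',
    }

info = decode_ravdess('02-01-06-01-02-01-12.mp4')
print(info['emotion'], info['gender'])  # fearful female
```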

I learnt the hard way that male and female speakers have to be trained separately, or the model will struggle to reach a good accuracy. From reading a few blogs and articles, it seems that female voices have a higher pitch than male voices.

There are separate 'calm' and 'neutral' emotions. I'll just combine them into the same category.

In [46]:
dir_list = os.listdir(RAV)
dir_list.sort()

emotion = []
gender = []
path = []
for i in dir_list:
    fname = os.listdir(RAV + i)
    for f in fname:
        part = f.split('.')[0].split('-')
        emotion.append(int(part[2]))
        temp = int(part[6])
        if temp%2 == 0:
            temp = "female"
        else:
            temp = "male"
        gender.append(temp)
        path.append(RAV + i + '/' + f)

        
RAV_df = pd.DataFrame(emotion)
RAV_df = RAV_df.replace({1:'neutral', 2:'neutral', 3:'happy', 4:'sad', 5:'angry', 6:'fear', 7:'disgust', 8:'surprise'})
RAV_df = pd.concat([pd.DataFrame(gender),RAV_df],axis=1)
RAV_df.columns = ['gender','emotion']
RAV_df['labels'] =RAV_df.gender + '_' + RAV_df.emotion
RAV_df['source'] = 'RAVDESS'  
RAV_df = pd.concat([RAV_df,pd.DataFrame(path, columns = ['path'])],axis=1)
RAV_df = RAV_df.drop(['gender', 'emotion'], axis=1)
RAV_df.labels.value_counts()

female_neutral     144
female_angry        96
female_sad          96
female_fear         96
female_surprise     96
female_disgust      96
female_happy        96
Name: labels, dtype: int64

<a id="tess"></a>
##  <center> 3. TESS dataset </center>
Now on to the TESS dataset. It's worth noting that it's based on only 2 speakers, a young female and an older female. This should hopefully balance out the male-only speakers that we have in SAVEE.

It has the same 7 key emotions we're interested in. What is slightly different about this dataset, compared to the previous two, is the addition of a 'pleasant surprise' emotion.

In [47]:
dir_list = os.listdir(TESS)
dir_list.sort()
dir_list

['OAF_Fear',
 'OAF_Pleasant_surprise',
 'OAF_Sad',
 'OAF_angry',
 'OAF_disgust',
 'OAF_happy',
 'OAF_neutral',
 'YAF_angry',
 'YAF_disgust',
 'YAF_fear',
 'YAF_happy',
 'YAF_neutral',
 'YAF_pleasant_surprised',
 'YAF_sad']
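The folder names listed above mix capitalisation ('OAF_Fear' vs 'YAF_fear') and spell the surprise class two ways. One possible way to map them to labels without enumerating every spelling is sketched here; `tess_label` is a hypothetical helper, and it assumes every folder follows the `<speaker>_<emotion>` pattern seen in the listing.

```python
def tess_label(folder):
    """Map a TESS folder name such as 'OAF_Fear' to a 'female_<emotion>' label."""
    # Drop the speaker prefix (OAF / YAF) and normalise the case.
    emotion = folder.split('_', 1)[1].lower()
    # 'pleasant_surprise' and 'pleasant_surprised' both become 'surprise'.
    if emotion.startswith('pleasant_surprise'):
        emotion = 'surprise'
    return 'female_' + emotion

for name in ['OAF_Fear', 'OAF_Pleasant_surprise', 'YAF_pleasant_surprised', 'YAF_sad']:
    print(name, '->', tess_label(name))
```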

In [48]:
path = []
emotion = []

for i in dir_list:
    fname = os.listdir(TESS + i)
    for f in fname:
        if i == 'OAF_angry' or i == 'YAF_angry':
            emotion.append('female_angry')
        elif i == 'OAF_disgust' or i == 'YAF_disgust':
            emotion.append('female_disgust')
        elif i == 'OAF_Fear' or i == 'YAF_fear':
            emotion.append('female_fear')
        elif i == 'OAF_happy' or i == 'YAF_happy':
            emotion.append('female_happy')
        elif i == 'OAF_neutral' or i == 'YAF_neutral':
            emotion.append('female_neutral')                                
        elif i == 'OAF_Pleasant_surprise' or i == 'YAF_pleasant_surprised':
            emotion.append('female_surprise')               
        elif i == 'OAF_Sad' or i == 'YAF_sad':
            emotion.append('female_sad')
        else:
            emotion.append('Unknown')
        path.append(TESS + i + "/" + f)

TESS_df = pd.DataFrame(emotion, columns = ['labels'])
TESS_df['source'] = 'TESS'
TESS_df = pd.concat([TESS_df,pd.DataFrame(path, columns = ['path'])],axis=1)
TESS_df.labels.value_counts()

female_neutral     400
female_sad         400
female_surprise    400
female_fear        400
female_disgust     400
female_happy       400
female_angry       397
Name: labels, dtype: int64

<a id="crema"></a>
##  <center> 4. CREMA-D dataset </center>
Last but not least, the CREMA-D dataset. It's a very large dataset, which we need. It also has a good variety of speakers: the clips were recorded by a large group of actors of different ethnicities.

In [49]:
dir_list = os.listdir(CREMA)
dir_list.sort()
print(dir_list[0:10])

['1001_DFA_ANG_XX.wav', '1001_DFA_DIS_XX.wav', '1001_DFA_FEA_XX.wav', '1001_DFA_HAP_XX.wav', '1001_DFA_NEU_XX.wav', '1001_DFA_SAD_XX.wav', '1001_IEO_ANG_HI.wav', '1001_IEO_ANG_LO.wav', '1001_IEO_ANG_MD.wav', '1001_IEO_DIS_HI.wav']
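The filenames printed above are underscore-separated; the notebook uses the first field as the speaker ID and the third as the emotion code. A compact, dictionary-based sketch of that labelling step follows. The names `CREMA_CODES` and `crema_label` are hypothetical, and the `female_ids` argument is assumed to be a set of speaker IDs like the `female` list used in the notebook.

```python
# Hypothetical lookup table for the six emotion codes seen in the filenames.
CREMA_CODES = {'ANG': 'angry', 'DIS': 'disgust', 'FEA': 'fear',
               'HAP': 'happy', 'NEU': 'neutral', 'SAD': 'sad'}

def crema_label(filename, female_ids):
    """Return '<gender>_<emotion>' for a CREMA-D filename, or None for an unknown code."""
    part = filename.split('_')
    gender = 'female' if int(part[0]) in female_ids else 'male'
    emotion = CREMA_CODES.get(part[2])
    return None if emotion is None else gender + '_' + emotion

# Two IDs taken from the notebook's female list, for illustration only.
example_ids = {1002, 1003}
print(crema_label('1002_DFA_ANG_XX.wav', example_ids))  # female_angry
print(crema_label('1001_IEO_DIS_HI.wav', example_ids))  # male_disgust
```

Filtering the resulting labels by their `female_` or `male_` prefix would then select the rows for one gender without commenting code in and out.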


In [50]:
gender = []
emotion = []
path = []
female = [1002,1003,1004,1006,1007,1008,1009,1010,1012,1013,1018,1020,1021,1024,1025,1028,1029,1030,1037,1043,1046,1047,1049,
          1052,1053,1054,1055,1056,1058,1060,1061,1063,1072,1073,1074,1075,1076,1078,1079,1082,1084,1089,1091]

for i in dir_list: 
    part = i.split('_')
    if int(part[0]) in female:
        temp = 'female'
    else:
        temp = 'male'
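    gender.append(temp)
    # The male branch below is commented out while building the female CSV;
    # swap it with the female branch to build the male one.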
    # if part[2] == 'SAD' and temp == 'male':
    #     emotion.append('male_sad')
    #     path.append(CREMA + i)
    # elif part[2] == 'ANG' and temp == 'male':
    #     emotion.append('male_angry')
    #     path.append(CREMA + i)
    # elif part[2] == 'DIS' and temp == 'male':
    #     emotion.append('male_disgust')
    #     path.append(CREMA + i)
    # elif part[2] == 'FEA' and temp == 'male':
    #     emotion.append('male_fear')
    #     path.append(CREMA + i)
    # elif part[2] == 'HAP' and temp == 'male':
    #     emotion.append('male_happy')
    #     path.append(CREMA + i)
    # elif part[2] == 'NEU' and temp == 'male':
    #     emotion.append('male_neutral')
    #     path.append(CREMA + i)
    if part[2] == 'SAD' and temp == 'female':
        emotion.append('female_sad')
        path.append(CREMA + i)
    elif part[2] == 'ANG' and temp == 'female':
        emotion.append('female_angry')
        path.append(CREMA + i)
    elif part[2] == 'DIS' and temp == 'female':
        emotion.append('female_disgust')
        path.append(CREMA + i)
    elif part[2] == 'FEA' and temp == 'female':
        emotion.append('female_fear')
        path.append(CREMA + i)
    elif part[2] == 'HAP' and temp == 'female':
        emotion.append('female_happy')
        path.append(CREMA + i)
    elif part[2] == 'NEU' and temp == 'female':
        emotion.append('female_neutral')
        path.append(CREMA + i)
    # else:
    #     emotion.append('Unknown')
   
    
CREMA_df = pd.DataFrame(emotion, columns = ['labels'])
CREMA_df['source'] = 'CREMA'
CREMA_df = pd.concat([CREMA_df,pd.DataFrame(path, columns = ['path'])],axis=1)
CREMA_df.labels.value_counts()

female_angry      600
female_sad        600
female_fear       600
female_disgust    600
female_happy      600
female_neutral    512
Name: labels, dtype: int64

In [51]:
df = pd.concat([TESS_df,RAV_df,CREMA_df], axis = 0)
print(df.labels.value_counts())
df.head()
df.to_csv("/gdrive/My Drive/Audio_files/female_df.csv",index=False)

female_sad         1096
female_fear        1096
female_disgust     1096
female_happy       1096
female_angry       1093
female_neutral     1056
female_surprise     496
Name: labels, dtype: int64


In [52]:
# Importing required libraries 
# Keras
import keras
from keras import regularizers
from keras.preprocessing import sequence
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential, Model, model_from_json
from keras.layers import Dense, Embedding, LSTM
from keras.layers import Input, Flatten, Dropout, Activation, BatchNormalization
from keras.layers import Conv1D, MaxPooling1D, AveragePooling1D, GlobalAveragePooling1D
from keras.utils import np_utils, to_categorical
from keras.callbacks import ModelCheckpoint

# sklearn
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Other  
import librosa
import librosa.display
import json
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from matplotlib.pyplot import specgram
import pandas as pd
import seaborn as sns
import glob 
import os
import pickle
import IPython.display as ipd  # To play sound in the notebook

In [53]:

ref = pd.read_csv("/gdrive/My Drive/Audio_files/female_df.csv")
ref.head()

| | labels | source | path |
|---|---|---|---|
| 0 | female_fear | TESS | /gdrive/My Drive/Audio_files/female_final/TESS... |
| 1 | female_fear | TESS | /gdrive/My Drive/Audio_files/female_final/TESS... |
| 2 | female_fear | TESS | /gdrive/My Drive/Audio_files/female_final/TESS... |
| 3 | female_fear | TESS | /gdrive/My Drive/Audio_files/female_final/TESS... |
| 4 | female_fear | TESS | /gdrive/My Drive/Audio_files/female_final/TESS... |


In [None]:

df = pd.DataFrame(columns=['feature'])

# loop feature extraction over the entire dataset
counter=0
for index,path in enumerate(ref.path):
    X, sample_rate = librosa.load(path
                                  , res_type='kaiser_fast'
                                  ,duration=2.5
                                  ,sr=44100
                                  ,offset=0.5
                                 )
    sample_rate = np.array(sample_rate)
    
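    # Mean across the 13 MFCC coefficients (axis=0) as the feature, i.e. one value
    # per time frame. Could do min and max etc. as well.
    mfccs = np.mean(librosa.feature.mfcc(y=X, 
                                        sr=sample_rate, 
                                        n_mfcc=13),
                    axis=0)
    df.loc[counter] = [mfccs]
    counter=counter+1
    print(counter/len(ref)*100)  # progress in percent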

# Check processed successfully
print(len(df))
df.head()

0.014226774790155071
0.028453549580310142
0.04268032437046522
...

In [None]:
df.to_csv("/gdrive/My Drive/Audio_files/mfcc_female.csv",index=False)
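
A note on the `axis=0` mean used in the feature-extraction loop above: `librosa.feature.mfcc` returns an array of shape `(n_mfcc, frames)`, so averaging over axis 0 collapses the 13 coefficients and leaves one value per time frame. The sketch below shows the two possible reductions on a random stand-in array rather than real audio; the frame count of 216 is an assumption (roughly 2.5 s at 44100 Hz with librosa's default hop length of 512 samples).

```python
import numpy as np

# Stand-in for an MFCC matrix: 13 coefficients x 216 frames (assumed frame count).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(13, 216))

per_frame = mfcc.mean(axis=0)  # what the loop above stores: one value per frame
per_coeff = mfcc.mean(axis=1)  # alternative: one value per MFCC coefficient

print(per_frame.shape)  # (216,)
print(per_coeff.shape)  # (13,)
```

With the per-frame reduction, clips shorter than the requested duration would yield shorter feature vectors, which is worth keeping in mind when the features are later expanded into columns.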