# Main Script for the Emotion Forecasting Project

The first step is to obtain the audio-visual features from the IEMOCAP dataset. The facial-video features are already available in the IEMOCAP dataset, but the audio features must be extracted. The skeleton script `audio_feat.praat` does this.

The next step is to call functions from our classes to prepare the dataset from the raw data.

In [20]:
from Codes.Combine_audiovisual_data import combining_AV
from Codes.window_based_reformation import window_based_reformation
from Codes.Utt_Fore_Data_Prep import Prepare_UF_Cur_Data, Prepare_UF_history_Data
from Codes.run_algorithms import run_sequential_learning

First, we will combine the audio-visual features.

The audio and visual features are not extracted at the same frame rate, so two things have to be done:
1. Remove the `nan` features from the video data.
2. Downsample the video features so that they have the same length as the audio features.

The following code will do that:

In [11]:
Combining_data = combining_AV('Files/audio_features', 'Files/video_features')
Combining_data.produce_speakerwise_AV_data()
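
The two alignment steps can be illustrated with a minimal NumPy sketch. It is not the implementation inside `combining_AV`; the helper names `drop_nan_frames` and `downsample_to_length` are hypothetical, and picking evenly spaced frames is only one assumed way to downsample.

```python
import numpy as np

def drop_nan_frames(video):
    """Remove every frame (row) that contains at least one NaN value."""
    return video[~np.isnan(video).any(axis=1)]

def downsample_to_length(video, target_len):
    """Pick target_len evenly spaced frames (assumed downsampling strategy)."""
    idx = np.linspace(0, len(video) - 1, num=target_len).round().astype(int)
    return video[idx]

# Toy data: 10 video frames (one of them containing NaN) and 4 audio frames.
video = np.arange(20, dtype=float).reshape(10, 2)
video[3, 0] = np.nan
audio = np.ones((4, 3))

clean = drop_nan_frames(video)                     # the NaN frame is dropped
aligned = downsample_to_length(clean, len(audio))  # same length as the audio
combined = np.hstack([audio, aligned])             # one row per shared frame
print(clean.shape, aligned.shape, combined.shape)
```

Once both streams have the same number of frames, they can be stacked side by side so that each row holds the audio and video features of one frame.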

Next, to use the sequential information in the audio-visual cues, we create overlapping windows of frames. Statistics computed over each window are used as features. The following code performs these tasks:

1. Create window-based sequences.
2. Compute the mean, standard deviation, first and third quartiles, and interquartile range of each window.

In [13]:
Windowing = window_based_reformation('Files/sameframe')
Windowing.process_data(window_type='dynamic') 
# Important note: setting window_type to 'static' skips the window-based features and instead computes
# statistical features over the whole sequence. We will use them to build an FC-DNN-based model.
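
As a rough illustration of the window statistics listed above, here is a NumPy sketch. The function name `window_statistics` and the values of `win_len` and `hop` are assumptions made for this example; the source does not state the window size or overlap that `window_based_reformation` uses.

```python
import numpy as np

def window_statistics(frames, win_len, hop):
    """Return one row of statistics per overlapping window.

    Each row holds the mean, standard deviation, first quartile,
    third quartile and interquartile range of every feature column.
    """
    rows = []
    for start in range(0, len(frames) - win_len + 1, hop):
        w = frames[start:start + win_len]
        q1, q3 = np.percentile(w, [25, 75], axis=0)
        rows.append(np.concatenate([w.mean(axis=0), w.std(axis=0), q1, q3, q3 - q1]))
    return np.array(rows)

# Toy sequence: 10 frames with 2 features each.
frames = np.arange(20, dtype=float).reshape(10, 2)

# "Dynamic"-style features: one statistics row per overlapping window.
dynamic = window_statistics(frames, win_len=4, hop=2)

# "Static"-style features: a single window covering the whole sequence.
static = window_statistics(frames, win_len=len(frames), hop=1)
print(dynamic.shape, static.shape)
```

With a hop smaller than the window length, consecutive windows overlap, so the resulting rows form a shorter sequence that still preserves temporal order.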

After that, we prepare the dataset for utterance forecasting. The code is designed so that you can use any `step` for utterance forecasting. The code does the following tasks:

1. Process the `IEMOCAP_EmoEvaluation.txt` file and produce a look-up table for finding the time distances used in utterance forecasting.
2. Create the dataset, optionally normalize it, and add zero padding at the end. Utterance lengths vary, so the functions append zeros (zero padding) to give every utterance the same length.

In [14]:
UF_cur = Prepare_UF_Cur_Data()
UF_cur.creating_dataset(step=1, normalization=True)
# Set step to any number. Set normalization=False to skip normalization.
# If normalization=True, speaker-wise z-normalization is applied.
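
Speaker-wise z-normalization and zero padding can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the code in `Prepare_UF_Cur_Data`: the helper names `speaker_z_normalize` and `zero_pad` are hypothetical, and the statistics are assumed to be computed over all frames of a speaker.

```python
import numpy as np

def speaker_z_normalize(utterances, speakers):
    """Z-normalize each utterance with the statistics of its own speaker."""
    normalized = [None] * len(utterances)
    for spk in set(speakers):
        idx = [i for i, s in enumerate(speakers) if s == spk]
        stacked = np.vstack([utterances[i] for i in idx])
        mean = stacked.mean(axis=0)
        std = stacked.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant features
        for i in idx:
            normalized[i] = (utterances[i] - mean) / std
    return normalized

def zero_pad(utterances):
    """Append zero rows so that every utterance has the same length."""
    max_len = max(len(u) for u in utterances)
    n_feat = utterances[0].shape[1]
    padded = np.zeros((len(utterances), max_len, n_feat))
    for i, u in enumerate(utterances):
        padded[i, :len(u)] = u
    return padded

# Toy data: three utterances of different lengths from two speakers.
utts = [
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    np.array([[5.0, 6.0]]),
    np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]),
]
spk = ["A", "A", "B"]

normalized = speaker_z_normalize(utts, spk)
batch = zero_pad(normalized)  # shape: (utterances, max_len, features)
print(batch.shape)
```

Padding at the end keeps the real frames at the start of each sequence, which matches the "zero padding at the end" described above.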

Next, we create the utterance forecasting dataset with `History` information.

In [15]:
UF_his = Prepare_UF_history_Data()
UF_his.creating_dataset(step=1, normalization=True)

To run the sequential models, we use `LSTM` or `BLSTM`. The next chunk of code creates a model and runs the forecasting task.
The code performs the following tasks:

1. Prepare the feature matrix and label vector for the emotion forecasting task using LSTM or BLSTM.
2. If `model_type` is set to `unidirectional`, regular LSTM cells are used; if it is set to `bidirectional`, BLSTM layers are used.

In [21]:
forecast = run_sequential_learning()
features, label, speaker_group = forecast.prepare_data(directory='Files/UF_His_data/step_1') #Change the file location accordingly
forecast.LSTM(features, label, speaker_group, model_type='bidirectional') # If model_type is set to 'unidirectional', a regular LSTM is used
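
The source does not say how `speaker_group` is used inside `forecast.LSTM`. One plausible use, shown here purely as an assumption, is a leave-one-speaker-group-out split, so that the model is always tested on speakers it has not seen during training. The helper `leave_one_group_out` below is hypothetical and relies only on NumPy.

```python
import numpy as np

def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx) for each distinct group."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        test_idx = np.where(groups == g)[0]
        train_idx = np.where(groups != g)[0]
        yield g, train_idx, test_idx

# Toy example: five utterances belonging to three speaker groups.
speaker_group = np.array([1, 1, 2, 2, 3])
folds = list(leave_one_group_out(speaker_group))

for g, train_idx, test_idx in folds:
    print(f"held-out group {g}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Each fold's index arrays could then be used to slice `features` and `label` before training and evaluating a model.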