# Affective Computing - Programming Assignment 3

### Objective

Your task is to use feature-level fusion to combine facial expression features and audio features. A multi-modal emotion recognition system is constructed to recognize happy versus sad facial expressions (a binary-class problem) using a classifier training and testing structure. In addition, decision-level fusion is used to combine the results from ECG signals and facial expression videos to decide the pain result.

The original data for the feature-level method is based on lab1 and lab2, from ten actors acting happy and sad behaviors.
The original data for the decision-level method is from a pain dataset that includes ECG signals and videos of facial expressions.
* Task 1: Subspace-based feature fusion method: In this case, z-score normalization is utilized. Please read “Fusing Gabor and LBP feature sets for kernel-based face recognition” and learn how to use subspace-based feature fusion method for multi-modal system.

* Task 2: Based on Task 1, use Canonical Correlation Analysis (CCA) to calculate the correlation coefficient of the facial expression and audio features. Finally, use CCA to build a multi-modal emotion recognition system. The method is described in the conference paper “Feature fusion method based on canonical correlation analysis and handwritten character recognition”.

* Task 3: Decision-fusion-based method: use ECG signals and videos of facial expressions to recognize pain. Finally, combine the decisions from the ECG signals and the facial expression videos to build a multi-modal recognition system.

* Task 4: Use the feature-level method (Task 2) to produce a 10-fold cross-validation estimate of the emotion recognition system performance.

For the emotion recognition case, Support Vector Machine (SVM) classifiers are trained. 50 videos from 5 participants, represented by spatiotemporal features, are used to train the emotion recognition system. The rest of the data (50 videos) is used to evaluate the performance of the trained recognition systems.

For the pain recognition case in Task 3, Support Vector Machine (SVM) classifiers are trained. 40 samples are used to train the pain recognition model, and 30 samples are used to evaluate the performance of the trained recognition systems.



## Task 1. Subspace-based method
Please read “Fusing Gabor and LBP feature sets for kernel-based face recognition” and apply their framework for the exercise. We use Support Vector Machine (SVM) with linear kernel for classification.


### Setting up the environment 

First, we need to import the basic modules for loading the data and data processing

In [1]:
!pip install matplotlib



In [2]:
import sys
sys.path.append('../')

from skimage import io
from skimage import transform
from skimage import color
from skimage import img_as_ubyte
import os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import sklearn
import scipy.io as sio


### Loading data 

We load the facial expression data (training data, training class, testing data, testing class) and audio data (training data, testing data)

In [3]:
mdata = sio.loadmat('lab3_data.mat')
print(mdata.keys())
#facial expression training and testing data, training and testing class
training_data = mdata['training_data']
print(training_data.shape)
testing_data = mdata['testing_data']

training_class = mdata['training_class']
print(training_class.shape)
testing_class = mdata['testing_class']
print(testing_class.shape)

#audio training and testing data
training_data_proso = mdata['training_data_proso']
testing_data_proso = mdata['testing_data_proso']


dict_keys(['__header__', '__version__', '__globals__', 'speech_sample', 'testing_class', 'testing_data_mfcc', 'testing_data_proso', 'testing_personID', 'training_class', 'training_data_mfcc', 'training_data_proso', 'training_personID', 'training_data', 'testing_data'])
(50, 708)
(50, 1)
(50, 1)


In [4]:
print (training_data, testing_data)

[[0.00169576 0.00047895 0.00018466 ... 0.00211146 0.00534386 0.02566711]
 [0.00163015 0.00048877 0.00017013 ... 0.00184821 0.00525585 0.02539483]
 [0.00158712 0.00049059 0.00013353 ... 0.00184614 0.00487164 0.02503638]
 ...
 [0.00123411 0.00067563 0.00036776 ... 0.00138427 0.00534685 0.02067097]
 [0.0013425  0.00068693 0.00037706 ... 0.00095649 0.0041513  0.02007647]
 [0.00131386 0.00068247 0.00037404 ... 0.0013449  0.00486963 0.02074051]] [[0.00130036 0.00074238 0.00030305 ... 0.0026093  0.00430659 0.02710369]
 [0.00138448 0.00089134 0.00028084 ... 0.00234032 0.00452567 0.02559213]
 [0.00117056 0.00094101 0.00026035 ... 0.00208437 0.00454747 0.02623743]
 ...
 [0.00168477 0.00084574 0.00018816 ... 0.00214308 0.00636153 0.02826266]
 [0.00169953 0.00080529 0.00020156 ... 0.0020711  0.00625999 0.02895758]
 [0.00205743 0.00061646 0.00023869 ... 0.00235152 0.00590119 0.02561858]]


### Extract the subspace for facial expression feature and audio features. 
Extract the subspace for the facial expression features and the audio features using principal component analysis with the __[`sklearn.decomposition.PCA()`](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)__ function.
ReducedDim is the dimensionality of the reduced subspace.
Set ReducedDim to 20 and 15 for facial expression feature and audio feature, respectively.

In [5]:
from sklearn.decomposition import PCA 

#set ReducedDim for facial expression feature and audio feature, respectively.
reducedDim_v = 20;
reducedDim_a = 15;

#Extract subspace for facial expression feature through PCA
#set n_components
pca_v=PCA(n_components = reducedDim_v, whiten= True)
pca_v.fit(training_data, training_class)

#Transform training_data and testing data respectively
pca_v_trans_training_data =pca_v.transform(training_data)
pca_v_trans_testing_data =pca_v.transform(testing_data)

print (pca_v_trans_testing_data.shape)
#Extract subspace for audio features through PCA
pca_a=PCA(n_components = reducedDim_a, whiten= True)
#PCA is unsupervised: fit on the training features only (a second argument would be ignored)
pca_a.fit(training_data_proso)
#Transform training_data and testing data respectively
pca_a_trans_training_data  =pca_a.transform(training_data_proso)
pca_a_trans_testing_data=pca_a.transform(testing_data_proso)  

#print(pca_v_trans_training_data)

#Concatenate ‘video training_data’ and ‘audio training_data’ into a new feature ‘combined_trainingData’
sample_train = np.concatenate((pca_v_trans_training_data, pca_a_trans_training_data), axis=1 )  

#Concatenate ‘video testing_data’ and ‘audio testing_data’ into a new feature ‘combined_testingData’.
sample_test = np.concatenate((pca_v_trans_testing_data, pca_a_trans_testing_data),axis=1)   



(50, 20)
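
Task 1 calls for z-score normalization. In the cell above, `whiten=True` already gives each retained component unit variance on the training data. An explicit alternative, shown here only as a minimal sketch and not necessarily the procedure of the referenced paper, is to estimate the mean and standard deviation on the training split and reuse them for the test split with `sklearn.preprocessing.StandardScaler`. The arrays are synthetic stand-ins with assumed shapes, and the names `fused_train` and `fused_test` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins: 50 samples, 20 video + 15 audio dimensions (assumed shapes)
video_train = rng.normal(5.0, 2.0, size=(50, 20))
audio_train = rng.normal(-1.0, 0.5, size=(50, 15))
video_test = rng.normal(5.0, 2.0, size=(50, 20))
audio_test = rng.normal(-1.0, 0.5, size=(50, 15))

# Feature-level fusion: concatenate the two modalities column-wise
fused_train = np.concatenate((video_train, audio_train), axis=1)
fused_test = np.concatenate((video_test, audio_test), axis=1)

# z-score normalization: statistics come from the training split only
scaler = StandardScaler().fit(fused_train)
fused_train_z = scaler.transform(fused_train)
fused_test_z = scaler.transform(fused_test)

print(fused_train_z.shape, fused_test_z.shape)
```

Fitting the scaler on the training split alone keeps test-set statistics out of the model, which matters once the pipeline is evaluated with cross-validation.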


### Feature classification
Use the __[`SVM`](http://scikit-learn.org/stable/modules/svm.html)__ function to train Support Vector Machine (SVM) classifiers.
Construct an SVM using the ‘combined_trainingData’ and linear kernel. The ‘training_class’ group vector contains the class of samples: 1 = happy, 2 = sadness, corresponding to the rows of the training data matrices.

Then, calculate average classification performances for both training and testing data. The correct class labels corresponding with the rows of the training and testing data matrices are in the variables ‘training_class’ and ‘testing_class’, respectively.

In [6]:
from sklearn import svm

# Train SVM classifier
clf = svm.SVC(kernel ='linear', degree=3)
clf.fit(sample_train,training_class)  

#The prediction results of training data and testing data respectively
predict_train = clf.predict(sample_train)
predict_test= clf.predict(sample_test)

#Calculate and Print the training accuracy and testing accuracy. 
def cal_acc(str_):
    x=0
    for i in range (0, training_class.shape[0]):
        if str_[i] == training_class[i] :
            x= x+ 1 
    acc_str = (x/training_class.shape[0]*100) 
    print (acc_str)
    return ;
def cal_acc_test(str_):
    x=0
    for i in range (0, testing_class.shape[0]):
        if str_[i] == testing_class[i] :
            x= x+ 1 
    acc_str = (x/testing_class.shape[0]*100) 
    print (acc_str)
    return ;
cal_acc(predict_train)
cal_acc_test(predict_test)


100.0
96.0


  y = column_or_1d(y, warn=True)


Compute the confusion matrix with the __[`sklearn.metrics.confusion_matrix()`](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)__ function for the training data and the testing data, respectively.

In [7]:
from sklearn.metrics import confusion_matrix
conf_matrix_train = confusion_matrix(training_class, predict_train)
conf_matrix_test = confusion_matrix(testing_class, predict_test)
print (conf_matrix_train, '\n',conf_matrix_test)

[[25  0]
 [ 0 25]] 
 [[25  0]
 [ 2 23]]
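
The accuracy and the confusion matrix are two views of the same counts: the accuracy is the trace of the confusion matrix divided by the number of samples. The sketch below illustrates this with hypothetical labels that reproduce the test matrix above (two 'sadness' samples predicted as 'happy'). It also flattens the `(50, 1)` label column with `ravel()`; passing labels in that 1-D shape to `fit` is what avoids the `y = column_or_1d(y, warn=True)` warning seen in the outputs.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels shaped like the .mat vectors: (50, 1), 1 = happy, 2 = sadness
y_true = np.array([1] * 25 + [2] * 25).reshape(-1, 1)

# Hypothetical predictions: two 'sadness' samples mislabelled as 'happy'
y_pred = y_true.ravel().copy()
y_pred[[30, 40]] = 1

y_true_1d = y_true.ravel()  # flatten the column vector to shape (50,)
cm = confusion_matrix(y_true_1d, y_pred)
acc = accuracy_score(y_true_1d, y_pred)

print(cm)
print(acc, np.trace(cm) / cm.sum())
```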


### Question
What is PCA? Why do we use PCA?

### Answer:
PCA (principal component analysis) uses an orthogonal linear transformation to project high-dimensional data onto a lower-dimensional subspace spanned by the directions of largest variance. It is used for dimensionality reduction (compression).

PCA is the simplest of the true eigenvector-based multivariate analyses, so it can deal with large data sets. It makes few assumptions about the data and can be applied to most numeric data sets.
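
To make the answer concrete, here is a small sketch on synthetic data (not the assignment data): ten observed dimensions are generated from two latent factors, and PCA recovers a two-dimensional subspace that keeps almost all of the variance. The variable names are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 100 samples in 10 dimensions driven by 2 latent factors plus small noise
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(100, 10))

pca = PCA(n_components=2).fit(X)
X_reduced = pca.transform(X)

print(X_reduced.shape)
# Fraction of the total variance kept by the two components
print(pca.explained_variance_ratio_.sum())
# The principal directions are orthonormal vectors in the original feature space
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(2)))
```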

## Task 2. 
Based on Task1, use Canonical Correlation Analysis to calculate the correlation coefficient of facial expression and audio features. Finally, use CCA to build a multi-modal emotion recognition system.


Use (__[`sklearn.cross_decomposition.CCA()`](http://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.CCA.html)__) function to calculate the correlation coefficient of facial expression and audio features.

In [8]:
from sklearn.cross_decomposition import CCA
import numpy as np

#Use CCA to construct the Canonical Projective Vector (CPV)
cca = CCA(n_components=reducedDim_a, copy=True)
cca.fit(pca_v_trans_training_data,pca_a_trans_training_data)

print(cca)

#Construct Canonical Correlation Discriminant Features (CCDF) for training data and testing data
cca_tran_v_train, cca_tran_a_train = cca.transform(pca_v_trans_training_data,pca_a_trans_training_data)
cca_tran_v_test, cca_tran_a_test= cca.transform(pca_v_trans_testing_data,pca_a_trans_testing_data)


# Concatenate multiple feature for training data and testing data respectively
training_CCDF = np.concatenate((cca_tran_v_train, cca_tran_a_train),axis=1)
testing_CCDF = np.concatenate((cca_tran_v_test, cca_tran_a_test),axis=1)

print (training_CCDF.shape)
print (cca_tran_v_train.shape)


CCA(copy=True, max_iter=500, n_components=15, scale=True, tol=1e-06)
(50, 30)
(50, 15)


Train an SVM classifier with the 'linear' kernel, print the training and testing accuracy, and compute the confusion matrix.

In [9]:
#Train svm classifier 
clf = svm.SVC(kernel='linear')
clf.fit(training_CCDF, training_class)  
print(clf)
print(training_class.shape)
#The prediction results for training data and testing data 
prediction_train = clf.predict(training_CCDF)
prediction_test = clf.predict(testing_CCDF)

#Calculate and Print the training accuracy and testing accuracy. 
def cal_acc(str_):
    y=0
    for i in range (0, training_class.shape[0]):
        if str_[i] == training_class[i] :
            y+= 1 
    acc_str = (y/training_class.shape[0]*100) 
    print (acc_str)
    return ;
def cal_acc1(str_):
    num=0
    for i in range (0, testing_class.shape[0]):
        if str_[i] == testing_class[i] :
            num+= 1 
    acc_str = (num/testing_class.shape[0]*100) 
    print (acc_str)
    return ;
#Evaluate the predictions of the CCA-based classifier trained in this cell
cal_acc(prediction_train)
cal_acc1(prediction_test)
#Compute the confusion matrix through sklearn.metrics.confusion_matrix()function for training data and testing data respectively
conf_matrix_train = confusion_matrix(training_class, prediction_train)
conf_matrix_test = confusion_matrix(testing_class, prediction_test)
print (conf_matrix_train, '\n',conf_matrix_test)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='linear', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)
(50, 1)
100.0
96.0
[[25  0]
 [ 0 25]] 
 [[25  0]
 [ 2 23]]


  y = column_or_1d(y, warn=True)


### Question
What is CCA? Why do we use CCA here?

### Answer 

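CCA (canonical correlation analysis) finds linear combinations of X (X1, ..., Xn) and Y (Y1, ..., Ym) that have maximum correlation with each other.

It is used here because it examines the relationship between two sets of variables, in this case the facial expression features and the audio features, and projects them onto correlated components before they are fused.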
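
Written as a formula, the first pair of canonical directions solves the following problem. The symbols are introduced here for illustration and do not come from the assignment: $a$ and $b$ are the projection vectors for $X$ and $Y$, $\Sigma_{XX}$ and $\Sigma_{YY}$ are the within-set covariance matrices, and $\Sigma_{XY}$ is the cross-covariance matrix.

$$
\rho = \max_{a,\,b}\ \frac{a^\top \Sigma_{XY}\, b}{\sqrt{a^\top \Sigma_{XX}\, a}\ \sqrt{b^\top \Sigma_{YY}\, b}}
$$

Later pairs maximize the same ratio subject to being uncorrelated with the earlier canonical variates.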

## Task 3. 
Here the task is to recognize pain from facial expressions and ECG signals. First, use the ECG signals and the videos of facial expressions separately to recognize pain. Then find a way (such as multiplication or summation) to combine the decisions based on the ECG signals and the facial expression videos to build a multi-modal recognition system with a better result.

## Load data: 
We load the facial expression data and the ECG data: training data ('ecg_train' and 'video_train'), training class ('label_train'), testing data ('ecg_test' and 'video_test'), and testing class ('label_test').

In [10]:
mdata = sio.loadmat('ecg_facial_expression_video.mat')
print (mdata.keys())
#facial expression video training and testing data
training_data_video = mdata['video_train']
testing_data_video = mdata['video_test']


#ECG training and testing data
training_data_ecg = mdata['ecg_train'] 
testing_data_ecg = mdata['ecg_test']

#training and testing class
training_class_ecg_video = mdata['label_train']
testing_class_ecg_video = mdata['label_test']

print(testing_data_ecg)

dict_keys(['__header__', '__version__', '__globals__', 'ecg_fea', 'ecg_test', 'ecg_train', 'label_test', 'label_train', 'video_test', 'video_train'])
[[     0.              0.              0.         ...      0.
       0.              0.        ]
 [     0.              0.              0.         ... 159652.60473966
  556451.23403199  57793.99783576]
 [     0.              0.              0.         ...      0.
       0.              0.        ]
 ...
 [     0.              0.              0.         ...      0.
       0.              0.        ]
 [     0.              0.              0.         ...      0.
       0.              0.        ]
 [     0.              0.              0.         ...      0.
       0.              0.        ]]


### Feature classification
Use the __[`SVM`](http://scikit-learn.org/stable/modules/svm.html)__ function to train Support Vector Machine (SVM) classifiers.
Construct two SVMs using a linear kernel with C=1000 to classify the facial video data and the ECG data separately. The ‘training_class’ group vector contains the class of samples: 1 = pain, 0 = no pain, corresponding to the rows of the training data matrices.

Then, calculate average classification performances for both training and testing data. The correct class labels corresponding with the rows of the training and testing data matrices are in the variables ‘training_class’ and ‘testing_class’, respectively.

In [11]:
#Train svm classifier for facial expression data

training_class_video_reshape = np.reshape(training_class_ecg_video, [-1,1])
print (training_class_video_reshape.shape)
print (training_class_ecg_video.shape)
clf = svm.SVC(C=1000,kernel='linear')
clf.fit(training_data_video.transpose(),training_class_video_reshape)  

#print (training_data_video.transpose().shape)
#print (testing_data_video.transpose().shape)
#The prediction results
prediction_train_video = clf.predict(training_data_video.transpose())
prediction_test_video = clf.predict(testing_data_video.transpose())

#print (prediction_train_video)
#print(training_class_ecg_video[0])
training_class=training_class_ecg_video[0]
#print(training_class)
#print(prediction_test_video)
#print(training_class_ecg_video[0].shape[0])

testing_class= testing_class_ecg_video[0]

#Calculate and Print the training accuracy and testing accuracy. 

def cal_acc(str_):
    x=0
    for i in range (0, training_class.shape[0]):
        if str_[i] == training_class[i] :
            x= x+ 1 
    acc_str = (x/training_class.shape[0]*100) 
    print (acc_str)
    return ;
cal_acc(prediction_train_video)

def cal_acc1(str_):
    a=0
    for i in range (0, testing_class.shape[0]):
        if str_[i] == testing_class[i] :
            a= a+ 1 
    acc_str = (a/testing_class.shape[0]*100) 
    print (acc_str)
    return ;
cal_acc1(prediction_test_video)


# Compute the confusion matrix through sklearn.metrics.confusion_matrix()function for training data and testing data respectively
print (confusion_matrix(training_class, prediction_train_video),  confusion_matrix(testing_class, prediction_test_video))

(40, 1)
(1, 40)


  y = column_or_1d(y, warn=True)


100.0
83.33333333333334
[[20  0]
 [ 0 20]] [[13  5]
 [ 0 12]]


In [12]:
#Train svm classifier for ECG signals
clf = svm.SVC(C=1000, kernel='linear')
print(training_class_ecg_video.shape)
print(training_data_ecg.shape)
print(training_data_ecg.transpose().shape)
clf.fit(training_data_ecg,training_class_ecg_video.transpose() )  

#The prediction results
prediction_train_ecg = clf.predict(training_data_ecg)
prediction_test_ecg = clf.predict(testing_data_ecg)

print(prediction_train_ecg.shape)
print(prediction_test_ecg.shape)
#Calculate and Print the training accuracy and testing accuracy. 


def cal_acc(str_):
    x=0
    for i in range (0, training_class.shape[0]):
        if str_[i] == training_class[i] :
            x= x+ 1 
    acc_str = (x/training_class.shape[0]*100) 
    print (acc_str)
    return ;
cal_acc(prediction_train_ecg)

def cal_acc1(str_):
    a=0
    for i in range (0, testing_class.shape[0]):
        if str_[i] == testing_class[i] :
            a= a+ 1 
    acc_str = (a/testing_class.shape[0]*100) 
    print (acc_str)
    return ;
cal_acc1(prediction_test_ecg)


# Compute the confusion matrix through sklearn.metrics.confusion_matrix()function for training data and testing data respectively
print (confusion_matrix(training_class, prediction_train_ecg),  confusion_matrix(testing_class, prediction_test_ecg))

(1, 40)
(40, 722)
(722, 40)
(40,)
(30,)
100.0
73.33333333333333
[[20  0]
 [ 0 20]] [[10  8]
 [ 0 12]]


  y = column_or_1d(y, warn=True)


## Decision fusion: 
Design a strategy to fuse the decisions based on facial expression and ECG data.

In [13]:
#Try to combine the decisions of ECG and videos to get a better result
prediction_test_combine=prediction_test_ecg * prediction_test_video

print(prediction_test_combine.shape)
print(testing_class.shape)

#Calculate and Print the testing accuracy. 

def cal_acc1(str_):
    a=0
    for i in range (0, testing_class.shape[0]):
        if str_[i] == testing_class[i] :
            a= a+ 1 
    acc_str = (a/testing_class.shape[0]*100) 
    print (acc_str)
    return ;
cal_acc1(prediction_test_combine)


#Compute the confusion matrix through sklearn.metrics.confusion_matrix()function for testing data
print (confusion_matrix(testing_class, prediction_test_combine))



(30,)
(30,)
93.33333333333333
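
With labels in {0, 1}, the product used above is a logical AND: a sample is called 'pain' only when both classifiers agree. That suits this data because every test error of the single-modality classifiers in the confusion matrices above is a 'no pain' sample predicted as 'pain'. The sketch below compares the product rule with two other common rules on small hypothetical arrays. The signed scores stand in for what `SVC.decision_function` would return and are not taken from the notebook.

```python
import numpy as np

# Hypothetical binary decisions (1 = pain, 0 = no pain) from two classifiers
decisions_ecg = np.array([1, 1, 0, 0, 1])
decisions_video = np.array([1, 0, 0, 1, 1])

# Product rule: 'pain' only if both classifiers say 'pain' (logical AND)
fused_product = decisions_ecg * decisions_video

# Max rule: 'pain' if either classifier says 'pain' (logical OR)
fused_or = np.maximum(decisions_ecg, decisions_video)

# Sum rule on hypothetical signed scores: positive total means 'pain'
scores_ecg = np.array([0.8, 0.2, -0.5, -0.1, 1.2])
scores_video = np.array([0.6, -0.9, -0.4, 0.3, 0.7])
fused_sum = ((scores_ecg + scores_video) > 0).astype(int)

print(fused_product, fused_or, fused_sum)
```

The AND rule trades false positives for possible false negatives, so it is a reasonable choice only when missed detections are rare in both modalities.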


### Question
What is the difference between decision-level fusion and feature-level fusion?

### Answer 
In feature-level fusion, the feature sets originating from multiple biometric sources are consolidated into a single feature set by the application of appropriate feature normalization, transformation, and reduction schemes.

Decision-level fusion falls under a broader area known as distributed detection systems. It is the process of selecting one hypothesis from M hypotheses given the decisions of N sensors in the presence of noise and interference.


## Task 4: 
Use the feature-level method (Task 2) to produce a 10-fold cross-validation (CV) estimate of the emotion recognition system performance.
* Join the training/testing data matrices and the class vectors. Also combine the ‘training_personID’ and ‘testing_personID’ vectors, which are needed to make the CV folds.
* Construct the CV folds by training ten SVMs. For each SVM, nine persons’ data is used as the training set (i.e. 90 samples) and one person’s samples are kept as the test set (i.e. 10 samples) for the respective fold (i.e. each SVM has a different person’s samples excluded from the training set). Test each of the ten trained SVMs on the corresponding held-out person’s samples and calculate the classification performance for each fold.
* Calculate the mean and SD of the ten CV fold performances to produce the final CV performance estimate of the emotion recognition system.
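
The fold construction described in the list above (hold out one person per fold) matches what scikit-learn calls leave-one-group-out cross-validation. As a hedged alternative to writing the loop by hand, the sketch below uses `LeaveOneGroupOut` with `cross_val_score` on synthetic data. The arrays `features`, `labels`, and `groups` are hypothetical stand-ins for the fused features, the class vector, and the person IDs.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_persons, per_person = 10, 10

# Hypothetical person IDs: 10 samples for each of 10 persons
groups = np.repeat(np.arange(n_persons), per_person)
# Hypothetical classes: 5 'happy' (1) and 5 'sadness' (2) samples per person
labels = np.tile(np.repeat([1, 2], per_person // 2), n_persons)
# Synthetic 30-D features with a class-dependent shift so the classes are separable
features = rng.normal(size=(100, 30)) + (labels[:, None] == 2) * 1.5

# One fold per person: train on nine persons, test on the held-out one
fold_scores = cross_val_score(
    SVC(kernel='linear'), features, labels,
    groups=groups, cv=LeaveOneGroupOut()
)
print(fold_scores.shape)
print(fold_scores.mean(), fold_scores.std())
```

Wrapping the PCA and CCA steps in the same cross-validated procedure would keep the held-out person out of the subspace estimation as well; that extension is not shown here.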

In [23]:
data4 = sio.loadmat('lab3_data.mat')

print(data4.keys())
personID_train = data4['training_personID']
personID_test = data4['testing_personID']
training_class = data4['training_class']
testing_class = data4['testing_class']

cca = CCA(n_components=reducedDim_a)
cca.fit(pca_v_trans_training_data,pca_a_trans_training_data)

cca_tran_v_train, cca_tran_a_train = cca.transform(pca_v_trans_training_data,pca_a_trans_training_data)
cca_tran_v_test, cca_tran_a_test= cca.transform(pca_v_trans_testing_data,pca_a_trans_testing_data)

training_CCDF = np.concatenate((cca_tran_v_train, cca_tran_a_train),axis=1)
testing_CCDF = np.concatenate((cca_tran_v_test, cca_tran_a_test),axis=1)

feature_CCDF_all = np.concatenate ((training_CCDF,testing_CCDF),axis=0)
personID_all = np.concatenate((personID_train, personID_test))
class_all = np.concatenate((training_class, testing_class))



#Transform training_data and testing data respectively
#pca_all_trans_training_data =pca_all.transform(training_data)


uniquePerson = np.unique(personID_all)
print(uniquePerson)

array = np.zeros(len(uniquePerson))
print(array)
print (np.where(personID_all == 1)[0])

i = 0
for personID in uniquePerson:
    
    ind_test = np.where(personID_all == personID)[0]
    #print(ind_test)
    ind_train = np.where(personID_all != personID)[0]
    #print(ind_train)
    
    sample_test = feature_CCDF_all [ind_test]
    label_test = class_all [ind_test]
    
    sample_train = feature_CCDF_all [ind_train]
    label_train = class_all [ind_train]


    cl_sample = svm.SVC(kernel='linear')

    cl_sample.fit(sample_train, label_train) 
    pred = cl_sample.predict(sample_test)
    print(pred)
    array[i] = sum(pred == label_test.transpose()[0])/len(ind_test)
    i = i+ 1

print (array)

dict_keys(['__header__', '__version__', '__globals__', 'speech_sample', 'testing_class', 'testing_data_mfcc', 'testing_data_proso', 'testing_personID', 'training_class', 'training_data_mfcc', 'training_data_proso', 'training_personID', 'training_data', 'testing_data'])
[ 1  2  3  4  5  7  8  9 10 12]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0 1 2 3 4 5 6 7 8 9]
[1 1 1 1 1 2 2 2 2 2]
[1 1 1 1 2 2 2 2 2 2]
[1 1 1 1 1 2 2 2 1 2]
[1 1 1 1 1 2 2 2 2 2]
[1 1 1 2 2 2 2 2 2 2]
[1 1 1 1 1 2 2 2 2 2]
[1 2 1 1 1 2 2 2 2 2]
[1 1 1 1 1 2 2 2 2 2]
[1 1 1 1 1 2 2 2 1 1]
[1 1 1 1 1 2 2 2 2 2]
[1.  0.9 0.9 1.  0.8 1.  0.9 1.  0.8 1. ]


  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)
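
The last step of the task, the mean and SD over the ten folds, is not computed in the cell above. A short sketch using the per-fold accuracies copied from that output follows. Whether the population SD (`ddof=0`, NumPy's default) or the sample SD (`ddof=1`) is expected is not stated in the assignment, so both are shown.

```python
import numpy as np

# Per-fold accuracies copied from the printed array above
fold_acc = np.array([1.0, 0.9, 0.9, 1.0, 0.8, 1.0, 0.9, 1.0, 0.8, 1.0])

mean_acc = fold_acc.mean()
sd_population = fold_acc.std()        # divides by N (ddof=0)
sd_sample = fold_acc.std(ddof=1)      # divides by N - 1

print(f"mean = {mean_acc:.3f}")
print(f"SD (ddof=0) = {sd_population:.3f}, SD (ddof=1) = {sd_sample:.3f}")
```

For these ten folds the mean accuracy is 0.93, with an SD of roughly 0.08 under either convention.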


In [None]:
#Check a single fold by hand: hold out one person and inspect the predictions
personID = 1
ind_test = np.where(personID_all == personID)[0]
ind_train = np.where(personID_all != personID)[0]

sample_test = feature_CCDF_all[ind_test]
label_test = class_all[ind_test]

sample_train = feature_CCDF_all[ind_train]
label_train = class_all[ind_train]

cl_sample = svm.SVC(kernel='linear')

#ravel() passes the labels as a 1-D vector
cl_sample.fit(sample_train, label_train.ravel())
pred = cl_sample.predict(sample_test)
print(pred)
print(label_test.transpose()[0])
print(len(ind_test))

#Accuracy of this single fold (array already holds all ten folds, so it is not modified here)
print(sum(pred == label_test.transpose()[0])/len(ind_test))