## Objective

The task is to use a feature-level method to combine facial expression features and audio features. A multi-modal emotion recognition system is constructed to recognize happy versus sad facial expressions (a binary-class problem) using a classifier training and testing structure.

The original data comes from lab1 and lab2: ten actors acting out happy and sad behaviors.
* Task 1: Subspace-based feature fusion method: In this case, z-score normalization is utilized. Please read “Fusing Gabor and LBP feature sets for kernel-based face recognition” and learn how to use the subspace-based feature fusion method for a multi-modal system.

* Task 2: Based on Task 1, use Canonical Correlation Analysis (CCA) to calculate the correlation coefficients of facial expression and audio features. Finally, use CCA to build a multi-modal emotion recognition system. The method is described in the conference paper “Feature fusion method based on canonical correlation analysis and handwritten character recognition”.
* Task 3: Based on Task 1, create a Leave-One-Subject-Out (LOSO) cross-validation to estimate the performance more reliably.

For emotion recognition, Support Vector Machine (SVM) classifiers are trained. 50 videos from 5 participants are used to train the emotion recognition systems using spatiotemporal features. The remaining 50 videos are used to evaluate the performance of the trained recognition systems.

## Task 1. Subspace-based method  
Please read “Fusing Gabor and LBP feature sets for kernel-based face recognition” and apply their framework for the exercise. We use a Support Vector Machine (SVM) with a linear kernel for classification. Instead of Gabor features, we use the prosodic features from the last exercise.


### Setting up the environment 

First, we need to import the basic modules for loading the data and data processing

In [1]:
import sys
sys.path.append('../')
from skimage import io
from skimage import transform
from skimage import color
from skimage import img_as_ubyte
import os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import sklearn
import scipy.io as sio

### Loading data  <font color='red'>(0.5 point)</font>

We load the facial expression data (training data, training class, testing data, testing class) and audio data (training data, testing data)

In [3]:
mdata = sio.loadmat('lab3_data.mat')

# Facial expression training and testing data, training and testing class
training_data = mdata['training_data']
testing_data = mdata['testing_data']
training_class = mdata['training_class']
testing_class = mdata['testing_class']

# Audio training and testing data
training_data_proso = mdata['training_data_proso']
testing_data_proso = mdata['testing_data_proso']

### Extract the subspace for facial expression features and audio features <font color='red'>(2 point)</font>
Extract the subspace for facial expression features and audio features with principal component analysis, using the **PCA class**.
The `reduced_dim` is the dimensionality of the reduced subspace.
Set `reduced_dim` to 20 and 15 for facial expression features and audio features, respectively. Normalization should be done subject wise. The test data should be normalized with the values from the training data.
For concatenating the features use the __[`np.concatenate()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html)__ function.

You will implement the PCA class with two methods, **fit** and **transform**. The **fit** method takes one input array and returns nothing; the **transform** method takes one input array and returns a transformed array with dimensions (n_samples, n_components). Use (__[`numpy.linalg.svd`](https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html)__) to extract the singular values.
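
As a rough illustration of the SVD step (a sketch on synthetic data, not the graded solution): for a symmetric covariance matrix, `np.linalg.svd(cov, hermitian=True)` returns `U, S, Vh`, with the singular values in descending order and the rows of `Vh` usable as principal directions. The array `X_demo`, its shape and the feature scales are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 40 samples, 5 features with different scales (assumed for illustration)
X_demo = rng.normal(size=(40, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

# Center each feature (column) with its own mean
X_centered = X_demo - X_demo.mean(axis=0)

# Covariance between features; rowvar=False treats columns as variables
cov = np.cov(X_centered, rowvar=False)

# SVD of the symmetric covariance matrix
U, S, Vh = np.linalg.svd(cov, hermitian=True)

n_components = 2
components = Vh[:n_components]          # shape (2, 5): one principal direction per row
projected = X_centered @ components.T   # shape (40, 2)

print(S)                 # singular values, largest first
print(projected.shape)
```

The variance of each projected column matches the corresponding singular value, which is why the leading singular values can serve as the explained variance.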

In [4]:
class PCA:
    """Principal component analysis (PCA).
    Parameters
    ----------
    n_components : int
        Number of principal components to use.
    whiten : bool, default=False
        When true, the output of transformed features is divided by the
        square root of the explained variance.
    Examples
    --------
    >>> import numpy as np
    >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    >>> pca = PCA(n_components=2)
    >>> pca.fit(X)
    >>> pca.transform(X)
    >>> array([[ 1.38340578,  0.2935787 ],
               [ 2.22189802, -0.25133484],
               [ 3.6053038 ,  0.04224385],
               [-1.38340578, -0.2935787 ],
               [-2.22189802,  0.25133484],
               [-3.6053038 , -0.04224385]])
    """
    def __init__(self, n_components: int, whiten: bool = False) -> None:
        self.n_components = n_components
        self.whiten = whiten
        self.selected_components = None
        self.mean = None 
                   
    def fit(self, X: np.ndarray) -> None:
        """Fit the model with X.
        Parameters
        ----------
        X : a numpy array with dimensions (n_samples, n_features)
        """        
        #Step 1: Find the per-feature mean, and center the data
        self.mean = np.mean(X, axis=0)
        X_centered = X - self.mean
        
        #Step 2: Find the covariance (columns are the variables)
        cov = np.cov(X_centered, rowvar=False)

        #Step 3: Apply SVD and choose the components, make the hermitian argument True.
        U, S, Vt = np.linalg.svd(cov, hermitian=True)
        # rows of Vt are the principal directions, sorted by decreasing singular value
        self.selected_components = Vt[: self.n_components]
        # choose the singular values of the diagonal matrix
        self.explained_variance = S[: self.n_components]
    
    def transform(self, X: np.ndarray) -> np.ndarray:
        """Transform X with the fitted model.
        Parameters
        ----------
        X : a numpy array with dimensions (n_samples, n_features)
        
        Returns
        -------
        X_transformed: a numpy array with dimensions (n_samples, n_components)
        """
        # Center the data 
        
        X = X - self.mean
        
        # Step 4: Choose and transform the features
        X_transformed = np.dot(X, self.selected_components.T)
        if self.whiten:
            # Normalize the transform features
            X_transformed /= np.sqrt(self.explained_variance)
        return X_transformed
        

In [5]:
from scipy import stats
# Note: this import shadows the PCA class defined above; remove it to use the custom implementation
from sklearn.decomposition import PCA
#Set Reduced_dim for facial expression features and audio features, respectively.
reduced_dim_v = 20
reduced_dim_a = 15

#Extract the subspace for facial expression features through PCA. 
#If you are using sklearn use random_state=0, to ensure consistent results
pca_v = PCA(n_components=reduced_dim_v)
pca_v.fit( training_data)

#Transform training_data and testing data respectively
transformed_training_data_fe = pca_v.transform(training_data)
transformed_testing_data_fe = pca_v.transform(testing_data)


#Extract the subspace for audio features through PCA
pca_a = PCA(n_components=reduced_dim_a)
pca_a.fit(training_data_proso)

#Transform the training_data and testing_data respectively

transformed_training_data_audio = pca_a.transform(training_data_proso)
transformed_testing_data_audio = pca_a.transform(testing_data_proso)


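#Normalize the features (z-score). The test data is normalized with the
#mean and standard deviation of the training data.

fe_mean = transformed_training_data_fe.mean(axis=0)
fe_std = transformed_training_data_fe.std(axis=0)
transformed_training_data_fe = (transformed_training_data_fe - fe_mean) / fe_std
transformed_testing_data_fe = (transformed_testing_data_fe - fe_mean) / fe_std

audio_mean = transformed_training_data_audio.mean(axis=0)
audio_std = transformed_training_data_audio.std(axis=0)
transformed_training_data_audio = (transformed_training_data_audio - audio_mean) / audio_std
transformed_testing_data_audio = (transformed_testing_data_audio - audio_mean) / audio_std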



#Concatenate the transformed training data of facial expression features and audio features together
combined_train = np.concatenate((transformed_training_data_fe, transformed_training_data_audio), axis=1)

#Concatenate the transformed testing data of facial expression features and audio features together
combined_test = np.concatenate((transformed_testing_data_fe, transformed_testing_data_audio), axis=1)

### Question 1. Why is PCA used? Why not just concatenate the extracted features without PCA? <font color='red'>(0.5 point)</font>

### Your answer:

PCA is used to reduce dimensionality and suppress noise by keeping only the directions with the largest variance, which can also improve model performance.

Concatenating features without dimensionality reduction can lead to issues such as increased computation and memory requirements and a higher risk of overfitting, especially when the number of features is large compared to the number of samples (here 708 facial features for 50 training videos). PCA addresses these challenges by retaining the essential information while mitigating the problems associated with high-dimensional data.
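
To make the dimensionality point concrete, here is a minimal sketch. It uses random numbers as a stand-in for the real features (an assumption; only the shape, 50 samples by 708 features, is taken from this lab). After centering, such a matrix has rank at most 49, so most of the 708 dimensions carry no variance at all for the training set.

```python
import numpy as np

rng = np.random.default_rng(0)
# Same shape as the facial training matrix in this lab: 50 videos x 708 features
X_demo = rng.normal(size=(50, 708))
X_centered = X_demo - X_demo.mean(axis=0)

# Singular values of the centered data; squared and scaled they give the PCA variances
s = np.linalg.svd(X_centered, compute_uv=False)
variances = s**2 / (X_centered.shape[0] - 1)

rank = np.linalg.matrix_rank(X_centered)
print(rank)  # at most n_samples - 1
print(np.cumsum(variances)[:5] / variances.sum())  # cumulative explained-variance ratio
```

On real features the variance is usually concentrated in far fewer directions than on random data, so the cumulative ratio would rise faster than in this stand-in.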

### Feature classification <font color='red'>(0.5 point)</font>
Use the __[`SVM`](http://scikit-learn.org/stable/modules/svm.html)__ function to train Support Vector Machine (SVM) classifiers.
Construct an SVM using the combined training data and a linear kernel. The `training_class` group vector contains the class of samples: 1 = happy, 2 = sadness, corresponding to the rows of the training data matrices.

Then, calculate average classification performances for both training and testing data. The correct class labels corresponding with the rows of the training and testing data matrices are in the variables ‘training_class’ and ‘testing_class’, respectively.

In [6]:
from sklearn import svm
from sklearn.metrics import accuracy_score

# Train SVM classifier
svm_classifier = svm.SVC(kernel='linear')
svm_classifier.fit(combined_train, training_class.ravel())  # Train the classifier

# Predict the class labels for training and testing data
train_predictions = svm_classifier.predict(combined_train)
test_predictions = svm_classifier.predict(combined_test)

# Calculate and print the training accuracy and testing accuracy
train_accuracy = accuracy_score(training_class, train_predictions)
test_accuracy = accuracy_score(testing_class, test_predictions)


print(train_accuracy)
print(test_accuracy)

1.0
1.0


### <font color='red'>(0.5 point)</font>
Compute the confusion matrices using the __[`sklearn.metrics.confusion_matrix()`](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)__ function for both the training data and testing data.
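
For reference, a confusion matrix can also be tallied by hand. The sketch below uses made-up label vectors (hypothetical, not the lab data) with this lab's coding, 1 = happy and 2 = sadness, and follows the same layout as scikit-learn: rows are true classes, columns are predicted classes.

```python
import numpy as np

# Hypothetical labels using this lab's coding: 1 = happy, 2 = sadness
y_true = np.array([1, 1, 1, 2, 2, 2])
y_pred = np.array([1, 1, 2, 2, 2, 1])

classes = np.array([1, 2])
cm = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_pred):
    # Row index = true class, column index = predicted class
    cm[np.searchsorted(classes, t), np.searchsorted(classes, p)] += 1

# Accuracy is the share of samples on the diagonal
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)
```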


In [7]:
from sklearn.metrics import confusion_matrix

# Compute confusion matrix for training data
train_confusion_matrix = confusion_matrix(training_class, train_predictions)
print(train_confusion_matrix)

# Compute confusion matrix for testing data
test_confusion_matrix = confusion_matrix(testing_class, test_predictions)
print(test_confusion_matrix)


[[25  0]
 [ 0 25]]
[[25  0]
 [ 0 25]]


## Task 2. 
Instead of a simple concatenation, we can try something smarter that exploits the correlation between the two feature sets before fusing them. This is achieved using CCA. Use the PCA-transformed vectors and set the number of components for the CCA to 15.


### <font color='red'>(1 point)</font>

Use (__[`sklearn.cross_decomposition.CCA()`](http://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.CCA.html)__) function to calculate the correlation coefficients of facial expression features and audio features. For `n_components` of CCA use the same number as the reduced dimensionality of the audio features in the previous task.
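
To show what the canonical correlation coefficients are, here is a plain NumPy sketch of the classical closed-form computation: whiten each view, then take the SVD of the cross-covariance. This is an illustration on synthetic data and not necessarily what scikit-learn's `CCA` does internally. The helper `inv_sqrt` and the arrays `X`, `Y` and `shared` are hypothetical names introduced for the example.

```python
import numpy as np

def inv_sqrt(mat):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

rng = np.random.default_rng(0)
n = 200
shared = rng.normal(size=(n, 1))  # latent signal present in both views (assumed)
X = np.hstack([shared + 0.1 * rng.normal(size=(n, 1)), rng.normal(size=(n, 2))])
Y = np.hstack([shared + 0.1 * rng.normal(size=(n, 1)), rng.normal(size=(n, 1))])

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
Sxx = Xc.T @ Xc / (n - 1)
Syy = Yc.T @ Yc / (n - 1)
Sxy = Xc.T @ Yc / (n - 1)

# Singular values of the whitened cross-covariance are the canonical correlations
K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
U, corr, Vt = np.linalg.svd(K)

# Project each view onto its canonical directions
X_c = Xc @ inv_sqrt(Sxx) @ U[:, :len(corr)]
Y_c = Yc @ inv_sqrt(Syy) @ Vt.T

print(corr)  # canonical correlations, largest first
```

The first pair of projected columns, `X_c[:, 0]` and `Y_c[:, 0]`, has a sample correlation equal to `corr[0]`. Features of this kind, computed from both views, are what gets concatenated for fusion.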

In [8]:
from sklearn.cross_decomposition import CCA
import numpy as np

#Use CCA to construct the Canonical Projective Vector (CPV)
cca = CCA(n_components= 15)
cca.fit(transformed_training_data_fe, transformed_training_data_audio)

#Construct Canonical Correlation Discriminant Features (CCDF) for both the training data and testing data
cca_train_features = cca.transform(transformed_training_data_fe, transformed_training_data_audio)
cca_test_features = cca.transform(transformed_testing_data_fe, transformed_testing_data_audio)


# Concatenate the CCA transformed features for training data and testing data

combined_train_cca = np.concatenate((cca_train_features[0], cca_train_features[1]), axis=1)
combined_test_cca = np.concatenate((cca_test_features[0], cca_test_features[1]), axis=1)


### <font color='red'>(1 point)</font>
Train an SVM classifier using a linear kernel, print the training and testing accuracy and compute the confusion matrix.

In [9]:
#Train svm classifier 

svm_classifier_cca = svm.SVC(kernel='linear')
svm_classifier_cca.fit(combined_train_cca, training_class.ravel())


#The prediction results

train_predictions_cca = svm_classifier_cca.predict(combined_train_cca)
test_predictions_cca = svm_classifier_cca.predict(combined_test_cca)



#Calculate and print the training accuracy and testing accuracy. 

train_accuracy_cca = accuracy_score(training_class, train_predictions_cca)
test_accuracy_cca = accuracy_score(testing_class, test_predictions_cca)

print(f"Training Accuracy (CCA): {train_accuracy_cca}")
print(f"Testing Accuracy (CCA): {test_accuracy_cca}")



# Compute the confusion matrix using sklearn.metrics.confusion_matrix() function for training data and testing data respectively

train_confusion_matrix_cca = confusion_matrix(training_class, train_predictions_cca)
test_confusion_matrix_cca = confusion_matrix(testing_class, test_predictions_cca)

print(train_confusion_matrix_cca)
print(test_confusion_matrix_cca)


Training Accuracy (CCA): 1.0
Testing Accuracy (CCA): 1.0
[[25  0]
 [ 0 25]]
[[25  0]
 [ 0 25]]


### Question 2. In this exercise a feature-level method was used to fuse the features. What are the other types of methods for data fusion? <font color='red'>(0.5 point)</font>

### Your answer:

Data fusion is the process of combining information from multiple sources or features to improve the overall understanding or analysis of the data. In addition to feature-level methods, there are several other types of data fusion methods:

1. **Decision-level Method**: In this method, the decisions from individual sources are combined to make a final decision. This is commonly done with ensemble rules such as majority voting or weighted averaging.

2. **Match-score level Method**: In this approach, the similarity scores computed separately for each data source are combined or compared to reach a new decision. It is typically used when dealing with multiple modalities of data, such as text, images and audio, where the goal is to find relationships between them.

3. **Model-level Method**: This method integrates information from multiple models to create a more robust model. This can be done through techniques such as stacking or ensembling, which combine the outputs of several models for better performance.

4. **Sensor-level Method**: This approach combines raw data from multiple sensors or sources before feature extraction or any other processing step.
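
As a small illustration of the decision-level method from the list above, the sketch below fuses two per-modality decisions by weighted averaging. The scores and the weights are invented for the example and do not come from this lab's classifiers.

```python
import numpy as np

# Hypothetical per-modality scores: probability of class "happy" for 4 samples
p_face = np.array([0.9, 0.4, 0.2, 0.6])
p_audio = np.array([0.7, 0.7, 0.1, 0.3])

# Weighted average of the two decisions; the weights are an assumption
w_face, w_audio = 0.6, 0.4
p_fused = w_face * p_face + w_audio * p_audio

# Map back to this lab's labels: 1 = happy, 2 = sadness
fused_labels = np.where(p_fused >= 0.5, 1, 2)
print(fused_labels)
```

Unlike feature-level fusion, each modality keeps its own classifier here, and only their outputs are combined.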

### Question 3. Compare the results from all the different methods from assignments 1, 2 and 3. What method performed the best? What was the worst? Hypothesize as to why certain methods performed better than others. <font color='red'>(0.5 point)</font>

### Your answer:

We have used three methods:

1. Subspace-based Feature Fusion with SVM (linear kernel) Classifier.
2. Canonical Correlation Analysis (CCA) for Feature Fusion with SVM (linear kernel) Classifier.
3. Leave-One-Subject-Out (LOSO) Cross-Validation with PCA and SVM (linear kernel) Classifier.

Based on the results in Tasks 1 and 2, Subspace-based Feature Fusion with SVM and CCA-based Feature Fusion with SVM both performed well on the test dataset, each reaching a testing accuracy of 1.0. The Leave-One-Subject-Out (LOSO) Cross-Validation with PCA and SVM reached a lower average accuracy of 0.93, so it scored lowest of the three.

## Task 3: 
For a more reliable evaluation, often the Leave-One-Subject-Out (LOSO) cross-validation is used instead of the common train-test split. Cross-validation gives us a more reliable measure of the performance as all of the data is used for both training and testing. LOSO is used as emotions are highly dependent on the subject. By using LOSO, we guarantee that a subject is always in either the training or testing data and not in both.

* Join the training/testing data matrices and the class vectors. Also combine the ‘training_personID’ and ‘testing_personID’ vectors.

* Assume we have a total of $n$ subjects. Now, we will create a total of $n$ folds (loops), where each fold's training set contains the data from $n-1$ subjects and the testing set consists of only $1$ subject.

* Follow the steps taken in the first task: project the data to a subspace using PCA, concatenate the audio and video features together, train an SVM and finally evaluate the performance.

* The solution should be able to generalize over different numbers of subjects and samples, *e.g.*, a dataset may have 24 subjects, where subject1 has 4 samples and subject2 has 32 samples.
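
The fold construction described in the steps above can be sketched as follows. `loso_splits` is a hypothetical helper name and `subjects_demo` is an invented subject vector; the point is only that boolean masks handle any number of subjects and any number of samples per subject.

```python
import numpy as np

def loso_splits(subjects):
    """Yield (subject_id, train_mask, test_mask) for each unique subject."""
    subjects = np.asarray(subjects).ravel()
    for subject_id in np.unique(subjects):
        # True for the samples of the held-out subject
        test_mask = subjects == subject_id
        yield subject_id, ~test_mask, test_mask

# Hypothetical subject vector with unequal sample counts per subject
subjects_demo = np.array([1, 1, 1, 1, 2, 2, 5, 5, 5])
for subject_id, train_mask, test_mask in loso_splits(subjects_demo):
    print(subject_id, train_mask.sum(), test_mask.sum())
```

Each mask can index the feature matrices and the label vector directly, for example `features[train_mask]`, so the same loop body works for every fold.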

### <font color='red'>(0.5 point)</font>

In [10]:
mdata = sio.loadmat('lab3_data.mat')

#Combine the training data, testing data, labels and person IDs for video and audio respectively, in order to get the whole dataset. 
lbp_data = np.vstack((mdata['training_data'], mdata['testing_data']))
proso_data =  np.vstack((mdata['training_data_proso'], mdata['testing_data_proso']))

labels = np.vstack((mdata['training_class'], mdata['testing_class']))
subjects = np.vstack((mdata['training_personID'], mdata['testing_personID']))

#Get the unique subject IDs
subject_ids = np.unique(subjects)

#Print the shapes and the list of subject_ids for a sanity check

# Print the shapes of the data
print("Shape of lbp_data:", lbp_data.shape)
print("Shape of proso_data:", proso_data.shape)
print("Shape of labels:", labels.shape)
print("Shape of subjects:", subjects.shape)

# Print the list of subject IDs
print("List of subject IDs:", subject_ids)


Shape of lbp_data: (100, 708)
Shape of proso_data: (100, 15)
Shape of labels: (100, 1)
Shape of subjects: (100, 1)
List of subject IDs: [ 1  2  3  4  5  7  8  9 10 12]


### <font color='red'>(2 point)</font>

In [11]:
accuracies = []
#Loop over each subject
for subject_id in subject_ids:
    #Create boolean arrays for the training and testing set indices
    #The train_idx should be a list of form [True, True, False, ...], where True indicates the position
    #for the samples that are not the current subject_id
    train_idx = (subjects != subject_id).flatten()
    #Similar for the test_idx, True indicates the position of the current subject_id
    test_idx = (subjects == subject_id).flatten()
    
    #Create the training and testing sets for lbp, proso and labels by indexing lbp_data, proso_data and labels
    #with the boolean arrays train_idx and test_idx
    train_lbp = lbp_data[train_idx]
    test_lbp = lbp_data[test_idx]
    train_proso = proso_data[train_idx]
    test_proso = proso_data[test_idx]
    train_labels = labels[train_idx]
    test_labels = labels[test_idx]
    
    #Create the PCA for both lbp and proso. We take a slight shortcut compared to task 1,
    #by using the whiten=True parameter for normalizing the features. This means that
    #there is no need for normalization afterwards
    pca_v = PCA(n_components=20, whiten=True)
    pca_a = PCA(n_components=15, whiten=True)
    
    #Fit the PCAs with the training data
    pca_v.fit(train_lbp)
    pca_a.fit(train_proso)
    
    
    #Transform both the training and testing data with the PCA
    transformed_train_lbp = pca_v.transform(train_lbp)
    transformed_test_lbp = pca_v.transform(test_lbp)
    transformed_train_proso = pca_a.transform(train_proso)
    transformed_test_proso = pca_a.transform(test_proso)
    
    
    
    #Concatenate the features together
    combined_train = np.concatenate((transformed_train_lbp, transformed_train_proso), axis=1)
    combined_test = np.concatenate((transformed_test_lbp, transformed_test_proso), axis=1)
    
    
    #Create a linear SVM and train it
    
    svm_classifier = svm.SVC(kernel='linear')
    svm_classifier.fit(combined_train, train_labels.ravel())
    
    
    #Calculate the accuracy for the testing data and add it to the list of accuracies
    test_predictions = svm_classifier.predict(combined_test)
    
    test_accuracy = accuracy_score(test_labels, test_predictions)
    accuracies.append(test_accuracy)

    
    
#Calculate the average of the accuracies. Print both the list of accuracies and the average    
average_accuracy = np.mean(accuracies)

print("List of Accuracies:", accuracies)
print("Average Accuracy:", average_accuracy)


List of Accuracies: [0.9, 0.8, 1.0, 0.9, 0.9, 1.0, 1.0, 1.0, 0.8, 1.0]
Average Accuracy: 0.93


### Question 4. The accuracy of LOSO (0.93) is lower than the accuracy achieved by the train-test split (0.98) in task 1. Hypothesize as to why the two are different. Which one is better for evaluation?  <font color='red'>(0.25 point)</font>

### Your answer:

The difference in accuracy between Leave-One-Subject-Out (LOSO) cross-validation (0.93) and the train-test split (0.98) in Task 1 can be attributed to several factors. Possible explanations include:

1. Data imbalance in the training set.
2. Overfitting of the trained model.
3. The small amount of data available for training.

In addition, the train-test split evaluates only one particular partition of the subjects, whereas LOSO tests on every subject in turn, and the per-subject accuracies range from 0.8 to 1.0. As noted in the Task 3 description, LOSO gives a more reliable measure of performance because all of the data is used for both training and testing. It is therefore the better choice for evaluation, even though the accuracy it reports is lower.

### Question 5. In PCA, why is the `whiten` parameter better and why does it replace the normalization?  <font color='red'>(0.25 point)</font>

### Your answer:

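The PCA projections of the training data are already zero-mean and decorrelated. With `whiten=True`, each projected component is additionally divided by the square root of its explained variance, so every component has unit variance on the training data. That is the same effect z-score normalization has on these projections, which is why no separate normalization step is needed. It also simplifies preprocessing: the scaling is learned in `fit` from the training data and applied unchanged to the test data in `transform`.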
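
A minimal numerical check of this equivalence, on synthetic data (the array `X_demo` and the choice of 3 components are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated data (assumed for illustration): 100 samples, 4 features
X_demo = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
X_centered = X_demo - X_demo.mean(axis=0)

cov = np.cov(X_centered, rowvar=False)
U, S, Vh = np.linalg.svd(cov, hermitian=True)

projected = X_centered @ Vh[:3].T          # plain PCA projection
whitened = projected / np.sqrt(S[:3])      # whiten=True style scaling

# z-score of the plain projection, using the same ddof as np.cov (ddof=1)
zscored = (projected - projected.mean(axis=0)) / projected.std(axis=0, ddof=1)

print(np.var(whitened, axis=0, ddof=1))    # each close to 1
print(np.allclose(whitened, zscored))
```

Note that `scipy.stats.zscore` uses `ddof=0` by default, so its output differs from the whitened features by a constant factor of `sqrt(n / (n - 1))`, which is negligible for classification.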

In [None]:
import numpy as np

# Normalize facial expression features
train_v_mean = np.mean(train_tr_v, axis=0)
train_v_std = np.std(train_tr_v, axis=0)
train_tr_v = (train_tr_v - train_v_mean) / train_v_std
test_tr_v = (test_tr_v - train_v_mean) / train_v_std  

# Normalize audio features
train_a_mean = np.mean(train_tr_a, axis=0)
train_a_std = np.std(train_tr_a, axis=0)
train_tr_a = (train_tr_a - train_a_mean) / train_a_std
test_tr_a = (test_tr_a - train_a_mean) / train_a_std 

