# Explaining Facial Expression Recognition with LIME
## Notebook 1:  XAI for Affective Computing (SoSe2022)

In this notebook, you will use the [LIME Python package](https://github.com/marcotcr/lime) to explain the predictions of two facial expression recognition models trained on a sample of the [AffectNet dataset](http://mohammadmahoor.com/affectnet/). AffectNet is a dataset of facial expressions in the wild, labeled with 8 facial expression categories: **Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger, and Contempt**. (If interested, have a look at the paper: https://arxiv.org/abs/1708.03985.)

In **Part 1**, we will first explore the explanations of an already trained Convolutional Neural Network trained on raw face images, using the LIME Image Explainer. Then in **Part 2**, we will explore the explanations of a pretrained Random Decision Forest trained on [Facial Action Units](https://imotions.com/blog/facial-action-coding-system/) that were extracted from the face images using [OpenFace2.0](https://github.com/TadasBaltrusaitis/OpenFace).

To use this notebook, go step by step through each of the cells and review the code and comments along the way.

See the **README** to get started.

## Part 0: Notebook Setup

In [None]:
%load_ext autoreload
%autoreload 2

##### Import necessary libraries

(See the README for the necessary package installations if you receive a `module not found` error.)

In [None]:
import pickle
from pathlib import Path

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

# import tensorflow for model loading
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical

# import sklearn for processing data and results
from sklearn.metrics import confusion_matrix, classification_report, auc, roc_curve, roc_auc_score
from sklearn.preprocessing import LabelBinarizer

# import model loading function
from model import cnn_model

from IPython.display import clear_output
import warnings
warnings.filterwarnings('ignore')

##### Helper functions for plotting faces

In [None]:
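# Helper Functions
def display_one_image(image, title, subplot, color='black', mask=None):
    # `mask` is currently unused; it is kept so existing calls keep working
    plt.subplot(subplot)
    plt.axis('off')
    plt.imshow(image)
    # apply the requested title color (previously the argument was ignored)
    plt.title(title, fontsize=16, color=color)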
    
def display_nine_images(images, titles, preds, start, title_colors=None):
    subplot = 331
    plt.figure(figsize=(13,13))
    for i in range(9):
        color = 'black' if title_colors is None else title_colors[i]
        idx = start+i
        display_one_image(images[idx], f'Actual={titles[idx]} \n Pred={preds[idx]} \n Index = {idx}', 331+i, color)
    # plt.tight_layout()
    plt.subplots_adjust(wspace=0.1, hspace=0.4)
    plt.show()

def image_title(label, prediction):
    # Both prediction (probabilities) and label (one-hot) are arrays with one item per class.
    class_idx = np.argmax(label, axis=-1)
    prediction_idx = np.argmax(prediction, axis=-1)
    if class_idx == prediction_idx:
        return f'{CLASS_LABELS[prediction_idx]} [correct]', 'black'
    else:
        return f'{CLASS_LABELS[prediction_idx]} [incorrect, should be {CLASS_LABELS[class_idx]}]', 'red'

def get_titles(images, labels, model):
    predictions = model.predict(images)
    titles, colors = [], []
    # iterate over the `labels` argument rather than the global `classes`
    for label, prediction in zip(labels, predictions):
        title, color = image_title(label, prediction)
        titles.append(title)
        colors.append(color)
    return titles, colors

## Part 1: Local Explanations of Facial Expression Recognition with Images

##### Set our global variables

In [None]:
SEED = 12
IMG_HEIGHT = 128
IMG_WIDTH = 128
BATCH_SIZE = 80
NUM_CLASSES = 8
CLASS_LABELS = ['Neutral', 'Happy', 'Sad', 'Surprise', 'Fear', 'Disgust', 'Anger', 'Contempt']

### Load Pretrained CNN Model and Setup Data Generator

In [None]:
# make sure you've downloaded the models from LernraumPlus (see README instructions for Notebook I)
model_path = '../models/affectnet_model_e=60/affectnet_model'

# test loading weights
model_xai = cnn_model(input_shape=(IMG_HEIGHT, IMG_WIDTH, 3), num_classes=NUM_CLASSES)
model_xai.load_weights(model_path)

In [None]:
test_dir = '../data/affectnet/val_class/'
# test_dir = '../localdata/affectnet/val_class/'


# Load data
test_datagen = ImageDataGenerator(validation_split=0.2,
                                  rescale=1./255)
test_gen = test_datagen.flow_from_directory(directory=test_dir,
                                            target_size=(IMG_HEIGHT, IMG_WIDTH),
                                            batch_size=BATCH_SIZE,
                                            shuffle=False,
                                            color_mode='rgb',
                                            class_mode='categorical', 
                                            seed = SEED)
images, classes = next(test_gen)

### Evaluation and Predictions
Here we evaluate the loaded model to ensure it is working as expected. You should get around $48.75\%$ accuracy. While this is not a perfect classifier, it is well above random guessing, which would give $1 / 8 \times 100 = 12.5\%$ accuracy.

Then we load predictions to use throughout the notebook. 

The prediction results can then be viewed with a confusion matrix to see where the model is confused.
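As a small illustration of how to read such a matrix, the sketch below computes overall accuracy, the chance level, per-class recall, and the most frequent confusion. It uses a made-up 3-class matrix (the counts are illustrative, not results from this model); with the 8 classes of this notebook the chance level would be $12.5\%$.

```python
# Toy confusion matrix: rows = actual class, columns = predicted class.
# The counts are invented for illustration only.
labels = ['Neutral', 'Happy', 'Sad']
cm = [
    [6, 1, 3],
    [2, 8, 0],
    [5, 1, 4],
]
n = len(labels)

# Overall accuracy is the diagonal (correct predictions) over all samples.
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(n))
accuracy = correct / total

# Chance level for a balanced problem with n classes.
chance = 1 / n

# Per-class recall: correct predictions divided by the row total.
recall = {labels[i]: cm[i][i] / sum(cm[i]) for i in range(n)}

# Largest off-diagonal cell: the most frequent (actual, predicted) confusion.
worst = max((cm[i][j], labels[i], labels[j])
            for i in range(n) for j in range(n) if i != j)

print(f'accuracy={accuracy:.2f}, chance={chance:.2f}')
print(recall)
print(f'most confused: {worst[1]} predicted as {worst[2]} ({worst[0]} times)')
```

The same reading applies to the heatmap plotted below: bright off-diagonal cells mark the class pairs that are worth explaining with LIME.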

In [None]:
loss, acc = model_xai.evaluate(test_gen, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))

In [None]:
# get softmax predictions from model
preds = model_xai(images)

# convert predictions to integers
y_pred = np.argmax(preds, axis=-1)
y_true = np.argmax(classes, axis=-1)

In [None]:
# we can also review the confusion matrix
cm_data = confusion_matrix(y_true, y_pred)
cm = pd.DataFrame(cm_data, columns=CLASS_LABELS, index = CLASS_LABELS)
cm.index.name = 'Actual'
cm.columns.name = 'Predicted'
plt.figure(figsize = (20,10))
plt.title('Confusion Matrix', fontsize = 20)
sns.set(font_scale=1.2)
ax = sns.heatmap(cm, cbar=False, cmap="Blues", annot=True, annot_kws={"size": 16}, fmt='g')

## Task 1: LIME Local Prediction Explanations

Now that we have our model setup, we will review the images and predictions to identify a few data instances to explain.  

### Task 1.0
- Try changing the `start` value to get a new set of images (there are 10 images for each class, so, for example, the class Happy will be at indexes 10-19)
- Search through the images to find at least 4 to explain 
    - Find classes that you would like to explain, and from each class select 2 images
        - one should be a correct prediction  
        - and one should be an incorrect prediction
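Assuming the batch really is ordered by class with exactly 10 images per class, as stated above, the index range of a class can be computed rather than guessed. The helper `class_index_range` is a hypothetical convenience function and is not part of the provided notebook code.

```python
# Assumption: images are sorted by class and each class has the same
# number of images (10 in this notebook).
CLASS_LABELS = ['Neutral', 'Happy', 'Sad', 'Surprise',
                'Fear', 'Disgust', 'Anger', 'Contempt']
IMAGES_PER_CLASS = 10

def class_index_range(label, labels=CLASS_LABELS, per_class=IMAGES_PER_CLASS):
    """Return the (first, last) index of the images belonging to `label`."""
    class_idx = labels.index(label)
    first = class_idx * per_class
    return first, first + per_class - 1

print(class_index_range('Happy'))  # (10, 19)
```

Setting `start` to the first index of a range shows 9 of that class's 10 images in the grid.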

In [None]:
# displays 9 images from the array, beginning at index `start`
start = 0

true_labels = [CLASS_LABELS[idx] for idx in y_true]
pred_labels = [CLASS_LABELS[idx] for idx in y_pred]
display_nine_images(images, true_labels, pred_labels, start)

In [None]:
#### Enter the Indexes Here ### 
###############################
# you will use this array later in this task
img_idxs = []


### Task 1.1 Implement a LIME Image Explainer

Implement a [LimeImageExplainer](https://lime-ml.readthedocs.io/en/latest/lime.html#module-lime.lime_image) instance. You can review the [LIME tutorial](https://github.com/marcotcr/lime/blob/master/doc/notebooks/Tutorial%20-%20Image%20Classification%20Keras.ipynb) for help.

*Hint*: Use a for loop to iterate through your `img_idxs` array and create a separate explainer instance for each.

In [None]:
import lime
from lime import lime_image
from lime.wrappers.scikit_image import SegmentationAlgorithm

from skimage.segmentation import mark_boundaries # used to get boundaries from explanation for plotting

In [None]:
##### YOUR CODE GOES HERE #####
###############################


#### Task 1.1
Print the predicted labels for the top $N$ labels as found by the explainer.

In [None]:
##### YOUR CODE GOES HERE #####
###############################


### Task 1.2: Visualize Explanations
Visualize the explanations for each of the 4 data points from LIME using matplotlib's `imshow` function (see above tutorial). (Or pass the explanation to the `display_one_image` function defined above.)

*HINT*: Use the `subplot` parameter of `display_one_image` to plot a 2x2 grid. The value should be an integer formatted as `RCN`, where `R` is the number of rows, `C` is the number of columns, and `N` is the number of the image to plot. For example, `221` means to plot the first image of a 2x2 grid, `222` means to plot the second image, and so forth (also see `display_nine_images` for an example of this usage).
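The three-digit subplot value can be built arithmetically instead of typed by hand. The function `subplot_code` below is a hypothetical helper, shown only to make the encoding explicit; it assumes the single-digit form, so rows, columns, and position must each be between 1 and 9.

```python
def subplot_code(rows, cols, n):
    """Build matplotlib's three-digit subplot integer (rows, columns, position)."""
    # The three-digit form only works when every component is a single digit.
    if not (1 <= rows <= 9 and 1 <= cols <= 9):
        raise ValueError('rows and cols must be between 1 and 9')
    if not (1 <= n <= min(rows * cols, 9)):
        raise ValueError('n must be a valid position in the grid (max 9)')
    return rows * 100 + cols * 10 + n

# Codes for the four cells of a 2x2 grid.
codes = [subplot_code(2, 2, n) for n in range(1, 5)]
print(codes)  # [221, 222, 223, 224]
```

Each code can be passed as the `subplot` argument of `display_one_image` inside a loop over the selected images.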

Experiment with at least 2 different sets of parameters for the explanation visualizations. For example, view positive and negative contributions, change the number of features for the explanation, or try visualizing a heatmap.

In [None]:
##### YOUR CODE GOES HERE #####
###############################
# (you can use more than one notebook cell for this task)



### Task 1.3 Report Your Findings
What are your findings? Can you identify any patterns that explain how the model is working? Are you more or less confident in the model's performance after reviewing the explanations?

Write your findings here...

## Part 2: LIME Submodular Pick

Now let's have a look at using LIME's submodular pick (SP-LIME).  Remember the goal of SP-LIME is to identify a set of data instances that maximize the coverage of the explanations.  So the SP-LIME module will identify $N$ instances to explain, then you can plot the explanations.  
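To make the coverage idea concrete, here is a simplified pure-Python sketch of greedy submodular selection. It is not the implementation in `lime.submodular_pick`; it assumes that each instance's explanation is a row of feature weights, that a feature's global importance is the square root of its summed absolute weights, and that an instance "covers" a feature when its weight is nonzero. The weight matrix is made up for illustration.

```python
import math

def greedy_pick(weights, budget):
    """Greedily pick up to `budget` rows that together cover the most important features."""
    n_features = len(weights[0])
    # Global importance of each feature across all explanations.
    importance = [math.sqrt(sum(abs(row[j]) for row in weights))
                  for j in range(n_features)]

    def coverage(selected):
        # Sum the importance of every feature used by at least one selected row.
        return sum(importance[j] for j in range(n_features)
                   if any(weights[i][j] != 0 for i in selected))

    picked = []
    remaining = list(range(len(weights)))
    while remaining and len(picked) < budget:
        # Add the instance that yields the highest total coverage.
        best = max(remaining, key=lambda i: coverage(picked + [i]))
        picked.append(best)
        remaining.remove(best)
    return picked

# Toy explanation weights: 4 instances x 5 features (invented values).
W = [
    [0.5, 0.4, 0.0, 0.0, 0.0],
    [0.6, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.2, 0.0],
    [0.0, 0.3, 0.0, 0.0, 0.1],
]
print(greedy_pick(W, budget=2))  # [0, 2]
```

Instance 1 is skipped even though it has the largest single weight, because the feature it uses is already covered by instance 0; instance 2 adds two features that nothing else explains.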

Unfortunately, SP-LIME does not work with images directly, so now we will work with a dataset of [Facial Action Units](https://imotions.com/blog/facial-action-coding-system/) extracted from the AffectNet dataset using [OpenFace2.0](https://github.com/TadasBaltrusaitis/OpenFace). In this way, we can now apply SP-LIME to our dataset. 

In this section, we will now use a pretrained [Random Decision Forest](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) as our classifier instead of a Deep Neural Network.  

### Load the data

In [None]:
# Full data from training and evaluation
train_csv = '../data/affectnet_aus/train_aus.csv'
val_csv = '../data/affectnet_aus/val_aus.csv'

# load training and validation data as pandas dataframes
df_train = pd.read_csv(train_csv)
df_val = pd.read_csv(val_csv)

# smaller dataset for explanations (same data as in Task 1)
xai_csv = '../data/affectnet_aus/eval_aus.csv'
df_xai = pd.read_csv(xai_csv)

# get only the columns storing action units from the dataframe
feature_cols = [col for col in df_val if col.startswith('AU')]

CLASS_LABELS = ['Neutral', 'Happy', 'Sad', 'Surprise', 'Fear', 'Disgust', 'Anger', 'Contempt']  # same class labels as before

In [None]:
# convert data from dataframe to Numpy arrays

X_train = np.array(df_train.loc[:, feature_cols])
y_train = np.array(df_train['class'])

X_test = np.array(df_val.loc[:, feature_cols])
y_test = np.array(df_val['class'])

X_xai = np.array(df_xai.loc[:, feature_cols])
y_xai = np.array(df_xai['class'])

print('Train', X_train.shape, y_train.shape)
print('Test', X_test.shape, y_test.shape)
print('XAI', X_xai.shape, y_xai.shape)

### Load pretrained RDF model
And validate that it works.  
The accuracy of the model on the training data should be around $99.65\%$.

In [None]:
with open('../models/affect_rdf.pkl', 'rb') as f:
    clf = pickle.load(f)
    
clf.score(X_train, y_train)

#### Now evaluate on full test data
Unfortunately, the accuracy is only $43\%$, but this is well above chance guessing, which would get $1 / 8 \times 100 = 12.5\%$ accuracy (since there are 8 total classes).

In [None]:
# get model predictions
y_test_preds = clf.predict(X_test)
y_test_true = y_test

In [None]:
print(classification_report(y_test_true, y_test_preds))

We can also review the confusion matrix to see where the model is confused

In [None]:
cm_data = confusion_matrix(y_test_true, y_test_preds)
cm = pd.DataFrame(cm_data, columns=CLASS_LABELS, index=CLASS_LABELS)
cm.index.name = 'Actual'
cm.columns.name = 'Predicted'
plt.figure(figsize = (20,10))
plt.title('Confusion Matrix', fontsize = 20)
sns.set(font_scale=1.2)
ax = sns.heatmap(cm, cbar=False, cmap="Blues", annot=True, annot_kws={"size": 16}, fmt='g')

#### Evaluate on XAI Data
The XAI data is a subset of the test data.

In [None]:
# get model predictions
y_xai_preds = clf.predict(X_xai)
y_xai_true = y_xai
print(classification_report(y_xai_true, y_xai_preds))

In [None]:
cm_data = confusion_matrix(y_xai_true, y_xai_preds)
cm = pd.DataFrame(cm_data, columns=CLASS_LABELS, index=CLASS_LABELS)
cm.index.name = 'Actual'
cm.columns.name = 'Predicted'
plt.figure(figsize = (20,10))
plt.title('Confusion Matrix', fontsize = 20)
sns.set(font_scale=1.2)
ax = sns.heatmap(cm, cbar=False, cmap="Blues", annot=True, annot_kws={"size": 16}, fmt='g')

### TASK 2: SP-LIME Implementation  

Now on to the implementation of SP-LIME. 

#### Task 2.0: Identify Some Images to Explain
- Try changing the `start` value to get a new set of images (there are 10 images for each class, so, for example, the class Happy will be at indexes 10-19)
- Search through the images to find at least 4 to explain 
    - Find classes that you would like to explain, and from each class select 2 images
        - one should be a correct prediction  
        - and one should be an incorrect prediction
    - These can be the same images as in Task 1, or different ones

In [None]:
from skimage import io  # needed for io.imread below

# displays 9 images from the array, beginning at index `start`
start = 10

images = [io.imread(f) for f in df_xai.image]

true_labels = [CLASS_LABELS[idx] for idx in y_xai_true]
pred_labels = [CLASS_LABELS[idx] for idx in y_xai_preds]
display_nine_images(images, true_labels, pred_labels, start)

In [None]:
#### Enter the Indexes Here ### 
###############################
# you will use this array later in the task
img_idxs = []


#### TASK 2.1
Implement a [LimeTabularExplainer](https://lime-ml.readthedocs.io/en/latest/lime.html#module-lime.lime_tabular). You can review the [LIME tutorial](https://marcotcr.github.io/lime/tutorials/Tutorial%20-%20continuous%20and%20categorical%20features.html) for help.

*Hint*: Use a for loop to iterate through your `img_idxs` array and create a separate explainer instance for each.

In [None]:
import lime.lime_tabular

In [None]:
##### YOUR CODE GOES HERE #####
###############################


#### TASK 2.2
Review the 4 previously identified data instances from the `X_xai` dataset by getting an explanation from the tabular explainer and then plotting the explanation for each data instance (see the tutorial mentioned above).  

*HINT*: Use the `subplot` parameter of `display_one_image` to plot a 2x2 grid. The value should be an integer formatted as `RCN`, where `R` is the number of rows, `C` is the number of columns, and `N` is the number of the image to plot. For example, `221` means to plot the first image of a 2x2 grid, `222` means to plot the second image, and so forth (also see `display_nine_images` for an example of this usage).

Make sure to print out the **True** and **Predicted** labels for each instance.

Try experimenting with different parameters for the explainer and explanation.

**Bonus Task: Include images**   
The data frame includes paths to the images (in the column named `image`) that correspond to the AU features. Load the images so you can compare the AUs with the actual data. (You can use [`skimage io`](https://scikit-image.org/docs/dev/user_guide/getting_started.html) to load the images.)

In [None]:
##### YOUR CODE GOES HERE #####
###############################


In [None]:
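# `i` is assumed to be the row index of the instance you are inspecting (set it in your code above)
img = io.imread(df_val.iloc[i]['image'].replace('data', 'localdata'))
plt.imshow(img)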

In [None]:
# `exp` is assumed to be an explanation returned by your tabular explainer above
exp.show_in_notebook(show_table=True, show_all=False)

#### TASK 2.3
Identify the important Facial Action Units and compare them with the images at [Facial Action Units](https://imotions.com/blog/facial-action-coding-system/). What insights do these local explanations provide? How does this compare with the image explanations from Task 1?

Note: In the feature names, you will see features with a `_c` or an `_r` at the end. The `_r` suffix denotes the intensity of the action unit (i.e., how strong its presence is), and the `_c` suffix denotes a binary feature indicating the presence (value=1) or absence (value=0) of an action unit.
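When reading the explanations, it can help to separate the two feature types. The sketch below assumes column names of the form `AU<number>_<suffix>`; the names in the list are hypothetical examples rather than the actual columns of the data frame, and `au_number` is an illustrative helper.

```python
# Hypothetical feature names in the style described above.
example_cols = ['AU01_r', 'AU04_r', 'AU12_r', 'AU04_c', 'AU12_c', 'AU25_c']

# Intensity features end in `_r`; binary presence features end in `_c`.
intensity_cols = [c for c in example_cols if c.endswith('_r')]
presence_cols = [c for c in example_cols if c.endswith('_c')]

def au_number(col):
    """Extract the action unit number, e.g. 'AU12_r' -> 12."""
    return int(col[2:].split('_')[0])

print(intensity_cols)  # ['AU01_r', 'AU04_r', 'AU12_r']
print(presence_cols)   # ['AU04_c', 'AU12_c', 'AU25_c']
print(sorted({au_number(c) for c in example_cols}))  # [1, 4, 12, 25]
```

The same filters can be applied to `feature_cols` to check whether the explainer relies more on the presence or on the intensity of an action unit.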


Write your answer here...

- AU25 = lips part
- AU12 = lip corner puller
- AU04 = brow lowerer
- AU14 = dimpler
- AU02 = outer brow raiser

#### Task 2.4

Now implement a [Submodular Pick](https://lime-ml.readthedocs.io/en/latest/lime.html#lime-submodular-pick-module) instance to try to gain a global perspective of how the model makes decisions. You can review the [LIME tutorial](https://github.com/marcotcr/lime/blob/master/doc/notebooks/Submodular%20Pick%20examples.ipynb) for help.

Try setting `num_exps_desired` to 16 to try to get 2 examples per class.

In [None]:
from lime import submodular_pick
from skimage import io

In [None]:
##### YOUR CODE GOES HERE #####
###############################


#### Task 2.5
Now plot the explanations.

In [None]:
##### YOUR CODE GOES HERE #####
###############################


#### TASK 2.6

Again, identify the important AUs for the classes and compare them with the images at [Facial Action Units](https://imotions.com/blog/facial-action-coding-system/). What insights do these explanations provide? Do you now have a better understanding of how the model is working? If not, what is lacking in the LIME approach, and/or what could be done differently?

### Task 3: Final Discussion
Between Tasks 1 and 2, which of the two models and methods best supports explainability as we've discussed throughout the seminar? Why?

Write your answer here...