<a href="https://colab.research.google.com/github/goddice/funny-stuffs/blob/master/6835_MP4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 6.835 MiniProject 4: Expression Recognition

## Due Monday March 16 at 5pm

![alt text](https://i.pinimg.com/originals/56/be/63/56be63d62aa5b174bf9daa13a700a8d2.jpg)
Figure 1: Actress Scarlett Johansson making various facial expressions.



## **1	Introduction**

The goal of this project is to explore the task of expression analysis and emotion classification over two different data sets: 2D images and 3D point clouds. You will implement and compare several neural network architectures, building on what you learned in Mini Project 2. In addition, you’ll get experience with Transfer Learning for creating personalized models. Please start early and ask questions!
The data sets you will use in this project are:

1) [Kaggle Facial Expression Recognition Challenge](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expressionrecognition-challenge/data)

2)	Personal face expression images generated by you

3)	Staff generated point clouds from the iPhone X True-Depth Camera

Included with this project is iMotions’s “Facial Expression Analysis: The Complete Pocket Guide.” Use this as a reference throughout the project and please go through it before writing any code. It describes the anatomical characteristics of expression and shows examples of each type of expression.

The Facial Action Coding System (FACS) is a tool for measuring expressions first published in 1978 by Ekman and Friesen. It is an anatomical system for describing all observable face movement. It breaks down expressions into individual components of muscle movement known as Action Units (AUs). In this project, we will ask you to describe facial expressions based on Action Units from FACS. A complete guide to FACS and AUs can be found on the [iMotions blog](https://imotions.com/blog/facial-action-coding-system/).

## **2	Getting Started**

### 2.1 Guide to using Colab
Colab is a free Jupyter notebook environment by Google that runs entirely in the cloud. It does not require a setup and allows you to run code and see the output immediately (see [here](https://towardsdatascience.com/a-beginners-tutorial-to-jupyter-notebooks-1b2f8705888a) for more info on Jupyter notebooks and [here](https://www.tutorialspoint.com/google_colab/your_first_colab_notebook.htm) for a walkthrough of how Colab works). 

Colab notebooks are automatically saved to your Google Drive account but can also be saved to a GitHub account. For the purposes of this lab, you only need to know the very basics of using Jupyter notebooks (and by extension Colab).

**Coding Interface**
*   You will write your code directly into the code blocks in the notebook
*   To run your code, you can either press ctrl-enter when inside the cell, or click the run button in the upper left corner of the cell (need to hover over the brackets for it to appear).
*   In order for this lab to work correctly, you should run every cell in order (i.e., as you come upon a code cell, even if it's just staff code, please run it). 
*   If a cell has been run, a number will appear in brackets in the upper left corner where the run button appears. This number helps you track the order of the calls.
*   You are welcome to add any new coding blocks you want (by clicking + Code in the top menu) but you cannot import any more libraries than the ones in this project

**Uploading files**
*   To upload files from your computer into the notebook (will be required later in the project), click the folder icon on the sidebar on the left. The upload button will let you select the file
*   File reading works the same as in a normal IDE. You have to specify the path to your file if it is inside a folder vs. in the main file area.
*   Every time Colab is closed (or refreshed), uploaded files are removed and must be re-uploaded.

**Saving files**

You will be saving some trained models in this mini project to include in your submission. We suggest that each time you save a model, you also download it to your local machine. Saved files are removed if Colab is closed or refreshed, so you may want to download them just in case.


### 2.2 Colab Environment Setup

#### Importing Data
We will be pulling in two datasets for this mini project. The Kaggle data set is located in kaggle_fer2013/fer2013.csv. There are 28K training and 3K testing images in the dataset, each composed of a 48x48 square of pixels and labeled with an emotion [0-6] (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral). The iPhone X dataset is located in iPhoneX/faces.py. There are samples of 50 subjects posing with 7 different expressions. Each sample consists of 1220 (x, y, z) points that make up a depth map. We’ll explore this data more in Section 5.

The data will be pulled into a folder named mp4_data.

In [1]:
! git clone https://gitlab.com/JMadiedo/mp4_data.git

Cloning into 'mp4_data'...
remote: Enumerating objects: 22, done.
remote: Counting objects: 100% (22/22), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 22 (delta 3), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (22/22), done.


#### Setting up imports

In [2]:
%tensorflow_version 1.x
import keras
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model, load_model
from keras.layers import Input, Dense, Dropout, Flatten, Conv2D, MaxPooling2D, AveragePooling2D

import numpy as np

import matplotlib
import matplotlib.pyplot as plt


TensorFlow 1.x selected.


Using TensorFlow backend.


#### Predefined Staff Variables/Functions

In [0]:
emotions = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']

def plot_emotion_prediction(pred):
    """
    Plots the prediction for each emotion on a bar chart
    :param pred: the predictions for each emotions
    """
    labels = np.arange(len(emotions))
    plt.bar(labels, pred, align='center', alpha=0.5)
    plt.xticks(labels, emotions)
    plt.ylabel('prediction')
    plt.title('emotion')
    plt.show()

### 2.3 Mini Project Structure

In this miniproject, you will

1) 	Code a Convolutional Neural Network using Keras to detect Face Expressions from a large dataset of images

2)	Use transfer learning to adapt the CNN you’ve created into a personalized network for your own expressions (you’ll have to supply the data)

3)	Explore a new type of available face data - point clouds from the iPhone X True Depth Camera and classify samples by modifying your original CNN.

Please do not hesitate to post for help on Piazza using the miniproject4 tag.

## **3	Expression Recognition with Convolutional Neural Networks**
As you read about and saw during the lecture on face recognition, the hidden layers of a Convolutional Neural Network (CNN) typically consist of some combination of convolutional layers, pooling layers, fully connected layers and normalization layers. We’ll give a brief overview of CNNs and what these layers do. If you don’t have any experience with CNNs, watch [this video](https://www.youtube.com/watch?v=YRhxdVk_sIs) on YouTube to get a solid understanding of how they function.

Together, we’ll code up a simple CNN to process the Kaggle dataset. You are encouraged to improve this model by adding additional layers. The Input layer will take in image data (represented as a matrix of numbers) and pass it into a convolution layer. Here the image data is “convolved”: a filter (i.e., a function) is applied methodically to overlapping tiles of the input. The values the filter produces (technically, the dot product of the filter with each submatrix) form another matrix of data. The final layer, the fully connected layer, takes the convolution output and produces a vector of predictions.
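
To make the "dot product of the filter with each submatrix" concrete, here is a minimal NumPy sketch. It is not how Keras implements `Conv2D` internally; it assumes no padding and a stride of 1, and the toy image, the filter values, and the helper name `convolve2d_valid` are made up for illustration.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    dot product of the kernel with each overlapping tile."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            tile = image[i:i + kh, j:j + kw]   # one overlapping tile
            out[i, j] = np.sum(tile * kernel)  # dot product with the filter
    return out

# Toy 4x4 "image" and a 2x2 filter (both made up for illustration).
toy_image = np.arange(16, dtype=np.float32).reshape(4, 4)
toy_kernel = np.array([[1.0, 0.0], [0.0, -1.0]], dtype=np.float32)
feature_map = convolve2d_valid(toy_image, toy_kernel)
print(feature_map.shape)
print(feature_map)
```

A 4x4 input and a 2x2 filter give a 3x3 output: the filter fits in three positions along each axis. A convolution layer with 64 filters repeats this with 64 different learned kernels.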

Now it’s time to write a CNN using the Keras API and Tensorflow backend. We’ve already started an implementation for you below. You should complete the implementation by following these steps:

### Part A
We need to understand our input before we can begin our model. The Kaggle dataset contains 35,887 images. We use 28,709 for training and 3,589 (the rows marked PublicTest) for testing. Let’s organize this data so we can use it in our model.

The supplied code imports the data from the .csv file for you. Each line of data contains an emotion label, image data, and test/train usage. The emotion label is a number between 0 and 6, corresponding to the labels specified above. 

1) We parse the .csv file for you into x_train, the training image pixel data as 1D arrays of pixels; y_train, the labels corresponding to the training images; x_test, the testing image pixel data as 1D arrays of pixels; and y_test, the labels corresponding to the test images. You will need to do this on your own later in the mini project.

2)	The pixel values in the image data are strings. Convert them to float32 and normalize the inputs to be between 0 and 1.


3) Reshape the image data so we can enter samples into our model with the shape (48, 48, 1).

In [0]:
num_classes = 7 #angry, disgust, fear, happy, sad, surprise, neutral
batch_size = 256
epochs = 5

def split_data():
  # READ IN KAGGLE DATA
  with open("mp4_data/kaggle_fer2013/fer2013.csv") as file:
      data = file.readlines()

  lines = np.array(data)

  x_train, y_train, x_test, y_test = [], [], [], []

  # A. 1) SPLIT DATA INTO TEST AND TRAIN
  for i in range(1,lines.size):
      emotion, img, usage = lines[i].split(",")
      val = img.split(" ")
      pixels = np.array(val, 'float32')
      emotion = keras.utils.to_categorical(emotion, num_classes)

      if 'Training' in usage:
          y_train.append(emotion)
          x_train.append(pixels)
      elif 'PublicTest' in usage:
          y_test.append(emotion)
          x_test.append(pixels)

  # A. 2) CAST AND NORMALIZE DATA
  x_train = np.array(x_train).astype(np.float32)
  x_test = np.array(x_test).astype(np.float32)
  y_train = np.array(y_train)  # stack the one-hot labels into a 2D array
  y_test = np.array(y_test)
  max_val = np.amax(x_train)  # largest pixel value, used to scale inputs to [0, 1]
  x_train = x_train / max_val
  x_test = x_test / max_val

  # A. 3) RESHAPE DATA
  x_train = x_train.reshape(x_train.shape[0], 48, 48, 1)
  x_test = x_test.reshape(x_test.shape[0], 48, 48, 1)

  return x_train, y_train, x_test, y_test
  
# split_data()
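
To see what the three steps in Part A do to a single sample, the sketch below parses one made-up row in the same `emotion,pixels,usage` layout. It uses a 2x2 image instead of 48x48 so the numbers are easy to follow, and it assumes a maximum pixel value of 255. The one-hot vector is built by hand here to show what `keras.utils.to_categorical` produces.

```python
import numpy as np

# A made-up row in the "emotion,pixels,usage" layout (2x2 image for brevity).
row = "3,0 51 102 255,Training\n"

emotion, img, usage = row.split(",")
pixels = np.array(img.split(" "), dtype="float32")  # pixel strings -> floats
pixels = pixels / 255.0                              # scale to [0, 1]
sample = pixels.reshape(2, 2, 1)                     # (height, width, channels)

one_hot = np.zeros(7, dtype="float32")               # one slot per emotion
one_hot[int(emotion)] = 1.0                          # label 3 -> 'happy'

print(sample.shape)
print(one_hot)
```

The trailing `1` in the shape is the channel dimension: the images are grayscale, so there is a single channel per pixel.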

### Part B
Now it’s time to code the CNN. We’ll make use of the Keras functional API for building models. For more information on Keras, see the Keras Tutorial online and explore the documentation [here](https://keras.io/getting-started/functional-api-guide/) . We’ll create a simple model together that gives you a working solution. However, we will later expect you to add additional layers to this model to improve its performance.

1)	Create an Input layer that takes in data of shape (48, 48, 1, ). This is the size of our photos

2)	Add a Convolutional layer to the network using the 2D convolution layer for spatial convolution over images. Make sure the layer has the following properties: filters=64, kernel_size=(5,5), activation=‘relu’

3)	Add a MaxPooling2D layer with pool_size=(5,5) and strides=(2, 2)

4)	Add a Flatten layer which converts the data into a 1D feature vector ready for classification

5)	Add a Dense layer with 1024 units and activation=‘relu’

6)	Add a final Dense layer with 7 units (for classification) and the ‘softmax’ activation function

In [0]:
# B. CREATE CNN MODEL

def create_model():
  inputs = Input(shape=(48, 48, 1, ))
  conv_layer = Conv2D(filters=64, kernel_size=(5,5), activation="relu")(inputs)
  pool_layer = MaxPooling2D(pool_size=(5,5), strides=(2,2))(conv_layer)
  flatten_layer = Flatten()(pool_layer)
  dense_layer = Dense(1024, activation="relu")(flatten_layer)
  outputs = Dense(7, activation="softmax")(dense_layer)
  # INSERT LAYERS HERE
  return Model(inputs, outputs)
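
As a rough check on the layer sizes in Part B, the arithmetic below follows an image through the convolution, pooling, and flatten steps. It assumes the Keras default of `padding='valid'` (no padding) for both `Conv2D` and `MaxPooling2D`; the helper name `valid_output_size` is made up for this sketch.

```python
def valid_output_size(size, window, stride=1):
    """Output length along one axis for an unpadded ('valid') window."""
    return (size - window) // stride + 1

conv_size = valid_output_size(48, 5)             # 5x5 convolution, stride 1
pool_size = valid_output_size(conv_size, 5, 2)   # 5x5 pooling, stride 2
flat_features = pool_size * pool_size * 64       # 64 filters, flattened
dense_params = flat_features * 1024 + 1024       # weights + biases of Dense(1024)

print(conv_size, pool_size, flat_features, dense_params)
```

Under these assumptions the 48x48 input shrinks to 44x44 after the convolution and 20x20 after pooling, so the Flatten layer outputs 25,600 values. Most of the model's parameters then sit in the first Dense layer, which is worth keeping in mind when you add layers in Part D. You can compare these numbers against `model.summary()`.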

### Part C
Now it’s time to train our model and see how well it performs.

1) Batch the training and testing data using the Keras ImageDataGenerator() with .flow(..., batch_size=256). The ImageDataGenerator can perform image augmentation, artificially creating training images by applying one or more transformations such as random rotations, shifts, shears, and flips. Here we are using it only to randomly select training set instances for our model.

2) Compile your model with loss=‘categorical_crossentropy’ and the Adam optimizer (i.e. keras.optimizers.Adam()).

3)  Train your model by calling model.fit_generator(...) with the provided steps_per_epoch = len(x_train)/batch_size and epochs = 5 variables. Make sure to save your model after training it: export the model to a .h5 file using the built-in model.save(‘model_1.h5’) (please use this naming convention). The model should take about 10 minutes to train and should achieve about 55% accuracy.

**Note: you may want to download the model to your local machine just in case Colab crashes or unexpectedly closes. You can download it by right-clicking on the model in the file sidebar and selecting download.**
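
For intuition about the loss named in step 2, the sketch below computes categorical cross-entropy for a single sample in plain Python. It is an illustration of the formula, not the Keras implementation; the prediction vectors are made up, and the small `eps` floor is an assumption added to avoid taking the log of zero.

```python
import math

def categorical_crossentropy(one_hot_label, predicted_probs):
    """Cross-entropy for one sample: -sum(y_i * log(p_i))."""
    eps = 1e-7  # floor that guards against log(0)
    return -sum(y * math.log(max(p, eps))
                for y, p in zip(one_hot_label, predicted_probs))

label = [0, 0, 0, 1, 0, 0, 0]                           # true class: 'happy'
confident = [0.01, 0.01, 0.01, 0.94, 0.01, 0.01, 0.01]  # made-up good prediction
uniform = [1 / 7] * 7                                   # made-up "no idea" prediction

loss_confident = categorical_crossentropy(label, confident)
loss_uniform = categorical_crossentropy(label, uniform)
print(round(loss_confident, 3), round(loss_uniform, 3))
```

Because the label is one-hot, only the probability assigned to the true class contributes. A model that guesses uniformly over 7 classes has a loss of ln(7), which is a useful baseline when reading the loss printed during the first epoch.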

In [0]:
num_classes = 7 #angry, disgust, fear, happy, sad, surprise, neutral
batch_size = 256
epochs = 5

def cnn():
  x_train, y_train, x_test, y_test = split_data()
  model = create_model()

  # C. 1) DATA BATCH PROCESS
  data_gen = ImageDataGenerator().flow(x_train, y_train, batch_size=batch_size)

  # C. 2) COMPILE MODEL
  model.compile(loss='categorical_crossentropy', 
                optimizer='adam', 
                metrics=['accuracy'])

  # C. 3) TRAIN AND SAVE MODEL
  model.fit_generator(data_gen, steps_per_epoch=len(x_train)/batch_size, epochs=5)

  model.save('model_1.h5')

cnn()


4) You can test your model with the script below to see how a certain example (jackman.png) gets classified.

In [0]:
def test_cnn():
  model = load_model('model_1.h5')  # same file name used by model.save() in Part C

  img = image.load_img("mp4_data/jackman.png", color_mode = "grayscale", target_size=(48, 48))

  x = image.img_to_array(img)
  x = np.expand_dims(x, axis = 0)

  x /= 255

  custom = model.predict(x)
  plot_emotion_prediction(custom[0])

test_cnn()
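
The bar chart shows one score per emotion. If you also want the single most likely label, a small helper like the one below can read it off the prediction vector. The helper name `top_emotion` and the example scores are made up for illustration; they are not output from a trained model.

```python
emotions = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']

def top_emotion(pred):
    """Return the (label, score) pair with the highest score."""
    best = max(range(len(pred)), key=lambda i: pred[i])
    return emotions[best], pred[best]

# Made-up softmax output for one image; the scores sum to 1.
example_pred = [0.05, 0.02, 0.08, 0.10, 0.15, 0.05, 0.55]
label, score = top_emotion(example_pred)
print(label, score)
```

With a real model you would pass in one row of the array returned by `model.predict(x)`, for example `custom[0]` in the cell above.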

### Part D

Now, modify the model to improve your accuracy. You may change the parameters (such as batch size) and layers of the model. We've provided new code blocks below for you to experiment in (you can copy over your code from part C).

 In your writeup, include:

*   A diagram of your final network architecture
*   A description of the structure and the parameters you used.
*   The accuracy and loss for your model

In the original Kaggle challenge, the winner achieved just 34% accuracy - so congrats, your model is already much better! Be sure to save your trained model as model_2.h5, i.e., following our naming convention. 


In [0]:
num_classes = 7 #angry, disgust, fear, happy, sad, surprise, neutral
batch_size = 256
epochs = 5

def create_model():
  inputs = Input(shape=(48, 48, 1, ))
  # INSERT LAYERS HERE
  return Model(inputs, outputs)

def cnn():
  x_train, y_train, x_test, y_test = split_data()
  model = create_model()

  # C. 1) DATA BATCH PROCESS

  # C. 2) COMPILE MODEL

  # C. 3) TRAIN AND SAVE 
  
  # model.save('model_2.h5')
  
cnn()

In [0]:
def test_cnn():
  model = load_model('model_2.h5')  # same file name used by model.save() in Part D

  img = image.load_img("mp4_data/jackman.png", color_mode = "grayscale", target_size=(48, 48))

  x = image.img_to_array(img)
  x = np.expand_dims(x, axis = 0)

  x /= 255

  custom = model.predict(x)
  plot_emotion_prediction(custom[0])
  
test_cnn()

## **4	Transfer Learning for a Personalized Machine Learning Model**

In practice, very few people train an entire Convolutional Network from scratch (with random initialization) as we just did. This is because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pre-train a CNN on a (different) very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the CNN either as an initialization or a fixed feature extractor for the task of interest.

The process of sharing results across different problems is known as Transfer Learning. In other words, Transfer Learning is a machine learning technique where a model trained on one task is re-purposed for a second related task.

In this section, we’ll first ask you to generate your own expression data (yes, you’ll have to take pictures of yourself) and then use that data to fine-tune your CNN from the previous section into a personalized model for your face, using Transfer Learning.

### Part E
Pose in front of a camera or get a friend to help you take 7 pictures, one for each type of expression: angry, disgust, fear, happy, sad, surprise, neutral. This is your training data. Crop these images so that they contain a bounding box of only your face. It is OK if the pictures are not 48 x 48; Keras will resize them.

### Part F
Take 7 more pictures. This is your testing data. 

Add your training and testing images to the notebook:

1) Open the files section of the sidebar on the left side of your screen

2) Right-click in the area to open the context menu. Select create new folder.

3) Upload your images (you can upload multiple at a time)

4) Structure the folder and name your images as you like

### Part G
Now it’s time to load your model from the previous section. We’ve started an implementation for you below.

1) Import the data from your images and reshape the data so that you can retrain your model from Section 3 (model_2.h5). You will need to grayscale your images. You may find functions in keras.preprocessing useful for image manipulation.

2) Load your model using load_model(‘model_2.h5’) and train the model on your 7 training images.

3) Finally, test the newly trained model on your test images. Save your trained model as model_3.h5. Remember that we provided the helper function plot_emotion_prediction(pred) at the beginning of this notebook; it takes the prediction values from a single call of model.predict(x) and plots them on a bar graph.

In [0]:
num_classes = 7 #angry, disgust, fear, happy, sad, surprise, neutral
batch_size = 7
epochs = 3

def transfer():
  # G. 1) Import data and reshape data
  
  # G. 2) Load model and train on training images

  # G. 3) Test newly trained model and save
  pass  # placeholder so the empty template runs; replace with your code

transfer()

4)  Which expressions are not being recognized? Why do you think some expressions are recognized better than others? Report the accuracy of the model.

5) If your model did not achieve good accuracy on your personal data, explain why you think that is.

## **5	From Images to Depth: Next Generation of Face Representation**
In this section, we expect you to implement a fully functional model on your own. Your work in this part will be graded on correctness, not on how accurate your final model is. 

Recall structured light: the technology used by the Xbox Kinect that shines thousands of infrared dots in an area and can create a corresponding depth map. The iPhone X also uses this technology to scan a user’s face. We’ve accumulated samples of 100 iPhone X users posing with different face expressions using [Apple’s ARKit framework](https://developer.apple.com/documentation/arkit/arfaceanchor?language=objc) . Apple anchors the face into a specific origin, and provides us with vertex positions for each point in the face mesh. These vertices are referred to as a point cloud. The point cloud we receive is sparse, so we see a smoothed version of an actual face. Together, this normalizes all of our data and makes it ready for analysis.

![alt text](https://www.bing.com/th?id=OIP.nxQY5PjyWSCBU1oYGGqy3AHaDS&pid=Api&rs=1)

Figure 2: Origin of the face coordinate system.


In this last part, we’ll classify the 3D data into the same 7 emotions: angry, disgust, fear, happy, sad, surprise, neutral. However, this time we’re expecting you to craft the model. The model you will create should be a 3D Convolutional Neural Network. These are essentially the same as 2D CNNs, but perform operations in 3 dimensions. In order to use a 3D CNN, we’ll have to transform our input data into voxels. Voxels are the three-dimensional analogue of pixels: unit volumes of space that contain a value.


### Part H
To visualize the various expressions, you will need to use the Visualization GUI we provided in the zip file for you to run on your local machine. To use the GUI, run

```
python show_gui.py
```


You’ll be able to see the 7 different expressions. You can drag the graph to view the data from different orientations. 



1) Inspect the 3D face data and give us your impressions. Compare the expression data using FACS. Which expressions are the most distinctive? Which expressions are most similar? What information does the point cloud provide us that the image does not?



Now it's time to try your hand at making your own model here in Colab! The data is provided as a Python dictionary 

```
face_samples = { sample_id : { emotion: { x: [...], y: [...], z: [...]}}}
```

2)  Read in the iPhone X data. For each point cloud, create a 24 x 24 x 24 voxel grid represented as a 3D numpy array initialized with all 0s. For each point in the cloud, increment the value of the voxel that the point falls in.

3) Construct a 3D Convolutional Neural Network using Keras. You can use your previous work as a starting point, but will have to make use of Conv3D and MaxPool3D from Keras. You are free to add as many layers as you’d like.

4) Train and test your model. Save your trained model as model_4.h5.
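
To illustrate what "increment the value of the voxel that the point falls in" means in step 2, here is a toy sketch on a 4 x 4 x 4 grid. It is one possible approach, not the required one. It assumes each cloud is rescaled by its own minimum and maximum along every axis, which is a design choice you may want to replace with fixed bounds shared by all samples. The function name `voxelize` and the sample points are made up.

```python
import numpy as np

def voxelize(xs, ys, zs, grid_size=24):
    """Count how many points fall into each cell of a cubic voxel grid."""
    points = np.stack([xs, ys, zs], axis=1).astype(np.float64)
    mins = points.min(axis=0)
    spans = points.max(axis=0) - mins
    spans[spans == 0] = 1.0  # avoid dividing by zero on a flat axis
    # Scale every coordinate into [0, grid_size] and truncate to an index.
    idx = ((points - mins) / spans * grid_size).astype(int)
    idx = np.clip(idx, 0, grid_size - 1)  # the maximum point lands in the last cell
    grid = np.zeros((grid_size, grid_size, grid_size), dtype=np.float32)
    # np.add.at accumulates correctly when several points share a voxel.
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return grid

# Four made-up points; the first two are identical and share a voxel.
toy = voxelize([0.0, 0.0, 1.0, 0.5],
               [0.0, 0.0, 1.0, 0.5],
               [0.0, 0.0, 1.0, 0.5], grid_size=4)
print(toy.shape, toy.sum(), toy[0, 0, 0])
```

The grid's total always equals the number of points in the cloud, which makes a convenient sanity check. Before feeding grids to a `Conv3D` layer you would also need a trailing channel dimension, for example a shape of (24, 24, 24, 1) per sample.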


In [0]:
num_classes = 7 #angry, disgust, fear, happy, sad, surprise, neutral
batch_size = 12
epochs = 20

emotions = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']

def cnnX():
  # H. 2) READ IN iPHONE X DATA AND SHAPE

  # H. 3) CREATE MODEL OF CHOICE

  # H. 4) TRAIN AND TEST MODEL. SAVE AS model_4.h5
  pass  # placeholder so the empty template runs; replace with your code

cnnX()

5) In your writeup, include:

*   A diagram of your final network architecture
*   A description of the structure 
*   The test/train ratio
*   The parameters you used (batch size, number of epochs)
*   The accuracy and loss for your model


## Extra Credit
### Extra Credit 1 (5 points):
Continue to modify your model until you achieve an accuracy >= 42%. Save your model as model_ec1.h5.



In [0]:
# EC. 1) Accuracy >= 42%


### Extra Credit 2 (10 points maximum):
Use a different classification technique to classify the data (it does not have to be deep learning). Describe what you built and report how well your classification works. Be sure to include where you found inspiration for the implementation and what additional libraries you used.

In [0]:
# EC. 2) Different Classification Technique

## Exporting your Colab Notebook for Submission

Once you're done implementing all the parts of the project, you will need to download your Colab Notebook and models to include as part of your submission.

1) For each model you saved, download it by right-clicking on it in the files and selecting download.

2) To download the notebook, click *download .ipynb* in the file menu.
