<b> Face/Emotion Detection using Integrated Webcam</b>

This program uses OpenCV, a pretrained Caffe face detection model, and a custom pretrained Keras model for emotion recognition. The face detector works well, while the custom Keras emotion classifier performs only moderately.

<b> Step 1:</b><br/>
Import the required libraries: OpenCV, NumPy and os. The Keras layers are imported later.

In [1]:
import os
import cv2
import numpy as np

<b>Step 2:</b><br/>
Read the Caffe model definition and weights, both located in a subfolder named "model".

In [2]:
# Define paths
base_dir = os.getcwd()
prototxt_path = os.path.join(base_dir, 'model', 'deploy.prototxt')
caffemodel_path = os.path.join(base_dir, 'model', 'weights.caffemodel')

# Read the model
model = cv2.dnn.readNetFromCaffe(prototxt_path, caffemodel_path)
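
Loading fails if either file is not in the "model" subfolder, so it can help to check for the files first. The following is a minimal sketch only; the helper name `missing_files` is hypothetical and is not part of OpenCV.

```python
import os

def missing_files(paths):
    """Return the entries of `paths` that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]

# Same layout as above: both files are expected in a "model" subfolder
base_dir = os.getcwd()
required = [
    os.path.join(base_dir, "model", "deploy.prototxt"),
    os.path.join(base_dir, "model", "weights.caffemodel"),
]

missing = missing_files(required)
if missing:
    print("Missing model files:", missing)
```

If the list is empty, both paths can be passed to `cv2.dnn.readNetFromCaffe` as in the cell above.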

<b>Step 3:</b><br/>
The Keras libraries are imported and the model is defined here. It includes four 2D convolution layers; the model weights are then loaded. <br/> <b>Note:</b> Since the face detection model is already named "model", the classification model is called "model1".

In [3]:
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D

# Create the model
model1 = Sequential()

model1.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(48,48,1)))
model1.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Dropout(0.25))

model1.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Dropout(0.25))

model1.add(Flatten())
model1.add(Dense(1024, activation='relu'))
model1.add(Dropout(0.5))
model1.add(Dense(7, activation='softmax'))

# Load the pretrained model weights
model1.load_weights(os.path.join(base_dir, 'model', 'model.h5'))

Using TensorFlow backend.

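To see how the 48 by 48 input shrinks through the network defined above, the spatial size can be traced layer by layer. This is a rough sketch that assumes the Keras defaults used in the model definition (no padding, stride 1 for the convolutions, and pooling with a stride equal to the pool size); the helper names `conv_out` and `pool_out` are made up for illustration.

```python
def conv_out(size, kernel=3):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

size = 48
size = conv_out(size)   # Conv2D(32)   -> 46
size = conv_out(size)   # Conv2D(64)   -> 44
size = pool_out(size)   # MaxPooling2D -> 22
size = conv_out(size)   # Conv2D(128)  -> 20
size = pool_out(size)   # MaxPooling2D -> 10
size = conv_out(size)   # Conv2D(128)  -> 8
size = pool_out(size)   # MaxPooling2D -> 4

# The last convolution has 128 filters, so Flatten yields 4 * 4 * 128 values
flattened = size * size * 128
print(size, flattened)  # 4 2048
```

Under these assumptions, the Dense(1024) layer receives a vector of 2048 values.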
<b>Step 4:</b><br/>
The Keras model classifies facial emotions into seven categories. A dictionary maps each class index to its label (the labels are in alphabetical order).

In [4]:
emotion_dict = {0: "Angry", 1: "Disgusted", 2: "Fearful", 3: "Happy", 4: "Neutral", 5: "Sad", 6: "Surprised"}
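
As a small illustration of how this dictionary is used, the sketch below picks the most probable class from a softmax output and looks up its label. The prediction values are made up for the example; a real prediction would come from `model1.predict`.

```python
import numpy as np

emotion_dict = {0: "Angry", 1: "Disgusted", 2: "Fearful", 3: "Happy",
                4: "Neutral", 5: "Sad", 6: "Surprised"}

# Hypothetical softmax output for one face, shape (1, 7); values are invented
prediction = np.array([[0.05, 0.01, 0.04, 0.70, 0.10, 0.06, 0.04]])

# Index of the largest probability, then its label and score
maxindex = int(np.argmax(prediction))
label = emotion_dict[maxindex]
confidence = float(prediction[0, maxindex])
print(label, confidence)
```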

<b>Step 5:</b><br/>
In the final step, OpenCV reads video from the webcam using VideoCapture. <br/>
<span style="color:#FF0000">The first step</span> in processing the video stream is mean subtraction and scaling. This is done with the <b><i>blobFromImage</i></b> function in OpenCV; the arguments it takes are shown in the code. A scale factor of 1.0 is used (no rescaling) and swapRB is true. Note that OpenCV reads color images in BGR order, and swapRB converts them to RGB.<br/>
<span style="color:#0000ff">The second step</span> is to feed the processed image to the trained face detection model. This runs in a loop (since it is performed on a live feed), and whenever the probability that a detection is a face exceeds 0.5, its bounding box is drawn.

<span style="color:#006a4e">In the third and last step</span>, the detected face is used as input to the emotion classifier: it is converted to grayscale, resized to 48 by 48 (the input size of model1), the emotion is predicted, and the result is written as text on the frame.
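
To make the preprocessing step more concrete, here is a NumPy-only sketch of roughly what blobFromImage computes for an image that is already resized. It is an approximation, not OpenCV's implementation: it assumes the mean is subtracted after the channel swap, and the function name `to_blob` is hypothetical.

```python
import numpy as np

def to_blob(img_bgr, mean, scalefactor=1.0, swap_rb=True):
    """Approximate blobFromImage for an already resized H x W x 3 image."""
    x = img_bgr.astype(np.float32)
    if swap_rb:
        x = x[..., ::-1]  # reverse the channel order: BGR -> RGB
    # Subtract the per-channel mean, then apply the scale factor
    x = (x - np.array(mean, dtype=np.float32)) * scalefactor
    # Reorder H x W x C to C x H x W and add a batch axis: (1, C, H, W)
    return np.transpose(x, (2, 0, 1))[np.newaxis, ...]

# Dummy 300x300 frame in which every pixel is B=10, G=20, R=30
img = np.zeros((300, 300, 3), dtype=np.uint8)
img[...] = (10, 20, 30)

blob = to_blob(img, mean=(104.0, 177.0, 123.0))
print(blob.shape)        # (1, 3, 300, 300)
print(blob[0, :, 0, 0])  # per-channel values of the first pixel after mean subtraction
```

The resulting 4-D array has the batch, channel, height, width layout that the face detection network takes as input.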

In [5]:
# To capture video from webcam. 
cap = cv2.VideoCapture(0)

# To use a video file as input 
# cap = cv2.VideoCapture('filename.mp4')

cv2.startWindowThread()


while True:
    try:
        # Read the frame; stop if the camera or file returns nothing
        ret, img = cap.read()
        if not ret:
            break

        # Preprocess the image: resize to 300x300, subtract the mean and swap BGR to RGB
        (h, w) = img.shape[:2]
        blob = cv2.dnn.blobFromImage(image=cv2.resize(img, (300, 300)),
                                     scalefactor=1.0, size=(300, 300),
                                     mean=(104.0, 177.0, 123.0), swapRB=True)
        #Use the blob as an input to the face detection model
        model.setInput(blob)
        detections = model.forward()

        # Draw a box around each detected face
        for i in range(0, detections.shape[2]):
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

            confidence = detections[0, 0, i, 2]

            # If confidence > 0.5, show box around face
            if confidence > 0.5:
                cv2.rectangle(img, (startX, startY), (endX, endY), (255, 255, 255), 2)

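                # Clip the box to the image so the crop cannot be empty or wrap around
                (x0, y0) = (max(0, startX), max(0, startY))
                (x1, y1) = (min(w, endX), min(h, endY))
                if x1 <= x0 or y1 <= y0:
                    continue
                frame = img[y0:y1, x0:x1]

                # The color image is converted to grayscale only for the classification step (emotion detection)
                frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                # The classifier expects a 48x48 single-channel input, so resize and add channel and batch axes
                cropped_img = np.expand_dims(np.expand_dims(cv2.resize(frame_gray, (48, 48)), -1), 0)
                prediction = model1.predict(cropped_img)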

                #Select the category with the maximum probability that came from the softmax layer(prediction)
                maxindex = int(np.argmax(prediction))

                #Putting text on the frame to show result of the classification
                cv2.putText(img, emotion_dict[maxindex], (startX, startY), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)

        # Resize the frame for display
        resized = cv2.resize(img, (1280, 960), interpolation = cv2.INTER_LINEAR)

        # Show the live feed on screen once per frame, whether or not a face was found
        cv2.imshow('Live', resized)

        if cv2.waitKey(1) & 0xff == ord('q'):
            break
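    except Exception as exc:
        # Report the problem and skip this frame instead of silently ignoring it
        print("Skipping frame:", exc)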
        
        
# When everything is done, release the capture and close the window
cap.release()
cv2.destroyAllWindows()
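
The box handling inside the loop can also be tried without a webcam. The sketch below assumes the detection layout the loop relies on (confidence at index 2, normalized corner coordinates at indices 3 to 6) and runs on a made-up detections array; the helper name `boxes_from_detections` is hypothetical.

```python
import numpy as np

def boxes_from_detections(detections, w, h, threshold=0.5):
    """Return (startX, startY, endX, endY, confidence) for confident detections."""
    results = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence <= threshold:
            continue
        # Scale the normalized corners to pixel coordinates
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        startX, startY, endX, endY = box.astype("int")
        # Clip to the frame so that slicing never wraps around or comes out empty
        startX, startY = max(0, startX), max(0, startY)
        endX, endY = min(w, endX), min(h, endY)
        if endX > startX and endY > startY:
            results.append((int(startX), int(startY), int(endX), int(endY), confidence))
    return results

# Two invented detections: one confident, one below the threshold
fake = np.zeros((1, 1, 2, 7), dtype=np.float32)
fake[0, 0, 0] = [0, 1, 0.9, 0.25, 0.25, 0.75, 0.5]
fake[0, 0, 1] = [0, 1, 0.2, 0.0, 0.0, 1.0, 1.0]

boxes = boxes_from_detections(fake, w=640, h=480)
print(boxes)
```

Only the first detection passes the 0.5 threshold, and its box is scaled to the 640 by 480 frame.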