# Application: Control games with face position

In this notebook, you will learn how to connect the dots between using OpenCV to identify faces within video feeds and using PyAutoGUI to drive keyboard inputs in different video game applications. Be sure to test and tinker with the accompanying Python script alongside this notebook to run the end-to-end application, as you may not be able to run it effectively from within the notebook itself.

The high level steps look like the following:

1. Define a webcam capture loop, drawing the "center bounds".
2. Load and use a pre-made face detection model.
3. For any faces found, determine if the face has moved outside the center-position bounds.
4. If out of bounds, send the corresponding key press to the operating system.
5. Finally, we can run the script and switch to a window with a game to control.

In [1]:
import cv2
import numpy as np
import pyautogui as gui
import time

# Set keypress delay to 0
gui.PAUSE = 0

# Loading the pre-trained face model.
model_path = "../res10_300x300_ssd_iter_140000.caffemodel"
prototxt_path = "../deploy.prototxt"

## 1. Webcam loop and drawing the "center bounds"

To get started, we will create the basic structure for our main gameplay loop. Notice the placeholder comments that mark where the loop will call functions we add later.

In [2]:
def play(prototxt_path, model_path):
    """ 
    Run the main loop until cancelled.
    """
    
    cap = cv2.VideoCapture(1)
    
    # Getting the Frame width and height.
    frame_width, frame_height = int(cap.get(3)), int(cap.get(4))
    
    # Co-ordinates of the bounding box on frame
    left_x, top_y = frame_width // 2 - 150, frame_height // 2 - 200
    right_x, bottom_y = frame_width // 2 + 150, frame_height // 2 + 200
    bbox = [left_x, right_x, bottom_y, top_y]
    
    while not cap.isOpened():
        cap = cv2.VideoCapture(1)
        
    while True:
        ret, frame = cap.read()
        if not ret:
            return 0
        
        frame = cv2.flip(frame, 1)
        # To be added: Detecting and drawing bounding box around faces
        
        # Drawing the control rectangle in the center of the frame.
        frame = cv2.rectangle(frame, (left_x, top_y), (right_x, bottom_y), (0, 0, 255), 5)
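        # To be added: Checking for game-start position, and checking to run keyboard press.
        # Show the frame; cv2.waitKey only receives key presses while a window is open.
        cv2.imshow("camera_feed", frame)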
        # Exit the loop on pressing the "esc" key.
        k = cv2.waitKey(5)
        if k == 27:
            return
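
To make the bounds arithmetic concrete, the sketch below computes the same `bbox` list for a given frame size. The helper name `center_bounds` and the 640x480 frame are assumptions for illustration and are not part of the accompanying script.

```python
def center_bounds(frame_width, frame_height, half_w=150, half_h=200):
    """Return [left_x, right_x, bottom_y, top_y] for a box centered in the frame."""
    cx, cy = frame_width // 2, frame_height // 2
    left_x, top_y = cx - half_w, cy - half_h
    right_x, bottom_y = cx + half_w, cy + half_h
    # Same ordering as the bbox list built inside play().
    return [left_x, right_x, bottom_y, top_y]


bbox = center_bounds(640, 480)
print(bbox)  # [170, 470, 440, 40]
```

Note that for frames shorter than 400 pixels, `top_y` becomes negative and the box extends past the frame, so smaller half-sizes may be needed on low-resolution cameras.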

### Using a pre-made face detection model

In order to use the pre-trained face model, we will need to do the following steps:

1. Load in the deep neural network (DNN).
2. Transform an input frame into the required format.
3. Set this frame as the input to the face detection model.
4. Read out any detected results.

---

### Reading the DNN model

**readNetFromCaffe()** Reads a network model stored in the *Caffe* framework's format.

#### Function Syntax

cv2.dnn.readNetFromCaffe(prototxt[, caffeModel])

The function has **1 required argument** and 1 optional:

1. **prototxt** (required) path to the prototxt file with the text description of the network architecture.
2. **caffeModel** (optional) path to the caffemodel file with the learned network weights.

---

### Converting an image into the model format

**blobFromImage()** Creates a 4-dimensional blob from an image. Optionally resizes and crops the image from the center, subtracts mean values, scales values by scalefactor, and swaps the Blue and Red channels.
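
As a rough illustration of what the 4-dimensional blob looks like, the NumPy sketch below mimics the mean subtraction, scaling, and channel reordering on an image that is already the target size. It is an approximation for intuition only, not OpenCV's implementation, and `to_blob` is a hypothetical helper name.

```python
import numpy as np


def to_blob(image, scalefactor=1.0, mean=(104.0, 177.0, 123.0)):
    """Approximate blobFromImage for an image that is already the target size."""
    # Subtract the per-channel mean first, then apply the scale factor.
    data = (image.astype(np.float32) - np.array(mean, dtype=np.float32)) * scalefactor
    # Reorder height x width x channels into channels x height x width,
    # then add a leading batch dimension to get (N, C, H, W).
    return data.transpose(2, 0, 1)[np.newaxis, ...]


frame = np.zeros((300, 300, 3), dtype=np.uint8)  # stand-in for a resized frame
blob = to_blob(frame)
print(blob.shape)  # (1, 3, 300, 300)
```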

---

### Read the neural network model in main function

We will add the following line at the top of our main **play** function, but outside the loop, so that we only load the model once.

In [3]:
net = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)

### Function to detect the faces in the frame

We then need to define the function which runs the detection of faces. We transform the image into blob format with **cv2.dnn.blobFromImage**, assign it as an input into the model using **net.setInput**, and then run the detections using **net.forward()**.

In [4]:
def detect(net, frame):
    """
    Detect the faces in the frame.
    
    returns: list of faces in the frame.
        here each face is a dictionary of format-
        {'start': (startX, startY), 'end': (endX, endY), 'confidence': confidence}
    """
    detected_faces = []
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(
        cv2.resize(frame, (300, 300)),
        1.0,
        (300, 300),
        (104.0, 177.0, 123.0)
    )
    net.setInput(blob)
    detections = net.forward()
    for i in range(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > 0.5:
            # Scale the relative box coordinates back to the frame size (x by width, y by height).
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            detected_faces.append(
                {
                    'start': (startX, startY),
                    'end': (endX, endY),
                    'confidence': confidence
                }
            )
    return detected_faces
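
The box scaling step can be checked in isolation. The sketch below assumes, as the multiplication in `detect` implies, that the model reports box corners as fractions of the frame size, so x values scale by the frame width and y values by the frame height. `scale_box` is a hypothetical helper used only for this illustration.

```python
def scale_box(rel_box, frame_width, frame_height):
    """Convert a relative (x1, y1, x2, y2) box into integer pixel coordinates."""
    x1, y1, x2, y2 = rel_box
    return (
        int(x1 * frame_width),   # x values scale with the width
        int(y1 * frame_height),  # y values scale with the height
        int(x2 * frame_width),
        int(y2 * frame_height),
    )


print(scale_box((0.25, 0.25, 0.75, 0.75), 640, 480))  # (160, 120, 480, 360)
```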

### Function to draw rectangular bounding box around detected faces

Finally, we want to visually draw a rectangle for each detected face on the screen. This is regardless of whether or not a keyboard signal is to be sent.

In [5]:
def draw_face(frame, detected_faces):
    """
    Draw rectangular box over detected faces.
    
    returns: frame with rectangular boxes over detected faces.
    """
    for face in detected_faces:
        cv2.rectangle(frame, face["start"], face["end"], (0, 255, 0), 10)
    return frame

### Detect movement outside the center box

This function checks whether a detected face is inside the bounding box at the center of the frame. If this value is True on one frame and False on the next, it tells a later function that a keyboard press should occur.

In [6]:
def check_rect(detected_faces, bbox):
    """
    Check for a detected face inside the bounding box at the center of the frame.
    
    returns: True or False.
    """
    for face in detected_faces:
        x1, y1 = face["start"]
        x2, y2 = face["end"]
        
        if x1 > bbox[0] and x2 < bbox[1]:
            if y1 > bbox[3] and y2 < bbox[2]:
                return True
    return False
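
A quick way to sanity-check this logic is to call it with hand-made face dictionaries. The sketch below restates the function compactly so it runs on its own; the sample coordinates and the 640x480 bounds are made up for illustration.

```python
def check_rect(detected_faces, bbox):
    """Return True if any face lies strictly inside bbox = [left, right, bottom, top]."""
    for face in detected_faces:
        x1, y1 = face["start"]
        x2, y2 = face["end"]
        if bbox[0] < x1 and x2 < bbox[1] and bbox[3] < y1 and y2 < bbox[2]:
            return True
    return False


bbox = [170, 470, 440, 40]  # center bounds for a 640x480 frame
centered = [{"start": (250, 150), "end": (400, 350), "confidence": 0.9}]
leaning_left = [{"start": (120, 150), "end": (270, 350), "confidence": 0.9}]

print(check_rect(centered, bbox))      # True
print(check_rect(leaning_left, bbox))  # False
print(check_rect([], bbox))            # False
```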

### Send keyboard press on detected movement

Based on the output of the *check_rect* function, we can now decide whether to send a keyboard arrow press event to the operating system via PyAutoGUI (imported as *gui*). The *last_mov* check ensures a key is pressed only when the face has just left the center, so the character doesn't keep drifting in the previous direction.

In [7]:
def move(detected_faces, bbox):
    """
    Press correct button depending on the position of detected face and bbox.
    
    The last_mov check is added for making sure the character doesn't keep
    drifting in the previous direction.
    """
    global last_mov
    for face in detected_faces:
        x1, y1 = face["start"]
        x2, y2 = face["end"]
        
        # Center
        if check_rect(detected_faces, bbox):
            last_mov = "center"
            return
        elif last_mov == "center":
            # Left
            if x1 < bbox[0]:
                gui.press("left")
                last_mov = "left"
            # Right
            elif x2 > bbox[1]:
                gui.press("right")
                last_mov = "right"
            # Down
            if y2 > bbox[2]:
                gui.press("down")
                last_mov = "down"
            # Up
            elif y1 < bbox[3]:
                gui.press("up")
                last_mov = "up"
                
            # Print out the button pressed if any.
            if last_mov != "center":
                print(last_mov)
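
To experiment with the direction logic without sending real key presses, the decision can be separated from the PyAutoGUI call. The sketch below is one possible way to do that; `pick_keys` is a hypothetical helper, not part of the accompanying script, and it mirrors the left/right and down/up branching used in *move*.

```python
def pick_keys(face, bbox):
    """Return the arrow keys implied by a face that has left the center box."""
    x1, y1 = face["start"]
    x2, y2 = face["end"]
    left, right, bottom, top = bbox
    keys = []
    # Horizontal: at most one of left/right.
    if x1 < left:
        keys.append("left")
    elif x2 > right:
        keys.append("right")
    # Vertical: at most one of down/up (image y grows downward).
    if y2 > bottom:
        keys.append("down")
    elif y1 < top:
        keys.append("up")
    return keys


bbox = [170, 470, 440, 40]
print(pick_keys({"start": (120, 150), "end": (270, 350)}, bbox))  # ['left']
print(pick_keys({"start": (330, 20), "end": (480, 220)}, bbox))   # ['right', 'up']
```

Each returned key could then be passed to `gui.press`, which keeps the side effect in one place and makes the branching easy to test.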

### Updating the play function

We can now update our *play* loop to call the functions defined in the prior steps. Below, we have added the calls to detect faces, draw them on the video feed, and then send the corresponding command to PyAutoGUI for keyboard actions. Notice the loop below is also enhanced with an FPS display, calculated manually from the time elapsed between displayed frames.

In [8]:
def play(prototxt_path, model_path):
    """
    Run the main loop until cancelled.
    """
    global last_mov
    # Used to record the time when we processed last frame.
    prev_frame_time = 0
    # Used to record the time at which we processed current frame.
    new_frame_time = 0
    
    net = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)
    cap = cv2.VideoCapture(1)
    
    # Counter for skipping frame.
    count = 0
    
    # Used to initialize the game.
    init = 0
    
    # Getting the Frame width and height.
    # Property 3 is the frame width and property 4 is the frame height.
    frame_width, frame_height = int(cap.get(3)), int(cap.get(4))
    
    # Co-ordinates of the bounding box on frame
    left_x, top_y = frame_width // 2 - 150, frame_height // 2 - 200
    right_x, bottom_y = frame_width // 2 + 150, frame_height // 2 + 200
    bbox = [left_x, right_x, bottom_y, top_y]
    
    while not cap.isOpened():
        cap = cv2.VideoCapture(1)
        
    while True:
        fps = 0
        ret, frame = cap.read()
        
        if not ret:
            return 0
        
        frame = cv2.flip(frame, 1)
        # Detect the face.
        detected_faces = detect(net, frame)
        # Draw bounding box around detected faces.
        frame = draw_face(frame, detected_faces)
        # Drawing the control rectangle in the center of the frame.
        frame = cv2.rectangle(frame, (left_x, top_y), (right_x, bottom_y), (0, 0, 255), 5)
        
        # Skipping every alternate frame.
        if count % 2 == 0:
            # For first pass.
            if init == 0:
                # If face is inside the control rectangle.
                if check_rect(detected_faces, bbox):
                    init = 1
                    cv2.putText(frame, "Game is running", (100, 100), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
                    cv2.waitKey(10)
                    last_mov = "center"
                    # Click to start the game.
                    gui.click(x=500, y=500)
            else:
                move(detected_faces, bbox)
                cv2.waitKey(50)
        # Calculating the FPS.
        new_frame_time = time.time()
        fps = int(1 / (new_frame_time - prev_frame_time))
        prev_frame_time = new_frame_time
        
        frame = cv2.putText(frame, str(fps) + "FPS", (200,100), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)
        cv2.imshow("camera_feed", frame)
        count += 1
        
        # Exit the loop on pressing the 'esc' key.
        k = cv2.waitKey(5)
        if k == 27:
            return
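
The manual FPS calculation is simply the reciprocal of the time between two frames. The sketch below isolates that step and adds a guard for a zero or negative interval, which is an addition for illustration rather than something the loop above does. `fps_from_times` is a hypothetical helper name.

```python
import time


def fps_from_times(prev_time, new_time):
    """Return whole frames per second from two timestamps, or 0 if no time elapsed."""
    elapsed = new_time - prev_time
    if elapsed <= 0:
        return 0
    return int(1 / elapsed)


print(fps_from_times(2.0, 2.03125))  # 32

# Measuring a real interval; the printed value varies by machine.
prev = time.perf_counter()
time.sleep(0.01)
print(fps_from_times(prev, time.perf_counter()))
```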