# Application: Control games with face position
In this notebook, you will learn how to use OpenCV to detect faces in a video feed and connect the detections to PyAutoGUI to drive keyboard input in different video games. Be sure to test and tinker with the accompanying Python script alongside this notebook to run the end-to-end application, as you may not be able to run it effectively from within the notebook itself.

The high-level steps are:
1. Define a webcam capture loop, drawing the "center bounds".
2. Load and use a pre-made face detection model.
3. For any faces found, determine whether the face has moved outside the center-position bounds.
4. If it is out of bounds, send the corresponding key press to the operating system.
5. Finally, run the script and switch to a window with a game to control.

The video below shows an example of what we are going to achieve:

In [1]:
from IPython.display import Video
Video('Output/Output_Pac_man.mov',  width = 800)

In [2]:
import cv2
import numpy as np
import pyautogui as gui
import time

# Set keypress delay to 0.
gui.PAUSE = 0

# Loading the pre-trained face model.
model_path = './model/res10_300x300_ssd_iter_140000.caffemodel'
prototxt_path = './model/deploy.prototxt'

## 1. Webcam loop and drawing the "center bounds"

To get started, we will create the basic structure for our main gameplay loop. Notice the placeholder comments for the functions that our loop will call; we will add those functions later.

In [3]:
def play(prototxt_path, model_path):
    '''
    Run the main loop until cancelled.
    '''
    cap = cv2.VideoCapture(0)

    # Getting the Frame width and height.
    frame_width, frame_height = int(cap.get(3)), int(cap.get(4))

    # Co-ordinates of the bounding box on frame
    left_x, top_y = frame_width // 2 - 150, frame_height // 2 - 200
    right_x, bottom_y = frame_width // 2 + 150, frame_height // 2 + 200
    bbox = [left_x, right_x, bottom_y, top_y]

    while not cap.isOpened():
        cap = cv2.VideoCapture(0)

    while True:
        ret, frame = cap.read()
        if not ret:
            return 0

        frame = cv2.flip(frame, 1)
        # To be added: Detecting and drawing bounding box around faces

        # Drawing the control rectangle in the center of the frame.
        frame = cv2.rectangle(
            frame, (left_x, top_y), (right_x, bottom_y), (0, 0, 255), 5)
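        # To be added: Checking for game-start position, and checking to run keyboard press.
        # Show the frame; `cv2.waitKey` only receives key presses while a window is open.
        cv2.imshow('camera_feed', frame)
        # Exit the loop on pressing the `esc` key.
        k = cv2.waitKey(5)
        if k == 27:
            return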
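
The bounds arithmetic in `play` can be checked on its own, without a webcam. The helper below is a minimal sketch; the name `center_bounds` and its `half_width` and `half_height` parameters are hypothetical and only mirror the 150- and 200-pixel offsets used above.

```python
def center_bounds(frame_width, frame_height, half_width=150, half_height=200):
    """Return [left_x, right_x, bottom_y, top_y] for a box centered in the frame."""
    center_x, center_y = frame_width // 2, frame_height // 2
    left_x, right_x = center_x - half_width, center_x + half_width
    top_y, bottom_y = center_y - half_height, center_y + half_height
    # Same ordering as the `bbox` list built in `play`.
    return [left_x, right_x, bottom_y, top_y]


# A 640x480 frame gives a 300x400 box around the center (320, 240).
print(center_bounds(640, 480))  # [170, 470, 440, 40]
```

Note that on a frame shorter than 400 pixels, `top_y` becomes negative, so the box would extend past the edge of the frame.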

## 2. Using a pre-made face detection model

In order to use our pre-trained face model, we will need to do the following steps:

1. Load in the deep neural network (DNN).
2. Transform an input frame into the required format.
3. Set this frame as the input to the face detection model.
4. Read out any detected results.

More details about how face detection works are included in a later module. In the meantime, you can review the key functions we are using below.

<hr   style="border:none; height: 4px; background-color: #D3D3D3 " />

## Reading the DNN model

**`readNetFromCaffe()`** Reads a network model stored in [Caffe](http://caffe.berkeleyvision.org/) framework's format.

### <font style="color:rgb(8,133,37)">Function Syntax </font>
``` python
cv2.dnn.readNetFromCaffe( prototxt[, caffeModel] )
```
The function has **1 required argument** and 1 optional:

1. `prototxt` is the path to the .prototxt file with a text description of the network architecture.
2. `caffeModel` is the path to the .caffemodel file with the learned network.

### <font style="color:rgb(8,133,37)">OpenCV Documentation</font>

[**`cv2.dnn.readNetFromCaffe()`**](https://docs.opencv.org/4.5.2/d6/d0f/group__dnn.html#ga29d0ea5e52b1d1a6c2681e3f7d68473a)
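
Loading the network typically fails with an OpenCV error if either file cannot be opened, so it can help to check the paths first. The sketch below uses only the standard library; the helper name `missing_model_files` is hypothetical, and the paths are the ones defined at the top of this notebook.

```python
from pathlib import Path


def missing_model_files(prototxt_path, model_path):
    """Return the subset of the two model paths that do not exist on disk."""
    return [p for p in (prototxt_path, model_path) if not Path(p).is_file()]


missing = missing_model_files(
    './model/deploy.prototxt',
    './model/res10_300x300_ssd_iter_140000.caffemodel')
if missing:
    print('Missing model files:', missing)
```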

## Converting an image into the model format

**`blobFromImage()`** Creates a 4-dimensional blob from an image. Optionally resizes and crops the image from the center, subtracts mean values, scales values by `scalefactor`, and swaps the blue and red channels.

### <font style="color:rgb(8,133,37)">Function Syntax </font>
``` python
dst = cv2.dnn.blobFromImage( image[, scalefactor[, size[, mean[, swapRB[, crop[, ddepth]]]]]] )
```
`dst` is the output 4-dimensional blob in NCHW order (batch, channels, height, width).

The function has **1 required argument**; the rest are optional:

1. `image` is the input image (with 1, 3, or 4 channels).
2. `scalefactor` is the multiplier for image values.
3. `size` is the spatial size of the output image.
4. `mean` is a scalar with mean values that are subtracted from the channels. Values are intended to be in (mean-R, mean-G, mean-B) order if the image has BGR ordering and `swapRB` is true.
5. `swapRB` is a flag indicating that the first and last channels of a 3-channel image should be swapped.
6. `crop` is a flag indicating whether the image is cropped after resizing.
7. `ddepth` is the depth of the output blob. Choose either CV_32F or CV_8U.

### <font style="color:rgb(8,133,37)">OpenCV Documentation</font>

[**`cv2.dnn.blobFromImage()`**](https://docs.opencv.org/4.5.2/d6/d0f/group__dnn.html#ga98113a886b1d1fe0b38a8eef39ffaaa0)
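
To make the blob layout concrete, here is a rough NumPy sketch of the core arithmetic: subtract the mean, scale, and reorder the axes. It is an approximation under stated assumptions (the image is already at the target size, and there is no cropping or channel swapping), and `blob_sketch` is a hypothetical name, not part of OpenCV.

```python
import numpy as np


def blob_sketch(image, scalefactor=1.0, mean=(0.0, 0.0, 0.0)):
    """Approximate blobFromImage for an already-resized H x W x 3 image."""
    # Subtract the per-channel mean, then scale.
    blob = (image.astype(np.float32) - np.array(mean, dtype=np.float32)) * scalefactor
    # Reorder H x W x C to C x H x W and add a leading batch dimension.
    return blob.transpose(2, 0, 1)[np.newaxis, ...]


image = np.full((300, 300, 3), 128, dtype=np.uint8)
blob = blob_sketch(image, 1.0, (104.0, 177.0, 123.0))
print(blob.shape)        # (1, 3, 300, 300)
print(blob[0, :, 0, 0])  # per-channel values after mean subtraction
```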



## Setting input value

**`setInput()`** Sets the new input value for the network.

### <font style="color:rgb(8,133,37)">Function Syntax </font>
``` python
cv2.dnn_Net.setInput( blob[, name[, scalefactor[, mean]]] )
```
The function has **1 required argument** and 3 optional:

1. `blob` is a new blob.
2. `name` is the name of the input layer.
3. `scalefactor` is an optional normalization scale.
4. `mean` holds optional mean subtraction values.

### <font style="color:rgb(8,133,37)">OpenCV Documentation</font>

[**`cv2.dnn_Net.setInput()`**](https://docs.opencv.org/4.5.2/db/d30/classcv_1_1dnn_1_1Net.html#a5e74adacffd6aa53d56046581de7fcbd)


## Detections using the DNN Model

**`forward()`** Runs a forward pass to compute the output of the layer named `outputName`. Returns the blob for the first output of the specified layer.

### <font style="color:rgb(8,133,37)">Function Syntax </font>
``` python
cv2.dnn_Net.forward( [, outputName] )
```
The function has 1 optional argument:

1. `outputName` is the name of the layer for which output is needed.

### <font style="color:rgb(8,133,37)">OpenCV Documentation</font>

[**`cv2.dnn_Net.forward()`**](https://docs.opencv.org/4.5.2/db/d30/classcv_1_1dnn_1_1Net.html#a98ed94cb6ef7063d3697259566da310b)

<hr   style="border:none; height: 4px; background-color: #D3D3D3 " />

### <font style='color:rgb(50,120,230)'> 2.1 Read the neural network model in main function</font>

We will add the following line at the top of our main `play` function, outside the loop, so that the model is loaded only once.

In [4]:
net = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)

### <font style='color:rgb(50,120,230)'> 2.2 Function to detect the faces in the frame. </font>

We then need to define the function which runs the detection of faces. We transform the image into blob format with `cv2.dnn.blobFromImage`, assign it as an input into the model using `net.setInput`, and then run the detections using `net.forward()`.

In [5]:
def detect(net, frame):
    '''
    Detect the faces in the frame.

    returns: list of faces in the frame
                here each face is a dictionary of format-
                {'start': (startX,startY), 'end': (endX,endY), 'confidence': confidence}
    '''
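    detected_faces = []
    (h, w) = frame.shape[:2]
    # Build a 300x300 input blob; the mean values are subtracted per channel.
    blob = cv2.dnn.blobFromImage(
        cv2.resize(frame, (300, 300)),
        1.0,
        (300, 300),
        (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()
    # Index 2 of each detection is the confidence; indices 3 to 6 are the
    # box corners (x1, y1, x2, y2), normalized to the range 0..1.
    for i in range(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > 0.5:
            # Scale the normalized box back to frame pixel coordinates.
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            # Store plain Python numbers rather than NumPy scalars.
            detected_faces.append({
                'start': (int(startX), int(startY)),
                'end': (int(endX), int(endY)),
                'confidence': float(confidence)})
    return detected_faces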
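
The post-processing step can be tried without a camera or model by feeding it a synthetic array with the same `(1, 1, N, 7)` layout that `detect` indexes into. This is a sketch under that assumption; `parse_detections` and the `fake` array are hypothetical and only illustrate the confidence filter and the scaling to pixel coordinates.

```python
import numpy as np


def parse_detections(detections, frame_w, frame_h, threshold=0.5):
    """Convert a (1, 1, N, 7) detection array into pixel-coordinate boxes."""
    faces = []
    for row in detections[0, 0]:
        confidence = float(row[2])
        if confidence <= threshold:
            continue
        # Columns 3..6 hold normalized (x1, y1, x2, y2) in the range 0..1.
        box = row[3:7] * np.array([frame_w, frame_h, frame_w, frame_h])
        x1, y1, x2, y2 = (int(v) for v in box)
        faces.append({'start': (x1, y1), 'end': (x2, y2), 'confidence': confidence})
    return faces


# Two synthetic rows: one confident detection and one below the threshold.
fake = np.array([[[[0, 1, 0.9, 0.25, 0.25, 0.75, 0.75],
                   [0, 1, 0.2, 0.10, 0.10, 0.20, 0.20]]]], dtype=np.float32)
print(parse_detections(fake, 640, 480))
```

Only the first row survives the 0.5 threshold, and its box maps to the pixel corners (160, 120) and (480, 360) on a 640x480 frame.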

### <font style='color:rgb(50,120,230)'> 2.3 Function to draw rectangular bounding box around detected faces.</font>

Finally, we draw a rectangle around each detected face on the frame, regardless of whether a key press is sent.

In [6]:
def drawFace(frame, detected_faces):
    '''
    Draw rectangular box over detected faces.

    returns: frame with rectangular boxes over detected faces.
    '''
    for face in detected_faces:
        cv2.rectangle(frame, face['start'], face['end'], (0, 255, 0), 10)
    return frame

## 3. Detect movement outside the center box

The function below checks whether a detected face lies inside the bounding box at the center of the frame. If it returns True on one frame and False on the next, the face has just left the box, which signals that a key press should occur.

In [7]:
def checkRect(detected_faces, bbox):
    '''
    Check for a detected face inside the bounding box at the center of the frame.

    returns: True or False.
    '''
    for face in detected_faces:
        x1, y1 = face['start']
        x2, y2 = face['end']
        if x1 > bbox[0] and x2 < bbox[1]:
            if y1 > bbox[3] and y2 < bbox[2]:
                return True
    return False
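
A quick worked example helps confirm the index order of `bbox`, which is `[left_x, right_x, bottom_y, top_y]`. The sketch below checks a single face with the list unpacked into named values; `face_inside` and the sample coordinates are hypothetical.

```python
def face_inside(face, bbox):
    """True if the face box lies strictly inside bbox = [left, right, bottom, top]."""
    left_x, right_x, bottom_y, top_y = bbox
    x1, y1 = face['start']
    x2, y2 = face['end']
    return left_x < x1 and x2 < right_x and top_y < y1 and y2 < bottom_y


bbox = [170, 470, 440, 40]  # center box for a 640x480 frame
centered = {'start': (250, 150), 'end': (400, 350)}
leaning_left = {'start': (120, 150), 'end': (270, 350)}

print(face_inside(centered, bbox))      # True
print(face_inside(leaning_left, bbox))  # False
```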

## 4. Send keyboard press on detected movement

Based on the output of the `checkRect` function, we can now decide whether to send an arrow key press to the operating system via PyAutoGUI (imported as `gui`). The `last_mov` check ensures that a key is pressed only when the face has just left the center, so the character doesn't keep drifting in the previously detected direction.


<hr style='border:none; height: 4px; background-color:#D3D3D3'/>

## Press a button

The `press()` function is a wrapper for the `keyDown()` and `keyUp()` functions, which simulate pressing a key down and then releasing it.

### <font style="color:rgb(8,133,37)">Function Syntax </font>
``` python
pyautogui.press( key )	
```

The function has **1 required input argument**:

1. `key` is a string naming the key to press, taken from [pyautogui.KEYBOARD_KEYS](https://pyautogui.readthedocs.io/en/latest/keyboard.html#keyboard-keys)


### <font style="color:rgb(8,133,37)">PyAutoGUI Documentation</font>

[**`press()`**](https://pyautogui.readthedocs.io/en/latest/keyboard.html)

<hr style='border:none; height: 4px; background-color:#D3D3D3'/>

In [8]:
def move(detected_faces, bbox):
    '''
    Press the correct button depending on the position of the detected face and bbox.

    The last_mov check ensures a key is pressed only when the face has just
    left the center, so the character doesn't keep drifting in the previous direction.
    '''
    global last_mov
    for face in detected_faces:
        x1, y1 = face['start']
        x2, y2 = face['end']

        # Center
        if checkRect(detected_faces, bbox):
            last_mov = 'center'
            return

        elif last_mov == 'center':
            # Left
            if x1 < bbox[0]:
                gui.press('left')
                last_mov = 'left'
            # Right
            elif x2 > bbox[1]:
                gui.press('right')
                last_mov = 'right'
            # Down
            if y2 > bbox[2]:
                gui.press('down')
                last_mov = 'down'
            # Up
            elif y1 < bbox[3]:
                gui.press('up')
                last_mov = 'up'

            # Print out the button pressed if any.
            if last_mov != 'center':
                print(last_mov)
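
The direction logic can also be separated from the key presses, which makes it testable without sending input to the operating system. The following is a sketch of that idea rather than the notebook's own implementation; `keys_for_face` is a hypothetical helper that mirrors the left/right and down/up comparisons in `move`.

```python
def keys_for_face(face, bbox):
    """Return the arrow keys implied by a face that has left the center box."""
    left_x, right_x, bottom_y, top_y = bbox
    x1, y1 = face['start']
    x2, y2 = face['end']
    keys = []
    # Horizontal: at most one of left/right.
    if x1 < left_x:
        keys.append('left')
    elif x2 > right_x:
        keys.append('right')
    # Vertical: at most one of down/up.
    if y2 > bottom_y:
        keys.append('down')
    elif y1 < top_y:
        keys.append('up')
    return keys


bbox = [170, 470, 440, 40]
print(keys_for_face({'start': (120, 150), 'end': (270, 350)}, bbox))  # ['left']
print(keys_for_face({'start': (120, 10), 'end': (270, 210)}, bbox))   # ['left', 'up']
```

As in `move`, a face that leaves the box diagonally yields one horizontal and one vertical key.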

## 5. Updating the play function

We can now update our `play` loop to call the functions defined in the prior steps. Below, we have added the calls to detect faces, draw them on the video feed, and send the corresponding key presses through PyAutoGUI. The loop also displays the FPS, calculated manually from the time elapsed between displayed frames.

The loop checks the face position and sends key presses only on every other frame. Note that face detection and drawing still run on every frame.

In [9]:
def play(prototxt_path, model_path):
    '''
    Run the main loop until cancelled.
    '''
    global last_mov
    # Used to record the time when we processed last frame.
    prev_frame_time = 0
    # Used to record the time at which we processed current frame.
    new_frame_time = 0

    net = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)
    cap = cv2.VideoCapture(0)

    # Counter for skipping frame.
    count = 0

    # Used to initialize the game.
    init = 0

    # Getting the Frame width and height.
    frame_width, frame_height = int(cap.get(3)), int(cap.get(4))

    # Co-ordinates of the bounding box on frame
    left_x, top_y = frame_width // 2 - 150, frame_height // 2 - 200
    right_x, bottom_y = frame_width // 2 + 150, frame_height // 2 + 200
    bbox = [left_x, right_x, bottom_y, top_y]

    while not cap.isOpened():
        cap = cv2.VideoCapture(0)

    while True:
        fps = 0
        ret, frame = cap.read()

        if not ret:
            return 0

        frame = cv2.flip(frame, 1)
        # Detect the face.
        detected_faces = detect(net, frame)
        # Draw bounding box around detected faces.
        frame = drawFace(frame, detected_faces)
        # Drawing the control rectangle in the center of the frame.
        frame = cv2.rectangle(
            frame, (left_x, top_y), (right_x, bottom_y), (0, 0, 255), 5)

        # Skipping every alternate frame.
        if count % 2 == 0:
            # For first pass.
            if init == 0:
                # If face is inside the control rectangle.
                if checkRect(detected_faces, bbox):
                    init = 1
                    cv2.putText(
                        frame, 'Game is running', (100, 100),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
                    cv2.waitKey(10)
                    last_mov = 'center'
                    # Click to start the game.
                    gui.click(x=500, y=500)
            else:

                move(detected_faces, bbox)
                cv2.waitKey(50)
        # Calculating the FPS (guarding against a zero time difference).
        new_frame_time = time.time()
        elapsed = new_frame_time - prev_frame_time
        fps = int(1 / elapsed) if elapsed > 0 else 0
        prev_frame_time = new_frame_time

        frame = cv2.putText(
            frame, str(fps) + 'FPS', (200, 100),
            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)
        cv2.imshow('camera_feed', frame)
        count += 1

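        # Exit the loop on pressing the `esc` key.
        k = cv2.waitKey(5)
        if k == 27:
            # Free the webcam and close the preview window before leaving.
            cap.release()
            cv2.destroyAllWindows()
            return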
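
The FPS value is the reciprocal of the time between two displayed frames. The sketch below isolates that calculation and returns 0 when no time has elapsed; `fps_from_times` is a hypothetical helper, and it uses `time.perf_counter()` as an assumed alternative to `time.time()` for measuring short intervals.

```python
import time


def fps_from_times(prev_time, new_time):
    """Return whole frames per second, or 0 when no time has elapsed."""
    elapsed = new_time - prev_time
    return int(1 / elapsed) if elapsed > 0 else 0


prev = time.perf_counter()
time.sleep(0.05)  # stand-in for the work done on one frame
now = time.perf_counter()
print(fps_from_times(prev, now), 'FPS')
```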

### <font style='color:rgb(50,120,230)'> Calling the main loop </font>

Finally, we call our `play` function, which will run until the escape key is pressed.

In [10]:
# Used to pass the previous move of the user to the play() function.
last_mov = ''
# play(prototxt_path, model_path)


### You can run `play()` by uncommenting the call above, or use the Python script

## 6. Examples with different games

In [11]:
from IPython.display import Video

Video('Output/Output_Web_Game.mov',  width=900)

### Because we use PyAutoGUI to send keyboard input, we can control other keyboard-driven games with slight modifications to the logic of the `move()` function.
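
One possible modification is to translate the detected direction into a different key before pressing it. The key maps below are hypothetical examples, not settings taken from the accompanying script; each returned name could then be passed to `gui.press()`.

```python
# Hypothetical key maps: detected direction -> key name to send (None = ignore).
ARROW_KEYS = {'left': 'left', 'right': 'right', 'up': 'up', 'down': 'down'}
WASD_KEYS = {'left': 'a', 'right': 'd', 'up': 'w', 'down': 's'}
STEER_ONLY = {'left': 'left', 'right': 'right', 'up': None, 'down': None}


def map_keys(directions, key_map):
    """Translate detected directions into key names, dropping ignored ones."""
    return [key_map[d] for d in directions if key_map.get(d) is not None]


print(map_keys(['left', 'up'], WASD_KEYS))   # ['a', 'w']
print(map_keys(['left', 'up'], STEER_ONLY))  # ['left']
```

A map like `STEER_ONLY` would suit a game where only left and right should respond to head movement.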


## Pac Man

You can play [Google's Pacman Doodle](https://www.google.com/logos/2010/pacman10-i.html) entirely with up, down, left, and right inputs too.

In [12]:
from IPython.display import Video

Video('Output/Output_Pac_man.mov',  width = 900)

## Racing Game

In this example racing game, the script controls only the left and right inputs.

In [13]:
from IPython.display import Video

Video('Output/Output_car_race.mov',  width = 900)