# Emulated Hand Controller

## By Haris Naveed, Jash Narwani, and Robert Walsh

For this project, our goal is to build a hand tracking and gesture recognition system whose gestures can stand in for a simple game controller.
In this case, we intend to target the NES for our controller input.

## Step 1: Hand Tracking

Before we can do anything else in this project, we need a way to track our hands. We can do this with the MediaPipe and OpenCV packages.

In [2]:
# here are our packages
import cv2
import mediapipe as mp

# we will also need copy so we can get a clean cut of our frames
import copy

# so we can show intermediate images
import matplotlib.pyplot as plt


In [None]:
# Before we get started properly, we need to define our hand detection model.
# Thankfully, mediapipe has a premade model specifically for finding and tracking hands, so we can use that
hands = mp.solutions.hands.Hands()

In [None]:
# we need to also define our cv2 capturing method. Without it, we won't be able to track our hands at all. I hope you have a camera somewhere on your computer...
def videoCapture():
    cap = cv2.VideoCapture(0)
    
    if not cap.isOpened():
        print("No camera, that sucks")
        return None
    
    return cap

def endCapture(cap):
    cap.release()
    
    cv2.destroyAllWindows()

In [None]:
# use this to identify our hands
def findHand(frame, margin, hands):
    # mediapipe uses RGB to find hands, cv2 captures BGR for some reason
    # we correct this
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # use the mediapipe hands model to identify our hand and keypoints
    results = hands.process(img)
    cutFrame = None

    # mediapipe normalizes coordinates, use capture size to fix that for later
    h, w, c = img.shape

    # if a hand is identified...
    if results.multi_hand_landmarks:
        # get the landmarks of the first hand
        hand1 = results.multi_hand_landmarks[0]
        Xs = []
        Ys = []
        # denormalize the coordinates of each landmark
        for landmark in hand1.landmark:
            cx, cy = int(landmark.x * w), int(landmark.y * h)
            Xs.append(cx)
            Ys.append(cy)

        # identify the bounds of our hand, padded by margin and clamped to the frame
        x0 = max(0, min(Xs) - margin)
        y0 = max(0, min(Ys) - margin)
        x1 = min(w, max(Xs) + margin)
        y1 = min(h, max(Ys) + margin)

        # crop the frame to only have our hand (copied, so the box drawn below stays out of the crop)
        cutFrame = copy.deepcopy(frame[y0:y1, x0:x1])

        # draw the bounding box on the full frame
        cv2.rectangle(frame, (x0, y0), (x1, y1), (0, 255, 0), 2)

    # return the frame, and the cropped frame (if applicable)
    return frame, cutFrame
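
To make the denormalize, pad, and clamp steps in `findHand` easier to check by hand, here is a minimal sketch of the same arithmetic in plain Python, with no camera or model needed. The helper name `landmark_bbox` and the sample points are made up for illustration; they are not part of the notebook or of mediapipe.

```python
def landmark_bbox(points, width, height, margin):
    """Return (x0, y0, x1, y1) in pixels for normalized (x, y) points.

    points: iterable of (x, y) pairs in the 0..1 range, the way
    mediapipe reports landmark positions.
    The box is padded by `margin` pixels and clamped to the frame.
    """
    # scale the normalized coordinates up to pixel coordinates
    xs = [int(x * width) for x, _ in points]
    ys = [int(y * height) for _, y in points]

    # pad the tight bounds, then clamp so the box never leaves the frame
    x0 = max(0, min(xs) - margin)
    y0 = max(0, min(ys) - margin)
    x1 = min(width, max(xs) + margin)
    y1 = min(height, max(ys) + margin)
    return x0, y0, x1, y1


# hypothetical landmarks on a 640x480 frame
sample = [(0.25, 0.5), (0.5, 0.25), (0.75, 0.75)]
print(landmark_bbox(sample, 640, 480, 100))  # (60, 20, 580, 460)
```

Clamping matters because slicing with a negative index wraps around to the other side of the array, which would silently give us the wrong crop.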

Let's give it a try. Run the next cell with your hand in clear view of the camera.

In [None]:
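cap = videoCapture()

# videoCapture() returns None when there is no camera, so check before reading
if cap is not None:
    ret, frame = cap.read()

    # only go looking for a hand if we actually got a frame
    if ret:
        # Find the hand and get the cropped frame
        frame, cutFrame = findHand(frame, 100, hands)

        if cutFrame is not None and cutFrame.size != 0:
            # matplotlib expects RGB but cv2 gave us BGR, so convert before showing
            plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            plt.show()
            plt.imshow(cv2.cvtColor(cutFrame, cv2.COLOR_BGR2RGB))
            plt.show()

    endCapture(cap)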
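
If the cell above shows nothing, or a very dark image, one possible cause is that some webcams need a few frames to settle their exposure after opening. The sketch below is one way to work around that, assuming this is what is happening on your machine. The helper `read_after_warmup` and its `warmup` default are hypothetical, not part of cv2; it only relies on the capture object having a `read()` method that returns `(ok, frame)`, as `cv2.VideoCapture` does.

```python
def read_after_warmup(cap, warmup=5):
    """Throw away `warmup` frames, then return the next frame.

    cap: any object with a cv2.VideoCapture-style read() -> (ok, frame).
    Returns None if any of the reads fails.
    """
    # discard the first few frames while the camera settles
    for _ in range(warmup):
        ok, _frame = cap.read()
        if not ok:
            return None

    # this is the frame we actually keep
    ok, frame = cap.read()
    return frame if ok else None
```

To try it, use `frame = read_after_warmup(cap)` in place of a single `cap.read()` call, and check that `frame` is not `None` before passing it to `findHand`.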