# **Computer Vision Spring 2023 - Fulbright University Vietnam**

# **FINAL PROJECT REPORT - HEADSHOT TRACKING TEAM**

> **Authors: Pham Dang Yen Nhi - Nguyen Thanh Long - Pham Hoang Lan**

> **Instructor: Prof. Duong Phung**

# **1. INTRODUCTION**

## **1.1 Project Description**

The aim of this project is to leverage computer vision techniques as well as mechanical and electrical knowledge to develop an engaging and interactive game that incorporates face tracking, hand tracking, and finger counting functionalities. The game provides a unique and immersive experience for the players by utilizing these features in a creative manner.

The core feature of the game is face tracking, where the user's face is continuously monitored and tracked in real-time. This tracking information is then used to control a motorized laser light, which dynamically adjusts its position to always point towards the user's face.

In addition to face tracking, the game incorporates hand tracking and finger counting capabilities. The system recognizes and tracks the user's hand movements, allowing them to interact with the game using gestures and finger counts. This functionality enables the user to answer questions or perform specific actions within the game by utilizing their hand movements.

The game flows as follows. The user's face is tracked continuously, and the laser light follows their facial movements so that it always stays pointed at their face. The player needs to find hidden, randomly placed zones on the screen; while the player is inside one of these safe zones, the laser does not point at them. While searching for these zones, users must also use their hands to answer math questions shown on the screen. When the user's hand is shown to the camera, it is tracked and used for finger counting.

Overall, it is an interactive and fun game that applies multiple computer vision techniques.

## **1.2 Motivation**

The decision to explore this project focusing on headshot tracking stems from several motivating factors. 

* Firstly, our team wanted to create a project that has practical applications of technology. By combining several computer vision techniques such as face tracking, hand tracking, and finger counting, we aim to gain a comprehensive understanding of the potential applications and limitations of these techniques. 

* Secondly, the project benefits from a straightforward mechanical and electrical setup, allowing for efficient and time-saving testing. 

* Finally, the resulting product offers a visible and enjoyable experience, making it easily accessible and immersive for all users. This project serves as an excellent opportunity to explore various engineering concepts while delivering a fun and engaging experience.

## **1.3 Main Features**

Here are the main features of the game and their descriptions, including how they are used:

> 1. Face tracking: The game utilizes face tracking technology to control the motorized laser light. The user's face movements are tracked in real-time, allowing the laser light to always point towards the user's face. Additionally, the user can control a point on the screen using their face, providing an interactive and dynamic gameplay experience.

> 2. Hand tracking: Hand tracking is employed to enable natural and intuitive interaction within the game. The user's hand movements are detected and tracked, allowing them to perform various actions and gestures in the virtual environment. This enhances immersion and engagement, making the gameplay more interactive and enjoyable.

> 3. Finger counting: The game incorporates finger counting as a means of user input. The system accurately recognizes and counts the user's fingers, allowing them to provide answers to in-game questions or perform specific actions. This feature adds an element of responsiveness and interactivity, making the gameplay more engaging and challenging.

> 4. Laser control: The fourth feature turns computer vision output into a real-world action by steering a laser based on face detection. This adds a physical interaction element between the users and the computer vision technology. As the user's face is tracked in real time, the motorized laser light adjusts its position to follow the user's face movements. This creates a unique and immersive experience in which players physically interact with the game through the laser light.

By combining these features, the game creates an interactive and immersive experience where the user's face movements control the motorized laser light, hand movements enable intuitive interaction, and finger counting allows for user input and response to in-game challenges.

## **1.4 Application**

The project has numerous practical applications in the domain of camera-based machinery control. Specifically, this project is well-suited for application in robotics and surveillance contexts.

> •	In the field of robotics, the project can be utilized to track moving targets, such as people or vehicles, and control robotic arms accordingly. This technology is particularly relevant in warehouse automation systems where robotic arms must pick and place items moving along a conveyor belt.

> •	In the field of surveillance, the project can facilitate traffic light control and the detection of anomalous activities or traffic violations. For example, the system can be used to track and detect vehicles driving in the wrong direction or exceeding speed limits on highways, automatically changing traffic lights to prevent accidents and ensure public safety. The versatility of this technology across diverse fields highlights its potential for widespread implementation and innovation.


# **2. METHOD**

## **2.1 System Framework**

### *2.1.1 Block Diagram*

At the center of the diagram is the Processor, a computer running Python. The Processor receives a continuous stream of frames from the laptop's webcam.

The Processor utilizes OpenCV and MediaPipe libraries for two main features: Face detection and face tracking, as well as hand detection and finger counting. These features enable the system to identify and track faces in the captured frames, as well as detect and count the number of fingers in a hand.

To control the movement of the laser module, the Processor communicates with an Arduino Uno R3, which acts as the Microcontroller Unit (MCU). The Arduino is responsible for controlling the motors that adjust the X-axis and Y-axis of the laser module. This allows the laser module to be directed and positioned accordingly.

The output and interface of the system are managed through two micro servo SG90 motors connected to the Arduino. These motors control the movement of the laser module based on the instructions received from the Processor. The interface for the user is the laptop's display, which shows the results of the face detection and hand detection, providing a visual representation of the system's functionality.

In short, the block diagram showcases a system that utilizes computer vision techniques for face and hand detection, along with motor control via an Arduino, to provide a user interface through the laptop's display.

__Figure 1:__ Block Diagram

![Figure 1](Figures/block_diagram.png)

### *2.1.2 System Flowchart*

The flowchart depicted in *Figure 2* provides a comprehensive outline of the gameplay mechanics of the system.

To start the game, players must first position their face correctly in front of the camera and then use their right-hand fingers to show a countdown starting from 3 and ending at 1.

At the "START GAME" stage, the system generates five random 50x50 windows that serve as safe zones, where the laser is deactivated. Players earn points by moving their faces around to locate these windows. Throughout the gameplay, a laser tracks the player's face and points toward it, adding a physical element that pushes players to actively explore the screen for all the safe zones. The objective is to find all the windows within a time limit of 1 minute; otherwise, the player loses the game.
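
The safe-zone logic can be sketched as follows. This is a minimal illustration, not the project's code: the function names are hypothetical, zones are represented as `(x, y, w, h)` tuples in frame coordinates, overlap between zones is not prevented, and the point tested is assumed to be the face midpoint expressed in the same coordinates.

```python
import random


def generate_safe_zones(n, frame_w, frame_h, size=50, rng=None):
    """Return n random (x, y, w, h) squares that lie fully inside the frame."""
    if rng is None:
        rng = random.Random()
    return [
        (rng.randint(0, frame_w - size), rng.randint(0, frame_h - size), size, size)
        for _ in range(n)
    ]


def zone_hit(point, zone):
    """True if the (x, y) point lies inside the zone (half-open on the right/bottom)."""
    x, y = point
    zx, zy, zw, zh = zone
    return zx <= x < zx + zw and zy <= y < zy + zh


def update_found(point, zones, found):
    """Add the index of every zone containing the point to the `found` set.

    Returns True while the point is inside any zone, which is when the
    laser would stop tracking the player.
    """
    inside = False
    for i, zone in enumerate(zones):
        if zone_hit(point, zone):
            found.add(i)
            inside = True
    return inside
```

In a game loop, `len(found)` could serve as the score and `len(found) == len(zones)` as the win condition; keeping `found` as a set means a zone stays discovered once it has been visited.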

While searching for the safe zones, players encounter a pop-up equation every 5 seconds and must answer it by showing the result with their fingers. Showing the correct answer is required to continue the search for safe zones.

Overall, the flowchart gives an overview of the game design, covering finger counting, safe-zone searching, equation generation, and game display.
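
A minimal sketch of equation generation, answer checking, and the 5-second trigger follows. The operators, the operand range, and `max_answer=5` (one hand's worth of fingers) are assumptions for illustration; the report does not specify how its equations are built. The clock is injectable so the timer can be tested without waiting.

```python
import random
import time


def make_equation(max_answer=5, rng=None):
    """Return (text, answer) for a random a + b or a - b with answer in [0, max_answer]."""
    if rng is None:
        rng = random.Random()
    answer = rng.randint(0, max_answer)
    if rng.random() < 0.5:
        a = rng.randint(0, answer)
        return f"{a} + {answer - a} = ?", answer
    b = rng.randint(0, 9)  # hypothetical subtrahend range
    return f"{answer + b} - {b} = ?", answer


def is_correct(fingers, answer):
    """Compare the finger count (a list of 0/1 flags) with the expected answer."""
    return sum(fingers) == answer


class IntervalTimer:
    """Reports True from due() once every `interval` seconds."""

    def __init__(self, interval=5.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last = clock()

    def due(self):
        now = self.clock()
        if now - self.last >= self.interval:
            self.last = now
            return True
        return False
```

Building the equation backward from a chosen answer guarantees that every question can be answered with the fingers available. A timer created with `interval=60.0` could likewise signal the end of the one-minute game.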

__Figure 2:__ System Flowchart

![Figure 2](Figures/game_flowchart.png)

## **2.2 Unit Implementation - Development**

### *2.2.1 Face Detection and Distance Measuring*

1. `FaceMeshDetector` class: This class encapsulates the functionality for detecting and tracking faces using the FaceMesh algorithm. It is imported from the `cvzone.FaceMeshModule` module. The class takes parameters such as `maxFaces` to determine the maximum number of faces to detect and track. The class provides methods to find the face mesh in an input image and retrieve the face landmarks.

2. `findFaceMesh()` function: This function is a method of the `FaceMeshDetector` class. It takes an input image and performs face detection and tracking using the FaceMesh algorithm. The `draw` parameter specifies whether to draw the face landmarks and connections on the image. The function returns the modified image and a list of detected faces.

3. `findDistance()` function: This function is also a method of the `FaceMeshDetector` class. It calculates the distance between two specified landmarks on a face using their coordinates. The function takes the coordinates of two landmarks (`p1` and `p2`) as input and computes the distance between them. It can also draw a circle at the midpoint between the two landmarks. The function returns the calculated distance, the modified image, and a list of coordinates. In the code, we use this function to find the distance between the two eyes.

4. `d = (W * f) / w` equation: This equation calculates the actual distance `d` of the face from the camera. It uses the known real-world distance between the eyes `W`, the focal length of the camera `f`, and the measured pixel distance between two face landmarks `w`, which is the distance between the two eyes obtained with the `findDistance()` function above. The equation follows from similar triangles in the pinhole camera model, where `w / f = W / d`.
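
Because `f` is expressed in pixels and is usually not known in advance, it can be calibrated once by measuring `w` with the face at a known distance `d` and solving `f = (w * d) / W`. The sketch below uses hypothetical function names, and `W = 6.3` cm (a typical adult interpupillary distance) is an assumption rather than a value taken from the project.

```python
import math

EYE_DISTANCE_CM = 6.3  # assumed real-world distance between the eyes (W)


def landmark_distance(p1, p2):
    """Pixel distance and integer midpoint between two (x, y) landmarks."""
    (x1, y1), (x2, y2) = p1, p2
    length = math.hypot(x2 - x1, y2 - y1)
    midpoint = ((x1 + x2) // 2, (y1 + y2) // 2)
    return length, midpoint


def calibrate_focal_length(w_px, known_d_cm, W_cm=EYE_DISTANCE_CM):
    """Solve f = (w * d) / W from one measurement taken at a known distance."""
    return w_px * known_d_cm / W_cm


def estimate_distance(w_px, f_px, W_cm=EYE_DISTANCE_CM):
    """Estimate the face-to-camera distance d = (W * f) / w in centimeters."""
    return W_cm * f_px / w_px
```

For example, if the eyes appear 126 px apart with the face 50 cm from the camera, `f` comes out at about 1000 px; a later reading of 210 px then gives an estimated distance of about 30 cm.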

### *2.2.2 Finger Counting*

1. `handDetector` class: encapsulates the hand tracking functionality. Its initialization function sets parameters such as `mode`, `maxHands`, `modelComplexity`, `detectionCon`, and `trackCon`. It also initializes objects for hand detection (`mpHands`), landmark drawing (`mpDraw`), and fingertip identification (`tipIds`).

2. `findHands` function: detects hands in an input image using `self.hands.process` from the `mpHands` object. If `draw` is `True`, it also draws landmarks and connections using `self.mpDraw.draw_landmarks`. The function returns the modified image.

3. `findPosition` function: determines the position of landmarks on a specified hand in an input image. It retrieves the hand landmarks from `self.results.multi_hand_landmarks` and iterates over them. It extracts the x and y coordinates of each landmark, appends them to `xList` and `yList`, and stores the landmark information in `self.lmList`. It also calculates the bounding box of the hand from the minimum and maximum coordinates. The function returns `self.lmList` (list of landmarks) and `bbox` (bounding box coordinates).

4. `findDistance` function: calculates the distance between two specified landmarks (`p1` and `p2`) using the x and y coordinates retrieved from `self.lmList`. It also calculates the midpoint between the landmarks and draws a circle there. The function returns the calculated distance, the modified image, and a list of coordinates.

5. `fingersUp` function: determines which fingers are up by comparing the y-coordinate of each fingertip landmark with the y-coordinate of an adjacent landmark on the same finger. It iterates over the `tipIds` list and checks whether the y-coordinate of the current fingertip landmark is less than that of the adjacent landmark; because image y-coordinates increase downward, a smaller y means the fingertip is higher, so the finger is raised. It appends the result (1 for up, 0 for down) to the `fingers` list and returns that list, indicating which fingers are up.
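
The landmark conversion and the finger test described above can be sketched in plain Python. This is an illustration under assumptions, not the `handDetector` code itself: the function names are hypothetical, landmarks are stored as `[id, x, y]` pixel entries as in `self.lmList`, and `[4, 8, 12, 16, 20]` are MediaPipe's fingertip indices. For the four fingers, the tip is compared with the landmark two indices earlier (the PIP joint); the thumb is compared on the x-axis instead, a common heuristic whose direction depends on handedness and on whether the frame is mirrored, so it is exposed as a parameter.

```python
TIP_IDS = [4, 8, 12, 16, 20]  # MediaPipe fingertip ids: thumb, index, middle, ring, pinky


def to_pixel_landmarks(normalized, width, height):
    """Convert normalized (x, y) landmarks to [id, px, py] entries and a bounding box.

    MediaPipe reports landmark x and y normalized to [0, 1] by the image
    width and height, so they are scaled back to pixels here.
    """
    lm_list = [[i, int(x * width), int(y * height)] for i, (x, y) in enumerate(normalized)]
    xs = [p[1] for p in lm_list]
    ys = [p[2] for p in lm_list]
    bbox = (min(xs), min(ys), max(xs), max(ys))  # (x_min, y_min, x_max, y_max)
    return lm_list, bbox


def fingers_up(lm_list, thumb_points_right=True):
    """Return five 0/1 flags (thumb first) marking which fingers are raised."""
    fingers = []
    tip_x = lm_list[TIP_IDS[0]][1]
    ip_x = lm_list[TIP_IDS[0] - 1][1]
    if thumb_points_right:
        thumb_up = tip_x > ip_x
    else:
        thumb_up = tip_x < ip_x
    fingers.append(int(thumb_up))
    for tip in TIP_IDS[1:]:
        # Image y grows downward, so a raised fingertip has a smaller y
        # than the PIP joint two landmark indices earlier.
        fingers.append(int(lm_list[tip][2] < lm_list[tip - 2][2]))
    return fingers
```

With MediaPipe, `normalized` could be built as `[(lm.x, lm.y) for lm in hand.landmark]` for one entry of `results.multi_hand_landmarks`, and `sum(fingers_up(lm_list))` gives the finger count used as the player's answer.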

### *2.2.3 Servo Motor Positioning*

The servo positions are calculated from the detected face position in the image. The x and y coordinates of the center of the face bounding box are extracted.

To convert the face position to servo positions, the image coordinate range is mapped linearly onto the servo's angular range (nominally 0 to 180 degrees). This mapping ensures that each servo angle corresponds to the appropriate face position.

The process to perform a mapping is described as follows:

1. Determine the range of the image resolution for both the x and y coordinates. For example, if the resolution is 640x480 pixels, the x range would be [0, 640] and the y range would be [0, 480].

2. Determine the range of the servo positions. Typically, servo motors operate within a range of 0 to 180 degrees, but we only operate the motors within a range of 50 to 170 degrees after calibration testing.

3. Calculate the corresponding servo position for the x coordinate of the face using linear interpolation. For example, if the face x coordinate is 320 (the center of a 640x480 image), the servo position for the x coordinate is calculated as follows:
   - Normalize the face x coordinate within the image resolution range: `x_normalized = (face_x - image_x_min) / (image_x_max - image_x_min)`
   - Calculate the servo position within the servo range: `servo_x = x_normalized * (servo_max - servo_min) + servo_min`

   In this case, with `servo_min = 50` and `servo_max = 170`, `x_normalized = 320 / 640 = 0.5`, so the servo position for the x coordinate is `0.5 * (170 - 50) + 50 = 110` degrees, the midpoint of the calibrated range.

4. Perform the same calculation for the y coordinate of the face to obtain the servo position for the y-axis.

By mapping the face position from the image resolution range to the servo position range, the servo angles are calculated and used to move the laser module so that it tracks the detected face.
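
The mapping steps above can be written as a small helper. This is a sketch with a hypothetical function name; it also clamps the input to the image range so that a face partly outside the frame cannot drive a servo beyond the calibrated 50 to 170 degree limits.

```python
def map_to_servo(value, in_min, in_max, out_min=50.0, out_max=170.0):
    """Linearly map a pixel coordinate to a servo angle in [out_min, out_max]."""
    value = min(max(value, in_min), in_max)  # clamp to the image range
    normalized = (value - in_min) / (in_max - in_min)
    return normalized * (out_max - out_min) + out_min


# Worked example from the text: the center of a 640x480 frame.
servo_x = map_to_servo(320, 0, 640)  # 110.0
servo_y = map_to_servo(240, 0, 480)  # 110.0
```

If NumPy is already a dependency, `numpy.interp(320, [0, 640], [50, 170])` performs the same clamped linear interpolation.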

### *2.2.4 Communication with Arduino*

The `pyfirmata` library allows us to communicate with the Arduino board using the Firmata protocol. With `pyfirmata`, we can control various Arduino peripherals, such as servos, motors, and sensors, from the Python program.

In this specific case, `pyfirmata` is used to send commands to the two servo motors on which the laser pointer is attached, thereby adjusting the position of the laser pointer based on the detected face.
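
A minimal sketch of the servo link is shown below. It assumes the Arduino is running a Firmata sketch such as StandardFirmata, which `pyfirmata` requires; the serial port name and the pin numbers 9 and 10 are hypothetical placeholders, not the project's wiring. `pyfirmata` is imported inside `connect()` so the angle helper can be used and tested without hardware.

```python
def clamp_angle(angle, lo=50, hi=170):
    """Round an angle and keep it inside the calibrated servo range."""
    return int(round(min(max(angle, lo), hi)))


def connect(port):
    """Open the board and configure two servo pins (port and pins are placeholders)."""
    import pyfirmata  # imported lazily so the helpers work without the library

    board = pyfirmata.Arduino(port)   # e.g. "COM3" on Windows, "/dev/ttyACM0" on Linux
    pan = board.get_pin("d:9:s")      # digital pin 9 in servo mode (hypothetical wiring)
    tilt = board.get_pin("d:10:s")    # digital pin 10 in servo mode (hypothetical wiring)
    return board, pan, tilt


def write_servos(pan_pin, tilt_pin, pan_angle, tilt_angle):
    """Send both angles, clamped to the calibrated range, to the servo pins."""
    pan_pin.write(clamp_angle(pan_angle))
    tilt_pin.write(clamp_angle(tilt_angle))
```

In the main loop, the pan and tilt angles computed from the face position would be passed to `write_servos()` once per frame, and `board.exit()` closes the connection when the game ends.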

## **2.3 Experimental Setup**

The team set up two testing modules based on the block diagram in *Figure 1*.

To protect players' eyes during gameplay, a 10 kΩ potentiometer was connected, allowing us to dim the laser pointer.

__Figure 3:__ Equipment list

![Figure 3](Figures/equipmentList.png)

__Figure 4:__ Assembly of components

![Figure 4](Figures/assemble_setup.jpg)

# **3. RESULT - DISCUSSION**

## **3.1 Unit Test**

### *3.1.1 Face Detection*

The application performs well in real-time face detection, giving prompt feedback on whether faces are present in the video stream and where they are located. The bounding boxes clearly outline the detected faces, making them easy to identify, and the code also displays the confidence score of each detected face.

__Figure 5:__ Face Detection

![Figure 5](Figures/facedetection.png)

### *3.1.2 Face Mesh Detection and Distance Estimation*

The code successfully computes and displays the face mesh and the depth in centimeters on the video stream.

__Figure 6:__ Face Mesh and Distance

![Figure 6](Figures/facemesh.png)

### *3.1.3 Finger Counting*

The application effectively recognizes hand gestures and dynamically updates the overlay image in response to the finger count. Upon detecting a hand, the code analyzes the hand landmarks to determine the number of raised fingers. Moreover, it displays the frames per second (FPS) and the total count of raised fingers on the screen. By integrating hand tracking, gesture recognition, and visual feedback, the code establishes a solid foundation for the development of our interactive game, leveraging hand gestures for input and control.

__Figure 7:__ Finger Detection

![Figure 7](Figures/finger.png)

## **3.2 System Test**

The implemented code delivers an engaging and interactive finger counting game built on face and hand detection. Users who played the game reported a positive and enjoyable experience and found it both challenging and entertaining. The code tracks hand movements and counts the displayed fingers accurately, enabling users to solve equations by showing the correct finger count. The integration of face and hand detection supports the overall gameplay and the tracking of hand gestures.

The game begins with a welcome screen that prompts the player to perform a hand countdown from 3 to 0 to start the game. This screen also lets players check whether their distance from the camera is adequate.

__Figure 8:__ Welcome Window

![Figure 8](Figures/welcome.png)

Once the game begins, the window displays essential information such as the remaining time, the distance from the camera, and the points the player has accumulated. To win, the player must locate five hidden 50x50 windows within the game environment. When the midpoint of the player's face falls inside a window, the laser is pointed downward, indicating that it no longer tracks the player's movements. Once a window is discovered, it remains visible on the screen for the rest of the game.

__Figure 9:__ Mid-game window

![Figure 9](Figures/point.png)

Every 5 seconds, the code displays an equation on the screen and continuously detects the player's finger count to compare it with the expected result. While the player works on the question, the timer keeps counting down, adding a sense of urgency to the gameplay.

__Figure 10:__ Calculation

![Figure 10](Figures/calculation.png)

The program also displays "YOU WON!" and "GAME OVER!" windows to indicate whether the player has won or lost within the 60-second time limit. These windows clearly show the outcome and give the game a satisfying conclusion.

__Figure 11:__ YOU WON window

![Figure 11](Figures/won.png)

__Figure 12:__ GAME OVER window

![Figure 12](Figures/gameover.png)

In general, the game runs smoothly and systematically. All the components, including detection, calculation, and motor control, work together as described in the flowchart. The game maintains an average frame rate of 20 FPS, providing an enjoyable gaming experience.

## **3.3 Discussion**

While the overall flow of the game is well-defined, there are several considerations regarding potential issues and constraints that could impact the gameplay and user experience.

One area of concern is the accuracy of hand detection and finger counting. Insufficient training or execution of the system may result in inaccuracies when detecting the number of fingers on the left hand. This could lead to calculations that are too easy to solve, potentially reducing the game's challenge and making it less engaging for the user. Ensuring that the hand detection algorithm is well-trained and robust enough to accurately count fingers on both hands is crucial.

Another aspect to consider is the potential noise in the detection process when multiple main faces are present in the camera window. The current code does not explicitly handle scenarios where more than one main face is detected simultaneously.

In situations with multiple main faces, the face detection algorithm may struggle to identify the correct face to track for hand detection. This can introduce inaccuracies in the hand detection and finger counting process, ultimately affecting the gameplay.

To address this concern, additional logic can be implemented to prioritize and consistently track a single main face throughout the game. Techniques such as face recognition or tracking algorithms can be employed to identify and follow a specific face, even in the presence of multiple faces. By focusing on a single face, the hand detection and finger counting algorithms would have a clearer reference point, leading to more accurate calculations and a smoother gaming experience.
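
One simple approach, sketched below, is a heuristic rather than face recognition: on the first frame, choose the face with the largest bounding box (assumed to be the player closest to the camera), then on later frames keep the face whose center is nearest the previously tracked center. The function names are hypothetical, and each face is assumed to be a list of `(x, y)` landmark points, which is how the face list from `findFaceMesh()` is treated here.

```python
def face_bbox(face):
    """Bounding box (x_min, y_min, x_max, y_max) of a face's landmark points."""
    xs = [p[0] for p in face]
    ys = [p[1] for p in face]
    return min(xs), min(ys), max(xs), max(ys)


def face_center(face):
    """Center of the face's bounding box."""
    x1, y1, x2, y2 = face_bbox(face)
    return (x1 + x2) / 2, (y1 + y2) / 2


def pick_main_face(faces, prev_center=None):
    """Pick one face to track, or None if no face is detected."""
    if not faces:
        return None
    if prev_center is None:
        # First frame: largest bounding-box area, assumed closest to the camera.
        def area(face):
            x1, y1, x2, y2 = face_bbox(face)
            return (x2 - x1) * (y2 - y1)

        return max(faces, key=area)
    # Later frames: stay on the face nearest the previously tracked center.
    px, py = prev_center

    def dist2(face):
        cx, cy = face_center(face)
        return (cx - px) ** 2 + (cy - py) ** 2

    return min(faces, key=dist2)
```

Passing the previous frame's `face_center()` back in as `prev_center` keeps the laser on the same player even when someone else steps closer to the camera.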

Furthermore, providing visual or audio cues to users when the system detects multiple main faces could be beneficial. This would prompt users to adjust their positioning or rearrange themselves to ensure that only one main face is visible in the camera window. By offering feedback and guidance, users can actively participate in optimizing the detection process and enhancing the accuracy of the game.

# **4. CONCLUSION**

The project successfully developed an interactive game using computer vision techniques, mechanical knowledge, and electrical expertise. By incorporating face tracking, hand tracking, and finger counting functionalities, the game offers a unique and immersive experience for players.

Throughout the project, valuable learning experiences were gained in implementing face tracking, hand tracking, and finger counting algorithms. The findings demonstrate the successful integration of these features, enhancing interactivity and engagement in the game.

Further study can explore advanced machine learning algorithms to improve accuracy, expand gameplay mechanics, and introduce additional gestures or multiplayer capabilities.

In summary, the project showcases the potential of computer vision in creating immersive games and presents opportunities for future research and enhancement in the field.

# **REFERENCES**

> rizkydermawan1992. (2023). face-detection. [GitHub repository]. Retrieved from https://github.com/rizkydermawan1992/face-detection

> paveldat. (2023). finger_counter. [GitHub repository]. Retrieved from https://github.com/paveldat/finger_counter

> Google. (2023). mediapipe. [GitHub repository]. Retrieved from https://github.com/google/mediapipe

> OpenCV. (2023). opencv. [GitHub repository]. Retrieved from https://github.com/opencv/opencv

> Arduino. (n.d.). Servo library reference. Retrieved from https://www.arduino.cc/reference/en/libraries/servo/

> Computer Vision Zone. (n.d.). Code and Files. Retrieved from https://www.computervision.zone/lessons/code-and-files-13/

> PyFirmata. (n.d.). PyFirmata Documentation. Retrieved from https://pyfirmata.readthedocs.io/en/latest/