<div style="
    border: 3px solid white;
    border-radius: 16px;
    padding: 10px;
    background-color: #4F2683;
    color: white;
    max-width: 1300px;
    margin: auto;
">

<p align="center" style="margin: 0;">
  <img src="WesternEng.png" 
       alt="Western University Logo" width="200">
</p>

<h2 align="center" style="color:white; margin: 6px 0 2px 0;">Western University</h2>
<h3 align="center" style="color:white; margin: 2px 0;">Faculty of Engineering</h3>
<h4 align="center" style="color:white; margin: 2px 0;">Department of Electrical & Computer Engineering</h4>

<hr style="border: 1px solid white; margin: 8px 0;">

<div align="center" style="margin: 0; padding: 0; line-height: 1.3;">

**AISE-3350 — Cyber-Physical Systems Theory**  
Instructor: **Elvis Chen**  
Date: *Dec 5, 2025*  
Group Members: *Jonathan Das, Alex Hazen, Sarthak Bali, Jan Wodnicki*  

</div>

<hr style="border: 1px solid white; margin: 8px 0;">

</div>



## Table Of Contents
***
##### [1. Introduction](#introduction)

<div style="margin-top:-75px;"></div>

##### [2. Methods](#methods)

<div style="margin-top:-65px;"></div>

- [Overview](#overview)
- [Strategic Framework & Loss Minimization](#strategic-framework-and-loss-minimization-logic)
- [Nash Equilibria for Two-Move Subsets](#nash-equilibria-for-all-two-move-subsets)
- [Interpretation of Equilibria](#interpretation-of-equilibria)
- [User-Driven Strategy Selection](#user-driven-strategy-selection)
- [Weighted Strategy Model](#connection-to-the-weighted-strategy-model)
- [Hand Detection & Gesture Classification](#hand-detection-and-gesture-classification)
- [Player Segmentation Logic](#player-segmentation-logic)
- [System Integration](#system-integration)

<div style="margin-top:-50px;"></div>


##### [3. Machine Learning Model](#machine-learning-model)
<div style="margin-top:-65px;"></div>

-   [Dataset Pre-Processing](#dataset-pre-processing)
-   [Model Training](#model-training)
-   [Model Performance on Static Dataset](#model-performance-on-static-dataset)

<div style="margin-top:-50px;"></div>

##### [4. System Integration - Live Usage and Testing](#live-usage)
<div style="margin-top:-65px;"></div>

##### [5. Broader Impacts & Ethics](#broader-impacts-and-ethical-considerations)
<div style="margin-top:-65px;"></div>

##### [6. Conclusion](#conclusion)


***
---
<br><br>


<a id="introduction"></a>
<div style="text-align:center; font-size:45px; font-weight:bold; padding:4px;">
    Introduction
</div>


---
Cyber-physical systems (CPS) combine three functions, sensing, computing, and actuating, which work together to enable real-time processing and response. This project integrates computer vision and game theory to develop a CPS that plays Rock-Paper-Scissors-Minus-One (RPS-1).

The familiar game of Rock-Paper-Scissors (RPS) is a simple pairwise competition between two players. The "Minus-One" variant (RPS-1) enlarges the decision space by adding a second hand per player: players present both hands and then each removes one, with the two remaining hands deciding the outcome. This makes RPS-1 a two-stage, simultaneous, zero-sum game.

RPS-1 was chosen because it requires the project to apply all pillars of CPS: sensing, in which Computer Vision (CV) and Machine Learning (ML) detect the players' actions; computation, which identifies gestures and evaluates the game-theoretic payoff matrices; and a decision output, which recommends what the player should do to maximize their chances of winning.
<br><br>
***
---

<a id="methods"></a>
<div style="text-align:center; font-size:45px; font-weight:bold; padding:4px;">
    Methods
</div>


---
## Overview

The objective of the CPS is (1) to identify the hand shapes being presented by each player for stage one and (2) to determine the optimal action to win the game through a game-theoretical analysis. The workflow consists of three major components: **hand detection and classification**, **enumeration of all possible outcomes**, and **strategy selection**.

The system first detects up to four hands simultaneously and classifies each as *Rock*, *Paper*, or *Scissors*. From these detected shapes, all feasible hand-combinations are computed using Cartesian enumeration. A payoff analysis is then applied to determine which move Player 1 should remove in Stage Two to minimize expected loss and maximize strategic advantage. This modular structure isolates perception from decision-making, allowing both components to be improved independently.
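The enumeration step can be sketched in Python as follows. This is a minimal illustration rather than the project's own code: the names `BEATS`, `payoff`, and `enumerate_outcomes` are hypothetical, and payoffs are assumed to be +1, 0, and -1 for a Player 1 win, tie, and loss.

```python
from itertools import product

# Each key beats the move stored as its value.
BEATS = {"R": "S", "P": "R", "S": "P"}


def payoff(p1_move: str, p2_move: str) -> int:
    """Payoff to Player 1: +1 for a win, 0 for a tie, -1 for a loss."""
    if p1_move == p2_move:
        return 0
    return 1 if BEATS[p1_move] == p2_move else -1


def enumerate_outcomes(p1_hands, p2_hands):
    """Map every (P1 kept hand, P2 kept hand) pairing to Player 1's payoff."""
    return {(m, n): payoff(m, n) for m, n in product(p1_hands, p2_hands)}


# Player 1 shows Rock and Paper; Player 2 shows Paper and Scissors.
outcomes = enumerate_outcomes(("R", "P"), ("P", "S"))
for (m, n), u in outcomes.items():
    print(f"P1 keeps {m}, P2 keeps {n}: payoff {u:+d}")
```

With two hands per player the Cartesian product always has four entries, which is what makes the Stage-Two interaction a $2\times2$ game.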

---

## **Strategic Framework and Loss Minimization Logic**

To formally evaluate Stage-Two interactions, each player is restricted to a **two-move subset** of  
$\{R,\,P,\,S\}$. The valid subsets are:

$
RP,\quad RS,\quad PS.
$

This yields exactly **nine** admissible RPS-1 games:

$
\begin{aligned}
&(RP,RP),\; (RP,RS),\; (RP,PS),\\
&(RS,RP),\; (RS,RS),\; (RS,PS),\\
&(PS,RP),\; (PS,RS),\; (PS,PS).
\end{aligned}
$

For each subset pairing, the Stage-Two interaction is modeled as a $2\times2$ zero-sum game with payoff matrix:

$
\begin{array}{c|cc}
 & X & Y \\\hline
A & a & b \\
B & c & d
\end{array}
$

The **mixed-strategy Nash equilibrium** for this matrix is:

$
q^* = \frac{d - b}{a - b - c + d}, 
\qquad
p^* = \frac{d - c}{a - b - c + d},
$

where the entries are Player 1's payoffs ($+1$ for a win, $0$ for a tie, $-1$ for a loss), $p^*$ is the equilibrium probability that Player 1 selects the first move of their subset ($A$), and $q^*$ is the equilibrium probability that Player 2 selects the first move of theirs ($X$). The formulas apply only when the game has no pure-strategy saddle point, which also requires a nonzero denominator; otherwise the equilibrium is a pure strategy read directly from the matrix.

### **Nash Equilibria for All Two-Move Subsets**

The table below summarizes the equilibrium probabilities $(p^*, q^*)$ for all nine admissible two-move subset pairings. For identical subsets the denominator $a - b - c + d$ is zero, so the equilibrium is the pure strategy in which both players keep the move that dominates the other.

| **P1 Subset** | **P2 Subset** | **Equilibrium $(p^*, q^*)$** | **Likely P1 Move** | **Likely P2 Move** |
|--------------|--------------|----------------------------------|---------------------|---------------------|
| RP | RP | $(0, 0)$ (pure) | Paper | Paper |
| RP | RS | $(\tfrac{2}{3}, \tfrac{2}{3})$ | Rock | Rock |
| RP | PS | $(\tfrac{1}{3}, \tfrac{2}{3})$ | Paper | Paper |
| RS | RP | $(\tfrac{2}{3}, \tfrac{2}{3})$ | Rock | Rock |
| RS | RS | $(1, 1)$ (pure) | Rock | Rock |
| RS | PS | $(\tfrac{1}{3}, \tfrac{1}{3})$ | Scissors | Scissors |
| PS | RP | $(\tfrac{2}{3}, \tfrac{1}{3})$ | Paper | Paper |
| PS | RS | $(\tfrac{1}{3}, \tfrac{1}{3})$ | Scissors | Scissors |
| PS | PS | $(0, 0)$ (pure) | Scissors | Scissors |

---

### **Interpretation of Equilibria**

#### **1. Symmetric subsets ⇒ symmetric equilibria**

When the subset pairings are identical (e.g., $RP$ vs. $RP$ or $PS$ vs. $PS$), no player holds a structural advantage.  
One move in the subset beats the other and ties itself, so it dominates: Paper in $RP$, Rock in $RS$, and Scissors in $PS$. Both players keep the dominant move and the game ends in a tie.

These cases represent balanced strategic interactions with an expected payoff of zero for both players.

---

#### **2. Asymmetric subsets create predictable advantages**

In pairings where the two-move subsets differ, the players share exactly one move, and one player holds a structural advantage: the player whose other move loses to the shared move is at a disadvantage.

For example, in $(RP, PS)$ the payoff entries are $a=-1$, $b=1$, $c=0$, $d=-1$, so the denominator is $-3$ and

$
p^* = \tfrac{-1 - 0}{-3} = \tfrac{1}{3}, \qquad q^* = \tfrac{-1 - 1}{-3} = \tfrac{2}{3}.
$

- Player 2’s **Paper** is the strongest available move: it beats Rock and ties Paper, so it never loses  
- Player 1 plays **Rock** with probability $\tfrac{1}{3}$ and **Paper** with probability $\tfrac{2}{3}$

Thus both players are most likely to play **Paper**, and the expected payoff to Player 1 is $-\tfrac{1}{3}$.

---

#### **3. Most-likely moves follow threat coverage**

Across all asymmetric games:

- The most likely move for each player is the move shared by both subsets, played with probability $\tfrac{2}{3}$.  
- For the advantaged player this move never loses; for the other player it is the only move that does not lose to it.

Symmetric pairings therefore produce a pure equilibrium on the dominant move, while asymmetric pairings produce a $\tfrac{2}{3}$ to $\tfrac{1}{3}$ mix in favour of the shared move. This analysis forms the theoretical basis for the loss-minimization logic used in the CPS and informs the user which move presents the highest loss potential.
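The equilibrium computation can be sketched in Python as follows. This is a minimal sketch, not the project's `RPSLogic.py`: the names `payoff`, `solve_2x2`, and `equilibria` are illustrative, payoffs are assumed to be +1, 0, and -1 for Player 1, and a saddle-point check runs before the mixed-strategy formulas so that identical subsets, where the denominator is zero, are also handled.

```python
from fractions import Fraction
from itertools import product

BEATS = {"R": "S", "P": "R", "S": "P"}  # each key beats its value
SUBSETS = ("RP", "RS", "PS")


def payoff(m, n):
    """Payoff to Player 1: +1 for a win, 0 for a tie, -1 for a loss."""
    if m == n:
        return 0
    return 1 if BEATS[m] == n else -1


def solve_2x2(p1_subset, p2_subset):
    """Return (p*, q*), the probability that each player keeps their first move."""
    (A, B), (X, Y) = p1_subset, p2_subset
    matrix = [[payoff(A, X), payoff(A, Y)],
              [payoff(B, X), payoff(B, Y)]]
    # Pure equilibrium: an entry that is the minimum of its row
    # and the maximum of its column (a saddle point).
    for i, j in product(range(2), repeat=2):
        row_min = min(matrix[i])
        col_max = max(matrix[0][j], matrix[1][j])
        if matrix[i][j] == row_min == col_max:
            return Fraction(1 - i), Fraction(1 - j)
    # No saddle point: apply the mixed-strategy formulas.
    (a, b), (c, d) = matrix
    denom = a - b - c + d
    return Fraction(d - c, denom), Fraction(d - b, denom)


equilibria = {(s1, s2): solve_2x2(s1, s2)
              for s1, s2 in product(SUBSETS, repeat=2)}
for (s1, s2), (p, q) in equilibria.items():
    print(f"({s1}, {s2}): p* = {p}, q* = {q}")
```

Exact fractions are used instead of floats so that values such as one third print without rounding.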

---

## **User-Driven Strategy Selection**

In the final CPS implementation, the decision of which move to remove for Stage Two is made **by the user**, not automatically by the system. Rather than enforcing an optimal strategy, the CPS presents users with the computed outcome enumeration, payoff structure, and equilibrium insights, allowing them to select their preferred strategy.

This design transforms the CPS into a **decision-support tool**, where system analysis guides but does not control user action. It increases user agency, introduces natural variability into game outcomes, and mirrors human-in-the-loop operation common in real cyber-physical systems. Additionally, this approach promotes engagement with fundamental game-theoretical concepts—including expected value and equilibrium—while maintaining a transparent and interpretable model.

---

## **Connection to the Weighted Strategy Model**

The system incorporates a weighted-strategy framework in which each payoff $U(m_i,n_j)$ is transformed into a behaviorally informed payoff $U'(m_i,n_j)$. These weights adjust the incentive structure when the predicted opponent move is played, generating a modified payoff matrix:

$
\begin{array}{c|cc}
 & X & Y \\\hline
A & a' & b' \\
B & c' & d'
\end{array}
$

Applying the same Nash formulas to this reweighted matrix:

$
q^* = \frac{d' - b'}{a' - b' - c' + d'},
\qquad
p^* = \frac{d' - c'}{a' - b' - c' + d'},
$

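allows the system to integrate behavioral forecasting while keeping the same solution procedure. The reweighting changes the equilibrium probabilities, but the matrix is still solved with the same $2\times2$ formulas, so the decision-support engine can adapt to user-selected strategies without a separate solver. The formulas apply only when the reweighted denominator is nonzero and the resulting probabilities lie in $[0,1]$.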
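A possible reweighting step is sketched below. The report does not specify how the weights are chosen or applied, so everything here is an assumption made for illustration: the helper names `reweight` and `mixed_equilibrium`, the choice to scale the column of the predicted opponent move, and the weight value of 2.

```python
from fractions import Fraction


def mixed_equilibrium(a, b, c, d):
    """Apply the 2x2 formulas; return (p*, q*), or None if they do not apply."""
    denom = a - b - c + d
    if denom == 0:
        return None
    p, q = Fraction(d - c, denom), Fraction(d - b, denom)
    # Values outside [0, 1] are not probabilities, so no mixed strategy exists.
    if not (0 <= p <= 1 and 0 <= q <= 1):
        return None
    return p, q


def reweight(matrix, predicted_col, weight):
    """Scale Player 1's payoffs in the column of the predicted opponent move."""
    return [
        [u * weight if j == predicted_col else u for j, u in enumerate(row)]
        for row in matrix
    ]


# (RP, PS) payoffs to Player 1: rows are Rock, Paper; columns are Paper, Scissors.
base = [[-1, 1], [0, -1]]
weighted = reweight(base, predicted_col=0, weight=2)  # hypothetical weight
(a, b), (c, d) = weighted
print(mixed_equilibrium(a, b, c, d))
```

Returning `None` when the formulas do not apply keeps the caller from treating an out-of-range value as a probability.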

---

## **Hand Detection and Gesture Classification**

Three gesture-recognition approaches were evaluated:

1. **MediaPipe Landmark Classifier**: rule-based, lightweight, no training required; rejected due to poor generalization.  
2. **MediaPipe Gesture Recognition Model**: rejected for instability under real-world lighting and gesture overlap.  
3. **Custom YOLOv11 Gesture Detector**: selected for its accuracy, speed, and robustness.

Model training involved automated annotation, dataset restructuring, and fine-tuning a YOLOv11 backbone for 80 epochs with task-appropriate augmentations. The resulting detector outputs symbolic labels suitable for strategic evaluation.

---

## **Player Segmentation Logic**

Player assignment is achieved through spatial partitioning of the camera frame:

- Left half → Player 1  
- Right half → Player 2  

This method reliably maps detected gestures to the correct player under a fixed camera layout.
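The partitioning rule can be sketched as below. The `Detection` tuple and the `assign_players` function are hypothetical names, and the sketch assumes each detection carries the pixel coordinates of its bounding box; the project's own script may represent detections differently.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    label: str  # "Rock", "Paper", or "Scissors"
    x1: float   # left edge of the bounding box, in pixels
    x2: float   # right edge of the bounding box, in pixels


def assign_players(detections, frame_width):
    """Assign each detection to a player by the half its box centre falls in."""
    midline = frame_width / 2
    players = {"Player 1": [], "Player 2": []}
    for det in detections:
        centre_x = (det.x1 + det.x2) / 2
        key = "Player 1" if centre_x < midline else "Player 2"
        players[key].append(det.label)
    return players


hands = [
    Detection("Rock", 40, 200),
    Detection("Paper", 700, 900),
    Detection("Scissors", 260, 420),
    Detection("Rock", 1000, 1200),
]
assignment = assign_players(hands, frame_width=1280)
print(assignment)
```

Using the box centre rather than an edge means a hand that slightly crosses the midline is still assigned to the side that holds most of it.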

---

## **System Integration**


For the system to operate as a functional CPS, the outputs of the computer-vision subsystem must be interfaced with the decision engine in a manner consistent with CPS design principles. After gesture detection through the YOLO classifier, the generated symbolic labels are passed into the evaluation functions defined in `RPSLogic.py`. All possible Stage-Two combinations are then processed through the loss-minimization algorithm, and the resulting recommendation is overlaid on the video feed using OpenCV to provide real-time feedback to the user.

To support modularity, the subsystems for **hand detection**, **hand classification**, **player segmentation**, and **strategic evaluation** were implemented across two separate files. The background logic file encapsulates the full reasoning framework behind the RPS-1 game. It is structured so that it can be called with player inputs and a selected strategy type, returning either a recommended action or a message indicating that Player 1 has already secured a win. This module is intentionally standalone and can also be executed through a command-line interface, making it reusable beyond the current CPS structure.
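The kind of interface described above might look like the following sketch. It is not the contents of `RPSLogic.py`: the function name `recommend`, its return strings, and the worst-case rule it uses are all assumptions made for illustration, and only one strategy type is shown.

```python
BEATS = {"R": "S", "P": "R", "S": "P"}  # each key beats its value


def payoff(m, n):
    """Payoff to Player 1: +1 for a win, 0 for a tie, -1 for a loss."""
    if m == n:
        return 0
    return 1 if BEATS[m] == n else -1


def recommend(p1_hands, p2_hands):
    """Return a message naming the hand Player 1 should remove."""
    results = {m: [payoff(m, n) for n in p2_hands] for m in p1_hands}
    # If every possible pairing is a win, no decision is needed.
    if all(u == 1 for row in results.values() for u in row):
        return "Player 1 has already secured a win"
    # Keep the hand with the best worst case, breaking ties on total payoff.
    keep = max(p1_hands, key=lambda m: (min(results[m]), sum(results[m])))
    remove = next((m for m in p1_hands if m != keep), keep)
    return f"Remove {remove}, keep {keep}"


print(recommend(("R", "P"), ("R", "S")))
print(recommend(("R", "R"), ("S", "S")))
```

Because the function takes plain move labels and returns a string, it can be called from the vision script or from a command line without any camera code.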

The second file handles **hand detection**, **gesture classification**, and **player segmentation**. The camera feed is partitioned into regions depending on the game stage: four regions during initial detection and two regions during Stage Two. The trained YOLO model is applied to each region to detect and classify hand gestures. These classifications are separated into Player 1 or Player 2 based on their spatial region, after which the logic engine is invoked to compute the optimal move for Player 1 to remove or to determine the winner directly. This file can also be run on uploaded pictures instead of the live feed.

This modular separation promotes clean organization, easier debugging, and enhanced reusability. By isolating perception and decision logic into independent components, the CPS maintains strong **reproducibility**, **maintainability**, and **extensibility**, aligning with the core software-engineering and system-design requirements of the course.

***
---



<a id="machine-learning-model"></a>
<div style="text-align:center; font-size:45px; font-weight:bold; padding:4px;">
    Machine Learning Model
</div>


---

## **Dataset Pre-Processing**
[View Pre-Processing Step 1](modelTraining/imageannotation.py)\
[View Pre-Processing Step 2](modelTraining/datasplit.py)


In our implementation, we selected **YOLOv11** because it proved to be the most robust model for identifying and labeling up to four hands in a single frame. To make YOLOv11 suitable for our RPS-1 application, we trained it on hand-gesture images covering the three target classes: **Rock**, **Paper**, and **Scissors**. Because the images within each publicly available Kaggle dataset are very similar to one another, three datasets were merged to reduce the chance of overfitting and improve generalizability.

To account for variations in image formats, `imageannotation.py` automatically handles JPG, JPEG, and PNG files. A pretrained YOLOv11 model is used to localize the hand in each image and draw a bounding box around it. These bounding boxes provide clear supervision signals during training, ensuring the model focuses on the gesture-performing region.

Following annotation, `datasplit.py` divides the dataset into an **80% training** and **20% validation** split, randomly shuffling samples before placement into new directories. Because YOLOv11 requires a YAML configuration file, `datasplit.py` generates this file automatically for training via `rps_model.py`.
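The split and YAML-generation steps can be sketched as below. This is an illustration rather than the contents of `datasplit.py`: the function names, the fixed seed, and the `images/train` and `images/val` directory names are assumptions.

```python
import random


def split_dataset(image_names, train_fraction=0.8, seed=0):
    """Shuffle image names and split them into training and validation lists."""
    names = sorted(image_names)  # sort first so a given seed is reproducible
    random.Random(seed).shuffle(names)
    cut = int(len(names) * train_fraction)
    return names[:cut], names[cut:]


def dataset_yaml(root, class_names):
    """Build the text of a YOLO-style dataset YAML file."""
    lines = [f"path: {root}", "train: images/train", "val: images/val", "names:"]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


all_images = [f"img_{i:03d}.jpg" for i in range(10)]
train, val = split_dataset(all_images)
yaml_text = dataset_yaml("dataset", ["Rock", "Paper", "Scissors"])
print(len(train), len(val))
print(yaml_text)
```

Each image's label file has to be moved alongside it in a real pipeline; that bookkeeping is omitted here to keep the sketch short.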

---
## **Model Training**  
 [View Training Code](modelTraining/rps_model.py)


The training pipeline uses a pretrained YOLOv11s model as its foundation. Starting from a pretrained backbone significantly accelerates convergence because the model already understands general visual patterns such as edges, shapes, and textures. This allows the network to focus on learning the distinctions between the three gesture classes, even with a limited dataset.

A dataset YAML file defines the training and validation directories as well as the class labels. During the training process, YOLOv11 automatically handles dataloading, batch creation, and optimization, while continuously monitoring performance metrics such as Loss, Precision, Recall, and mean Average Precision (mAP).

### **Data Augmentation Strategy**

To ensure that the model generalizes well to real-world camera conditions, a comprehensive augmentation pipeline is applied during training. These augmentations include:

- **HSV colour variation** to simulate different lighting environments  
- **Rotations, translations, scaling, and shearing** to vary the spatial relationships of the hands  
- **Perspective distortion** to represent off-angle camera viewpoints  
- **Horizontal and vertical flips** to account for mirrored gestures  
- **Mosaic augmentation** to expose the model to multiple contexts simultaneously  
- **MixUp blending** to regularize decision boundaries  

This augmentation strategy reduces overfitting and improves robustness against noise, motion, and inconsistent hand positioning. Furthermore, mosaic augmentation allows the model to learn from images containing more than one hand, which matters here because the game requires detecting up to four hands while the source datasets mostly show one.

### **Hyperparameter Configuration**

Training is performed for 80 epochs at a resolution of 640×640 with a batch size of 16. A cosine learning-rate scheduler is used to smooth the convergence behaviour over time, and early stopping is enabled to prevent unnecessary computation if validation performance plateaus. These hyperparameters offer a practical balance between training speed and detection accuracy.
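A configuration along these lines could be written as follows. The epoch count, image size, batch size, and cosine schedule come from the text above; the early-stopping patience, every augmentation magnitude, the `dataset/data.yaml` path, and the function names are assumptions, since the report does not list them. The keyword names follow the Ultralytics training arguments, and the `train` function needs the `ultralytics` package installed.

```python
def build_train_args(data_yaml="dataset/data.yaml"):
    """Collect the training settings described above as keyword arguments."""
    return {
        "data": data_yaml,
        "epochs": 80,
        "imgsz": 640,
        "batch": 16,
        "cos_lr": True,   # cosine learning-rate schedule
        "patience": 20,   # early-stopping patience (assumed value)
        # Augmentation magnitudes below are assumed, not taken from the project.
        "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.4,
        "degrees": 10.0, "translate": 0.1, "scale": 0.5, "shear": 2.0,
        "perspective": 0.0005,
        "fliplr": 0.5, "flipud": 0.1,
        "mosaic": 1.0, "mixup": 0.1,
    }


def train(weights="yolo11s.pt"):
    """Fine-tune a pretrained checkpoint with the settings above."""
    # Imported here so the configuration can be inspected without the package.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**build_train_args())


args = build_train_args()
print(args)
```

Keeping the settings in one dictionary makes it easy to log the exact configuration next to the resulting weights.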

### **Model Export and Integration**

Upon completion of training, the best checkpoint (selected on validation results) is saved automatically into a dedicated project folder. This weight file becomes the inference model used in the real-time hand-detection system of the CPS. By integrating this optimized model into the perception pipeline, the system achieves:

- Fast and accurate gesture detection  
- Reliable multi-hand tracking  
- Robust performance in general indoor lighting  
- Smooth interaction with the downstream decision-making engine  

### **Significance to the CPS Project**

The YOLOv11 training pipeline is an essential component of the RPS-1 system. It transforms raw video input into symbolic gesture predictions, enabling the game decision engine to operate on high-confidence inputs. The pipeline demonstrates how modern computer-vision models can be tailored effectively to domain-specific tasks, even with mixed datasets and limited hardware resources.

Overall, this training process provides the foundation for a responsive, accurate, and highly adaptable gesture-recognition subsystem within the broader CPS architecture.

---


## **Model Performance on Static Dataset**

### **F1-Confidence Curve** *(Figure 1)*

<img src="rps_yolo11/F1_curve.png" width="450">

Figure 1 illustrates the F1-confidence curve for the YOLOv11 gesture classifier. The F1 score stays above **0.95** for confidence thresholds between **0.2 and 0.8**, so performance is not sensitive to the exact threshold chosen. The best performance appears at a confidence threshold of **approximately 0.4**, where the model reaches an F1 score of **~0.98**.

The **rock** gesture consistently shows the lowest F1 score, though still high enough to avoid significant gameplay disruption.

---

### **Precision-Recall Curve** *(Figure 2)*

<img src="rps_yolo11/PR_curve.png" width="450">

Figure 2 presents the Precision–Recall curve demonstrating the model's reliability across the full recall spectrum. The model achieves **mAP = 0.984 at IoU = 0.5**, highlighting excellent predictive performance. Paper and scissors gestures achieve AP values of **0.993** and **0.991**, respectively. Although rock underperforms relative to these two, its AP of **0.967** demonstrates strong detectability.
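
As a consistency check, the reported mAP is the unweighted mean of the three per-class AP values quoted above:

$
\text{mAP}_{0.5} = \frac{0.993 + 0.991 + 0.967}{3} = \frac{2.951}{3} \approx 0.984.
$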

---

### **Confusion Matrices** *(Figures 3 and 4)*

#### **Figure 3: Raw Confusion Matrix**

<img src="rps_yolo11/confusion_matrix.png" width="450">

#### **Figure 4: Normalized Confusion Matrix**

<img src="rps_yolo11/confusion_matrix_normalized.png" width="450">

Figures 3 and 4 further validate classification performance. The raw confusion matrix reveals that the **rock** gesture experiences the highest misclassification count (~40 errors). Conversely, **paper** and **scissors** demonstrate far fewer errors (~10–15).

Normalized results indicate:

- **Paper** accuracy: **1.00**
- **Scissors** accuracy: **0.99**
- **Rock** accuracy: **0.97**

These high values confirm the consistency and reliability of the YOLOv11 detector.


***
---

<a id="live-usage"></a>
<div style="text-align:center; font-size:45px; font-weight:bold; padding:4px;">
    System Integration – Live Usage and Testing
</div>


---
[View Computer Vision Script](RPSCamera.py)\
[View Game Logic](RPSLogic.py)

During live operation, the CPS combines real-time gesture detection with the decision-making engine to evaluate game outcomes. In most tested scenarios, the system successfully identified gestures and correctly determined winners or draws, demonstrating robust performance under typical gameplay conditions.

However, it was observed that two factors influenced the model’s detection accuracy:

- **Video background complexity:** High-contrast or cluttered backgrounds occasionally reduced model confidence or resulted in misclassifications.
- **Hand angle and orientation:** Gestures presented at steep angles, partially occluded, or rotated relative to the camera frame were more likely to be misinterpreted, especially during transitional movement.

Despite these temporary inconsistencies, once the hands stabilized and were fully visible, the system labeled gestures correctly and produced accurate game outcomes. This highlights both the robustness of the perception pipeline and the importance of maintaining a clear background, controlled lighting, and consistent hand orientation for optimal real-time performance. The model could be improved further by training on a larger dataset with a wider variety of images, including cluttered backgrounds, varying skin tones, and hands at several different distances.

<img src="Images/2Hands1.png" width="450">
<img src="Images/2HandsWin.png" width="450">
<img src="Images/4Hands1.png" width="450">
<img src="Images/4Hands2.png" width="450">

<br><br><br>

***
---

<a id="broader-impacts-and-ethical-considerations"></a>
<div style="text-align:center; font-size:45px; font-weight:bold; padding:4px;">
    Broader Impacts & Ethics
</div>


---

Although this project may initially appear as a playful gesture-based game, its underlying technologies—computer vision, gesture recognition, real-time decision systems, and human–machine interaction—have broad implications extending well beyond the Rock–Paper–Scissors context.

Gesture-recognition interfaces can transform how users interact with machines by enabling **hands-free**, **intuitive**, and **accessible** control mechanisms. This has applications in:

- Operating rooms (sterile, hands-free environments)  
- Assistive technologies for individuals with limited mobility  
- Robotics and industrial automation  
- Retail systems (touchless kiosks)  
- Educational tools leveraging physical interaction  

By relying on **low-cost webcam hardware** and **open-source ML frameworks**, this project demonstrates that advanced computer vision systems can be developed without expensive infrastructure, lowering adoption barriers for small companies and research groups.

### **Economic and Ethical Considerations**

Widespread adoption of AI-driven, gesture-based systems introduces economic and ethical challenges:

- **Workforce displacement** due to automation  
- The need for **upskilling** to support AI systems  
- **Privacy concerns** regarding image capture  
- **Bias risks** if training datasets are not diverse  
- The need for transparent, explainable AI in public-use interfaces  

Addressing these challenges is essential for responsible implementation and sustaining public trust in vision-based AI systems.
***
---

<a id="conclusion"></a>
<div style="text-align:center; font-size:45px; font-weight:bold; padding:4px;">
    Conclusion
</div>

---


This project demonstrates the integration of computer vision, machine learning, and game-theoretic analysis into a unified real-time decision-making system. The system architecture emphasizes maintainability, reproducibility, and flexibility, ensuring that future extensions or modifications can be incorporated with minimal disruption. A fine-tuned YOLOv11 model provides reliable gesture detection, while the loss-reduction strategy engine enables informed and interpretable decision support during gameplay.

Beyond the context of Rock-Paper-Scissors, this work illustrates how CPS interfaces can enhance accessibility, support touch-free user interaction, and assist intelligent decision-making workflows. Ultimately, the project highlights how principled computational design—combined with affordable sensing technologies—can contribute to resilient, real-time CPS solutions applicable far beyond entertainment-oriented applications.




---
***