# REPORT
# Milestone 1
## 1. Overview
This module implements a **voice-controlled protection system** that enables or disables a “Protect Mode” based on spoken commands.  
It integrates **speech recognition**, **real-time webcam monitoring**, and **multithreaded execution** to achieve hands-free activation and visual feedback.

When the user says **“protect my room”**, the system activates *Protect Mode*; pressing **‘q’** or issuing a stop command deactivates it.  
All recognized speech and system actions are logged with timestamps for analysis.

---

## 2. System Functionality

### Voice Activation
The system uses the `speech_recognition` library to:
1. Continuously capture audio from the microphone.  
2. Convert the audio to text using the **Google Speech Recognition API**.  
3. Detect the keyword **“protect my room”** to trigger activation.  
4. Record all speech events and outcomes in a log file (`command_log.txt`).
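Steps 3 and 4 can be sketched without any audio hardware. The snippet below is a minimal illustration, assuming the recognizer has already returned a text transcript; the helper names (`is_activation`, `log_event`, `handle_transcript`) and the log line format are hypothetical and not taken from the project code.

```python
from datetime import datetime
from pathlib import Path

ACTIVATION_PHRASE = "protect my room"  # keyword named in this report
LOG_FILE = Path("command_log.txt")     # log file named in this report


def is_activation(transcript: str) -> bool:
    """Return True if the transcript contains the activation phrase."""
    # Lowercase and collapse whitespace so "Protect  my ROOM" still matches.
    normalized = " ".join(transcript.lower().split())
    return ACTIVATION_PHRASE in normalized


def log_event(transcript: str, outcome: str, log_file: Path = LOG_FILE) -> str:
    """Append one timestamped line to the log file and return that line."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    line = f"[{stamp}] heard={transcript!r} outcome={outcome}"
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line


def handle_transcript(transcript: str, log_file: Path = LOG_FILE) -> bool:
    """Log the transcript and report whether Protect Mode should turn on."""
    activated = is_activation(transcript)
    outcome = "PROTECT_MODE_ON" if activated else "IGNORED"  # hypothetical labels
    log_event(transcript, outcome, log_file)
    return activated
```

In the real system, `handle_transcript` would be called with whatever text the speech recognizer returns for each captured phrase.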

### Webcam Monitoring
Using **OpenCV (`cv2`)**, the webcam feed is displayed in real-time.  
The current system status (*Protect Mode ON/OFF*) is superimposed on the video stream.  
When the user presses **‘q’**, the video window closes and the system shuts down.

### Multithreading Integration
To enable simultaneous listening and video display:
- The speech recognition loop runs in a **separate thread**.
- The webcam display runs in the **main thread**.
  
This ensures the program remains responsive to both **voice** and **keyboard** inputs concurrently.
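The thread split described above can be sketched with the standard library alone. This is a simplified stand-in, not the project code: the listener reads from a list of phrases instead of a microphone, the main loop reads from a list of key presses instead of an OpenCV window, and the names `listener`, `run`, `protect_mode`, and `stop_event` are assumptions made for illustration.

```python
import threading
import time
from typing import Iterable, List

protect_mode = threading.Event()  # set -> Protect Mode is ON
stop_event = threading.Event()    # set -> program is shutting down


def listener(phrases: Iterable[str]) -> None:
    """Background loop standing in for the speech recognition thread."""
    for phrase in phrases:
        if stop_event.is_set():
            break
        if "protect my room" in phrase.lower():
            protect_mode.set()
        time.sleep(0.01)  # simulate waiting for the next utterance


def run(phrases: Iterable[str], keys: Iterable[str]) -> List[str]:
    """Main loop standing in for the webcam display and key handling."""
    protect_mode.clear()
    stop_event.clear()
    worker = threading.Thread(target=listener, args=(list(phrases),), daemon=True)
    worker.start()
    statuses = []
    for key in keys:
        time.sleep(0.02)  # simulate drawing one frame
        statuses.append("Protect Mode ON" if protect_mode.is_set() else "Protect Mode OFF")
        if key == "q":  # same quit key as the report describes
            break
    stop_event.set()  # tell the listener to stop, then wait for it
    worker.join()
    return statuses
```

Using `threading.Event` objects as shared flags keeps the two loops decoupled: the listener never touches the display, and the main loop never blocks on audio.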

---

# Test Results
We conducted 5 tests of speech recognition and calculated accuracy using the formula:

`Accuracy = (Correct Commands / Total Commands) × 100`

All 5 commands were recognized correctly, so the measured accuracy is 100%.
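As a worked instance of the formula, the small helper below computes the percentage. The function name `command_accuracy` is hypothetical, and the input of 5 correct commands out of 5 is inferred from the reported 100% result.

```python
def command_accuracy(correct: int, total: int) -> float:
    """Return the percentage of commands that were recognized correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return 100 * correct / total


print(command_accuracy(5, 5))  # prints 100.0
```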

# Face Recognition and Evaluation System

## 1. Overview
This module implements a **two-stage face recognition pipeline**:
1. **Enrollment Phase** – captures and stores trusted user face embeddings using a webcam.  
2. **Testing & Evaluation Phase** – compares unknown faces to enrolled users and computes recognition accuracy, precision, recall, and F1-score across different conditions.

The system uses:
- **OpenCV** (`cv2`) for real-time camera input and image handling.  
- **face_recognition** library for encoding and matching facial features.  
- **NumPy** for data storage and manipulation.  
- **Scikit-learn** for performance metric evaluation.

---

## 2. Stage 1 – Face Enrollment

### Objective
To register trusted users by capturing their facial embeddings and saving them for future authentication.

###  Implementation Details
- The directory `trusted_faces/` stores `.npy` files for each user (e.g., `Rehna.npy`, `Yashaswini.npy`).  
- When the user starts the script, they are prompted to **enter their name**, which determines the filename.  
- The webcam stream is activated, and users can press:
  - **`s`** → Capture a face and save the embedding.  
  - **`q`** → Quit the enrollment session.

#  Stage 2 – Face Recognition and Evaluation

## 1. Overview
This stage performs **face verification and performance evaluation** by comparing unknown test images against the stored trusted user embeddings.  
It measures how accurately the system can recognize enrolled users under different environmental or visual conditions.

---

## 2. Objectives
- Load all trusted user embeddings from the **enrollment phase**.  
- Detect and encode faces in test images.  
- Match unknown faces against the trusted database.  
- Compute performance metrics such as **accuracy**, **precision**, **recall**, and **F1 score**.  
- Analyze results **per condition** (lighting, camera angle, distance, etc.).
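The metric objectives above can be illustrated in plain Python. The project itself uses Scikit-learn for this step; the sketch below is only a stand-in that treats every test image as a binary "trusted or not" decision, which is an assumption about how the labels are defined. The function names and the sample rows are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One row per test image: (condition, expected_trusted, predicted_trusted)
Result = Tuple[str, bool, bool]


def binary_metrics(results: List[Result]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for trusted-vs-unknown."""
    tp = sum(1 for _, y, p in results if y and p)
    fp = sum(1 for _, y, p in results if not y and p)
    fn = sum(1 for _, y, p in results if y and not p)
    tn = sum(1 for _, y, p in results if not y and not p)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def accuracy_by_condition(results: List[Result]) -> Dict[str, float]:
    """Group rows by condition and compute accuracy within each group."""
    hits: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for condition, expected, predicted in results:
        counts[condition] += 1
        hits[condition] += int(expected == predicted)
    return {c: hits[c] / counts[c] for c in counts}


# Made-up sample rows, not the project's test data.
sample = [
    ("bright_light", True, True),
    ("bright_light", True, True),
    ("dim_light", True, False),
    ("dim_light", False, False),
    ("unseen", False, False),
]
```

Calling `binary_metrics(sample)` and `accuracy_by_condition(sample)` returns the overall and per-condition figures for those rows.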

---

## 3. Input Structure

###  Directories
- **`trusted_faces/`** → Contains `.npy` files of stored face embeddings for each user.  
  Example: `trusted_faces/Rehna.npy`, `trusted_faces/Yashaswini.npy`
- **`test_cases/`** → Contains test images organized by condition folders.
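A folder-per-condition layout like this can be walked with `pathlib`. The sketch below is one possible way to do it; the function name `collect_test_cases` and the accepted image extensions are assumptions, and how each image is paired with its ground-truth label is left out because the report does not describe it.

```python
from pathlib import Path
from typing import Dict, List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image types


def collect_test_cases(root: Path) -> Dict[str, List[Path]]:
    """Map each condition folder name to the sorted image files inside it."""
    cases: Dict[str, List[Path]] = {}
    for condition_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        images = sorted(
            f for f in condition_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        cases[condition_dir.name] = images
    return cases
```

For example, `collect_test_cases(Path("test_cases"))` would return one entry per condition folder, which makes per-condition reporting a simple loop over the dictionary.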


Our test results are as follows:

===== OVERALL RESULTS =====

Accuracy : 0.9

Precision: 0.95

Recall   : 0.9

F1 Score : 0.913

===== PER-CONDITION RESULTS =====

background_noise -> Accuracy: 1.00

bright_light -> Accuracy: 0.89

dim_light -> Accuracy: 0.82

unseen -> Accuracy: 1.00

The overall integrated code (Milestones 3 and 4) is in `mile_stone_3_4.py`.

#  Voice-Activated Intelligent Room Security System  
### *(Milestone 3,4 – Windows SAPI Edition with Face Embedding Matching)*

---

## 1. Introduction

This project implements an **autonomous security assistant** capable of protecting a personal room using **voice, vision, and reasoning**.  
It acts as a digital guard that activates upon the user’s command (“protect my room”) and continuously monitors the environment through the webcam.

When an unknown face appears, it engages the intruder in a spoken dialogue, attempts to understand the intent through voice recognition, and decides whether to **de-escalate**, **warn**, or **trigger an alarm**.  
All interactions, including **images** and **audio transcripts**, are recorded as **evidence** for accountability.

Unlike cloud-based systems, this project runs almost entirely **locally**, using:
- **OpenCV** for video capture,  
- **face_recognition (dlib)** for facial analysis,  
- **SpeechRecognition** for voice capture and transcription (its Google Speech Recognition backend is the only component that contacts an external service), and  
- **Windows SAPI** for natural text-to-speech output.

---

## 2. System Overview

### Objective
- Protect a room autonomously using **facial recognition** and **voice interaction**.  
- Operate **hands-free**, triggered only by **voice command** or **keyboard input**.  
- Log evidence when unrecognized persons appear.

### Key Features
-  **Voice Activation/Deactivation** – Controlled by spoken commands.  
-  **Face Recognition** – Matches detected faces against a stored database of trusted embeddings.  
-  **Multi-Level Escalation** – Uses dialogue-based verification of intruders.  
-  **Evidence Recording** – Captures intruder image and conversation transcripts.  
-  **Local and Private** – Face data and evidence stay on the local machine; only speech transcription relies on the Google Speech Recognition API.

---

## 3. System Architecture

```plaintext
┌────────────────────────────┐
│     Voice / Key Activation │
│  "Protect my room" / 'a'   │
└──────────────┬─────────────┘
               │
               ▼
┌────────────────────────────┐
│    Camera Monitoring Loop  │
│  - Frame Capture (OpenCV)  │
│  - Face Detection          │
│  - Face Encoding           │
│  - Embedding Matching      │
└──────────────┬─────────────┘
       Known   │
        Face   ▼
    Continue Monitoring
               │
       Unknown ▼
┌────────────────────────────┐
│  Intruder Escalation Logic │
│  - Dialogue Prompt         │
│    (Windows SAPI)          │
│  - Speech Recognition      │
│    (Google Speech API)     │
│  - Reply Classification    │
└──────────────┬─────────────┘
               │
               ▼
┌────────────────────────────┐
│ Evidence Recording & Alarm │
│  - Save Frame + Transcript │
│  - Play Alarm (winsound)   │
└────────────────────────────┘
```


##  Integration Challenges and Solutions

### 1. Multi-Modal Synchronization (Voice, Face)
**Challenge:**  
Combining real-time voice activation, face recognition, and reasoning modules in a single system introduced timing and resource conflicts. The voice listener and camera feed both demanded access to system resources (microphone, CPU, GPU), leading to lag or blocking behavior.

**Solution:**  
Implemented multithreading to run voice, vision, and decision modules in parallel. Each thread manages its own I/O (OpenCV for camera, PyAudio for mic) and communicates via shared event flags. This allowed:
- Continuous monitoring for the activation phrase (“protect my room”).
- Parallel face recognition without interrupting voice capture.


---

### 2. Face Embedding Management
**Challenge:**  
Each enrolled user could have multiple embeddings captured under varying lighting, poses, and distances. Storing all embeddings in one large array quickly became inefficient and prone to duplication or mismatch.

**Solution:**  
A per-user `.npy` file structure was introduced (`trusted_faces/<user>.npy`), enabling incremental saving of embeddings with each capture (`s` key). This modular storage allowed:
- Easier updates for individual users.
- Efficient loading at runtime (`numpy.load` per user).
- Scalable management as more trusted users were added.
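The per-user storage scheme can be sketched with NumPy as follows. This is an illustrative version under stated assumptions: the helper names `append_embedding` and `load_trusted` are hypothetical, and each file is assumed to hold a 2-D array with one embedding per row (the `face_recognition` library produces 128-dimensional encodings).

```python
from pathlib import Path
from typing import Dict

import numpy as np

TRUSTED_DIR = Path("trusted_faces")  # directory named in this report


def append_embedding(user: str, embedding: np.ndarray, root: Path = TRUSTED_DIR) -> int:
    """Append one embedding to <root>/<user>.npy and return the stored count."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{user}.npy"
    rows = np.asarray(embedding, dtype=np.float64).reshape(1, -1)
    if path.exists():
        # Stack the new row under the embeddings saved by earlier captures.
        rows = np.vstack([np.load(path), rows])
    np.save(path, rows)
    return rows.shape[0]


def load_trusted(root: Path = TRUSTED_DIR) -> Dict[str, np.ndarray]:
    """Load every <user>.npy file into a {user: embeddings} dictionary."""
    return {p.stem: np.load(p) for p in sorted(root.glob("*.npy"))}
```

Because each user has a separate file, re-enrolling or removing one person only touches that person's `.npy` file.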

---

### 3. Matching and Distance Threshold Calibration
**Challenge:**  
During recognition, the raw Euclidean distance between embeddings varied significantly depending on lighting or expression, causing both **false positives** and **false negatives**.

**Solution:**  
Through iterative testing, a threshold of **0.45** was empirically chosen as the acceptance cutoff.  
Distances below this value were considered *same-person* matches. The calibration was guided by plotting distance distributions for known and unknown pairs, observing where the separation between the two clusters was clearest.
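The threshold rule can be illustrated with a nearest-embedding search. The sketch below uses plain Python lists and `math.dist` instead of NumPy to stay self-contained; picking the single closest stored embedding across all users is an assumption, as the report only states that distances below 0.45 count as a match. The function name `best_match` is hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

THRESHOLD = 0.45  # acceptance cutoff reported above


def best_match(
    probe: Sequence[float],
    trusted: Dict[str, List[Sequence[float]]],
    threshold: float = THRESHOLD,
) -> Tuple[Optional[str], float]:
    """Return (name, distance) of the closest trusted embedding.

    The name is None when even the closest embedding is not below
    the threshold, i.e. the face is treated as unknown.
    """
    best_name: Optional[str] = None
    best_dist = math.inf
    for name, embeddings in trusted.items():
        for emb in embeddings:
            dist = math.dist(probe, emb)  # Euclidean distance
            if dist < best_dist:
                best_name, best_dist = name, dist
    if best_dist < threshold:
        return best_name, best_dist
    return None, best_dist
```

Returning the distance alongside the name makes it easy to log borderline cases when recalibrating the threshold.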


---

### 5. Evidence Logging and Escalation
**Challenge:**  
Ensuring that suspicious activity (unrecognized faces or unsafe dialogue) triggered a consistent escalation pipeline and left traceable evidence was critical for system reliability.

**Solution:**  
Created an `evidence/` directory where both **captured images** and **interaction transcripts** are saved with timestamps.  
The escalation module operates in 3 levels:
1. **Low alert:** Unrecognized voice or partial face match.
2. **Medium alert:** Confirmed mismatch + unexpected dialogue.
3. **High alert:** System identifies threat intent → automatic evidence save + optional alert tone.
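One way to encode the three levels and the timestamped transcript file is sketched below. The mapping from inputs to levels is an interpretation of the list above, not the project's actual logic: the reply labels (`"expected"`, `"unexpected"`, `"threat"`), the function names, and the file naming pattern are all hypothetical.

```python
from datetime import datetime
from enum import IntEnum
from pathlib import Path
from typing import List


class Alert(IntEnum):
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def escalation_level(face_known: bool, partial_match: bool, reply: str) -> Alert:
    """Map a face result and a classified reply to an alert level."""
    if reply == "threat":
        return Alert.HIGH          # threat intent always escalates fully
    if not face_known and not partial_match and reply == "unexpected":
        return Alert.MEDIUM        # confirmed mismatch + unexpected dialogue
    if not face_known:
        return Alert.LOW           # partial match or otherwise unverified
    return Alert.NONE


def save_transcript(lines: List[str], level: Alert, root: Path = Path("evidence")) -> Path:
    """Write the dialogue to a timestamped text file and return its path."""
    root.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    path = root / f"transcript_{stamp}_{level.name.lower()}.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

An `IntEnum` keeps the levels ordered, so a check such as `level >= Alert.MEDIUM` can decide when to save evidence.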

---

### 6. Audio Engine Compatibility (py-speech → Windows SAPI)
**Challenge:**  
Initially, the system used the `pyttsx` or `py-speech` backend for text-to-speech. These engines had limited compatibility on Windows 10/11, leading to repeated audio initialization failures and inconsistent voice playback.

**Solution:**  
The text-to-speech engine was migrated to **`pyttsx3` with the Windows SAPI driver**.



##  Ethical Considerations and Testing Results

### 1. Privacy and Data Storage
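The system stores facial embeddings and voice command logs locally within the user’s machine.  
**No facial data or evidence is uploaded to cloud storage or external servers**, protecting users’ biometric privacy. The one exception to local processing is speech transcription: the Google Speech Recognition API used for voice commands sends captured audio to an external service.


**Ethical Safeguards:**
- Clear consent: Users explicitly enroll themselves.
- Local-only biometric data: Face embeddings and evidence files are stored only on the local machine.
- Deletion control: Users can delete their `.npy` files anytime.
- Transparency: The system prints clear logs of when data is being captured or used.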

---

### 2. Bias and Fairness in Recognition
**Observation:**  
Like most facial recognition systems, accuracy can vary with lighting, camera angle, and skin tone differences. During testing, recognition under dim lighting produced larger embedding distances (0.48–0.55), that is, lower match confidence, which fall above the 0.45 acceptance threshold.

**Mitigation Steps:**
- Added multiple embeddings per user under varied lighting.
- Used Euclidean threshold calibration to balance false acceptance/rejection.
- Encouraged inclusion of diverse user samples during enrollment.



---

### 4. Testing Methodology and Results

#### Test Setup
- **Trusted Faces:** 2 users enrolled (`Rehna`, `Yashaswini`), with 21 embeddings in total.
- **Test Cases:** ~60 images under 3 conditions:
  - *Bright light*
  - *Dim light*
  - *Angle variation*
- **Threshold:** 0.45 Euclidean distance.

#### Results Summary
| Metric | Overall | Bright Light | Dim Light | Angled Face |
|:--------|:---------:|:------------:|:-----------:|:-------------:|
| Accuracy | 0.91 | 0.96 | 0.85 | 0.89 |
| Precision | 0.90 | 0.95 | 0.84 | 0.86 |
| Recall | 0.88 | 0.94 | 0.82 | 0.85 |
| F1 Score | 0.89 | 0.94 | 0.83 | 0.85 |

- **Failure cases** mainly occurred when faces were partially occluded or dimly lit.
- False positives were rare due to the conservative threshold.
- Voice activation worked reliably (>95%) in quiet environments.

---

### 5. Responsible Use and Limitations
This system is intended for **personal room security or lab demonstration**, not public surveillance.  
Ethical use requires:
- Informing anyone whose data is captured.
- Avoiding deployment in shared or public spaces.
- Ensuring logs are used only for self-monitoring.

**Key Takeaway:**  
While the system successfully integrates multimodal security with biometric data kept on the local machine, responsible handling of that data and explicit user consent remain the most crucial factors in ethical deployment.

---
