# <center>AI Guard Agent</center>

## <center>EE 782 : Assignment 2</center>

### <center>Naresh Kumar Meena : 22B3947</center>

#### Code Demo video link: 
https://iitbacin-my.sharepoint.com/:v:/g/personal/22b3947_iitb_ac_in/EUqeShecXa5FtLoyjYIOIKMBIc7RI_VA0ilie5cITwpygQ?e=7Jv1Rm&nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJTdHJlYW1XZWJBcHAiLCJyZWZlcnJhbFZpZXciOiJTaGFyZURpYWxvZy1MaW5rIiwicmVmZXJyYWxBcHBQbGF0Zm9ybSI6IldlYiIsInJlZmVycmFsTW9kZSI6InZpZXcifX0%3D

#### Final Demo Video link: 
https://iitbacin-my.sharepoint.com/personal/22b3947_iitb_ac_in/_layouts/15/stream.aspx?id=%2Fpersonal%2F22b3947%5Fiitb%5Fac%5Fin%2FDocuments%2FRecording%2D20251013%5F181217%2Ewebm&referrer=StreamWebApp%2EWeb&referrerScenario=AddressBarCopied%2Eview%2E10e97d83%2Dfdee%2D4777%2D8a79%2D9e9c8789985e&isDarkMode=true

#### GitHub Repository Link:
https://github.com/nareshkmn/AIGuardAgent

In [1]:
# Import necessary libraries for all system functionalities
import os                                         # For interacting with the operating system (e.g., paths, file deletion)
import cv2                                        # OpenCV for camera access and image processing
import face_recognition                           # For finding and encoding faces in images
import numpy as np                                # For numerical operations on face-encoding arrays and audio samples
import pickle                                     # For saving and loading the trained face recognition model
import time                                       # For handling delays and timing (e.g., cooldowns, pauses)
import tempfile                                   # For creating temporary files in the system's temp directory
import speech_recognition as sr                   # For converting speech to text (ASR)
from gtts import gTTS                             # Google Text-to-Speech for generating audio from text
from playsound import playsound                   # Simple audio playback (used by the first ASR/TTS prototype)
from sklearn.svm import SVC                       # Support Vector Classifier used to identify enrolled faces
from sklearn.preprocessing import LabelEncoder    # Maps person names to integer class labels and back
import pygame                                     # For reliable audio playback (speech and alarm sounds)
import threading                                  # For running tasks in parallel (e.g., listening for voice commands while processing video)



pygame 2.6.1 (SDL 2.28.4, Python 3.13.5)
Hello from the pygame community. https://www.pygame.org/contribute.html


### Activation and Basic Input

#### 1. Trusted User Enrollment and Classifier Training

In [2]:

# --- Configuration ---

TRUSTED_FACES_DIR = "D:/AI_Guard_Agent/trusted_faces"
UNTRUSTED_FACES_DIR = "D:/AI_Guard_Agent/untrusted_faces" # Directory for negative examples
MODEL_FILE = "known_faces_model.pkl"

print("[INFO] Starting robust face enrollment and classifier training...")
known_encodings = []
known_names = []           


# --- Data Processing Function ---           # Cite: [1]
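def process_directory(directory, is_untrusted=False):
    """
    Scans a directory for images, computes a face encoding for each, and appends
    the encoding and its label to the global training lists. Trusted images are
    labelled by filename prefix ('name_1.jpg' -> 'name'); untrusted images are all
    labelled 'unrecognized'. Returns the number of images successfully encoded.
    """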
    
    # If the specified directory doesn't exist, create it.
    if not os.path.exists(directory):
        os.makedirs(directory)
        print(f"[WARNING] Created '{directory}' directory. Please add images and run again.")
        return 0

    image_count = 0
    # Loop through every file in the directory
    for filename in os.listdir(directory):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            try:
                image_path = os.path.join(directory, filename)
                
                # If it's the untrusted directory, all faces get the same label
                if is_untrusted:
                    name = "unrecognized"
                else:
                    # Extracts the name from 'name_1.jpg' as 'name'
                    name = os.path.splitext(filename)[0].split('_')[0]
                    
                image = face_recognition.load_image_file(image_path)      # Load the image file into a numpy array
                face_encodings = face_recognition.face_encodings(image)   # Compute encodings for every face found (only the first is used below)
                
                # If a face was successfully found and encoded
                if face_encodings:
                    known_encodings.append(face_encodings[0])        # Add the encoding and name to our training data lists
                    known_names.append(name)
                    print(f"[SUCCESS] Processed {filename} for category: {name}")
                    image_count += 1
                else:
                    print(f"[WARNING] No face found in {filename}. Skipping.")
            except Exception as e:
                print(f"[ERROR] Could not process {filename}: {e}")
    return image_count

# --- Main Execution ---
# Process both trusted and untrusted faces
trusted_count = process_directory(TRUSTED_FACES_DIR)
untrusted_count = process_directory(UNTRUSTED_FACES_DIR, is_untrusted=True)


# --- Classifier Training ---

if not known_encodings:
    print("[INFO] No faces were enrolled.")
elif len(np.unique(known_names)) < 2:
    print("\n[ERROR] Classifier training requires at least two different categories (e.g., one trusted person and one untrusted person).")
else:
    print(f"\n[INFO] Found {trusted_count} trusted faces and {untrusted_count} untrusted faces. Training SVM classifier...")
    
    # The SVM model can only work with numbers, not strings like names.
    # LabelEncoder maps each unique name to an integer in sorted order (e.g., atul -> 0, naresh -> 1, unrecognized -> 2).
    label_encoder = LabelEncoder()
    labels = label_encoder.fit_transform(known_names)
    
    # Initialize the Support Vector Machine (SVM) classifier.
    # 'probability=True' enables predict_proba() for confidence scores later;
    # random_state makes those (internally cross-validated) probability estimates reproducible.
    classifier = SVC(gamma="scale", probability=True, random_state=42)
    classifier.fit(known_encodings, labels)              # Train on the face encodings (features) and the numerical labels
    print("[INFO] Classifier training complete.")
    
    # Save the trained model to a file using pickle.
    # We save both the classifier itself and the label encoder so we can decode the predictions later.
    with open(MODEL_FILE, "wb") as f:
        pickle.dump({"classifier": classifier, "label_encoder": label_encoder}, f)
    print(f"[INFO] Saved robust trained model to {MODEL_FILE}")


[INFO] Starting robust face enrollment and classifier training...
[SUCCESS] Processed atul.png for category: atul
[SUCCESS] Processed naresh_1.png for category: naresh
[SUCCESS] Processed naresh_10.png for category: naresh
[SUCCESS] Processed naresh_11.png for category: naresh
[SUCCESS] Processed naresh_12.png for category: naresh
[SUCCESS] Processed naresh_13.png for category: naresh
[SUCCESS] Processed naresh_14.png for category: naresh
[SUCCESS] Processed naresh_2.png for category: naresh
[SUCCESS] Processed naresh_3.png for category: naresh
[SUCCESS] Processed naresh_4.png for category: naresh
[SUCCESS] Processed naresh_5.png for category: naresh
[SUCCESS] Processed naresh_6.png for category: naresh
[SUCCESS] Processed naresh_7.png for category: naresh
[SUCCESS] Processed naresh_8.png for category: naresh
[SUCCESS] Processed naresh_9.png for category: naresh
[SUCCESS] Processed unknown_0.png for category: unrecognized
[SUCCESS] Processed unknown_3.png for category: unrecognized
[SU

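**Face Encoding:** For every image in the trusted and untrusted folders, the face_recognition library detects the face and computes a 128-dimensional vector (an embedding) that summarizes its facial features. Trusted images are labelled by the part of the filename before the first underscore (naresh_3.png → naresh), while every image in the untrusted folder is labelled "unrecognized".

**Classifier Training (Optional Stretch Goal):** The collected encodings and their labels are used to train a Support Vector Classifier (scikit-learn's SVC). With probability=True, the classifier also returns a confidence score for each class, which the recognition loop later compares against a threshold. Training requires at least two distinct labels, and the fitted classifier is saved together with its LabelEncoder in known_faces_model.pkl so that predicted class indices can be decoded back into names.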
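The labelling step above can be illustrated in plain Python. This is a minimal sketch, not the notebook's code: label_from_filename is a hypothetical helper mirroring os.path.splitext(filename)[0].split('_')[0], and encode_labels is a hypothetical stand-in that imitates the mapping LabelEncoder produces (integer codes assigned to the unique names in sorted order).

```python
import os

def label_from_filename(filename, is_untrusted=False):
    """Hypothetical helper: derive a class label the way the enrollment cell does."""
    if is_untrusted:
        return "unrecognized"
    # 'naresh_12.png' -> 'naresh_12' -> 'naresh'
    return os.path.splitext(filename)[0].split('_')[0]

def encode_labels(names):
    """Imitate LabelEncoder: sorted unique classes, each name replaced by its index."""
    classes = sorted(set(names))
    index = {name: i for i, name in enumerate(classes)}
    return classes, [index[name] for name in names]

files = ["naresh_1.png", "atul.png", "naresh_10.png"]
names = [label_from_filename(f) for f in files]
names.append(label_from_filename("unknown_0.png", is_untrusted=True))

classes, codes = encode_labels(names)
print(classes)  # ['atul', 'naresh', 'unrecognized']
print(codes)    # [1, 0, 1, 2]
```

Because the codes follow alphabetical order, a given person's integer label is not fixed in advance, which is why predictions should always be decoded through label_encoder.classes_. The prefix rule also means a file such as atul_kumar_1.png would be labelled just "atul".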

#### 2. Speech Recognition (ASR) + Text-to-Speech (TTS)

In [3]:


# --- Configuration ---
ACTIVATION_COMMAND = "guard my room"
DEACTIVATION_COMMAND = "stand down"

# This class encapsulates all the audio input (ASR) and output (TTS) functionalities.
class AI_Guard:
    def __init__(self):
        """Initializes the AI Guard System."""
        self.guard_mode_active = False       # State variable to track if the main guard mode is active or idle.
        self.recognizer = sr.Recognizer()    # Initialize the core SpeechRecognition library components.
        self.microphone = sr.Microphone()
        
        print("[INFO] AI Guard System Initialized. Calibrating microphone...")
        
        # --- Microphone Calibration ---
        # This is a critical step for accuracy. The system listens for 1 second of ambient
        # background noise to learn what "silence" sounds like in the current environment.
        # This helps it distinguish spoken commands from noise.
        with self.microphone as source:                                               # Cite: [2]
            self.recognizer.adjust_for_ambient_noise(source, duration=1)
        print("[INFO] Microphone calibrated. Say 'guard my room' to activate.")

    def speak(self, text):
        """Converts text to speech using gTTS and plays it."""
        print(f"[GUARD SAYS]: {text}")
        try:
            tts = gTTS(text=text, lang='en')
            # Use the system's temporary directory to avoid permission errors
            temp_dir = tempfile.gettempdir()
            audio_file = os.path.join(temp_dir, "response.mp3")
            tts.save(audio_file)
            playsound(audio_file)     # Play the saved audio file.
            os.remove(audio_file)     # Clean up by deleting the temporary file.
        except Exception as e:
            print(f"[ERROR] Could not speak due to an error: {e}")

    def listen_for_command(self):
        """Listens for a command via the microphone and transcribes it to text."""
        command = ""
        try:
            with self.microphone as source:
                print("[INFO] Listening for command...")
                
                # Listen for audio from the microphone. It will wait up to 5 seconds for speech to start
                # and will stop listening after 4 seconds of continuous speech.
                audio = self.recognizer.listen(source, timeout=5, phrase_time_limit=4)
            
            # --- Transcription ---
            # Send the captured audio data to Google's Web Speech API for transcription.
            # The .lower() converts the text to lowercase for easier command matching.    
            command = self.recognizer.recognize_google(audio).lower()
            print(f"[USER SAID]: {command}")
            
        # --- Error Handling ---
        except sr.WaitTimeoutError:
            pass  # No speech started within the timeout; return an empty command
        except sr.UnknownValueError:
            print("[INFO] Could not understand the audio.")
        except sr.RequestError as e:
            print(f"[ERROR] Could not request results from Google Speech Recognition service; {e}")
        return command

    def run(self):
        """Main loop for the AI Guard with graceful shutdown."""
        try:
            # This loop runs forever, continuously listening for commands.
            while True:
                command = self.listen_for_command()
                
                # --- State Management Logic ---
                if self.guard_mode_active:
                    # If guard mode is ON, listen for the deactivation command.
                    if DEACTIVATION_COMMAND in command:
                        self.guard_mode_active = False
                        self.speak("Guard mode deactivated.")
                    else:
                        print("[STATUS] Guard mode is active. Monitoring...")
                else:
                    # If guard mode is OFF, listen for the activation command.
                    if ACTIVATION_COMMAND in command:
                        self.guard_mode_active = True
                        self.speak("Guard mode activated. I will protect this room.")
                        
                # Brief pause before listening again (listen() itself already blocks for up to 5 seconds).
                time.sleep(1)
        except KeyboardInterrupt:
            print("\n[INFO] Program interrupted by user. Shutting down.")

if __name__ == "__main__":
    guard = AI_Guard()
    guard.run()
    
# Cite: [11], [12], [13]   


[INFO] AI Guard System Initialized. Calibrating microphone...
[INFO] Microphone calibrated. Say 'guard my room' to activate.
[INFO] Listening for command...
[USER SAID]: guard my room
[GUARD SAYS]: Guard mode activated. I will protect this room.
[INFO] Listening for command...
[USER SAID]: stand down
[GUARD SAYS]: Guard mode deactivated.
[INFO] Listening for command...

[INFO] Program interrupted by user. Shutting down.


**Speech Recognition (ASR):** The listen_for_command() method uses the SpeechRecognition library to capture audio from the microphone, waiting up to 5 seconds for speech to start and recording at most 4 seconds of it. The audio is sent to the Google Web Speech API for transcription (so an internet connection is required), and the lowercase transcript is checked for the activation phrase "guard my room" or the deactivation phrase "stand down". During testing under normal room conditions, both commands were recognized correctly more than 90% of the time.
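The command handling in run() is effectively a two-state machine driven by substring matching on the transcript. The following is a minimal, hardware-free sketch of that logic, assuming transcripts arrive as plain strings; next_state is a hypothetical function written only to make the transitions easy to test.

```python
ACTIVATION_COMMAND = "guard my room"
DEACTIVATION_COMMAND = "stand down"

def next_state(active, transcript):
    """Return (new_active, reply) for one transcript; reply is None when nothing changes."""
    text = transcript.lower()
    if active and DEACTIVATION_COMMAND in text:
        return False, "Guard mode deactivated."
    if not active and ACTIVATION_COMMAND in text:
        return True, "Guard mode activated. I will protect this room."
    return active, None

# Replay a sequence of transcripts ("" stands for a timeout or nothing understood)
active = False
for heard in ["", "please guard my room", "hello", "stand down"]:
    active, reply = next_state(active, heard)
    if reply:
        print(reply)
```

Substring matching is forgiving: "please guard my room" still activates the guard, and the empty string returned on a timeout can never match either command.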

**Text-to-Speech (TTS):** The speak() method gives the agent its voice. It uses the gTTS library (which also relies on an online Google service) to convert a text string into an .mp3 file in the system's temporary directory, plays that file through the speakers with playsound, and then deletes it.
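speak() always writes to the same response.mp3 path, which is safe only while one call at a time touches the file (later cells enforce this with a lock). A possible alternative, sketched here under that assumption, is to give every call its own file with tempfile.mkstemp; make_temp_audio_path is a hypothetical helper, not part of the notebook.

```python
import os
import tempfile

def make_temp_audio_path(suffix=".mp3"):
    """Create an empty, uniquely named file in the system temp dir and return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix, prefix="guard_tts_")
    os.close(fd)  # release our handle so gTTS and the audio player can open the file
    return path

first = make_temp_audio_path()
second = make_temp_audio_path()
print(first != second)  # True: each call gets its own file
for p in (first, second):
    os.remove(p)        # clean up, as speak() does after playback
```

Inside speak(), such a path would replace the fixed audio_file before tts.save(), with the same os.remove() afterwards. Closing the descriptor matters on Windows, where an open handle can block other programs from writing the file.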

### Face Recognition and Trusted User Verification

In [6]:

# --- Configuration ---
ACTIVATION_COMMAND = "guard my room"
DEACTIVATION_COMMAND = "stand down"
MODEL_FILE = "known_faces_model.pkl"

# Confidence threshold for face recognition. A face is considered a match only if the
# classifier's confidence is above this value. This is a key parameter to tune.
RECOGNITION_THRESHOLD = 0.70

# --- Base Class Definition ---
# This class encapsulates all the audio input (ASR) and output (TTS) functionalities.
class AI_Guard:
    def __init__(self):
        self.guard_mode_active = False
        self.recognizer = sr.Recognizer()
        self.microphone = sr.Microphone()
        pygame.mixer.init()                    # Initialize the Pygame mixer for reliable audio playback
        self.speak_lock = threading.Lock()    # Create a lock to prevent race conditions when multiple threads try to speak at once
        print("[INFO] AI Guard System Initialized. Calibrating microphone...")
        with self.microphone as source:                                             # Cite: [2]
            self.recognizer.adjust_for_ambient_noise(source, duration=1)        
        print("[INFO] Microphone calibrated.")

    def speak(self, text):
        """Converts text to speech using gTTS and plays it with Pygame."""
        with self.speak_lock:
            # Use the lock to ensure only one speech operation happens at a time
            print(f"[GUARD SAYS]: {text}")
            try:
                tts = gTTS(text=text, lang='en')
                temp_dir = tempfile.gettempdir()
                audio_file = os.path.join(temp_dir, "response.mp3")
                tts.save(audio_file)
                
                # Use Pygame to play the audio file
                pygame.mixer.music.load(audio_file)
                pygame.mixer.music.play()
                
                # Wait for the audio to finish playing
                while pygame.mixer.music.get_busy():
                    time.sleep(0.1)
                pygame.mixer.music.unload()
                os.remove(audio_file)    # Clean up the temporary file
            except Exception as e:
                print(f"[ERROR] Could not speak due to an error: {e}")

    def listen_for_command(self):
        """Listens for a command via the microphone and transcribes it to text."""
        command = ""
        try:
            with self.microphone as source:
                print("[INFO] Listening for a command...")
                audio = self.recognizer.listen(source, timeout=5, phrase_time_limit=4)
            command = self.recognizer.recognize_google(audio).lower()
            print(f"[USER SAID]: {command}")
        except (sr.WaitTimeoutError, sr.UnknownValueError, sr.RequestError):
            pass
        return command

# --- Vision Class ---
# This class inherits from AI_Guard and adds all the computer vision capabilities.
class AI_Guard_Vision(AI_Guard):
    def __init__(self):
        # Initialize the parent AI_Guard class (ASR, TTS)
        super().__init__()
        try:
            # Load the pre-trained SVM classifier and label encoder from the pickle file
            with open(MODEL_FILE, "rb") as f:
                self.model_data = pickle.load(f)
            print("[INFO] Loaded trained face recognition model.")
        except FileNotFoundError:
            self.speak(f"Error: Model file '{MODEL_FILE}' not found. Please run the enrollment script first.")
            exit()
        
        # Initialize the webcam                    # Cite: [3]
        self.video_capture = cv2.VideoCapture(0)
        if not self.video_capture.isOpened():
            self.speak("Error: Cannot open webcam.")
            exit()
        
        # Timestamps and state variables for managing interactions    
        self.last_seen_trusted_time = 0
        self.last_unrecognized_alert_time = 0
        self.cooldown_period = 10                  # 10-second cooldown for messages
        self.stop_event = threading.Event()       # Event to signal threads to stop
        self.vision_window_active = False         # Flag to track if the OpenCV window is open

    def process_vision(self):
        """The main computer vision loop: captures a frame, finds faces, and identifies them."""
        # Read a single frame from the webcam
        ret, frame = self.video_capture.read()
        if not ret: return
        # For performance, create a smaller version of the frame for face detection
        rgb_small_frame = cv2.cvtColor(cv2.resize(frame, (0, 0), fx=0.25, fy=0.25), cv2.COLOR_BGR2RGB)
        
        # Find all faces and their encodings in the small frame
        face_locations = face_recognition.face_locations(rgb_small_frame)
        face_encodings = face_recognition.face_encodings(rgb_small_frame, face_locations)
        
        # Reset state flags for the current frame
        is_any_person_present = len(face_encodings) > 0
        found_trusted_person = False
        
        # Get the trained classifier and label encoder from the loaded model
        classifier = self.model_data["classifier"]
        label_encoder = self.model_data["label_encoder"]
        
        # Loop through each detected face
        for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
            probabilities = classifier.predict_proba([face_encoding])[0]             # Use the trained SVM classifier to get prediction probabilities
            best_match_index = np.argmax(probabilities)                             # Find the index of the highest probability
            predicted_name = label_encoder.classes_[best_match_index]             # Get the name and confidence score for the best match
            confidence = probabilities[best_match_index]
            
            print(f"[DEBUG] Predicted: {predicted_name}, Confidence: {confidence:.2f}")
            
            # --- Visual Feedback Logic ---       # Cite: [5]
            # --- FEATURE IMPLEMENTATION: Set box color based on recognition ---
            display_name = "Unrecognized"                                           # Cite: [6]
            box_color = (0, 0, 255) # Red for unrecognized by default
            # -----------------------------------------------------------------
            
            # Check if the confidence is high enough to be considered a match.
            # The 'unrecognized' class (trained from negative examples) must never count as trusted.
            if confidence > RECOGNITION_THRESHOLD and predicted_name != "unrecognized":
                found_trusted_person = True
                display_name = predicted_name.replace('_', ' ')
                box_color = (0, 255, 0) # Green for recognized           # Cite: [6]
                
                # Greet the person if enough time has passed since the last greeting
                current_time = time.time()
                if current_time - self.last_seen_trusted_time > self.cooldown_period:
                    self.speak(f"Welcome, {display_name}. Glad to see you.")
                    self.last_seen_trusted_time = current_time

            # Scale the box back up: detection ran on a frame resized to 1/4
            top, right, bottom, left = top * 4, right * 4, bottom * 4, left * 4
            # Use the dynamic box_color variable for drawing
            cv2.rectangle(frame, (left, top), (right, bottom), box_color, 2)
            label = f"{display_name} ({confidence:.2f})"
            cv2.rectangle(frame, (left, bottom - 35), (right, bottom), box_color, cv2.FILLED)
            cv2.putText(frame, label, (left + 6, bottom - 6), cv2.FONT_HERSHEY_DUPLEX, 0.8, (255, 255, 255), 1)

        #  --- Decision Logic (Post-Frame Analysis) ---         # Cite: [7]
        # If no trusted person was found but someone is present, they are an intruder
        if not found_trusted_person and is_any_person_present:
            current_time = time.time()
            if current_time - self.last_unrecognized_alert_time > self.cooldown_period:       # Issue a warning if enough time has passed since the last one
                self.speak("Warning. An unrecognized person has been detected.")
                self.last_unrecognized_alert_time = current_time
        
        # Display the resulting frame in a pop-up window
        cv2.imshow('AI Guard Vision', frame)
        self.vision_window_active = True

    def _threaded_listener(self):        # Cite: [8]
        """This function runs in a separate thread, dedicated to listening for voice commands."""
        while not self.stop_event.is_set():
            command = self.listen_for_command()
            # Process commands to activate or deactivate the guard
            if ACTIVATION_COMMAND in command and not self.guard_mode_active:
                self.guard_mode_active = True
                self.speak("Guard mode activated. Vision system online.")
            elif DEACTIVATION_COMMAND in command and self.guard_mode_active:
                self.guard_mode_active = False
                self.speak("Guard mode deactivated. Vision system offline.")

    def run(self):
        """The main application entry point."""
        self.stop_event.clear()
        # Create and start the background listener thread
        listener_thread = threading.Thread(target=self._threaded_listener, daemon=True)
        listener_thread.start()
        
        try:
            print("\n[INFO] AI Guard is running. System is now listening for activation commands.")
            # This main loop handles vision processing and window management
            while not self.stop_event.is_set():
                if self.guard_mode_active:
                    self.process_vision()   # If active, process the webcam feed
                else:
                    # If idle, ensure the vision window is closed
                    if self.vision_window_active:
                        cv2.destroyWindow('AI Guard Vision')
                        self.vision_window_active = False

                # Check if the 'q' key was pressed in the OpenCV window to quit
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    print("\n[INFO] 'q' key pressed. Shutting down.")
                    break
                
                time.sleep(0.05 if self.guard_mode_active else 0.5)
        except KeyboardInterrupt:
            print("\n[INFO] Program interrupted by user. Shutting down.")
        finally:
            # --- Graceful Shutdown ---
            self.stop_event.set()
            listener_thread.join(timeout=1.0)
            self.video_capture.release()
            cv2.destroyAllWindows()
            pygame.mixer.quit()
            print("[INFO] Webcam released and all windows closed.")

if __name__ == "__main__":
    guard = AI_Guard_Vision()
    guard.run()

# Cite: [11], [12], [13], [14] 

[INFO] AI Guard System Initialized. Calibrating microphone...
[INFO] Microphone calibrated.
[INFO] Loaded trained face recognition model.

[INFO] AI Guard is running. System is now listening for activation commands.
[INFO] Listening for a command...
[USER SAID]: mere record kar raha hai abhi record karega to do
[INFO] Listening for a command...
[USER SAID]: guard my room
[GUARD SAYS]: Guard mode activated. Vision system online.
[DEBUG] Predicted: naresh, Confidence: 0.82
[GUARD SAYS]: Welcome, naresh. Glad to see you.
[INFO] Listening for a command...
[INFO] Listening for a command...
[DEBUG] Predicted: naresh, Confidence: 0.77
[DEBUG] Predicted: naresh, Confidence: 0.79
[DEBUG] Predicted: naresh, Confidence: 0.77
[DEBUG] Predicted: naresh, Confidence: 0.77
[DEBUG] Predicted: naresh, Confidence: 0.76
[DEBUG] Predicted: naresh, Confidence: 0.81
[DEBUG] Predicted: naresh, Confidence: 0.82
[DEBUG] Predicted: naresh, Confidence: 0.82
[DEBUG] Predicted: naresh, Confidence: 0.81
[DEBUG] Pred
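
The per-face decision in process_vision() can be exercised without a webcam. The sketch below is a minimal, stdlib-only stand-in, assuming the classifier's predict_proba output is available as a plain list; decide and Cooldown are hypothetical names, not part of the notebook. Because the enrollment cell trains an explicit "unrecognized" class, a confident prediction of that class should be treated as an intruder rather than a trusted match, and the sketch makes that check explicit.

```python
import time

RECOGNITION_THRESHOLD = 0.70
UNTRUSTED_LABEL = "unrecognized"

def decide(probabilities, classes, threshold=RECOGNITION_THRESHOLD):
    """Return (display_name, is_trusted, confidence) for one face's class probabilities."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])  # argmax
    name, confidence = classes[best], probabilities[best]
    # A confident prediction of the 'unrecognized' class must NOT count as trusted.
    if confidence > threshold and name != UNTRUSTED_LABEL:
        return name.replace('_', ' '), True, confidence
    return "Unrecognized", False, confidence

class Cooldown:
    """Allow an action at most once per `period` seconds (like the greeting/warning timers)."""
    def __init__(self, period, clock=time.time):
        self.period, self.clock, self.last = period, clock, float("-inf")

    def ready(self):
        now = self.clock()
        if now - self.last > self.period:
            self.last = now
            return True
        return False

classes = ["atul", "naresh", "unrecognized"]
print(decide([0.05, 0.82, 0.13], classes))  # ('naresh', True, 0.82)
print(decide([0.05, 0.10, 0.85], classes))  # ('Unrecognized', False, 0.85)
print(decide([0.30, 0.45, 0.25], classes))  # ('Unrecognized', False, 0.45)
```

Injecting the clock (time.time by default) is a design choice that lets the cooldown be tested with fake timestamps instead of real waiting.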

### Escalation Dialogue and Full System Integration
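
The escalation_intervals table in AI_Guard_Full ({1: 0, 2: 10, 3: 20, 4: 30}) pairs each level with a number of seconds, with levels 1 to 3 mapped to LLM prompts and level 4 reserved for the alarm. The loop that applies the table is not shown in this section, so the following is only a sketch under the assumption that the seconds are counted from the moment the intruder is first detected; escalation_level is a hypothetical helper.

```python
ESCALATION_INTERVALS = {1: 0, 2: 10, 3: 20, 4: 30}  # assumed: seconds since first detection

def escalation_level(elapsed, intervals=ESCALATION_INTERVALS):
    """Return the highest level whose start time has been reached (0 if none)."""
    reached = [level for level, start in intervals.items() if elapsed >= start]
    return max(reached, default=0)

for t in (0, 9.9, 10, 25, 31):
    print(t, escalation_level(t))
# 0 1
# 9.9 1
# 10 2
# 25 3
# 31 4
```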

In [None]:
# --- System Configuration ---

ACTIVATION_COMMAND = "guard my room"
DEACTIVATION_COMMAND = "stand down"
MODEL_FILE = "known_faces_model.pkl"

# Confidence threshold for face recognition. A face is considered a match only if the
# classifier's confidence is above this value. This is a key parameter to tune.
RECOGNITION_THRESHOLD = 0.70

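# --- Set your Groq API key (preferably via the GROQ_API_KEY environment variable) ---             # Cite : [4], [10]
GROQ_API_KEY = os.environ.get("GROQ_API_KEY", "YOUR_GROQ_API_KEY")   # The placeholder must match the check below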

# --- Global Client Initialization for the LLM ---

# Initialize the 'client' variable to None. It will be configured if an API key is provided.
client = None
try:
    # Check if a valid API key has been provided
    if GROQ_API_KEY != "YOUR_GROQ_API_KEY" and GROQ_API_KEY:
        from groq import Groq  # Import the Groq library only if needed
        # Create the client object to communicate with the Groq API
        client = Groq(api_key=GROQ_API_KEY)
        print("[INFO] Groq client configured successfully.")
    else:
        print("[WARNING] Groq API Key is not set. LLM features will be disabled.")
except Exception as e:
    print(f"[ERROR] Failed to configure Groq client: {e}")



# --- Base AI Guard Class ---

# This class handles the core audio input (ASR) and output (TTS) functionalities.
class AI_Guard:
    def __init__(self):
        self.guard_mode_active = False         # State variable to track if the guard mode is active
        self.recognizer = sr.Recognizer()      # Initialize the speech recognizer
        self.microphone = sr.Microphone()      # Initialize the microphone
        pygame.mixer.init()                    # Initialize the Pygame mixer for reliable audio playback
        
        self.speak_lock = threading.Lock()     # Create a threading lock to prevent multiple parts of the program from trying to speak at the exact same time, which can cause file access errors.

        print("[INFO] AI Guard System Initialized. Calibrating microphone...")
        
        # Listen for 1 second to adjust the recognizer for ambient noise levels              # Cite: [2]
        with self.microphone as source:
            self.recognizer.adjust_for_ambient_noise(source, duration=1)         
        print("[INFO] Microphone calibrated.")
        
    def speak(self, text, is_alarm=False):
        with self.speak_lock:
            # If the alarm Sound is playing on a mixer channel, don't talk over it.
            # (Speech uses pygame.mixer.music, so music.get_busy() would not detect the alarm.)
            if pygame.mixer.get_busy() and not is_alarm:
                return

            print(f"[GUARD SAYS]: {text}")
            try:
                tts = gTTS(text=text, lang='en')                       # Create the gTTS object with the text to be spoken
                temp_dir = tempfile.gettempdir()                       # Get the system's temporary directory path to avoid permission errors
                audio_file = os.path.join(temp_dir, "response.mp3")    # Define the full path for the temporary audio file
                tts.save(audio_file)                                   # Save the generated speech to the mp3 file

                pygame.mixer.music.load(audio_file)        # Use Pygame's music mixer to play the audio file
                pygame.mixer.music.play()
            
                # Wait in a loop until the audio has finished playing
                while pygame.mixer.music.get_busy():
                    time.sleep(0.1)
                pygame.mixer.music.unload()       # Unload the file so it can be safely deleted
                os.remove(audio_file)             # Remove the temporary audio file
            except Exception as e:
                print(f"[ERROR] Could not speak due to an error: {e}")
        # The lock is automatically released here
                


    def listen_for_command(self):
        """Listens for a command via the microphone and uses Google Web Speech API."""
        command = ""
        try:
            with self.microphone as source:
                print("[INFO] Listening for a command...")
                audio = self.recognizer.listen(source, timeout=5, phrase_time_limit=4)     # Wait up to 5 s for speech to start; record at most 4 s of it
            
            print("[INFO] Transcribing with Google Speech Recognition...")       # Use Google's online service to convert the audio to text
            command = self.recognizer.recognize_google(audio).lower()
            print(f"[USER SAID]: {command}")

        # Handle common exceptions for speech recognition
        except sr.WaitTimeoutError:
            pass # This is expected if no one speaks
        except sr.UnknownValueError:
            print("[INFO] Google Speech Recognition could not understand audio.")
        except sr.RequestError as e:
            print(f"[ERROR] Could not request results from Google service; {e}")
        return command


# --- Full System Class ---

# This class inherits from AI_Guard and adds vision, LLM, and state management.
class AI_Guard_Full(AI_Guard):
    def __init__(self):
        super().__init__()          # Initialize the parent AI_Guard class (ASR, TTS, etc.)
        try:
            # Load the pre-trained SVM classifier and label encoder from the pickle file
            with open(MODEL_FILE, "rb") as f:
                self.model_data = pickle.load(f)
            print("[INFO] Loaded trained face recognition model.")
        except FileNotFoundError:
            self.speak(f"Error: Model file '{MODEL_FILE}' not found. Please run the enrollment cell first.")
            exit()
        
        # Initialize the webcam                           # Cite: [3]
        self.video_capture = cv2.VideoCapture(0)
        if not self.video_capture.isOpened():
            self.speak("Error: Cannot open webcam.")
            exit()
        
        # Timestamps and state variables to prevent spamming welcome/warning messages
        self.last_seen_trusted_time = 0
        self.cooldown_period = 10  # 10-second cooldown
        # Event to signal the background listener thread to stop
        self.stop_event = threading.Event()
        # Flag to track if the OpenCV window is currently open
        self.vision_window_active = False
        # Dictionary to manage the state of an intruder encounter
        self.intruder_state = {"detected": False, "start_time": None, "escalation_level": 0, "last_warning_time": 0}
        # Time intervals (in seconds) for escalating warnings
        # Levels 1-3 are spoken warnings; level 4 triggers the alarm
        self.escalation_intervals = {1: 0, 2: 10, 3: 20, 4: 30} # seconds
        self.alarm_sound = self.generate_alarm_sound()
        self.is_alarm_playing = False
    
    # Cite : [15]
    # --- Alarm Sound Generation ---    
    def generate_alarm_sound(self, beep_duration=0.15, silence_duration=0.1, num_beeps=3, frequency=2000):
        """Generates a rapid, high-pitched beeping sound, like a fire alarm."""
        sample_rate = pygame.mixer.get_init()[0]
        
        # Calculate the number of samples for one beep and the silence that follows
        beep_samples = int(sample_rate * beep_duration)
        silence_samples = int(sample_rate * silence_duration)
        
        # Generate the high-pitched beep tone
        beep_wave = (np.sin(2 * np.pi * np.arange(beep_samples) * frequency / sample_rate)).astype(np.float32)
        # Create a silent segment
        silence_wave = np.zeros(silence_samples, dtype=np.float32)
        
        # Combine one beep and one silence period
        single_alarm_cycle = np.concatenate([beep_wave, silence_wave])
        
        # Repeat the cycle to create a series of beeps
        alarm_wave = np.tile(single_alarm_cycle, num_beeps)
        
        # Convert to 16-bit PCM format and make it stereo
        alarm_wave = (alarm_wave * 32767).astype(np.int16)
        sound_buffer = np.repeat(alarm_wave.reshape(-1, 1), 2, axis=1)
        
        # Return the final sound object that Pygame can play
        return pygame.sndarray.make_sound(sound_buffer)



    def generate_response(self, level):
        """Generates a spoken response from the LLM based on the escalation level."""
        if not client: return "Language model not available."
        
        # Context-specific prompts for a college hostel room environment
        system_prompts = {
            1: "You are a friendly AI assistant guarding a college hostel room. In one short, casual sentence, politely ask who they are.",
            2: "The unrecognized person has not left. In one short sentence, state that this is a private hostel room and they need to leave.",
            3: "The intruder is still here. In one short, stern sentence, state that they are trespassing and the hostel warden will be alerted if they don't leave."
        }
        
        # Get the appropriate prompt for the current level
        prompt_text = system_prompts.get(level, "An error occurred.")
        try:
            # Send the prompt to the Groq API using the Llama 3.1 model
            chat_completion = client.chat.completions.create(messages=[{"role": "system", "content": prompt_text}], model="llama-3.1-8b-instant")
            # Extract and return the generated text
            return chat_completion.choices[0].message.content.strip()
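        except Exception as e:
            # Log the failure so API/network problems are visible during a demo
            print(f"[ERROR] LLM request failed: {e}")
            return "My response circuits are offline."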


    def handle_unrecognized_person(self):
        """Manages the state and logic for an escalating encounter with an intruder."""
        current_time = time.time()
        # If this is the first time seeing an intruder
        if not self.intruder_state["detected"]:
            # Update the state to start the encounter
            self.intruder_state.update({"detected": True, "start_time": current_time, "escalation_level": 1})
            # Generate and speak the Level 1 warning
            response = self.generate_response(1)
            self.speak(response)
        else:
            # If an intruder is already detected, check if it's time to escalate
            time_since_detection = current_time - self.intruder_state["start_time"]
            new_level = 0
            # --- Check for Level 4 escalation ---
            if time_since_detection > self.escalation_intervals[4] and self.intruder_state["escalation_level"] < 4: new_level = 4
            elif time_since_detection > self.escalation_intervals[3] and self.intruder_state["escalation_level"] < 3: new_level = 3
            elif time_since_detection > self.escalation_intervals[2] and self.intruder_state["escalation_level"] < 2: new_level = 2
            
            if new_level > self.intruder_state["escalation_level"]:
                self.intruder_state["escalation_level"] = new_level
                print(f"[ALERT] Escalating to level {new_level}.")
                
                if new_level == 4:
                    # --- NEW: Play the alarm sound ---
                    print("[ALARM] Intruder has not left. Sounding alarm.")
                    if not self.is_alarm_playing:
                        self.alarm_sound.play(loops=-1) # Play indefinitely
                        self.is_alarm_playing = True
                else:
                    response = self.generate_response(new_level)
                    self.speak(response)

    def reset_intruder_state(self):
        """Resets the intruder encounter state back to default."""
        if self.intruder_state["detected"]:
            self.intruder_state = {"detected": False, "start_time": None, "escalation_level": 0, "last_warning_time": 0}
            # --- Alarm Feature: Stop the alarm when the threat is cleared ---
            if self.is_alarm_playing:
                self.alarm_sound.stop()
                self.is_alarm_playing = False


    def process_vision(self):
        """The main computer vision loop: captures frame, finds faces, and identifies them."""
        ret, frame = self.video_capture.read()
        if not ret: return
        
        rgb_small_frame = cv2.cvtColor(cv2.resize(frame, (0, 0), fx=0.25, fy=0.25), cv2.COLOR_BGR2RGB)     # Create a smaller version of the frame for faster face recognition
        face_locations = face_recognition.face_locations(rgb_small_frame)                                 # Find all faces and their encodings in the small frame
        face_encodings = face_recognition.face_encodings(rgb_small_frame, face_locations)
        
        # Flags to track the state of the current frame
        is_trusted_person_present_in_frame = False
        is_any_person_present_in_frame = len(face_encodings) > 0

        # Get the trained classifier and label encoder from the loaded model
        classifier = self.model_data["classifier"]
        label_encoder = self.model_data["label_encoder"]

        # Loop through each detected face
        for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
            probabilities = classifier.predict_proba([face_encoding])[0]       # Use the trained SVM classifier to get prediction probabilities for each known category
            best_match_index = np.argmax(probabilities)                        # Find the category with the highest probability
            predicted_name = label_encoder.classes_[best_match_index]
            confidence = probabilities[best_match_index]
            
            print(f"[DEBUG] Predicted: {predicted_name}, Confidence: {confidence:.2f}")


            # Cite : [6]
            # A person is trusted IF AND ONLY IF the prediction is NOT "unrecognized" AND the confidence is high.
            if predicted_name != "unrecognized" and confidence > RECOGNITION_THRESHOLD:
                is_trusted_person_present_in_frame = True
                display_name = predicted_name.replace('_', ' ')
                box_color = (0, 255, 0) # Green for trusted

                current_time = time.time()
                # Greet the trusted person if enough time has passed since the last greeting
                if current_time - self.last_seen_trusted_time > self.cooldown_period:
                    self.speak(f"Welcome back, {display_name}. Glad to see you.")
                    self.last_seen_trusted_time = current_time
            else:
                # If the prediction IS "unrecognized" OR the confidence for a known person is too low, they are treated as an intruder.
                display_name = "Unrecognized"
                box_color = (0, 0, 255) # Red for untrusted
            
            # --- Visual Feedback on the Pop-up Window ---
            top *= 4; right *= 4; bottom *= 4; left *= 4                                            # Scale face locations back up
            cv2.rectangle(frame, (left, top), (right, bottom), box_color, 2)                        # Draw the box
            label = f"{display_name} ({confidence:.2f})"                                            # Create the text label
            cv2.rectangle(frame, (left, bottom - 35), (right, bottom), box_color, cv2.FILLED)       # Draw label background
            cv2.putText(frame, label, (left + 6, bottom - 6), cv2.FONT_HERSHEY_DUPLEX, 0.8, (255, 255, 255), 1)   # Draw label text

        # Cite: [7]
        # --- Final Decision Logic (Post-Frame Analysis) ---
        if is_trusted_person_present_in_frame:
            # If at least one trusted person is in the frame, the room is secure.
            self.reset_intruder_state()
        elif is_any_person_present_in_frame:
            # If people are present, but NONE were trusted, they are intruders.
            self.handle_unrecognized_person()
        else: # No people in the frame
            # If the room is empty, any ongoing alert can be reset.
            self.reset_intruder_state()
        
        # Display the resulting frame in a pop-up window
        cv2.imshow('AI Guard System', frame)
        self.vision_window_active = True

    
    # Cite: [8]
    def _threaded_listener(self):
        """This function runs in a separate thread, dedicated to listening for voice commands."""
        while not self.stop_event.is_set():
            command = self.listen_for_command()
            # Process the command to activate or deactivate the guard
            if ACTIVATION_COMMAND in command and not self.guard_mode_active:
                self.guard_mode_active = True; self.speak("Guard mode activated.")
            elif DEACTIVATION_COMMAND in command and self.guard_mode_active:
                self.guard_mode_active = False; self.speak("Guard mode deactivated."); self.reset_intruder_state()


    def run(self):
        """The main application entry point."""
        self.stop_event.clear()
        # Create and start the background listener thread
        listener_thread = threading.Thread(target=self._threaded_listener, daemon=True)
        listener_thread.start()
        try:
            print(f"\n[INFO] AI Guard is running. Say '{ACTIVATION_COMMAND}' to activate.")
            # The main loop now primarily handles vision processing and window management
            while not self.stop_event.is_set():
                if self.guard_mode_active:
                    self.process_vision()
                else:
                    # If idle, ensure the vision window is closed
                    if self.vision_window_active:
                        cv2.destroyWindow('AI Guard System')
                        self.vision_window_active = False
                
                # Check if the 'q' key was pressed in the OpenCV window to quit
                if cv2.waitKey(1) & 0xFF == ord('q'): break
                
                # Sleep to manage CPU usage. Short sleep when active, longer when idle.
                time.sleep(0.05 if self.guard_mode_active else 0.5)
        except KeyboardInterrupt:
            print("\n[INFO] Program interrupted by user.")
        finally:
            # --- Graceful Shutdown ---
            self.stop_event.set()               # Signal all threads to stop
            listener_thread.join(timeout=1.0)   # Wait for the listener thread to finish
            self.video_capture.release()        # Release the webcam
            cv2.destroyAllWindows()             # Close all OpenCV windows
            pygame.mixer.quit()                 # Quit the Pygame mixer
            print("[INFO] System resources released.")

if __name__ == "__main__":
    # This block runs when the script is executed directly
    guard = AI_Guard_Full()
    guard.run()


# Cite: [11], [12], [13], [14]

[INFO] Groq client configured successfully.
[INFO] AI Guard System Initialized. Calibrating microphone...
[INFO] Microphone calibrated.
[INFO] Loaded trained face recognition model.

[INFO] AI Guard is running. Say 'guard my room' to activate.
[INFO] Listening for a command...
[INFO] Transcribing with Google Speech Recognition...
[USER SAID]: guard my room
[GUARD SAYS]: Guard mode activated.
[DEBUG] Predicted: naresh, Confidence: 0.75
[GUARD SAYS]: Welcome back, naresh. Glad to see you.
[INFO] Listening for a command...
[INFO] Transcribing with Google Speech Recognition...
[USER SAID]: the cardboard is activated
[INFO] Listening for a command...
[DEBUG] Predicted: naresh, Confidence: 0.75
[DEBUG] Predicted: naresh, Confidence: 0.62
[DEBUG] Predicted: naresh, Confidence: 0.69
[DEBUG] Predicted: naresh, Confidence: 0.73
[DEBUG] Predicted: naresh, Confidence: 0.71
[DEBUG] Predicted: naresh, Confidence: 0.73
[GUARD SAYS]: Welcome back, naresh. Glad to see you.
[INFO] Transcribing with Goog

[INFO] Google Speech Recognition could not understand audio.


#### How it works:
To activate Guard Mode, say: "Guard my room"

To deactivate it, say: "Stand down"
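The listener thread decides whether to toggle Guard Mode by checking if the activation or deactivation phrase appears anywhere in the transcribed text. A minimal sketch of that check follows; `next_guard_state` is a hypothetical helper, and the phrase values and lower-casing of the transcript are assumptions (the notebook defines its own `ACTIVATION_COMMAND` and `DEACTIVATION_COMMAND`).

```python
from typing import Optional, Tuple

# Assumed values; the notebook defines its own ACTIVATION_COMMAND and DEACTIVATION_COMMAND.
ACTIVATION_COMMAND = "guard my room"
DEACTIVATION_COMMAND = "stand down"


def next_guard_state(transcript: Optional[str], guard_active: bool) -> Tuple[bool, Optional[str]]:
    """Return the new guard state and the phrase to speak (None when nothing changes).

    Mirrors the substring test in _threaded_listener: the phrase only has to
    appear somewhere in the transcript, so extra words around it are tolerated.
    """
    text = (transcript or "").lower()  # assumption: transcripts are lower-cased before matching
    if ACTIVATION_COMMAND in text and not guard_active:
        return True, "Guard mode activated."
    if DEACTIVATION_COMMAND in text and guard_active:
        return False, "Guard mode deactivated."
    return guard_active, None


if __name__ == "__main__":
    print(next_guard_state("guard my room", False))              # (True, 'Guard mode activated.')
    print(next_guard_state("the cardboard is activated", True))  # (True, None)
```

Substring matching tolerates filler words around the command, while a misrecognised phrase (such as "the cardboard is activated" in the log above) leaves the state unchanged.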

When a recognized (trusted) person appears in front of the camera, the system draws a green box around their face, labelled with their name and confidence score. A face counts as recognized only when the classifier's top prediction is not "unrecognized" and its confidence is above the threshold, which I set to 0.70. The AI Guard Agent then greets them with "Welcome back, *name*. Glad to see you.", at most once every 10 seconds.
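The trust decision and the greeting cooldown can be isolated from the camera loop. In this sketch, `classify_face` and `should_greet` are hypothetical helpers: `probabilities` stands in for one row of `classifier.predict_proba(...)` and `classes` for `label_encoder.classes_`, while the 0.70 threshold and 10-second cooldown are the values used in this notebook.

```python
from typing import Sequence, Tuple

RECOGNITION_THRESHOLD = 0.70  # threshold stated in the write-up
COOLDOWN_PERIOD = 10.0        # seconds, matching self.cooldown_period


def classify_face(probabilities: Sequence[float], classes: Sequence[str],
                  threshold: float = RECOGNITION_THRESHOLD) -> Tuple[str, float, bool]:
    """Pick the most probable class and decide whether the face is trusted."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])  # argmax
    name, confidence = classes[best], float(probabilities[best])
    # Trusted only if the label is a real person AND the confidence clears the threshold
    trusted = name != "unrecognized" and confidence > threshold
    return name, confidence, trusted


def should_greet(now: float, last_greeting: float, cooldown: float = COOLDOWN_PERIOD) -> bool:
    """Greet only when the cooldown has elapsed since the previous welcome."""
    return now - last_greeting > cooldown


if __name__ == "__main__":
    print(classify_face([0.75, 0.25], ["naresh", "unrecognized"]))  # ('naresh', 0.75, True)
    print(classify_face([0.62, 0.38], ["naresh", "unrecognized"]))  # ('naresh', 0.62, False)
```

A known name predicted with low confidence is therefore handled exactly like an "unrecognized" prediction.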

When an unrecognized person (a potential intruder) appears, the system draws a red box around their face and speaks a Level 1 warning generated with the Groq API. If the intruder remains in front of the webcam, the warnings escalate to Level 2 after 10 seconds and to Level 3 after 20 seconds.
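The escalation rule compares the time since first detection against fixed thresholds and only ever moves the level upward. A minimal sketch using the notebook's `escalation_intervals` values; `next_escalation_level` is a hypothetical helper, not a function from the notebook.

```python
from typing import Mapping

ESCALATION_INTERVALS = {1: 0, 2: 10, 3: 20, 4: 30}  # seconds, as in the notebook


def next_escalation_level(elapsed: float, current_level: int,
                          intervals: Mapping[int, float] = ESCALATION_INTERVALS) -> int:
    """Return the level to escalate to, or current_level if no change is due.

    As in handle_unrecognized_person, the highest level whose threshold has been
    passed wins, and the level never decreases during an encounter.
    """
    for level in sorted(intervals, reverse=True):
        if elapsed > intervals[level] and current_level < level:
            return level
    return current_level


if __name__ == "__main__":
    level = 1  # Level 1 is spoken immediately on first detection
    for t in (0, 5, 11, 21, 31, 40):
        new_level = next_escalation_level(t, level)
        if new_level > level:
            print(f"t={t}s: escalate to level {new_level}")
            level = new_level
```

Levels 2 and 3 trigger a new LLM-generated warning, while Level 4 starts the alarm instead of speaking.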

If the final verbal warning is ignored, the system enters Level 4 (after 30 seconds) and sounds an audible alarm to alert neighbors.
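The alarm is a short sequence of high-pitched beeps separated by silence, converted to 16-bit stereo samples. This sketch rebuilds the same waveform as `generate_alarm_sound` with NumPy alone; `alarm_samples` is a hypothetical name, and the 44.1 kHz default sample rate is an assumption (the notebook reads the real rate from `pygame.mixer.get_init()`).

```python
import numpy as np


def alarm_samples(sample_rate=44100, beep_duration=0.15, silence_duration=0.1,
                  num_beeps=3, frequency=2000):
    """Build the beep/silence pattern used by generate_alarm_sound as a NumPy array.

    Returns an int16 array of shape (n_samples, 2), i.e. 16-bit stereo.
    """
    beep_n = int(sample_rate * beep_duration)
    silence_n = int(sample_rate * silence_duration)
    t = np.arange(beep_n) / sample_rate                   # time axis in seconds
    beep = np.sin(2 * np.pi * frequency * t)              # pure tone in [-1, 1]
    cycle = np.concatenate([beep, np.zeros(silence_n)])   # one beep followed by one pause
    mono = (np.tile(cycle, num_beeps) * 32767).astype(np.int16)  # scale to 16-bit PCM
    return np.repeat(mono[:, None], 2, axis=1)            # duplicate into left/right channels
```

In the notebook the equivalent array is passed to `pygame.sndarray.make_sound`, and `Sound.play(loops=-1)` repeats it until `Sound.stop()` is called when the threat clears.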

The alarm continues until the untrusted person leaves the frame, a trusted person (such as myself) is seen, or the system is deactivated.
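The stop conditions follow from the post-frame decision in `process_vision`: a trusted face or an empty frame resets the encounter (which also stops the alarm), and saying "stand down" resets it too. A minimal sketch of that rule as a pure function; `EncounterState` and `update_after_frame` are hypothetical stand-ins for `intruder_state` plus the `is_alarm_playing` flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EncounterState:
    """Hypothetical stand-in for intruder_state plus the is_alarm_playing flag."""
    detected: bool = False
    start_time: Optional[float] = None
    escalation_level: int = 0
    alarm_playing: bool = False


def update_after_frame(state: EncounterState, trusted_in_frame: bool,
                       faces_in_frame: int, now: float) -> EncounterState:
    """Apply the post-frame rule from process_vision to the encounter state."""
    if trusted_in_frame or faces_in_frame == 0:
        # Trusted person seen or room empty: end the encounter and silence the alarm
        return EncounterState()
    if not state.detected:
        # First frame with only untrusted faces: start the encounter at Level 1
        return EncounterState(detected=True, start_time=now, escalation_level=1)
    # Encounter already running; timing-based escalation is handled separately
    return state
```

Returning a fresh default state mirrors `reset_intruder_state`, so every exit path clears the escalation level and the alarm flag together.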



### Conclusion: 

I also implemented an optional stretch goal.

This demonstration has shown that the AI Guard can be activated by voice with over 90% accuracy, distinguishes trusted from untrusted individuals with face-recognition accuracy above 80%, and handles intruders with an LLM-powered conversational agent that escalates its warnings.

## <center>Thank You! </center>

### References used for this assignment:

[1] - Gemini prompt : How can I write a script that scans a folder of images, computes face encodings using face_recognition, and saves them to a .pkl file?

[2] - Gemini prompt : How can I make the speech recognition more robust by calibrating the microphone for ambient noise using recognizer.adjust_for_ambient_noise?

[3] - Gemini prompt : How do I use a webcam with OpenCV to compare detected faces against the saved encodings from my .pkl file?

[4] - Gemini prompt : How do I use the Groq API in Python to generate escalating spoken warnings for an intruder based on the time they have been present?  

[5] - Gemini prompt : How do I show the live webcam feed in a pop-up window using OpenCV (cv2.imshow) and draw bounding boxes on detected faces?

[6] - Gemini prompt : How can I make the OpenCV window draw a green box for recognized faces and a red box for unrecognized ones?

[7] - Gemini prompt : Why does my code say "Welcome Unrecognized"? How do I fix the logic to only welcome trusted people and always treat "unrecognized" predictions as intruders?

[8] - Gemini prompt : My application freezes while listening for commands. How can I use Python's threading module to run the voice listener in the background so the video feed remains smooth?

[9] - Gemini prompt : How do I add a try...finally block to my main loop to ensure that the webcam (video_capture.release()) and all

[10] - Google prompt : https://console.groq.com/docs/quickstart

[11] - Google prompt : https://console.groq.com/docs/text-chat

[12] - Google prompt : https://console.groq.com/docs/speech-to-text

[13] - Google prompt : https://console.groq.com/docs/text-to-speech

[14] - Google prompt : https://console.groq.com/docs/vision

[15] - Gemini prompt : How do I make an audible alarm sound that triggers when an intruder ignores the final warning?

