---

# **Voice-Controlled Personal Assistant Using GPT-2**

## **Introduction**
This project implements a **Voice-Controlled Personal Assistant** named **Samir**, which interacts with the user through voice commands. The assistant can manage tasks, take screenshots, answer questions using a GPT-2 language model, and perform other utility functions such as opening a web browser. This project highlights the integration of speech recognition, text-to-speech conversion, and AI-driven text generation.

---

## **Project Components**

### **1. Key Features**
- **Task Management**: Add and display a list of personal tasks.
- **Screenshot Capture**: Take and save screenshots with a timestamped filename.
- **Web Browsing**: Open Google in the default web browser.
- **Question Answering**: Respond to user queries using the GPT-2 model.
- **Speech Interaction**: Utilize speech recognition for input and text-to-speech for output.

---

### **2. Libraries Used**
1. **pyttsx3**: Converts text into speech, allowing the assistant to communicate with the user.
2. **SpeechRecognition**: Captures user voice commands via a microphone and converts them into text.
3. **webbrowser**: Opens web pages based on voice commands.
4. **pyautogui**: Captures screenshots of the current screen.
5. **transformers**: Provides access to the GPT-2 model for generating answers to questions.
6. **datetime**: Generates timestamps for naming screenshot files.
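
Before launching the assistant, it can help to confirm that the libraries listed above are importable. The following is a minimal sketch that uses only the standard library; `REQUIRED_MODULES` and `missing_modules` are hypothetical names introduced for illustration and are not part of the project code.

```python
import importlib.util

# Import names for the libraries listed above. Note that the
# SpeechRecognition package is imported as `speech_recognition`.
REQUIRED_MODULES = [
    "pyttsx3",
    "speech_recognition",
    "webbrowser",
    "pyautogui",
    "transformers",
    "datetime",
]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(REQUIRED_MODULES)` returns an empty list when everything is installed, and otherwise names the modules that still need to be installed.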

---

## **System Workflow**

### **1. Text-to-Speech and Local GPT Model Initialization**
The assistant initializes the text-to-speech engine, setting its speech rate (words per minute) and volume, and loads the local GPT-2 model through the `transformers` pipeline.



In [None]:
import datetime
import webbrowser

import pyautogui
import pyttsx3
import speech_recognition as sr
from transformers import pipeline



# Initialize text-to-speech engine
engine = pyttsx3.init()
engine.setProperty('rate', 150)
engine.setProperty('volume', 1.0)

def speak(text):
    """Convert text to speech."""
    engine.say(text)
    engine.runAndWait()

# Initialize the local GPT model
print("Loading GPT model... This may take a while.")
gpt_model = pipeline("text-generation", model="gpt2")

# Task list
tasks = []
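
The rate and volume above are set inline. As a minimal sketch, the same configuration could be wrapped in a helper that validates its inputs. `configure_engine` is a hypothetical name, not part of the project, and it assumes only that the engine exposes `setProperty(name, value)` as the object returned by `pyttsx3.init()` does.

```python
def configure_engine(engine, rate=150, volume=1.0):
    """Apply speech rate and volume to a pyttsx3-style engine.

    `engine` is any object exposing setProperty(name, value), such as the
    object returned by pyttsx3.init(). The volume is clamped to 0.0-1.0,
    the range pyttsx3 documents for that property.
    """
    if rate <= 0:
        raise ValueError("rate must be a positive number of words per minute")
    volume = max(0.0, min(1.0, float(volume)))
    engine.setProperty("rate", int(rate))
    engine.setProperty("volume", volume)
    # Return the values actually applied so callers can log or inspect them.
    return {"rate": int(rate), "volume": volume}
```

With the real library, this would be called as `configure_engine(pyttsx3.init(), rate=150, volume=1.0)`.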


---

### **2. Speech Recognition**
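The assistant listens to user voice input through a microphone and converts it into text. Before each capture, `adjust_for_ambient_noise` calibrates the recognizer against background noise to improve accuracy. Transcription uses `recognize_google`, which sends the audio to Google's Web Speech API and therefore needs an internet connection.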

In [None]:
# Recognize speech using microphone
def listen():
    """Listen to voice input and convert it to text."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        print("Recognizing...")
        command = recognizer.recognize_google(audio)
        print(f"You said: {command}")
        return command.lower()
    except sr.UnknownValueError:
        speak("Sorry, I didn't catch that. Could you please repeat?")
        return ""
    except sr.RequestError:
        speak("Sorry, I'm having trouble accessing the speech recognition service.")
        return ""
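
Because `listen()` returns an empty string whenever recognition fails, a caller may want to bound the number of attempts instead of retrying indefinitely. The sketch below shows one way to do that; `listen_with_retries` and its `max_attempts` parameter are hypothetical and not part of the project.

```python
def listen_with_retries(listen_fn, max_attempts=3):
    """Call `listen_fn` until it returns non-empty text or attempts run out.

    `listen_fn` is expected to behave like listen() above: it returns the
    recognized command as a string, or "" when recognition fails.
    """
    for _ in range(max_attempts):
        command = listen_fn()
        if command:
            return command
    # Every attempt failed; the caller decides what to do next.
    return ""
```

In the assistant, this would be called as `listen_with_retries(listen)`. Passing the listening function as an argument also makes the retry logic easy to exercise without a microphone.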

### **3. Question Answering**
The project uses the GPT-2 language model to answer user queries. GPT-2 is a text-completion model: it treats the question as a prompt and generates a short continuation, which the assistant then speaks as its answer.


In [None]:
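def answer_question(question):
    """Use the GPT-2 pipeline to generate an answer for the given question."""
    # return_full_text=False keeps the question out of the spoken answer;
    # max_new_tokens limits only the generated continuation.
    outputs = gpt_model(question, max_new_tokens=50, num_return_sequences=1,
                        return_full_text=False)
    return outputs[0]['generated_text'].strip()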
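
Depending on how the pipeline is called, `generated_text` can begin with the prompt itself, and generation that stops at a token limit often ends mid-sentence. The following sketch shows one possible clean-up step before the text is spoken; `clean_answer` is a hypothetical helper, not part of the project.

```python
def clean_answer(prompt, generated):
    """Remove an echoed prompt and trim to the last complete sentence."""
    text = generated
    if text.startswith(prompt):
        text = text[len(prompt):]
    # Collapse newlines and repeated spaces so the speech output flows.
    text = " ".join(text.split())
    # Keep everything up to the last sentence-ending punctuation mark, if any.
    cut = max(text.rfind("."), text.rfind("!"), text.rfind("?"))
    if cut != -1:
        text = text[: cut + 1]
    return text
```

For example, `clean_answer("what is python", "what is python\nPython is a language. It is used for")` returns `"Python is a language."`. If the text contains no sentence-ending punctuation, it is returned untrimmed.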

### **4. Handling Voice Commands**
The assistant processes user commands to perform tasks such as adding tasks, taking screenshots, opening a browser, and answering questions.

#### **Key Commands:**
- **Add Task**: Adds a task to a task list.
- **Show Tasks**: Displays the list of tasks.
- **Take Screenshot**: Saves a screenshot with a timestamp.
- **Open Browser**: Opens a web browser to Google.
- **Answer Questions**: Uses GPT-2 to respond to queries.
- **Exit**: Terminates the assistant.

In [None]:
# Main function to handle commands
def handle_command(command):
    """Process voice commands and perform tasks."""
    if "add task" in command:
        task = command.replace("add task", "").strip()
        if task:
            tasks.append(task)
            speak(f"Task '{task}' has been added to your list.")
        else:
            speak("Please specify the task to add.")

    elif "show tasks" in command:
        if tasks:
            speak("Here are your tasks:")
            for i, task in enumerate(tasks, start=1):
                speak(f"Task {i}: {task}")
        else:
            speak("Your task list is empty.")

    elif "take screenshot" in command:
        screenshot = pyautogui.screenshot()
        timestamp = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        filename = f"screenshot_{timestamp}.png"
        screenshot.save(filename)
        speak(f"Screenshot saved as {filename}.")

    elif "open browser" in command:
        webbrowser.open("https://www.google.com")
        speak("Browser opened.")

    elif "what is" in command or "who is" in command or "tell me about" in command:
        response = answer_question(command)
        speak(response)

    elif "exit" in command or "quit" in command:
        speak("Goodbye! Have a great day.")
        raise SystemExit  # exit() is intended for the interactive shell

    else:
        speak("I'm not sure how to help with that.")
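
The keyword checks in `handle_command` can also be expressed as data, which makes it easier to add commands later. The sketch below is one possible arrangement, not the project's implementation: `match_intent`, `INTENTS`, and the intent names are hypothetical, while the phrases are the ones used in the `if`/`elif` chain.

```python
def match_intent(command, intents):
    """Return the name of the first intent whose phrase occurs in `command`.

    `intents` is an ordered list of (name, phrases) pairs; earlier entries
    take priority, mirroring the order of an if/elif chain.
    """
    for name, phrases in intents:
        if any(phrase in command for phrase in phrases):
            return name
    return "unknown"


# Phrases copied from handle_command(); the intent names are illustrative.
INTENTS = [
    ("add_task", ["add task"]),
    ("show_tasks", ["show tasks"]),
    ("screenshot", ["take screenshot"]),
    ("open_browser", ["open browser"]),
    ("question", ["what is", "who is", "tell me about"]),
    ("exit", ["exit", "quit"]),
]
```

A dictionary that maps each intent name to a handler function could then replace the branch bodies, keeping the matching rules and the actions separate.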

---

### **5. Main Loop**
The assistant continuously listens for commands and executes corresponding functions, creating an interactive experience.

In [None]:
# Main loop
if __name__ == "__main__":
    speak("Hello! I'm Samir, your personal assistant. How can I help you today?")
    while True:
        user_command = listen()
        if user_command:
            handle_command(user_command)
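
The loop above runs until a handler ends the process. As a sketch of an alternative, the loop can be written as a function that takes the listening and handling functions as arguments, stops cleanly on Ctrl+C, and can be limited to a fixed number of turns. `run_assistant` and `max_turns` are hypothetical names introduced for illustration.

```python
def run_assistant(listen_fn, handle_fn, max_turns=None):
    """Run the listen/handle loop and return the number of commands handled.

    `max_turns` limits how many times the loop listens (None means no limit).
    Ctrl+C (KeyboardInterrupt) and a SystemExit raised by a handler both end
    the loop cleanly instead of propagating out of this function.
    """
    handled = 0
    turns = 0
    try:
        while max_turns is None or turns < max_turns:
            turns += 1
            command = listen_fn()
            if command:  # skip empty results from failed recognition
                handle_fn(command)
                handled += 1
    except (KeyboardInterrupt, SystemExit):
        pass
    return handled
```

In the assistant, this would be called as `run_assistant(listen, handle_command)` after the greeting.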

---

## **How to Run the Project**

### **1. Prerequisites**
Install the required libraries:
```bash
pip install pyttsx3
pip install SpeechRecognition
pip install transformers torch  # torch is the model backend used by the pipeline
pip install pyautogui
pip install pyaudio
```

### **2. Running the Code**
Run the Python script to launch the assistant. Loading the GPT-2 model can take a while at startup. When `Listening...` appears, speak one of the key commands into the microphone; say "exit" or "quit" to stop the assistant.

---

## **Feature Summary**

| Feature               | Description                                                                 |
|-----------------------|-----------------------------------------------------------------------------|
| **Task Management**   | Add and display tasks using voice commands.                                |
| **Screenshots**       | Capture and save screenshots with a timestamp.                             |
| **Web Browsing**      | Open Google using a voice command.                                         |
| **Question Answering**| Use GPT-2 to respond to questions with AI-generated text.                  |
| **Speech Interaction**| Recognize voice commands and reply with text-to-speech.                    |

---

## **Applications**
- **Personal Productivity**: Manage tasks and perform quick actions hands-free.
- **Learning**: Use the GPT-2 integration to answer general knowledge questions.
- **Utility Tool**: Capture screenshots and browse the web with voice commands.

---

## **Future Enhancements**
- **Enhanced Voice Commands**: Add support for advanced actions like sending emails or setting reminders.
- **Improved Question Answering**: Use a larger GPT model for more accurate and context-aware answers.
- **Multilingual Support**: Incorporate speech recognition and text-to-speech for multiple languages.
- **GUI Integration**: Develop a user-friendly interface for task management and interaction.

---
