# Building Your First Voice Agent - Class 9 Interactive Tutorial
This notebook will guide you through creating a complete voice agent from scratch!


# 🎯 Building Your First Voice Agent
 
Welcome to an exciting journey into the world of voice technology! In this notebook, you'll learn to build your own voice agent that can:
- 🎤 Listen to your voice (Speech Recognition)
- 🧠 Understand what you're asking
- 🗣️ Respond with a voice (Text-to-Speech)

By the end of this tutorial, you'll have created your own personal assistant!


## 📚 What You'll Learn
 
1. **Speech Recognition (ASR)** - How computers "hear" and understand speech
2. **Text-to-Speech (TTS)** - How computers can talk back to you
3. **Natural Language Processing** - How computers understand meaning
4. **Integration** - Putting it all together into a working voice agent

Let's start our journey! 🚀


## 🛠️ Step 1: Setting Up Our Tools
First, we need to install the libraries that will help us build our voice agent.

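**What each library does:**
- `speech_recognition` (installed as `SpeechRecognition`): Converts your speech to text (ASR)
- `pyttsx3`: Converts text to speech (TTS)
- `pyaudio`: Handles audio input/output; it depends on the PortAudio system library
- `requests`: Gets information from the internet
- `wikipedia`: Looks up article summaries on Wikipedia
- `datetime`: Works with dates and times (part of Python's standard library, so no install is needed)

Step 4 also imports `openai` and `dotenv` (installed as `python-dotenv`), which the install cell below does not cover.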
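Before running the install cell, it can help to see which of these modules Python can already import. The sketch below uses only the standard library. The helper name `missing_modules` is made up for illustration, and the list assumes the import names used later in this notebook.

```python
import importlib.util

def missing_modules(module_names):
    """Return the names that cannot be imported in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

# These are import names, not pip names: for example,
# `pip install SpeechRecognition` provides the module `speech_recognition`.
needed = ["speech_recognition", "pyttsx3", "pyaudio", "requests", "wikipedia"]
print("Missing modules:", missing_modules(needed) or "none")
```

Anything reported as missing still needs a `pip install` (or, for `pyaudio`, possibly a system library first).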

In [1]:
# Install required libraries (run this cell first!)
import subprocess
import sys

def install_package(package):
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        print(f"✅ {package} installed successfully!")
    except subprocess.CalledProcessError:
        print(f"❌ Failed to install {package}")

# List of packages we need
packages = [
    "speechrecognition",
    "pyttsx3", 
    "pyaudio",
    "requests",
    "wikipedia"
]

print("🔧 Installing required packages...")
for package in packages:
    install_package(package)

print("\n🎉 All packages installed! Let's start coding!")

🔧 Installing required packages...
Collecting speechrecognition
  Downloading speechrecognition-3.14.3-py3-none-any.whl.metadata (30 kB)
Downloading speechrecognition-3.14.3-py3-none-any.whl (32.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 32.9/32.9 MB 25.0 MB/s eta 0:00:00
Installing collected packages: speechrecognition
Successfully installed speechrecognition-3.14.3
✅ speechrecognition installed successfully!
Collecting pyttsx3
  Downloading pyttsx3-2.98-py3-none-any.whl.metadata (3.8 kB)
Collecting pyobjc>=2.4 (from pyttsx3)
  Downloading pyobjc-11.1-py3-none-any.whl.metadata (25 kB)
Collecting pyobjc-core==11.1 (from pyobjc>=2.4->pyttsx3)
  Downloading pyobjc_core-11.1-cp311-cp311-macosx_10_9_universal2.whl.metadata (2.7 kB)
Collecting pyobjc-framework-libdispatch==11.1 (from pyobjc>=2.4->pyttsx3)
  Downloading pyobjc_framework_libdispatch-11.1-cp311-cp311-macosx_10_9_universal2.whl.metadata (2.4 kB)
Collecting py

  error: subprocess-exited-with-error
  
  × Building wheel for pyaudio (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [27 lines of output]
      !!
      
              ********************************************************************************
              Please consider removing the following classifiers in favor of a SPDX license expression:
      
              License :: OSI Approved :: MIT License
      
              See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
              ********************************************************************************
      
      !!
        self._finalize_license_expression()
      running bdist_wheel
      running build
      ru

✅ requests installed successfully!
Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting beautifulsoup4 (from wikipedia)
  Downloading beautifulsoup4-4.13.4-py3-none-any.whl.metadata (3.8 kB)
Collecting soupsieve>1.2 (from beautifulsoup4->wikipedia)
  Downloading soupsieve-2.7-py3-none-any.whl.metadata (4.6 kB)
Downloading beautifulsoup4-4.13.4-py3-none-any.whl (187 kB)
Downloading soupsieve-2.7-py3-none-any.whl (36 kB)
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (pyproject.toml): started
  Building wheel for wikipedia (pyproject.toml): finished with status 'done'
  Created wheel for wikipedia: filename

## 🎤 Step 2: Understanding Speech Recognition (ASR)
Speech Recognition is like giving your computer ears! Let's see how it works.

**The Process:**
1. Your voice creates sound waves
2. Microphone converts sound waves to electrical signals
3. Computer converts signals to digital data
4. AI analyzes the data and finds words
5. Returns text that matches your speech
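To make the "signals to digital data" step concrete, here is a minimal sketch that samples a pure tone and rounds each sample to a 16-bit integer, which is roughly what an audio interface does with a microphone signal. The 16 kHz sample rate, the 440 Hz tone, and the function names are assumptions chosen for illustration; they are not settings taken from the `speech_recognition` library.

```python
import math

SAMPLE_RATE = 16_000   # samples per second (assumed; a common rate for speech)
MAX_INT16 = 32_767     # largest value of a signed 16-bit sample

def digitize_tone(frequency_hz, duration_ms, amplitude=0.5):
    """Sample a sine wave and quantize each sample to a 16-bit integer."""
    n_samples = SAMPLE_RATE * duration_ms // 1000
    return [
        round(amplitude * MAX_INT16
              * math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE))
        for n in range(n_samples)
    ]

def rms(samples):
    """Root-mean-square level, a rough measure of loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

samples = digitize_tone(440, 10)   # 10 ms of a 440 Hz tone
print(len(samples), "samples; first five:", samples[:5])
print("RMS level:", round(rms(samples)))
```

Real speech is far more complex than a single tone, but the recognizer receives the same kind of data: a long list of integers.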

In [4]:
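# macOS only: PyAudio builds against the PortAudio library. If the pyaudio
# install above failed, uncomment and run these two lines.
# ! brew install portaudio
# ! pip install pyaudio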

In [3]:
import speech_recognition as sr
import time

# Create our speech recognizer
print("🎤 Creating Speech Recognizer...")
recognizer = sr.Recognizer()

# Let's test if our microphone works
def test_microphone():
    """Test if our microphone is working"""
    print("\n🔍 Testing microphone...")
    
    # Get list of available microphones
    mic_list = sr.Microphone.list_microphone_names()
    print(f"📱 Found {len(mic_list)} microphones:")
    
    for i, name in enumerate(mic_list[:3]):  # Show first 3
        print(f"  {i}: {name}")
    
    # Test with default microphone
    try:
        with sr.Microphone() as source:
            print("🎙️ Microphone is working!")
            print("📊 Adjusting for background noise... (2 seconds)")
            recognizer.adjust_for_ambient_noise(source, duration=2)
            print("✅ Microphone setup complete!")
            return True
    except Exception as e:
        print(f"❌ Microphone error: {e}")
        return False

# Test our setup
mic_working = test_microphone()

🎤 Creating Speech Recognizer...

🔍 Testing microphone...
📱 Found 9 microphones:
  0: Yeti Stereo Microphone
  1: Yeti Stereo Microphone
  2: USB Audio
🎙️ Microphone is working!
📊 Adjusting for background noise... (2 seconds)
✅ Microphone setup complete!


## 🧪 Experiment 1: Your First Speech Recognition
 
Let's try converting your speech to text! This is the foundation of all voice agents.


In [6]:
def listen_once():
    """Listen to user speech and convert to text"""
    if not mic_working:
        print("❌ Microphone not working. Please check your setup.")
        return ""  # Empty string, so callers treat this as "no result"
    
    print("\n🎤 EXPERIMENT: Speech to Text")
    print("=" * 40)
    print("📢 Instructions:")
    print("1. Click 'Run' on this cell")
    print("2. When you see 'Listening...', start speaking")
    print("3. Speak clearly for 3-5 seconds")
    print("4. Watch the magic happen!")
    print()
    
    try:
        with sr.Microphone() as source:
            print("🎧 Listening... Speak now!")
            print("(Speak clearly for 3-5 seconds)")
            
            # Listen for audio with timeout
            audio = recognizer.listen(source, timeout=10, phrase_time_limit=5)
            print("🔄 Processing your speech...")
            
            # Convert speech to text using Google's free service
            text = recognizer.recognize_google(audio)
            
            print(f"\n✨ SUCCESS! You said: '{text}'")
            print(f"📝 Text length: {len(text)} characters")
            return text
            
    except sr.WaitTimeoutError:
        print("⏰ No speech detected. Try speaking louder!")
        return ""
    except sr.UnknownValueError:
        print("🤔 Could not understand the audio. Try speaking more clearly!")
        return ""
    except sr.RequestError as e:
        print(f"🌐 Internet connection error: {e}")
        return ""
    except Exception as e:
        print(f"❌ Unexpected error: {e}")
        return ""

# Try it now!
user_speech = listen_once()


🎤 EXPERIMENT: Speech to Text
📢 Instructions:
1. Click 'Run' on this cell
2. When you see 'Listening...', start speaking
3. Speak clearly for 3-5 seconds
4. Watch the magic happen!

🎧 Listening... Speak now!
(Speak clearly for 3-5 seconds)
🔄 Processing your speech...

✨ SUCCESS! You said: 'hello how's it going'
📝 Text length: 20 characters


## 🎯 Challenge 1: Test Different Scenarios
 
Let's see how well speech recognition works in different situations!
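One way to put a number on "how well" is the word error rate (WER): the word-level edit distance between what you meant to say and what was recognized, divided by the number of words you meant to say. The function below is a small sketch of that idea and is not part of the `speech_recognition` library; the example phrases are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

print(word_error_rate("hello how is it going", "hello how's it going"))
```

A WER of 0.0 means a perfect transcript; higher values mean more words were dropped, added, or changed.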


In [8]:
def speech_recognition_challenge():
    """Test speech recognition with different challenges"""
    print("🏆 SPEECH RECOGNITION CHALLENGE")
    print("=" * 50)
    
    challenges = [
        "Say a simple word like 'hello'",
        "Say a long sentence about your favorite hobby",
        "Try speaking very quietly",
        "Try speaking very fast",
        "Say some numbers like 'twelve thirty-four'"
    ]
    
    results = []
    
    for i, challenge in enumerate(challenges, 1):
        print(f"\n📝 Challenge {i}: {challenge}")
        input("   Press Enter when ready to speak...")
        
        result = listen_once()
        results.append(result)
        
        if result:
            print(f"   ✅ Result: '{result}'")
        else:
            print("   ❌ No result")
            
        print("-" * 30)
    
    print("\n📊 CHALLENGE SUMMARY:")
    for i, result in enumerate(results, 1):
        status = "✅ Success" if result else "❌ Failed"
        print(f"Challenge {i}: {status}")
    
    return results

# run the challenge!
challenge_results = speech_recognition_challenge()

🏆 SPEECH RECOGNITION CHALLENGE

📝 Challenge 1: Say a simple word like 'hello'

🎤 EXPERIMENT: Speech to Text
📢 Instructions:
1. Click 'Run' on this cell
2. When you see 'Listening...', start speaking
3. Speak clearly for 3-5 seconds
4. Watch the magic happen!

🎧 Listening... Speak now!
(Speak clearly for 3-5 seconds)
🔄 Processing your speech...

✨ SUCCESS! You said: 'hello'
📝 Text length: 5 characters
   ✅ Result: 'hello'
------------------------------

📝 Challenge 2: Say a long sentence about your favorite hobby

🎤 EXPERIMENT: Speech to Text
📢 Instructions:
1. Click 'Run' on this cell
2. When you see 'Listening...', start speaking
3. Speak clearly for 3-5 seconds
4. Watch the magic happen!

🎧 Listening... Speak now!
(Speak clearly for 3-5 seconds)
🔄 Processing your speech...

✨ SUCCESS! You said: 'so my favorite hobby'
📝 Text length: 20 characters
   ✅ Result: 'so my favorite hobby'
------------------------------

📝 Challenge 3: Try speaking very quietly

🎤 EXPERIMENT: Speech to Text
📢

## 🗣️ Step 3: Understanding Text-to-Speech (TTS)
 
Now let's give our computer a voice! Text-to-Speech is like giving your computer a mouth.
 
**The Process:**
1. Computer receives text
2. AI analyzes the words and grammar
3. Determines pronunciation and emphasis
4. Generates sound waves that match human speech
5. Plays the audio through speakers
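The "analyzes the words" step usually includes text normalization, for example deciding that `15` should be read as "fifteen". The sketch below is a toy version that only handles whole numbers from 0 to 99. It illustrates the idea and makes no claim about how `pyttsx3` or your system voices work internally; all names in it are made up for this example.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def number_to_words(n):
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"

def normalize_for_speech(text):
    """Replace standalone one- or two-digit numbers with words."""
    return re.sub(r"\b\d{1,2}\b",
                  lambda match: number_to_words(int(match.group())),
                  text)

print(normalize_for_speech("What is 15 plus 27?"))
```

Longer numbers such as years are left untouched here; a real system also has to handle dates, abbreviations, currency, and much more.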

In [9]:
import pyttsx3

print("🗣️ Setting up Text-to-Speech...")

# Initialize the TTS engine
tts_engine = pyttsx3.init()

# Let's explore the available voices
def explore_voices():
    """Discover what voices are available"""
    print("\n🎭 EXPLORING AVAILABLE VOICES")
    print("=" * 40)
    
    voices = tts_engine.getProperty('voices')
    print(f"🔍 Found {len(voices)} voices on your system:")
    
    for i, voice in enumerate(voices):
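        # Get voice info
        name = voice.name
        # Guess gender from the name. Most system voices (e.g. "Albert") give
        # no hint in their name, so this often reports "Unknown".
        name_lower = name.lower()
        if "female" in name_lower:
            gender = "Female"
        elif "male" in name_lower:
            gender = "Male"
        else:
            gender = "Unknown"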
        language = voice.languages[0] if voice.languages else "Unknown"
        
        print(f"\n  Voice {i}:")
        print(f"    Name: {name}")
        print(f"    Gender: {gender}")
        print(f"    Language: {language}")
        print(f"    ID: {voice.id}")
    
    return voices

# Explore available voices
available_voices = explore_voices()

🗣️ Setting up Text-to-Speech...

🎭 EXPLORING AVAILABLE VOICES
🔍 Found 177 voices on your system:

  Voice 0:
    Name: Albert
    Gender: Unknown
    Language: en_US
    ID: com.apple.speech.synthesis.voice.Albert

  Voice 1:
    Name: Alice
    Gender: Unknown
    Language: it_IT
    ID: com.apple.voice.compact.it-IT.Alice

  Voice 2:
    Name: Alva
    Gender: Unknown
    Language: sv_SE
    ID: com.apple.voice.compact.sv-SE.Alva

  Voice 3:
    Name: Amélie
    Gender: Unknown
    Language: fr_CA
    ID: com.apple.voice.compact.fr-CA.Amelie

  Voice 4:
    Name: Amira
    Gender: Unknown
    Language: ms_MY
    ID: com.apple.voice.compact.ms-MY.Amira

  Voice 5:
    Name: Anna
    Gender: Unknown
    Language: de_DE
    ID: com.apple.voice.compact.de-DE.Anna

  Voice 6:
    Name: Bad News
    Gender: Unknown
    Language: en_US
    ID: com.apple.speech.synthesis.voice.BadNews

  Voice 7:
    Name: Bahh
    Gender: Unknown
    Language: en_US
    ID: com.apple.speech.synthesis.voice.

## 🎵 Experiment 2: Making Your Computer Talk
 
Let's make our computer speak with different voices and settings!
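The speed setting is expressed in words per minute, so you can estimate how long a sentence will take before you play it. This is a back-of-the-envelope sketch: the function name is made up, and real engines pause at punctuation and vary by voice, so treat the result as approximate.

```python
def estimated_seconds(text, rate_wpm):
    """Rough speaking time for `text` at `rate_wpm` words per minute."""
    if rate_wpm <= 0:
        raise ValueError("rate_wpm must be positive")
    return len(text.split()) * 60 / rate_wpm

sentence = "This is very fast speech! Can you understand me?"
for rate in (120, 200, 300):
    print(f"{rate} wpm -> about {estimated_seconds(sentence, rate):.1f} s")
```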

In [10]:
def text_to_speech_demo():
    """Demonstrate text-to-speech with different settings"""
    print("\n🎪 TEXT-TO-SPEECH DEMO")
    print("=" * 35)
    
    test_message = "Hello! I am your voice agent. I can speak in different voices and speeds!"
    
    # Get current settings
    current_rate = tts_engine.getProperty('rate')
    current_volume = tts_engine.getProperty('volume')
    
    print(f"🎛️ Current settings:")
    print(f"   Speed: {current_rate} words per minute")
    print(f"   Volume: {current_volume}")
    
    # Demo 1: Normal speech
    print(f"\n🎤 Demo 1: Normal speech")
    tts_engine.setProperty('rate', 200)
    tts_engine.setProperty('volume', 0.8)
    tts_engine.say("Hello! This is normal speech.")
    tts_engine.runAndWait()
    
    # Demo 2: Slow speech
    print(f"\n🐌 Demo 2: Slow speech")
    tts_engine.setProperty('rate', 120)
    tts_engine.say("This... is... slow... speech.")
    tts_engine.runAndWait()
    
    # Demo 3: Fast speech
    print(f"\n🏃 Demo 3: Fast speech")
    tts_engine.setProperty('rate', 300)
    tts_engine.say("This is very fast speech! Can you understand me?")
    tts_engine.runAndWait()
    
    # Demo 4: Different voice (if available)
    if len(available_voices) > 1:
        print(f"\n👥 Demo 4: Different voice")
        tts_engine.setProperty('voice', available_voices[1].id)
        tts_engine.setProperty('rate', 200)
        tts_engine.say("Now I'm speaking with a different voice!")
        tts_engine.runAndWait()
        
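        # Reset to first voice
        tts_engine.setProperty('voice', available_voices[0].id)
    
    # Restore the rate and volume that were in effect before the demo
    tts_engine.setProperty('rate', current_rate)
    tts_engine.setProperty('volume', current_volume)
    
    print("\n✅ Text-to-Speech demo complete!")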

# Run the demo
text_to_speech_demo()


🎪 TEXT-TO-SPEECH DEMO
🎛️ Current settings:
   Speed: 200.0 words per minute
   Volume: 1.0

🎤 Demo 1: Normal speech

🐌 Demo 2: Slow speech

🏃 Demo 3: Fast speech

👥 Demo 4: Different voice

✅ Text-to-Speech demo complete!


## 🛠️ Step 4: Building Our Voice Agent Class
 
Now let's put everything together! We'll create a Voice Agent class that can listen, think, and speak.
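At its core, the agent is a loop: listen, think, speak, repeat. The sketch below shows that loop with plain functions passed in, so it runs without a microphone or speakers. The name `run_agent` and the scripted inputs are stand-ins for illustration, not part of the class defined in this step.

```python
def run_agent(listen, think, speak, max_turns=10):
    """Run a listen -> think -> speak loop until 'goodbye' or max_turns."""
    turns = 0
    for _ in range(max_turns):
        heard = listen()
        if not heard:                  # nothing recognized this time
            continue
        if "goodbye" in heard.lower():
            speak("Goodbye!")
            break
        speak(think(heard))
        turns += 1
    return turns

# Stand-ins for the microphone, the "brain", and the speaker
scripted_input = iter(["hello", "", "what time is it", "goodbye"])
spoken = []
turns = run_agent(
    listen=lambda: next(scripted_input, ""),
    think=lambda text: f"You said: {text}",
    speak=spoken.append,
)
print(turns, spoken)
```

Keeping the three parts separate like this makes each one easy to swap out, for example replacing the scripted input with real speech recognition.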


In [None]:
import datetime
import random
import wikipedia
import speech_recognition as sr
import pyttsx3
import openai
import os
from dotenv import load_dotenv

class VoiceAgent:
    """A simple voice agent that can listen, process, and respond"""
    
    def __init__(self, name="Sophia", enable_tts=True):
        """Initialize the voice agent"""
        self.name = name
        self.recognizer = sr.Recognizer()
        self.microphone = sr.Microphone()
        
        # TTS control
        self.enable_tts = enable_tts
        self.voice_id = None
        self.voice_rate = 175
        self.voice_volume = 0.9
        
        # Find and store the best voice
        if self.enable_tts:
            self.find_best_voice()
        
        # Initialize OpenAI client for GPT-4o-mini
        self.setup_openai()
        
        # Conversation history for context
        self.conversation_history = []
        
        # Agent's knowledge and responses
        self.responses = {
            'greetings': [
                f"Hello! I'm {self.name}, your voice agent!",
                f"Hi there! {self.name} here, ready to help!",
                f"Greetings! I'm {self.name}. How can I assist you?"
            ],
            'farewell': [
                "Goodbye! It was nice talking with you!",
                "See you later! Have a great day!",
                "Farewell! Come back anytime!"
            ],
            'thanks': [
                "You're welcome! Happy to help!",
                "No problem at all!",
                "Glad I could assist you!"
            ],
            'unknown': [
                "I'm still learning. Could you try asking differently?",
                "That's interesting! I don't know about that yet.",
                "I'm not sure about that. Can you teach me?"
            ]
        }
        
        tts_status = "with voice enabled" if self.enable_tts else "in text-only mode"
        print(f"🤖 {self.name} is ready to help {tts_status}!")
    
    def find_best_voice(self):
        """Find and store the best US female voice"""
        try:
            temp_engine = pyttsx3.init()
            voices = temp_engine.getProperty('voices')
            
            if voices:
                print("🔍 Searching for US female voices...")
                
                # Priority 1: Look for US English female voices
                us_female_keywords = [
                    'english (united states)', 'en-us', 'usa', 'american',
                    'zira', 'cortana', 'eva', 'samantha', 'alex'
                ]
                
                for voice in voices:
                    voice_name = voice.name.lower()
                    voice_id = voice.id.lower()
                    
                    # Check if voice is US English and female
                    is_us = any(keyword in voice_name or keyword in voice_id for keyword in us_female_keywords)
                    is_female = any(keyword in voice_name for keyword in ['female', 'woman', 'girl', 'zira', 'cortana', 'eva', 'samantha'])
                    
                    if is_us and is_female:
                        self.voice_id = voice.id
                        print(f"🎭 Selected US female voice: {voice.name}")
                        break
                
                # Fallback to any female voice
                if not self.voice_id:
                    for voice in voices:
                        voice_name = voice.name.lower()
                        if any(keyword in voice_name for keyword in ['female', 'woman', 'girl', 'zira', 'samantha']):
                            self.voice_id = voice.id
                            print(f"🎭 Selected female voice: {voice.name}")
                            break
                
                # Final fallback
                if not self.voice_id and len(voices) > 1:
                    self.voice_id = voices[1].id
                    print(f"🎭 Fallback voice selected: {voices[1].name}")
            
            # Clean up temp engine
            temp_engine.stop()
            del temp_engine
            
        except Exception as e:
            print(f"⚠️ Voice detection error: {e}")
            self.voice_id = None
    
    def setup_openai(self):
        """Setup OpenAI API for GPT-4o-mini from .env file"""
        try:
            # Load environment variables from .env file
            load_dotenv()
            
            # Get API key from environment variable
            api_key = os.getenv('OPENAI_API_KEY')
            
            if not api_key:
                print("❌ OPENAI_API_KEY not found in .env file!")
                print("📝 Please add your API key to .env file:")
                print("   OPENAI_API_KEY=your_api_key_here")
                print("⚠️ Continuing without AI features. Using basic responses only.")
                self.openai_client = None
                return
            
            # Initialize OpenAI client with new API (v1.0+)
            self.openai_client = openai.OpenAI(api_key=api_key)
            print("✅ GPT-4o-mini connected successfully from .env file!")
            
        except ImportError:
            print("❌ python-dotenv not installed!")
            print("📦 Please install it: pip install python-dotenv")
            print("⚠️ Continuing without AI features. Using basic responses only.")
            self.openai_client = None
        except Exception as e:
            print(f"❌ Error setting up OpenAI: {e}")
            print("⚠️ Continuing without AI features. Using basic responses only.")
            self.openai_client = None
    
    def get_ai_response(self, user_input):
        """Get intelligent response from GPT-4o-mini"""
        if not self.openai_client:
            return None
        
        try:
            # Prepare conversation context
            messages = [
                {"role": "system", "content": f"You are {self.name}, a helpful and friendly voice assistant. Keep responses concise (1-3 sentences) since they will be spoken aloud. Use a warm, encouraging tone."}
            ]
            
            # Add recent conversation history for context
            for msg in self.conversation_history[-6:]:  # Last 6 messages for context
                messages.append(msg)
            
            # Add current user input
            messages.append({"role": "user", "content": user_input})
            
            # Get response from GPT-4o-mini using new API
            response = self.openai_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                max_tokens=150,  # Keep responses concise for speech
                temperature=0.7,  # Balanced creativity and consistency
            )
            
            ai_response = response.choices[0].message.content.strip()
            
            # Update conversation history
            self.conversation_history.append({"role": "user", "content": user_input})
            self.conversation_history.append({"role": "assistant", "content": ai_response})
            
            # Keep history manageable
            if len(self.conversation_history) > 20:
                self.conversation_history = self.conversation_history[-20:]
            
            return ai_response
            
        except Exception as e:
            print(f"❌ AI Error: {e}")
            return None
    
    def speak(self, text):
        """Convert text to speech or text-only mode"""
        print(f"🤖 {self.name}: {text}")
        
        if not self.enable_tts:
            print("📝 (Text-only mode)")
            return
        
        try:
            import subprocess
            import platform
            
            # Use system TTS instead of pyttsx3 for Jupyter
            system = platform.system()
            
            if system == "Darwin":  # macOS
                subprocess.run(["say", text], check=True)
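            elif system == "Windows":
                # Double any single quotes so text like "I'm" cannot end the PowerShell string early
                safe_text = text.replace("'", "''")
                subprocess.run(["powershell", "-Command", f"Add-Type -AssemblyName System.Speech; (New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak('{safe_text}')"], check=True)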
            elif system == "Linux":
                subprocess.run(["espeak", text], check=True)
            else:
                print("🔇 System TTS not available on this platform")
                
        except Exception as e:
            print(f"🔇 System TTS error: {e}")
    
    def listen(self):
        """Listen to user speech and convert to text"""
        try:
            with self.microphone as source:
                print("🎧 Listening...")
                # Adjust for ambient noise
                self.recognizer.adjust_for_ambient_noise(source, duration=1)
                # Listen for audio
                audio = self.recognizer.listen(source, timeout=5, phrase_time_limit=5)
                
            print("🔄 Processing...")
            # Convert speech to text
            text = self.recognizer.recognize_google(audio).lower()
            print(f"👤 You said: '{text}'")
            return text
            
        except sr.WaitTimeoutError:
            self.speak("I didn't hear anything. Could you try again?")
            return ""
        except sr.UnknownValueError:
            self.speak("I couldn't understand that. Could you speak more clearly?")
            return ""
        except sr.RequestError:
            self.speak("Sorry, I'm having trouble with my speech recognition service.")
            return ""
    
    def get_current_time(self):
        """Get current time"""
        now = datetime.datetime.now()
        time_str = now.strftime("%I:%M %p")
        return f"The current time is {time_str}"
    
    def get_current_date(self):
        """Get current date"""
        today = datetime.date.today()
        date_str = today.strftime("%B %d, %Y")
        return f"Today is {date_str}"
    
    def search_wikipedia(self, query):
        """Search Wikipedia for information"""
        try:
            # Strip the trigger phrases so only the topic remains
            for phrase in ("tell me about", "what is", "who is", "search for"):
                query = query.replace(phrase, "")
            query = query.strip()
            if not query:
                return "I need something to search for!"
            
            print(f"🔍 Searching for: {query}")
            summary = wikipedia.summary(query, sentences=2)
            return summary
        except wikipedia.exceptions.DisambiguationError:
            return f"There are multiple entries for {query}. Could you be more specific?"
        except wikipedia.exceptions.PageError:
            return f"I couldn't find information about {query}."
        except Exception:
            return "I had trouble searching for that information."
    
    def simple_math(self, command):
        """Perform simple math operations"""
        try:
            # Replace words with symbols
            command = command.replace("plus", "+").replace("add", "+")
            command = command.replace("minus", "-").replace("subtract", "-")
            command = command.replace("times", "*").replace("multiply", "*")
            command = command.replace("divided by", "/").replace("divide", "/")
            
            # Extract numbers and operation
            words = command.split()
            numbers = []
            operation = None
            
            for word in words:
                if word.isdigit():
                    numbers.append(int(word))
                elif word in ['+', '-', '*', '/']:
                    operation = word
            
            if len(numbers) == 2 and operation:
                if operation == '+':
                    result = numbers[0] + numbers[1]
                elif operation == '-':
                    result = numbers[0] - numbers[1]
                elif operation == '*':
                    result = numbers[0] * numbers[1]
                elif operation == '/':
                    if numbers[1] != 0:
                        result = numbers[0] / numbers[1]
                    else:
                        return "I can't divide by zero!"
                
                return f"{numbers[0]} {operation} {numbers[1]} equals {result}"
            else:
                return "I need two numbers and an operation to calculate."
                
        except Exception:
            return "I couldn't understand that math problem."
    
    def process_command(self, command):
        """Process user command and generate response with AI integration"""
        command = command.lower().strip()
        
        # Try AI for complex or conversational queries first
        if self.openai_client:
            # Use AI for these types of queries
            ai_triggers = [
                'how are you', 'tell me a story', 'what do you think', 'explain', 'why', 
                'how does', 'can you', 'do you', 'what would', 'i feel', 'i think',
                'opinion', 'advice', 'recommend', 'suggest', 'weather', 'founded',
                'who founded', 'how to', 'what happened', 'when did'
            ]
            
            # Use AI if the query is long, contains a trigger phrase, or includes a question mark
            use_ai = (len(command.split()) > 6 or 
                     any(trigger in command for trigger in ai_triggers) or
                     '?' in command)
            
            if use_ai:
                print("🧠 Using AI for intelligent response...")
                ai_response = self.get_ai_response(command)
                if ai_response:
                    return ai_response
        
        # Basic command processing (fast responses)
        # Compare whole words so 'hi' does not match 'this' and 'time' does not match 'times'
        words = {word.strip(".,!?") for word in command.split()}
        
        if words & {'hello', 'hi', 'hey'} or 'good morning' in command or 'good afternoon' in command:
            return random.choice(self.responses['greetings'])
        
        elif words & {'goodbye', 'bye', 'farewell'} or 'see you' in command:
            return random.choice(self.responses['farewell'])
        
        elif words & {'thank', 'thanks'} or 'appreciate' in command:
            return random.choice(self.responses['thanks'])
        
        elif 'time' in words:
            return self.get_current_time()
        
        elif 'date' in words or 'today' in words:
            return self.get_current_date()
        
        elif any(word in command for word in ['plus', 'minus', 'times', 'divided', 'add', 'subtract', 'multiply', 'divide']) and any(char.isdigit() for char in command):
            return self.simple_math(command)
        
        elif any(phrase in command for phrase in ['tell me about', 'what is', 'who is', 'search for']):
            return self.search_wikipedia(command)
        
        elif 'your name' in command or 'who are you' in command:
            return f"I'm {self.name}, your personal voice agent! I can help you with time, dates, simple math, and general information."
        
        elif 'help' in command or 'what can you do' in command:
            return "I can tell you the time and date, do simple math, search for information, and have conversations with you! Try asking me anything!"
        
        else:
            return random.choice(self.responses['unknown'])

# Create agents for different purposes
agent = VoiceAgent("Sophia", enable_tts=False)  # For testing in Jupyter
agent_with_voice = VoiceAgent("Sophia", enable_tts=True)  # For real conversations

# Test function
def test_voice_agent():
    """Test the voice agent with sample commands (text-only for Jupyter)"""
    print("🧪 TESTING OUR VOICE AGENT (Text-Only Mode)")
    print("=" * 40)
    
    test_commands = [
        "hello",
        "what time is it",
        "what is today's date",
        "what is 15 plus 27",
        "tell me about artificial intelligence",
        "what can you do",
        "how are you feeling today",  # This should trigger AI
        "thank you",
        "goodbye"
    ]
    
    for i, command in enumerate(test_commands, 1):
        print(f"\n🔬 Test {i}: '{command}'")
        response = agent.process_command(command)
        agent.speak(response)
        print("-" * 30)

# Voice conversation function
def start_voice_conversation():
    """Start a real voice conversation with TTS enabled"""
    print("🎉 Starting Voice Conversation with TTS!")
    print("Say 'goodbye' to end the conversation.")
    
    agent_with_voice.speak("Hello! I'm ready to have a conversation with you!")
    
    conversation_count = 0
    max_conversations = 10
    
    while conversation_count < max_conversations:
        user_input = agent_with_voice.listen()
        
        if not user_input:
            continue
            
        # Match whole words so that, for example, "quite" does not trigger "quit"
        if any(word in user_input.lower().split() for word in ['goodbye', 'bye', 'exit', 'quit']):
            agent_with_voice.speak("Goodbye! It was great talking with you!")
            break
        
        response = agent_with_voice.process_command(user_input)
        agent_with_voice.speak(response)
        
        conversation_count += 1
        print(f"💬 Conversation {conversation_count}/{max_conversations}")
        print("-" * 30)
    
    if conversation_count >= max_conversations:
        agent_with_voice.speak("We've had such a wonderful conversation! Let's chat again soon!")
    
    print("\n✅ Conversation ended!")

✅ GPT-4o-mini connected successfully from .env file!
🤖 Sophia is ready to help in text-only mode!
🔍 Searching for US female voices...
🎭 Selected US female voice: Samantha
✅ GPT-4o-mini connected successfully from .env file!
🤖 Sophia is ready to help with voice enabled!


## 🎮 Step 5: Testing Our Voice Agent
 
Let's test our voice agent with different types of commands!


In [68]:
test_voice_agent()

# Uncomment to start a live voice conversation (requires a microphone):
# start_voice_conversation()

🧪 TESTING OUR VOICE AGENT (Text-Only Mode)

🔬 Test 1: 'hello'
🤖 Sophia: Greetings! I'm Sophia. How can I assist you?
📝 (Text-only mode)
------------------------------

🔬 Test 2: 'what time is it'
🤖 Sophia: The current time is 01:18 PM
📝 (Text-only mode)
------------------------------

🔬 Test 3: 'what is today's date'
🤖 Sophia: Today is June 29, 2025
📝 (Text-only mode)
------------------------------

🔬 Test 4: 'what is 15 plus 27'
🤖 Sophia: 15 + 27 equals 42
📝 (Text-only mode)
------------------------------

🔬 Test 5: 'tell me about artificial intelligence'
🔍 Searching for: artificial intelligence
🤖 Sophia: Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelli

## 🎯 Step 6: Your Turn - Customize Your Agent!
 
Now it's time to make your voice agent unique! Try these customizations:


In [70]:
# 🎨 CUSTOMIZATION PLAYGROUND
# ===========================

# 1. Change your agent's name
my_agent = VoiceAgent("YourNameHere")  # Change this!

# 2. Add new responses
def add_custom_responses():
    """Add your own custom responses"""
    
    # Add jokes
    jokes = [
        "Why don't scientists trust atoms? Because they make up everything!",
        "What do you call a bear with no teeth? A gummy bear!",
        "Why did the math book look so sad? Because it had too many problems!"
    ]
    
    # Add facts about yourself
    personal_facts = [
        "I was created by an awesome Class 9 student!",
        "I love learning new things every day!",
        "My favorite subject is computer science!"
    ]
    
    my_agent.responses['jokes'] = jokes
    my_agent.responses['personal'] = personal_facts

# 3. Add new functionality
# Keep a reference to the original method before replacing it. Without this,
# the fallback call below would resolve to enhanced_process_command itself
# and recurse forever. The hasattr check keeps re-running this cell safe.
if not hasattr(VoiceAgent, "_original_process_command"):
    VoiceAgent._original_process_command = VoiceAgent.process_command

def enhanced_process_command(self, command):
    """Enhanced command processing with new features"""
    command = command.lower().strip()
    
    # New features are checked first so they take priority over built-in handlers
    # (otherwise 'tell me about yourself' would trigger a Wikipedia search)
    if 'joke' in command or 'funny' in command:
        return random.choice(self.responses.get('jokes', ['I need to learn some jokes!']))
    
    elif 'about you' in command or 'about yourself' in command:
        return random.choice(self.responses.get('personal', ['I am a voice agent!']))
    
    elif 'weather' in command:
        return "I don't have access to weather data yet, but you could add that feature!"
    
    elif 'favorite' in command:
        favorites = [
            "My favorite programming language is Python!",
            "I love helping students learn about technology!",
            "My favorite time is when I'm talking with you!"
        ]
        return random.choice(favorites)
    
    else:
        # Everything else falls back to the original processing
        return self._original_process_command(command)

# Apply customizations
add_custom_responses()
# Replace the method (advanced Python technique)
VoiceAgent.process_command = enhanced_process_command

print("🎨 Your agent has been customized!")
print("Try asking your agent:")
print("- 'Tell me a joke'")
print("- 'What's your favorite color?'")
print("- 'Tell me about yourself'")


🔍 Searching for US female voices...
🎭 Selected US female voice: Samantha
✅ GPT-4o-mini connected successfully from .env file!
🤖 YourNameHere is ready to help with voice enabled!
🎨 Your agent has been customized!
Try asking your agent:
- 'Tell me a joke'
- 'What's your favorite color?'
- 'Tell me about yourself'
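
Replacing a method on a class has one common pitfall: if the new method calls the old one by its class attribute name, that name now points at the new method, so it ends up calling itself. The sketch below shows one way to avoid this by saving the original first. `MiniAgent` is a hypothetical stand-in used only for this illustration; it is not the `VoiceAgent` class from this notebook.

```python
import random


class MiniAgent:
    """Hypothetical stand-in for VoiceAgent, used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.responses = {"jokes": ["What do you call a bear with no teeth? A gummy bear!"]}

    def process_command(self, command):
        return f"{self.name} heard: {command}"


# Save the original method BEFORE replacing it, so the new method can fall back to it.
original_process_command = MiniAgent.process_command


def patched_process_command(self, command):
    command = command.lower().strip()
    if "joke" in command:
        return random.choice(self.responses["jokes"])
    # Everything else goes to the saved original. Calling MiniAgent.process_command
    # here instead would call this same function again and never finish.
    return original_process_command(self, command)


MiniAgent.process_command = patched_process_command

demo = MiniAgent("Demo")
print(demo.process_command("Tell me a joke"))
print(demo.process_command("Hello there"))
```

The second call shows the fallback in action: the command contains no new keyword, so the saved original method answers it.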


## 🏆 Step 7: Final Challenge - Build Your Feature!
 
Time for the ultimate challenge! Add your own unique feature to the voice agent.


In [71]:
# 🎯 YOUR CHALLENGE: Add a new feature!
# ====================================

def your_custom_feature():
    """
    YOUR TURN: Create a unique feature for your voice agent!
    
    Ideas:
    1. Password generator
    2. Random compliment generator
    3. Study reminder system
    4. Simple game (like 20 questions)
    5. Unit converter (inches to cm, etc.)
    6. Random fact generator
    7. Motivational quotes
    
    Fill in this function with your code!
    """
    
    # Example: Random motivational quotes
    motivational_quotes = [
        "The future belongs to those who learn more skills and combine them in creative ways!",
        "Every expert was once a beginner. Keep learning!",
        "Technology is best when it brings people together!",
        "The only way to do great work is to love what you do!",
        "Innovation distinguishes between a leader and a follower!"
    ]
    
    # Your code here!
    feature_name = "Motivational Quotes"  # Change this
    
    print(f"🌟 Your custom feature: {feature_name}")
    
    # Example implementation
    quote = random.choice(motivational_quotes)
    return f"Here's some motivation for you: {quote}"

# Test your feature
my_feature_result = your_custom_feature()
print(my_feature_result)

# Now integrate it into your agent by modifying the process_command function!



🌟 Your custom feature: Motivational Quotes
Here's some motivation for you: Innovation distinguishes between a leader and a follower!
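
As a second worked idea from the list above, here is a minimal sketch of a unit converter (idea 5). The function name `convert_units`, the phrase pattern "convert <number> <unit> to <unit>", and the small table of units are all assumptions made for this example. The function returns `None` when the command is not a conversion request, so a command handler could try it first and move on to other checks otherwise.

```python
import re

# Multiplication factors for each supported (from, to) pair; extend as needed.
CONVERSIONS = {
    ("inches", "cm"): 2.54,
    ("cm", "inches"): 1 / 2.54,
    ("miles", "km"): 1.609344,
    ("km", "miles"): 1 / 1.609344,
}


def convert_units(command):
    """Answer 'convert <number> <unit> to <unit>', or return None if the command does not match."""
    match = re.search(r"convert (\d+(?:\.\d+)?) (\w+) to (\w+)", command.lower())
    if not match:
        return None
    value = float(match.group(1))
    source, target = match.group(2), match.group(3)
    factor = CONVERSIONS.get((source, target))
    if factor is None:
        return f"Sorry, I can't convert {source} to {target} yet."
    return f"{value:g} {source} is {value * factor:.2f} {target}"


print(convert_units("Convert 10 inches to cm"))
print(convert_units("what time is it"))
```

Returning `None` for unrelated commands keeps the feature easy to combine with others: each feature either answers or steps aside.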


## 📊 Step 10: Understanding How It All Works
 
Let's recap what we've built and understand the technology behind it!

### 🔬 UNDERSTANDING VOICE AGENT TECHNOLOGY
==================================================

🔧 Speech Recognition (ASR)
------------------------
📝 Converts your speech into text that computers can understand

🔄 Process:
   1. Microphone captures sound waves from your voice
   2. Audio is converted to digital format
   3. AI models analyze patterns in the audio
   4. System matches patterns to known words
   5. Returns the most likely text transcription

🌍 Real-world applications: Used in Siri, Google Assistant, Alexa, voice typing
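
Step 2 of the process above (converting audio to digital format) can be illustrated with a small sketch. It uses a pure 440 Hz tone in place of a real voice, a 16,000 samples-per-second rate, and 16-bit integers; all three are assumptions chosen for this example, not settings read from the notebook's microphone code.

```python
import math

SAMPLE_RATE = 16000   # samples per second (assumed for this example)
MAX_INT16 = 32767     # largest value a signed 16-bit sample can hold


def digitize_tone(frequency_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Sample a pure sine tone and quantize each sample to a 16-bit integer."""
    total_samples = round(sample_rate * duration_s)
    samples = []
    for n in range(total_samples):
        t = n / sample_rate                                    # time of this sample in seconds
        amplitude = math.sin(2 * math.pi * frequency_hz * t)   # value between -1.0 and 1.0
        samples.append(round(amplitude * MAX_INT16))           # quantize to an integer
    return samples


samples = digitize_tone(frequency_hz=440, duration_s=0.01)
print(len(samples), "samples; first five:", samples[:5])
```

A real recording works the same way, except the amplitude values come from the microphone instead of a formula. The list of integers is what the recognition model receives as input.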


🔧 Text-to-Speech (TTS)
--------------------
📝 Converts text into natural-sounding speech

🔄 Process:
   1. System analyzes the text for meaning and grammar
   2. Determines correct pronunciation for each word
   3. Applies natural rhythm and intonation
   4. Generates audio waveforms
   5. Plays the speech through speakers

🌍 Real-world applications: GPS navigation, accessibility tools, audiobooks, virtual assistants
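
Steps 1 and 2 of the process above (analyzing the text and deciding how to pronounce it) can be sketched as a small text normalizer. A TTS engine normally handles this internally, so this is only a toy illustration. The abbreviation table and the function names are assumptions for this example, and digits are read one at a time ("42" becomes "four two"), which is simpler than what real systems do.

```python
import re

# Tiny lookup tables, assumed for this example only.
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister", "km": "kilometers", "etc.": "et cetera"}
DIGIT_WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]


def spell_digits(match):
    """Turn a run of digits such as '42' into 'four two'."""
    return " ".join(DIGIT_WORDS[int(d)] for d in match.group())


def normalize_for_speech(text):
    """Expand known abbreviations, then replace digits with words."""
    words = [ABBREVIATIONS.get(token.lower(), token) for token in text.split()]
    expanded = " ".join(words)
    return re.sub(r"\d+", spell_digits, expanded)


print(normalize_for_speech("Dr. Smith lives 5 km away"))
```

After this step every token is a plain word, which makes it possible to look up a pronunciation for each one.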


🔧 Natural Language Processing
---------------------------
📝 Helps computers understand the meaning behind words

🔄 Process:
   1. Breaks down sentences into individual words
   2. Identifies the intent (what user wants)
   3. Extracts key information (entities)
   4. Determines appropriate response
   5. Generates helpful output

🌍 Real-world applications: Chatbots, language translation, sentiment analysis
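
The five steps above can be sketched with simple keyword rules, similar in spirit to the `if`/`elif` checks in `process_command`. The intent names, keyword lists, and helper functions below are assumptions made for this example; real NLP systems use trained models rather than hand-written keyword lists.

```python
import re

# Intent names and their trigger keywords (assumed for this example).
INTENT_KEYWORDS = {
    "get_time": ["time"],
    "get_date": ["date", "day"],
    "do_math": ["plus", "minus", "times"],
    "search": ["tell me about", "who is"],
}


def tokenize(sentence):
    """Step 1: break the sentence into lowercase words and numbers."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def detect_intent(sentence):
    """Step 2: return the first intent whose keyword appears as a whole word or phrase."""
    text = " ".join(tokenize(sentence))
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(re.search(rf"\b{re.escape(keyword)}\b", text) for keyword in keywords):
            return intent
    return "unknown"


def extract_numbers(sentence):
    """Step 3: pull out the numbers (the entities this example cares about)."""
    return [int(token) for token in tokenize(sentence) if token.isdigit()]


def respond(sentence):
    """Steps 4 and 5: choose and build a response from the intent and entities."""
    intent = detect_intent(sentence)
    numbers = extract_numbers(sentence)
    if intent == "do_math" and len(numbers) == 2 and "plus" in tokenize(sentence):
        return f"{numbers[0]} + {numbers[1]} equals {numbers[0] + numbers[1]}"
    return f"Detected intent: {intent}"


print(respond("What is 15 plus 27"))
print(respond("What time is it"))
```

Matching whole words with `\b` is a deliberate choice here: it stops "time" from matching inside "times", so a multiplication question is not mistaken for a request for the clock.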

