## Text-to-Speech Utilities

This section evaluates alternatives to pyttsx3 for text-to-speech (TTS) conversion, most of which produce noticeably more natural speech.

### Available TTS Options:

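1. **gTTS (Google Text-to-Speech)** - Free, simple wrapper around Google Translate's text-to-speech endpoint
2. **Edge TTS** - Microsoft's online neural voices (the service behind Edge's Read Aloud), accessed through the `edge-tts` package
3. **Cloud-based options** - Google Cloud TTS, Amazon Polly, Microsoft Azure Speech
4. **Local options** - Mozilla TTS (superseded by its fork, Coqui TTS), ESPnet, Coqui TTS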

In [3]:
# Install required packages
!pip install -q gtts edge-tts nest_asyncio

In [8]:
# Configure asyncio for Jupyter notebooks
import asyncio
import nest_asyncio  # Allows re-entering an already-running event loop (Jupyter kernels run one)

# Apply nest_asyncio to allow asyncio.run() in notebooks
nest_asyncio.apply()

In [9]:
# Option 1: Google Text-to-Speech (gTTS)
# Install with: pip install gtts

def tts_gtts(text, output_filename, lang='en', slow=False):
    """
    Convert text to speech with gTTS, which uses Google Translate's text-to-speech endpoint.
    
    Args:
        text (str): The text to convert to speech
        output_filename (str): Full path to save the output MP3 file
        lang (str): Language code (default: 'en')
        slow (bool): Whether to speak slowly (default: False)
    
    Returns:
        bool: True if successful, False otherwise
    """
    try:
        from gtts import gTTS
        from pathlib import Path
        
        # Make sure output directory exists
        output_path = Path(output_filename)
        output_path.parent.mkdir(parents=True, exist_ok=True)
        
        # Create gTTS object
        tts = gTTS(text=text, lang=lang, slow=slow)
        
        # Save to file (MP3 format)
        tts.save(str(output_path))
        
        print(f"Audio saved to: {output_path}")
        return True
    except Exception as e:
        print(f"Error in text-to-speech conversion: {str(e)}")
        return False
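
gTTS talks to an unofficial Google Translate endpoint and can fail transiently, for example when requests are rate-limited. Because `tts_gtts` reports failure by returning `False` instead of raising, a retry wrapper can key off that return value. A minimal sketch with exponential backoff; `retry_tts` and its parameters are hypothetical helpers, not part of gTTS:

```python
import time


def retry_tts(convert, *args, attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call a TTS helper that returns True/False, retrying failures with backoff.

    `convert` is any function with the same contract as tts_gtts: it returns
    True on success and False on failure instead of raising.
    """
    for attempt in range(attempts):
        if convert(*args, **kwargs):
            return True
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    return False
```

For example, `retry_tts(tts_gtts, text, "./tts_output/sample_speech.mp3", lang="en")` would make up to three attempts, waiting 1 s and then 2 s between them. The `sleep` parameter exists only so the delay can be replaced in tests.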

In [10]:
# Edge TTS implementation - Microsoft's online neural voices (the Read Aloud service used by Edge); requires internet access
import asyncio

async def run_edge_tts(text, output_filename, voice="en-US-AriaNeural"):
    """Convert text to speech using Microsoft Edge TTS
    
    Args:
        text (str): The text to convert to speech
        output_filename (str): Full path to save the output MP3 file
        voice (str): Voice identifier (default: 'en-US-AriaNeural')
        
    Returns:
        bool: True if successful, False otherwise
    """
    try:
        import edge_tts
        from pathlib import Path
        
        # Ensure output directory exists
        output_path = Path(output_filename)
        output_path.parent.mkdir(parents=True, exist_ok=True)
        
        # Set up TTS communication
        communicate = edge_tts.Communicate(text, voice)
        
        # Save audio to file
        await communicate.save(str(output_path))
        
        print(f"Audio saved to: {output_path}")
        return True
        
    except Exception as e:
        print(f"Error in text-to-speech conversion: {str(e)}")
        return False

In [11]:
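# Synchronous wrapper for edge-tts
def tts_edge(text, output_filename, voice="en-US-AriaNeural"):
    """Synchronous wrapper around run_edge_tts.

    Inside Jupyter this only works after nest_asyncio.apply(), because
    asyncio.run() refuses to start while the kernel's event loop is running.
    """
    return asyncio.run(run_edge_tts(text, output_filename, voice))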
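
`tts_edge` depends on `nest_asyncio.apply()` because `asyncio.run()` raises `RuntimeError` when called while an event loop is already running, as it is inside a notebook kernel. One alternative that avoids patching asyncio is to run the coroutine on a fresh event loop in a worker thread. A minimal sketch; `run_coro_sync` is a hypothetical helper, not part of edge-tts or nest_asyncio:

```python
import asyncio
import concurrent.futures


def run_coro_sync(coro):
    """Run a coroutine to completion from synchronous code.

    With no running loop in this thread (a plain script), use asyncio.run().
    With a running loop (e.g. a Jupyter kernel), run the coroutine on a new
    loop in a worker thread so the existing loop is never re-entered.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop running: the ordinary script case
        return asyncio.run(coro)
    # A loop is already running: hand the coroutine to a worker thread
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

For example, `run_coro_sync(run_edge_tts(text, output_file))` should behave the same in a script and in a notebook cell. The trade-off is that the calling thread blocks until the conversion finishes, which is acceptable for short TTS requests.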

### Comparison of TTS Options

| Library          | Quality   | Internet required | Installation              | Output format                   |
|------------------|-----------|-------------------|---------------------------|---------------------------------|
| gTTS             | Good      | Yes               | Simple                    | MP3                             |
| Google Cloud TTS | Excellent | Yes               | Complex (API credentials) | MP3, WAV (LINEAR16), OGG Opus   |
| Edge TTS         | Very good | Yes (every call)  | Simple                    | MP3                             |
| pyttsx3          | Basic     | No                | Simple                    | Platform-dependent (e.g. WAV)   |

**Recommendations:**

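1. **For quick implementation**: Use gTTS - good quality, minimal setup
2. **For the best free quality**: Use Edge TTS - natural neural voices and no API key, but it needs an internet connection on every call
3. **For production quality**: Use Google Cloud TTS - excellent quality and many voices, but it requires API credentials
4. **For offline use**: Use pyttsx3 or a local model such as Coqui TTS - Edge TTS cannot run offline

The online options generally sound far more natural than pyttsx3, at the cost of depending on a network connection and a third-party service.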

In [7]:
# Example usage - uncomment the option you want to try (the Edge TTS call is active by default)

text = "Hello, this is a test of the text-to-speech system. This should sound much better than pyttsx3."
output_file = "./tts_output/sample_speech.mp3"

# Try gTTS (Google Text-to-Speech)
# tts_gtts(text, output_file)

# Try Edge TTS (Microsoft Edge TTS), Jupyter-compatible
# Method 1: await the coroutine directly (top-level await works in Jupyter/IPython)
await run_edge_tts(text, output_file)

# Method 2: or use the synchronous wrapper (needs nest_asyncio.apply() in Jupyter)
# tts_edge(text, output_file)

# Google Cloud TTS requires API credentials and a helper (tts_google_cloud) that is not defined in this notebook
# tts_google_cloud(text, output_file)

Audio saved to: tts_output/sample_speech.mp3


True

In [None]:
# Fallback option: Using gtts if edge-tts doesn't work
try:
    from gtts import gTTS
    tts = gTTS(text="This is a fallback test using gTTS, which has no asyncio issues.")
    output_path = "./tts_output/fallback_speech.mp3"
    from pathlib import Path
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    tts.save(output_path)
    print(f"Fallback audio saved to: {output_path}")
except Exception as e:
    print(f"Error with fallback TTS: {str(e)}")

## Batch Text-to-Speech Conversion for Commands

This section converts a set of predefined commands to speech files that can be used in the application.

In [12]:
# Define texts and output paths for batch conversion
command_texts = [
    "Hello, what can you do?",
    "Find me vegetarian soup recipes and tag the recipes. 5 recipes",
    "I like to know about third recipe in the list but not review",
    "Show the recipe reviews",
    "Get nutrition information for this recipe",
    "Run the nutrition analysis for the recipe we just discussed.",
    "make this recipe more healthy for low fat diet",
    "What's a good substitute for egg yolks"
]

# Define display labels (one per command, same order); they also determine the output file names
command_labels = [
    "1.Intro",
    "2.Recipe Search",
    "3.Get Recipe Info",
    "4.Check the Reviews", 
    "5.Get Nutrition Info",
    "6.Nutrition Analysis",
    "7.Recipe Customization", 
    "8.Search on Internet"
]

# Create output folder
import os
from pathlib import Path

output_folder = Path("./command_voices")
output_folder.mkdir(parents=True, exist_ok=True)

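# Function to generate a file name from a label
def get_filename(label, speaker="Nariman", ext="ogg"):
    """Build a file name such as "1.Nariman_1Intro.ogg" from the label "1.Intro"."""
    # Turn spaces into underscores and drop dots: "2.Recipe Search" -> "2Recipe_Search"
    sanitized_label = label.replace(" ", "_").replace(".", "")
    # Prefix with the command number (text before the first dot) and the speaker name
    return f"{label.split('.')[0]}.{speaker}_{sanitized_label}.{ext}"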
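
`get_filename` only replaces spaces and dots, so a label containing characters such as `/` or `:` would produce an invalid or nested path. A slightly more defensive sketch, assuming the same naming scheme is wanted; `safe_filename` is hypothetical. It yields the same names as `get_filename` for the eight labels above and collapses any other unsafe characters to underscores:

```python
import re


def safe_filename(label, speaker="Nariman", ext="ogg"):
    """Like get_filename, but strips characters that are unsafe in file names."""
    number, _, _ = label.partition(".")  # command number before the first dot
    # Drop dots, then collapse every run of characters other than letters,
    # digits, underscores and hyphens into a single underscore
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", label.replace(".", "")).strip("_")
    return f"{number}.{speaker}_{slug}.{ext}"
```

For example, a hypothetical label `"9.Tips: Eggs/Dairy"` becomes `9.Nariman_9Tips_Eggs_Dairy.ogg` instead of a path with a `/` in it.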

In [13]:
# Batch conversion function using Edge TTS
async def batch_convert_edge_tts(texts, labels, output_folder, voice="en-US-AriaNeural", file_ext="ogg"):
    """Convert multiple texts to speech using Edge TTS
    
    Args:
        texts (list): List of texts to convert
        labels (list): List of corresponding labels
        output_folder (Path): Folder to save output files
        voice (str): Voice to use for TTS
        file_ext (str): File extension only; edge-tts writes MP3 audio whatever the extension
        
    Returns:
        list: List of generated file paths
    """
    import edge_tts
    
    output_files = []
    
    for i, (text, label) in enumerate(zip(texts, labels)):
        # Generate filename from label
        filename = get_filename(label, "Nariman", file_ext)
        output_path = output_folder / filename
        
        # Set up TTS communication
        communicate = edge_tts.Communicate(text, voice)
        
        # Save audio to file
        await communicate.save(str(output_path))
        
        print(f"[{i+1}/{len(texts)}] Saved: {output_path}")
        output_files.append(str(output_path))
    
    return output_files
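
`batch_convert_edge_tts` awaits each request in turn, so eight clips cost eight sequential round trips. Since the requests are independent, they could run concurrently, with a semaphore capping how many are in flight. A sketch under that assumption; `convert_concurrently` is hypothetical, and `convert` stands for any async function that writes one file:

```python
import asyncio


async def convert_concurrently(jobs, convert, limit=3):
    """Run async TTS conversions concurrently, at most `limit` at a time.

    jobs:    iterable of (text, output_path) pairs
    convert: async callable convert(text, output_path) that writes one file
    Returns the output paths in the same order as `jobs`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(text, output_path):
        async with semaphore:  # cap the number of simultaneous requests
            await convert(text, output_path)
            return output_path

    # gather() returns results in the order the awaitables were passed in
    return await asyncio.gather(*(run_one(text, path) for text, path in jobs))
```

In this notebook, `convert` could be `async def convert(text, path): await edge_tts.Communicate(text, voice).save(str(path))`, with `jobs` built from `zip(command_texts, command_labels)` and `get_filename`. A small `limit` is a cautious default in case the service throttles parallel requests.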

In [14]:
# Batch conversion function using gTTS (as fallback)
def batch_convert_gtts(texts, labels, output_folder, lang='en', slow=False, file_ext="mp3"):
    """Convert multiple texts to speech using gTTS
    
    Args:
        texts (list): List of texts to convert
        labels (list): List of corresponding labels
        output_folder (Path): Folder to save output files
        lang (str): Language code
        slow (bool): Whether to speak slowly
        file_ext (str): File extension (gTTS always produces MP3, so keep 'mp3')
        
    Returns:
        list: List of generated file paths
    """
    from gtts import gTTS
    
    output_files = []
    
    for i, (text, label) in enumerate(zip(texts, labels)):
        # Generate filename from label
        filename = get_filename(label, "Nariman", file_ext)
        output_path = output_folder / filename
        
        # Create gTTS object and save
        tts = gTTS(text=text, lang=lang, slow=slow)
        tts.save(str(output_path))
        
        print(f"[{i+1}/{len(texts)}] Saved: {output_path}")
        output_files.append(str(output_path))
    
    return output_files

In [15]:
# Convert commands using Edge TTS (better quality)
# Choose a good voice - examples:
# - en-US-AriaNeural (female)
# - en-US-GuyNeural (male)
# - en-GB-SoniaNeural (British female)

voice = "en-US-AriaNeural"  # Change as needed
file_ext = "ogg"  # Only the file name changes; edge-tts still writes MP3-encoded audio

try:
    # Run the batch conversion
    generated_files = await batch_convert_edge_tts(
        command_texts, 
        command_labels, 
        output_folder, 
        voice=voice,
        file_ext=file_ext
    )
    print(f"Successfully generated {len(generated_files)} audio files.")
except Exception as e:
    print(f"Error with Edge TTS: {str(e)}")
    print("Falling back to gTTS...")
    # Fallback to gTTS
    generated_files = batch_convert_gtts(
        command_texts, 
        command_labels, 
        output_folder,
        file_ext="mp3"
    )
    print(f"Successfully generated {len(generated_files)} audio files using fallback.")

[1/8] Saved: command_voices/1.Nariman_1Intro.ogg
[2/8] Saved: command_voices/2.Nariman_2Recipe_Search.ogg
[3/8] Saved: command_voices/3.Nariman_3Get_Recipe_Info.ogg
[4/8] Saved: command_voices/4.Nariman_4Check_the_Reviews.ogg
[5/8] Saved: command_voices/5.Nariman_5Get_Nutrition_Info.ogg
[6/8] Saved: command_voices/6.Nariman_6Nutrition_Analysis.ogg
[7/8] Saved: command_voices/7.Nariman_7Recipe_Customization.ogg
[8/8] Saved: command_voices/8.Nariman_8Search_on_Internet.ogg
Successfully generated 8 audio files.
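
The `file_ext` argument only changes the file name. edge-tts streams MP3 audio (by default 24 kHz, 48 kbit/s mono) regardless of the output name, so the `.ogg` files above most likely contain MP3 data. One way to check is to inspect each file's magic bytes. A minimal sketch; `detect_audio_format` is a hypothetical helper that recognizes only a few common signatures:

```python
def detect_audio_format(path):
    """Guess a file's real audio container from its first bytes (magic numbers)."""
    with open(path, "rb") as f:
        header = f.read(12)
    if header.startswith(b"OggS"):
        return "ogg"  # Ogg container (Vorbis or Opus)
    if header.startswith(b"RIFF") and header[8:12] == b"WAVE":
        return "wav"
    if header.startswith(b"ID3"):
        return "mp3"  # MP3 preceded by an ID3v2 tag
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"  # MPEG audio frame sync (11 set bits); a heuristic
    return "unknown"
```

Running it over `generated_files` shows which files really are Ogg. If genuine Ogg Vorbis is required, the MP3 data has to be transcoded, for example with `ffmpeg -i input.mp3 -c:a libvorbis output.ogg`.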


In [16]:
# Generate voice options list for application
def generate_voice_options_list(output_folder, base_path="/kaggle/input/voices-of-commands-genai-capstone-2025"):
    """Return (label, path) pairs for the app's voice selector, with each path rewritten to base_path."""
    # List all audio files in the output folder (glob order is filesystem-dependent, not sorted)
    files = list(output_folder.glob("*.ogg")) + list(output_folder.glob("*.mp3"))
    
    # Create voice options list
    voice_options = [("Select Voice...", None)]
    
    for file in files:
        # Get the base name without extension
        name = file.stem
        # Generate label from filename
        if "_" in name:
            label_parts = name.split("_", 1)
            if "." in label_parts[0]:
                label = label_parts[0]  # Numbered file: "1.Nariman_1Intro" -> "1.Nariman"
            else:
                label = name.replace("_", " ")
        else:
            label = name
            
        # Create the voice option tuple with the kaggle path
        kaggle_path = f"{base_path}/{file.name}"
        voice_options.append((label, kaggle_path))
    
    # Append two extra samples that exist only in the Kaggle dataset, not in output_folder
    voice_options.append(("Nariman 1", f"{base_path}/Nariman_1.ogg"))
    voice_options.append(("Neda 1", f"{base_path}/Neda_1.ogg"))
    
    return voice_options

# Generate the voice options list
voice_options = generate_voice_options_list(output_folder)

# Print the voice options in the format we need
print("voice_options = [")
for label, path in voice_options:
    print(f"    (\"{label}\", \"{path if path else ''}\"),")
print("]")

voice_options = [
    ("Select Voice...", ""),
    ("7.Nariman", "/kaggle/input/voices-of-commands-genai-capstone-2025/7.Nariman_7Recipe_Customization.ogg"),
    ("5.Nariman", "/kaggle/input/voices-of-commands-genai-capstone-2025/5.Nariman_5Get_Nutrition_Info.ogg"),
    ("6.Nariman", "/kaggle/input/voices-of-commands-genai-capstone-2025/6.Nariman_6Nutrition_Analysis.ogg"),
    ("1.Nariman", "/kaggle/input/voices-of-commands-genai-capstone-2025/1.Nariman_1Intro.ogg"),
    ("4.Nariman", "/kaggle/input/voices-of-commands-genai-capstone-2025/4.Nariman_4Check_the_Reviews.ogg"),
    ("8.Nariman", "/kaggle/input/voices-of-commands-genai-capstone-2025/8.Nariman_8Search_on_Internet.ogg"),
    ("3.Nariman", "/kaggle/input/voices-of-commands-genai-capstone-2025/3.Nariman_3Get_Recipe_Info.ogg"),
    ("2.Nariman", "/kaggle/input/voices-of-commands-genai-capstone-2025/2.Nariman_2Recipe_Search.ogg"),
    ("Nariman 1", "/kaggle/input/voices-of-commands-genai-capstone-2025/Nariman_1.ogg"),
    ("Neda 1
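
`generate_voice_options_list` labels every command file with only its number and speaker (for example `7.Nariman`), and because `Path.glob` returns files in no guaranteed order, the list comes out shuffled, as the output above shows. A sketch of an alternative, assuming file names follow the `get_filename` scheme; `build_voice_options` is hypothetical and takes plain file names, e.g. `[p.name for p in output_folder.iterdir()]`:

```python
import re
from pathlib import Path


def build_voice_options(filenames, base_path):
    """Build (label, path) pairs sorted by command number, with readable labels.

    Expects names produced by get_filename, e.g. "2.Nariman_2Recipe_Search.ogg",
    and turns them into labels such as "2. Recipe Search (Nariman)".
    """
    # number . speaker _ number title   ->   groups: number, speaker, title
    pattern = re.compile(r"^(\d+)\.([^_]+)_\d+(.+)$")
    numbered, others = [], []
    for name in filenames:
        stem = Path(name).stem
        match = pattern.match(stem)
        if match:
            number, speaker, title = match.groups()
            label = f"{number}. {title.replace('_', ' ')} ({speaker})"
            numbered.append((int(number), label, f"{base_path}/{name}"))
        else:
            # Files outside the naming scheme (e.g. "Neda_1.ogg") keep a plain label
            others.append((stem.replace("_", " "), f"{base_path}/{name}"))
    numbered.sort()  # numeric sort, so "10." comes after "9."
    options = [("Select Voice...", None)]
    options += [(label, path) for _, label, path in numbered]
    options += sorted(others)
    return options
```

Unlike the original function, the extra samples appear only if their names are in `filenames`, so the Kaggle-only files (`Nariman_1.ogg`, `Neda_1.ogg`) would need to be added to that list explicitly.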

In [17]:
# Check what files were created
import os

print("Generated audio files:")
for i, file in enumerate(sorted(os.listdir(output_folder))):
    file_path = os.path.join(output_folder, file)
    file_size = os.path.getsize(file_path) / 1024  # Size in KB
    print(f"{i+1}. {file} ({file_size:.1f} KB)")

Generated audio files:
1. 1.Nariman_1Intro.ogg (14.2 KB)
2. 2.Nariman_2Recipe_Search.ogg (38.7 KB)
3. 3.Nariman_3Get_Recipe_Info.ogg (23.9 KB)
4. 4.Nariman_4Check_the_Reviews.ogg (15.0 KB)
5. 5.Nariman_5Get_Nutrition_Info.ogg (21.0 KB)
6. 6.Nariman_6Nutrition_Analysis.ogg (25.2 KB)
7. 7.Nariman_7Recipe_Customization.ogg (22.5 KB)
8. 8.Nariman_8Search_on_Internet.ogg (19.1 KB)
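
Assuming each clip is edge-tts's default constant-bitrate MP3 stream at 48 kbit/s (the `.ogg` extension does not change the encoding), file size converts directly to duration. For the intro clip: 14.2 × 1024 × 8 / 48 000 ≈ 2.4 s. That gives a cheap check for truncated or empty clips. A rough sketch; both helper names are hypothetical, and the estimate ignores any container headers or tags:

```python
from pathlib import Path


def estimate_duration_seconds(size_bytes, bitrate_kbps=48):
    """Approximate length of a constant-bitrate stream: total bits / bits per second."""
    return size_bytes * 8 / (bitrate_kbps * 1000)


def estimate_folder_durations(folder, pattern="*.ogg", bitrate_kbps=48):
    """Map each matching file name (sorted) to its estimated duration in seconds."""
    return {
        path.name: estimate_duration_seconds(path.stat().st_size, bitrate_kbps)
        for path in sorted(Path(folder).glob(pattern))
    }
```

For example, `estimate_folder_durations(output_folder)` lists the approximate length of every generated clip; a value near zero would point to a failed conversion.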


## Voice Dataset for Kaggle

### Dataset: Voices of Commands - GenAI Capstone 2025

#### Description
This dataset contains voice files of spoken user commands for the ChefBelle AI Kitchen Assistant project, developed as part of the Google GenAI Intensive Course Capstone 2025Q1. The clips cover recipe search, recipe details and reviews, nutrition information, recipe customization, and a general cooking question.

#### Purpose
The voice files demonstrate and test the speech interaction capabilities of the ChefBelle AI Kitchen Assistant. Each file is a typical user request, so the application's voice features can be tested and demonstrated with realistic spoken input.

#### File Information
The dataset includes the following voice command recordings:

1. **1.Nariman_intro.ogg** - Basic greeting and capability query ("Hello, what can you do?")
2. **2.Nariman_search.ogg** - Recipe search query ("Find me vegetarian soup recipes and tag the recipes. 5 recipes")
3. **3.Nariman_get_info.ogg** - Specific recipe information request ("I like to know about third recipe in the list but not review")
4. **4.Nariman_review.ogg** - Request for recipe reviews ("Show the recipe reviews")
5. **5.Nariman_nutrition.ogg** - Nutrition information query ("Get nutrition information for this recipe")
6. **6.Nariman_nutrition_analysis.ogg** - Detailed nutrition analysis request ("Run the nutrition analysis for the recipe we just discussed.")
7. **7.Nariman_customization.ogg** - Recipe modification query ("make this recipe more healthy for low fat diet")
8. **8.Nariman_grounding.ogg** - General cooking knowledge query ("What's a good substitute for egg yolks")
9. **Nariman_1.ogg** - Additional voice sample
10. **Neda_1.ogg** - Alternative voice sample

#### Technical Specifications
- **Audio Format**: Files use the `.ogg` extension, but the command clips generated with edge-tts contain MP3 audio (edge-tts's default 24 kHz, 48 kbit/s mono stream), since edge-tts does not produce Ogg Vorbis
- **Voice Type**: Generated using Microsoft Edge TTS (Neural voices)
- **Primary Voice**: en-US-AriaNeural (female voice)
- **Language**: English (US)
- **Quality**: High-quality synthesized speech with natural intonation and rhythm

#### Usage in ChefBelle Project
These voice recordings are used in the user interface of the ChefBelle AI Kitchen Assistant to demonstrate command examples and provide audio feedback. The application loads these files as part of its interactive demo capabilities, showing users the types of questions they can ask the AI assistant.

#### Citation and Credits
This voice dataset was created specifically for the Google GenAI Intensive Course Capstone 2025Q1 project. The synthesized voices are generated using Microsoft's Edge TTS technology and are used for educational and demonstration purposes only.