# TextToSpeech Class Demonstration

This notebook demonstrates the capabilities of the TextToSpeech class for generating synthetic audio data. We'll cover both simple single-speaker text-to-speech and complex multi-speaker dialogue scenarios.

## Features Demonstrated
1. **Single Speaker TTS** - Basic text-to-speech conversion
2. **Multi-Speaker Dialogues** - Complex conversations with multiple voices
3. **Call Center Example** - Realistic customer service interaction
4. **Random Speaker Assignment** - Automatic voice selection for characters
5. **Snowflake Integration** - Saving audio files to Snowflake stages
6. **Local File Output** - Saving audio files to disk
7. **Voice Conversion** - Re-voicing existing audio to match a target speaker


## Setup and Imports

First, let's install the TTS dependency and import the necessary modules. The TextToSpeech instances themselves are created in the examples that follow.


In [None]:
# install libraries
!pip install --quiet --root-user-action=ignore coqui-tts

In [None]:
# Add /home/app to sys.path so the local `audio` package can be imported
import sys
sys.path.append('/home/app')

import os
import numpy as np
import soundfile as sf
import streamlit as st
from snowflake.snowpark.context import get_active_session

from audio.text_to_speech import TextToSpeech
from audio.voices import VOICES


print("✅ Imports successful!")
print(f"Available models in VOICES: {list(VOICES.keys())}")


## Example 1: Simple Single-Speaker Text-to-Speech

Let's start with a basic example using a fast single-speaker model, which is well suited to simple announcements or narration.


In [None]:
# Initialize a fast single-speaker TTS model
print("🔄 Loading single-speaker TTS model...")
tts_simple = TextToSpeech(model="tts_models/en/ljspeech/tacotron2-DDC")
print("✅ Model loaded successfully!")

# Generate simple text-to-speech
text = "Welcome to our synthetic data generation platform. This audio was created using advanced text-to-speech technology."

print(f"🎤 Generating speech for: '{text}'")
audio_data = tts_simple.text_to_speech(text)

print(f"✅ Audio generated! Shape: {audio_data.shape}, Duration: {len(audio_data)/tts_simple.sample_rate:.2f} seconds")

# Play the audio in the notebook
st.audio(audio_data, sample_rate=tts_simple.sample_rate)
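
The duration printed above is simply the sample count divided by the sample rate. The sketch below spells that out together with a peak-level check, which can help spot silent or clipped output. It assumes the waveform is a one-dimensional sequence of floats in the range [-1, 1]; `describe_waveform` is a hypothetical helper, not part of the TextToSpeech class, and a synthetic tone stands in for `audio_data`.

```python
import math


def describe_waveform(samples, sample_rate):
    """Return sample count, duration in seconds, and peak absolute amplitude."""
    num_samples = len(samples)
    peak = max((abs(float(s)) for s in samples), default=0.0)
    return {
        "num_samples": num_samples,
        "duration_s": num_samples / sample_rate,
        "peak": peak,
    }


# Stand-in for `audio_data`: half a second of a 440 Hz tone at 22,050 Hz
sample_rate = 22050
tone = [
    0.5 * math.sin(2 * math.pi * 440 * i / sample_rate)
    for i in range(sample_rate // 2)
]

stats = describe_waveform(tone, sample_rate)
print(stats)
```

In the notebook, the same call would be `describe_waveform(audio_data, tts_simple.sample_rate)`.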

## Example 2: Multi-Speaker Dialogue - Call Center Scenario

Now let's create a more complex example with multiple speakers. We'll simulate a customer service call with realistic dialogue between a representative and a customer.


In [None]:
# Initialize multi-speaker, multilingual TTS model
print("🔄 Loading multi-speaker TTS model (this may take a moment)...")
tts_dialogue = TextToSpeech(model="tts_models/multilingual/multi-dataset/xtts_v2")
print("✅ Multi-speaker model loaded successfully!")

# Let's explore the available speakers
print(f"\n📢 Total available speakers: {len(tts_dialogue.tts_model.speakers)}")
print("🎭 Sample female voices:", VOICES["tts_models/multilingual/multi-dataset/xtts_v2"]["female_voices"][:5])
print("🎭 Sample male voices:", VOICES["tts_models/multilingual/multi-dataset/xtts_v2"]["male_voices"][:5])

In [None]:
# Create a realistic call center dialogue
call_center_dialogue = {
    "segments": [
        {
            "text": "Thank you for calling TechCorp customer service. This is Sarah speaking. How can I help you today?",
            "speaker": "Representative"
        },
        {
            "text": "Hi Sarah, I'm having trouble with my recent order. I placed it three days ago but haven't received any tracking information.",
            "speaker": "Customer"
        },
        {
            "text": "I'm sorry to hear that you're experiencing this issue. Let me look up your order right away. Can you please provide me with your order number?",
            "speaker": "Representative"
        },
        {
            "text": "Sure, it's order number T-C-K-2-0-2-4-dash-7-8-9-1.",
            "speaker": "Customer"
        },
        {
            "text": "Perfect, thank you. I can see your order here. It looks like there was a slight delay in processing, but I have good news - your order was shipped yesterday and should arrive tomorrow. Let me send you the tracking number right now.",
            "speaker": "Representative"
        },
        {
            "text": "Oh that's great! Thank you so much for checking on that. I really appreciate your help.",
            "speaker": "Customer"
        },
        {
            "text": "You're very welcome! Is there anything else I can help you with today?",
            "speaker": "Representative"
        },
        {
            "text": "No, that takes care of everything. Thank you again, Sarah.",
            "speaker": "Customer"
        },
        {
            "text": "My pleasure! Have a wonderful day and thank you for choosing TechCorp.",
            "speaker": "Representative"
        }
    ]
}

print("📝 Call center dialogue script created!")
print(f"💬 Total segments: {len(call_center_dialogue['segments'])}")
for i, segment in enumerate(call_center_dialogue['segments'], 1):
    print(f"  {i}. {segment['speaker']}: {segment['text'][:50]}...")
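
Because the script is a plain dictionary, a quick structural check before synthesis can catch missing fields early. The following is a minimal sketch that assumes every segment needs a non-empty `text` and `speaker` string, as in the script above; `validate_dialogue` is a hypothetical helper and not part of the TextToSpeech class.

```python
def validate_dialogue(dialogue):
    """Return a list of problems found in a dialogue script (empty if none)."""
    segments = dialogue.get("segments")
    if not isinstance(segments, list) or not segments:
        return ["'segments' must be a non-empty list"]

    problems = []
    for i, segment in enumerate(segments, 1):
        for key in ("text", "speaker"):
            value = segment.get(key) if isinstance(segment, dict) else None
            if not isinstance(value, str) or not value.strip():
                problems.append(f"segment {i}: missing or empty '{key}'")
    return problems


good = {"segments": [{"text": "Hello.", "speaker": "Representative"}]}
bad = {"segments": [{"text": "", "speaker": "Customer"}, {"text": "Hi."}]}

print(validate_dialogue(good))
print(validate_dialogue(bad))
```

An empty list means the script has the expected shape; each string in a non-empty list names the segment and the field that needs attention.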


### Version A: Manual Speaker Assignment

First, let's create the dialogue with manually chosen speakers for each role.


In [None]:
# Create dialogue with specific speaker assignment
# Map our character names to actual TTS voices
manual_dialogue = {
    "segments": []
}

for segment in call_center_dialogue["segments"]:
    new_segment = segment.copy()
    if segment["speaker"] == "Representative":
        new_segment["speaker"] = "Claribel Dervla"  # Professional female voice for rep
    elif segment["speaker"] == "Customer": 
        new_segment["speaker"] = "Andrew Chipper"   # Friendly male voice for customer
    manual_dialogue["segments"].append(new_segment)

print("🎤 Generating call center dialogue with manual speaker assignment...")
print("👩‍💼 Representative: Claribel Dervla (female)")
print("👨‍💻 Customer: Andrew Chipper (male)")

dialogue_audio_manual = tts_dialogue.create_dialogue(
    manual_dialogue, 
    language="en"
)

print(f"✅ Dialogue generated! Duration: {len(dialogue_audio_manual)/tts_dialogue.sample_rate:.1f} seconds")

# Play the dialogue
st.audio(dialogue_audio_manual, sample_rate=tts_dialogue.sample_rate)
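
With more than two roles, the `if`/`elif` chain above becomes awkward. One alternative is a lookup table from role name to voice name, sketched below. `assign_voices` is a hypothetical helper; the voice names are the two used above, and the function returns a new dictionary so the original script stays untouched.

```python
def assign_voices(dialogue, voice_map):
    """Return a copy of the dialogue with role names replaced by voice names."""
    roles = {segment["speaker"] for segment in dialogue["segments"]}
    missing = roles - set(voice_map)
    if missing:
        raise KeyError(f"No voice configured for: {sorted(missing)}")
    return {
        "segments": [
            {**segment, "speaker": voice_map[segment["speaker"]]}
            for segment in dialogue["segments"]
        ]
    }


script = {
    "segments": [
        {"text": "How can I help you today?", "speaker": "Representative"},
        {"text": "I have a question about my order.", "speaker": "Customer"},
    ]
}
voice_map = {"Representative": "Claribel Dervla", "Customer": "Andrew Chipper"}

mapped = assign_voices(script, voice_map)
print([segment["speaker"] for segment in mapped["segments"]])
```

Raising on an unmapped role is a deliberate choice here: it surfaces a typo in a role name before any audio is generated.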

### Version B: Random Speaker Assignment

Now let's try the same dialogue with automatic random speaker assignment. This feature automatically assigns voices to characters, alternating between male and female voices.


In [None]:
# Generate the same dialogue with random speaker assignment
print("🎲 Generating call center dialogue with random speaker assignment...")
print("🔄 The system will automatically assign voices, alternating gender between speakers")

dialogue_audio_random = tts_dialogue.create_dialogue(
    call_center_dialogue,  # Using original dialogue with character names
    language="en",
    random_speaker=True    # Enable automatic voice assignment
)

print(f"✅ Random dialogue generated! Duration: {len(dialogue_audio_random)/tts_dialogue.sample_rate:.1f} seconds")
print("🎭 Each character was automatically assigned a voice from the available pool")

# Play the random-assigned dialogue
st.audio(dialogue_audio_random, sample_rate=tts_dialogue.sample_rate)
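
The notebook does not show how `random_speaker=True` picks voices. As an illustration only, the sketch below shows one way an alternating-gender assignment could work: each new role takes the next voice from the female and male pools in turn. `pick_random_voices` and the `*_voice_2` names are hypothetical placeholders; in practice the pools would come from the `female_voices` and `male_voices` lists in `VOICES`.

```python
import random


def pick_random_voices(dialogue, female_voices, male_voices, seed=None):
    """Map each role to a random voice, alternating between the two pools."""
    if not female_voices or not male_voices:
        raise ValueError("Both voice pools must be non-empty")

    rng = random.Random(seed)
    pools = [list(female_voices), list(male_voices)]
    rng.shuffle(pools)  # randomize which pool serves the first role
    for pool in pools:
        rng.shuffle(pool)

    assignment = {}
    for segment in dialogue["segments"]:
        role = segment["speaker"]
        if role not in assignment:
            index = len(assignment)
            pool = pools[index % 2]
            assignment[role] = pool[(index // 2) % len(pool)]
    return assignment


script = {
    "segments": [
        {"text": "How can I help?", "speaker": "Representative"},
        {"text": "My order is late.", "speaker": "Customer"},
        {"text": "Let me check.", "speaker": "Representative"},
    ]
}
female_pool = ["Claribel Dervla", "female_voice_2"]  # second name is a placeholder
male_pool = ["Andrew Chipper", "male_voice_2"]       # second name is a placeholder

assignment = pick_random_voices(script, female_pool, male_pool, seed=7)
print(assignment)
```

Passing a `seed` makes the assignment reproducible, which is useful when the same synthetic dataset has to be regenerated later.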


## Example 3: Snowflake Integration

The TextToSpeech class can save generated audio directly to a Snowflake stage when `text_to_speech` is called with a `stage_location` argument.

**Note**: The following example requires an active Snowflake session.


In [None]:
# Example of saving to Snowflake stage (requires active session)
session = get_active_session()

# Check if we have an active Snowflake session
session = tts_dialogue.check_session()
print("✅ Active Snowflake session found!")

# Generate audio and save directly to stage
announcement_text = "This audio file was generated automatically and saved to Snowflake stage."

print("🎤 Generating audio and saving to Snowflake stage...")
audio_with_stage = tts_dialogue.text_to_speech(
    text=announcement_text,
    speaker="Sofia Hellen",
    language="en",
    stage_location="@AUDIO/generated_announcement.wav"  # Adjust stage name as needed
)

print("🔊 Downloading the audio file from the Snowflake stage and playing it...")
staged_audio_file = session.file.get_stream('@AUDIO/generated_announcement.wav')
st.audio(staged_audio_file)
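
The `stage_location` argument hides the upload step. For audio that was generated without it, a similar result could be reached by hand: encode the waveform as WAV bytes in memory, then hand the stream to Snowpark's `session.file.put_stream`. The sketch below assumes a mono float waveform in [-1, 1] and 16-bit PCM output; `encode_wav` and `upload_wav` are hypothetical helpers, and this is not necessarily how the TextToSpeech class performs the upload.

```python
import io
import struct
import wave


def encode_wav(samples, sample_rate):
    """Encode a mono float waveform (values in [-1, 1]) as 16-bit PCM WAV bytes."""
    pcm = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm += struct.pack("<h", int(round(clipped * 32767)))

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))
    return buffer.getvalue()


def upload_wav(session, samples, sample_rate, stage_location):
    """Upload encoded audio to a stage path such as '@AUDIO/file.wav'."""
    stream = io.BytesIO(encode_wav(samples, sample_rate))
    # auto_compress=False keeps the file as a plain .wav on the stage
    return session.file.put_stream(
        stream, stage_location, auto_compress=False, overwrite=True
    )


wav_bytes = encode_wav([0.0, 0.5, -0.5, 1.0], 16000)
print(len(wav_bytes), wav_bytes[:4])
```

With an active session, `upload_wav(session, audio_data, tts_simple.sample_rate, "@AUDIO/simple_announcement.wav")` would place the first example's audio on the same stage; the stage name is taken from the cell above and may need adjusting.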

## Example 4: Saving Audio Files Locally

You can also save the generated audio files locally for further processing or analysis.


In [None]:
# Save generated audio files locally
output_dir = "generated_audio"
os.makedirs(output_dir, exist_ok=True)

# Save the simple TTS example
simple_path = os.path.join(output_dir, "simple_announcement.wav")
sf.write(simple_path, audio_data, tts_simple.sample_rate)
print(f"💾 Simple TTS saved to: {simple_path}")

# Save the manual dialogue
manual_path = os.path.join(output_dir, "call_center_manual.wav")
sf.write(manual_path, dialogue_audio_manual, tts_dialogue.sample_rate)
print(f"💾 Manual dialogue saved to: {manual_path}")

# Save the random dialogue  
random_path = os.path.join(output_dir, "call_center_random.wav")
sf.write(random_path, dialogue_audio_random, tts_dialogue.sample_rate)
print(f"💾 Random dialogue saved to: {random_path}")

print(f"\n📁 All audio files saved in '{output_dir}/' directory")
print("🎧 You can play these files with any audio player or use them in your applications")
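
Before passing saved files to downstream processing, it can be worth confirming that each one has the expected sample rate and duration. The sketch below reads that information from the file header with Python's standard `wave` module. It assumes the files are PCM-encoded WAVs, which the `wave` module requires; `summarize_wav` is a hypothetical helper, and a generated file of silence stands in for the notebook's output.

```python
import os
import tempfile
import wave


def summarize_wav(path):
    """Read channel count, sample rate, and duration from a PCM WAV header."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


# Stand-in file: one second of 16-bit silence at 8,000 Hz
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "silence.wav")
with wave.open(demo_path, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(8000)
    wav_file.writeframes(b"\x00\x00" * 8000)

summary = summarize_wav(demo_path)
print(summary)
```

The same function could be applied to `simple_path`, `manual_path`, and `random_path` from the cell above.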


## Example 5: Voice Conversion Between Different Speakers

This example demonstrates voice conversion, which transforms the voice characteristics of source audio 
to match a target speaker's voice while preserving the original speech content. This is useful for:
- Creating consistent voice branding across different audio content
- Voice anonymization for privacy protection
- Generating variations of existing audio with different speaker characteristics
- Creating personalized voices for accessibility applications

In [None]:
# Load voice conversion model for converting between different speakers
import torch
from TTS.api import TTS  # Coqui TTS API; not imported in the setup cell

print("🔄 Loading voice conversion model...")
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is available
tts_model = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to(device)

# Convert voice from source audio to target speaker's voice
print("🎭 Converting voice from source audio to target speaker...")
converted_voice = tts_model.voice_conversion(source_wav='audio/harvard.wav', target_wav='audio/obama_sample1.wav')

# Play the converted audio
# FreeVC24 is the 24 kHz variant of FreeVC; adjust the rate if playback sounds too slow or too fast
print("🔊 Playing converted voice audio...")
st.audio(converted_voice, sample_rate=24000)

## Summary and Next Steps

This notebook demonstrated the key capabilities of the TextToSpeech class:

### ✅ What We Covered
1. **Single-Speaker TTS** - Fast synthesis with `tts_models/en/ljspeech/tacotron2-DDC`
2. **Multi-Speaker Dialogues** - Complex conversations using `tts_models/multilingual/multi-dataset/xtts_v2`
3. **Call Center Simulation** - Realistic customer service interaction
4. **Manual vs Random Speaker Assignment** - Different approaches to voice selection
5. **Snowflake Integration** - Direct saving to Snowflake stages
6. **Local File Operations** - Saving audio files for further use
7. **Voice Conversion** - Re-voicing source audio to match a target speaker with `voice_conversion_models/multilingual/vctk/freevc24`

### 🚀 Potential Applications
- **Customer Service Training**: Generate realistic call center scenarios
- **Content Creation**: Automated narration and announcements  
- **Data Augmentation**: Create diverse audio datasets for ML training
- **Accessibility**: Convert text content to audio for visually impaired users
- **Interactive Systems**: Voice interfaces and conversational AI

### 🔧 Advanced Features to Explore
- Different TTS models for various languages and voice qualities
- Integration with other Snowflake ML features
- Batch processing of large text datasets
- Custom voice cloning and fine-tuning
- Real-time streaming applications

### 📚 Additional Resources
- [Coqui TTS Documentation](https://tts.readthedocs.io/)
- [Snowflake ML Documentation](https://docs.snowflake.com/en/developer-guide/snowpark-ml/index)
- Check the `voices.py` file for all available voice options
