# Text-to-Speech Demo

This notebook demonstrates Esperanto's Text-to-Speech (TTS) providers: OpenAI, ElevenLabs, OpenAI-compatible endpoints, Google (Gemini), and Vertex.

## Setup

First, make sure you have the necessary API keys in your `.env` file:
```
OPENAI_API_KEY=your_openai_key
ELEVENLABS_API_KEY=your_elevenlabs_key
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/google_credentials.json
```
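
Before creating any provider, it can help to check which of these variables are actually set. The sketch below uses only the standard library; `REQUIRED_KEYS` and `missing_keys` are hypothetical names introduced for illustration, not part of Esperanto.

```python
import os

# Environment variables this notebook expects, keyed by provider.
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "google": "GOOGLE_APPLICATION_CREDENTIALS",
}


def missing_keys(env=None):
    """Return the providers whose environment variable is unset or empty."""
    env = os.environ if env is None else env
    return [provider for provider, key in REQUIRED_KEYS.items() if not env.get(key)]


print("Providers missing credentials:", missing_keys() or "none")
```

Run it after `load_dotenv()` so that values from the `.env` file are visible in `os.environ`.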

In [1]:
from pathlib import Path
import os
from dotenv import load_dotenv
from esperanto.factory import AIFactory
from IPython.display import Audio, display

# Load environment variables
load_dotenv()

# Create output directory
output_dir = Path.cwd() / "output"
output_dir.mkdir(exist_ok=True)

# Helper function to play audio
def play_audio(audio_data: bytes, autoplay: bool = False):
    """Display audio player in notebook."""
    display(Audio(audio_data, autoplay=autoplay))

## 1. OpenAI TTS

OpenAI's TTS service offers high-quality voices with different personalities. Let's list the available voices and try one.

In [11]:
# Initialize OpenAI TTS model
openai_tts = AIFactory.create_tts("openai", model_name="gpt-4o-mini-tts")

print(openai_tts.available_voices)


{'alloy': Voice(name='alloy', id='alloy', gender='NEUTRAL', language_code='en-US', description='Neutral and balanced voice', accent=None, age=None, use_case=None, preview_url=None), 'echo': Voice(name='echo', id='echo', gender='MALE', language_code='en-US', description='Mature and deep voice', accent=None, age=None, use_case=None, preview_url=None), 'fable': Voice(name='fable', id='fable', gender='FEMALE', language_code='en-US', description='Warm and expressive voice', accent=None, age=None, use_case=None, preview_url=None), 'onyx': Voice(name='onyx', id='onyx', gender='MALE', language_code='en-US', description='Smooth and authoritative voice', accent=None, age=None, use_case=None, preview_url=None), 'nova': Voice(name='nova', id='nova', gender='FEMALE', language_code='en-US', description='Energetic and bright voice', accent=None, age=None, use_case=None, preview_url=None), 'shimmer': Voice(name='shimmer', id='shimmer', gender='FEMALE', language_code='en-US', description='Clear and pro
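
`available_voices` prints as a dictionary keyed by voice name, with `Voice` entries carrying fields such as `gender` and `description`. Here is a minimal sketch of filtering such a mapping. It uses a stand-in `Voice` dataclass and a hypothetical `voices_by_gender` helper so that it runs without an API key; the real `Voice` class comes from Esperanto and has more fields.

```python
from dataclasses import dataclass


@dataclass
class Voice:
    """Stand-in with a subset of the fields shown in the output above."""
    name: str
    id: str
    gender: str
    language_code: str
    description: str


def voices_by_gender(voices, gender):
    """Return the names of voices whose gender matches, in sorted order."""
    return sorted(name for name, v in voices.items() if v.gender == gender)


sample = {
    "alloy": Voice("alloy", "alloy", "NEUTRAL", "en-US", "Neutral and balanced voice"),
    "echo": Voice("echo", "echo", "MALE", "en-US", "Mature and deep voice"),
    "nova": Voice("nova", "nova", "FEMALE", "en-US", "Energetic and bright voice"),
    "fable": Voice("fable", "fable", "FEMALE", "en-US", "Warm and expressive voice"),
}
print(voices_by_gender(sample, "FEMALE"))  # ['fable', 'nova']
```

The same function should work on `openai_tts.available_voices`, assuming its values expose a `gender` attribute as the printed output suggests.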



In [12]:

text = "Hello! This is a test of OpenAI's text-to-speech capabilities."

voice = "alloy"
response = openai_tts.generate_speech(
    text=text,
    voice=voice,
    output_file=output_dir / f"openai_{voice}.mp3"
)
play_audio(response.audio_data)
print()




## 2. ElevenLabs TTS

ElevenLabs excels at multilingual TTS with high-quality voices and extensive customization options.

In [None]:
print("Using ElevenLabs' multilingual model with voice customization.\n")

# Initialize ElevenLabs TTS model
elevenlabs_tts = AIFactory.create_tts(
    "elevenlabs",
    model_name="eleven_multilingual_v2",
    # voice_settings={
    #     "voice_stability": 0.5,
    #     "voice_similarity_boost": 0.75
    # }
)
print(elevenlabs_tts.available_voices)


## OpenAI Compatible TTS

The OpenAI-compatible provider allows you to use any TTS endpoint that follows the OpenAI API format.
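
In practice, "the OpenAI API format" means the server accepts the same request as OpenAI's speech endpoint: a JSON `POST` to `/v1/audio/speech` with `model`, `input`, and `voice` fields, answered with raw audio bytes. The sketch below builds such a request with the standard library without sending it. `build_speech_request` is a hypothetical helper, the base URL is assumed to already include `/v1`, and how Esperanto constructs its requests internally is not shown in this notebook.

```python
import json
import urllib.request


def build_speech_request(base_url, model, text, voice, api_key=None):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    url = base_url.rstrip("/") + "/audio/speech"
    payload = {"model": model, "input": text, "voice": voice}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only attach credentials when the endpoint requires them.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )


req = build_speech_request(
    "http://localhost:8000/v1", "speaches-ai/Kokoro-82M-v1.0-ONNX", "Hello!", "af_heart"
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would return the audio bytes, provided a compatible server is listening at that address.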

In [None]:
from esperanto.factory import AIFactory

# Initialize OpenAI-Compatible TTS model with proper configuration
custom_openai_tts = AIFactory.create_text_to_speech(
    "openai-compatible", 
    model_name="speaches-ai/Kokoro-82M-v1.0-ONNX",
    config={
        "base_url": "http://localhost:8000",  # Your OpenAI-compatible endpoint
        "api_key": "your-api-key-if-required"  # Optional, depends on your endpoint
    }
)

# Get available voices (if supported by your endpoint)
print("Available voices:", list(custom_openai_tts.available_voices.keys()))

In [None]:
text = "This is a demonstration of OpenAI-compatible text-to-speech capabilities."

# Use a voice that works with your endpoint
voice = "af_heart"  # Use a specific voice from your endpoint

response = custom_openai_tts.generate_speech(
    text=text,
    voice=voice,
    # output_file=output_dir / f"openai_compatible_{voice}.mp3"
)
play_audio(response.audio_data)
print("Generated speech successfully!")

## Gemini TTS

In [None]:
print("Using Gemini TTS through the Google provider.\n")

# Initialize the Gemini TTS model
google_tts = AIFactory.create_tts("google", model_name="gemini-2.5-flash-preview-tts")
google_tts.available_voices




In [None]:

text = "This is a demonstration of Google Cloud Text-to-Speech."
response = google_tts.generate_speech(
    text=text,
    voice="achernar",
    output_file=output_dir / "google_normal.wav"
)
play_audio(response.audio_data)
print()

In [None]:

text = "This is a demonstration of Google Cloud Text-to-Speech."
response = await google_tts.agenerate_speech(
    text=text,
    voice="achernar",
    output_file=output_dir / "google_normal_async.wav"  # separate file so the sync output is not overwritten
)
play_audio(response.audio_data)
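
Because `agenerate_speech` is awaited, several requests can in principle run concurrently with `asyncio.gather`. The sketch below shows the pattern with a stand-in coroutine, `fake_generate` (hypothetical; it only sleeps and returns dummy bytes), so that it runs without credentials. Swapping in `google_tts.agenerate_speech` is the assumed real-world use.

```python
import asyncio


async def fake_generate(text, voice):
    """Stand-in for an async TTS call: waits briefly, returns dummy bytes."""
    await asyncio.sleep(0.01)
    return f"{voice}:{text}".encode("utf-8")


async def generate_all(text, voices):
    """Run one request per voice concurrently; results keep the input order."""
    results = await asyncio.gather(*(fake_generate(text, v) for v in voices))
    return dict(zip(voices, results))
```

In a notebook cell, call it with `await generate_all("Hello", ["achernar", "leda"])`; in a plain script, wrap the call in `asyncio.run(...)`.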


In [None]:
# Multi-speaker conversation example
conversation_text = """TTS the following conversation between Joe and Jane:
Joe: Como vai hoje, Jane?
Jane: Não muito mal, e você?
Joe: Muito bem! Tenho trabalhado em alguns projetos emocionantes.
Jane: Isso parece interessante! Conte-me mais sobre eles."""

speaker_configs = [
    {"speaker": "Joe", "voice": "rasalgethi"},  # Male, gravelly voice
    {"speaker": "Jane", "voice": "leda"},       # Female, excitable voice
]

response = google_tts.generate_multi_speaker_speech(
    text=conversation_text,
    speaker_configs=speaker_configs,
    output_file=output_dir / "google_multi_speaker.wav"
)
play_audio(response.audio_data)
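
The multi-speaker call presumably relies on the speaker names in the text matching those in `speaker_configs`. A small check like the following can catch mismatches before a request is sent. `undeclared_speakers` is a hypothetical helper, and it assumes each turn is written as `Name: text` on its own line, as in the conversation above.

```python
import re


def undeclared_speakers(text, speaker_configs):
    """Return speakers that start a line in `text` but have no voice configured."""
    declared = {cfg["speaker"] for cfg in speaker_configs}
    # A turn is a line beginning with a single word followed directly by a colon.
    used = re.findall(r"^(\w+):", text, flags=re.MULTILINE)
    return sorted(set(used) - declared)


demo_text = "Joe: Como vai hoje, Jane?\nJane: Não muito mal, e você?\nBob: Olá!"
demo_configs = [
    {"speaker": "Joe", "voice": "rasalgethi"},
    {"speaker": "Jane", "voice": "leda"},
]
print(undeclared_speakers(demo_text, demo_configs))  # ['Bob']
```

An empty list means every speaker found in the text has a configured voice.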

In [None]:
# Async multi-speaker example
response = await google_tts.agenerate_multi_speaker_speech(
    text=conversation_text,
    speaker_configs=speaker_configs,
    output_file=output_dir / "google_multi_speaker_async.wav"
)
play_audio(response.audio_data)


In [None]:
text = "This is a demonstration of Google Cloud Text-to-Speech."

# Generate the same sentence with the first five available voices
for voice in list(google_tts.available_voices.keys())[:5]:
    print(voice)
    response = google_tts.generate_speech(
        text=text,
        voice=voice,
        output_file=output_dir / f"google_{voice}.wav"
    )
    play_audio(response.audio_data)

## Vertex


In [None]:
from esperanto import AIFactory
print("Using the Vertex TTS provider.\n")

# Initialize the Vertex TTS model
vertex_tts = AIFactory.create_tts("vertex", model_name="default")
vertex_tts.available_voices



In [None]:

text = "This is a demonstration of Vertex Cloud Text-to-Speech."
response = await vertex_tts.agenerate_speech(
    text=text,
    voice="en-US-Neural2-A",
    output_file=output_dir / "vertex.wav"
)
play_audio(response.audio_data)


## OpenAI Compatible

In [8]:
# Initialize the OpenAI-compatible TTS model
custom_openai_tts = AIFactory.create_tts(
    "openai-compatible",
    model_name="speaches-ai/Kokoro-82M-v1.0-ONNX",
    config={"base_url": "http://localhost:8000/v1/"},
)

text = "This is a demonstration of Custom Open AI Text-to-Speech."
response = custom_openai_tts.generate_speech(
    text=text,
    voice="af_heart",
    output_file=output_dir / "custom_openai.mp3"
)
play_audio(response.audio_data)



