# Text-to-Speech Demo

This notebook demonstrates how to use Esperanto's Text-to-Speech (TTS) providers for OpenAI, ElevenLabs, Google, and Vertex AI.

## Setup

First, make sure you have the necessary API keys in your `.env` file:
```
OPENAI_API_KEY=your_openai_key
ELEVENLABS_API_KEY=your_elevenlabs_key
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/google_credentials.json
```
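Before creating any provider, it can help to confirm that these variables are actually set. The sketch below is a minimal check using only the standard library. The helper name `missing_env_vars` is an illustrative assumption and not part of Esperanto.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names taken from the .env example above
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def missing_env_vars(
    names: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` after `load_dotenv()` returns an empty list when everything is configured. Only the providers you plan to run need their keys, so treat a non-empty result as a hint rather than an error.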

In [3]:
from pathlib import Path
import os
from dotenv import load_dotenv
from esperanto.factory import AIFactory
from IPython.display import Audio, display

# Load environment variables
load_dotenv()

# Create output directory
output_dir = Path.cwd() / "output"
output_dir.mkdir(exist_ok=True)

# Helper function to play audio
def play_audio(audio_data: bytes, autoplay: bool = False):
    """Display audio player in notebook."""
    display(Audio(audio_data, autoplay=autoplay))

## 1. OpenAI TTS

OpenAI's TTS service offers high-quality voices with different personalities. The next cell lists the available voices, and the example after it uses `alloy`.

In [2]:
# Initialize OpenAI TTS model
openai_tts = AIFactory.create_tts("openai", model_name="tts-1")

print(openai_tts.available_voices)


{'alloy': Voice(name='alloy', id='alloy', gender='NEUTRAL', language_code='en-US', description='Neutral and balanced voice', accent=None, age=None, use_case=None, preview_url=None), 'echo': Voice(name='echo', id='echo', gender='MALE', language_code='en-US', description='Mature and deep voice', accent=None, age=None, use_case=None, preview_url=None), 'fable': Voice(name='fable', id='fable', gender='FEMALE', language_code='en-US', description='Warm and expressive voice', accent=None, age=None, use_case=None, preview_url=None), 'onyx': Voice(name='onyx', id='onyx', gender='MALE', language_code='en-US', description='Smooth and authoritative voice', accent=None, age=None, use_case=None, preview_url=None), 'nova': Voice(name='nova', id='nova', gender='FEMALE', language_code='en-US', description='Energetic and bright voice', accent=None, age=None, use_case=None, preview_url=None), 'shimmer': Voice(name='shimmer', id='shimmer', gender='FEMALE', language_code='en-US', description='Clear and pro

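`available_voices` is a dictionary keyed by voice id, and each value carries fields such as `gender` and `description`, as the output above shows. A sketch of filtering by gender follows. `VoiceInfo` is a hypothetical stand-in for Esperanto's `Voice` class so the snippet runs on its own, and `voices_by_gender` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VoiceInfo:
    """Hypothetical stand-in mirroring a few fields of the Voice objects printed above."""
    name: str
    id: str
    gender: Optional[str]
    description: str = ""


def voices_by_gender(voices: Dict[str, VoiceInfo], gender: str) -> List[str]:
    """Return the ids of voices whose gender matches, ignoring case."""
    wanted = gender.upper()
    return [
        voice_id
        for voice_id, voice in voices.items()
        if (voice.gender or "").upper() == wanted
    ]


sample = {
    "alloy": VoiceInfo("alloy", "alloy", "NEUTRAL", "Neutral and balanced voice"),
    "echo": VoiceInfo("echo", "echo", "MALE", "Mature and deep voice"),
    "nova": VoiceInfo("nova", "nova", "FEMALE", "Energetic and bright voice"),
}
print(voices_by_gender(sample, "female"))  # ['nova']
```

With the real provider, `voices_by_gender(openai_tts.available_voices, "FEMALE")` should behave the same way, assuming the `Voice` objects expose `gender` as a string attribute as the printed output suggests.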


In [3]:

text = "Hello! This is a test of OpenAI's text-to-speech capabilities."

# Generate speech with a single voice; the audio is also saved under output_dir
voice = "alloy"
response = openai_tts.generate_speech(
    text=text,
    voice=voice,
    output_file=output_dir / f"openai_{voice}.mp3"
)
play_audio(response.audio_data)
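The `play_audio` helper types `response.audio_data` as `bytes`, so the audio can also be written to disk manually instead of relying on `output_file`. A minimal sketch, with `save_audio` as a hypothetical helper and placeholder bytes standing in for a real response:

```python
import tempfile
from pathlib import Path


def save_audio(audio_data: bytes, path: Path) -> Path:
    """Write audio bytes to path, creating parent directories, and return the path."""
    if not audio_data:
        raise ValueError("audio_data is empty")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio_data)
    return path


# Placeholder bytes; in the notebook this would be response.audio_data
demo_dir = Path(tempfile.mkdtemp())
saved = save_audio(b"\x00\x01demo", demo_dir / "nested" / "sample.mp3")
print(saved.stat().st_size)  # 6
```

Rejecting empty data up front avoids leaving zero-byte files behind when a request fails quietly.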




## 2. ElevenLabs TTS

ElevenLabs excels at multilingual TTS with high-quality voices and extensive customization options.

In [4]:
print("Using ElevenLabs' multilingual model with voice customization.\n")

# Initialize ElevenLabs TTS model (the voice_settings block below is optional and left disabled)
elevenlabs_tts = AIFactory.create_tts(
    "elevenlabs",
    model_name="eleven_multilingual_v2",
    # voice_settings={
    #     "voice_stability": 0.5,
    #     "voice_similarity_boost": 0.75
    # }
)
print(elevenlabs_tts.available_voices)


Using ElevenLabs' multilingual model with voice customization.





{'9BWtsMINqrJLrRacOk9x': Voice(name='Aria', id='9BWtsMINqrJLrRacOk9x', gender='FEMALE', language_code='en', description='A middle-aged female with an African-American accent. Calm with a hint of rasp.', accent=None, age=None, use_case=None, preview_url='https://storage.googleapis.com/eleven-public-prod/premade/voices/9BWtsMINqrJLrRacOk9x/405766b8-1f4e-4d3c-aba1-6f25333823ec.mp3'), 'EXAVITQu4vr4xnSDxMaL': Voice(name='Sarah', id='EXAVITQu4vr4xnSDxMaL', gender='FEMALE', language_code='en', description='Young adult woman with a confident and warm, mature quality and a reassuring, professional tone.', accent=None, age=None, use_case=None, preview_url='https://storage.googleapis.com/eleven-public-prod/premade/voices/EXAVITQu4vr4xnSDxMaL/01a3e33c-6e99-4ee7-8543-ff2216a32186.mp3'), 'FGY2WhTYpPnrIDTdsKH5': Voice(name='Laura', id='FGY2WhTYpPnrIDTdsKH5', gender='FEMALE', language_code='en', description='This young adult female voice delivers sunny enthusiasm with a quirky attitude.', accent=None,

In [5]:


# Multilingual text example
text = "Hello! 你好! Hola! नमस्ते! Bonjour! こんにちは!"

voice = "JBFqnCBsd6RMkjVDRZzb"  # One of ElevenLabs' default voices

response = elevenlabs_tts.generate_speech(
    text=text,
    voice=voice,
    output_file=output_dir / "elevenlabs_multilingual.mp3"
)

play_audio(response.audio_data)

In [6]:


# Async version of the multilingual example (top-level await works in notebooks)
text = "Hello! 你好! Hola! नमस्ते! Bonjour! こんにちは!"

voice = "JBFqnCBsd6RMkjVDRZzb"  # One of ElevenLabs' default voices

# Write to a separate file so the synchronous result is not overwritten
response = await elevenlabs_tts.agenerate_speech(
    text=text,
    voice=voice,
    output_file=output_dir / "elevenlabs_multilingual_async.mp3"
)

play_audio(response.audio_data)
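Because `agenerate_speech` is awaited like any coroutine, several requests can be issued concurrently with `asyncio.gather`. The sketch below uses `FakeTTS`, a hypothetical stand-in that accepts the same keyword arguments as the call above, so it runs without an API key. Provider rate limits are not handled here.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FakeResponse:
    """Hypothetical response object exposing audio_data like the real one."""
    audio_data: bytes


class FakeTTS:
    """Hypothetical stand-in for a TTS provider; it only encodes the text."""

    async def agenerate_speech(self, text: str, voice: str, output_file=None) -> FakeResponse:
        await asyncio.sleep(0.01)  # simulate network latency
        return FakeResponse(audio_data=f"{voice}:{text}".encode("utf-8"))


async def synthesize_all(tts, texts: Sequence[str], voice: str) -> List[FakeResponse]:
    """Run one agenerate_speech call per text concurrently, keeping input order."""
    tasks = [tts.agenerate_speech(text=text, voice=voice) for text in texts]
    return list(await asyncio.gather(*tasks))


def run_demo() -> List[bytes]:
    """Drive the coroutine from plain Python (outside a notebook)."""
    responses = asyncio.run(synthesize_all(FakeTTS(), ["Hello!", "Hola!"], "demo"))
    return [r.audio_data for r in responses]


if __name__ == "__main__":
    print(run_demo())
```

Inside a notebook an event loop is already running, so call `await synthesize_all(elevenlabs_tts, texts, voice)` directly rather than `asyncio.run`.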

## 3. Google Cloud TTS

Google Cloud TTS provides extensive control over voice parameters and supports a wide range of languages and voices.

In [3]:
print("Using Google Cloud TTS with different voice types and audio configurations.\n")

# Initialize Google Cloud TTS model
google_tts = AIFactory.create_tts("google", model_name="gemini-2.5-flash-preview-tts")
google_tts.available_voices



Using Google Cloud TTS with different voice types and audio configurations.





{'achernar': Voice(name='UpbeatAchernar', id='achernar', gender='FEMALE', language_code=None, description='UpbeatAchernar', accent=None, age=None, use_case=None, preview_url=None),
 'achird': Voice(name='ForwardAchird', id='achird', gender='NEUTRAL', language_code=None, description='ForwardAchird', accent=None, age=None, use_case=None, preview_url=None),
 'algenib': Voice(name='ClearAlgenib', id='algenib', gender='MALE', language_code=None, description='ClearAlgenib', accent=None, age=None, use_case=None, preview_url=None),
 'algieba': Voice(name='Easy-goingAlgieba', id='algieba', gender='MALE', language_code=None, description='Easy-goingAlgieba', accent=None, age=None, use_case=None, preview_url=None),
 'alnilam': Voice(name='SoftAlnilam', id='alnilam', gender='MALE', language_code=None, description='SoftAlnilam', accent=None, age=None, use_case=None, preview_url=None),
 'aoede': Voice(name='FirmAoede', id='aoede', gender='FEMALE', language_code=None, description='FirmAoede', accent=N

In [8]:

text = "This is a demonstration of Google Cloud Text-to-Speech."
response = google_tts.generate_speech(
    text=text,
    voice="achernar",
    output_file=output_dir / "google_normal.wav"
)
play_audio(response.audio_data)




In [9]:

# Async version; written to a separate file so the synchronous result is not overwritten
text = "This is a demonstration of Google Cloud Text-to-Speech."
response = await google_tts.agenerate_speech(
    text=text,
    voice="achernar",
    output_file=output_dir / "google_normal_async.wav"
)
play_audio(response.audio_data)


In [4]:
# Multi-speaker conversation example
conversation_text = """TTS the following conversation between Joe and Jane:
Joe: Como vai hoje, Jane?
Jane: Não muito mal, e você?
Joe: Muito bem! Tenho trabalhado em alguns projetos emocionantes.
Jane: Isso parece interessante! Conte-me mais sobre eles."""

speaker_configs = [
    {"speaker": "Joe", "voice": "rasalgethi"},  # Male, gravelly voice
    {"speaker": "Jane", "voice": "leda"},       # Female, excitable voice
]

response = google_tts.generate_multi_speaker_speech(
    text=conversation_text,
    speaker_configs=speaker_configs,
    output_file=output_dir / "google_multi_speaker.wav"
)
play_audio(response.audio_data)
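Each speaker label in the conversation text is paired with an entry in `speaker_configs`, so a small pre-flight check can catch mismatches before calling the API. This is a sketch that assumes lines follow the `Name: utterance` pattern used above. `find_unconfigured_speakers` is an illustrative helper, not an Esperanto function, and the extra `Ana` line is made-up demo data.

```python
import re
from typing import Dict, List

# A speaker line starts with a single word, a colon, then some text
SPEAKER_LINE = re.compile(r"^(\w+):\s+\S", re.MULTILINE)


def find_unconfigured_speakers(text: str, speaker_configs: List[Dict[str, str]]) -> List[str]:
    """Return speaker labels found in text that have no entry in speaker_configs."""
    configured = {cfg["speaker"] for cfg in speaker_configs}
    missing: List[str] = []
    for name in SPEAKER_LINE.findall(text):
        if name not in configured and name not in missing:
            missing.append(name)
    return missing


demo_text = """TTS the following conversation between Joe and Jane:
Joe: Como vai hoje, Jane?
Jane: Não muito mal, e você?
Ana: Olá!"""
demo_configs = [
    {"speaker": "Joe", "voice": "rasalgethi"},
    {"speaker": "Jane", "voice": "leda"},
]
print(find_unconfigured_speakers(demo_text, demo_configs))  # ['Ana']
```

The instruction line at the top of the prompt is skipped because it does not begin with a single word followed by a colon.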

In [11]:
# Async multi-speaker example
response = await google_tts.agenerate_multi_speaker_speech(
    text=conversation_text,
    speaker_configs=speaker_configs,
    output_file=output_dir / "google_multi_speaker_async.wav"
)
play_audio(response.audio_data)


In [None]:
# Generate a sample with each of the first five voices
text = "This is a demonstration of Google Cloud Text-to-Speech."

for voice in list(google_tts.available_voices.keys())[:5]:
    print(voice)
    response = google_tts.generate_speech(
        text=text,
        voice=voice,
        output_file=output_dir / f"google_{voice}.wav"
    )
    play_audio(response.audio_data)

schedar


sulafat


umbriel


vindemiatrix


zephyr


zubenelgenubi
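When looping over many voices, a single failed request would stop the whole cell. One option is to collect failures and keep going. The sketch below takes the generation call as a plain callable so it can be exercised without credentials. `synthesize_voices` and `fake_generate` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, Tuple


def synthesize_voices(
    generate: Callable[[str], bytes],
    voices: Iterable[str],
) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Call generate(voice) for each voice; return (audio by voice, error text by voice)."""
    audio: Dict[str, bytes] = {}
    errors: Dict[str, str] = {}
    for voice in voices:
        try:
            audio[voice] = generate(voice)
        except Exception as exc:  # broad on purpose: keep the loop alive
            errors[voice] = str(exc)
    return audio, errors


def fake_generate(voice: str) -> bytes:
    """Hypothetical stand-in that fails for one voice."""
    if voice == "broken":
        raise RuntimeError("voice not available")
    return voice.encode("utf-8")


audio, errors = synthesize_voices(fake_generate, ["achernar", "broken", "zephyr"])
print(sorted(audio), errors)  # ['achernar', 'zephyr'] {'broken': 'voice not available'}
```

In the notebook, the callable could wrap the real call, for example `lambda v: google_tts.generate_speech(text=text, voice=v, output_file=output_dir / f"google_{v}.wav").audio_data`.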


## 4. Vertex AI TTS


In [7]:
from esperanto import AIFactory
print("Using Google Cloud TTS with different voice types and audio configurations.\n")

# Initialize Vertex TTS model
vertex_tts = AIFactory.create_tts("vertex", model_name="gemini-2.5-flash-preview-tts")
vertex_tts.available_voices



Using Google Cloud TTS with different voice types and audio configurations.





{'en-US-Standard-A': Voice(name='en-US-Standard-A', id='en-US-Standard-A', gender='FEMALE', language_code='en-US', description='Standard English (US) Female Voice A', accent=None, age=None, use_case=None, preview_url=None),
 'en-US-Standard-B': Voice(name='en-US-Standard-B', id='en-US-Standard-B', gender='MALE', language_code='en-US', description='Standard English (US) Male Voice B', accent=None, age=None, use_case=None, preview_url=None),
 'en-US-Neural2-A': Voice(name='en-US-Neural2-A', id='en-US-Neural2-A', gender='FEMALE', language_code='en-US', description='Neural2 English (US) Female Voice A', accent=None, age=None, use_case=None, preview_url=None),
 'en-US-Neural2-B': Voice(name='en-US-Neural2-B', id='en-US-Neural2-B', gender='MALE', language_code='en-US', description='Neural2 English (US) Male Voice B', accent=None, age=None, use_case=None, preview_url=None),
 'en-US-Wavenet-A': Voice(name='en-US-Wavenet-A', id='en-US-Wavenet-A', gender='FEMALE', language_code='en-US', descript

In [8]:

text = "This is a demonstration of Vertex AI Text-to-Speech."
response = await vertex_tts.agenerate_speech(
    text=text,
    voice="en-US-Neural2-A",
    output_file=output_dir / "vertex.wav"
)
play_audio(response.audio_data)
