# Text-to-Speech Demo

This notebook demonstrates Esperanto's Text-to-Speech (TTS) support with three providers: OpenAI, ElevenLabs, and Google Cloud TTS.

## Setup

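First, make sure your `.env` file contains the credentials for each provider: API keys for OpenAI and ElevenLabs, and the path to a credentials JSON file for Google Cloud: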
```
OPENAI_API_KEY=your_openai_key
ELEVENLABS_API_KEY=your_elevenlabs_key
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/google_credentials.json
```
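Before creating any provider, it can help to confirm that these variables are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_credentials` is hypothetical and not part of Esperanto.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def missing_credentials(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for this notebook, not an Esperanto API.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_credentials()` once `load_dotenv()` has run returns an empty list when all three variables are present. Any name it returns points at the provider section that is likely to fail.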

In [1]:
from pathlib import Path
import os
from dotenv import load_dotenv
from esperanto.factory import AIFactory
from IPython.display import Audio, display

# Load environment variables
load_dotenv()

# Create output directory
output_dir = Path.cwd() / "output"
output_dir.mkdir(exist_ok=True)

# Helper function to play audio
def play_audio(audio_data: bytes, autoplay: bool = False):
    """Display audio player in notebook."""
    display(Audio(audio_data, autoplay=autoplay))

## 1. OpenAI TTS

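OpenAI's TTS service offers high-quality voices with different personalities. The cells below list the available voices and generate a sample with `alloy`, first synchronously and then asynchronously.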

In [2]:
# Initialize OpenAI TTS model
openai_tts = AIFactory.create_tts("openai", model_name="tts-1")

print(openai_tts.available_voices)


{'alloy': Voice(name='alloy', id='alloy', gender='NEUTRAL', language_code='en-US', description='Neutral and balanced voice', accent=None, age=None, use_case=None, preview_url=None), 'echo': Voice(name='echo', id='echo', gender='MALE', language_code='en-US', description='Mature and deep voice', accent=None, age=None, use_case=None, preview_url=None), 'fable': Voice(name='fable', id='fable', gender='FEMALE', language_code='en-US', description='Warm and expressive voice', accent=None, age=None, use_case=None, preview_url=None), 'onyx': Voice(name='onyx', id='onyx', gender='MALE', language_code='en-US', description='Smooth and authoritative voice', accent=None, age=None, use_case=None, preview_url=None), 'nova': Voice(name='nova', id='nova', gender='FEMALE', language_code='en-US', description='Energetic and bright voice', accent=None, age=None, use_case=None, preview_url=None), 'shimmer': Voice(name='shimmer', id='shimmer', gender='FEMALE', language_code='en-US', description='Clear and pro

In [3]:

text = "Hello! This is a test of OpenAI's text-to-speech capabilities."

voice = "alloy"
response = openai_tts.generate_speech(
    text=text,
    voice=voice,
    output_file=output_dir / f"openai_{voice}.mp3"
)
play_audio(response.audio_data)
print()
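The cell above uses a single voice. To compare every voice, one option is to loop over `available_voices`. This is a sketch that assumes only what the cells above show: `available_voices` is a dict keyed by voice id, and `generate_speech` accepts `text`, `voice`, and `output_file`. The function name and the `prefix` parameter are hypothetical.

```python
from pathlib import Path


def generate_for_all_voices(tts, text, output_dir, prefix="openai"):
    """Synthesize `text` once per voice and return {voice_id: output_path}.

    Assumes `tts` exposes `available_voices` (a dict keyed by voice id) and
    `generate_speech(text=..., voice=..., output_file=...)`, as used above.
    `output_dir` is expected to exist already.
    """
    output_dir = Path(output_dir)
    paths = {}
    for voice_id in tts.available_voices:
        path = output_dir / f"{prefix}_{voice_id}.mp3"
        tts.generate_speech(text=text, voice=voice_id, output_file=path)
        paths[voice_id] = path
    return paths
```

In this notebook it could be called as `generate_for_all_voices(openai_tts, text, output_dir)`. Note that this issues one request per voice.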




In [4]:

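# Async variant of the previous cell: agenerate_speech is awaited directly,
# which works because Jupyter supports top-level await.
text = "Hello! This is a test of OpenAI's text-to-speech capabilities."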

voice = "alloy"
response = await openai_tts.agenerate_speech(
    text=text,
    voice=voice,
    output_file=output_dir / f"openai_{voice}.mp3"
)
play_audio(response.audio_data)
print()
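The async method becomes more useful when several requests run at once. The sketch below uses `asyncio.gather` and assumes `agenerate_speech` takes the same arguments as in the cell above; the function name and the `_async` file suffix are hypothetical. Whether the requests truly overlap depends on how the provider client is implemented.

```python
import asyncio
from pathlib import Path


async def generate_concurrently(tts, text, voices, output_dir, prefix="openai"):
    """Run one `agenerate_speech` call per voice concurrently.

    Assumes `tts.agenerate_speech(text=..., voice=..., output_file=...)`
    behaves as in the cell above. Results are returned in the order of `voices`.
    """
    output_dir = Path(output_dir)
    tasks = [
        tts.agenerate_speech(
            text=text,
            voice=voice,
            # Separate file names so concurrent calls do not overwrite each other.
            output_file=output_dir / f"{prefix}_{voice}_async.mp3",
        )
        for voice in voices
    ]
    return await asyncio.gather(*tasks)
```

In a notebook cell this can be awaited directly, for example `responses = await generate_concurrently(openai_tts, text, ["alloy", "nova"], output_dir)`. In a plain script, wrap the call in `asyncio.run(...)` instead.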




## 2. ElevenLabs TTS

ElevenLabs excels at multilingual TTS with high-quality voices and extensive customization options.

In [5]:
print("Using ElevenLabs' multilingual model with voice customization.\n")

# Initialize ElevenLabs TTS model
elevenlabs_tts = AIFactory.create_tts(
    "elevenlabs",
    model_name="eleven_multilingual_v2",
    voice_settings={
        "voice_stability": 0.5,
        "voice_similarity_boost": 0.75
    }
)
print(elevenlabs_tts.available_voices)


Using ElevenLabs' multilingual model with voice customization.

{'9BWtsMINqrJLrRacOk9x': Voice(name='Aria', id='9BWtsMINqrJLrRacOk9x', gender='FEMALE', language_code='en', description=None, accent=None, age=None, use_case=None, preview_url='https://storage.googleapis.com/eleven-public-prod/premade/voices/9BWtsMINqrJLrRacOk9x/405766b8-1f4e-4d3c-aba1-6f25333823ec.mp3'), 'CwhRBWXzGAHq8TQ4Fs17': Voice(name='Roger', id='CwhRBWXzGAHq8TQ4Fs17', gender='MALE', language_code='en', description=None, accent=None, age=None, use_case=None, preview_url='https://storage.googleapis.com/eleven-public-prod/premade/voices/CwhRBWXzGAHq8TQ4Fs17/58ee3ff5-f6f2-4628-93b8-e38eb31806b0.mp3'), 'EXAVITQu4vr4xnSDxMaL': Voice(name='Sarah', id='EXAVITQu4vr4xnSDxMaL', gender='FEMALE', language_code='en', description=None, accent=None, age=None, use_case=None, preview_url='https://storage.googleapis.com/eleven-public-prod/premade/voices/EXAVITQu4vr4xnSDxMaL/01a3e33c-6e99-4ee7-8543-ff2216a32186.mp3'), 'FGY2WhTYpPnrIDTd
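ElevenLabs voices are keyed by opaque ids, so looking one up by its human-readable name can be convenient. This is a small sketch that relies only on the `name` and `id` attributes visible in the output above; `voice_id_by_name` is a hypothetical helper, not part of Esperanto.

```python
def voice_id_by_name(voices, name):
    """Return the id of the first voice whose name matches, ignoring case.

    Assumes `voices` is a dict of objects with `name` and `id` attributes,
    like the `available_voices` output above. Raises KeyError if not found.
    """
    wanted = name.casefold()
    for voice in voices.values():
        if voice.name.casefold() == wanted:
            return voice.id
    raise KeyError(f"No voice named {name!r}")
```

With the voices listed above, `voice_id_by_name(elevenlabs_tts.available_voices, "Aria")` would return `'9BWtsMINqrJLrRacOk9x'`.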

In [6]:


# Multilingual text example
text = "Hello! 你好! Hola! नमस्ते! Bonjour! こんにちは!"

voice = "JBFqnCBsd6RMkjVDRZzb"

response = elevenlabs_tts.generate_speech(
    text=text,
    voice=voice,  # One of ElevenLabs' default voices
    output_file=output_dir / "elevenlabs_multilingual.mp3"
)

play_audio(response.audio_data)

In [7]:


# Multilingual text example, async variant: agenerate_speech is awaited directly
text = "Hello! 你好! Hola! नमस्ते! Bonjour! こんにちは!"

voice = "JBFqnCBsd6RMkjVDRZzb"

response = await elevenlabs_tts.agenerate_speech(
    text=text,
    voice=voice,  # One of ElevenLabs' default voices
    output_file=output_dir / "elevenlabs_multilingual.mp3"
)

play_audio(response.audio_data)

## 3. Google Cloud TTS

Google Cloud TTS provides extensive control over voice parameters and supports a wide range of languages and voices.

In [8]:
print("Using Google Cloud TTS with different voice types and audio configurations.\n")

# Initialize Google Cloud TTS model
google_tts = AIFactory.create_tts("google", model_name="neural2")
google_tts.available_voices



Using Google Cloud TTS with different voice types and audio configurations.



{'af-ZA-Standard-A': Voice(name='af-ZA-Standard-A', id='af-ZA-Standard-A', gender=<SsmlVoiceGender.FEMALE: 2>, language_code='af-ZA', description='af-ZA-Standard-A - af-ZA', accent=None, age=None, use_case=None, preview_url=None),
 'am-ET-Standard-A': Voice(name='am-ET-Standard-A', id='am-ET-Standard-A', gender=<SsmlVoiceGender.FEMALE: 2>, language_code='am-ET', description='am-ET-Standard-A - am-ET', accent=None, age=None, use_case=None, preview_url=None),
 'am-ET-Standard-B': Voice(name='am-ET-Standard-B', id='am-ET-Standard-B', gender=<SsmlVoiceGender.MALE: 1>, language_code='am-ET', description='am-ET-Standard-B - am-ET', accent=None, age=None, use_case=None, preview_url=None),
 'am-ET-Wavenet-A': Voice(name='am-ET-Wavenet-A', id='am-ET-Wavenet-A', gender=<SsmlVoiceGender.FEMALE: 2>, language_code='am-ET', description='am-ET-Wavenet-A - am-ET', accent=None, age=None, use_case=None, preview_url=None),
 'am-ET-Wavenet-B': Voice(name='am-ET-Wavenet-B', id='am-ET-Wavenet-B', gender=<Ss
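The Google voice list is long, so filtering it by language and gender makes it easier to pick a voice. The sketch below assumes the `language_code` and `gender` attributes shown in the output above. Because Google reports gender as an enum member (`SsmlVoiceGender.FEMALE`) while the other providers use plain strings, the hypothetical helper `filter_voices` compares by name.

```python
def filter_voices(voices, language_code=None, gender=None):
    """Return the ids of voices matching a language code and/or gender.

    Assumes each voice has `language_code` and `gender` attributes, as in
    the output above. `gender` on a voice may be a plain string or an enum
    member, so enum members are compared by their `.name`.
    """
    def gender_name(value):
        # Enum members expose .name; plain strings are used as-is.
        return str(getattr(value, "name", value)).upper()

    matches = []
    for voice_id, voice in voices.items():
        if language_code is not None and voice.language_code != language_code:
            continue
        if gender is not None and gender_name(voice.gender) != gender.upper():
            continue
        matches.append(voice_id)
    return matches
```

For example, `filter_voices(google_tts.available_voices, language_code="en-US", gender="female")` would list candidate ids to pass as `voice`.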

In [9]:

text = "This is a demonstration of Google Cloud Text-to-Speech."
response = google_tts.generate_speech(
    text=text,
    voice="en-US-Neural2-A",
    output_file=output_dir / "google_normal.mp3"
)
play_audio(response.audio_data)
print()




In [10]:

# Async variant of the previous cell; it writes to the same output file.
text = "This is a demonstration of Google Cloud Text-to-Speech."
response = await google_tts.agenerate_speech(
    text=text,
    voice="en-US-Neural2-A",
    output_file=output_dir / "google_normal.mp3"
)
play_audio(response.audio_data)
print()


