# **Notebook_03_Text_to_Speech**
# **Introduction**
Text-to-Speech (TTS) technology represents a key aspect of audio generation, converting written text into audible speech. This functionality supports applications ranging from virtual assistants and accessibility aids to automated content narration. Swarmauri’s SDK provides a seamless interface for TTS, encapsulated in the OpenAIAudioTTS class, allowing users to synthesize natural-sounding audio from textual inputs.

**In this notebook, we’ll walk through the configuration and usage of TTS with the Swarmauri SDK:**

We’ll begin by initializing OpenAIAudioTTS and inspecting its default model and voice. Next, we’ll generate audio directly from text and save it to a file. We’ll then repeat the generation across every available model and voice to hear how these parameters change the output. Finally, we’ll confirm that the configured model survives a JSON serialization round trip.

This practical exploration demonstrates how to use the OpenAI TTS models exposed through Swarmauri, making it easier to add speech output to applications in accessibility, customer support, and automated information delivery.

**Import of dependencies**

In [9]:
import os
from dotenv import load_dotenv
from swarmauri.llms.concrete.OpenAIAudioTTS import OpenAIAudioTTS
from pathlib import Path

**Setup**

In [22]:
# Load environment variables
load_dotenv()

# Initialize OpenAI Audio TTS
API_KEY = os.getenv("OPENAI_API_KEY")
llm = OpenAIAudioTTS(api_key=API_KEY)
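
`os.getenv` returns `None` when a variable is not set, so a missing key would only surface later, at the first API call. A minimal sketch of failing early with a clearer message follows; `require_env` is a hypothetical helper written for this illustration and is not part of the Swarmauri SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; not part of the Swarmauri SDK.
    """
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable (None) and an empty string.
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Add it to your .env file or export it in your shell."
        )
    return value
```

With this helper, the setup cell could use `API_KEY = require_env("OPENAI_API_KEY")` in place of the `os.getenv` call.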

**Set up paths**

In [37]:
root_dir = Path.cwd()
output_dir = os.path.join(root_dir, "output")
os.makedirs(output_dir, exist_ok=True)


In [24]:

print("Model Configuration:")
print(f"Default Model: {llm.name}")
print(f"Default Voice: {llm.voice}")

Model Configuration:
Default Model: tts-1
Default Voice: alloy


**Available models**

In [25]:
print("\nAvailable Models:")
available_models = llm.allowed_models
for model in available_models:
    print(f"- {model}")


Available Models:
- tts-1
- tts-1-hd
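
The list above can be used to catch a mistyped model name before it is assigned to `llm.name`. Whether the class itself rejects an unsupported name at assignment time is not shown in this notebook, so the sketch below performs the check explicitly; `check_model_name` is a hypothetical helper, not an SDK function.

```python
from typing import Sequence


def check_model_name(requested: str, allowed: Sequence[str]) -> str:
    """Return the requested name if it is in the allowed list, else raise.

    Hypothetical helper for illustration; not part of the Swarmauri SDK.
    """
    if requested not in allowed:
        options = ", ".join(allowed)
        raise ValueError(f"Unknown model {requested!r}. Choose one of: {options}")
    return requested
```

In the notebook this could be used as `llm.name = check_model_name("tts-1-hd", llm.allowed_models)`.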


## **1. Basic Text-to-Speech Generation**

**Voices to demonstrate**

In [26]:
voices = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]

**In this section, we'll generate audio by modifying parameters like model name and voice. We'll save the audio to a file and observe how changing these parameters impacts the output.**

Assign the selected model and voice to the llm object


In [39]:

llm.name = "tts-1"
llm.voice = "alloy"

Output file path



In [40]:
output_path = os.path.join(output_dir,"tts-1_alloy_output.mp3")

Generate audio using the predict method

In [None]:
sample_text = "Hello, this is a test of streaming text-to-speech output."
print("\nText-to-Speech Demonstration:")
audio_file = llm.predict(
    text=sample_text, 
    audio_path=output_path
)
print("\nDemonstration complete! Check the output directory for generated audio files.")
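
The completion message above is printed whether or not a file was actually written. As a small sketch, the check below confirms that the output exists and is not empty; `describe_audio_file` is a hypothetical helper that relies only on the standard library.

```python
from pathlib import Path


def describe_audio_file(path) -> str:
    """Return a one-line summary of a generated file.

    Raises if the file is missing or empty. Hypothetical helper for
    illustration; not part of the Swarmauri SDK.
    """
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"No audio file found at {file}")
    size = file.stat().st_size
    if size == 0:
        raise ValueError(f"{file} exists but is empty")
    return f"{file.name}: {size / 1024:.1f} KiB"
```

After the generation cell, `print(describe_audio_file(output_path))` would report the file name and size.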



## **2. Generating Audio with Different Voices and Models**

In [None]:
print("\nText-to-Speech Demonstrations:")
sample_text = "Hello, this is a test of streaming text-to-speech output."

for model_name in available_models:
    llm.name = model_name
    
    for voice in voices:
        llm.voice = voice
        
        # Output file path
        output_path = os.path.join(output_dir, f"{model_name}_{voice}_output.mp3")
        
        print(f"\nGenerating audio - Model: {model_name}, Voice: {voice}")
        
        # Predict method
        audio_file = llm.predict(
            text=sample_text, 
            audio_path=output_path
        )
print("\nDemonstration complete! Check the output directory for generated audio files.")
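
The nested loop above stops at the first failed request. The sketch below shows one possible variation that visits the same model and voice pairs with `itertools.product` and records failures instead of aborting. The `synthesize` callable is an assumption of this sketch: it stands in for whatever performs the actual generation, so the block runs without network access.

```python
from itertools import product
from pathlib import Path
from typing import Callable, List, Tuple


def generate_all(
    models,
    voices,
    output_dir,
    synthesize: Callable[[str, str, str], None],
) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    """Call synthesize(model, voice, path) for every model/voice pair.

    Returns (written, failed): the paths that succeeded and a list of
    (model, voice, error message) tuples for the pairs that raised.
    """
    written, failed = [], []
    for model_name, voice in product(models, voices):
        path = str(Path(output_dir) / f"{model_name}_{voice}_output.mp3")
        try:
            synthesize(model_name, voice, path)
            written.append(path)
        except Exception as exc:  # keep going so one failure does not stop the batch
            failed.append((model_name, voice, str(exc)))
    return written, failed


# In this notebook, the callable could wrap the existing llm object:
#
# def synthesize(model_name, voice, path):
#     llm.name = model_name
#     llm.voice = voice
#     llm.predict(text=sample_text, audio_path=path)
#
# written, failed = generate_all(available_models, voices, output_dir, synthesize)
```

Collecting failures this way makes it easy to retry only the combinations that did not produce a file.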

## **3. Model Serialization Validation**

In [20]:

print("\nModel Serialization:")
model_json = llm.model_dump_json()
validated_model = OpenAIAudioTTS.model_validate_json(model_json)
print(f"Original Model ID: {llm.id}")
print(f"Validated Model ID: {validated_model.id}")
print(f"Validation Successful: {llm.id == validated_model.id}")


Model Serialization:
Original Model ID: c5047e78-cc32-4503-9915-2d78749c804b
Validated Model ID: c5047e78-cc32-4503-9915-2d78749c804b
Validation Successful: True
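
The check above compares only the `id` field. As a sketch of a broader comparison, the helper below takes two JSON strings and lists the top-level fields whose values differ. `differing_fields` is a hypothetical name and uses only the standard library `json` module, so it makes no assumption about which fields the model serializes.

```python
import json

_MISSING = object()  # sentinel so a missing key is not confused with a null value


def differing_fields(original_json: str, restored_json: str) -> list:
    """Return the sorted top-level keys whose values differ between two JSON objects."""
    original = json.loads(original_json)
    restored = json.loads(restored_json)
    return sorted(
        key
        for key in original.keys() | restored.keys()
        if original.get(key, _MISSING) != restored.get(key, _MISSING)
    )
```

In the notebook, `differing_fields(llm.model_dump_json(), validated_model.model_dump_json())` returning an empty list would indicate that every serialized top-level field survived the round trip.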


## **Conclusion**

Swarmauri’s OpenAIAudioTTS class makes it straightforward to add speech output to applications that need accessible and engaging user experiences. In this notebook, we used the Swarmauri SDK to convert text into speech, varied the model and voice to tailor the output, and validated that the model configuration serializes cleanly.

The flexibility provided by Swarmauri’s TTS enables integration into applications that require fast, reliable, and contextually appropriate speech synthesis, paving the way for enhanced user interactions. This foundational knowledge prepares users to explore deeper customization, enabling fully personalized audio generation workflows in advanced scenarios.

In [None]:
from swarmauri.utils.print_notebook_metadata import print_notebook_metadata

print_notebook_metadata(author_name="Dominion John", github_username="DOMINION-JOHN1")