# ElevenLabs STT-TTS Integration with Maxim SDK

This notebook demonstrates how to integrate Maxim SDK tracing with ElevenLabs Speech-to-Text (STT) and Text-to-Speech (TTS) operations.

## Overview

The integration enables you to:
- Trace STT operations (speech → text)
- Trace TTS operations (text → speech)
- Link both operations under a single trace
- Attach audio files (input and output) to traces
- Monitor the complete voice pipeline in the Maxim dashboard


## Step 1: Load Environment Variables


In [None]:
import os
from dotenv import load_dotenv

load_dotenv()

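# Configuration: the ElevenLabs key is read from the EL_API_KEY variable
ELEVENLABS_API_KEY = os.getenv("EL_API_KEY")

if not ELEVENLABS_API_KEY:
    # Name the variable that is actually read above, so the error is actionable
    raise ValueError("EL_API_KEY environment variable is not set")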

print("✅ Environment variables loaded")


## Step 2: Initialize Maxim SDK and Instrument ElevenLabs


In [None]:
from maxim import Maxim
from maxim.logger.elevenlabs import instrument_elevenlabs

# Initialize Maxim logger
# This automatically picks up MAXIM_API_KEY and MAXIM_LOG_REPO_ID from environment variables
logger = Maxim().logger()

# Instrument ElevenLabs STT/TTS methods (one-line integration)
instrument_elevenlabs(logger)

print("✅ Maxim SDK initialized and ElevenLabs instrumented")
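Because the logger reads its credentials from the environment, a missing variable tends to surface only when `Maxim()` is constructed. The sketch below checks for the two variable names mentioned in the comment above before initialization. The helper `missing_env_vars` is hypothetical and not part of the Maxim SDK.

```python
import os

# Variable names taken from the comment in the cell above.
REQUIRED_MAXIM_VARS = ("MAXIM_API_KEY", "MAXIM_LOG_REPO_ID")


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Hypothetical pre-flight check; run it before calling Maxim().logger()
missing = missing_env_vars(REQUIRED_MAXIM_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing a plain dictionary as `environ` makes the check easy to exercise without touching the real environment.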


## Step 3: Initialize ElevenLabs Client


In [None]:
from elevenlabs.client import ElevenLabs

# Initialize ElevenLabs client
client = ElevenLabs(api_key=ELEVENLABS_API_KEY)

print("✅ ElevenLabs client initialized")


## Step 4: Create a Unified Trace for STT-TTS Pipeline


In [None]:
from uuid import uuid4
from maxim.logger.components.trace import TraceConfigDict
from elevenlabs.core import RequestOptions

# Create a shared trace ID for the entire pipeline
trace_id = str(uuid4())

trace = logger.trace(
    TraceConfigDict(
        id=trace_id,
        name="STT-TTS Pipeline Agent",
        tags={"provider": "elevenlabs", "operation": "pipeline"},
    )
)

# Create request options with trace_id header for both STT and TTS
request_options = RequestOptions(
    additional_headers={
        "x-maxim-trace-id": trace_id
    }
)

print(f"✅ Trace created with ID: {trace_id}")
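The link between the STT and TTS calls is the `x-maxim-trace-id` header: both requests carry the same trace ID, so the instrumentation can attach them to one trace. A minimal sketch of building that header in one place follows. `trace_headers` is a hypothetical helper, not an SDK function.

```python
from uuid import uuid4

TRACE_HEADER = "x-maxim-trace-id"  # header name used in the cell above


def trace_headers(trace_id: str) -> dict:
    """Build the additional headers that tie a request to a Maxim trace."""
    if not trace_id:
        raise ValueError("trace_id must be a non-empty string")
    return {TRACE_HEADER: trace_id}


# Example with a throwaway ID; the notebook would pass its own trace_id
example_id = str(uuid4())
headers = trace_headers(example_id)
print(headers)
```

In the notebook this would be used as `RequestOptions(additional_headers=trace_headers(trace_id))`, which keeps the header name from being retyped for each request.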


## Step 5: Speech-to-Text (STT)

Convert audio input to text. The instrumentation automatically:
- Adds the audio file as an input attachment
- Sets the transcript as the trace input


In [None]:
# Option 1: Use sample audio file if available
audio_file_path = os.path.join("files", "sample_audio.wav")

if os.path.exists(audio_file_path):
    print(f"Processing audio file: {audio_file_path}")
    with open(audio_file_path, "rb") as audio_file:
        transcript = client.speech_to_text.convert(
            file=audio_file,
            model_id="scribe_v1",
            request_options=request_options
        )
    
    # Extract transcript text
    if isinstance(transcript, str):
        transcript_text = transcript
    elif hasattr(transcript, "text"):
        transcript_text = transcript.text
    elif isinstance(transcript, dict) and "text" in transcript:
        transcript_text = transcript["text"]
    else:
        transcript_text = str(transcript)
    
    print(f"✅ Transcript: {transcript_text}")
else:
    # Option 2: Use a dummy transcript for testing. STT is skipped here,
    # so the trace input is set manually instead of by the instrumentation.
    print("Sample audio file not found. Using dummy transcript for demonstration.")
    transcript_text = "Hello, how are you?"
    trace.set_input(transcript_text)
    print(f"✅ Using dummy transcript: {transcript_text}")
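The transcript-extraction branches above can be pulled into a small function so they are testable on their own. This is a sketch under the same assumption the cell makes, namely that the response may be a string, an object with a `text` attribute, or a dictionary. `extract_transcript_text` is a hypothetical name, and the sample inputs are stand-ins rather than real API responses.

```python
from types import SimpleNamespace


def extract_transcript_text(transcript) -> str:
    """Return the transcript text from a str, a dict, or an object with .text."""
    if isinstance(transcript, str):
        return transcript
    if isinstance(transcript, dict) and "text" in transcript:
        return transcript["text"]
    text = getattr(transcript, "text", None)
    if text is not None:
        return text
    # Fall back to the string form for unknown response shapes
    return str(transcript)


# Stand-ins for possible response shapes (not real API responses)
print(extract_transcript_text("plain string"))
print(extract_transcript_text(SimpleNamespace(text="from attribute")))
print(extract_transcript_text({"text": "from dict"}))
```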


## Step 6: Mock LLM Processing

In a real scenario, this would call an actual LLM API to process the transcript.


In [None]:
import re

def mock_llm(transcript: str) -> str:
    """Mock LLM that generates a response based on the user's transcript."""
    # Match whole words so that "hi" does not fire on "this",
    # nor "time" on "sometimes"
    words = set(re.findall(r"[a-z']+", transcript.lower()))

    if words & {"hello", "hi"}:
        return "Hello! How can I help you today?"
    elif "weather" in words:
        return "I'm sorry, I don't have access to weather information right now."
    elif "time" in words:
        return "I don't have access to the current time, but I'm here to help with other questions!"
    elif words & {"goodbye", "bye"}:
        return "Goodbye! Have a great day!"
    else:
        return f"I heard you say: {transcript}. How can I assist you further?"

# Process transcript with mock LLM
response_text = mock_llm(transcript_text)
print(f"✅ LLM Response: {response_text}")
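To make the later swap to a real LLM straightforward, the pipeline can depend on any callable that maps a transcript to a response. The sketch below shows one way to do that. `LLMFn`, `echo_llm`, and `generate_response` are hypothetical names, and no real LLM API is called.

```python
from typing import Callable

# Any function that takes the transcript and returns the reply text
LLMFn = Callable[[str], str]


def echo_llm(transcript: str) -> str:
    """Stand-in responder; a real implementation would call an LLM API here."""
    return f"I heard you say: {transcript}."


def generate_response(transcript: str, llm: LLMFn = echo_llm) -> str:
    """Run the transcript through whichever LLM callable is supplied."""
    cleaned = transcript.strip()
    if not cleaned:
        # Avoid sending empty input to the model
        return "Sorry, I didn't catch that."
    return llm(cleaned)


print(generate_response("  What's the weather?  "))
```

In this notebook, `generate_response(transcript_text, llm=mock_llm)` would behave like the direct call, with the addition of the empty-input guard.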


## Step 7: Text-to-Speech (TTS)

Convert the LLM response to audio. The instrumentation automatically:
- Sets the response text as the trace output
- Adds the generated audio file as an output attachment


In [None]:
from elevenlabs.play import play

# Convert LLM response text to speech
audio_output = client.text_to_speech.convert(
    text=response_text,
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
    request_options=request_options
)

print("✅ Audio generated successfully")
print("Playing audio...")
play(audio_output)
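To keep the generated audio rather than only play it, the output can be written to disk. The sketch below assumes the TTS call yields an iterable of byte chunks, which should be verified against the installed SDK version. `save_audio_chunks` is a hypothetical helper, and fake chunks are used so the example runs offline.

```python
import os
import tempfile
from typing import Iterable


def save_audio_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write byte chunks to path and return the total number of bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                total += len(chunk)
    return total


# Fake chunks stand in for the TTS response so the sketch runs offline
fake_chunks = iter([b"ID3", b"", b"\x00\x01\x02"])
out_path = os.path.join(tempfile.gettempdir(), "tts_sketch_output.mp3")
print(save_audio_chunks(fake_chunks, out_path), "bytes written to", out_path)
```

If the output is a one-shot iterator, it can be consumed only once. To both play and save it, buffer it first, for example with `audio_bytes = b"".join(audio_output)`, and pass `audio_bytes` to `play` and `[audio_bytes]` to the helper.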


## Step 8: Complete the Trace


In [None]:
trace.end()

print("\n=== Pipeline Complete ===")
print("Check your Maxim dashboard to see the unified trace with:")
print("- Input: User speech transcript (set by STT instrumentation)")
print("- Output: LLM response text (set by TTS instrumentation)")
print("- Input attachment: User speech audio file (added by STT instrumentation)")
print("- Output attachment: Assistant speech audio file (added by TTS instrumentation)")
print(f"- Trace ID: {trace_id}")

# Cleanup
logger.cleanup()
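If an STT or TTS call raises, the cell that ends the trace never runs and the trace is left open. One way to guard against that is a small context manager that always calls `end()`. This is a sketch: `ended` is a hypothetical helper, and `FakeTrace` is a stand-in that exposes only the `end()` method used above, not the Maxim trace class.

```python
from contextlib import contextmanager


@contextmanager
def ended(trace):
    """Yield the trace and call trace.end() on exit, even if the body raises."""
    try:
        yield trace
    finally:
        trace.end()


class FakeTrace:
    """Stand-in with the same end() method the notebook calls."""

    def __init__(self):
        self.end_calls = 0

    def end(self):
        self.end_calls += 1


fake = FakeTrace()
try:
    with ended(fake):
        raise RuntimeError("simulated TTS failure")
except RuntimeError:
    pass
print(fake.end_calls)  # → 1
```

In a script version of this pipeline, Steps 5 through 7 would sit inside `with ended(trace):`, followed by `logger.cleanup()`.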
