# Voice Agent using OpenAI Agents SDK with Maxim Logging

This notebook demonstrates how to build a voice assistant using OpenAI's Agents SDK with Maxim tracing for observability.

The assistant includes:
- **Triage Agent**: Routes each query to the appropriate specialized agent
- **Search Agent**: Performs web search for real-time information
- **Account Agent**: Provides account information via function calling
- **Voice Pipeline**: Wraps the text-based agents with speech-to-text and text-to-speech so they can be used by voice


## Setup and Imports

First, let's import all necessary libraries and set up environment variables.


In [None]:
import os

import dotenv
import numpy as np
import sounddevice as sd
from agents import Agent, Runner, add_trace_processor, function_tool, set_default_openai_key
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions
from agents.voice import AudioInput, SingleAgentVoiceWorkflow, TTSModelSettings, VoicePipeline, VoicePipelineConfig

# Optional: Import WebSearchTool if available
try:
    from agents import WebSearchTool
    WEB_SEARCH_AVAILABLE = True
except ImportError:
    WEB_SEARCH_AVAILABLE = False
    WebSearchTool = None
    print("⚠️ WebSearchTool not available")

dotenv.load_dotenv()

# Environment variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MAXIM_API_KEY = os.getenv("MAXIM_API_KEY")
MAXIM_LOG_REPO_ID = os.getenv("MAXIM_LOG_REPO_ID")

if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY environment variable is not set")

# Set OpenAI API key for agents
set_default_openai_key(OPENAI_API_KEY)


### Initializing Maxim SDK

- The Maxim SDK automatically picks up `MAXIM_API_KEY` and `MAXIM_LOG_REPO_ID` from environment variables
- The `MaximOpenAIAgentsTracingProcessor` automatically traces all agent interactions
- Learn more about log repositories in the [Maxim docs](https://www.getmaxim.ai/docs/observe/concepts#log-repository)
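
Because the logger is configured entirely from the environment, a missing variable tends to surface only when the logger is constructed. A minimal sketch of checking up front is shown here; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names for illustration and are not part of the Maxim SDK.

```python
import os

# Variables this notebook relies on; the names come from the setup cell.
REQUIRED_VARS = ("OPENAI_API_KEY", "MAXIM_API_KEY", "MAXIM_LOG_REPO_ID")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing `env` explicitly makes the helper easy to exercise without touching the real environment.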


In [None]:
from maxim import Maxim, Config
from maxim.logger.openai.agents import MaximOpenAIAgentsTracingProcessor

# Creating a new logger instance
# It automatically initializes using MAXIM_API_KEY and MAXIM_LOG_REPO_ID from env variables
logger = Maxim(Config()).logger()

# Add Maxim trace processor to automatically trace all agent interactions
add_trace_processor(MaximOpenAIAgentsTracingProcessor(logger))
print("✅ Maxim logging enabled for Agents SDK")


## Defining Tools

We'll create a custom tool for account information retrieval.


In [None]:
@function_tool
def get_account_info(user_id: str) -> dict:
    """Return account info for a given user ID."""
    return {
        "user_id": user_id,
        "name": "Bugs Bunny",
        "account_balance": "£72.50",
        "membership_status": "Gold Executive"
    }
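
The tool above returns the same record for every ID. If it were later backed by real data, the agent would need a clear signal when a lookup fails. The following is a sketch under stated assumptions: `_ACCOUNTS` is a hypothetical in-memory table and `lookup_account` a hypothetical function name. It is left undecorated so it runs without the SDK; in the notebook it would presumably be wrapped with `@function_tool` like `get_account_info`.

```python
# Hypothetical in-memory data standing in for a real account store.
_ACCOUNTS = {
    "1234567890": {
        "name": "Bugs Bunny",
        "account_balance": "£72.50",
        "membership_status": "Gold Executive",
    },
}


def lookup_account(user_id: str) -> dict:
    """Return account info for a user ID, or an error payload if it is unknown."""
    record = _ACCOUNTS.get(user_id.strip())
    if record is None:
        # Returning a structured error (rather than raising) gives the agent
        # something it can read and explain to the user.
        return {"user_id": user_id, "error": "No account found for this user ID."}
    return {"user_id": user_id, **record}


print(lookup_account("1234567890"))
print(lookup_account("0000"))
```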


## Defining Agents

We'll create specialized agents for different use cases, optimized for voice output.


In [None]:
# Voice-optimized system prompt
VOICE_SYSTEM_PROMPT = """
[Output Structure]

Your output will be delivered as an audio voice response. Please ensure that every response meets these guidelines:

1. Use a friendly, human tone that will sound natural when spoken aloud.
2. Keep responses short and segmented, ideally one to two concise sentences per step.
3. Avoid technical jargon; use plain language so that instructions are easy to understand.
4. Provide only essential details so as not to overwhelm the listener.
"""

# Search Agent
search_agent = Agent(
    name="SearchAgent",
    instructions=VOICE_SYSTEM_PROMPT + (
        "You immediately provide an input to the WebSearchTool to find up-to-date information on the user's query."
    ),
    tools=[WebSearchTool()] if WEB_SEARCH_AVAILABLE else [],
)

# Account Agent
account_agent = Agent(
    name="AccountAgent",
    instructions=VOICE_SYSTEM_PROMPT + (
        "You provide account information based on a user ID using the get_account_info tool."
    ),
    tools=[get_account_info],
)

# Triage Agent
triage_agent = Agent(
    name="VoiceAssistant",
    instructions=prompt_with_handoff_instructions("""
You are the virtual assistant for Acme Shop. Welcome the user and ask how you can help.

Based on the user's intent, route to:
- AccountAgent for account-related queries
- SearchAgent for anything requiring real-time web search
"""),
    handoffs=[account_agent, search_agent],
)


## Voice Pipeline Configuration

Configure the text-to-speech settings for natural, friendly voice output.


In [None]:
# Custom TTS model settings for natural voice output
custom_tts_settings = TTSModelSettings(
    instructions=(
        "Personality: upbeat, friendly, persuasive guide. "
        "Tone: Friendly, clear, and reassuring, creating a calm atmosphere and making the listener feel confident and comfortable. "
        "Pronunciation: Clear, articulate, and steady, ensuring each instruction is easily understood while maintaining a natural, conversational flow. "
        "Tempo: Speak at a moderate, natural pace - not too fast, not too slow. Use strategic pauses between sentences and after important points. Include brief pauses before and after questions to allow the listener to process the information. "
        "Emotion: Warm and supportive, conveying empathy and care, ensuring the listener feels guided and safe throughout the journey."
    )
)

voice_pipeline_config = VoicePipelineConfig(tts_settings=custom_tts_settings)
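
The instruction string above follows a `Label: description.` pattern. When iterating on the voice, it can be easier to keep the facets in a dict and join them. A minimal sketch follows; `build_tts_instructions` is a hypothetical helper, not part of the SDK, and its result would be passed as the `instructions` argument of `TTSModelSettings`.

```python
def build_tts_instructions(facets: dict[str, str]) -> str:
    """Join labelled voice facets into a single 'Label: text. Label: text.' string."""
    parts = []
    for label, text in facets.items():
        text = text.strip()
        # Ensure each facet ends with terminal punctuation so facets stay separated.
        if not text.endswith((".", "!", "?")):
            text += "."
        parts.append(f"{label}: {text}")
    return " ".join(parts)


facets = {
    "Personality": "upbeat, friendly, persuasive guide",
    "Tempo": "Speak at a moderate, natural pace",
}
print(build_tts_instructions(facets))
```

Keeping each facet separate makes it simple to change one aspect, such as tempo, without retyping the whole string.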


## Voice Assistant Function

This function handles the voice interaction loop, recording audio input and playing audio responses.


In [None]:
async def voice_assistant_with_maxim():
    """Run voice assistant with Maxim tracing (automatic via trace processor)."""
    # 16 kHz mono is a common capture format for speech recognition
    SAMPLE_RATE = 16000
    
    print("🎤 Voice Assistant Ready!")
    print("Press Enter to speak your query (or type 'exit' to quit)")
    print(f"📊 Audio settings: {SAMPLE_RATE} Hz, mono, 16-bit")
    print("-" * 60)
    
    while True:
        # Check for input to either provide voice or exit
        cmd = input("\nPress Enter to speak (or type 'exit' to quit): ")
        
        if cmd.lower() == "exit":
            print("👋 Exiting...")
            break
        
        try:
            print("🎤 Listening... (press Enter to stop)")
            recorded_chunks = []
            
            # Stream from the microphone at a fixed sample rate until Enter is pressed
            with sd.InputStream(
                samplerate=SAMPLE_RATE,
                channels=1,
                dtype='int16',
                blocksize=4096,  # Larger blocks mean fewer callback invocations
                callback=lambda indata, frames, time, status: recorded_chunks.append(indata.copy())
            ):
                input()  # Wait for Enter key to stop recording
            
            # Concatenate chunks into single buffer
            recording = np.concatenate(recorded_chunks, axis=0)
            
            # Ensure recording is in the correct format (flatten if needed)
            if recording.ndim > 1:
                recording = recording.flatten()
            
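            # Create audio input, declaring the capture rate so the buffer is not
            # interpreted at AudioInput's default frame rate
            audio_input = AudioInput(buffer=recording, frame_rate=SAMPLE_RATE)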
            
            # Create pipeline
            pipeline = VoicePipeline(
                workflow=SingleAgentVoiceWorkflow(triage_agent),
                config=voice_pipeline_config
            )
            
            # Run the pipeline
            # Maxim tracing is automatic via MaximOpenAIAgentsTracingProcessor
            print("🤔 Processing...")
            result = await pipeline.run(audio_input)
            
            # Transfer the streamed result into chunks of audio
            response_chunks = []
            transcript_parts = []
            
            async for event in result.stream():
                if event.type == "voice_stream_event_audio":
                    response_chunks.append(event.data)
                elif event.type == "voice_stream_event_text":
                    # Capture transcript for display
                    if hasattr(event, 'text'):
                        transcript_parts.append(event.text)
            
            response_audio = np.concatenate(response_chunks, axis=0)
            transcript = " ".join(transcript_parts) if transcript_parts else "Audio response generated"
            
            # Play response
            # Ensure response audio is in correct format
            if response_audio.ndim > 1:
                response_audio = response_audio.flatten()
            
            print("🔊 Assistant is responding...")
            # Play back at the TTS output rate, not the recording rate.
            # Assumption: the SDK's default TTS output is 24 kHz PCM; adjust if your model differs.
            sd.play(response_audio, samplerate=24000)
            sd.wait()
            
            preview = f"{transcript[:100]}..." if len(transcript) > 100 else transcript
            print(f"✅ Response: {preview}")
            print("📊 Interaction automatically traced to Maxim")
            
        except KeyboardInterrupt:
            print("\n👋 Interrupted by user")
            break
        except Exception as e:
            print(f"❌ Error: {e}")
            import traceback
            traceback.print_exc()
    
    # Cleanup
    try:
        logger.cleanup()
    except Exception as e:
        print(f"⚠️ Logger cleanup error: {e}")
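
Sample rates matter at both ends of this loop: the rate declared for the recording should match the rate it was captured at, and playback should use the rate of the audio the pipeline returns. The arithmetic can be sketched as below. The helper names are hypothetical, and the 24 kHz figure is an assumption about the TTS output used only for illustration.

```python
def duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Length of a mono buffer in seconds when played at sample_rate."""
    return num_samples / sample_rate


def playback_speed(actual_rate: int, playback_rate: int) -> float:
    """Speed factor heard when audio produced at actual_rate is played at playback_rate."""
    return playback_rate / actual_rate


# Three seconds of audio produced at 24 kHz (assumed rate, for illustration).
samples = 3 * 24_000
print(duration_seconds(samples, 24_000))  # played at the matching rate
print(duration_seconds(samples, 16_000))  # played at 16 kHz, so it is stretched
print(playback_speed(24_000, 16_000))     # factor below 1 means slower and lower-pitched
```

A mismatch does not raise an error; it only changes how the audio sounds, which is why it is easy to miss.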


## Text-based Testing (Optional)

Test the agents with text input before using voice. This is useful for development and debugging.


In [None]:
async def test_agents_text():
    """Test agents with text input (for development/debugging)."""
    examples = [
        "What's my ACME account balance? My user ID is 1234567890",  # Account Agent test
        "What's trending in duck hunting gear right now?",  # Search Agent test
    ]
    
    try:
        for query in examples:
            print(f"\n👤 User: {query}")
            # Maxim tracing is automatic via MaximOpenAIAgentsTracingProcessor
            result = await Runner.run(triage_agent, query)
            print(f"🤖 Assistant: {result.final_output}")
            print("📊 Interaction automatically traced to Maxim")
            print("-" * 60)
    
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()
        raise
    
    finally:
        try:
            logger.cleanup()
        except Exception as e:
            print(f"⚠️ Logger cleanup error: {e}")

# Uncomment to run text tests
# await test_agents_text()


## Run Voice Assistant

Run the voice assistant to start interacting via voice. Make sure you have a microphone connected and speakers/headphones for audio output.


In [None]:
# Run the voice assistant
await voice_assistant_with_maxim()


## Maxim Dashboard

You can view the trace of the agents' interactions on the [Maxim](https://www.getmaxim.ai) dashboard, which provides detailed insights and visualizations of the entire process, including:

- Complete conversation traces
- Audio input/output metadata
- Agent handoffs and routing decisions
- Tool usage and function calls
- Performance metrics

![Maxim dashboard](maxim-dashboard.png)
