# Gemini Realtime Voice Agent with Tools (LiveKit Agents)

This notebook demonstrates a real-time voice assistant built on the **LiveKit Agents** framework with:
- **Gemini Realtime API**: Native audio model with server-side turn detection
- **Echo Cancellation**: Uses LiveKit's AudioProcessingModule to prevent feedback
- **GoogleSearch**: Built-in Gemini tool for live web search (grounding)
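- **Custom function tools**: Calculator, current time, and a simulated slow web search
- **Agent transfer**: Handoff between the main assistant and a math-expert agent with its own voice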
- **Local audio I/O**: Microphone input and speaker output via sounddevice

## Requirements

```bash
pip install sounddevice numpy livekit livekit-agents livekit-plugins-google python-dotenv pytz
```

**macOS:** You may need PortAudio:
```bash
brew install portaudio
```

## Usage
1. Set the `GOOGLE_GEMINI_API_KEY` environment variable (or add it to a `.env` file, which `load_dotenv()` picks up)
2. Run all cells in order
3. Speak into your microphone
4. Try: "Search for the latest AI news" or "What's 25 times 17?"

In [1]:
# Install dependencies if needed
# !pip install sounddevice numpy livekit livekit-agents livekit-plugins-google python-dotenv pytz

In [2]:
import os
import asyncio
import math
import threading
import logging
from datetime import datetime
from dotenv import load_dotenv

import sounddevice as sd
import numpy as np
from livekit import rtc

# Load environment variables
load_dotenv()

# Set up logging (INFO level - set to DEBUG for verbose output)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("voice_agent")

# Audio configuration (matching LiveKit agents console mode)
SAMPLE_RATE = 24000
CHANNELS = 1
FRAME_SAMPLES = 240  # 10ms frames
BLOCK_SIZE = 2400    # 100ms blocks

print(f"Sample rate: {SAMPLE_RATE} Hz")
print(f"Block size: {BLOCK_SIZE} samples ({BLOCK_SIZE/SAMPLE_RATE*1000:.0f}ms)")

Sample rate: 24000 Hz
Block size: 2400 samples (100ms)
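
The sizes above follow from simple arithmetic: at 24 kHz a 10 ms frame is 240 samples, a 100 ms sounddevice block holds ten such frames, and 16-bit mono PCM takes 2 bytes per sample (the same `frames * CHANNELS * 2` the output callback uses later). A small sketch of that calculation; `frame_layout` is a hypothetical helper, not part of the notebook:

```python
SAMPLE_RATE = 24000    # Hz, as configured above
CHANNELS = 1           # mono
BYTES_PER_SAMPLE = 2   # int16 PCM

def frame_layout(sample_rate: int, frame_ms: int, block_ms: int, channels: int = 1) -> dict:
    """Hypothetical helper: derive frame and block sizes from durations in ms."""
    frame_samples = sample_rate * frame_ms // 1000
    block_samples = sample_rate * block_ms // 1000
    return {
        "frame_samples": frame_samples,
        "block_samples": block_samples,
        "frames_per_block": block_samples // frame_samples,
        "block_bytes": block_samples * channels * BYTES_PER_SAMPLE,
    }

layout = frame_layout(SAMPLE_RATE, frame_ms=10, block_ms=100, channels=CHANNELS)
print(layout)
# {'frame_samples': 240, 'block_samples': 2400, 'frames_per_block': 10, 'block_bytes': 4800}
```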


In [3]:
# Verify API key
api_key = os.environ.get("GOOGLE_GEMINI_API_KEY")
if not api_key:
    raise ValueError(
        "GOOGLE_GEMINI_API_KEY not set.\n"
        "Set it with: export GOOGLE_GEMINI_API_KEY='your-key'"
    )
print("API key found!")

API key found!


In [4]:
# List available audio devices
print("Available audio devices:")
print(sd.query_devices())
print(f"\nDefault input: {sd.default.device[0]}")
print(f"Default output: {sd.default.device[1]}")

Available audio devices:
> 0 My boss, Core Audio (1 in, 0 out)
< 1 My boss, Core Audio (0 in, 2 out)
  2 YZ’s iPhone Microphone, Core Audio (1 in, 0 out)
  3 MacBook Air Microphone, Core Audio (1 in, 0 out)
  4 MacBook Air Speakers, Core Audio (0 in, 2 out)

Default input: 0
Default output: 1


## Import LiveKit Agents Components

In [5]:
from livekit.agents import Agent, AgentSession, utils
from livekit.agents.voice import io
from livekit.agents.voice.io import AudioOutputCapabilities
from livekit.agents.voice.events import RunContext  # For tool context
from livekit.agents.llm import function_tool
from livekit.plugins import google
from livekit.plugins.google.tools import GoogleSearch
from livekit import rtc

print("LiveKit Agents components imported!")

LiveKit Agents components imported!


## Define Custom Tools

In [6]:
# =============================================================================
# Custom Tools
# =============================================================================
# 
# NOTE: The LiveKit Agents framework uses a state machine that pauses speech
# scheduling after the agent finishes speaking. This means background tasks
# CANNOT inject speech after the main turn completes.
#
# The supported patterns are:
# 1. Fast tools: Return immediately (like calculate, get_current_time)
# 2. Slow tools: Block until completion (framework handles the flow)
#
# For slow operations, the agent speaks an acknowledgment (e.g. "Let me
# search for that...") while the tool runs.
# =============================================================================

@function_tool
async def calculate(expression: str) -> str:
    """Evaluate a mathematical expression.
    
    Args:
        expression: Math expression (e.g., '2 + 2', 'sqrt(16)', 'sin(pi/2)')
    """
    allowed = {
        'sqrt': math.sqrt, 'sin': math.sin, 'cos': math.cos,
        'tan': math.tan, 'log': math.log, 'log10': math.log10,
        'exp': math.exp, 'pi': math.pi, 'e': math.e,
        'abs': abs, 'round': round, 'pow': pow,
    }
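    try:
        # Empty __builtins__ restricts name lookup, but eval() is not a true
        # sandbox; fine for a local demo, not for untrusted input.
        result = eval(expression, {'__builtins__': {}}, allowed)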
        return f"The result of {expression} is {result}"
    except Exception as e:
        return f"Error: {e}"


@function_tool
async def get_current_time(timezone: str = "local") -> str:
    """Get the current date and time.
    
    Args:
        timezone: Timezone name (e.g., 'UTC', 'US/Pacific'). Defaults to local.
    """
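    try:
        if timezone and timezone != "local":
            import pytz  # optional dependency; if missing, falls back to UTC below
            tz = pytz.timezone(timezone)
            now = datetime.now(tz)
            return f"The time in {timezone} is {now.strftime('%Y-%m-%d %H:%M:%S %Z')}"
        else:
            now = datetime.now()
            return f"The local time is {now.strftime('%Y-%m-%d %H:%M:%S')}"
    except Exception:
        # Unknown timezone name or pytz not installed: report UTC instead.
        # datetime.utcnow() is deprecated since Python 3.12, so use an aware UTC time.
        # (`timezone` is shadowed by the parameter, so import the class under an alias.)
        from datetime import timezone as dt_timezone
        now = datetime.now(dt_timezone.utc)
        return f"The UTC time is {now.strftime('%Y-%m-%d %H:%M:%S')} UTC"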


@function_tool
async def slow_web_search(ctx: RunContext, query: str) -> str | None:
    """Search the web for information (demonstrates slow tool handling).
    
    This tool simulates a slow web search that takes 3 seconds.
    The framework handles the flow: the agent can speak while this runs,
    the user can interrupt, and the result is spoken when ready.
    
    Args:
        ctx: RunContext for speech handle access
        query: The search query
    """
    print(f"[Tool] slow_web_search starting for: {query}")
    
    # Create the slow task
    async def _do_search():
        await asyncio.sleep(3)  # Simulate API delay
        return f"Top results for '{query}': 1) AI advances in 2024, 2) New language models released, 3) Major tech announcements"
    
    # Start the task
    search_task = asyncio.create_task(_do_search())
    
    # Wait for either: task completion OR user interruption
    # This lets the agent speak naturally while we wait
    await ctx.speech_handle.wait_if_not_interrupted([search_task])
    
    if ctx.speech_handle.interrupted:
        print(f"[Tool] slow_web_search interrupted for: {query}")
        search_task.cancel()
        return None  # Return None to skip tool reply
    
    result = search_task.result()
    print(f"[Tool] slow_web_search completed for: {query}")
    return result


print("Tools defined:")
print("  - calculate: Fast math evaluation")
print("  - get_current_time: Fast time lookup")
print("  - slow_web_search: Slow search (3s) with interruption support")

Tools defined:
  - calculate: Fast math evaluation
  - get_current_time: Fast time lookup
  - slow_web_search: Slow search (3s) with interruption support
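
`ctx.speech_handle.wait_if_not_interrupted([search_task])` waits until either the task finishes or the user interrupts. The same race can be sketched with plain `asyncio`, modelling the interruption as an `asyncio.Event`. This illustrates the pattern only and is not LiveKit's implementation; `run_unless_interrupted` and `fake_search` are hypothetical names:

```python
import asyncio

async def run_unless_interrupted(work, interrupted: asyncio.Event):
    """Run the coroutine `work` unless `interrupted` is set first.
    Returns the work's result, or None if interrupted (the work is cancelled)."""
    task = asyncio.create_task(work)
    waiter = asyncio.create_task(interrupted.wait())
    done, _ = await asyncio.wait({task, waiter}, return_when=asyncio.FIRST_COMPLETED)
    if task in done:
        waiter.cancel()
        return task.result()
    # Interruption won the race: cancel the slow work, like search_task.cancel()
    task.cancel()
    return None

async def fake_search(delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for a slow API call
    return "results"

async def demo():
    never_set = asyncio.Event()
    finished = await run_unless_interrupted(fake_search(0.05), never_set)
    interrupt = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, interrupt.set)  # "user speaks"
    cancelled = await run_unless_interrupted(fake_search(1.0), interrupt)
    return finished, cancelled

print(asyncio.run(demo()))  # ('results', None)
```

In a notebook cell, use `await demo()` instead of `asyncio.run(demo())`, since Jupyter already runs an event loop.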


## Agent Transfer (Multi-Agent Handoff)

This demonstrates agent transfer with Gemini Realtime. When the user asks for specialized help,
the main agent can transfer to a specialist agent.

**How it works:**
- A tool returns an `Agent` (or `tuple[Agent, str]`) to trigger handoff
- The framework pauses the current agent and activates the new one
- The new agent inherits the conversation context

In [7]:
# =============================================================================
# Multi-Agent Setup with Transfer
# =============================================================================
# 
# Agent transfer works by returning an Agent from a tool function.
# The framework will pause the current agent and activate the new one.
# =============================================================================

# Store agents in a dict for easy lookup
agents: dict[str, Agent] = {}

# Get API key for creating realtime models
gemini_api_key = os.environ.get("GOOGLE_GEMINI_API_KEY")

# Create a separate realtime model for the math expert (can use different voice)
math_expert_model = google.realtime.RealtimeModel(
    api_key=gemini_api_key,
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    voice="Charon",  # Different voice for math expert
    temperature=0.3,  # Lower temperature for precise math
)

# Tool to transfer back to main agent
@function_tool
async def transfer_to_main_agent() -> Agent:
    """Transfer the conversation back to the main assistant.
    
    Use this when you've finished helping with the specialized task
    and the user wants general assistance again.
    """
    print("[Transfer] Returning to main agent")
    return agents["main"]


# Math Expert Agent - specialized for complex calculations
math_expert = Agent(
    instructions="""You are a MATH EXPERT assistant. You speak with confidence about mathematics.

Your specialty is:
- Complex mathematical calculations
- Explaining mathematical concepts
- Step-by-step problem solving
- Statistical analysis

Guidelines:
- ALWAYS respond in English
- Be precise and accurate
- Explain your reasoning step by step
- Use the calculate tool for actual computation
- When the user is done with math questions, offer to transfer them back to the main assistant
- Use transfer_to_main_agent when the user wants general help

You have a more serious, professorial tone compared to the main assistant.
""",
    llm=math_expert_model,
    tools=[
        calculate,
        transfer_to_main_agent,
    ],
)
agents["math_expert"] = math_expert


# Tool to transfer to math expert
@function_tool
async def transfer_to_math_expert() -> tuple[Agent, str]:
    """Transfer the conversation to a math specialist.
    
    Use this when the user has complex math questions or needs
    detailed mathematical explanations.
    """
    print("[Transfer] Transferring to math expert")
    # Return tuple of (Agent, handoff message)
    return agents["math_expert"], "Transferring you to our math expert who can help with complex calculations."


print("Agent transfer setup complete!")
print("  - math_expert: Specialized agent for math (voice: Charon)")
print("  - transfer_to_math_expert: Tool to handoff to math expert")
print("  - transfer_to_main_agent: Tool to return to main assistant")

Agent transfer setup complete!
  - math_expert: Specialized agent for math (voice: Charon)
  - transfer_to_math_expert: Tool to handoff to math expert
  - transfer_to_main_agent: Tool to return to main assistant


## Audio I/O Classes

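These classes connect sounddevice to the LiveKit Agents framework: `NotebookAudioInput` feeds microphone frames into the session, and `NotebookAudioOutput` buffers the agent's audio for the speaker callback and reports playback start, completion, and interruption back to the session.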
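
sounddevice runs its callbacks on a PortAudio thread, while the agent session consumes frames on the asyncio event loop; the notebook bridges the two with `loop.call_soon_threadsafe(audio_input.push_frame, frame)`. A stdlib-only sketch of that hand-off, with a plain thread standing in for the audio callback and an `asyncio.Queue` standing in for LiveKit's channel:

```python
import asyncio
import threading

async def main() -> list[int]:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def audio_thread() -> None:
        # Stand-in for the sounddevice callback thread. asyncio objects are not
        # thread-safe, so each put is scheduled onto the event loop instead.
        for frame_id in range(5):
            loop.call_soon_threadsafe(queue.put_nowait, frame_id)
        loop.call_soon_threadsafe(queue.put_nowait, None)  # end-of-stream marker

    threading.Thread(target=audio_thread).start()

    received = []
    while (item := await queue.get()) is not None:
        received.append(item)
    return received

print(asyncio.run(main()))  # [0, 1, 2, 3, 4]
```

Callbacks scheduled with `call_soon_threadsafe` run in the order they were submitted, so frames arrive in capture order. In Jupyter, use `await main()` instead of `asyncio.run(main())`.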

In [8]:
class NotebookAudioInput(io.AudioInput):
    """Audio input from microphone via sounddevice."""
    
    def __init__(self, loop: asyncio.AbstractEventLoop):
        super().__init__(label="Notebook Microphone")
        self._loop = loop
        self._audio_ch: utils.aio.Chan[rtc.AudioFrame] = utils.aio.Chan()
        self._attached = True
    
    def push_frame(self, frame: rtc.AudioFrame) -> None:
        """Push audio frame from sounddevice callback."""
        if self._attached:
            try:
                self._audio_ch.send_nowait(frame)
            except Exception:
                pass
    
    async def __anext__(self) -> rtc.AudioFrame:
        return await self._audio_ch.__anext__()
    
    def close(self):
        self._attached = False
        self._audio_ch.close()


class NotebookAudioOutput(io.AudioOutput):
    """Audio output to speaker via sounddevice.
    
    Supports pause/resume for false interruption handling.
    Properly tracks playback state with on_playback_started/on_playback_finished.
    """
    
    def __init__(self, loop: asyncio.AbstractEventLoop):
        super().__init__(
            label="Notebook Speaker",
            capabilities=io.AudioOutputCapabilities(pause=True),  # Enable pause support
            next_in_chain=None,
            sample_rate=SAMPLE_RATE,
        )
        self._loop = loop
        self._buffer = bytearray()
        self._lock = threading.Lock()
        self._closed = False
        
        # Playback tracking - CRITICAL for proper session coordination
        self._pushed_duration: float = 0.0
        self._capture_start: float = 0.0
        self._flush_task: asyncio.Task | None = None
        self._output_empty_ev = asyncio.Event()
        self._output_empty_ev.set()
        self._interrupted_ev = asyncio.Event()
        
        # Pause tracking for false interruption handling
        self._paused_at: float | None = None
        self._paused_duration: float = 0.0
    
    @property
    def paused(self) -> bool:
        """Check if audio output is paused."""
        return self._paused_at is not None
    
    @property
    def audio_lock(self) -> threading.Lock:
        return self._lock
    
    @property
    def audio_buffer(self) -> bytearray:
        return self._buffer
    
    def mark_output_empty(self) -> None:
        """Signal that output buffer is empty."""
        self._output_empty_ev.set()
    
    async def capture_frame(self, frame: rtc.AudioFrame) -> None:
        """Capture audio frame from agent for playback."""
        await super().capture_frame(frame)
        if self._closed:
            return
        
        # Wait for any pending flush to complete
        if self._flush_task and not self._flush_task.done():
            logger.warning("capture_frame called while flush in progress")
            await self._flush_task
        
        # Signal playback started on first frame
        if not self._pushed_duration:
            self._capture_start = time.monotonic()
            self.on_playback_started(created_at=time.time())
            logger.debug("Playback started")
        
        # Track total pushed duration and add to buffer
        self._pushed_duration += frame.duration
        with self._lock:
            self._buffer.extend(frame.data)
            self._output_empty_ev.clear()
    
    def flush(self) -> None:
        """Flush buffered audio, marking segment complete."""
        super().flush()
        if self._pushed_duration:
            if self._flush_task and not self._flush_task.done():
                logger.warning("flush called while previous flush in progress")
                self._flush_task.cancel()
            
            # Wait for playout to complete
            self._flush_task = asyncio.create_task(self._wait_for_playout())
    
    async def _wait_for_playout(self) -> None:
        """Wait for audio to finish playing, then signal playback_finished."""
        async def _wait_buffered_audio() -> None:
            while len(self._buffer) > 0:
                await self._output_empty_ev.wait()
                await asyncio.sleep(0)
        
        wait_for_interruption = asyncio.create_task(self._interrupted_ev.wait())
        wait_for_playout = asyncio.create_task(_wait_buffered_audio())
        
        try:
            await asyncio.wait(
                [wait_for_playout, wait_for_interruption],
                return_when=asyncio.FIRST_COMPLETED,
            )
            interrupted = wait_for_interruption.done()
        finally:
            wait_for_playout.cancel()
            wait_for_interruption.cancel()
        
        # Account for any paused time
        if self._paused_at is not None:
            self._paused_duration += time.monotonic() - self._paused_at
            self._paused_at = None
        
        # Calculate actual played duration
        if interrupted:
            played_duration = time.monotonic() - self._capture_start - self._paused_duration
            played_duration = min(max(0, played_duration), self._pushed_duration)
            logger.debug(f"Playback interrupted after {played_duration:.2f}s")
        else:
            played_duration = self._pushed_duration
            logger.debug(f"Playback completed: {played_duration:.2f}s")
        
        # Signal playback finished - CRITICAL for session coordination
        self.on_playback_finished(playback_position=played_duration, interrupted=interrupted)
        
        # Reset state for next segment
        self._pushed_duration = 0.0
        self._paused_at = None
        self._paused_duration = 0.0
        self._interrupted_ev.clear()
        with self._lock:
            self._output_empty_ev.set()
    
    def clear_buffer(self) -> None:
        """Clear the buffer and signal interruption."""
        with self._lock:
            self._buffer.clear()
            self._output_empty_ev.set()
        
        # Signal interruption if we were playing
        if self._pushed_duration:
            self._interrupted_ev.set()
    
    def pause(self) -> None:
        """Pause audio playback."""
        super().pause()
        if self._paused_at is None:
            self._paused_at = time.monotonic()
            logger.debug("Playback paused")
    
    def resume(self) -> None:
        """Resume audio playback."""
        super().resume()
        if self._paused_at is not None:
            self._paused_duration += time.monotonic() - self._paused_at
            self._paused_at = None
            logger.debug("Playback resumed")
    
    def get_audio(self, num_bytes: int) -> bytes:
        """Get audio data for sounddevice output callback."""
        with self._lock:
            # If paused, return silence
            if self.paused:
                return bytes(num_bytes)
            
            if len(self._buffer) >= num_bytes:
                data = bytes(self._buffer[:num_bytes])
                del self._buffer[:num_bytes]
                return data
            else:
                # Return what we have + zero padding
                data = bytes(self._buffer) + bytes(num_bytes - len(self._buffer))
                self._buffer.clear()
                # Mark empty in the event loop
                try:
                    self._loop.call_soon_threadsafe(self.mark_output_empty)
                except RuntimeError:
                    pass
                return data
    
    def close(self):
        self._closed = True
        self.clear_buffer()


# Need to import time for pause tracking
import time

print("Audio I/O classes defined (with proper playback tracking)")
print("- on_playback_started() called when first audio frame received")
print("- on_playback_finished() called when playback completes or is interrupted")

Audio I/O classes defined (with proper playback tracking)
- on_playback_started() called when first audio frame received
- on_playback_finished() called when playback completes or is interrupted
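
When a segment is interrupted, `_wait_for_playout` estimates how much audio was actually heard: wall-clock time since the first frame, minus time spent paused, clamped to the range [0, pushed duration]; otherwise the whole pushed duration counts as played. That calculation, pulled out into a hypothetical pure function `played_duration` so it can be checked in isolation:

```python
def played_duration(now: float, capture_start: float, paused_duration: float,
                    pushed_duration: float, interrupted: bool) -> float:
    """Seconds of audio considered played, mirroring the logic in
    NotebookAudioOutput._wait_for_playout (times from time.monotonic())."""
    if not interrupted:
        return pushed_duration  # everything queued finished playing
    elapsed = now - capture_start - paused_duration
    return min(max(0.0, elapsed), pushed_duration)

# 4 s of audio queued, interrupted 2.5 s after start, with 1 s spent paused
print(played_duration(now=12.5, capture_start=10.0, paused_duration=1.0,
                      pushed_duration=4.0, interrupted=True))  # 1.5
```

This is the value the cell passes to `on_playback_finished(playback_position=...)`.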


## Create the Agent

In [9]:
# Create Gemini Realtime Model for main agent
# Using default server-side turn detection (works well with echo cancellation)
realtime_model = google.realtime.RealtimeModel(
    api_key=gemini_api_key,
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    voice="Puck",
    temperature=0.8,
    # Default uses server-side turn detection
)

# Create Main Agent with tools including transfer capability
agent = Agent(
    instructions="""You are a helpful voice assistant. ALWAYS respond in English.

You have access to:
1. **Google Search**: Built-in Gemini web search for current information
2. **Calculator**: Evaluate math expressions (sqrt, sin, cos, log, pi, etc.)
3. **Current Time**: Get the current date and time in any timezone
4. **Slow Web Search**: Demo tool that takes 3 seconds - shows how the framework handles slow operations
5. **Math Expert Transfer**: Transfer to a specialized math expert for complex calculations

Guidelines:
- ALWAYS speak in English, regardless of what language the user speaks
- Be conversational and friendly
- Keep responses concise (this is voice)
- Use Google Search for factual questions or current events
- Use calculator for simple math
- For COMPLEX math questions or when the user wants detailed mathematical explanations,
  use transfer_to_math_expert to hand them off to our math specialist
- When using slow_web_search, tell the user you're searching while you wait for results
""",
    llm=realtime_model,
    tools=[
        GoogleSearch(),          # Built-in Gemini web search
        calculate,               # Custom math tool (fast)
        get_current_time,        # Custom time tool (fast)
        slow_web_search,         # Slow tool demo (blocks with interruption support)
        transfer_to_math_expert, # Agent transfer to math specialist
    ],
)

# Store main agent in the agents dict for transfer back
agents["main"] = agent

print(f"Main agent created with {len(agent.tools)} tools:")
print("  - GoogleSearch (built-in)")
print("  - calculate (fast)")  
print("  - get_current_time (fast)")
print("  - slow_web_search (slow - 3s, with interruption support)")
print("  - transfer_to_math_expert (agent handoff)")
print()
print(f"Model: {realtime_model.model}")
print("Voice: Puck")
print("Turn detection: Server-side (default)")
print()
print("Agent Transfer Test:")
print("  Say 'I need help with complex math' to trigger transfer to math expert")
print("  The math expert uses voice 'Charon' - you should hear a different voice!")

Main agent created with 5 tools:
  - GoogleSearch (built-in)
  - calculate (fast)
  - get_current_time (fast)
  - slow_web_search (slow - 3s, with interruption support)
  - transfer_to_math_expert (agent handoff)

Model: gemini-2.5-flash-native-audio-preview-12-2025
Voice: Puck
Turn detection: Server-side (default)

Agent Transfer Test:
  Say 'I need help with complex math' to trigger transfer to math expert
  The math expert uses voice 'Charon' - you should hear a different voice!


## Main Voice Assistant Function

In [None]:
async def run_voice_assistant(
    input_device: int | str | None = None,
    output_device: int | str | None = None,
    duration: float | None = None,
):
    """
    Run the voice assistant with local audio I/O.
    
    Uses LiveKit's AudioProcessingModule for echo cancellation to prevent
    the microphone from picking up speaker output.
    """
    loop = asyncio.get_running_loop()
    
    audio_input = NotebookAudioInput(loop)
    audio_output = NotebookAudioOutput(loop)
    
    logger.info(f"Audio output can_pause: {audio_output.can_pause}")
    
    apm = rtc.AudioProcessingModule(
        echo_cancellation=True,
        noise_suppression=True,
        high_pass_filter=True,
        auto_gain_control=True,
    )
    print("Echo cancellation enabled via AudioProcessingModule")
    
    session_active = True
    input_delay = 0.0
    output_delay = 0.0
    is_first_agent = True  # Track if this is the initial agent start
    
    def input_callback(indata, frames, time_info, status):
        nonlocal input_delay
        if not session_active:
            return
        
        input_delay = time_info.currentTime - time_info.inputBufferAdcTime
        total_delay = output_delay + input_delay
        try:
            apm.set_stream_delay_ms(int(total_delay * 1000))
        except RuntimeError:
            pass
        
        num_frames = frames // FRAME_SAMPLES
        for i in range(num_frames):
            start = i * FRAME_SAMPLES
            end = start + FRAME_SAMPLES
            chunk = indata[start:end, 0]
            
            frame = rtc.AudioFrame(
                data=chunk.tobytes(),
                samples_per_channel=FRAME_SAMPLES,
                sample_rate=SAMPLE_RATE,
                num_channels=CHANNELS,
            )
            apm.process_stream(frame)
            loop.call_soon_threadsafe(audio_input.push_frame, frame)
    
    def output_callback(outdata, frames, time_info, status):
        nonlocal output_delay
        if not session_active:
            outdata[:] = 0
            return
            
        output_delay = time_info.outputBufferDacTime - time_info.currentTime
        num_bytes = frames * CHANNELS * 2
        
        with audio_output.audio_lock:
            is_paused = audio_output.paused
        
        if is_paused:
            outdata[:] = 0
            silence = np.zeros(FRAME_SAMPLES, dtype=np.int16)
            num_frames = frames // FRAME_SAMPLES
            for i in range(num_frames):
                render_frame = rtc.AudioFrame(
                    data=silence.tobytes(),
                    samples_per_channel=FRAME_SAMPLES,
                    sample_rate=SAMPLE_RATE,
                    num_channels=CHANNELS,
                )
                apm.process_reverse_stream(render_frame)
            return
        
        data = audio_output.get_audio(num_bytes)
        audio_samples = np.frombuffer(data, dtype=np.int16)
        outdata[:, 0] = audio_samples
        
        num_frames = frames // FRAME_SAMPLES
        for i in range(num_frames):
            start = i * FRAME_SAMPLES
            end = start + FRAME_SAMPLES
            chunk = outdata[start:end, 0]
            render_frame = rtc.AudioFrame(
                data=chunk.tobytes(),
                samples_per_channel=FRAME_SAMPLES,
                sample_rate=SAMPLE_RATE,
                num_channels=CHANNELS,
            )
            apm.process_reverse_stream(render_frame)
    
    if input_device is None:
        input_device = sd.default.device[0]
    if output_device is None:
        output_device = sd.default.device[1]
    
    print("="*60)
    print("Gemini Voice Assistant Ready!")
    print("="*60)
    print(f"Input:  {sd.query_devices(input_device)['name']}")
    print(f"Output: {sd.query_devices(output_device)['name']}")
    print()
    print("Try saying:")
    print("  - 'What time is it?'")
    print("  - 'What is 25 times 17?'")
    print("  - 'I need help with complex math' (transfers to math expert!)")
    print("  - 'Use slow search for AI news'")
    print("="*60)
    print()
    
    input_stream = sd.InputStream(
        callback=input_callback,
        device=input_device,
        channels=CHANNELS,
        samplerate=SAMPLE_RATE,
        blocksize=BLOCK_SIZE,
        dtype='int16',
    )
    
    output_stream = sd.OutputStream(
        callback=output_callback,
        device=output_device,
        channels=CHANNELS,
        samplerate=SAMPLE_RATE,
        blocksize=BLOCK_SIZE,
        dtype='int16',
    )
    
    try:
        input_stream.start()
        output_stream.start()
        print("Audio streams started")
        
        session = AgentSession(
            allow_interruptions=True,
            min_interruption_duration=0.5,
            min_interruption_words=0,
            resume_false_interruption=True,
            false_interruption_timeout=1.0,
            min_endpointing_delay=0.5,
            max_endpointing_delay=3.0,
        )
        
        session.input.audio = audio_input
        session.output.audio = audio_output
        
        @session.on("user_input_transcribed")
        def on_user_input(ev):
            if ev.is_final:
                print(f"You: {ev.transcript}")
        
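        # Assistant turns are committed to the chat history as "assistant"
        # messages; print their text transcript when they are added.
        @session.on("conversation_item_added")
        def on_agent_speech(ev):
            item = ev.item
            if getattr(item, "role", None) == "assistant" and item.text_content:
                print(f"Assistant: {item.text_content}")
                print("---")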
        
        @session.on("function_tools_executed")
        def on_tools_executed(ev):
            for call, output in ev.zipped():
                print(f"[Tool] {call.name}")
        
        # Note: the sample run below shows no "[Agent]" lines, so this event may
        # not be emitted by your LiveKit Agents version. If so, move the greeting
        # into each Agent's on_enter() hook instead.
        @session.on("agent_started")
        def on_agent_started(ev):
            nonlocal is_first_agent
            current_agent = ev.agent

            # Skip greeting for the initial agent start (handled separately)
            if is_first_agent:
                is_first_agent = False
                print("[Agent] Initial agent started")
                return

            print("[Agent] Transferred to new agent")

            # Generate a greeting for the new agent after transfer,
            # choosing instructions based on which agent is now active
            if current_agent is agents.get("math_expert"):
                greeting = "You just took over from the main assistant. Briefly introduce yourself as the math expert and ask how you can help with their math question."
            elif current_agent is agents.get("main"):
                greeting = "You just returned from the math expert. Briefly welcome the user back and ask if there's anything else you can help with."
            else:
                greeting = "Briefly introduce yourself to the user."

            # generate_reply() schedules the reply itself and returns a SpeechHandle
            # (not a coroutine), so it must not be wrapped in asyncio.create_task()
            session.generate_reply(instructions=greeting)
        
        @session.on("error")
        def on_error(ev):
            print(f"[Error] {ev.error}")

        await session.start(agent=agent)
        print("Session started")
        print()
        
        await session.generate_reply(
            instructions="Greet the user briefly in English. Mention you can search the web, do math, tell time, and transfer to a math expert for complex calculations."
        )
        
        if duration:
            await asyncio.sleep(duration)
        else:
            while session_active:
                await asyncio.sleep(1)
    
    except asyncio.CancelledError:
        print("\nSession cancelled.")
    except KeyboardInterrupt:
        print("\nSession interrupted.")
    except Exception as e:
        print(f"\nError: {e}")
        import traceback
        traceback.print_exc()
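    finally:
        session_active = False
        # Close the AgentSession if it was created, so the Gemini connection
        # does not keep running after the cell is stopped. (`session` is
        # unbound if an error occurred before AgentSession() was constructed.)
        if "session" in locals():
            try:
                await session.aclose()
            except Exception as e:
                print(f"Error closing session: {e}")
        input_stream.stop()
        output_stream.stop()
        input_stream.close()
        output_stream.close()
        audio_input.close()
        audio_output.close()
        print("Session ended.")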


print("Voice assistant function defined. Run the next cell to start!")
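
Both callbacks cut each sounddevice block into 10 ms frames because the AudioProcessingModule works on 10 ms chunks. Since they loop over `frames // FRAME_SAMPLES`, any trailing samples in a block that is not a multiple of 240 are never passed to the session (input) or to the echo canceller's reference (output). With a fixed `blocksize=BLOCK_SIZE` of 2400 samples nothing should be dropped, but the arithmetic is easy to check. A stdlib sketch using `array('h')` for int16 samples; `split_frames` is a hypothetical helper:

```python
from array import array

FRAME_SAMPLES = 240  # 10 ms at 24 kHz

def split_frames(block: array, frame_samples: int = FRAME_SAMPLES) -> tuple[list[array], int]:
    """Split a mono int16 block into whole frames, as the callbacks do.
    Returns (frames, number of leftover samples that would be dropped)."""
    n_frames = len(block) // frame_samples
    frames = [block[i * frame_samples:(i + 1) * frame_samples] for i in range(n_frames)]
    return frames, len(block) - n_frames * frame_samples

full_block = array("h", range(2400))  # one 100 ms block
frames, dropped = split_frames(full_block)
print(len(frames), dropped)           # 10 0

odd_block = array("h", range(2500))   # a block that is not a multiple of 240
frames, dropped = split_frames(odd_block)
print(len(frames), dropped)           # 10 100
```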

## Run the Voice Assistant

Run the cell below to start. Speak into your microphone!

**To stop:** Press the stop button in Jupyter or interrupt the kernel.

In [None]:
# Run the voice assistant
# Set duration=60 for 60 seconds, or None to run indefinitely
await run_voice_assistant(duration=None)

INFO:voice_agent:Audio output can_pause: True
INFO:livekit:livekit_ffi::server:139:livekit_ffi::server - initializing ffi server v0.12.42
INFO:livekit:livekit_ffi::cabi:50:livekit_ffi::cabi - initializing ffi server v0.12.42


Echo cancellation enabled via AudioProcessingModule
Gemini Voice Assistant Ready!
Input:  My boss
Output: My boss

Try saying:
  - 'What time is it?'
  - 'What is 25 times 17?'
  - 'I need help with complex math' (transfers to math expert!)
  - 'Use slow search for AI news'

Audio streams started
Session started

You: I I I need to help with the complex mass.
[Transfer] Transferring to math expert
[Tool] transfer_to_math_expert
You: hello.
You: was 25 times 11.
[Tool] calculate
You: please transfer back to the assistant.
[Transfer] Returning to main agent
[Tool] transfer_to_main_agent
You: Hallo.
You: Oh nice. Thank you. Bye.


## Alternative: Run with Specific Devices

Use this if you need to specify audio devices.

In [None]:
# Run with specific devices (use device IDs from the list above)
# await run_voice_assistant(input_device=0, output_device=1, duration=120)

## Notes

### Available Gemini Voices
Puck (default), Charon, Kore, Fenrir, Aoede, Achernar, Alnilam, Callirrhoe, Zephyr

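### Tools
- **GoogleSearch**: Built-in Gemini grounding with live web search
- **calculate**: Math expressions with sqrt, sin, cos, tan, log, log10, exp, pi, e, abs, round, pow
- **get_current_time**: Current time in any timezone (uses `pytz`; falls back to UTC if it is missing or the timezone is unknown)
- **slow_web_search**: Simulated 3-second search that can be interrupted
- **transfer_to_math_expert** / **transfer_to_main_agent**: Hand the conversation between the two agents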

### Troubleshooting
- **No audio**: Check microphone permissions and device selection
- **API errors**: Verify `GOOGLE_GEMINI_API_KEY` is set
- **PortAudio errors**: `brew install portaudio` on macOS

### Echo Cancellation
The key to reliable local audio I/O is **echo cancellation**. Without it, the microphone picks up speaker output, causing:
- False speech detection (server thinks user is speaking when agent is)
- Interrupted responses ("speech not done in time after interruption")

This notebook uses `rtc.AudioProcessingModule` from LiveKit:
- `process_reverse_stream()` - feed output audio as AEC reference
- `process_stream()` - remove echo from microphone input

This is the same approach used by LiveKit's official `console` mode.