Granular Per-Agent Speech Configuration #3116

@qyuo

Description

Is your feature request related to a problem?
In multi-agent conversational applications, distinguishing between different agents is crucial for user experience. Assigning a unique voice to each agent provides clear audio cues, helping users identify which persona is speaking and making the interaction more intuitive.

Currently, the speech configuration is set globally for the entire session. This prevents developers from defining per-agent voices, which is a significant limitation, especially in bidirectional streaming contexts.

Describe the solution you'd like
I propose adding an optional speech_config parameter to each agent's configuration object, so that developers can specify a voice and other speech settings per agent.

Example Implementation:

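# Imports added for completeness; module paths assume the google-adk and google-genai packages.
from google.adk import models
from google.adk.agents import llm_agent
from google.genai import types

llm_agent.LlmAgent(
    model=models.Gemini(
        model="gemini-live-2.5-flash-preview",
        retry_options=types.HttpRetryOptions(initial_delay=1, attempts=2),
    ),
    name="recommendation_agent",
    description="Recommends Fantasy Football Picks",
    instruction="Provide data-driven draft recommendations",
    # Proposed parameter: this is the per-agent setting requested in this issue.
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(
                voice_name="Kore",
            )
        )
    ),
)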
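To make the intended behavior concrete, here is a minimal sketch of how a per-agent setting might be resolved against the existing session-wide one. It is not ADK code: the SpeechConfig and Agent classes are simplified stand-ins, resolve_speech_config and greeter_agent are hypothetical names, and the fallback to the session default when an agent sets nothing is an assumption about the desired semantics rather than something specified above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpeechConfig:
    """Hypothetical stand-in for types.SpeechConfig; only the voice name is modeled."""
    voice_name: str


@dataclass
class Agent:
    """Hypothetical stand-in for an agent carrying the proposed optional field."""
    name: str
    speech_config: Optional[SpeechConfig] = None


def resolve_speech_config(
    agent: Agent, session_default: Optional[SpeechConfig]
) -> Optional[SpeechConfig]:
    """Prefer the agent's own speech config; otherwise use the session-wide one."""
    if agent.speech_config is not None:
        return agent.speech_config
    return session_default


# Session-wide voice, as configured today (assumed value for illustration).
session_default = SpeechConfig(voice_name="Puck")

recommender = Agent("recommendation_agent", SpeechConfig(voice_name="Kore"))
greeter = Agent("greeter_agent")  # no override, so it inherits the session voice

print(resolve_speech_config(recommender, session_default).voice_name)
print(resolve_speech_config(greeter, session_default).voice_name)
```

With this fallback, existing applications that only set a session-level voice would keep working unchanged, and only agents that opt in would sound different.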

Labels

core: [Component] This issue is related to the core interface and implementation
live: [Component] This issue is related to live, voice and video chat
