# ADK Bug Report

## Summary

The ADK's built-in `/run_live` WebSocket endpoint does not expose the `RunConfig` options `proactivity`, `enable_affective_dialog`, or `session_resumption`. This prevents developers from using Gemini Live's proactive audio features through the standard ADK endpoint.
## ADK Version

- Version: 1.23.0
- File: `src/google/adk/cli/adk_web_server.py`, line 1655
## Current Behavior

The `/run_live` endpoint hardcodes a minimal `RunConfig`:

```python
@app.websocket("/run_live")
async def run_agent_live(
    websocket: WebSocket,
    app_name: str,
    user_id: str,
    session_id: str,
    modalities: List[Literal["TEXT", "AUDIO"]] = Query(default=["AUDIO"]),
) -> None:
    # ...
    run_config = RunConfig(response_modalities=modalities)  # Line 1655
```

This means:

- **Model waits for user input** - Without `proactivity=ProactivityConfig(proactive_audio=True)`, the model will not speak first
- **No emotional awareness** - `enable_affective_dialog` is not enabled
- **Session resumption not configured** - `session_resumption` is not passed
## Expected Behavior

The `/run_live` endpoint should accept optional query parameters (or a request body) to configure these `RunConfig` options:

```python
@app.websocket("/run_live")
async def run_agent_live(
    websocket: WebSocket,
    app_name: str,
    user_id: str,
    session_id: str,
    modalities: List[Literal["TEXT", "AUDIO"]] = Query(default=["AUDIO"]),
    proactive_audio: bool = Query(default=False),
    enable_affective_dialog: bool = Query(default=False),
    enable_session_resumption: bool = Query(default=False),
) -> None:
    # ...
    run_config = RunConfig(
        response_modalities=modalities,
        proactivity=types.ProactivityConfig(proactive_audio=proactive_audio) if proactive_audio else None,
        enable_affective_dialog=enable_affective_dialog if enable_affective_dialog else None,
        session_resumption=types.SessionResumptionConfig() if enable_session_resumption else None,
    )
```

## Impact
Without these options exposed:

- **Agents cannot greet users first** - The agent waits silently until the user speaks, creating an awkward UX
- **Emotional awareness disabled** - The model cannot adapt to user emotions
- **Session resumption broken** - Multi-connection sessions don't work properly
## Workaround

We created a custom `/julia_run_live` endpoint that mirrors `/run_live` but passes the full `RunConfig`:

```python
run_config = RunConfig(
    response_modalities=modalities,
    proactivity=types.ProactivityConfig(proactive_audio=True),
    enable_affective_dialog=True,
    session_resumption=types.SessionResumptionConfig(),
    input_audio_transcription=types.AudioTranscriptionConfig(),
    output_audio_transcription=types.AudioTranscriptionConfig(),
)
```

See: `routers/julia_run_live.py` in our codebase.
## Suggested Fix

### Option A: Query Parameters (Minimal Change)

Add query parameters to the existing endpoint:

```python
@app.websocket("/run_live")
async def run_agent_live(
    websocket: WebSocket,
    app_name: str,
    user_id: str,
    session_id: str,
    modalities: List[Literal["TEXT", "AUDIO"]] = Query(default=["AUDIO"]),
    proactive_audio: bool = Query(default=False),
    affective_dialog: bool = Query(default=False),
    session_resumption: bool = Query(default=False),
) -> None:
```

### Option B: RunConfig in Initial Message
Allow passing a `RunConfig` as the first WebSocket message before streaming begins:

```json
{
  "run_config": {
    "proactivity": {"proactive_audio": true},
    "enable_affective_dialog": true,
    "session_resumption": {}
  }
}
```

### Option C: Agent-Level Default Config

Allow agents to specify a default `RunConfig` in their `agent.py` that gets used by `/run_live`:

```python
root_agent = Agent(
    # ...
    default_live_config=RunConfig(
        proactivity=types.ProactivityConfig(proactive_audio=True),
        enable_affective_dialog=True,
    )
)
```

## Related Code
The `RunConfig` class already supports these options (added in a recent release):

```python
# From src/google/adk/agents/run_config.py
class RunConfig(BaseModel):
    # ...
    enable_affective_dialog: Optional[bool] = None
    proactivity: Optional[types.ProactivityConfig] = None
    session_resumption: Optional[types.SessionResumptionConfig] = None
```

And they are correctly passed to the LLM in `basic.py`:

```python
# From src/google/adk/flows/llm_flows/basic.py:79-81
llm_request.live_connect_config.proactivity = (
    invocation_context.run_config.proactivity
)
```

The infrastructure is in place - it just needs to be exposed through the `/run_live` endpoint.
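To make the None-vs-config mapping from the Option A proposal concrete, here is a minimal sketch with `RunConfig` and the `types` configs stood in by plain dicts (so the logic is self-contained; `build_live_run_config` is a hypothetical helper, not ADK API). `None` means "leave the field unset" so existing defaults are preserved:

```python
from typing import Any, Dict, List


def build_live_run_config(
    modalities: List[str],
    proactive_audio: bool = False,
    enable_affective_dialog: bool = False,
    enable_session_resumption: bool = False,
) -> Dict[str, Any]:
    """Mirror the proposed RunConfig construction using plain dicts.

    In the real fix these would be types.ProactivityConfig /
    types.SessionResumptionConfig instances; None leaves a field unset.
    """
    return {
        "response_modalities": modalities,
        "proactivity": {"proactive_audio": True} if proactive_audio else None,
        "enable_affective_dialog": enable_affective_dialog or None,
        "session_resumption": {} if enable_session_resumption else None,
    }


# With all flags at their defaults, this reproduces today's behavior:
# only the response modalities are set, everything else stays unset.
default = build_live_run_config(["AUDIO"])
proactive = build_live_run_config(["AUDIO"], proactive_audio=True)
```

Because every new parameter defaults to `False` and maps to `None`, existing `/run_live` clients would see no behavior change.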
## Environment

- ADK Version: 1.23.0
- Python: 3.11+
- Gemini Model: gemini-2.0-flash-live (native audio)
## Steps to Reproduce

- Create an ADK agent with Gemini Live support
- Connect to the `/run_live` WebSocket endpoint
- Observe that the model waits silently for user input
- Expected: the model should greet the user if `proactive_audio=True`
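The connection in the steps above can be sketched without a live server by building the `/run_live` URL a client uses today versus the URL Option A would enable (host, port, and ID values are placeholders for a local `adk web` dev setup):

```python
from urllib.parse import urlencode

# Placeholder dev-server address and identifiers.
BASE = "ws://localhost:8000/run_live"
params = {
    "app_name": "my_agent",
    "user_id": "u1",
    "session_id": "s1",
    "modalities": "AUDIO",
}

# Today: only the response modalities are controllable from the client.
current_url = f"{BASE}?{urlencode(params)}"

# After the proposed fix (Option A): opt in to proactive audio per connection.
proposed_url = f"{BASE}?{urlencode({**params, 'proactive_audio': 'true'})}"

# Opening current_url with any WebSocket client shows the bug:
# the model stays silent until the user speaks first.
```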
## Additional Context

The ADK documentation (Part 5: Bidi-streaming) shows how to use proactivity:

```python
run_config = RunConfig(
    proactivity=types.ProactivityConfig(proactive_audio=True),
    enable_affective_dialog=True,
)
```

But there's no way to pass these options through the built-in `/run_live` endpoint, forcing developers to create custom endpoints.
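For completeness, if Option B above were adopted, the client side would stay trivial: the opening WebSocket frame is just serialized JSON (hypothetical protocol, field names taken from the Option B sketch):

```python
import json

# First WebSocket frame under the (hypothetical) Option B protocol.
opening_frame = json.dumps({
    "run_config": {
        "proactivity": {"proactive_audio": True},
        "enable_affective_dialog": True,
        "session_resumption": {},
    }
})

# The server would parse this frame into a RunConfig before streaming begins.
decoded = json.loads(opening_frame)
```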