fix: Google Vertex AI realtime crash when calling tools #1130
toubatbrian merged 2 commits into livekit:main
🦋 Changeset detected. Latest commit: 6f41ecd. The changes in this PR will be included in the next version bump. This PR includes changesets to release 21 packages.
I was experiencing several issues with
This thread in the official Google forum is quite relevant: https://discuss.ai.google.dev/t/gemini-live-api-websocket-error-1008-operation-is-not-implemented-or-supported-or-enabled/114644/56
        this.markCurrentGenerationDone();
      }

      private handleToolCallCancellation(cancellation: types.LiveServerToolCallCancellation): void {
🔴 `handleToolCallCancellation` does not clear `toolCallPending`, permanently blocking audio input
The new `toolCallPending` flag is set to `true` when a `toolCall` arrives (`realtime_api.ts:1032`, `realtime_api.ts:1509`), and is only cleared in two places: when a `tool_response` is successfully sent (`realtime_api.ts:988`) or when the session closes (`realtime_api.ts:483`). However, `handleToolCallCancellation` at line 1533 only logs a warning and does not clear `toolCallPending`.
When the Gemini server cancels tool calls (e.g., because the user interrupted while tools were being processed), the server sends a `toolCallCancellation` event. This can trigger `input_speech_started` (via the `interrupted` server event), which interrupts the current speech handle. The interrupted speech handle cancels tool execution (`agents/src/voice/agent_activity.ts:2277`) and returns without ever calling `updateChatCtx` with the tool results (`agents/src/voice/agent_activity.ts:2361`). Since no `tool_response` is sent, `toolCallPending` remains `true`, causing `pushAudio` to silently drop all audio frames (`realtime_api.ts:655`), `startUserActivity` to no-op (`realtime_api.ts:748`), and all `realtime_input` events to be dropped in the send task (`realtime_api.ts:994`). The session is effectively deaf until it restarts.
(Refers to lines 1533-1540)
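One possible way to close the gap described in this comment, sketched under assumptions: `ToolCallGate` and its method names are hypothetical stand-ins for the session state, not the plugin's actual code, and the sketch assumes the server accepts realtime input again once it has cancelled the tool call. The only point it illustrates is that the cancellation path releases the flag the same way the tool-response path does.

```typescript
// Hypothetical stand-in for the session state discussed above; not the plugin's actual class.
class ToolCallGate {
  private toolCallPending = false;

  // A toolCall arrived: hold back realtime input until it is resolved.
  onToolCall(): void {
    this.toolCallPending = true;
  }

  // The tool response went out: input may flow again.
  onToolResponseSent(): void {
    this.toolCallPending = false;
  }

  // Assumed fix: a cancellation means no tool response will follow,
  // so the gate is released here as well.
  onToolCallCancellation(): void {
    this.toolCallPending = false;
  }

  // Returns true if an audio frame would be forwarded, false if it would be dropped.
  acceptsAudio(): boolean {
    return !this.toolCallPending;
  }
}

const gate = new ToolCallGate();
gate.onToolCall();
console.log(gate.acceptsAudio()); // false
gate.onToolCallCancellation();
console.log(gate.acceptsAudio()); // true
```

Without the reset in `onToolCallCancellation`, the second check would still report `false`, which is the "deaf session" state the comment describes.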
Description
Fixes two issues when using the Gemini Live 2.5 Flash model (Vertex AI) with tool calls:

- 1008 WebSocket crash: When Gemini sends a `tool_call`, it rejects any `sendRealtimeInput` until the tool response is sent. If audio or activity events are sent during tool execution, the WebSocket closes with code 1008. This PR gates all realtime input with a `toolCallPending` flag and clears it in `try/finally` after sending the tool response.
- Voice interruption on tool call: Previously, receiving a tool call immediately closed the current generation (and its audio channel), cutting off the agent’s speech. The previous generation is now kept open until the server sends `turnComplete`, so playback finishes before the next turn.

Changes Made

- `toolCallPending`: set to `true` when a tool call is received, cleared in `finally` after `sendToolResponse`. Skip queuing/sending realtime input (audio, `activityStart`, `activityEnd`) while `toolCallPending` is true in `pushAudio`, `startUserActivity`, and `sendTask` (`realtime_input` case).
- `generationPendingTurnComplete`: When a tool call starts a new generation, the previous generation is no longer closed immediately; it is stored and closed only when `turnComplete` is received in `handleServerContent`, so its audio stream stays open until the turn actually ends.
- `markCurrentGenerationDone(keepFunctionChannelOpen?, gen?)`: Added optional second parameter so a specific generation (the pending one) can be closed on `turnComplete` instead of always closing the current one.
- `startNewGeneration()`: If the previous generation had an open `functionChannel` (tool-call path), set `generationPendingTurnComplete = previousGen` and do not call `markCurrentGenerationDone()`; otherwise keep existing “Finalizing previous” behavior.
- `handleToolCall()`: Removed `markCurrentGenerationDone()` so the generation is not closed when the tool call is received; it is closed later on `turnComplete`.
- `closeActiveSession()`: Clear `toolCallPending` and, if set, close `generationPendingTurnComplete` to avoid leaking state on session close.
- `src/livekit_realtime_api_fix.ts` with the same logic for use when patching `@livekit/agents-plugin-google` (e.g. via patch-package).

Pre-Review Checklist

Testing

- `schedule_booking`), speech completes without being cut off and WebSocket no longer closes with 1008.
- `restaurant_agent.ts` and `realtime_agent.ts` work properly (for major changes)

Additional Notes

- `node_modules/@livekit/agents-plugin-google` (src + dist .js and .cjs) and mirrored in `src/livekit_realtime_api_fix.ts` for creating a patch. Consider upstreaming to livekit/agents-js so a future release includes these fixes and the patch can be removed.

Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.
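The input gating described in this PR (a `toolCallPending` flag that is set on tool call and cleared in `finally` after the tool response) can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: `GatedRealtimeSession`, `Transport`, and `OutboundMessage` are hypothetical names, and the transport is modelled as a synchronous `send` so the example stays self-contained.

```typescript
// Hypothetical message shapes; the real plugin sends richer payloads.
type OutboundMessage =
  | { type: 'realtime_input'; audio: Uint8Array }
  | { type: 'tool_response'; callId: string; result: string };

// Hypothetical transport standing in for the Live API session's send methods.
interface Transport {
  send(message: OutboundMessage): void;
}

class GatedRealtimeSession {
  private toolCallPending = false;
  private readonly transport: Transport;

  constructor(transport: Transport) {
    this.transport = transport;
  }

  // The server issued a tool call: stop forwarding realtime input.
  handleToolCall(): void {
    this.toolCallPending = true;
  }

  // Drops the frame while a tool call is pending; returns whether it was sent.
  pushAudio(audio: Uint8Array): boolean {
    if (this.toolCallPending) return false;
    this.transport.send({ type: 'realtime_input', audio });
    return true;
  }

  // Clears the flag in finally so a failed send cannot leave the gate closed.
  sendToolResponse(callId: string, result: string): void {
    try {
      this.transport.send({ type: 'tool_response', callId, result });
    } finally {
      this.toolCallPending = false;
    }
  }
}
```

Clearing the flag in `finally` rather than after the send means an exception from the transport still reopens the gate; whether that is the desired behavior after a failed tool response is a design choice this sketch does not settle.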