fix: Google Vertex AI realtime crash when calling tools #1130

Merged
toubatbrian merged 2 commits into livekit:main from adriablancafort:main on Mar 14, 2026

Conversation

adriablancafort (Contributor) commented Mar 13, 2026

Description

Fixes two issues when using the Gemini Live 2.5 Flash model (Vertex AI) with tool calls:

  1. 1008 WebSocket crash: After Gemini sends a tool_call, the server rejects any sendRealtimeInput call until the tool response is sent. If audio or activity events are sent during tool execution, the WebSocket closes with code 1008. This PR gates all realtime input behind a toolCallPending flag and clears the flag in a finally block after sending the tool response.

  2. Voice interruption on tool call: Previously, receiving a tool call immediately closed the current generation (and its audio channel), cutting off the agent’s speech. The previous generation is now kept open until the server sends turnComplete, so playback finishes before the next turn.
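
The gating described in item 1 can be illustrated with a minimal sketch. It is not the plugin's code: RealtimeTransport and GatedRealtimeSession are hypothetical names, the transport is a synchronous stand-in for the Gemini Live WebSocket session, and only the toolCallPending flag and the finally-based reset come from the description above.

```typescript
// Hypothetical transport; stands in for the Gemini Live WebSocket session.
interface RealtimeTransport {
  sendRealtimeInput(chunk: string): void;
  sendToolResponse(result: string): void;
}

class GatedRealtimeSession {
  private readonly transport: RealtimeTransport;
  // True from the moment a tool call arrives until its response has been sent.
  private toolCallPending = false;

  constructor(transport: RealtimeTransport) {
    this.transport = transport;
  }

  // Called when the server emits a tool call.
  handleToolCall(): void {
    this.toolCallPending = true;
  }

  // Forwards audio unless a tool call is pending; returns whether it was sent.
  pushAudio(chunk: string): boolean {
    if (this.toolCallPending) {
      return false; // dropped: sending now is what leads to the 1008 close
    }
    this.transport.sendRealtimeInput(chunk);
    return true;
  }

  // Sends the tool result; `finally` reopens the gate even if the send throws.
  sendToolResponse(result: string): void {
    try {
      this.transport.sendToolResponse(result);
    } finally {
      this.toolCallPending = false;
    }
  }
}
```

Resetting the flag in finally rather than after the send means a failed tool response cannot leave the session permanently muted.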

Changes Made

  • Tool-call gating: Added toolCallPending; set to true when a tool call is received, cleared in finally after sendToolResponse. Skip queuing/sending realtime input (audio, activityStart, activityEnd) while toolCallPending is true in pushAudio, startUserActivity, and sendTask (realtime_input case).
  • Deferred generation close on tool call: Added generationPendingTurnComplete. When a tool call starts a new generation, the previous generation is no longer closed immediately; it is stored and closed only when turnComplete is received in handleServerContent, so its audio stream stays open until the turn actually ends.
  • markCurrentGenerationDone(keepFunctionChannelOpen?, gen?): Added optional second parameter so a specific generation (the pending one) can be closed on turnComplete instead of always closing the current one.
  • startNewGeneration(): If the previous generation had an open functionChannel (tool-call path), set generationPendingTurnComplete = previousGen and do not call markCurrentGenerationDone(); otherwise keep existing “Finalizing previous” behavior.
  • handleToolCall(): Removed markCurrentGenerationDone() so the generation is not closed when the tool call is received; it is closed later on turnComplete.
  • closeActiveSession(): Clear toolCallPending and, if set, close generationPendingTurnComplete to avoid leaking state on session close.
  • Patch file: Added src/livekit_realtime_api_fix.ts with the same logic for use when patching @livekit/agents-plugin-google (e.g. via patch-package).
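
The deferred close can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the plugin's implementation: Generation, GenerationTracker, hasOpenFunctionChannel, and the handler names are hypothetical, and a boolean closed field stands in for closing the real audio and function channels. Only generationPendingTurnComplete and the close-on-turnComplete rule come from the list above.

```typescript
// Hypothetical, simplified generation record; the real one carries channels and streams.
interface Generation {
  id: number;
  hasOpenFunctionChannel: boolean;
  closed: boolean;
}

class GenerationTracker {
  private nextId = 1;
  private current: Generation | undefined;
  // Previous generation whose audio must stay open until turnComplete arrives.
  private generationPendingTurnComplete: Generation | undefined;

  startNewGeneration(): Generation {
    const previous = this.current;
    if (previous !== undefined && !previous.closed) {
      if (previous.hasOpenFunctionChannel) {
        // Tool-call path: defer the close so playback can finish.
        this.generationPendingTurnComplete = previous;
      } else {
        // No tool call involved: finalize the previous generation right away.
        previous.closed = true;
      }
    }
    this.current = { id: this.nextId++, hasOpenFunctionChannel: false, closed: false };
    return this.current;
  }

  // A tool call no longer closes the generation; it only marks the channel as in use.
  handleToolCall(): void {
    if (this.current !== undefined) {
      this.current.hasOpenFunctionChannel = true;
    }
  }

  // On turnComplete, close the pending generation if there is one, else the current one.
  handleTurnComplete(): void {
    const pending = this.generationPendingTurnComplete;
    const target = pending !== undefined ? pending : this.current;
    if (target !== undefined) {
      target.closed = true;
    }
    this.generationPendingTurnComplete = undefined;
  }
}
```

In this sketch, a generation that issued a tool call stays open across startNewGeneration() and is closed only by the next handleTurnComplete(), which mirrors the optional gen parameter added to markCurrentGenerationDone.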

Pre-Review Checklist

  • Build passes: All builds (lint, typecheck, tests) pass locally
  • AI-generated code reviewed: Removed unnecessary comments and ensured code quality
  • Changes explained: All changes are properly documented and justified above
  • Scope appropriate: All changes relate to the PR title, or explanations provided for why they're included
  • Video demo: A small video demo showing changes works as expected and did not break any existing functionality using Agent Playground (if applicable)

Testing

  • Manually tested with Gemini Live 2.5 Flash on Vertex AI: the agent speaks and triggers a tool (e.g. schedule_booking); speech completes without being cut off and the WebSocket no longer closes with 1008.
  • Automated tests added/updated (if applicable)
  • All tests pass
  • Make sure both restaurant_agent.ts and realtime_agent.ts work properly (for major changes)

Additional Notes

  • Logic is applied in node_modules/@livekit/agents-plugin-google (src + dist .js and .cjs) and mirrored in src/livekit_realtime_api_fix.ts for creating a patch. Consider upstreaming to livekit/agents-js so a future release includes these fixes and the patch can be removed.
  • Reference: GitHub discussion on the Gemini 1008 error (gate realtime input during tool execution). Add the link here if you have it.

Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.

changeset-bot bot commented Mar 13, 2026

🦋 Changeset detected

Latest commit: 6f41ecd

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 21 packages
Name Type
@livekit/agents-plugin-google Patch
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugin-xai Patch
@livekit/agents-plugins-test Patch

adriablancafort (Contributor, Author) commented

I was experiencing several issues with gemini-live-2.5-flash-native-audio on Google Vertex AI, and these changes fixed them.
I would really appreciate it if the LiveKit team could put some effort into making sure Gemini Realtime works well, because it is one of the best technologies I have tried so far for AI voice agents. Thanks!

@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 5 additional findings in Devin Review.

this.markCurrentGenerationDone();
}

private handleToolCallCancellation(cancellation: types.LiveServerToolCallCancellation): void {

🔴 handleToolCallCancellation does not clear toolCallPending, permanently blocking audio input

The new toolCallPending flag is set to true when a toolCall arrives (realtime_api.ts:1032, realtime_api.ts:1509), and is only cleared in two places: when a tool_response is successfully sent (realtime_api.ts:988) or when the session closes (realtime_api.ts:483). However, handleToolCallCancellation at line 1533 only logs a warning and does not clear toolCallPending.

When the Gemini server cancels tool calls (e.g., because the user interrupted while tools were being processed), the server sends a toolCallCancellation event. This can trigger input_speech_started (via the interrupted server event), which interrupts the current speech handle. The interrupted speech handle cancels tool execution (agents/src/voice/agent_activity.ts:2277) and returns without ever calling updateChatCtx with the tool results (agents/src/voice/agent_activity.ts:2361). Since no tool_response is sent, toolCallPending remains true, causing pushAudio to silently drop all audio frames (realtime_api.ts:655), startUserActivity to no-op (realtime_api.ts:748), and all realtime_input events to be dropped in the send task (realtime_api.ts:994). The session is effectively deaf until it restarts.

(Refers to lines 1533-1540)
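
One way to avoid the stuck flag described in this finding is to treat a cancellation as settling the tool call. The sketch below is not part of this PR and not the plugin's code: PendingToolCalls and its methods are hypothetical, it tracks call IDs in a set instead of the single boolean the PR uses, and it assumes the server accepts realtime input again once it has cancelled the outstanding calls.

```typescript
// Hypothetical helper; the PR itself uses a single boolean flag, not a set of IDs.
class PendingToolCalls {
  private readonly pendingIds = new Set<string>();

  // Record every function call announced by a toolCall message.
  handleToolCall(ids: string[]): void {
    for (const id of ids) {
      this.pendingIds.add(id);
    }
  }

  // A sent tool response settles the calls it answers.
  handleToolResponseSent(ids: string[]): void {
    this.settle(ids);
  }

  // Assumption: a cancelled call will never get a response, so it is settled too.
  handleToolCallCancellation(ids: string[]): void {
    this.settle(ids);
  }

  // Realtime input stays gated only while some call is still unsettled.
  get toolCallPending(): boolean {
    return this.pendingIds.size > 0;
  }

  private settle(ids: string[]): void {
    for (const id of ids) {
      this.pendingIds.delete(id);
    }
  }
}
```

With this shape, pushAudio and startUserActivity would check toolCallPending exactly as before, but a toolCallCancellation event reopens the gate instead of leaving the session unable to hear the user.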

toubatbrian merged commit 5801dd7 into livekit:main on Mar 14, 2026
5 checks passed