LLMs that support bidirectional mode #3252

@TanejaAnkisetty

Description

@TanejaAnkisetty

My team is building a voice-based chatbot with bidirectional streaming via WebSocket. The backend runs an agentic workflow using ADK and Gemini models. We are currently using the gemini-2.0-flash-exp model. Since it is an experimental model, it has a low quota and Provisioned Throughput is not supported, so the recommendation was to migrate to a stable model. However, I'm unable to find an LLM that supports our requirements.

These are the errors I received when testing various LLMs:

  1. gemini-2.0-flash : ERROR echo_ai.consumer consumer agent_to_client: Error during agent streaming: received 1007 (invalid frame payload data) gemini-2.0-flash is not supported in the live api.
  2. gemini-2.5-flash : websockets.exceptions.ConnectionClosedError: received 1007 (invalid frame payload data) gemini-2.5-flash is not supported in the live api.; then sent 1007 (invalid frame payload data) gemini-2.5-flash is not supported in the live api
  3. gemini-2.0-flash-live-preview-04-09 : google.genai.errors.ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'gemini-2.0-flash-live-preview-04-09 is not supported in the generateContent API.', 'status': 'INVALID_ARGUMENT'}}
  4. gemini-live-2.5-flash-preview-native-audio : google.genai.errors.ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'gemini-live-2.5-flash-preview-native-audio is not supported in the generateContent API.', 'status': 'INVALID_ARGUMENT'}}
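The pattern in these failures is consistent: gemini-2.0-flash and gemini-2.5-flash are rejected by the Live API, while the live preview models are rejected by the generateContent API. Here is a tiny sketch of that split, purely as a summary of the errors I observed above (my own mapping, not an official capability list):

```python
# My own summary of the observed errors above -- not an official capability list.
LIVE_ONLY = {
    "gemini-2.0-flash-live-preview-04-09",
    "gemini-live-2.5-flash-preview-native-audio",
}
GENERATE_CONTENT_ONLY = {
    "gemini-2.0-flash",
    "gemini-2.5-flash",
}

def supported_apis(model: str) -> set[str]:
    """Which API surface accepted the model in my tests."""
    if model in LIVE_ONLY:
        return {"live"}
    if model in GENERATE_CONTENT_ONLY:
        return {"generateContent"}
    if model == "gemini-2.0-flash-exp":
        # The only model that worked end to end for us (until hitting 429s).
        return {"live", "generateContent"}
    return set()  # untested model
```

Our ADK workflow appears to touch both surfaces (errors 1-2 came from the Live API connection, errors 3-4 from generateContent), which is why a model accepted by only one of them fails somewhere in the pipeline.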

This is the high-level workflow of the solution:

  • The user speaks to the frontend (browser)
  • The audio is converted to PCM and sent over a WebSocket to the backend
  • The backend is powered by ADK; the transcribed text of the user input is provided to the LLM
  • The LLM produces a textual response, which is sent back over the WebSocket to the frontend
  • The frontend converts the text to audio and plays it in the browser
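The steps above can be sketched as a minimal asyncio pipeline. This is only a simplified model of our backend for illustration: the WebSocket legs are represented as queues and the ADK/Gemini call is stubbed out, so all names here are illustrative, not our actual code:

```python
import asyncio

async def llm_stub(text: str) -> str:
    # Stand-in for the ADK/Gemini call; the real backend streams a response.
    return f"echo: {text}"

async def backend(inbound: asyncio.Queue, outbound: asyncio.Queue) -> None:
    # The backend receives transcribed user input over the WebSocket
    # (modelled here as a queue), calls the LLM, and sends text back
    # to the frontend. None is used as an end-of-session sentinel.
    while True:
        text = await inbound.get()
        if text is None:
            await outbound.put(None)
            return
        await outbound.put(await llm_stub(text))

async def main() -> list[str]:
    inbound, outbound = asyncio.Queue(), asyncio.Queue()
    task = asyncio.create_task(backend(inbound, outbound))
    # Frontend side: send two utterances, then close the session.
    for utterance in ["hello", "goodbye", None]:
        await inbound.put(utterance)
    replies = []
    while (reply := await outbound.get()) is not None:
        replies.append(reply)
    await task
    return replies
```

In the real system, each queue leg is a WebSocket connection and the stub is the ADK agent invocation; the bidirectional requirement below is about replacing this request/response shape with a full-duplex stream.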

A feature that is absolutely required in our workflow is bidirectional mode, where we can give end users the experience of natural, human-like voice conversations, including the ability for the user to interrupt the agent's responses by voice. https://google.github.io/adk-docs/streaming/

So far, gemini-2.0-flash-exp is the only LLM I have found that works for our solution without throwing any errors, apart from 429 (resource exhaustion) errors. Every other stable LLM I tried gave the errors shown above. Could you please suggest an alternative to gemini-2.0-flash-exp so that our workflow runs without errors and without hitting 429 resource-exhaustion limits?

Labels

live [Component] This issue is related to live, voice and video chat