LLMs that support bidirectional mode #3252

@TanejaAnkisetty

Description

@TanejaAnkisetty

My team is building a voice-based chatbot with bidirectional streaming via WebSocket. The backend runs an agentic workflow using ADK and Gemini models. We are currently using the gemini-2.0-flash-exp model. Since it is an experimental model, it has a low quota and Provisioned Throughput is not supported, so the recommendation was to migrate to a stable model. However, I'm unable to find an LLM that supports our requirements.

These are the errors I received when testing various LLMs:

  1. gemini-2.0-flash : ERROR echo_ai.consumer consumer agent_to_client: Error during agent streaming: received 1007 (invalid frame payload data) gemini-2.0-flash is not supported in the live api.
  2. gemini-2.5-flash : websockets.exceptions.ConnectionClosedError: received 1007 (invalid frame payload data) gemini-2.5-flash is not supported in the live api.; then sent 1007 (invalid frame payload data) gemini-2.5-flash is not supported in the live api
  3. gemini-2.0-flash-live-preview-04-09 : google.genai.errors.ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'gemini-2.0-flash-live-preview-04-09 is not supported in the generateContent API.', 'status': 'INVALID_ARGUMENT'}}
  4. gemini-live-2.5-flash-preview-native-audio : google.genai.errors.ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'gemini-live-2.5-flash-preview-native-audio is not supported in the generateContent API.', 'status': 'INVALID_ARGUMENT'}}
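The pattern in these failures is consistent: gemini-2.0-flash and gemini-2.5-flash are rejected by the Live API, while the live preview models are rejected by the generateContent API. Here is a tiny sketch of that split, purely as a summary of the errors I observed above (my own mapping, not an official capability list):

```python
# My own summary of the observed errors above -- not an official capability list.
LIVE_ONLY = {
    "gemini-2.0-flash-live-preview-04-09",
    "gemini-live-2.5-flash-preview-native-audio",
}
GENERATE_CONTENT_ONLY = {
    "gemini-2.0-flash",
    "gemini-2.5-flash",
}

def supported_apis(model: str) -> set[str]:
    """Which API surface accepted the model in my tests."""
    if model in LIVE_ONLY:
        return {"live"}
    if model in GENERATE_CONTENT_ONLY:
        return {"generateContent"}
    if model == "gemini-2.0-flash-exp":
        # The only model that worked end to end for us (until hitting 429s).
        return {"live", "generateContent"}
    return set()  # untested model
```

Our ADK workflow appears to touch both surfaces (errors 1-2 came from the Live API connection, errors 3-4 from generateContent), which is why a model accepted by only one of them fails somewhere in the pipeline.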

This is the high-level workflow of the solution:

  • The user speaks to the frontend (browser)
  • The audio is converted to PCM and sent over a WebSocket to the backend
  • The backend is powered by ADK; the transcribed text of the user input is provided to the LLM
  • The LLM produces a textual response, which is sent back over the WebSocket to the frontend
  • The frontend converts the text to audio and plays it in the browser
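The steps above can be sketched as a minimal asyncio pipeline. This is only a simplified model of our backend for illustration: the WebSocket legs are represented as queues and the ADK/Gemini call is stubbed out, so all names here are illustrative, not our actual code:

```python
import asyncio

async def llm_stub(text: str) -> str:
    # Stand-in for the ADK/Gemini call; the real backend streams a response.
    return f"echo: {text}"

async def backend(inbound: asyncio.Queue, outbound: asyncio.Queue) -> None:
    # The backend receives transcribed user input over the WebSocket
    # (modelled here as a queue), calls the LLM, and sends text back
    # to the frontend. None is used as an end-of-session sentinel.
    while True:
        text = await inbound.get()
        if text is None:
            await outbound.put(None)
            return
        await outbound.put(await llm_stub(text))

async def main() -> list[str]:
    inbound, outbound = asyncio.Queue(), asyncio.Queue()
    task = asyncio.create_task(backend(inbound, outbound))
    # Frontend side: send two utterances, then close the session.
    for utterance in ["hello", "goodbye", None]:
        await inbound.put(utterance)
    replies = []
    while (reply := await outbound.get()) is not None:
        replies.append(reply)
    await task
    return replies
```

In the real system, each queue leg is a WebSocket connection and the stub is the ADK agent invocation; the bidirectional requirement below is about replacing this request/response shape with a full-duplex stream.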

A feature that is absolutely required in our workflow is bidirectional mode, where we can give end users the experience of natural, human-like voice conversations, including the ability for the user to interrupt the agent's responses by voice. https://google.github.io/adk-docs/streaming/

So far, gemini-2.0-flash-exp is the only LLM I have found that works for our solution without throwing any errors, apart from 429 (resource exhaustion) errors. Every other stable LLM I tried gave the errors shown above. Could you please suggest an alternative to gemini-2.0-flash-exp so that our workflow runs without errors and without hitting 429 resource-exhaustion limits?

Labels

live [Component] This issue is related to live, voice and video chat