fix(google realtime): support gemini-3.1-flash-live-preview #5251
Conversation
Gemini 3.1 rejects send_client_content after the first model turn. Route generate_reply through send_realtime_input, drop mid-session LiveClientContent to prevent 1007 errors, add history_config for initial context seeding, and skip empty server_content events. New reconnect_on_update option enables session restart on update_instructions/update_chat_ctx for restricted models.
JiwaniZakir
left a comment
In generate_reply, the restricted-model branch sends instructions directly as a LiveClientRealtimeInput text prompt, but the original non-restricted path framed those same instructions as a role="model" content turn followed by a role="user" placeholder turn. These have meaningfully different semantics — the instructions were intended to prime the model's perspective, not arrive as user speech — so the behavior diverges silently for callers passing instructions to generate_reply on a restricted model.
In _send_task, when reconnect_on_update=True triggers _mark_restart_needed() upon receiving a LiveClientContent, the content in msg.turns is discarded. The reconnect re-seeds history via system_instruction/initial_history, but any turn data specific to that particular LiveClientContent message (e.g., a one-off model-role instruction) is lost without any log warning, making this a silent data loss path that could be hard to debug.
The RESTRICTED_CLIENT_CONTENT_MODELS membership check is now scattered across at least three call sites (generate_reply, _send_task, and implicitly _build_connect_config per the comment). Centralizing this behind a small helper method like _is_restricted_client_content_model() would reduce the risk of a future model being added to the frozenset but missing one of the branching locations.
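A minimal sketch of that suggestion (the class and attribute names here are illustrative, not the plugin's actual internals):

```python
# Hypothetical sketch: one predicate instead of three scattered membership
# checks, so a newly restricted model only needs to be added to the frozenset.
RESTRICTED_CLIENT_CONTENT_MODELS = frozenset({"gemini-3.1-flash-live-preview"})


class RealtimeSession:
    def __init__(self, model: str) -> None:
        self._model = model

    def _is_restricted_client_content_model(self) -> bool:
        """Single source of truth for the send_client_content restriction."""
        return self._model in RESTRICTED_CLIENT_CONTENT_MODELS
```

All three call sites (generate_reply, _send_task, _build_connect_config) would then branch on this one method.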
davidzhao
left a comment
I don't think this type of workaround is a good idea. Let's engage the DeepMind folks to understand the best path forward.
    conn_options: APIConnectOptions = DEFAULT_API_CONNECT_OPTIONS,
    http_options: NotGivenOr[types.HttpOptions] = NOT_GIVEN,
    thinking_config: NotGivenOr[types.ThinkingConfig] = NOT_GIVEN,
    reconnect_on_update: bool = False,
Should this be a user-specified option? When would you want this to be True unless the model requires it?
    # See: https://ai.google.dev/gemini-api/docs/models/gemini-3.1-flash-live-preview#migrating
    if self._opts.model in RESTRICTED_CLIENT_CONTENT_MODELS:
        prompt = instructions if is_given(instructions) else "."
        self._send_client_event(types.LiveClientRealtimeInput(text=prompt))
This isn't going to work. Realtime input comes from the end user, but generate_reply instructions need to come from the model itself.
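To make the semantic gap concrete, here are the two payload shapes as plain dicts (field names follow the Live API wire format; this is an illustration, not plugin code):

```python
def model_turn_payload(instructions: str) -> dict:
    # Non-restricted path: instructions are primed as a model-role turn,
    # followed by a placeholder user turn to trigger generation.
    return {
        "client_content": {
            "turns": [
                {"role": "model", "parts": [{"text": instructions}]},
                {"role": "user", "parts": [{"text": "."}]},
            ],
            "turn_complete": True,
        }
    }


def realtime_input_payload(instructions: str) -> dict:
    # Restricted path in this PR: the same instructions arrive as if the
    # end user typed them, which is the semantic mismatch flagged here.
    return {"realtime_input": {"text": instructions}}
```

The first shape lets the instructions prime the model's perspective; the second makes them indistinguishable from user speech.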
Adds working support for gemini-3.1-flash-live-preview.
Gemini 3.1 changed how send_client_content works. It's now only allowed for initial history seeding (with history_config.initial_history_in_client_content=true).
After the first model turn, all text input must go through send_realtime_input. This broke generate_reply, and any mid-session LiveClientContent (from update_instructions, update_chat_ctx, etc.) gets rejected with a 1007 error.
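Based on the migration guide referenced below, connect-time seeding might look roughly like this (key names are taken from the PR description; treat the exact shape as an assumption):

```python
# Illustrative connect config: initial history is allowed in client content
# only when flagged via history_config; all later text on restricted models
# must go through send_realtime_input.
connect_config = {
    "system_instruction": "You are a helpful voice assistant.",
    "history_config": {
        # Permits the single allowed send_client_content, used to seed history.
        "initial_history_in_client_content": True,
    },
}
```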
What this PR does:

- Routes generate_reply text through send_realtime_input for restricted models.
- Drops mid-session LiveClientContent to prevent 1007 errors.
- Adds history_config so initial context can be seeded in client content.
- Skips empty server_content events.
- Adds a reconnect_on_update option that restarts the session on update_instructions/update_chat_ctx for restricted models.
Tested locally: greeting via generate_reply, bidirectional audio, function tool calling, update_instructions and update_chat_ctx with reconnect all work.
Known limitation: mid-session update_instructions and update_chat_ctx require a reconnect on 3.1 - there's no way around this since the model simply doesn't accept send_client_content after the first turn. Google's migration guide confirms this is by design.

Ref: https://ai.google.dev/gemini-api/docs/models/gemini-3.1-flash-live-preview#migrating
Related: #5234