Gemini Realtime: transcribed user message not added to transcript after tool call #2432

When using the Gemini Realtime model with `input_audio_transcription` enabled, the user input immediately preceding a tool call is not being added to the transcript.

### Cause
The `RealtimeSession._handle_tool_calls` method always marks the current generation as done, which discards the current (non-final) user input transcription. Because this handler runs _before_ the Gemini model sends the `generation_complete` message, the `input_audio_transcription_completed` event is never emitted with `is_final` set to `True`, and the message is never added to the transcript.

### Thoughts
I tested removing the `_mark_current_generation_done()` call at the end of `RealtimeSession._handle_tool_calls()`, and this fixed the issue without causing any other problems that I could see. Since Gemini sends the `generation_complete` message after the tool call is complete, isn't it redundant to mark the generation done in the tool call handler anyway?
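To illustrate the ordering problem, here is a minimal, self-contained sketch. It is a toy model, not the plugin's code: `ToySession`, `on_input_transcription`, `on_generation_complete`, and the `mark_done_in_tool_handler` flag are all hypothetical names. Only `_handle_tool_calls` and `_mark_current_generation_done` mirror the method names mentioned above, and their bodies are my assumption about the behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToySession:
    """Toy stand-in for a realtime session; NOT the real RealtimeSession."""

    # Hypothetical switch: True models the current behavior,
    # False models the handler with the mark-done call removed.
    mark_done_in_tool_handler: bool
    pending_transcript: Optional[str] = None  # non-final user transcription
    transcript: List[str] = field(default_factory=list)

    def on_input_transcription(self, text: str) -> None:
        # Interim (non-final) transcription accumulates on the current generation.
        self.pending_transcript = (self.pending_transcript or "") + text

    def _mark_current_generation_done(self) -> None:
        # Assumed behavior: the in-progress transcription is dropped
        # without a final (is_final=True) event being emitted.
        self.pending_transcript = None

    def _handle_tool_calls(self) -> None:
        if self.mark_done_in_tool_handler:
            self._mark_current_generation_done()

    def on_generation_complete(self) -> None:
        # Commit the final transcription if one is still pending.
        if self.pending_transcript is not None:
            self.transcript.append(self.pending_transcript)
        self._mark_current_generation_done()


def run(mark_done_in_tool_handler: bool) -> List[str]:
    session = ToySession(mark_done_in_tool_handler)
    session.on_input_transcription("What's the weather ")
    session.on_input_transcription("in Paris?")
    session._handle_tool_calls()      # tool call arrives first
    session.on_generation_complete()  # generation_complete arrives afterwards
    return session.transcript


if __name__ == "__main__":
    print("mark done in tool handler:", run(True))
    print("mark done only on generation_complete:", run(False))
```

In this model, the user message reaches the transcript only when the tool call handler leaves the pending transcription alone and `generation_complete` does the cleanup, which matches what I observed when testing the change.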

Thanks for your time, and please let me know if there's an opportunity for a PR.

