
Gemini Realtime: transcribed user message not added to transcript after tool call #2432

@fredvollmer

Description

When using the Gemini Realtime model with input_audio_transcription enabled, the user input immediately preceding a tool call is not being added to the transcript.

Cause

The RealtimeSession._handle_tool_calls method always marks the current generation as done, which discards the current (non-final) user input transcription. Because this handler runs before the Gemini model sends the generation_complete message, the input_audio_transcription_completed event is never emitted with is_final set to True, so the message is never added to the transcript.

Thoughts

I tested removing the _mark_current_generation_done() call at the end of the RealtimeSession._handle_tool_calls() method, and this fixed the issue without causing any other problems that I could see. Since Gemini sends the generation_complete message after the tool call is complete, isn't it redundant to mark the generation as done in the tool call handler anyway?
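To make the ordering problem concrete, here is a minimal toy model of the behavior described above. It is only a sketch and does not use the real library: ToySession, its fields, on_input_transcription, on_generation_complete, and run are all hypothetical names invented for illustration. Only the idea of a tool call handler that calls _mark_current_generation_done() before generation_complete arrives is taken from this issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToySession:
    """Hypothetical stand-in for the realtime session (not the library's class)."""

    # Toggle for the behavior under discussion: close the generation in the tool handler.
    mark_done_in_tool_handler: bool = True
    transcript: List[str] = field(default_factory=list)
    _pending_user_text: Optional[str] = None

    def on_input_transcription(self, text: str) -> None:
        # Non-final transcription chunks accumulate on the current generation.
        self._pending_user_text = (self._pending_user_text or "") + text

    def handle_tool_calls(self) -> None:
        # Assumed to mirror the reported behavior: the handler ends the generation early.
        if self.mark_done_in_tool_handler:
            self._mark_current_generation_done()

    def on_generation_complete(self) -> None:
        # The final transcription is committed only if pending text still exists.
        if self._pending_user_text is not None:
            self.transcript.append(self._pending_user_text)
        self._mark_current_generation_done()

    def _mark_current_generation_done(self) -> None:
        # Dropping the generation also drops any non-final transcription.
        self._pending_user_text = None


def run(mark_done_in_tool_handler: bool) -> List[str]:
    """Replay the event order from the issue: speech, tool call, generation_complete."""
    session = ToySession(mark_done_in_tool_handler=mark_done_in_tool_handler)
    session.on_input_transcription("What's the weather ")
    session.on_input_transcription("in Paris?")
    session.handle_tool_calls()
    session.on_generation_complete()
    return session.transcript


if __name__ == "__main__":
    print("mark done in tool handler:", run(True))
    print("wait for generation_complete:", run(False))
```

In this toy model, the user message survives only when the tool call handler leaves the generation open until generation_complete arrives. Whether the real session has other code paths that rely on the early call is something the maintainers would need to confirm.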

Thanks for your time, and please let me know if there's an opportunity for a PR.

Labels: bug (Something isn't working)
