feat: Gemini Live real-time voice, web image search, and follow-up fixes by frapeti · Pull Request #35 · flutterclaw/flutterclaw

frapeti · 2026-03-30T00:17:09Z

Summary

This branch adds Gemini Live real-time voice: WebSocket session, PCM capture and playback, LiveAgentLoop, Riverpod wiring, and chat UI (live voice button and overlay). It also includes follow-up fixes and related features merged on this branch (web image search, terminal/sandbox reliability, session and usage fixes, localization, and CI).

Base: master
Merge strategy: ordinary merge or squash per repo preference.

What changed

Gemini Live (core)

Models: supportsLive flag and Gemini Live entries in the model catalog.
Services: Gemini Live WebSocket client, PCM streaming, playback queue (in-memory WAV source), voice selection (settings UI, set_live_voice tool, config), connect/end SFX, high-sensitivity server-side speech detection where applicable.
Agent: LiveAgentLoop and hooks in AgentLoop; system prompt and transcript alignment with chat; buffered transcript persistence on stop/disconnect; instruction to speak before request_user_action.
Providers: Riverpod providers for live voice state and lifecycle.
UI: Chat input bar, live voice overlay (glass chrome, haptics), subtle border on the message stack during live voice, session/tool wiring.

Related features and fixes

Web: web_image_search tool (headless browser + Brave API fallback), markdown image extraction in fetch; agent guidance to use web_image_search for photo requests.
OpenAI provider: Request usage metadata in streamed chat completions.
Agent/session: Merge usage stats (sum cache read/write tokens); session_status accepts injected __session_key; invalidate model capability providers after config edits.
Terminal / sandbox: Android sandbox drains process output before closing streams; streamed shell output trimmed only on final CLEAR payload; terminal replaces buffer from CLEAR JSON and shows streaming wait state.
Localization & tooling: Live voice overlay strings, locale sweep, credentials/security/auth-related strings, CI checks.
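
The usage-stats fix above sums the cache read/write counters when two usage records are merged. A minimal sketch of that idea follows, written in Python for brevity (the app code is Dart). The `Usage` type and its field names are hypothetical, and summing the input/output counters as well is an assumption rather than something the PR states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Token counters for one model response (field names are illustrative)."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0


def merge_usage(a: Usage, b: Usage) -> Usage:
    """Merge two usage records by summing every counter, cache counters included."""
    return Usage(
        input_tokens=a.input_tokens + b.input_tokens,
        output_tokens=a.output_tokens + b.output_tokens,
        cache_read_tokens=a.cache_read_tokens + b.cache_read_tokens,
        cache_write_tokens=a.cache_write_tokens + b.cache_write_tokens,
    )
```

Summing keeps cache activity from earlier turns visible in the session total instead of letting the most recent record overwrite it.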

How to test

flutter pub get && flutter analyze && flutter test
Manual: start a chat, enable live voice, verify mic/playback, disconnect/reconnect, and that transcripts persist as expected.
Optional: exercise web_image_search and terminal/sandbox flows if your review scope includes those paths.

Notes

Project-wide analyzer infos/warnings may still appear; this branch should not introduce new analyzer errors.
Large diff (~10k additions) is expected given new service layer, UI, and tools.

- Serialize WAV queue, dedicated AudioPlayer with load config, larger preroll/segments
- Mic suppress during assistant PCM and local playback; failsafe unmute; playback-end detection (saw playing before idle/complete)
- Exclude agent CRUD tools from Live API setup to avoid session instability
- reloadHistory after stopSession from ChatScreen/LiveVoiceOverlay (fix Riverpod cycle)
- ChatNotifier: stream live transcripts to chat list; session messageStream dedupe during calls
- l10n: liveVoiceBargeInHint; regenerate localizations
- Plus existing branch changes (gateway, catalog, onboarding, settings, etc.)

Made-with: Cursor

- Use buildSystemPromptForAgent for all Live sessions; voiceBootstrap only triggers bootstrap sendText
- Serialize full getContextMessages with tail-priority character budget
- Pass userLanguage from chat when starting a call
- Add live_session_transcript helper and unit tests

Made-with: Cursor
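
One way to read "tail-priority character budget" is that the newest context messages are kept first and older ones are dropped once the budget runs out. The sketch below illustrates that reading in Python (the app code is Dart). The function name is hypothetical, and whether the real implementation truncates the oldest surviving message or drops it whole is not stated in the commit.

```python
from typing import List


def select_tail_within_budget(messages: List[str], budget: int) -> List[str]:
    """Keep the newest messages whose combined length fits in `budget` characters."""
    kept: List[str] = []
    remaining = budget
    # Walk from newest to oldest so recent turns win when the budget is tight.
    for text in reversed(messages):
        if len(text) > remaining:
            break  # this message and everything older is dropped
        kept.append(text)
        remaining -= len(text)
    kept.reverse()  # restore chronological order
    return kept
```

For example, with messages of 4, 3, and 2 characters and a budget of 5, only the last two messages survive, still in their original order.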

- app_providers: call sessionManager.getOrCreate() before starting LiveAgentLoop so addMessage() no longer silently drops all transcripts and tool calls (was the root cause of session loss and missing tool pills)
- live_voice_overlay: add _networkTurnComplete + _playlistLoadedIntoPlayer flags to prevent premature pipeline reset when the player exhausts its preroll segments before the next segment arrives (fixes first-turn cut)
- live_voice_overlay: add _resumeFromGap(), which seeks to the newly queued segment and calls play() without re-calling setAudioSource (which would restart from the beginning), eliminating mid-turn audio gaps
- live_voice_overlay: add onError handler to agentEvents.listen() so stream errors don't silently kill the overlay
- live_voice_overlay: disable automaticallyWaitsToMinimizeStalling on iOS, reduce Android bufferForPlaybackDuration to 250ms for near-instant start

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

In voice sessions the model was silently calling web_browse/request_user_action (CAPTCHA, login prompts) without first telling the user what to do. Added an explicit rule to voiceNote so the model always announces the required action aloud before opening the browser.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Eliminates WAV file I/O and ConcatenatingAudioSource segment boundaries in favour of a single continuous StreamAudioSource (_LivePcmStreamSource) that streams 24kHz PCM directly to just_audio.

Key changes:
- Remove dart:io / path_provider imports and all temp-file logic
- Drop _pcmBuffer, _livePlaylist, _kFlushBytes, _kStartThreshold, _flushSegment, _writeAndQueueWavInternal, _resumeFromGap, _buildWav
- Add _LivePcmStreamSource (StreamAudioSource) with streaming WAV header (0xFFFFFFFF sizes for unknown-length) and iOS range-request support
- _feedPcm buffers PCM in-memory and triggers _startStreamPlayback once _kPrerollBytes (1s) are available; _liveAudioGeneration prevents stale callbacks across back-to-back turns
- LiveTurnComplete seals the stream so the player reaches ProcessingState.completed naturally: no seek, no gap logic, no debounce races
- _armPlaybackEndListener simplified: no mid-turn gap/_networkTurnComplete needed; debounce 180ms on completion before re-enabling mic

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
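
To make the "streaming WAV header" idea concrete, here is a rough Python sketch (the app code is Dart) of a 44-byte RIFF/WAVE header whose two size fields are set to 0xFFFFFFFF because the total length is unknown while audio is still arriving. The 24 kHz rate comes from the commit message; mono 16-bit samples are an assumption, and the function names are hypothetical.

```python
import struct

SAMPLE_RATE = 24_000       # from the commit message
CHANNELS = 1               # assumption: mono
BITS_PER_SAMPLE = 16       # assumption: 16-bit linear PCM
UNKNOWN_SIZE = 0xFFFFFFFF  # placeholder size for an unknown-length stream


def streaming_wav_header() -> bytes:
    """Build a canonical 44-byte PCM WAV header with unknown-length size fields."""
    block_align = CHANNELS * BITS_PER_SAMPLE // 8
    byte_rate = SAMPLE_RATE * block_align
    fmt_chunk = struct.pack(
        "<IHHIIHH",
        16,               # fmt chunk payload size for plain PCM
        1,                # audio format 1 = linear PCM
        CHANNELS,
        SAMPLE_RATE,
        byte_rate,
        block_align,
        BITS_PER_SAMPLE,
    )
    return (
        b"RIFF" + struct.pack("<I", UNKNOWN_SIZE) + b"WAVE"
        + b"fmt " + fmt_chunk
        + b"data" + struct.pack("<I", UNKNOWN_SIZE)
    )


def preroll_bytes(seconds: float = 1.0) -> int:
    """Bytes of PCM to buffer before starting playback, for the format above."""
    return int(SAMPLE_RATE * CHANNELS * (BITS_PER_SAMPLE // 8) * seconds)
```

Under these assumptions a one-second preroll is 48,000 bytes of PCM, sent after the header and before playback starts.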

- Add AgentsDefaults.liveVoiceName (default 'Puck', persisted as live_voice_name in JSON) with full fromJson/toJson/copyWith support
- Pass liveVoiceName to LiveSessionConfig in startSession() so the chosen voice is used on every call
- Add voice dropdown to Settings → Providers & Models → Voice call section (30 Gemini Live voices with personality labels)
- Add SetLiveVoiceTool (set_live_voice) so the model can change the voice on the user's behalf mid-conversation; takes effect next call
- Register SetLiveVoiceTool in toolRegistryProvider
- Add liveVoiceNameLabel l10n key (en + all 18 generated locales)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
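
The persistence pattern described here (a default of 'Puck', stored under the `live_voice_name` JSON key, with fromJson/toJson/copyWith) can be sketched as follows. This is a Python analogue for illustration only; the real class is Dart and has other fields, and treating a null or missing key as "use the default" is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional

DEFAULT_LIVE_VOICE = "Puck"  # default named in the commit message


@dataclass(frozen=True)
class AgentsDefaults:
    """Illustrative stand-in carrying only the live-voice setting."""
    live_voice_name: str = DEFAULT_LIVE_VOICE

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> "AgentsDefaults":
        # Assumption: configs written before this change lack the key, so fall back.
        return cls(live_voice_name=data.get("live_voice_name") or DEFAULT_LIVE_VOICE)

    def to_json(self) -> Dict[str, Any]:
        return {"live_voice_name": self.live_voice_name}

    def copy_with(self, live_voice_name: Optional[str] = None) -> "AgentsDefaults":
        # Passing None keeps the current value, mirroring Dart's copyWith idiom.
        return replace(self, live_voice_name=live_voice_name or self.live_voice_name)
```

Keeping the fallback inside `from_json` means a config file saved by an older build still loads with a usable voice.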

…ep, CI checks

- Add ARB keys for live voice HUD (status, badge, tooltip, fallback title) across locales.
- Localize credentials, security settings, Bedrock auth segments, QR scan, OAuth, browser overlay, and app init error.
- Add scripts/check_l10n_keys.py for ARB parity and report_l10n_untranslated.py for review.
- Add GitHub workflow to run key parity on l10n changes.
- Regenerate app_localizations.

Made-with: Cursor
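
For readers unfamiliar with ARB key parity, the check amounts to comparing the translatable keys of each locale file against the template. The sketch below shows one possible shape of such a check; it is not the contents of scripts/check_l10n_keys.py, and the function names are hypothetical. It relies only on the ARB convention that keys starting with `@` are metadata.

```python
import json
from pathlib import Path
from typing import Dict, Set


def message_keys(arb: dict) -> Set[str]:
    """Return translatable keys, skipping '@key' metadata and '@@locale'-style entries."""
    return {key for key in arb if not key.startswith("@")}


def parity_report(template: dict, others: Dict[str, dict]) -> Dict[str, Dict[str, Set[str]]]:
    """Map each out-of-sync locale to the keys it is missing or has in excess."""
    base = message_keys(template)
    report: Dict[str, Dict[str, Set[str]]] = {}
    for locale, arb in others.items():
        keys = message_keys(arb)
        missing, extra = base - keys, keys - base
        if missing or extra:
            report[locale] = {"missing": missing, "extra": extra}
    return report


def load_arb(path: Path) -> dict:
    """Read one .arb file (plain JSON) from disk."""
    return json.loads(path.read_text(encoding="utf-8"))
```

A CI job could load every `.arb` file with `load_arb`, call `parity_report`, and fail when the report is non-empty.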

frapeti added 7 commits March 29, 2026 23:13

feat(models): supportsLive flag and Gemini Live catalog entries

4ef3857

feat(services): Gemini Live WebSocket, PCM capture, and playback

1a0a9a6

feat(agent): LiveAgentLoop and AgentLoop integration

7c2305d

feat(providers): wire Riverpod for live voice

841333e

feat(ui): chat, input bar, live voice overlay, and model settings

d458d66

frapeti force-pushed the feat/gemini-live-voice branch from 38edf23 to 672ed00 Compare March 30, 2026 02:13

frapeti and others added 22 commits March 30, 2026 01:30

Revert "feat(live): replace file-based audio pipeline with StreamAudi…

26d3b5d

…oSource" This reverts commit 48b3225.

chore(l10n): complete locale strings and audit tooling

da793a3

Add missing/updated translations across locales, regenerate generated localizations, and improve the l10n audit script to catch untranslated strings. Made-with: Cursor

feat(live): migrate voice playback queue to in-memory WAV source

539a95a

Replace temp-file based live audio segment handling with an in-memory StreamAudioSource pipeline, improving playback continuity and mic suppression recovery during live turns. Made-with: Cursor

feat(openai): request usage metadata in streamed chat completions

c79c503

Made-with: Cursor

feat(gemini-live): enable high-sensitivity server-side speech detection

151ca0d

Made-with: Cursor

fix(session-tools): accept injected __session_key for session_status

48a81ab

Made-with: Cursor

feat(web): add web_image_search and markdown img extraction in fetch

1eb3a71

Made-with: Cursor

feat(agent): instruct model to use web_image_search for photo requests

99ae788

Made-with: Cursor

fix(agent): sum cache read/write tokens when merging usage stats

f025af8

Made-with: Cursor

fix(live-agent): persist buffered transcripts on stop and disconnect

4056044

Made-with: Cursor

feat(app): register web_image_search and tighten session/tool UI wiring

9d43209

Register WebImageSearchTool; default session_status to active session key; invalidate live capability when credentials reload; prefer CLEAR chunk text for tool pills. Made-with: Cursor

feat(audio): add programmatic call connect/end sound effects

c461f96

Made-with: Cursor

feat(live-ui): glass chrome, SFX, haptics, and steadier PCM playback

5be4215

Add CallSfxService hooks, frosted header, accent bar, longer preroll segments, pending-flush handling, and refined speaking/tool states. Made-with: Cursor

feat(chat): subtle border around message stack during live voice

4ee344a

Made-with: Cursor

fix(settings): invalidate model capability providers after config edits

e20b0cc

Made-with: Cursor

fix(android-sandbox): drain process output before closing streams

3130992

Set PYTHONUNBUFFERED for sandbox shells; wait for reader futures before closing pipes, with a forced close path on timeout. Made-with: Cursor
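
The drain-before-close ordering can be illustrated with a small Python sketch (the actual fix lives in the Android sandbox code, which is not shown in this PR text). It sets `PYTHONUNBUFFERED`, reads the pipe to EOF on a separate thread, and waits for that reader with a timeout before closing anything. The function name and the 5-second timeout are assumptions, and on timeout this sketch simply stops waiting and returns partial output rather than reproducing the PR's forced-close path.

```python
import os
import subprocess
import threading
from typing import List, Tuple


def run_and_drain(cmd: List[str], drain_timeout: float = 5.0) -> Tuple[int, str]:
    """Run `cmd`, returning (exit code, combined output) without losing trailing output."""
    env = dict(os.environ, PYTHONUNBUFFERED="1")  # child Python flushes immediately
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=env
    )
    chunks: List[bytes] = []

    def reader() -> None:
        # Read until EOF so bytes written just before exit are still collected.
        for chunk in iter(lambda: proc.stdout.read(4096), b""):
            chunks.append(chunk)

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    code = proc.wait()
    thread.join(drain_timeout)  # wait for the reader before touching the pipe
    if not thread.is_alive():
        proc.stdout.close()  # safe: the reader already hit EOF
    # If the reader is still alive (e.g. a grandchild holds the pipe open),
    # give up waiting and return what has been collected so far.
    return code, b"".join(chunks).decode(errors="replace")
```

Closing the pipe only after the reader finishes is what prevents the final lines of output from being cut off.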

fix(sandbox-tools): trim streamed shell output only at final CLEAR pa…

bb84fcb

…yload Made-with: Cursor

fix(terminal): replace buffer from CLEAR JSON and show streaming wait…

e4567df

… state Made-with: Cursor

frapeti added 2 commits March 31, 2026 22:04

feat(web): image search via headless browser, Brave API fallback

1e1d7d5

Made-with: Cursor

frapeti changed the title ~~feat: Gemini Live voice (real-time)~~ feat: Gemini Live real-time voice, web image search, and follow-up fixes Apr 1, 2026

frapeti merged commit c7bc4ee into master Apr 1, 2026
3 checks passed

frapeti deleted the feat/gemini-live-voice branch April 1, 2026 01:57

frapeti merged 31 commits into master from feat/gemini-live-voice
