feat: Gemini Live real-time voice, web image search, and follow-up fixes#35
Merged
- Serialize WAV queue; dedicated AudioPlayer with load config; larger preroll/segments
- Mic suppression during assistant PCM and local playback; failsafe unmute; playback-end detection (requires having seen playing before idle/complete)
- Exclude agent CRUD tools from Live API setup to avoid session instability
- reloadHistory after stopSession from ChatScreen/LiveVoiceOverlay (fix Riverpod cycle)
- ChatNotifier: stream live transcripts to chat list; session messageStream dedupe during calls
- l10n: liveVoiceBargeInHint; regenerate localizations
- Plus existing branch changes (gateway, catalog, onboarding, settings, etc.)
Made-with: Cursor
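The playback-end rule above ("saw playing before idle/complete") can be sketched as a small state tracker. This is a Python sketch, not the app's Dart code: `PlaybackEndDetector` and the simplified `PlayerState` enum are hypothetical (just_audio itself reports a processing state plus a separate `playing` flag), and the failsafe timer is reduced to a method the caller would invoke.

```python
from enum import Enum


class PlayerState(Enum):
    # Hypothetical, simplified player states for illustration only.
    IDLE = "idle"
    LOADING = "loading"
    PLAYING = "playing"
    COMPLETED = "completed"


class PlaybackEndDetector:
    """Reports end of playback only after PLAYING was observed (hypothetical helper)."""

    def __init__(self) -> None:
        self.saw_playing = False
        self.mic_suppressed = False

    def on_assistant_audio(self) -> None:
        # Assistant PCM arrived: mute the mic so it does not pick up the speaker.
        self.mic_suppressed = True

    def on_state(self, state: PlayerState) -> bool:
        """Return True once per turn, when playback has really finished."""
        if state is PlayerState.PLAYING:
            self.saw_playing = True
            return False
        if state in (PlayerState.IDLE, PlayerState.COMPLETED) and self.saw_playing:
            self.saw_playing = False
            self.mic_suppressed = False  # playback done: re-enable the mic
            return True
        # IDLE/COMPLETED before any PLAYING (e.g. during source setup) is ignored.
        return False

    def failsafe_unmute(self) -> None:
        # Intended to be called from a timer if no end event ever arrives.
        self.saw_playing = False
        self.mic_suppressed = False
```

Requiring a PLAYING observation first keeps the idle state a player reports while a source is still loading from being mistaken for the end of the assistant's turn.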
- Use buildSystemPromptForAgent for all Live sessions; voiceBootstrap only triggers bootstrap sendText
- Serialize full getContextMessages with tail-priority character budget
- Pass userLanguage from chat when starting a call
- Add live_session_transcript helper and unit tests
Made-with: Cursor
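A "tail-priority character budget" most plausibly means keeping the newest messages and dropping the oldest once the budget is spent. The Python sketch below shows that reading; `Message`, `serialize_with_tail_budget`, and the `role: text` line format are hypothetical and are not the branch's getContextMessages serializer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def serialize_with_tail_budget(messages: list[Message], max_chars: int) -> str:
    """Keep the most recent messages that fit in max_chars; emit them oldest-first."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        line = f"{msg.role}: {msg.text}"
        cost = len(line) + (1 if kept else 0)  # +1 for the joining newline
        if used + cost > max_chars:
            break  # stop at the first message that does not fit: keep a contiguous tail
        kept.append(line)
        used += cost
    kept.reverse()
    return "\n".join(kept)
```

Stopping at the first oversized message, instead of skipping it and continuing, keeps the serialized context a contiguous suffix of the conversation, so the model never sees a turn without the turns that followed it.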
Branch updated: 38edf23 to 672ed00
- app_providers: call sessionManager.getOrCreate() before starting LiveAgentLoop so addMessage() no longer silently drops all transcripts and tool calls (this was the root cause of session loss and missing tool pills)
- live_voice_overlay: add _networkTurnComplete + _playlistLoadedIntoPlayer flags to prevent a premature pipeline reset when the player exhausts its preroll segments before the next segment arrives (fixes first-turn cut)
- live_voice_overlay: add _resumeFromGap(), which seeks to the newly queued segment and calls play() without re-calling setAudioSource (which would restart from the beginning), eliminating mid-turn audio gaps
- live_voice_overlay: add onError handler to agentEvents.listen() so stream errors don't silently kill the overlay
- live_voice_overlay: disable automaticallyWaitsToMinimizeStalling on iOS; reduce Android bufferForPlaybackDuration to 250ms for near-instant start
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
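The two flags in the commit above gate what happens when the player runs out of queued audio. The Python sketch below models that decision under stated assumptions: `TurnPlayback`, the segment counters, and the string outcomes are hypothetical stand-ins for the overlay's Dart state, and "resume" corresponds loosely to the `_resumeFromGap()` behaviour described.

```python
from dataclasses import dataclass


@dataclass
class TurnPlayback:
    """Hypothetical model of the flags named in the commit message."""

    network_turn_complete: bool = False        # server signalled turn complete
    playlist_loaded_into_player: bool = False  # audio source handed to the player
    queued_segments: int = 0
    played_segments: int = 0

    def on_segment_queued(self) -> None:
        self.queued_segments += 1

    def on_segment_finished(self) -> None:
        self.played_segments += 1

    def on_player_exhausted(self) -> str:
        """Decide what to do when the player reports it has no more audio."""
        if not self.playlist_loaded_into_player:
            return "ignore"  # nothing loaded yet, so this is not a real end
        if self.played_segments < self.queued_segments:
            return "resume"  # a newer segment exists: seek to it and call play()
        if not self.network_turn_complete:
            return "wait"    # preroll drained mid-turn; more audio is still coming
        return "reset"       # turn finished and every segment has played
```

The key point is that draining the queue is treated as a gap, not as the end of the turn, until the network side has also reported completion.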
In voice sessions the model was silently calling web_browse/ request_user_action (CAPTCHA, login prompts) without first telling the user what to do. Added an explicit rule to voiceNote so the model always announces the required action aloud before opening the browser. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eliminates WAV file I/O and ConcatenatingAudioSource segment boundaries in favour of a single continuous StreamAudioSource (_LivePcmStreamSource) that streams 24kHz PCM directly to just_audio. Key changes:
- Remove dart:io / path_provider imports and all temp-file logic
- Drop _pcmBuffer, _livePlaylist, _kFlushBytes, _kStartThreshold, _flushSegment, _writeAndQueueWavInternal, _resumeFromGap, _buildWav
- Add _LivePcmStreamSource (StreamAudioSource) with a streaming WAV header (0xFFFFFFFF sizes for unknown length) and iOS range-request support
- _feedPcm buffers PCM in memory and triggers _startStreamPlayback once _kPrerollBytes (1s) are available; _liveAudioGeneration prevents stale callbacks across back-to-back turns
- LiveTurnComplete seals the stream so the player reaches ProcessingState.completed naturally: no seek, no gap logic, no debounce races
- _armPlaybackEndListener simplified: no mid-turn gap/_networkTurnComplete handling needed; 180ms debounce on completion before re-enabling the mic
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
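A streaming WAV header of the kind described above is the canonical 44-byte PCM header with the RIFF and `data` sizes set to 0xFFFFFFFF, since the total length is unknown when playback starts. The Python sketch below builds one and computes the 1 s preroll size. It assumes 16-bit mono little-endian PCM at 24 kHz; the commit only states 24kHz. `streaming_wav_header` and `preroll_bytes` are hypothetical names, and range-request handling is not shown.

```python
import struct

UNKNOWN_SIZE = 0xFFFFFFFF  # placeholder size for an open-ended stream


def streaming_wav_header(sample_rate: int = 24_000, channels: int = 1,
                         bits_per_sample: int = 16) -> bytes:
    """Return a 44-byte PCM WAV header whose RIFF and data sizes are marked unknown."""
    block_align = channels * bits_per_sample // 8
    byte_rate = sample_rate * block_align
    riff = struct.pack("<4sI4s", b"RIFF", UNKNOWN_SIZE, b"WAVE")
    # fmt chunk: size 16, format 1 (integer PCM), then the stream parameters.
    fmt = struct.pack("<4sIHHIIHH", b"fmt ", 16, 1, channels, sample_rate,
                      byte_rate, block_align, bits_per_sample)
    data = struct.pack("<4sI", b"data", UNKNOWN_SIZE)
    return riff + fmt + data


def preroll_bytes(seconds: float, sample_rate: int = 24_000, channels: int = 1,
                  bits_per_sample: int = 16) -> int:
    """Bytes of PCM to buffer before starting playback."""
    return int(seconds * sample_rate * channels * bits_per_sample // 8)
```

Under these assumptions, one second of preroll is 24,000 samples × 2 bytes = 48,000 bytes.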
…oSource" This reverts commit 48b3225.
- Add AgentsDefaults.liveVoiceName (default 'Puck', persisted as live_voice_name in JSON) with full fromJson/toJson/copyWith support
- Pass liveVoiceName to LiveSessionConfig in startSession() so the chosen voice is used on every call
- Add voice dropdown to Settings → Providers & Models → Voice call section (30 Gemini Live voices with personality labels)
- Add SetLiveVoiceTool (set_live_voice) so the model can change the voice on the user's behalf mid-conversation; takes effect next call
- Register SetLiveVoiceTool in toolRegistryProvider
- Add liveVoiceNameLabel l10n key (en + all 18 generated locales)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
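The persisted-setting pattern in this commit (a default of 'Puck', a `live_voice_name` JSON key, and fromJson/toJson/copyWith) can be sketched in Python as a frozen dataclass. This only mirrors the described behaviour. The method names are Python-style stand-ins, the fallback for malformed values is an assumption, and the real class is Dart.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional

DEFAULT_LIVE_VOICE = "Puck"  # default named in the commit message


@dataclass(frozen=True)
class AgentsDefaults:
    live_voice_name: str = DEFAULT_LIVE_VOICE

    @classmethod
    def from_json(cls, data: dict[str, Any]) -> "AgentsDefaults":
        value = data.get("live_voice_name")
        # Assumption: missing or malformed values fall back to the default voice.
        if not isinstance(value, str) or not value:
            value = DEFAULT_LIVE_VOICE
        return cls(live_voice_name=value)

    def to_json(self) -> dict[str, Any]:
        return {"live_voice_name": self.live_voice_name}

    def copy_with(self, live_voice_name: Optional[str] = None) -> "AgentsDefaults":
        # Like Dart's copyWith: None means "keep the current value".
        if live_voice_name is None:
            return self
        return replace(self, live_voice_name=live_voice_name)
```

A tool such as set_live_voice would only need to persist `copy_with(live_voice_name=...)`. Because startSession() reads the value when a call begins, the new voice applies on the next call, which matches the commit note.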
Add missing/updated translations across locales, regenerate generated localizations, and improve the l10n audit script to catch untranslated strings. Made-with: Cursor
Replace temp-file based live audio segment handling with an in-memory StreamAudioSource pipeline, improving playback continuity and mic suppression recovery during live turns. Made-with: Cursor
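The in-memory pipeline described in this commit and in the earlier `_feedPcm` / `_liveAudioGeneration` notes comes down to three ideas: buffer PCM until a preroll threshold, start playback once, and tag every turn with a generation number so callbacks from an earlier turn can be discarded. The Python sketch below shows those ideas. `LivePcmFeeder` and its methods are hypothetical, and starting playback on seal for replies shorter than the preroll is an assumption.

```python
from typing import Callable


class LivePcmFeeder:
    """Hypothetical in-memory PCM feeder with a per-turn generation guard."""

    def __init__(self, preroll_bytes: int, start_playback: Callable[[int], None]) -> None:
        self.preroll_bytes = preroll_bytes
        self.start_playback = start_playback  # receives the generation it belongs to
        self.generation = 0
        self.buffer = bytearray()
        self.started = False
        self.sealed = False

    def begin_turn(self) -> int:
        # Bump the generation so late callbacks from the previous turn are ignored.
        self.generation += 1
        self.buffer.clear()
        self.started = False
        self.sealed = False
        return self.generation

    def feed_pcm(self, chunk: bytes) -> None:
        if self.sealed:
            return  # the turn is over; drop stray chunks
        self.buffer.extend(chunk)
        if not self.started and len(self.buffer) >= self.preroll_bytes:
            self._start()

    def seal(self) -> None:
        # Turn complete: no more bytes, so a reader reaches end of stream naturally.
        self.sealed = True
        if not self.started and self.buffer:
            self._start()  # assumption: short replies still play

    def is_current(self, generation: int) -> bool:
        return generation == self.generation

    def _start(self) -> None:
        self.started = True
        self.start_playback(self.generation)
```

An async callback that captured an old generation number can call `is_current()` before touching the player, which is how back-to-back turns avoid interfering with each other.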
Register WebImageSearchTool; default session_status to active session key; invalidate live capability when credentials reload; prefer CLEAR chunk text for tool pills. Made-with: Cursor
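The PR summary also lists markdown image extraction in fetch alongside the web image search tool registered here. That code is not shown in this PR page, so the Python sketch below is only an illustration. `extract_markdown_images` and its regex are hypothetical. The regex handles `![alt](url)` and `![alt](url "title")`, but not angle-bracketed URLs, URLs containing spaces, or reference-style images.

```python
import re
from typing import NamedTuple

# ![alt](url) or ![alt](url "title"); reference-style and <url> forms are not handled.
_MD_IMAGE = re.compile(r'!\[([^\]]*)\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)')


class MarkdownImage(NamedTuple):
    alt: str
    url: str


def extract_markdown_images(markdown: str) -> list[MarkdownImage]:
    """Return images in document order, dropping repeated URLs."""
    seen: set[str] = set()
    images: list[MarkdownImage] = []
    for match in _MD_IMAGE.finditer(markdown):
        alt, url = match.group(1), match.group(2)
        if url not in seen:
            seen.add(url)
            images.append(MarkdownImage(alt, url))
    return images
```

Plain links such as `[text](url)` are skipped because the pattern requires the leading `!`.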
Add CallSfxService hooks, frosted header, accent bar, longer preroll segments, pending-flush handling, and refined speaking/tool states. Made-with: Cursor
Set PYTHONUNBUFFERED for sandbox shells; wait for reader futures before closing pipes, with a forced close path on timeout. Made-with: Cursor
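The sandbox fix above combines three steps: set PYTHONUNBUFFERED so a Python child flushes output immediately, drain stdout and stderr on reader threads and wait for them before closing the pipes, and take a forced path on timeout. The Python sketch below shows one way to do this with the standard library. `run_shell` is hypothetical, and here the forced path kills the child, which closes its end of the pipes so the readers see EOF.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor, wait


def run_shell(cmd: list[str], timeout: float = 5.0) -> tuple[int, str, str, bool]:
    """Run cmd, drain both pipes on reader threads, force-stop on timeout.

    Returns (returncode, stdout, stderr, forced).
    """
    env = dict(os.environ, PYTHONUNBUFFERED="1")  # child Python flushes immediately
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            env=env, text=True)
    pool = ThreadPoolExecutor(max_workers=2)
    readers = [pool.submit(proc.stdout.read), pool.submit(proc.stderr.read)]

    # Normal path: wait for both readers to reach EOF before touching the pipes.
    _, pending = wait(readers, timeout=timeout)
    forced = bool(pending)
    if forced:
        # Forced path: kill the child so its pipe ends close, then give the
        # readers a short grace period to observe EOF.
        proc.kill()
        wait(readers, timeout=1.0)

    returncode = proc.wait()
    out, err = (f.result() if f.done() else "" for f in readers)
    # Close only pipes whose reader finished; closing under an active read is unsafe.
    for pipe, fut in zip((proc.stdout, proc.stderr), readers):
        if fut.done():
            pipe.close()
    pool.shutdown(wait=False)
    return returncode, out, err, forced
```

Reading both pipes concurrently avoids the classic deadlock where a child blocks writing to a full stderr pipe while the parent waits on stdout.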
…yload Made-with: Cursor
… state Made-with: Cursor
…ep, CI checks
- Add ARB keys for live voice HUD (status, badge, tooltip, fallback title) across locales.
- Localize credentials, security settings, Bedrock auth segments, QR scan, OAuth, browser overlay, and app init error.
- Add scripts/check_l10n_keys.py for ARB parity and report_l10n_untranslated.py for review.
- Add GitHub workflow to run key parity on l10n changes.
- Regenerate app_localizations.
Made-with: Cursor
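An ARB parity check and an "untranslated" report of the kind this commit adds can be sketched in a few functions. This is an illustrative Python sketch, not the contents of scripts/check_l10n_keys.py or report_l10n_untranslated.py. The function names and the `app_en.arb` template filename are assumptions. It relies on the ARB convention that keys starting with `@` (such as `@@locale` or `@key`) are metadata, not messages.

```python
import json
from pathlib import Path


def message_keys(arb: dict) -> set[str]:
    """Translatable keys: everything except '@'-prefixed metadata."""
    return {k for k in arb if not k.startswith("@")}


def parity_report(template: dict, locales: dict[str, dict]) -> dict[str, dict[str, list[str]]]:
    """For each locale file, list keys missing from or extra to the template."""
    expected = message_keys(template)
    report: dict[str, dict[str, list[str]]] = {}
    for name, arb in sorted(locales.items()):
        keys = message_keys(arb)
        missing, extra = sorted(expected - keys), sorted(keys - expected)
        if missing or extra:
            report[name] = {"missing": missing, "extra": extra}
    return report


def untranslated(template: dict, arb: dict) -> list[str]:
    """Keys whose value equals the template's: review candidates, not errors."""
    shared = message_keys(template) & message_keys(arb)
    return sorted(k for k in shared if arb[k] == template[k])


def load_arbs(directory: Path, template_name: str = "app_en.arb") -> tuple[dict, dict[str, dict]]:
    template = json.loads((directory / template_name).read_text(encoding="utf-8"))
    others = {
        p.name: json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(directory.glob("*.arb"))
        if p.name != template_name
    }
    return template, others
```

A CI step could fail when `parity_report` returns anything non-empty. Identical values are only reported, because brand names and similar strings are often legitimately left unchanged.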
Summary
This branch adds Gemini Live real-time voice: WebSocket session, PCM capture and playback, LiveAgentLoop, Riverpod wiring, and chat UI (live voice button and overlay). It also includes follow-up fixes and related features merged on this branch (web image search, terminal/sandbox reliability, session and usage fixes, localization, and CI).
Base: master
Merge strategy: ordinary merge or squash per repo preference.
What changed
Gemini Live (core)
- `supportsLive` flag and Gemini Live entries in the model catalog.
- … `set_live_voice` tool, config), connect/end SFX, high-sensitivity server-side speech detection where applicable.
- `LiveAgentLoop` and hooks in `AgentLoop`; system prompt and transcript alignment with chat; buffered transcript persistence on stop/disconnect; instruction to speak before `request_user_action`.
Related features and fixes
- `web_image_search` tool (headless browser + Brave API fallback), markdown image extraction in fetch; agent guidance to use `web_image_search` for photo requests.
- `session_status` accepts injected `__session_key`; invalidate model capability providers after config edits.
How to test
- `flutter pub get && flutter analyze && flutter test`
- … `web_image_search` and terminal/sandbox flows if your review scope includes those paths.
Notes