fix(voice): suppress "Welcome back" on quick reconnects (<60s) by liususan091219 · Pull Request #515 · sonichi/sutando

liususan091219 · 2026-04-24T21:37:22Z

Summary

When a Gemini Live session drops and reconnects within 60 seconds (a network blip, not a real absence), silently resume instead of re-greeting the user. This eliminates the jarring "Welcome back" every few sentences that interrupts long voice sessions.

What changed

src/task-bridge.ts — new getSecondsSinceLastTurn() helper reads the timestamp of the most recent conversation.log line (any turn or SESSION_END marker)
src/voice-agent.ts — in MainAgent.greeting, when getRecentConversation() returns non-empty (we're in a reconnect), check the gap:
- < 60s → tell Gemini to stay silent, no "Welcome back"
- ≥ 60s or null → keep existing "Welcome back briefly" behavior

The existing REPLAYED-HISTORY injection is preserved in both branches — only the post-replay user-facing utterance differs. Clean exits (SESSION_END logged) still fall through to the normal fresh-greeting path.
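
A minimal sketch of the branch described above, not the actual code in src/voice-agent.ts. The names `QUICK_RECONNECT_SECONDS`, `ReconnectHint`, and `reconnectHint` are hypothetical; the only facts taken from this PR are the 60-second threshold and the rule that a `null` gap keeps the existing greeting.

```typescript
// Hypothetical names for illustration; only the 60s threshold and the
// "null keeps the greeting" rule come from the PR description.
const QUICK_RECONNECT_SECONDS = 60;

type ReconnectHint = "silent" | "welcome-back";

// gapSeconds stands in for the value a helper such as getSecondsSinceLastTurn()
// would return: seconds since the last logged line, or null when unknown.
function reconnectHint(gapSeconds: number | null): ReconnectHint {
  if (gapSeconds !== null && gapSeconds < QUICK_RECONNECT_SECONDS) {
    return "silent"; // network blip: resume without greeting
  }
  return "welcome-back"; // >= 60s or unknown gap: keep the brief greeting
}
```

Treating `null` as "not a quick reconnect" means a missing or unreadable timestamp falls back to the pre-existing behavior rather than silently dropping the greeting.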

Why this scope

bodhi's reconnect state machine has already been fixed several times; this PR doesn't touch bodhi. It's a thin application-layer UX improvement: the app already knows when the last turn happened (conversation.log), we just weren't reading that timestamp to distinguish "network blip reconnect" from "user came back later."

Test plan

TypeScript compiles (npx tsc --noEmit clean)
Voice agent restarted locally with this branch; Susan confirmed "looks like the welcome back is gone" during a live test
Manual verification of slow-reconnect path: disconnect >60s → "Welcome back" still fires

🤖 Generated with Claude Code

When Gemini Live session drops and reconnects within 60s (network blip, not a real away), silently resume instead of re-greeting the user. Eliminates the jarring "Welcome back" every few sentences that's been annoying during long voice sessions.

Changes:
- task-bridge.ts: add getSecondsSinceLastTurn() helper that reads the timestamp of the most recent conversation.log line (any turn or SESSION_END marker)
- voice-agent.ts: in MainAgent.greeting, when getRecentConversation() returns non-empty (= we're in a reconnect), check the gap:
  * < 60s: tell Gemini to stay silent, no "Welcome back"
  * >= 60s or null: keep existing "Welcome back" briefly behavior

Existing REPLAYED-HISTORY injection is preserved in both branches; only the post-replay user-facing utterance differs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sonichi

Joint MacBook + Mini cold review — 1 issue confirmed.

Finding (Mini): getSecondsSinceLastTurn() reads any role, not just user/assistant turns.

Verified against src/task-bridge.ts:393 — logConversation('core-agent', '[task:...] ...') writes a line every time the result watcher delivers a task result. So the conversation log contains:

user / assistant voice turns
core-agent task-result lines (from the proactive loop or any background task)
SESSION_END markers

getSecondsSinceLastTurn() reads the last line regardless of role, so a recent task-result write makes a long-away user look like a quick reconnect — and the "Welcome back" gets suppressed when it shouldn't.

Concrete failure mode: user is away 5 minutes, proactive loop ships a task result 30s before they reconnect → getSecondsSinceLastTurn() returns 30 → isQuickReconnect=true → silent reconnect even though the user expected a greeting.
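
The failure mode above can be reproduced with a small sketch of a last-line reader. This is an illustration, not the code in src/task-bridge.ts: the function name `secondsSinceLastLine` is hypothetical, the line layout `<ISO timestamp>|<role>|<text>` is an assumption (the PR only shows that task results are written as `core-agent|[task:...]` lines), and the sample log is made up.

```typescript
// Assumed line layout (not confirmed by the PR): "<ISO timestamp>|<role>|<text>".
// Reads ONLY the last non-empty line, regardless of role.
function secondsSinceLastLine(log: string, nowMs: number): number | null {
  const lines = log.split("\n").filter((l) => l.trim() !== "");
  if (lines.length === 0) return null;
  const last = lines[lines.length - 1] ?? "";
  const ts = Date.parse(last.split("|")[0] ?? "");
  return Number.isNaN(ts) ? null : Math.round((nowMs - ts) / 1000);
}

// Made-up sample: the last voice turn is 5 minutes old, but a task result
// was logged 30 seconds before the reconnect.
const sampleLog = [
  "2026-04-24T21:00:00Z|user|what's on my calendar",
  "2026-04-24T21:00:05Z|assistant|You have two meetings today.",
  "2026-04-24T21:04:35Z|core-agent|[task:...] ...",
].join("\n");

const reconnectAt = Date.parse("2026-04-24T21:05:05Z");
const gap = secondsSinceLastLine(sampleLog, reconnectAt); // measured from the core-agent line
const isQuickReconnect = gap !== null && gap < 60; // true, so the greeting is suppressed
```
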

Suggested fix: walk the log from the end backwards, skipping core-agent and SESSION_END lines, return the timestamp of the most recent user or assistant line. Optionally bound the search to lines after the most recent SESSION_END (mirrors how getRecentConversation() already trims).

This is a UX nit (greeting suppressed when it shouldn't be), not a correctness bug — but worth fixing before merge since it's a 5-line change. Comment review, not request-changes.

…ines

Addresses sonichi's review on PR #515.

The previous impl read the LAST line of conversation.log regardless of role. Since `task-bridge.ts:393` writes a `core-agent|[task:...]` line on every task-result delivery (proactive loop, background tasks), a recent task result made a long-away user look like a quick reconnect and the "Welcome back" was wrongly suppressed.

Fix: walk the log backwards, skip `core-agent` and any non-dialogue lines, stop at SESSION_END. Mirrors how getRecentConversation already trims at the boundary.

Adds tests/seconds-since-last-turn.test.ts (7 cases including the exact failure mode described in the review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

liususan091219 · 2026-04-25T14:06:49Z

getSecondsSinceLastTurn() reads the last line regardless of role, so a recent task-result write makes a long-away user look like a quick reconnect — and the "Welcome back" gets suppressed when it shouldn't.

Suggested fix: walk the log from the end backwards, skipping core-agent and SESSION_END lines, return the timestamp of the most recent user or assistant line. Optionally bound the search to lines after the most recent SESSION_END (mirrors how getRecentConversation() already trims).

Addressed in cd6473b. getSecondsSinceLastTurn() now walks conversation.log backwards, skipping core-agent task-result lines, and stops at SESSION_END so it doesn't reach back into a cleanly-ended prior session. The 5-min-away + recent core-agent line failure mode is handled. Added tests/seconds-since-last-turn.test.ts (7 cases) covering that exact scenario plus boundary conditions.

— Maddy (MacBook bot)
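
A hedged sketch of the backward walk described in this reply, under stated assumptions rather than the code from cd6473b. The function name `secondsSinceLastDialogueTurn` is hypothetical, the line layout `<ISO timestamp>|<role>|<text>` is assumed, and so is the idea that a clean exit appears as a line whose role field is `SESSION_END`.

```typescript
// Assumptions (not confirmed by the PR): each line is "<ISO timestamp>|<role>|<text>",
// and a clean exit is logged as a line whose role field is SESSION_END.
const DIALOGUE_ROLES = new Set(["user", "assistant"]);

function secondsSinceLastDialogueTurn(log: string, nowMs: number): number | null {
  const lines = log.split("\n");
  for (let i = lines.length - 1; i >= 0; i--) {
    const parts = (lines[i] ?? "").split("|");
    const stamp = parts[0] ?? "";
    const role = parts[1];
    // Stop at the boundary: never reach back into a cleanly ended session.
    if (role === "SESSION_END") return null;
    // Skip core-agent task results, blank lines, and anything else non-dialogue.
    if (role === undefined || !DIALOGUE_ROLES.has(role)) continue;
    const ts = Date.parse(stamp);
    if (Number.isNaN(ts)) continue; // unparseable timestamp: keep walking
    return Math.round((nowMs - ts) / 1000);
  }
  return null; // no dialogue turn found
}
```

With this shape, a recent `core-agent` line no longer shortens the measured gap, and a `null` result leaves the caller on the existing "Welcome back" path.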

Pulls in the bodhi #14 silent-context-injection fix on Gemini reconnect. Combined with PR #515 (merged into main, now in this branch via merge), this covers both reconnect paths:
- Gemini-internal reconnect: silent injection drops "Say I'm back" prompt
- Client reconnect <60s: getSecondsSinceLastTurn suppresses "Welcome back"

Voice-agent verified: skill-loader still fires 3 tools on boot (PID 23502).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(mode-indicator): full 3-mode UI recovered from Apr 21 session jsonl

Restores the 3-mode (active / meeting / presenter) UI that was lost when `git reset --hard HEAD~2` on dev/apr-21-local-ships at 2026-04-23 06:22 PT discarded uncommitted working-tree edits alongside two sync-memory commits. Recovered by replaying 15 Edit calls from session 6eb8f10e jsonl (2026-04-21 23:18-23:54 PT) on top of dev/apr-21-local-ships's presenter-dot indicator base. Replay applied 15/15 cleanly.

Sutando.app (`src/Sutando/main.swift`):
- New properties: `voiceMode`, `presenterModeActive`, three menu-item weak refs (`modeActiveMenuItem`, `modeMeetingMenuItem`, `modePresenterMenuItem`).
- Three-mode radio in menu bar dropdown: "Mode: Active" / "Mode: Meeting" / "Mode: Presenter". Exactly one has ● at a time. Clicking any item switches: Active/Meeting write `state/voice-mode.request` (voice-agent picks up on 1s poll); Presenter POSTs to `:7877/presenter/on`.
- Avatar badge: `avatarImage(presenterActive:meetingActive:)` paints a small purple dot when presenter is active, or amber dot for meeting; matches the web UI mode-pill colors.
- New polls: `pollPresenterMode()` + `pollVoiceMode()` every 1s on the same timer. Both silent-fail if their backend is down.
- `updateModeMenuItem()` recomposes the radio whenever either signal flips. Priority: presenter > meeting > active.

Voice-agent (`src/voice-agent.ts`):
- `switchModeTool.execute()` writes `state/voice-mode.txt` on switch_mode("active"|"meeting") so cross-process readers (web-client, Sutando.app) can resolve the unified mode.
- `applyModeRequest()` polls `state/voice-mode.request` every 1s, applies the mode flip, deletes the request file. Lets Sutando.app send mode requests without an HTTP server in voice-agent.

Web client (`src/web-client.ts`):
- `mode` field added to /sse-status response, built from sync read of `state/voice-mode.txt` + cached presenter-active boolean (refreshed every 1s in background from :7877/presenter).
- Mode pill in top-bar UI: 3 CSS variants (mode-active dim / mode-meeting amber / mode-presenter purple-glow). Polls /sse-status every 1.5s.

Recovery details: notes/3-mode-recovered-edits-2026-04-21.md. Earlier #518 (simpler text-only version) is closed in favor of this fuller recovery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(voice-agent): sync voice-mode sentinel on switch_mode + after Zoom detect

Two related bugs in the recovered 3-mode work surfaced when Chi tested the menu radio: clicking "Mode: Meeting" silently no-op'd because the in-memory `meetingActive` flag and the on-disk `state/voice-mode.txt` sentinel had diverged.

(1) `switchModeTool.execute` flipped `meetingActive` but never called `writeVoiceModeSentinel()`. So a voice-triggered switch_mode kept the sentinel stale, and a subsequent menu click that wrote `voice-mode.request="meeting"` hit the `meetingActive === want` early-return in `applyModeRequest`: request consumed without writing the sentinel. Sutando.app's pollVoiceMode kept reading "active" → menu radio stuck.

(2) The startup `writeVoiceModeSentinel()` call ran BEFORE the Zoom auto-detect, so even when Zoom was detected as in-meeting and `meetingActive` was set to true, the sentinel had already been written as "active" with no second write.

Fix: call `writeVoiceModeSentinel()` inside `switchModeTool.execute`, and move the startup call to after the Zoom auto-detect block.

Verified live: kickstart → sentinel "meeting" matches in-memory state (Zoom-running case); write voice-mode.request="meeting" → sentinel flips, log line "External request applied: mode=meeting" fires.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(voice-agent): elevate presenter_mode to direct tool trigger

Mirrors the switch_mode pattern. Without this, "presenter mode on" / "the talk starts" routes to the work tool, producing "working on it" instead of silently flipping the mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(inline-tools): restore skill manifest loader stripped by PR #505

PR #505 (dup-name guard) inadvertently removed loadSkillManifestTools() along with the personalTools spread into inlineTools. Voice-agent had no way to register highlight_slide / presenter_mode tools from skills/personal-iclr-highlight/, so the autonav cue silently no-op'd during the ICLR talk rehearsal.

Restored verbatim from 9b545c2 (the original local-ships commit). Smoke-tested: voice-agent boots with "[skill-loader] loaded 3 tool(s) from iclr-highlight": highlightSlideTool, presenterModeTool, fullscreenTool (renamed to fullscreen_presenter to dodge the dup-name guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(voice-agent): reject filler words; disambig screen_record from fullscreen

Two pre-talk hardenings observed during ICLR rehearsal:

1. recording-tools.ts: screen_record description now explicitly rejects "fullscreen" / "full screen" / "play fullscreen" cues. Chi asked to play the cross-owner video fullscreen and Sutando self-fired screen_record (06:34:46) right after fullscreen_presenter + play_video; STT or model association from the word "screen" was matching screen_record.
2. voice-agent.ts: restored FILLERS ARE NOT REQUESTS rule (originally on the unmerged 9b545c2 local-ships branch). Short utterances "hmm" / "um" / "ok" / "[BLANK_AUDIO]" are not instructions; Sutando should stay silent or ack, not call work and say "queued up".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(deps): bump bodhi-realtime-agent to ee58489 (PR #14 merged)

Pulls in the bodhi #14 silent-context-injection fix on Gemini reconnect. Combined with PR #515 (merged into main, now in this branch via merge), this covers both reconnect paths:
- Gemini-internal reconnect: silent injection drops "Say I'm back" prompt
- Client reconnect <60s: getSecondsSinceLastTurn suppresses "Welcome back"

Voice-agent verified: skill-loader still fires 3 tools on boot (PID 23502).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert "chore(deps): bump bodhi-realtime-agent to ee58489 (PR #14 merged)"

This reverts commit f9a4be4.

* feat(voice-agent): inject presenter-state into greeting + reconnect prompts

Closes the gap that surfaced during talk rehearsal: voice-agent restart loses co-presenter mode anchor. Even though :7877/presenter still says active=true, the model gets the default Sutando system prompt and defaults to "Echo Act IV" generic greeting, routing slide-topic phrases to work instead of highlight_slide+narrate.

Adds getPresenterStateMarker() that synchronously curls the highlight server and returns "[System: PRESENTER MODE IS CURRENTLY ACTIVE — apply the CO-PRESENTER protocol...]" when active. Appended to both greeting paths:
- Fresh connect (line 484): generic greeting + presenter marker
- Reconnect with history (line 453): reconnect prompt + presenter marker

Failure-silent: if curl fails the marker is empty string, so non-talk sessions and missing iclr-highlight skill are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(voice-agent): suppress "Welcome back" when presenter mode is active

Reconnect path was still emitting "Welcome back" when presenter is on, which breaks the co-presenter flow mid-talk. Reuses the existing quick-reconnect silent-reconnect hint when presenterActive is true, keyed off getPresenterStateMarker() returning non-empty.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(voice-agent): make instructions a per-session factory + inject presenter marker

Previous a284190 + 5eb09af put the presenter-state marker in the GREETING string. Gemini Live treats the greeting as a user-style turn: the model called get_core_status to verify the "claim" instead of trusting it, and answered "presenter mode is not currently active" despite :7877/presenter being active.

System instructions are authoritative. This commit:

1. Converts mainAgent.instructions from a static joined string to a factory function `() => [...].join('\n')`. Each session.start() now re-evaluates the array, picking up live state.
2. Adds getPresenterStateMarker() as the FIRST slot in the array, so the system prompt opens with "[System: PRESENTER MODE IS CURRENTLY ACTIVE...]" when the iclr-highlight server says active=true.

Failure-silent: if the curl fails, the marker is empty string and the slot drops out of the join cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(deps): re-bump bodhi-realtime-agent to ee58489 (PR #14)

Pairs with the system_instruction presenter marker (374cebf): with the marker now authoritative + sent on every session.start, the new bodhi's silent context injection should be safe; the per-cue gating in voice-context.txt is enforced from the system_instruction, not relying on the flat history replay to anchor model behavior.

Trade-off in case of regression: revert this commit, voice-agent runs on old bodhi 4d1592eb (welcome-back fires, but no concatenation risk).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(open-file): QuickTime fullscreen survives Zoom screen-share

The 'present' Apple event succeeds but the visible window doesn't come forward when Zoom is screen-sharing; Zoom's floating control bar holds the foreground z-order. Switch to Ctrl+Cmd+F routed through 'tell process "QuickTime Player"' so System Events targets QT directly instead of going through global keystroke focus that Zoom is grabbing. Same pattern as the Chrome fullscreen fix in skills/personal-iclr-highlight/tools.ts.

Validated locally; TS clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
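
The per-session instructions factory described in the commit list above can be sketched as follows. This is an illustration under stated assumptions, not the code in src/voice-agent.ts: `presenterMarker` and `makeInstructions` are hypothetical names, the marker text is a shortened placeholder, and the presenter state is passed in as a callback instead of being fetched from the highlight server. Filtering out empty entries before the join is this sketch's own choice for making the empty marker slot disappear; the PR does not show how the real code handles it.

```typescript
// Hypothetical stand-in for getPresenterStateMarker(): the state is passed in
// so the sketch stays pure. The marker text is a shortened placeholder.
function presenterMarker(active: boolean): string {
  return active ? "[System: PRESENTER MODE IS CURRENTLY ACTIVE]" : "";
}

// Returns a factory. Each call re-evaluates the array, so every new session
// sees the presenter state as it is at that moment, not as it was at startup.
function makeInstructions(
  isPresenterActive: () => boolean,
  base: string[],
): () => string {
  return () =>
    [presenterMarker(isPresenterActive()), ...base]
      .filter((part) => part !== "") // drop the empty marker slot (sketch's choice)
      .join("\n");
}
```

A static string would be computed once at startup; the factory form is what lets the marker appear or disappear between sessions without restarting the process.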

liususan091219 and others added 2 commits April 24, 2026 17:36

Merge branch 'main' into fix/voice-welcome-back-context-preservation

d9957f8

sonichi reviewed Apr 25, 2026

sonichi merged commit 9b6d0d8 into main Apr 25, 2026
1 check passed

sonichi deleted the fix/voice-welcome-back-context-preservation branch April 25, 2026 14:17

sonichi mentioned this pull request Apr 25, 2026

feat(mode-indicator): full 3-mode UI recovered from Apr 21 jsonl #519

Closed

5 tasks

sonichi merged 3 commits into main from fix/voice-welcome-back-context-preservation
