fix(open-file): QuickTime fullscreen survives Zoom screen-share #520
Merged
Restores the 3-mode (active / meeting / presenter) UI that was lost
when `git reset --hard HEAD~2` on dev/apr-21-local-ships at 2026-04-23
06:22 PT discarded uncommitted working-tree edits alongside two
sync-memory commits.
Recovered by replaying 15 Edit calls from session 6eb8f10e jsonl
(2026-04-21 23:18-23:54 PT) on top of dev/apr-21-local-ships's
presenter-dot indicator base. Replay applied 15/15 cleanly.
Sutando.app (`src/Sutando/main.swift`):
- New properties: `voiceMode`, `presenterModeActive`, three menu-item
weak refs (`modeActiveMenuItem`, `modeMeetingMenuItem`,
`modePresenterMenuItem`).
- Three-mode radio in menu bar dropdown: "Mode: Active" /
"Mode: Meeting" / "Mode: Presenter". Exactly one has ● at a time.
Clicking any item switches: Active/Meeting write
`state/voice-mode.request` (voice-agent picks up on 1s poll);
Presenter POSTs to `:7877/presenter/on`.
- Avatar badge: `avatarImage(presenterActive:meetingActive:)` paints
a small purple dot when presenter is active, or amber dot for
meeting — matches the web UI mode-pill colors.
- New polls: `pollPresenterMode()` + `pollVoiceMode()` every 1s on
the same timer. Both silent-fail if their backend is down.
- `updateModeMenuItem()` recomposes the radio whenever either signal
flips. Priority: presenter > meeting > active.
Voice-agent (`src/voice-agent.ts`):
- `switchModeTool.execute()` writes `state/voice-mode.txt` on
switch_mode("active"|"meeting") so cross-process readers (web-client,
Sutando.app) can resolve the unified mode.
- `applyModeRequest()` polls `state/voice-mode.request` every 1s,
applies the mode flip, deletes the request file. Lets Sutando.app
send mode requests without an HTTP server in voice-agent.
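The file-based request channel above can be sketched as one poll tick. This assumes the request file holds a bare mode string; `consumeModeRequest` and its `apply` callback are hypothetical names, not the actual `applyModeRequest()` code.

```typescript
import * as fs from "node:fs";

export type VoiceMode = "active" | "meeting";

// Type guard so only known modes reach the apply callback.
function isVoiceMode(s: string): s is VoiceMode {
  return s === "active" || s === "meeting";
}

// Returns the trimmed file contents, or null if the file does not exist / cannot be read.
function readTrimmed(p: string): string | null {
  try {
    return fs.readFileSync(p, "utf8").trim();
  } catch {
    return null;
  }
}

// One poll tick (hypothetical): read the request, delete it, hand a valid mode to `apply`.
// Returns the applied mode, or null when there was nothing valid to apply.
export function consumeModeRequest(
  requestPath: string,
  apply: (mode: VoiceMode) => void,
): VoiceMode | null {
  const raw = readTrimmed(requestPath);
  if (raw === null) return null; // no pending request
  // Consume before acting so a malformed payload is not retried on every tick.
  fs.rmSync(requestPath, { force: true });
  if (!isVoiceMode(raw)) return null;
  apply(raw);
  return raw;
}
```

Wired to the 1s cadence it might look like `setInterval(() => consumeModeRequest("state/voice-mode.request", (m) => switchMode(m)), 1000)`, where `switchMode` is also hypothetical. A writer that creates the request via a temp file plus rename keeps the poller from reading a half-written file.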
Web client (`src/web-client.ts`):
- `mode` field added to /sse-status response — built from sync read of
`state/voice-mode.txt` + cached presenter-active boolean (refreshed
every 1s in background from :7877/presenter).
- Mode pill in top-bar UI: 3 CSS variants (mode-active dim /
mode-meeting amber / mode-presenter purple-glow). Polls /sse-status
every 1.5s.
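A minimal sketch of how the `mode` field could be resolved, assuming the sentinel holds `active` or `meeting` and the presenter flag is already cached. The function names are hypothetical; the priority (presenter > meeting > active) and the `mode-*` class names come from the description above.

```typescript
import * as fs from "node:fs";

export type VoiceMode = "active" | "meeting";
export type UnifiedMode = VoiceMode | "presenter";

// Synchronous read of state/voice-mode.txt; a missing file or unknown content falls back to "active".
export function readVoiceMode(sentinelPath: string): VoiceMode {
  try {
    return fs.readFileSync(sentinelPath, "utf8").trim() === "meeting" ? "meeting" : "active";
  } catch {
    return "active";
  }
}

// Priority presenter > meeting > active, the same order updateModeMenuItem() uses.
export function resolveMode(voiceMode: VoiceMode, presenterActive: boolean): UnifiedMode {
  return presenterActive ? "presenter" : voiceMode;
}

// CSS class for the top-bar mode pill.
export function pillClass(mode: UnifiedMode): string {
  return `mode-${mode}`;
}

// Hypothetical /sse-status piece: the presenter flag is passed in from a cache that a
// background timer refreshes, so building the response never waits on :7877.
export function sseStatusMode(sentinelPath: string, cachedPresenterActive: boolean): UnifiedMode {
  return resolveMode(readVoiceMode(sentinelPath), cachedPresenterActive);
}
```

Keeping the presenter lookup out of the request path means a slow or dead highlight server degrades to a stale flag rather than a stalled status response.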
Recovery details: notes/3-mode-recovered-edits-2026-04-21.md.
Earlier #518 (simpler text-only version) is closed in favor of this
fuller recovery.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…m detect
Two related bugs in the recovered 3-mode work surfaced when Chi tested
the menu radio: clicking "Mode: Meeting" silently no-op'd because the
in-memory `meetingActive` flag and the on-disk `state/voice-mode.txt`
sentinel had diverged.
(1) `switchModeTool.execute` flipped `meetingActive` but never called
`writeVoiceModeSentinel()`. So a voice-triggered switch_mode kept
the sentinel stale, and a subsequent menu click that wrote
`voice-mode.request="meeting"` hit the `meetingActive === want`
early-return in `applyModeRequest` — request consumed without
writing the sentinel. Sutando.app's pollVoiceMode kept reading
"active" → menu radio stuck.
(2) The startup `writeVoiceModeSentinel()` call ran BEFORE the Zoom
auto-detect, so even when Zoom was detected as in-meeting and
`meetingActive` was set to true, the sentinel had already been
written as "active" with no second write.
Fix: call `writeVoiceModeSentinel()` inside `switchModeTool.execute`,
and move the startup call to after the Zoom auto-detect block.
Verified live: kickstart → sentinel "meeting" matches in-memory state
(Zoom-running case); write voice-mode.request="meeting" → sentinel
flips, log line "External request applied: mode=meeting" fires.
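The invariant behind both fixes (every change to `meetingActive` is followed by a sentinel write, and the startup write happens after Zoom detection) can be sketched as a small class. `VoiceModeState`, `start`, and the `zoomInMeeting` callback are hypothetical stand-ins for the voice-agent internals.

```typescript
import * as fs from "node:fs";

type VoiceMode = "active" | "meeting";

// Hypothetical minimal model of the voice-agent's mode state.
export class VoiceModeState {
  meetingActive = false;
  private readonly sentinelPath: string;

  constructor(sentinelPath: string) {
    this.sentinelPath = sentinelPath;
  }

  writeVoiceModeSentinel(): void {
    fs.writeFileSync(this.sentinelPath, this.meetingActive ? "meeting" : "active");
  }

  // Bug (1): a voice-triggered switch must refresh the sentinel, not just the flag.
  switchMode(want: VoiceMode): void {
    this.meetingActive = want === "meeting";
    this.writeVoiceModeSentinel();
  }

  // The early return is only safe because switchMode() keeps the sentinel current.
  applyModeRequest(want: VoiceMode): boolean {
    if (this.meetingActive === (want === "meeting")) return false;
    this.switchMode(want);
    return true;
  }

  // Bug (2): detect Zoom first, then write the sentinel once with the final state.
  start(zoomInMeeting: () => boolean): void {
    if (zoomInMeeting()) this.meetingActive = true;
    this.writeVoiceModeSentinel();
  }
}
```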
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the switch_mode pattern. Without this, "presenter mode on" / "the talk starts" routes to the work tool, producing "working on it" instead of silently flipping the mode.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #505 (dup-name guard) inadvertently removed loadSkillManifestTools() along with the personalTools spread into inlineTools. Voice-agent had no way to register highlight_slide / presenter_mode tools from skills/personal-iclr-highlight/, so the autonav cue silently no-op'd during the ICLR talk rehearsal.
Restored verbatim from 9b545c2 (the original local-ships commit).
Smoke-tested: voice-agent boots with "[skill-loader] loaded 3 tool(s) from iclr-highlight": highlightSlideTool, presenterModeTool, fullscreenTool (renamed to fullscreen_presenter to dodge the dup-name guard).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
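To show why the rename was needed, here is a sketch of a duplicate-name guard over merged tool lists. It assumes the guard throws on a collision; whether the PR #505 guard throws, warns, or skips is not stated here, and `Tool`, `mergeTools`, and the colliding `fullscreen` name are hypothetical.

```typescript
// Hypothetical tool shape; real tool objects carry descriptions, parameters, execute(), etc.
interface Tool {
  name: string;
}

// Merge inline tools with skill-manifest tools, refusing any repeated name.
export function mergeTools(inlineTools: Tool[], skillTools: Tool[]): Tool[] {
  const seen = new Set<string>();
  const merged: Tool[] = [];
  for (const tool of [...inlineTools, ...skillTools]) {
    if (seen.has(tool.name)) {
      throw new Error(`duplicate tool name: ${tool.name}`);
    }
    seen.add(tool.name);
    merged.push(tool);
  }
  return merged;
}
```

Under this assumption, a skill tool named `fullscreen` would collide with an inline tool of the same name, while `fullscreen_presenter` merges cleanly.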
…llscreen
Two pre-talk hardenings observed during ICLR rehearsal:
1. recording-tools.ts: screen_record description now explicitly rejects "fullscreen" / "full screen" / "play fullscreen" cues. Chi asked to play the cross-owner video fullscreen and Sutando self-fired screen_record (06:34:46) right after fullscreen_presenter + play_video; STT or model association from the word "screen" was matching screen_record.
2. voice-agent.ts: restored FILLERS ARE NOT REQUESTS rule (originally on the unmerged 9b545c2 local-ships branch). Short utterances "hmm" / "um" / "ok" / "[BLANK_AUDIO]" are not instructions; Sutando should stay silent or ack, not call work and say "queued up".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…icator-recovered-v2
Pulls in the bodhi #14 silent-context-injection fix on Gemini reconnect. Combined with PR #515 (merged into main, now in this branch via merge), this covers both reconnect paths:
- Gemini-internal reconnect: silent injection drops "Say I'm back" prompt
- Client reconnect <60s: getSecondsSinceLastTurn suppresses "Welcome back"
Voice-agent verified: skill-loader still fires 3 tools on boot (PID 23502).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ged)"
This reverts commit f9a4be4.
…rompts
Closes the gap that surfaced during talk rehearsal: voice-agent restart loses co-presenter mode anchor. Even though :7877/presenter still says active=true, the model gets the default Sutando system prompt and defaults to "Echo Act IV" generic greeting, routing slide-topic phrases to work instead of highlight_slide+narrate.
Adds getPresenterStateMarker() that synchronously curls the highlight server and returns "[System: PRESENTER MODE IS CURRENTLY ACTIVE — apply the CO-PRESENTER protocol...]" when active. Appended to both greeting paths:
- Fresh connect (line 484): generic greeting + presenter marker
- Reconnect with history (line 453): reconnect prompt + presenter marker
Failure-silent: if curl fails, the marker is an empty string, so non-talk sessions and a missing iclr-highlight skill are unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
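A failure-silent, synchronous marker lookup could be sketched as below. It assumes `curl` is on PATH, that the highlight server listens on 127.0.0.1, and that `/presenter` returns JSON like `{"active": true}`; the response shape is not shown in this PR. The marker string is a placeholder because the full wording is abbreviated above, and `markerFromBody` is a hypothetical helper.

```typescript
import { execFileSync } from "node:child_process";

// Placeholder wording; the real marker text is elided in the commit message.
const PRESENTER_MARKER = "[System: PRESENTER MODE IS CURRENTLY ACTIVE]";

// Assumed response shape {"active": boolean}; anything unparseable yields "".
export function markerFromBody(body: string): string {
  try {
    const parsed = JSON.parse(body) as { active?: unknown };
    return parsed.active === true ? PRESENTER_MARKER : "";
  } catch {
    return "";
  }
}

// Synchronous and failure-silent: a missing curl binary, a refused connection,
// an HTTP error (-f), or a timeout all throw inside the try and produce "".
export function getPresenterStateMarker(url = "http://127.0.0.1:7877/presenter"): string {
  try {
    const body = execFileSync("curl", ["-sf", "--max-time", "1", url], {
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
      timeout: 2000,
    });
    return markerFromBody(body);
  } catch {
    return "";
  }
}
```

Bounding curl with `--max-time` matters here because a synchronous call blocks the voice-agent event loop while it runs.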
Reconnect path was still emitting "Welcome back" when presenter is on, which breaks the co-presenter flow mid-talk. Reuses the existing quick-reconnect silent-reconnect hint when presenterActive is true, keyed off getPresenterStateMarker() returning non-empty.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
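The reconnect decision, combining the <60s quick-reconnect rule with the presenter check, reduces to a small branch. The prompt strings and `chooseReconnectPrompt` are hypothetical placeholders, not the voice-agent's actual texts.

```typescript
// Hypothetical prompt texts; the real ones live in voice-agent.ts.
export const WELCOME_BACK_PROMPT = "Say welcome back.";
export const SILENT_RECONNECT_HINT = "[System: reconnected; continue silently.]";

// A live presenter marker or a quick (<60s) reconnect both suppress "Welcome back".
export function chooseReconnectPrompt(secondsSinceLastTurn: number, presenterMarker: string): string {
  if (presenterMarker !== "" || secondsSinceLastTurn < 60) return SILENT_RECONNECT_HINT;
  return WELCOME_BACK_PROMPT;
}
```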
…esenter marker
Previous a284190 + 5eb09af put the presenter-state marker in the GREETING string. Gemini Live treats the greeting as a user-style turn: the model called get_core_status to verify the "claim" instead of trusting it, and answered "presenter mode is not currently active" despite :7877/presenter being active. System instructions are authoritative. This commit:
1. Converts mainAgent.instructions from a static joined string to a factory function `() => [...].join('\n')`. Each session.start() now re-evaluates the array, picking up live state.
2. Adds getPresenterStateMarker() as the FIRST slot in the array, so the system prompt opens with "[System: PRESENTER MODE IS CURRENTLY ACTIVE...]" when the iclr-highlight server says active=true.
Failure-silent: if the curl fails, the marker is an empty string and the slot drops out of the join cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
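A sketch of the factory-style instructions, assuming the marker getter and base prompt lines are passed in (`makeInstructions` is a hypothetical name). One detail worth noting: a plain `['', ...lines].join('\n')` leaves a leading newline when the marker is empty, so this version omits the slot explicitly while preserving any intentional blank lines in the base prompt.

```typescript
// Returns a function that rebuilds the system instructions on every call,
// so each session.start() sees the current presenter state.
export function makeInstructions(
  getMarker: () => string,
  baseLines: readonly string[],
): () => string {
  return () => {
    const marker = getMarker();
    // Skip only the empty marker slot; blank lines inside baseLines are kept.
    return (marker === "" ? [...baseLines] : [marker, ...baseLines]).join("\n");
  };
}
```

Usage under these assumptions: `instructions: makeInstructions(getPresenterStateMarker, baseLines)`, with the agent framework calling the function at session start.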
Pairs with the system_instruction presenter marker (374cebf): with the marker now authoritative and sent on every session.start, the new bodhi's silent context injection should be safe; the per-cue gating in voice-context.txt is enforced from the system_instruction, not by relying on the flat history replay to anchor model behavior.
Trade-off in case of regression: revert this commit and voice-agent runs on old bodhi 4d1592eb (welcome-back fires, but no concatenation risk).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 'present' Apple event succeeds but the visible window doesn't come forward when Zoom is screen-sharing: Zoom's floating control bar holds the foreground z-order. Switch to Ctrl+Cmd+F routed through 'tell process "QuickTime Player"' so System Events targets QT directly instead of going through global keystroke focus that Zoom is grabbing. Same pattern as the Chrome fullscreen fix in skills/personal-iclr-highlight/tools.ts.
Validated locally: TS clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
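A rough sketch of the AppleScript shape described above, run through `osascript` from Node. The line structure (tell process, frontmost, 0.3 s delay, Ctrl+Cmd+F) follows the commit and review text; `quickTimeFullscreenScript` and `enterQuickTimeFullscreen` are hypothetical names, not the recording-tools.ts source.

```typescript
import { execFileSync } from "node:child_process";

// Builds the script as separate lines; osascript joins multiple -e arguments into one script.
export function quickTimeFullscreenScript(delaySeconds = 0.3): string[] {
  return [
    'tell application "System Events"',
    '  tell process "QuickTime Player"',
    "    set frontmost to true",
    // Give QuickTime time to become frontmost before the key combo arrives.
    `    delay ${delaySeconds}`,
    '    keystroke "f" using {command down, control down}',
    "  end tell",
    "end tell",
  ];
}

// macOS only; System Events keystrokes require Accessibility permission for the caller.
export function enterQuickTimeFullscreen(): void {
  const args = quickTimeFullscreenScript().flatMap((line) => ["-e", line]);
  execFileSync("osascript", args);
}
```

Targeting the process inside System Events, rather than sending a bare `keystroke`, is what keeps the combo from landing in whichever app currently owns focus.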
sonichi (Owner, Author) commented Apr 26, 2026:
Review of PR #520 (HEAD 12e4549):
PR #520 = PR #519's 12-commit stack (already LGTM'd from the codex-sandbox angle in pullrequestreview-4175762398) + 1 new commit 12e45495 fix(open-file): QuickTime fullscreen survives Zoom screen-share.
The new commit (recording-tools.ts +16 -6):
- Replaces the `present foundDoc` Apple event with `keystroke "f" using {command down, control down}` routed through `tell process "QuickTime Player"`.
- Reason (from commit message + inline comment): during Zoom screen-share, Zoom's floating control bar holds foreground z-order; the `present` Apple event succeeds but the QT window doesn't come forward.
- The fix targets QuickTime directly via System Events `tell process` instead of relying on the global keystroke route Zoom is grabbing. Same pattern as the existing Chrome fullscreen fix in `skills/personal-iclr-highlight/tools.ts`.
- 0.3s `delay` before the keystroke gives QT time to register `frontmost = true` before sending the key combo.
Verification:
- CI: `tsc + tests (clean install)` SUCCESS, MERGEABLE
- Pattern is precedented (Chrome fix uses identical structure)
- Only `recording-tools.ts` is touched by this commit; doesn't disturb the welcome-back/presenter-state work in voice-agent.ts/main.swift
LGTM from codex-sandbox angle. Talk-day relevance: the QuickTime fullscreen + Zoom screen-share interaction is exactly what the cross-owner-healing demo video playback hits. Owner approval still needed for merge.
Summary
- During Zoom screen-share, the `present` Apple event succeeds but the visible window doesn't come forward: Zoom's floating control bar holds the foreground z-order. User never sees fullscreen.
- Switch to Ctrl+Cmd+F routed through `tell process "QuickTime Player"` so System Events targets QT directly, bypassing the global keystroke focus that Zoom is grabbing.
- Same pattern as the Chrome fullscreen fix (`skills/personal-iclr-highlight/tools.ts`).
Test plan
- TS clean (`npx tsc --noEmit`)
🤖 Generated with Claude Code