Closed
Replace YouTube thumbnail with GitHub-hosted MP4 for inline playback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sonichi added a commit that referenced this pull request on Apr 10, 2026
Scripts the post-merge verification after the 3-PR Gemini 3.1 compat stack lands:
- bodhi fork #1 (sendClientContent → sendRealtimeInput text path)
- bodhi fork #2 (Susan: sendAudio media → audio)
- bodhi fork #3 (me: sendFile mimeType branching)
- sutando #259 (Chi: deduplicate tool declarations + SDK bump)

Checks the installed bodhi dist for each wire-format fix using awk-scoped extraction. A simpler grep would false-positive on the OpenAI realtime transport's sendAudio, which also uses `audio:` as a flat field, so the Gemini 3.1 check has to be scoped to the Gemini transport's function body. Checks sutando src for the phone conversation-server's tool-dedup comment as a PR #259 marker. Checks .env for the 2.5 pin (exits green on 2.5, warns on 3.1), so the script is safe to run repeatedly before, during, and after the rollout.

Exits non-zero with "NOT READY for 3.1 rollout" if any wire-format fix is missing. On success, prints the manual next-steps checklist for unpinning .env, restarting voice-agent, and running the real 3.1 session test.

Usage:
bash src/verify-gemini-31.sh            # check current state
bash src/verify-gemini-31.sh --install  # npm install + check

Tested on the current state: correctly reports PR #1 as applied and PR #2, #3, and #259 as not yet applied. It will flip to "ready" once those merge and an `npm install` pulls the updated bodhi SHA.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
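The script itself is bash; as a rough illustration of its pass/fail logic, here is a TypeScript sketch. Only the "NOT READY for 3.1 rollout" message and the non-zero exit on a missing wire-format fix come from the description above. The function name `rolloutGate`, the `CheckResult` shape, and treating the 3.1 warning as a zero exit are assumptions.

```typescript
// Hypothetical TypeScript rendering of the gate in src/verify-gemini-31.sh.
// Only the "NOT READY for 3.1 rollout" message and the non-zero exit on a
// missing wire-format fix come from the commit message; the rest is assumed.

type CheckResult = { name: string; applied: boolean };

interface GateOutcome {
  exitCode: number;
  lines: string[];
}

function rolloutGate(wireFixes: CheckResult[], pinnedModel: string): GateOutcome {
  // One status line per wire-format fix.
  const lines: string[] = wireFixes.map(
    (c) => `${c.applied ? "applied" : "MISSING"}: ${c.name}`,
  );

  // Any missing wire-format fix blocks the rollout outright.
  if (wireFixes.some((c) => !c.applied)) {
    lines.push("NOT READY for 3.1 rollout");
    return { exitCode: 1, lines };
  }

  // The .env pin is advisory: 2.5 is green, 3.1 only produces a warning
  // (assumed here to keep a zero exit code).
  lines.push(
    pinnedModel.includes("3.1")
      ? `WARN: .env already pins ${pinnedModel}`
      : `ok: .env pins ${pinnedModel}`,
  );
  lines.push("Next: unpin .env, restart voice-agent, run a real 3.1 session test");
  return { exitCode: 0, lines };
}
```

Keeping the model pin advisory is what makes the check safe to rerun before, during, and after the rollout: only missing wire-format fixes change the exit code.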
sonichi added a commit that referenced this pull request on Apr 10, 2026
Isolates the second voice-agent-specific 3.1 compat issue discovered while landing the Gemini 3.1 stack today. Chi's #259 (dedup), Susan's bodhi #2 (sendAudio), and my bodhi #3 (sendFile) all merged and were npm-installed, and 2.5 continued to work fine. But flipping .env to gemini-3.1-flash-live-preview still hit 1011 "exceeded your current quota" within 200ms of WebSocket setup. The error text is misleading; the actual cause turned out to be the googleSearch + native-audio combination.

Experiment: flipped googleSearch: true → false on an investigation branch, keeping everything else identical (.env set to 3.1, same dist, same tool list). Result: setup complete in 315ms, no 1011. Flipping back to googleSearch: true + 3.1 brought the 1011 back. Reproducible.

So Gemini 3.1 native audio rejects the googleSearch grounding tool entry, which 2.5 silently accepted. The rejection shows up as the same misleading "quota" close code that caught us earlier on #259's duplicate-declaration bug.

Fix: an env-var gate. VOICE_GOOGLE_SEARCH defaults to 'true' to preserve existing 2.5 behavior. Users unpinning VOICE_NATIVE_AUDIO_MODEL from 2.5 to 3.1 must set VOICE_GOOGLE_SEARCH=false in .env at the same time. A comment at the declaration explains the constraint and points at today's investigation.

Voice-agent LOSES Google Search grounding on 3.1. Google Search was used for "quick factual lookups" per the system instructions; 3.1 users will need to route those queries through the work tool (sutando-core) instead. Acceptable trade-off.

Test plan:
- ✓ tsc clean
- ✓ Default (VOICE_GOOGLE_SEARCH unset, 2.5 pinned): setup complete, googleSearch: true as before
- ✓ Investigation: VOICE_GOOGLE_SEARCH=false + 3.1 pinned → Gemini setup complete in 315ms, no 1011
- Live: user needs to test the full 3.1 session (greeting, tool call, voice goodbye, reconnect) before we can call 3.1 "ready"

Follow-up: consider auto-inferring googleSearch=false when VOICE_NATIVE_AUDIO_MODEL contains "3.1", which would remove the need for users to set two env vars together. Not doing it here because the explicit config is more discoverable and easier to revert.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
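A minimal TypeScript sketch of the env-var gate, assuming the tool list is assembled in one place and that only the literal string `false` disables search. `buildVoiceTools` and the local `Tool`/`FunctionDeclaration` types are hypothetical stand-ins, not the voice-agent's or @google/genai's actual definitions; the `{ googleSearch: {} }` entry mirrors the usual Gemini tool shape.

```typescript
// Sketch of the VOICE_GOOGLE_SEARCH gate. Assumptions: env is passed in
// rather than read from process.env, and only "false" disables search.

interface FunctionDeclaration {
  name: string;
  description?: string;
}

interface Tool {
  googleSearch?: Record<string, never>;
  functionDeclarations?: FunctionDeclaration[];
}

function buildVoiceTools(
  env: Record<string, string | undefined>,
  declarations: FunctionDeclaration[],
): Tool[] {
  // Default 'true' keeps 2.5 grounding; 3.1 native audio rejects the
  // googleSearch entry with a misleading 1011 "quota" close.
  const googleSearch = (env.VOICE_GOOGLE_SEARCH ?? "true") !== "false";
  const tools: Tool[] = [{ functionDeclarations: declarations }];
  if (googleSearch) {
    tools.push({ googleSearch: {} });
  }
  return tools;
}
```

With VOICE_GOOGLE_SEARCH unset the search entry is kept, matching the "Default (2.5 pinned)" row of the test plan; setting it to `false` drops only that entry and leaves the function declarations untouched.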
sonichi added a commit that referenced this pull request on Apr 10, 2026
sonichi added a commit that referenced this pull request on Apr 10, 2026
…mpat) (#262)

* voice-agent: gate googleSearch on VOICE_GOOGLE_SEARCH env var

Isolates the second voice-agent-specific 3.1 compat issue discovered while landing the Gemini 3.1 stack today. Chi's #259 (dedup), Susan's bodhi #2 (sendAudio), and my bodhi #3 (sendFile) all merged and were npm-installed, and 2.5 continued to work fine. But flipping .env to gemini-3.1-flash-live-preview still hit 1011 "exceeded your current quota" within 200ms of WebSocket setup. The error text is misleading; the actual cause turned out to be the googleSearch + native-audio combination.

Experiment: flipped googleSearch: true → false on an investigation branch, keeping everything else identical (.env set to 3.1, same dist, same tool list). Result: setup complete in 315ms, no 1011. Flipping back to googleSearch: true + 3.1 brought the 1011 back. Reproducible.

So Gemini 3.1 native audio rejects the googleSearch grounding tool entry, which 2.5 silently accepted. The rejection shows up as the same misleading "quota" close code that caught us earlier on #259's duplicate-declaration bug.

Fix: an env-var gate. VOICE_GOOGLE_SEARCH defaults to 'true' to preserve existing 2.5 behavior. Users unpinning VOICE_NATIVE_AUDIO_MODEL from 2.5 to 3.1 must set VOICE_GOOGLE_SEARCH=false in .env at the same time. A comment at the declaration explains the constraint and points at today's investigation.

Voice-agent LOSES Google Search grounding on 3.1. Google Search was used for "quick factual lookups" per the system instructions; 3.1 users will need to route those queries through the work tool (sutando-core) instead. Acceptable trade-off.

Test plan:
- ✓ tsc clean
- ✓ Default (VOICE_GOOGLE_SEARCH unset, 2.5 pinned): setup complete, googleSearch: true as before
- ✓ Investigation: VOICE_GOOGLE_SEARCH=false + 3.1 pinned → Gemini setup complete in 315ms, no 1011
- Live: user needs to test the full 3.1 session (greeting, tool call, voice goodbye, reconnect) before we can call 3.1 "ready"

Follow-up: consider auto-inferring googleSearch=false when VOICE_NATIVE_AUDIO_MODEL contains "3.1", which would remove the need for users to set two env vars together. Not doing it here because the explicit config is more discoverable and easier to revert.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* sync package-lock.json to @google/genai 1.49 from #259

PR #262 branched before #259 merged, so the branch's package-lock still had @google/genai@1.48 while package.json had 1.49 (inherited from the rebase pull). CI failed with:

npm error Invalid: lock file's @google/genai@1.48.0 does not satisfy @google/genai@1.49.0

Regenerated the lockfile via `npm install` (no args). No other changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
sonichi added a commit that referenced this pull request on Apr 10, 2026
…261)

* Add src/verify-gemini-31.sh: pre-rollout gate for Gemini 3.1 stack

Scripts the post-merge verification after the 3-PR Gemini 3.1 compat stack lands:
- bodhi fork #1 (sendClientContent → sendRealtimeInput text path)
- bodhi fork #2 (Susan: sendAudio media → audio)
- bodhi fork #3 (me: sendFile mimeType branching)
- sutando #259 (Chi: deduplicate tool declarations + SDK bump)

Checks the installed bodhi dist for each wire-format fix using awk-scoped extraction. A simpler grep would false-positive on the OpenAI realtime transport's sendAudio, which also uses `audio:` as a flat field, so the Gemini 3.1 check has to be scoped to the Gemini transport's function body. Checks sutando src for the phone conversation-server's tool-dedup comment as a PR #259 marker. Checks .env for the 2.5 pin (exits green on 2.5, warns on 3.1), so the script is safe to run repeatedly before, during, and after the rollout.

Exits non-zero with "NOT READY for 3.1 rollout" if any wire-format fix is missing. On success, prints the manual next-steps checklist for unpinning .env, restarting voice-agent, and running the real 3.1 session test.

Usage:
bash src/verify-gemini-31.sh            # check current state
bash src/verify-gemini-31.sh --install  # npm install + check

Tested on the current state: correctly reports PR #1 as applied and PR #2, #3, and #259 as not yet applied. It will flip to "ready" once those merge and an `npm install` pulls the updated bodhi SHA.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* verify-gemini-31: check VOICE_GOOGLE_SEARCH pin in .env

Extends the script to enforce the second 3.1 compat requirement discovered in #262: VOICE_GOOGLE_SEARCH must be 'false' when VOICE_NATIVE_AUDIO_MODEL is pinned to 3.1. Without it, voice-agent hits a misleading 1011 close on connect.

Check matrix:
- 2.5 + default (googleSearch=true) → pass
- 2.5 + VOICE_GOOGLE_SEARCH=false → warn (unnecessarily losing search)
- 3.1 + VOICE_GOOGLE_SEARCH=false → pass
- 3.1 + default (googleSearch=true) → fail (1011 on connect)
- 3.1 + unset VOICE_GOOGLE_SEARCH → fail (defaults to true → 1011)

Also updates the 'Manual next steps' tail to tell users to set BOTH env vars together when flipping to 3.1.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
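The check matrix above reduces to a pure function of the pinned model and VOICE_GOOGLE_SEARCH. This TypeScript sketch assumes a model counts as 3.1 when its name contains "3.1" and that an unset variable behaves like "true" (the voice-agent default); `checkSearchPin` is a hypothetical name, not the script's.

```typescript
// Hypothetical encoding of the verify-gemini-31 check matrix.
type Verdict = "pass" | "warn" | "fail";

function checkSearchPin(model: string, googleSearchEnv: string | undefined): Verdict {
  const is31 = model.includes("3.1");
  // Unset falls back to the voice-agent default of "true".
  const searchOn = (googleSearchEnv ?? "true") !== "false";
  if (is31) {
    // googleSearch + 3.1 native audio → misleading 1011 close on connect.
    return searchOn ? "fail" : "pass";
  }
  // On 2.5, disabling search works but needlessly loses grounding.
  return searchOn ? "pass" : "warn";
}
```

Folding "default" and "unset" into one branch is why the last two rows of the matrix give the same verdict.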
sonichi added a commit that referenced this pull request on Apr 10, 2026
… fixes)

Lockfile was pinned at 26992e3 (bodhi PR #1 only: sendClientContent text path migration). Bodhi PR #2 (sendAudio media→audio) and PR #3 (sendFile mimeType branching) had been merged on the fork for some time, but sutando was still running the old dist.

This caused tonight's 1007 regression: client connects, mic audio frame → `realtime_input.media_chunks is deprecated` → session dies on every reconnect. The bodhi-dist probe added in this PR catches this case automatically.

Advance to 33a08c0 (bodhi PR #3 HEAD), which is the current HEAD of the fork and contains all three compat fixes.
sonichi added a commit that referenced this pull request on Apr 10, 2026
* health-check: add bodhi-dist probe to catch stale wire-format dist
The Gemini 3.1 migration's sendAudio/sendFile fixes live in the bodhi
fork. If `package-lock.json` advances to a post-fix commit via `git pull`
but `npm install` is not re-run, the dist on disk stays stale. voice-agent
boots cleanly because sendAudio isn't exercised until a client connects,
and existing probes (voice-agent port, voice-watchers log scan,
voice-transport close-code scan) all pass. The regression is invisible
until a real mic stream triggers 1007 "realtime_input.media_chunks is
deprecated" and the session dies on every reconnect.
The new probe scans node_modules/bodhi-realtime-agent/dist/index.js for
the Gemini transport's sendAudio and sendFile bodies and fails if either
still uses the deprecated `media: { data: ... }` shape. Uses a matched-
brace extractor (_extract_body) to isolate each function body, so it
ignores the OpenAI realtime transport which also defines sendAudio but
with a different shape.
Also fixes a leftover path in check_voice_transport from PR #251's
logs/ refactor: src/voice-agent.log → logs/voice-agent.log.
Repro verified: corrupting the dist to reintroduce `media: { data` in
the Gemini sendAudio body makes the probe fail with the expected detail
message and a clear fix instruction.
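A TypeScript sketch of the same idea as the `_extract_body` probe: isolate the Gemini transport's class body with a matched-brace scan, then isolate the method body inside it and test for the deprecated `media: { data` shape. The class marker (`class GeminiLiveTransport` in the usage below) is a hypothetical name, and this sketch does not skip braces inside string literals or regexes, which a probe over a real bundled dist may need to handle.

```typescript
// Return the text from the first "{" at or after `from` through its matching
// "}", or null if there is no opening brace or the braces never balance.
function extractBody(src: string, from: number): string | null {
  const open = src.indexOf("{", from);
  if (open < 0) return null;
  let depth = 0;
  for (let i = open; i < src.length; i++) {
    if (src[i] === "{") {
      depth++;
    } else if (src[i] === "}") {
      depth--;
      if (depth === 0) return src.slice(open, i + 1);
    }
  }
  return null;
}

// true  → the method inside the marked class still uses `media: { data ... }`
// false → method found and uses some other shape
// null  → class or method not found (the probe cannot decide)
function usesDeprecatedMediaShape(
  src: string,
  classMarker: string,
  method: string,
): boolean | null {
  const classStart = src.indexOf(classMarker);
  if (classStart < 0) return null;
  const classBody = extractBody(src, classStart);
  if (classBody === null) return null;
  // Scoping to the class body is what ignores other transports' sendAudio.
  const methodStart = classBody.indexOf(`${method}(`);
  if (methodStart < 0) return null;
  const body = extractBody(classBody, methodStart);
  if (body === null) return null;
  return /media:\s*\{\s*data/.test(body);
}
```

Usage, with the hypothetical marker: `usesDeprecatedMediaShape(distSource, "class GeminiLiveTransport", "sendAudio")`. Returning `null` separately from `false` lets a health check report "could not locate the function" instead of silently passing.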
* chore: bump bodhi-realtime-agent to 33a08c0 (incl. sendAudio/sendFile fixes)
Lockfile was pinned at 26992e3 (bodhi PR #1 only — sendClientContent
text path migration). Bodhi PR #2 (sendAudio media→audio) and PR #3
(sendFile mimeType branching) have been merged on the fork for some
time, but sutando was still running the old dist.
This caused tonight's 1007 regression: client connects, mic audio
frame → `realtime_input.media_chunks is deprecated` → session dies on
every reconnect. The bodhi-dist probe added in this PR catches this
case automatically.
Advance to 33a08c0 (bodhi PR #3 HEAD), which is the current HEAD of
the fork and contains all three compat fixes.
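The lockfile half of this failure (pinned at 26992e3 instead of 33a08c0) can also be checked directly. A TypeScript sketch, assuming the parsed package-lock.json records the git dependency's `resolved` value as a URL ending in `#<commit sha>` under `packages["node_modules/bodhi-realtime-agent"]`; `resolvedSha` and `lockPinnedTo` are hypothetical helpers. This only inspects the lockfile: a stale node_modules after `git pull` without `npm install` still needs the bodhi-dist probe.

```typescript
interface LockEntry {
  resolved?: string;
}

interface Lockfile {
  packages?: Record<string, LockEntry>;
}

// Extract the commit SHA after "#" in a git dependency's resolved URL.
function resolvedSha(lock: Lockfile, pkg: string): string | null {
  const resolved = lock.packages?.[`node_modules/${pkg}`]?.resolved;
  if (resolved === undefined) return null;
  const hashIndex = resolved.lastIndexOf("#");
  if (hashIndex < 0) return null;
  const sha = resolved.slice(hashIndex + 1);
  // Accept abbreviated or full lowercase hex SHAs only.
  return /^[0-9a-f]{7,40}$/.test(sha) ? sha : null;
}

// True when the lockfile pins `pkg` to a commit starting with `shaPrefix`.
function lockPinnedTo(lock: Lockfile, pkg: string, shaPrefix: string): boolean {
  const sha = resolvedSha(lock, pkg);
  return sha !== null && sha.startsWith(shaPrefix);
}
```

A prefix match only says which commit is pinned, not whether it contains the fixes; that would need the fork's git history, so the sketch compares against a known-good SHA such as 33a08c0.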
sonichi pushed a commit that referenced this pull request on Apr 11, 2026
* Add voice agent observability: events + tool calls + transcript

Track voice agent sessions in data/voice-metrics.jsonl with the same format as the phone agent's call-metrics.jsonl:
- Events: session start/end, tool calls/results, user/assistant speech, errors
- Tool calls: name, duration, timestamp
- Transcript: user and assistant turns

Writes on session end and on shutdown. Diagnosis skill updated to read both phone and voice metrics for unified analysis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Voice observability: full utterances + 7s user timestamp shift

- Remove 60-char truncation on user/assistant speech in events
- Add 7s backward shift for user timestamps (measured STT pipeline lag is ~7s for the voice agent vs 12s for the phone agent via Twilio)
- Update diagnosis skill: --metrics flag selects input file, source-aware labels (Voice Session vs Call), separate tracker HTML per source

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add call-diagnostics skill to repo

Copy the diagnosis skill from local ~/.claude/skills/ into the repo so it's tracked. Includes --metrics flag for voice vs phone analysis, source-aware labels, and unified tracker HTML generation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Refactor diagnose.py: 1057 → 737 lines (30% reduction)

- Extract CSS/chart JS into module constants
- Consolidate repair recommendations with _make_repair helper
- Compact issue detection (single-line dicts, _ts_short helper)
- Use list append + join for HTML generation
- Remove redundant whitespace in HTML output

No change to CLI behavior or output appearance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix duplicate metrics: guard writeVoiceMetrics with metricsWritten flag

onSessionEnd and shutdown() both call writeVoiceMetrics(). The flag prevents duplicate entries and is reset on session start for reconnect support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove 60-char truncation from phone agent event data collection

Full utterances are stored in call-metrics.jsonl, matching the voice agent. Diagnosis HTML can truncate for display if needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Diagnosis: add -c flag for timeline context around each issue

Shows ±2 surrounding events with >>> marking the issue point. Helps cross-reference issues with the observability timeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix narration drift, diagnostics false positives, get_task_status, test cleanup

- Narration guardrail: "ONLY describe what you SEE" prevents Gemini from searching Google during record_screen_with_narration
- Diagnostics: exclude work tool from "failed N times" check (async delegate returns 0ms by design), exclude file paths from inline-keyword match
- get_task_status: use in-memory pendingTasks counter instead of scanning filesystem (task files get deleted after processing)
- Remove 5 incompatible test stubs (node:test instead of vitest)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Restore test stubs removed in previous commit

These test files use node:test (not vitest) and need migration, not deletion. Keeping them as placeholders.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix CI: update bodhi dependency to liususan091219 fork

The sonichi GitHub account is gone (404). CI fails on npm ci because it can't clone github:sonichi/bodhi_realtime_agent. Updated to github:liususan091219/bodhi_realtime_agent (same code, forked before the account went down).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
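The duplicate-write guard and the user-timestamp shift can be sketched together in TypeScript. The class name, the event shape, and the injected `append` callback (standing in for appending one JSON line to data/voice-metrics.jsonl) are illustrative assumptions, not the voice-agent's actual code; only `writeVoiceMetrics`, `metricsWritten`, the reset on session start, and the 7s lag come from the commits above.

```typescript
type Role = "user" | "assistant";

interface SpeechEvent {
  role: Role;
  text: string;
  ts: number; // epoch milliseconds
}

// Measured STT pipeline lag for the voice agent, per the commit message.
const USER_STT_LAG_MS = 7_000;

class VoiceMetricsRecorder {
  private events: SpeechEvent[] = [];
  private metricsWritten = false;
  private readonly append: (line: string) => void;

  constructor(append: (line: string) => void) {
    this.append = append;
  }

  onSessionStart(): void {
    // Reset per session so a reconnect produces a fresh entry.
    this.events = [];
    this.metricsWritten = false;
  }

  recordSpeech(role: Role, text: string, receivedAt: number): void {
    // Shift user turns back to roughly when they were spoken; keep full text.
    const ts = role === "user" ? receivedAt - USER_STT_LAG_MS : receivedAt;
    this.events.push({ role, text, ts });
  }

  // Called from both onSessionEnd and shutdown(); only the first call writes.
  writeVoiceMetrics(): void {
    if (this.metricsWritten) return;
    this.metricsWritten = true;
    this.append(JSON.stringify({ events: this.events }));
  }
}
```

Setting the flag before appending means a second caller racing in during shutdown also sees it set; resetting it in `onSessionStart` is what keeps reconnects from being silently dropped.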
sonichi added a commit that referenced this pull request on Apr 11, 2026
* Add voice agent observability + call diagnostics skill (#3)

* Add voice agent observability: events + tool calls + transcript

Track voice agent sessions in data/voice-metrics.jsonl with the same format as the phone agent's call-metrics.jsonl:
- Events: session start/end, tool calls/results, user/assistant speech, errors
- Tool calls: name, duration, timestamp
- Transcript: user and assistant turns

Writes on session end and on shutdown. Diagnosis skill updated to read both phone and voice metrics for unified analysis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Voice observability: full utterances + 7s user timestamp shift

- Remove 60-char truncation on user/assistant speech in events
- Add 7s backward shift for user timestamps (measured STT pipeline lag is ~7s for the voice agent vs 12s for the phone agent via Twilio)
- Update diagnosis skill: --metrics flag selects input file, source-aware labels (Voice Session vs Call), separate tracker HTML per source

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add call-diagnostics skill to repo

Copy the diagnosis skill from local ~/.claude/skills/ into the repo so it's tracked. Includes --metrics flag for voice vs phone analysis, source-aware labels, and unified tracker HTML generation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Refactor diagnose.py: 1057 → 737 lines (30% reduction)

- Extract CSS/chart JS into module constants
- Consolidate repair recommendations with _make_repair helper
- Compact issue detection (single-line dicts, _ts_short helper)
- Use list append + join for HTML generation
- Remove redundant whitespace in HTML output

No change to CLI behavior or output appearance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix duplicate metrics: guard writeVoiceMetrics with metricsWritten flag

onSessionEnd and shutdown() both call writeVoiceMetrics(). The flag prevents duplicate entries and is reset on session start for reconnect support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove 60-char truncation from phone agent event data collection

Full utterances are stored in call-metrics.jsonl, matching the voice agent. Diagnosis HTML can truncate for display if needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Diagnosis: add -c flag for timeline context around each issue

Shows ±2 surrounding events with >>> marking the issue point. Helps cross-reference issues with the observability timeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix narration drift, diagnostics false positives, get_task_status, test cleanup

- Narration guardrail: "ONLY describe what you SEE" prevents Gemini from searching Google during record_screen_with_narration
- Diagnostics: exclude work tool from "failed N times" check (async delegate returns 0ms by design), exclude file paths from inline-keyword match
- get_task_status: use in-memory pendingTasks counter instead of scanning filesystem (task files get deleted after processing)
- Remove 5 incompatible test stubs (node:test instead of vitest)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Restore test stubs removed in previous commit

These test files use node:test (not vitest) and need migration, not deletion. Keeping them as placeholders.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix CI: update bodhi dependency to liususan091219 fork

The sonichi GitHub account is gone (404). CI fails on npm ci because it can't clone github:sonichi/bodhi_realtime_agent. Updated to github:liususan091219/bodhi_realtime_agent (same code, forked before the account went down).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Migrate repo references from sonichi to liususan091219 (#4)

The sonichi GitHub account is currently down (404). Update all references to point to liususan091219/sutando and liususan091219/bodhi_realtime_agent. This PR can be reverted once the sonichi account is back.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* startup: auto-install fswatch via Homebrew when missing

Instead of just reporting "fswatch not found", startup.sh now auto-installs it via `brew install fswatch` if Homebrew is available. It falls back to the original error message if brew is not installed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Susan Xueqing Liu <xliu127@stevens.edu>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Chi <wangchi@Chis-Mac-mini.hsd1.wa.comcast.net>
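The diagnosis `-c` flag described above (±2 surrounding events, `>>>` on the issue) is implemented in diagnose.py, which is Python; the TypeScript sketch below only illustrates the windowing idea. The `TimelineEvent` shape, the output line format, and the name `contextWindow` are assumptions.

```typescript
interface TimelineEvent {
  ts: string;
  kind: string;
  detail: string;
}

// Render the events within `radius` of the issue, marking the issue with ">>>".
function contextWindow(events: TimelineEvent[], issueIndex: number, radius = 2): string[] {
  // Clamp the window at both ends of the timeline.
  const start = Math.max(0, issueIndex - radius);
  const end = Math.min(events.length - 1, issueIndex + radius);
  const out: string[] = [];
  for (let i = start; i <= end; i++) {
    const e = events[i];
    const marker = i === issueIndex ? ">>>" : "   ";
    out.push(`${marker} ${e.ts} ${e.kind}: ${e.detail}`);
  }
  return out;
}
```

Clamping keeps issues near the start or end of a session from indexing past the timeline; those windows are simply shorter than five lines.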
Summary
Test plan
🤖 Generated with Claude Code