Inline demo video in README by sonichi · Pull Request #3 · sonichi/sutando

sonichi · 2026-03-28T16:58:03Z

Summary

- Replace YouTube thumbnail with GitHub-hosted MP4 for inline playback
- Video plays directly in the README without leaving the page

Test plan

- Video renders and plays on GitHub README page

🤖 Generated with Claude Code


Scripts the post-merge verification after the 3-PR Gemini 3.1 compat stack lands:

- bodhi fork #1 (sendClientContent → sendRealtimeInput text path)
- bodhi fork #2 (Susan: sendAudio media → audio)
- bodhi fork #3 (me: sendFile mimeType branching)
- sutando #259 (Chi: deduplicate tool declarations + SDK bump)

Checks the installed bodhi dist for each wire-format fix using awk-scoped extraction (a simpler grep would false-positive on the OpenAI realtime transport's sendAudio, which also uses `audio:` as a flat field, so the Gemini 3.1 check has to be scoped to the Gemini transport's function body).

Checks sutando src for the phone conversation-server's tool dedup comment as a PR #259 marker.

Checks .env for the 2.5 pin (exits green on 2.5, warns on 3.1, so the script is safe to run repeatedly before, during, and after the rollout).

Exits non-zero with "NOT READY for 3.1 rollout" if any wire-format fix is missing. On success, prints the manual next-steps checklist for unpinning .env, restarting voice-agent, and running the real 3.1 session test.

Usage:

    bash src/verify-gemini-31.sh            # check current state
    bash src/verify-gemini-31.sh --install  # npm install + check

Tested on the current state: correctly reports PR #1 applied, PR #2 + #3 + #259 not yet. Will flip to "ready" once those merge and an `npm install` pulls the updated bodhi SHA.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Isolates the second voice-agent-specific 3.1 compat issue discovered while landing the Gemini 3.1 stack today. Chi's #259 (dedup) + Susan's bodhi #2 (sendAudio) + my bodhi #3 (sendFile) all merged, npm installed, and 2.5 continued to work fine, but flipping .env to gemini-3.1-flash-live-preview still hit 1011 "exceeded your current quota" within 200ms of WebSocket setup. Misleading error text; the actual cause turned out to be the googleSearch + native-audio combo.

Experiment: flipped googleSearch: true → false on an investigation branch, kept everything else identical (.env to 3.1, same dist, same tool list). Result: setup complete in 315ms, no 1011. Flipped back to googleSearch: true + 3.1 and the 1011 returned. Reproducible.

So: Gemini 3.1 native audio rejects the googleSearch grounding tool entry. 2.5 silently accepted it. The rejection manifests as the same misleading "quota" close code that caught us on #259's duplicate-declaration bug earlier.

Fix: env-var gate. VOICE_GOOGLE_SEARCH defaults to 'true' to preserve existing 2.5 behavior. Users unpinning VOICE_NATIVE_AUDIO_MODEL to 3.1 set VOICE_GOOGLE_SEARCH=false in .env at the same time. Comment at the declaration explains the constraint and points at today's investigation.

Voice-agent LOSES google-search grounding on 3.1. Google search was used for "quick factual lookups" per the system instructions; 3.1 users will need to route those queries through the work tool (sutando-core) instead. Acceptable trade-off.

Test plan:

- ✓ tsc clean
- ✓ Default (VOICE_GOOGLE_SEARCH unset, 2.5 pinned): setup complete, googleSearch: true as before
- ✓ Investigation: VOICE_GOOGLE_SEARCH=false + 3.1 pinned → Gemini setup complete in 315ms, no 1011
- Live: user needs to test the full 3.1 session (greeting, tool call, voice goodbye, reconnect) before we can call 3.1 "ready"

Follow-up: consider auto-inferring googleSearch=false when VOICE_NATIVE_AUDIO_MODEL contains "3.1"; that would remove the need for users to set two env vars together. Not doing it here because the explicit config is more discoverable and easier to revert.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
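The env-var gate described above could look roughly like the following. This is a minimal sketch, not the code from the PR: `Env`, `googleSearchEnabled`, and `buildToolConfig` are hypothetical names, and the only details taken from the commit message are the variable name `VOICE_GOOGLE_SEARCH`, its default of 'true', and the `googleSearch` flag it controls.

```typescript
// Sketch only: `Env`, `googleSearchEnabled`, and `buildToolConfig` are hypothetical names.
type Env = Record<string, string | undefined>;

function googleSearchEnabled(env: Env): boolean {
  // Unset falls back to "true", preserving the existing 2.5 behavior.
  const raw = env["VOICE_GOOGLE_SEARCH"] ?? "true";
  // Only an explicit "false" turns grounding off.
  return raw.trim().toLowerCase() !== "false";
}

// Stand-in for the part of the session config that carries the flag.
function buildToolConfig(env: Env): { googleSearch: boolean } {
  return { googleSearch: googleSearchEnabled(env) };
}

console.log(buildToolConfig({}));                               // 2.5 default: variable unset
console.log(buildToolConfig({ VOICE_GOOGLE_SEARCH: "false" })); // 3.1 setup: grounding disabled
```

Passing the environment in as a parameter, rather than reading a global, keeps the sketch testable; the real voice-agent may well read the variable directly.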


…mpat) (#262)

* voice-agent: gate googleSearch on VOICE_GOOGLE_SEARCH env var

* sync package-lock.json to @google/genai 1.49 from #259

  PR #262 branched before #259 merged, so the branch's package-lock still had @google/genai@1.48 while package.json had 1.49 (inherited from the rebase pull). CI failed with:

      npm error Invalid: lock file's @google/genai@1.48.0 does not satisfy @google/genai@1.49.0

  Regenerated lockfile via `npm install` (no args). No other changes.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

…261)

* Add src/verify-gemini-31.sh — pre-rollout gate for Gemini 3.1 stack

* verify-gemini-31: check VOICE_GOOGLE_SEARCH pin in .env

  Extends the script to enforce the second 3.1 compat requirement discovered in #262: VOICE_GOOGLE_SEARCH must be 'false' when VOICE_NATIVE_AUDIO_MODEL is pinned to 3.1. Without it, voice-agent hits a misleading 1011 close on connect.

  Check matrix:

      2.5 + default (googleSearch=true)   → pass
      2.5 + VOICE_GOOGLE_SEARCH=false     → warn (unnecessarily losing search)
      3.1 + VOICE_GOOGLE_SEARCH=false     → pass
      3.1 + default (googleSearch=true)   → fail (1011 on connect)
      3.1 + unset VOICE_GOOGLE_SEARCH     → fail (defaults to true → 1011)

  Also updates the 'Manual next steps' tail to tell users to set BOTH env vars together when flipping to 3.1.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)
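The check matrix can be restated as a small decision function. The real check is a shell script (src/verify-gemini-31.sh); the TypeScript below is only an illustrative sketch of the same logic. `checkEnvPins` and `Verdict` are hypothetical names, detecting 3.1 by substring match is an assumption, and `example-2.5-model` is a placeholder rather than a real model ID.

```typescript
type Verdict = "pass" | "warn" | "fail";

// Hypothetical helper mirroring the matrix; not the script's actual implementation.
function checkEnvPins(model: string, googleSearch: string | undefined): Verdict {
  // Assumption: a 3.1 pin is recognized by the substring "3.1" in the model name.
  const is31 = model.includes("3.1");
  // An unset VOICE_GOOGLE_SEARCH behaves like "true" in voice-agent.
  const searchOn = (googleSearch ?? "true") !== "false";
  if (is31) {
    // 3.1 with googleSearch enabled hits the 1011 close on connect.
    return searchOn ? "fail" : "pass";
  }
  // 2.5 works either way, but disabling search there loses grounding for no reason.
  return searchOn ? "pass" : "warn";
}

const rows: Array<[string, string | undefined]> = [
  ["example-2.5-model", undefined],
  ["example-2.5-model", "false"],
  ["gemini-3.1-flash-live-preview", "false"],
  ["gemini-3.1-flash-live-preview", "true"],
  ["gemini-3.1-flash-live-preview", undefined],
];
for (const [model, search] of rows) {
  console.log(model, search ?? "(unset)", "→", checkEnvPins(model, search));
}
```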

… fixes)

Lockfile was pinned at 26992e3 (bodhi PR #1 only: sendClientContent text path migration). Bodhi PR #2 (sendAudio media→audio) and PR #3 (sendFile mimeType branching) have been merged on the fork for some time, but sutando was still running the old dist. This caused tonight's 1007 regression: client connects, mic audio frame → `realtime_input.media_chunks is deprecated` → session dies on every reconnect. The bodhi-dist probe added in this PR catches this case automatically.

Advance to 33a08c0 (bodhi PR #3 HEAD), which is the current HEAD of the fork and contains all three compat fixes.

* health-check: add bodhi-dist probe to catch stale wire-format dist

  The Gemini 3.1 migration's sendAudio/sendFile fixes live in the bodhi fork. If `package-lock.json` advances to a post-fix commit via `git pull` but `npm install` is not re-run, the dist on disk stays stale. voice-agent boots cleanly because sendAudio isn't exercised until a client connects, and existing probes (voice-agent port, voice-watchers log scan, voice-transport close-code scan) all pass. The regression is invisible until a real mic stream triggers 1007 "realtime_input.media_chunks is deprecated" and the session dies on every reconnect.

  The new probe scans node_modules/bodhi-realtime-agent/dist/index.js for the Gemini transport's sendAudio and sendFile bodies and fails if either still uses the deprecated `media: { data: ... }` shape. Uses a matched-brace extractor (_extract_body) to isolate each function body, so it ignores the OpenAI realtime transport, which also defines sendAudio but with a different shape.

  Also fixes a leftover path in check_voice_transport from PR #251's logs/ refactor: src/voice-agent.log → logs/voice-agent.log.

  Repro verified: corrupting the dist to reintroduce `media: { data` in the Gemini sendAudio body makes the probe fail with the expected detail message and a clear fix instruction.

* chore: bump bodhi-realtime-agent to 33a08c0 (incl. sendAudio/sendFile fixes)
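To illustrate why the probe isolates a function body instead of grepping the whole file, here is a minimal matched-brace extractor. It is a sketch, not a port of `_extract_body`: the class names `OpenAITransport` and `GeminiTransport` and the sample source are invented for the demo, and the brace counting is naive (it does not skip braces inside strings or comments).

```typescript
// Return the body of the first function whose header text appears at or after `from`.
function extractBody(src: string, header: string, from = 0): string | null {
  const start = src.indexOf(header, from);
  if (start < 0) return null;
  const open = src.indexOf("{", start + header.length);
  if (open < 0) return null;
  let depth = 0;
  for (let i = open; i < src.length; i++) {
    if (src[i] === "{") depth++;
    else if (src[i] === "}" && --depth === 0) return src.slice(open + 1, i);
  }
  return null; // unbalanced braces
}

// Invented stand-in for the bundled dist; both transports define sendAudio.
const dist = `
class OpenAITransport { sendAudio(b) { this.ws.send({ type: "append", audio: b }); } }
class GeminiTransport { sendAudio(b) { this.session.sendRealtimeInput({ media: { data: b } }); } }
`;

const deprecatedShape = /media:\s*\{\s*data/;
// Scope the search to the Gemini transport before looking for sendAudio.
const geminiStart = dist.indexOf("class GeminiTransport");
const geminiBody = extractBody(dist, "sendAudio(", geminiStart);
const openaiBody = extractBody(dist, "sendAudio(");

const geminiStale = geminiBody !== null && deprecatedShape.test(geminiBody);
const openaiStale = openaiBody !== null && deprecatedShape.test(openaiBody);
console.log({ geminiStale, openaiStale });
```

In this sample the Gemini body still carries the deprecated shape while the OpenAI body does not, which is the distinction a file-wide grep cannot make.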

* Add voice agent observability: events + tool calls + transcript

  Track voice agent sessions in data/voice-metrics.jsonl with the same format as phone agent's call-metrics.jsonl:
  - Events: session start/end, tool calls/results, user/assistant speech, errors
  - Tool calls: name, duration, timestamp
  - Transcript: user and assistant turns

  Writes on session end and on shutdown. Diagnosis skill updated to read both phone and voice metrics for unified analysis.

* Voice observability: full utterances + 7s user timestamp shift

  - Remove 60-char truncation on user/assistant speech in events
  - Add 7s backward shift for user timestamps (measured STT pipeline lag for voice agent is ~7s vs 12s for phone agent via Twilio)
  - Update diagnosis skill: --metrics flag selects input file, source-aware labels (Voice Session vs Call), separate tracker HTML per source

* Add call-diagnostics skill to repo

  Copy the diagnosis skill from local ~/.claude/skills/ into the repo so it's tracked. Includes --metrics flag for voice vs phone analysis, source-aware labels, and unified tracker HTML generation.

* Refactor diagnose.py: 1057 → 737 lines (30% reduction)

  - Extract CSS/chart JS into module constants
  - Consolidate repair recommendations with _make_repair helper
  - Compact issue detection (single-line dicts, _ts_short helper)
  - Use list append + join for HTML generation
  - Remove redundant whitespace in HTML output

  No change to CLI behavior or output appearance.

* Fix duplicate metrics: guard writeVoiceMetrics with metricsWritten flag

  onSessionEnd and shutdown() both call writeVoiceMetrics(). Flag prevents duplicate entries. Reset on session start for reconnect support.

* Remove 60-char truncation from phone agent event data collection

  Full utterances stored in call-metrics.jsonl, matching the voice agent. Diagnosis HTML can truncate for display if needed.

* Diagnosis: add -c flag for timeline context around each issue

  Shows ±2 surrounding events with >>> marking the issue point. Helps cross-reference issues with the observability timeline.

* Fix narration drift, diagnostics false positives, get_task_status, test cleanup

  - Narration guardrail: "ONLY describe what you SEE" prevents Gemini from searching Google during record_screen_with_narration
  - Diagnostics: exclude work tool from "failed N times" check (async delegate returns 0ms by design), exclude file paths from inline-keyword match
  - get_task_status: use in-memory pendingTasks counter instead of scanning filesystem (task files get deleted after processing)
  - Remove 5 incompatible test stubs (node:test instead of vitest)

* Restore test stubs removed in previous commit

  These test files use node:test (not vitest) and need migration, not deletion. Keeping them as placeholders.

* Fix CI: update bodhi dependency to liususan091219 fork

  sonichi GitHub account is gone (404). CI fails on npm ci because it can't clone github:sonichi/bodhi_realtime_agent. Updated to github:liususan091219/bodhi_realtime_agent (same code, forked before account went down).

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
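The duplicate-metrics fix in the commit list above is a write-once guard that is re-armed on each session start. A minimal sketch of that pattern follows; only `writeVoiceMetrics`, `metricsWritten`, and the file name data/voice-metrics.jsonl come from the commit message. The `VoiceMetrics` class, the `written` array, and the session IDs are hypothetical stand-ins.

```typescript
class VoiceMetrics {
  private metricsWritten = false;
  // Stand-in for lines appended to data/voice-metrics.jsonl.
  readonly written: string[] = [];

  onSessionStart(): void {
    // Re-arm the guard so a reconnected session gets its own entry.
    this.metricsWritten = false;
  }

  writeVoiceMetrics(sessionId: string): void {
    if (this.metricsWritten) return; // second caller (session end vs. shutdown) is ignored
    this.metricsWritten = true;
    this.written.push(sessionId);
  }
}

const metrics = new VoiceMetrics();
metrics.onSessionStart();
metrics.writeVoiceMetrics("s1"); // called from onSessionEnd
metrics.writeVoiceMetrics("s1"); // called again from shutdown(): no duplicate
metrics.onSessionStart();        // reconnect
metrics.writeVoiceMetrics("s2");
console.log(metrics.written.length); // 2
```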

* Add voice agent observability + call diagnostics skill (#3)

* Migrate repo references from sonichi to liususan091219 (#4)

  sonichi GitHub account is currently down (404). Update all references to point to liususan091219/sutando and liususan091219/bodhi_realtime_agent. This PR can be reverted once the sonichi account is back.

  Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* startup: auto-install fswatch via Homebrew when missing

  Instead of just reporting "fswatch not found", startup.sh now auto-installs it via `brew install fswatch` if Homebrew is available. Falls back to the original error message if brew is not installed.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Susan Xueqing Liu <xliu127@stevens.edu>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Chi <wangchi@Chis-Mac-mini.hsd1.wa.comcast.net>

sonichi and others added 2 commits March 28, 2026 09:57

Add inline demo video to README

2908e8d

Replace YouTube thumbnail with GitHub-hosted MP4 for inline playback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Revert to YouTube thumbnail — GitHub strips <video> tags in READMEs

2427cae

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sonichi closed this Mar 28, 2026

sonichi deleted the inline-demo-video branch March 28, 2026 17:00

sonichi mentioned this pull request Apr 10, 2026

Add src/verify-gemini-31.sh — pre-rollout gate for Gemini 3.1 stack #261

Merged

2 tasks

sonichi mentioned this pull request Apr 10, 2026

voice-agent: gate googleSearch on VOICE_GOOGLE_SEARCH env var (3.1 compat) #262

Merged

4 tasks

sonichi mentioned this pull request Apr 10, 2026

health-check: bodhi-dist probe + fix voice-transport log path #264

Merged

4 tasks

sonichi mentioned this pull request Apr 15, 2026

fix: replace all liususan091219/sutando references with sonichi/sutando #341

Merged

3 tasks

sonichi wants to merge 2 commits into main from inline-demo-video
