Inline demo video in README#3

Closed
sonichi wants to merge 2 commits into main from inline-demo-video

Conversation

sonichi (Owner) commented Mar 28, 2026

Summary

  • Replace YouTube thumbnail with GitHub-hosted MP4 for inline playback
  • Video plays directly in the README without leaving the page

Test plan

  • Video renders and plays on GitHub README page

🤖 Generated with Claude Code

sonichi and others added 2 commits March 28, 2026 09:57
Replace YouTube thumbnail with GitHub-hosted MP4 for inline playback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sonichi closed this Mar 28, 2026
sonichi deleted the inline-demo-video branch March 28, 2026 17:00
sonichi added a commit that referenced this pull request Apr 10, 2026
Scripts the post-merge verification after the 3-PR Gemini 3.1 compat
stack lands:
  - bodhi fork #1 (sendClientContent → sendRealtimeInput text path)
  - bodhi fork #2 (Susan: sendAudio media → audio)
  - bodhi fork #3 (me: sendFile mimeType branching)
  - sutando #259 (Chi: deduplicate tool declarations + SDK bump)

Checks the installed bodhi dist for each wire-format fix using
awk-scoped extraction (simpler grep would false-positive on the
OpenAI realtime transport's sendAudio which also uses `audio:` as a
flat field, so the 3.1 gemini-specific check has to be scoped to the
Gemini transport's function body).

Checks sutando src for the phone conversation-server's tool dedup
comment as a PR #259 marker.

Checks .env for the 2.5 pin (exits green on 2.5, warns on 3.1 — so
the script is safe to run repeatedly before, during, and after the
rollout).

Exits non-zero with "NOT READY for 3.1 rollout" if any wire-format
fix is missing. On success, prints the manual next-steps checklist
for unpinning .env, restarting voice-agent, and running the real
3.1 session test.

Usage:
  bash src/verify-gemini-31.sh              # check current state
  bash src/verify-gemini-31.sh --install    # npm install + check

Tested on the current state: correctly reports PR #1 applied,
PR #2 + #3 + #259 not yet. Will flip to "ready" once those merge
and an `npm install` pulls the updated bodhi SHA.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
sonichi added a commit that referenced this pull request Apr 10, 2026
Isolates the second voice-agent-specific 3.1 compat issue discovered
while landing the Gemini 3.1 stack today. Chi's #259 (dedup) +
Susan's bodhi #2 (sendAudio) + my bodhi #3 (sendFile) all merged,
npm installed, and 2.5 continued to work fine — but flipping .env to
gemini-3.1-flash-live-preview still hit 1011 "exceeded your current
quota" within 200ms of WebSocket setup. Misleading error text; the
actual cause turned out to be the googleSearch + native-audio combo.

Experiment: flipped googleSearch: true → false on an investigation
branch, kept everything else identical (.env to 3.1, same dist,
same tool list). Result: setup complete in 315ms, no 1011. Flipped
back to googleSearch: true + 3.1 and the 1011 returned. Reproducible.

So: Gemini 3.1 native audio rejects the googleSearch grounding
tool entry. 2.5 silently accepted it. The rejection manifests as
the same misleading "quota" close code that caught us on #259's
duplicate-declaration bug earlier.

Fix: env-var gate. VOICE_GOOGLE_SEARCH defaults to 'true' to
preserve existing 2.5 behavior. Users unpinning VOICE_NATIVE_AUDIO_MODEL
to 3.1 set VOICE_GOOGLE_SEARCH=false in .env at the same time.
Comment at the declaration explains the constraint and points at
today's investigation.
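
A minimal sketch of the gate described above, written in Python for brevity (the voice-agent itself is TypeScript). The variable name `VOICE_GOOGLE_SEARCH` and its 'true' default come from this commit message; the helper names, the case-insensitive parsing, and the config dict shape are assumptions for illustration only.

```python
import os
from typing import Mapping, Optional


def google_search_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True unless VOICE_GOOGLE_SEARCH is explicitly set to 'false'.

    Hypothetical helper: trimming and lowercasing the value is an assumption,
    not something the commit specifies.
    """
    env = os.environ if env is None else env
    raw = env.get("VOICE_GOOGLE_SEARCH", "true")  # unset means enabled
    return raw.strip().lower() != "false"


def build_tool_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Assemble the search flag for a session config (illustrative shape)."""
    return {"googleSearch": google_search_enabled(env)}
```

With this shape, an unset variable keeps the 2.5 behavior, and only an explicit `VOICE_GOOGLE_SEARCH=false` drops the grounding tool.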

Voice-agent LOSES google-search grounding on 3.1. Google search was
used for "quick factual lookups" per the system instructions; 3.1
users will need to route those queries through the work tool
(sutando-core) instead. Acceptable trade-off.

Test plan:
- ✓ tsc clean
- ✓ Default (VOICE_GOOGLE_SEARCH unset, 2.5 pinned): setup complete,
  googleSearch: true as before
- ✓ Investigation: VOICE_GOOGLE_SEARCH=false + 3.1 pinned → Gemini
  setup complete in 315ms, no 1011
- Live: user needs to test the full 3.1 session (greeting, tool
  call, voice goodbye, reconnect) before we can call 3.1 "ready"

Follow-up: consider auto-inferring googleSearch=false when
VOICE_NATIVE_AUDIO_MODEL contains "3.1" — would remove the need
for users to set two env vars together. Not doing it here because
the explicit config is more discoverable and easier to revert.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
sonichi added a commit that referenced this pull request Apr 10, 2026
…mpat) (#262)

* voice-agent: gate googleSearch on VOICE_GOOGLE_SEARCH env var

Isolates the second voice-agent-specific 3.1 compat issue discovered
while landing the Gemini 3.1 stack today. Chi's #259 (dedup) +
Susan's bodhi #2 (sendAudio) + my bodhi #3 (sendFile) all merged,
npm installed, and 2.5 continued to work fine — but flipping .env to
gemini-3.1-flash-live-preview still hit 1011 "exceeded your current
quota" within 200ms of WebSocket setup. Misleading error text; the
actual cause turned out to be the googleSearch + native-audio combo.

Experiment: flipped googleSearch: true → false on an investigation
branch, kept everything else identical (.env to 3.1, same dist,
same tool list). Result: setup complete in 315ms, no 1011. Flipped
back to googleSearch: true + 3.1 and the 1011 returned. Reproducible.

So: Gemini 3.1 native audio rejects the googleSearch grounding
tool entry. 2.5 silently accepted it. The rejection manifests as
the same misleading "quota" close code that caught us on #259's
duplicate-declaration bug earlier.

Fix: env-var gate. VOICE_GOOGLE_SEARCH defaults to 'true' to
preserve existing 2.5 behavior. Users unpinning VOICE_NATIVE_AUDIO_MODEL
to 3.1 set VOICE_GOOGLE_SEARCH=false in .env at the same time.
Comment at the declaration explains the constraint and points at
today's investigation.

Voice-agent LOSES google-search grounding on 3.1. Google search was
used for "quick factual lookups" per the system instructions; 3.1
users will need to route those queries through the work tool
(sutando-core) instead. Acceptable trade-off.

Test plan:
- ✓ tsc clean
- ✓ Default (VOICE_GOOGLE_SEARCH unset, 2.5 pinned): setup complete,
  googleSearch: true as before
- ✓ Investigation: VOICE_GOOGLE_SEARCH=false + 3.1 pinned → Gemini
  setup complete in 315ms, no 1011
- Live: user needs to test the full 3.1 session (greeting, tool
  call, voice goodbye, reconnect) before we can call 3.1 "ready"

Follow-up: consider auto-inferring googleSearch=false when
VOICE_NATIVE_AUDIO_MODEL contains "3.1" — would remove the need
for users to set two env vars together. Not doing it here because
the explicit config is more discoverable and easier to revert.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* sync package-lock.json to @google/genai 1.49 from #259

PR #262 branched before #259 merged, so the branch's package-lock
still had @google/genai@1.48 while package.json had 1.49 (inherited
from the rebase pull). CI failed with:

  npm error Invalid: lock file's @google/genai@1.48.0 does not
  satisfy @google/genai@1.49.0

Regenerated lockfile via `npm install` (no args). No other changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
sonichi added a commit that referenced this pull request Apr 10, 2026
…261)

* Add src/verify-gemini-31.sh — pre-rollout gate for Gemini 3.1 stack

Scripts the post-merge verification after the 3-PR Gemini 3.1 compat
stack lands:
  - bodhi fork #1 (sendClientContent → sendRealtimeInput text path)
  - bodhi fork #2 (Susan: sendAudio media → audio)
  - bodhi fork #3 (me: sendFile mimeType branching)
  - sutando #259 (Chi: deduplicate tool declarations + SDK bump)

Checks the installed bodhi dist for each wire-format fix using
awk-scoped extraction (simpler grep would false-positive on the
OpenAI realtime transport's sendAudio which also uses `audio:` as a
flat field, so the 3.1 gemini-specific check has to be scoped to the
Gemini transport's function body).

Checks sutando src for the phone conversation-server's tool dedup
comment as a PR #259 marker.

Checks .env for the 2.5 pin (exits green on 2.5, warns on 3.1 — so
the script is safe to run repeatedly before, during, and after the
rollout).

Exits non-zero with "NOT READY for 3.1 rollout" if any wire-format
fix is missing. On success, prints the manual next-steps checklist
for unpinning .env, restarting voice-agent, and running the real
3.1 session test.

Usage:
  bash src/verify-gemini-31.sh              # check current state
  bash src/verify-gemini-31.sh --install    # npm install + check

Tested on the current state: correctly reports PR #1 applied,
PR #2 + #3 + #259 not yet. Will flip to "ready" once those merge
and an `npm install` pulls the updated bodhi SHA.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* verify-gemini-31: check VOICE_GOOGLE_SEARCH pin in .env

Extends the script to enforce the second 3.1 compat requirement
discovered in #262: VOICE_GOOGLE_SEARCH must be 'false' when
VOICE_NATIVE_AUDIO_MODEL is pinned to 3.1. Without it, voice-agent
hits a misleading 1011 close on connect.

Check matrix:
  2.5 + default (googleSearch=true)   → pass
  2.5 + VOICE_GOOGLE_SEARCH=false     → warn (unnecessarily losing search)
  3.1 + VOICE_GOOGLE_SEARCH=false     → pass
  3.1 + default (googleSearch=true)   → fail (1011 on connect)
  3.1 + unset VOICE_GOOGLE_SEARCH     → fail (defaults to true → 1011)
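
The matrix above can be read as a small decision function. This is a hedged Python sketch, not the script's actual shell logic: the function name is hypothetical, and detecting a 3.1 model by substring is an assumption.

```python
from typing import Optional


def check_search_pin(model: str, google_search: Optional[str]) -> str:
    """Classify a (model, VOICE_GOOGLE_SEARCH) pair as 'pass', 'warn', or 'fail'.

    `google_search` is the raw env value, or None when the variable is unset,
    which the commit says defaults to true.
    """
    is_31 = "3.1" in model  # assumption: substring match identifies a 3.1 model
    search_on = google_search is None or google_search.strip().lower() != "false"
    if is_31:
        # 3.1 with search enabled hits the 1011 close on connect.
        return "fail" if search_on else "pass"
    # 2.5 works either way; disabling search only loses grounding needlessly.
    return "pass" if search_on else "warn"
```

For example, `check_search_pin("gemini-3.1-flash-live-preview", None)` falls in the last row of the matrix, because an unset variable defaults to true.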

Also updates the 'Manual next steps' tail to tell users to set BOTH
env vars together when flipping to 3.1.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
sonichi added a commit that referenced this pull request Apr 10, 2026
… fixes)

Lockfile was pinned at 26992e3 (bodhi PR #1 only — sendClientContent
text path migration). Bodhi PR #2 (sendAudio media→audio) and PR #3
(sendFile mimeType branching) have been merged on the fork for some
time, but sutando was still running the old dist.

This caused tonight's 1007 regression: client connects, mic audio
frame → `realtime_input.media_chunks is deprecated` → session dies on
every reconnect. The bodhi-dist probe added in this PR catches this
case automatically.

Advance to 33a08c0 (bodhi PR #3 HEAD), which is the current HEAD of
the fork and contains all three compat fixes.
sonichi added a commit that referenced this pull request Apr 10, 2026
* health-check: add bodhi-dist probe to catch stale wire-format dist

The Gemini 3.1 migration's sendAudio/sendFile fixes live in the bodhi
fork. If `package-lock.json` advances to a post-fix commit via `git pull`
but `npm install` is not re-run, the dist on disk stays stale. voice-agent
boots cleanly because sendAudio isn't exercised until a client connects,
and existing probes (voice-agent port, voice-watchers log scan,
voice-transport close-code scan) all pass. The regression is invisible
until a real mic stream triggers 1007 "realtime_input.media_chunks is
deprecated" and the session dies on every reconnect.

The new probe scans node_modules/bodhi-realtime-agent/dist/index.js for
the Gemini transport's sendAudio and sendFile bodies and fails if either
still uses the deprecated `media: { data: ... }` shape. Uses a matched-
brace extractor (_extract_body) to isolate each function body, so it
ignores the OpenAI realtime transport which also defines sendAudio but
with a different shape.
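
A rough Python sketch of the matched-brace idea, to show why scoping matters. The real `_extract_body` is not shown in this commit, so the function below is an assumption about its general shape; the class names `OpenAITransport` and `GeminiTransport` in the sample bundle are hypothetical.

```python
from typing import Optional


def extract_body(source: str, marker: str) -> Optional[str]:
    """Return the text between the first '{' after `marker` and its matching '}'.

    Deliberately naive: braces inside strings or comments are counted too,
    so this is a probe heuristic, not a JavaScript parser.
    """
    at = source.find(marker)
    if at == -1:
        return None
    open_at = source.find("{", at + len(marker))
    if open_at == -1:
        return None
    depth = 0
    for i in range(open_at, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return source[open_at + 1 : i]
    return None  # unbalanced braces


def uses_deprecated_media_shape(body: str) -> bool:
    """True if the body still builds the old `media: { data: ... }` payload."""
    return "media: { data" in body


# Two transports both define sendAudio; only the Gemini one should be checked.
bundle = (
    "class OpenAITransport { sendAudio(b) { this.ws.send({ audio: b }); } }\n"
    "class GeminiTransport { sendAudio(b) { this.s.send({ media: { data: b } }); } }\n"
)
gemini_class = extract_body(bundle, "class GeminiTransport")
gemini_send_audio = extract_body(gemini_class, "sendAudio(")
stale = uses_deprecated_media_shape(gemini_send_audio)
```

Extracting the class body first and the method body second keeps the check from matching the other transport's `sendAudio`, which a plain text search over the whole file could not guarantee.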

Also fixes a leftover path in check_voice_transport from PR #251's
logs/ refactor: src/voice-agent.log → logs/voice-agent.log.

Repro verified: corrupting the dist to reintroduce `media: { data` in
the Gemini sendAudio body makes the probe fail with the expected detail
message and a clear fix instruction.

* chore: bump bodhi-realtime-agent to 33a08c0 (incl. sendAudio/sendFile fixes)

Lockfile was pinned at 26992e3 (bodhi PR #1 only — sendClientContent
text path migration). Bodhi PR #2 (sendAudio media→audio) and PR #3
(sendFile mimeType branching) have been merged on the fork for some
time, but sutando was still running the old dist.

This caused tonight's 1007 regression: client connects, mic audio
frame → `realtime_input.media_chunks is deprecated` → session dies on
every reconnect. The bodhi-dist probe added in this PR catches this
case automatically.

Advance to 33a08c0 (bodhi PR #3 HEAD), which is the current HEAD of
the fork and contains all three compat fixes.
sonichi pushed a commit that referenced this pull request Apr 11, 2026
* Add voice agent observability: events + tool calls + transcript

Track voice agent sessions in data/voice-metrics.jsonl with the same
format as phone agent's call-metrics.jsonl:
- Events: session start/end, tool calls/results, user/assistant speech, errors
- Tool calls: name, duration, timestamp
- Transcript: user and assistant turns

Writes on session end and on shutdown. Diagnosis skill updated to
read both phone and voice metrics for unified analysis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Voice observability: full utterances + 7s user timestamp shift

- Remove 60-char truncation on user/assistant speech in events
- Add 7s backward shift for user timestamps (measured STT pipeline lag
  for voice agent is ~7s vs 12s for phone agent via Twilio)
- Update diagnosis skill: --metrics flag selects input file, source-aware
  labels (Voice Session vs Call), separate tracker HTML per source
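
As an illustration of the timestamp shift, here is a small Python sketch. The lag values (7s voice, 12s phone) come from the bullet above; the event shape (`type` and ISO-8601 `ts` keys) and the `user_speech` type name are assumptions, not the actual metrics schema.

```python
from datetime import datetime, timedelta

# Lags quoted in the commit message, in seconds.
STT_LAG_SECONDS = {"voice": 7, "phone": 12}


def shift_user_timestamps(events, source):
    """Return a copy of `events` with user-speech timestamps moved earlier.

    Only user events are shifted, because the lag comes from the
    speech-to-text pipeline; other events keep their recorded time.
    """
    lag = timedelta(seconds=STT_LAG_SECONDS[source])
    shifted = []
    for event in events:
        event = dict(event)  # copy so the caller's data is untouched
        if event.get("type") == "user_speech":
            ts = datetime.fromisoformat(event["ts"])
            event["ts"] = (ts - lag).isoformat()
        shifted.append(event)
    return shifted
```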

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add call-diagnostics skill to repo

Copy the diagnosis skill from local ~/.claude/skills/ into the repo
so it's tracked. Includes --metrics flag for voice vs phone analysis,
source-aware labels, and unified tracker HTML generation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Refactor diagnose.py: 1057 → 737 lines (30% reduction)

- Extract CSS/chart JS into module constants
- Consolidate repair recommendations with _make_repair helper
- Compact issue detection (single-line dicts, _ts_short helper)
- Use list append + join for HTML generation
- Remove redundant whitespace in HTML output

No change to CLI behavior or output appearance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix duplicate metrics: guard writeVoiceMetrics with metricsWritten flag

onSessionEnd and shutdown() both call writeVoiceMetrics(). Flag prevents
duplicate entries. Reset on session start for reconnect support.
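
The guard can be sketched as an idempotent write with a per-session flag. This Python version only illustrates the pattern; the class and method names are hypothetical stand-ins for the TypeScript `writeVoiceMetrics` / `metricsWritten` pair, and the JSON payload shape is an assumption.

```python
import json


class VoiceMetricsRecorder:
    """Writes at most one metrics line per session, however many callers ask."""

    def __init__(self, write):
        self._write = write  # callable taking one str, e.g. a file's write method
        self._events = []
        self._metrics_written = False

    def on_session_start(self):
        # Reset so that a reconnect produces its own entry.
        self._events = []
        self._metrics_written = False

    def record(self, event):
        self._events.append(event)

    def write_metrics(self):
        """Safe to call from both the session-end hook and shutdown."""
        if self._metrics_written:
            return False
        self._write(json.dumps({"events": self._events}) + "\n")
        self._metrics_written = True
        return True
```

Resetting the flag on session start is what keeps reconnects working: each new session gets exactly one entry, while a session-end call followed by a shutdown call still writes only once.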

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove 60-char truncation from phone agent event data collection

Full utterances stored in call-metrics.jsonl — matches voice agent.
Diagnosis HTML can truncate for display if needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Diagnosis: add -c flag for timeline context around each issue

Shows ±2 surrounding events with >>> marking the issue point.
Helps cross-reference issues with the observability timeline.
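
A minimal sketch of the context window, assuming events are already rendered as strings. The function name and the four-space padding for non-issue lines are assumptions; only the ±2 radius and the `>>>` marker come from the description above.

```python
def format_context(events, issue_index, radius=2):
    """Render events within `radius` of `issue_index`, marking the issue with '>>>'."""
    lo = max(0, issue_index - radius)            # clamp at the start of the timeline
    hi = min(len(events), issue_index + radius + 1)  # clamp at the end
    lines = []
    for i in range(lo, hi):
        prefix = ">>> " if i == issue_index else "    "
        lines.append(f"{prefix}{events[i]}")
    return lines
```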

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix narration drift, diagnostics false positives, get_task_status, test cleanup

- Narration guardrail: "ONLY describe what you SEE" prevents Gemini from
  searching Google during record_screen_with_narration
- Diagnostics: exclude work tool from "failed N times" check (async delegate
  returns 0ms by design), exclude file paths from inline-keyword match
- get_task_status: use in-memory pendingTasks counter instead of scanning
  filesystem (task files get deleted after processing)
- Remove 5 incompatible test stubs (node:test instead of vitest)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Restore test stubs removed in previous commit

These test files use node:test (not vitest) and need migration,
not deletion. Keeping them as placeholders.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix CI: update bodhi dependency to liususan091219 fork

sonichi GitHub account is gone (404). CI fails on npm ci because
it can't clone github:sonichi/bodhi_realtime_agent. Updated to
github:liususan091219/bodhi_realtime_agent (same code, forked
before account went down).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sonichi added a commit that referenced this pull request Apr 11, 2026
* Add voice agent observability + call diagnostics skill (#3)

* Add voice agent observability: events + tool calls + transcript

Track voice agent sessions in data/voice-metrics.jsonl with the same
format as phone agent's call-metrics.jsonl:
- Events: session start/end, tool calls/results, user/assistant speech, errors
- Tool calls: name, duration, timestamp
- Transcript: user and assistant turns

Writes on session end and on shutdown. Diagnosis skill updated to
read both phone and voice metrics for unified analysis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Voice observability: full utterances + 7s user timestamp shift

- Remove 60-char truncation on user/assistant speech in events
- Add 7s backward shift for user timestamps (measured STT pipeline lag
  for voice agent is ~7s vs 12s for phone agent via Twilio)
- Update diagnosis skill: --metrics flag selects input file, source-aware
  labels (Voice Session vs Call), separate tracker HTML per source

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add call-diagnostics skill to repo

Copy the diagnosis skill from local ~/.claude/skills/ into the repo
so it's tracked. Includes --metrics flag for voice vs phone analysis,
source-aware labels, and unified tracker HTML generation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Refactor diagnose.py: 1057 → 737 lines (30% reduction)

- Extract CSS/chart JS into module constants
- Consolidate repair recommendations with _make_repair helper
- Compact issue detection (single-line dicts, _ts_short helper)
- Use list append + join for HTML generation
- Remove redundant whitespace in HTML output

No change to CLI behavior or output appearance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix duplicate metrics: guard writeVoiceMetrics with metricsWritten flag

onSessionEnd and shutdown() both call writeVoiceMetrics(). Flag prevents
duplicate entries. Reset on session start for reconnect support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove 60-char truncation from phone agent event data collection

Full utterances stored in call-metrics.jsonl — matches voice agent.
Diagnosis HTML can truncate for display if needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Diagnosis: add -c flag for timeline context around each issue

Shows ±2 surrounding events with >>> marking the issue point.
Helps cross-reference issues with the observability timeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix narration drift, diagnostics false positives, get_task_status, test cleanup

- Narration guardrail: "ONLY describe what you SEE" prevents Gemini from
  searching Google during record_screen_with_narration
- Diagnostics: exclude work tool from "failed N times" check (async delegate
  returns 0ms by design), exclude file paths from inline-keyword match
- get_task_status: use in-memory pendingTasks counter instead of scanning
  filesystem (task files get deleted after processing)
- Remove 5 incompatible test stubs (node:test instead of vitest)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Restore test stubs removed in previous commit

These test files use node:test (not vitest) and need migration,
not deletion. Keeping them as placeholders.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix CI: update bodhi dependency to liususan091219 fork

sonichi GitHub account is gone (404). CI fails on npm ci because
it can't clone github:sonichi/bodhi_realtime_agent. Updated to
github:liususan091219/bodhi_realtime_agent (same code, forked
before account went down).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Migrate repo references from sonichi to liususan091219 (#4)

sonichi GitHub account is currently down (404). Update all
references to point to liususan091219/sutando and
liususan091219/bodhi_realtime_agent.

This PR can be reverted once the sonichi account is back.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* startup: auto-install fswatch via Homebrew when missing

Instead of just reporting "fswatch not found", startup.sh now
auto-installs it via `brew install fswatch` if Homebrew is available.
Falls back to the original error message if brew is not installed.
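
The fallback order can be sketched as follows. The real logic lives in `startup.sh` (shell); this Python version is only an illustration, and the function name and injectable `which`/`run` parameters are assumptions added so the flow can be exercised without touching the system.

```python
import shutil
import subprocess


def ensure_fswatch(which=shutil.which, run=subprocess.run):
    """Return True if fswatch is available, installing it via Homebrew if possible."""
    if which("fswatch"):
        return True  # already installed, nothing to do
    if not which("brew"):
        print("fswatch not found")  # original error path when brew is missing
        return False
    result = run(["brew", "install", "fswatch"], check=False)
    return result.returncode == 0
```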

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Susan Xueqing Liu <xliu127@stevens.edu>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Chi <wangchi@Chis-Mac-mini.hsd1.wa.comcast.net>