fix(amd): cancel short_greeting timer on late STT transcript#1390

Merged
toubatbrian merged 16 commits into main from claude/quirky-galileo-B4wih
May 7, 2026

Conversation

@toubatbrian
Contributor

@toubatbrian toubatbrian commented May 4, 2026

Summary

Automated port of livekit/agents#5637 (fix(amd): reset timer for late stt transcript) into agents-js.

Note

This is an automated Claude Code Routine created by @toubatbrian. Right now it is in experimentation stage.

cc @toubatbrian @livekit/agent-devs for review.

Stacked on #1368 (the AMD tunable-params + tools port). The base of this PR is claude/quirky-galileo-51AGi; rebased after #1368 picked up review feedback.

Why

The python fix addresses a race in the AMD silence-timer state machine: after a short utterance, AMD pre-bakes a HUMAN/short_greeting verdict that will fire at speechEndedAt + humanSilenceThresholdMs. If the STT transcript happens to arrive inside that window — which is common when STT settles a beat slow — the short-greeting timer would still fire before the LLM had a chance to look at the transcript, causing AMD to settle as HUMAN/short_greeting instead of running the LLM.

The JS implementation (agents/src/voice/amd.ts) had the same bug. This PR ports the python fix.

What was ported

All changes land in agents/src/voice/amd.ts. The python fix introduces a _silence_timer_trigger field tagging the active silence timer as either "short_speech" (pre-baked HUMAN) or "long_speech" (waiting for LLM/timeout). On push_text, if the trigger is "short_speech" the timer is cancelled and replaced with a "long_speech" timer anchored at speech_ended_at + machine_silence_threshold.

JS mirror:

| Python (livekit-agents/livekit/agents/voice/amd/classifier.py) | JS (agents/src/voice/amd.ts) |
| --- | --- |
| _silence_timer_trigger: Literal["short_speech", "long_speech"] \| None | silenceTimerTrigger: 'short_speech' \| 'long_speech' \| undefined |
| _speech_ended_at: float \| None | speechEndedAt: number \| undefined |
| Tagging in on_user_speech_ended (short branch) | Tagging in handleUserStateChanged short branch |
| Tagging in on_user_speech_ended (long branch) | Tagging in handleUserStateChanged long branch |
| Cancel + replace in push_text when trigger == "short_speech" | Cancel + replace in handleTranscript when trigger === 'short_speech' |
| Clear trigger in _silence_timer_callback / on_user_speech_started / close | Clear trigger in onSilenceTimerFired / clearTimer('silence') (covers user-speech-started + cleanup) / resetState |
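The tagging and cancel-and-replace behaviour in the table can be sketched roughly as follows. This is a simplified, hypothetical model rather than the code in agents/src/voice/amd.ts: the class name SilenceTimerSketch, the methods onShortSpeechEnded, onTranscript, cancel and arm, and the fired log are all invented for illustration. Only the field names silenceTimerTrigger and speechEndedAt and the two threshold names come from the PR text.

```typescript
// Hypothetical, simplified model of the trigger tagging. Not the real amd.ts.
type SilenceTimerTrigger = 'short_speech' | 'long_speech';

class SilenceTimerSketch {
  silenceTimerTrigger: SilenceTimerTrigger | undefined = undefined;
  speechEndedAt: number | undefined = undefined;
  // Records which timer actually fired (illustration only).
  readonly fired: SilenceTimerTrigger[] = [];

  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private readonly humanSilenceThresholdMs: number;
  private readonly machineSilenceThresholdMs: number;

  constructor(humanSilenceThresholdMs: number, machineSilenceThresholdMs: number) {
    this.humanSilenceThresholdMs = humanSilenceThresholdMs;
    this.machineSilenceThresholdMs = machineSilenceThresholdMs;
  }

  // A short utterance ended: arm the pre-baked HUMAN/short_greeting timer.
  onShortSpeechEnded(endedAt: number): void {
    this.speechEndedAt = endedAt;
    this.arm('short_speech', endedAt + this.humanSilenceThresholdMs - Date.now());
  }

  // A final transcript arrived: swap a pending short-speech timer for a
  // long-speech one anchored at speechEndedAt, not at the transcript time.
  onTranscript(): void {
    if (this.silenceTimerTrigger !== 'short_speech' || this.speechEndedAt === undefined) {
      return;
    }
    this.arm('long_speech', this.speechEndedAt + this.machineSilenceThresholdMs - Date.now());
  }

  // Cancels any pending timer and resets the tag.
  cancel(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    this.silenceTimerTrigger = undefined;
  }

  private arm(trigger: SilenceTimerTrigger, delayMs: number): void {
    this.cancel();
    this.silenceTimerTrigger = trigger;
    this.timer = setTimeout(() => {
      // Clear handle and tag before recording the outcome.
      this.timer = undefined;
      this.silenceTimerTrigger = undefined;
      this.fired.push(trigger);
    }, Math.max(0, delayMs));
  }
}
```

In this sketch a transcript that arrives while the tag is 'long_speech' or undefined is a no-op for the timer, which is the property the tag exists to provide.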

The replacement long-speech timer is computed against the current wall clock and uses the configurable machineSilenceThresholdMs field exposed by #1368:

const remaining = Math.max(
  0,
  this.speechEndedAt + this.machineSilenceThresholdMs - Date.now(),
);

This preserves the python invariant that the timer fires at speech_ended + machine_silence_threshold, not push_text + machine_silence_threshold.
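As a worked example of that invariant, the arithmetic can be isolated in a small pure function. The helper name remainingUntilDeadline and its explicit now parameter are assumptions made for illustration; the PR computes the value inline with Date.now().

```typescript
// Hypothetical helper; the name and signature are illustrative only.
function remainingUntilDeadline(speechEndedAt: number, thresholdMs: number, now: number): number {
  // Deadline is anchored at the end of speech, then clamped so it never goes negative.
  return Math.max(0, speechEndedAt + thresholdMs - now);
}

// Speech ended at t=1000 ms, the transcript arrives 40 ms later, threshold is 300 ms.
const anchored = remainingUntilDeadline(1000, 300, 1040); // 260, so the timer fires at t=1300

// A transcript that arrives after the deadline clamps to 0 instead of going negative.
const late = remainingUntilDeadline(1000, 300, 1500); // 0
```

Anchoring at the transcript time instead would have scheduled the first case 300 ms out, at t=1340, which is the drift the invariant rules out.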

The new silenceTimerTrigger is also set on the existing-transcript short-speech branch (added in #1368) and on the long-speech branch, so future push_text calls correctly identify which timer is active.

What was intentionally not ported

| Python change | JS status | Reason |
| --- | --- | --- |
| New tests/test_amd_classifier.py (250 lines, exercises the python _make_classifier with custom thresholds) | Adapted to JS | Now that #1368 exposes humanSilenceThresholdMs / machineSilenceThresholdMs as constructor options, the new JS tests use shorter custom thresholds (100 ms / 300 ms) so the suite stays fast. The python invariants (timer cancel, long-speech replacement, short-greeting still fires when no transcript arrives) are covered. |
| makefile change to add tests/test_amd_classifier.py to the unit-test suite | Skipped | JS uses vitest test discovery; the new tests are co-located in amd.test.ts and are picked up automatically. |

Implementation nuances

  • Wall clock vs event timestamp. The python code calls time.time() and stores _speech_ended_at as an epoch float, then schedules call_later(remaining) where remaining = (_speech_ended_at + machine_silence_threshold) - time.time(). JS uses Date.now() and stores speechEndedAt from UserStateChangedEvent.createdAt (also Date.now()-based, see agents/src/voice/events.ts). The arithmetic is identical.
  • Timer-fired hygiene. Python clears _silence_timer and _silence_timer_trigger at the top of _silence_timer_callback. JS does the same in onSilenceTimerFired so a push_text arriving after the timer has fired (but before tryEmitResult settles the run) doesn't see a stale 'short_speech' tag.
  • clearTimer('silence') resets the trigger. Centralising the trigger reset in clearTimer covers all the python "cancel + null" sites (on_user_speech_started, the cancel-before-replace inside push_text, close, and the postpone-termination path from #1368).
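The last two points can be illustrated with a minimal sketch. The method names onSilenceTimerFired and clearTimer follow the PR text, but their bodies, the setSilenceTimer helper, and the TimerHygieneSketch class are assumptions and may differ from the actual implementation.

```typescript
// Hypothetical sketch; method names follow the PR text, bodies are assumed.
type SilenceTimerTrigger = 'short_speech' | 'long_speech';

class TimerHygieneSketch {
  silenceTimerTrigger: SilenceTimerTrigger | undefined = undefined;
  private silenceTimer: ReturnType<typeof setTimeout> | undefined = undefined;

  // Illustrative helper: arms the silence timer and tags it.
  setSilenceTimer(
    trigger: SilenceTimerTrigger,
    delayMs: number,
    onFired: (trigger: SilenceTimerTrigger) => void,
  ): void {
    this.clearTimer('silence'); // cancel-before-replace goes through the same path
    this.silenceTimerTrigger = trigger;
    this.silenceTimer = setTimeout(() => this.onSilenceTimerFired(onFired), Math.max(0, delayMs));
  }

  // Clears the handle and the tag first, so anything that runs afterwards
  // (including a late transcript) cannot observe a stale 'short_speech' tag.
  onSilenceTimerFired(onFired: (trigger: SilenceTimerTrigger) => void): void {
    const trigger = this.silenceTimerTrigger;
    this.silenceTimer = undefined;
    this.silenceTimerTrigger = undefined;
    if (trigger !== undefined) onFired(trigger);
  }

  // Single reset point: every cancel site resets the tag by calling this.
  clearTimer(kind: 'silence'): void {
    if (kind !== 'silence') return;
    if (this.silenceTimer !== undefined) clearTimeout(this.silenceTimer);
    this.silenceTimer = undefined;
    this.silenceTimerTrigger = undefined;
  }
}
```

Keeping the reset inside clearTimer means a new cancel site cannot forget to clear the tag, at the cost of coupling the tag's lifetime to the timer handle.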

Test plan

Two new unit tests in agents/src/voice/amd.test.ts (using the configurable thresholds from #1368, so total real-time cost is sub-second):

  • should not fire short_greeting when a transcript arrives late — emits a 50 ms speech-ended transition, waits 40 ms (still inside the 100 ms HUMAN window), then emits a final transcript. Asserts the resolved verdict has reason === 'llm-verified' (from the StaticLLM), not 'short_greeting'.
  • should still fire short_greeting when no transcript arrives — emits the same speech transition with no transcript and asserts the resolved verdict has reason === 'short_greeting'. Guards against accidentally regressing the HUMAN heuristic.

Local verification:

  • pnpm --filter @livekit/agents build — passes
  • pnpm exec eslint agents/src/voice/amd.ts agents/src/voice/amd.test.ts — 0 errors / 0 warnings
  • pnpm exec prettier --check agents/src/voice/amd.ts agents/src/voice/amd.test.ts .changeset/amd-late-stt-cancel-short-greeting.md — clean
  • pnpm exec vitest run agents/src/voice/amd.test.ts: 8/8 pass (6 existing + 2 new)
  • Manual smoke test against a real SIP call (left to reviewer with phone-number infra)

Changeset

patch for @livekit/agents (per the routine's standing instructions).


Generated by Claude Code

claude and others added 6 commits May 1, 2026 12:06
Ports python livekit/agents#5584 (AMD improvement) into agents-js.

- Expose `humanSpeechThresholdMs`, `humanSilenceThresholdMs`,
  `machineSilenceThresholdMs`, and `prompt` as `AMDOptions` fields.
- Defer to the LLM (instead of forcing HUMAN) when a transcript is
  already available after a short greeting.
- Add `postpone_termination` LLM tool (capped at 3 extensions × 10s)
  alongside `save_prediction`; fall back to JSON-content parsing when
  the LLM does not emit tool calls.
- Add `participantIdentity` and `suppressCompatibilityWarning` options.
- Warn once when the resolved LLM is not in `EVALUATED_LLM_MODELS`.

Skipped (architectural divergence — see PR description): dedicated AMD
STT pipeline, track-subscription wait, and the `start()` /
`start_timers()` lifecycle split.

- Gate `save_prediction` and `postpone_termination` tool side effects on
  the current `detectGeneration`. Stale in-flight classifications now
  no-op instead of mutating timers, budget, or capturing a verdict that
  belongs to a superseded transcript window.
- Normalize `save_prediction`'s `label` argument through `parseCategory`
  before storing, so an off-enum value from a misbehaving LLM (or our
  manual JSON path that bypasses Zod) is treated as UNCERTAIN rather
  than producing an `AMDResult` with an invalid category string.
- Fix `warnIfNotEvaluated` substring check to also handle date-suffixed
  model names (e.g. `openai/gpt-4.1-mini-2025-04-14`).

Without this, a postpone_termination tool call resolved after aclose()
would still see isStale() === false (settled was never flipped) and
install a fresh silenceTimer that survives cleanup, eventually firing
scheduleLLMClassification + tryEmitResult and potentially triggering
session.interrupt on a closed AMD.

Without a lower bound and NaN guard, a misbehaving LLM passing a
negative or non-numeric `seconds` argument would compute a clampedMs
of NaN or a negative number, which setTimeout treats as 0 and fires
immediately. The manual tool-execution path here bypasses the Zod
schema, so this defense lives in execute().
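
A guard of the kind this commit message describes might look like the sketch below. The function name clampExtensionMs and the choice to reject invalid input by returning undefined are assumptions; the commit only states that a lower bound and a NaN guard were added. The 10 s cap is taken from the "3 extensions × 10s" note in the earlier commit.

```typescript
// Hypothetical guard; how the real execute() handles invalid input is not shown in the PR.
const MAX_EXTENSION_MS = 10_000; // per-extension cap, from "3 extensions × 10s"

function clampExtensionMs(seconds: unknown): number | undefined {
  // The manual tool path bypasses schema validation, so check the type at runtime.
  if (typeof seconds !== 'number' || !Number.isFinite(seconds) || seconds <= 0) {
    return undefined; // reject instead of letting setTimeout see NaN or a negative delay
  }
  return Math.min(seconds * 1000, MAX_EXTENSION_MS);
}
```

With this sketch, clampExtensionMs(5) yields 5000, clampExtensionMs(60) is capped at 10000, and negative, NaN, or string arguments yield undefined so the caller can skip rescheduling.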
@changeset-bot

changeset-bot Bot commented May 4, 2026

🦋 Changeset detected

Latest commit: a2c8caa

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 29 packages
Name Type
@livekit/agents Major
@livekit/agents-plugin-anam Major
@livekit/agents-plugin-assemblyai Major
@livekit/agents-plugin-baseten Major
@livekit/agents-plugin-bey Major
@livekit/agents-plugin-cartesia Major
@livekit/agents-plugin-cerebras Major
@livekit/agents-plugin-deepgram Major
@livekit/agents-plugin-elevenlabs Major
@livekit/agents-plugin-google Major
@livekit/agents-plugin-hedra Major
@livekit/agents-plugin-inworld Major
@livekit/agents-plugin-lemonslice Major
@livekit/agents-plugin-liveavatar Major
@livekit/agents-plugin-livekit Major
@livekit/agents-plugin-minimax Major
@livekit/agents-plugin-mistral Major
@livekit/agents-plugin-mistralai Major
@livekit/agents-plugin-neuphonic Major
@livekit/agents-plugin-openai Major
@livekit/agents-plugin-phonic Major
@livekit/agents-plugin-resemble Major
@livekit/agents-plugin-rime Major
@livekit/agents-plugin-runway Major
@livekit/agents-plugin-sarvam Major
@livekit/agents-plugin-silero Major
@livekit/agents-plugins-test Major
@livekit/agents-plugin-trugen Major
@livekit/agents-plugin-xai Major

@CLAassistant

CLAassistant commented May 4, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ toubatbrian
❌ claude

chatgpt-codex-connector[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@toubatbrian toubatbrian changed the base branch from main to claude/quirky-galileo-51AGi May 5, 2026 05:40
@toubatbrian toubatbrian force-pushed the claude/quirky-galileo-B4wih branch 2 times, most recently from aba6b80 to 95ddc42 Compare May 5, 2026 06:00
Port of livekit/agents#5637. When a final STT transcript arrives inside
the short-speech HUMAN_SILENCE_THRESHOLD window, cancel the pre-baked
HUMAN/short_greeting silence timer and replace it with a long_speech
timer anchored at speechEndedAt + MACHINE_SILENCE_THRESHOLD_MS so the
LLM verdict gets the final word.

https://claude.ai/code/session_017SqU9Zxmo439ZtcdwzKZp9
@toubatbrian toubatbrian force-pushed the claude/quirky-galileo-B4wih branch from 95ddc42 to 15c346a Compare May 5, 2026 09:30
- added SIP code in the example;
- added support for separate STT;
- added support for participant wait;
- added default models
Base automatically changed from claude/quirky-galileo-51AGi to main May 6, 2026 06:42
chenghao-mou and others added 3 commits May 6, 2026 09:46
Port of livekit/agents#5637. When a final STT transcript arrives inside
the short-speech HUMAN_SILENCE_THRESHOLD window, cancel the pre-baked
HUMAN/short_greeting silence timer and replace it with a long_speech
timer anchored at speechEndedAt + MACHINE_SILENCE_THRESHOLD_MS so the
LLM verdict gets the final word.

https://claude.ai/code/session_017SqU9Zxmo439ZtcdwzKZp9
@chenghao-mou chenghao-mou force-pushed the claude/quirky-galileo-B4wih branch from 15c346a to 4027e25 Compare May 6, 2026 15:02
@toubatbrian toubatbrian merged commit 6df9c28 into main May 7, 2026
8 of 9 checks passed
@toubatbrian toubatbrian deleted the claude/quirky-galileo-B4wih branch May 7, 2026 01:26
@github-actions github-actions Bot mentioned this pull request May 6, 2026