fix(amd): cancel short_greeting timer on late STT transcript by toubatbrian · Pull Request #1390 · livekit/agents-js

toubatbrian · 2026-05-04T20:15:06Z

Summary

Automated port of livekit/agents#5637 (fix(amd): reset timer for late stt transcript) into agents-js.

Note

This is an automated Claude Code Routine created by @toubatbrian. It is currently at an experimental stage.

cc @toubatbrian @livekit/agent-devs for review.

Stacked on #1368 (the AMD tunable-params + tools port). The base of this PR is claude/quirky-galileo-51AGi; rebased after #1368 picked up review feedback.

Why

The python fix addresses a race in the AMD silence-timer state machine: after a short utterance, AMD pre-bakes a HUMAN/short_greeting verdict that will fire at speechEndedAt + humanSilenceThresholdMs. If the STT transcript happens to arrive inside that window — which is common when STT settles a beat slow — the short-greeting timer would still fire before the LLM had a chance to look at the transcript, causing AMD to settle as HUMAN/short_greeting instead of running the LLM.

The JS implementation (agents/src/voice/amd.ts) had the same bug. This PR ports the python fix.

What was ported

All changes land in agents/src/voice/amd.ts. The python fix introduces a _silence_timer_trigger field tagging the active silence timer as either "short_speech" (pre-baked HUMAN) or "long_speech" (waiting for LLM/timeout). On push_text, if the trigger is "short_speech" the timer is cancelled and replaced with a "long_speech" timer anchored at speech_ended_at + machine_silence_threshold.

JS mirror:

Python (`livekit-agents/livekit/agents/voice/amd/classifier.py`)	JS (`agents/src/voice/amd.ts`)
`_silence_timer_trigger: Literal["short_speech", "long_speech"] | None`	`silenceTimerTrigger: 'short_speech' | 'long_speech' | undefined`
`_speech_ended_at: float | None`	`speechEndedAt: number | undefined`
Tagging in `on_user_speech_ended` (short branch)	Tagging in `handleUserStateChanged` short branch
Tagging in `on_user_speech_ended` (long branch)	Tagging in `handleUserStateChanged` long branch
Cancel + replace in `push_text` when trigger == `"short_speech"`	Cancel + replace in `handleTranscript` when trigger === `'short_speech'`
Clear trigger in `_silence_timer_callback` / `on_user_speech_started` / `close`	Clear trigger in `onSilenceTimerFired` / `clearTimer('silence')` (covers user-speech-started + cleanup) / `resetState`

The replacement long-speech timer is computed against the current wall clock and uses the configurable machineSilenceThresholdMs field exposed by #1368:

const remaining = Math.max(
  0,
  this.speechEndedAt + this.machineSilenceThresholdMs - Date.now(),
);

This preserves the python invariant that the timer fires at speech_ended + machine_silence_threshold, not push_text + machine_silence_threshold.

The new silenceTimerTrigger is also set on the existing-transcript short-speech branch (added in #1368) and on the long-speech branch, so future push_text calls correctly identify which timer is active.
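
The tagging and cancel-and-replace flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in amd.ts: the class SilenceTimerSketch, the methods onShortSpeechEnded and onTranscript, the fired log, and the injectable now clock are hypothetical names made up for the example. Only silenceTimerTrigger, speechEndedAt, and the two threshold names come from the PR description.

```typescript
// Hypothetical sketch: names follow the PR description, not the real amd.ts source.
type SilenceTimerTrigger = 'short_speech' | 'long_speech' | undefined;

class SilenceTimerSketch {
  silenceTimerTrigger: SilenceTimerTrigger = undefined;
  speechEndedAt: number | undefined = undefined;
  // Records which kind of timer actually fired (illustration only).
  readonly fired: string[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private readonly humanSilenceThresholdMs: number;
  private readonly machineSilenceThresholdMs: number;
  private readonly now: () => number;

  constructor(
    humanSilenceThresholdMs: number,
    machineSilenceThresholdMs: number,
    now: () => number = () => Date.now(), // injectable clock (assumption, for testing)
  ) {
    this.humanSilenceThresholdMs = humanSilenceThresholdMs;
    this.machineSilenceThresholdMs = machineSilenceThresholdMs;
    this.now = now;
  }

  // A short utterance ended: arm the pre-baked HUMAN/short_greeting timer.
  onShortSpeechEnded(endedAt: number): void {
    this.speechEndedAt = endedAt;
    this.arm('short_speech', this.humanSilenceThresholdMs);
  }

  // A final transcript arrived: if the short-speech timer is still pending,
  // swap it for a long-speech timer anchored at speechEndedAt.
  onTranscript(): void {
    if (this.silenceTimerTrigger !== 'short_speech' || this.speechEndedAt === undefined) {
      return;
    }
    const remaining = Math.max(
      0,
      this.speechEndedAt + this.machineSilenceThresholdMs - this.now(),
    );
    this.arm('long_speech', remaining);
  }

  // Single place that cancels the timer and resets the tag.
  clearTimer(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    this.silenceTimerTrigger = undefined;
  }

  private arm(trigger: 'short_speech' | 'long_speech', delayMs: number): void {
    this.clearTimer();
    this.silenceTimerTrigger = trigger;
    this.timer = setTimeout(() => {
      // Clear state first so a late transcript never sees a stale tag.
      this.timer = undefined;
      this.silenceTimerTrigger = undefined;
      this.fired.push(trigger);
    }, delayMs);
  }
}
```

Injecting the clock keeps the remaining-delay arithmetic deterministic in tests, and routing every cancel through clearTimer keeps the tag and the timer handle from drifting apart.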

What was intentionally not ported

Python change	JS status	Reason
New `tests/test_amd_classifier.py` (250 lines, exercises the python `_make_classifier` with custom thresholds)	Adapted to JS	Now that #1368 exposes `humanSilenceThresholdMs` / `machineSilenceThresholdMs` as constructor options, the new JS tests use shorter custom thresholds (100 ms / 300 ms) so the suite stays fast. The python invariants (timer cancel, long-speech replacement, short-greeting still fires when no transcript arrives) are covered.
`makefile` change to add `tests/test_amd_classifier.py` to the unit-test suite	Skipped	JS uses `vitest` test discovery; the new tests are co-located in `amd.test.ts` and are picked up automatically.

Implementation nuances

Wall clock vs event timestamp. The python code calls time.time() and stores _speech_ended_at as an epoch float, then schedules call_later(remaining) where remaining = (_speech_ended_at + machine_silence_threshold) - time.time(). JS uses Date.now() and stores speechEndedAt from UserStateChangedEvent.createdAt (also Date.now()-based, see agents/src/voice/events.ts). The arithmetic is identical.
Timer-fired hygiene. Python clears _silence_timer and _silence_timer_trigger at the top of _silence_timer_callback. JS does the same in onSilenceTimerFired so a push_text arriving after the timer has fired (but before tryEmitResult settles the run) doesn't see a stale 'short_speech' tag.
clearTimer('silence') resets the trigger. Centralising the trigger reset in clearTimer covers all the python "cancel + null" sites (on_user_speech_started, the cancel-before-replace inside push_text, close, and the postpone-termination path added in #1368, "feat(amd): port tunable params and postpone-termination tool from python").

Test plan

Two new unit tests in agents/src/voice/amd.test.ts (using the configurable thresholds from #1368, so total real-time cost is sub-second):

should not fire short_greeting when a transcript arrives late — emits a 50 ms speech-ended transition, waits 40 ms (still inside the 100 ms HUMAN window), then emits a final transcript. Asserts the resolved verdict has reason === 'llm-verified' (from the StaticLLM), not 'short_greeting'.
should still fire short_greeting when no transcript arrives — emits the same speech transition with no transcript and asserts the resolved verdict has reason === 'short_greeting'. Guards against accidentally regressing the HUMAN heuristic.
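
As a worked example of the timing these two tests rely on, the deadlines can be computed by hand. The helper below is a hypothetical illustration (silenceDeadlines is not a function in amd.ts), and it assumes the transcript lands exactly 40 ms after speechEndedAt with the 100 ms / 300 ms thresholds quoted above.

```typescript
// Hypothetical helper for illustration only; not part of amd.ts.
interface SilenceDeadlines {
  shortGreetingAt: number; // when the pre-baked HUMAN timer would fire
  longSpeechAt: number; // when the replacement long-speech timer fires
  transcriptInWindow: boolean; // did the transcript beat the short-greeting timer?
  remainingMs: number; // delay used when re-arming at transcript time
}

function silenceDeadlines(
  speechEndedAt: number,
  humanSilenceThresholdMs: number,
  machineSilenceThresholdMs: number,
  transcriptAt: number,
): SilenceDeadlines {
  const shortGreetingAt = speechEndedAt + humanSilenceThresholdMs;
  const longSpeechAt = speechEndedAt + machineSilenceThresholdMs;
  return {
    shortGreetingAt,
    longSpeechAt,
    transcriptInWindow: transcriptAt < shortGreetingAt,
    // Anchored at speech end, so the delay shrinks by the time already elapsed.
    remainingMs: Math.max(0, longSpeechAt - transcriptAt),
  };
}

// Speech ends at t = 0, transcript arrives at t = 40 ms.
const timeline = silenceDeadlines(0, 100, 300, 40);
console.log(timeline.remainingMs); // 260
```

With these numbers the transcript arrives 60 ms before the short-greeting deadline, and the replacement timer still fires at t = 300 ms rather than t = 340 ms.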

Local verification:

pnpm --filter @livekit/agents build — passes
pnpm exec eslint agents/src/voice/amd.ts agents/src/voice/amd.test.ts — 0 errors / 0 warnings
pnpm exec prettier --check agents/src/voice/amd.ts agents/src/voice/amd.test.ts .changeset/amd-late-stt-cancel-short-greeting.md — clean
pnpm exec vitest run agents/src/voice/amd.test.ts — 8/8 pass (6 existing + 2 new)
Manual smoke test against a real SIP call (left to reviewer with phone-number infra)

Changeset

patch for @livekit/agents (per the routine's standing instructions).

Generated by Claude Code

Ports python livekit/agents#5584 (AMD improvement) into agents-js.

- Expose `humanSpeechThresholdMs`, `humanSilenceThresholdMs`, `machineSilenceThresholdMs`, and `prompt` as `AMDOptions` fields.
- Defer to the LLM (instead of forcing HUMAN) when a transcript is already available after a short greeting.
- Add `postpone_termination` LLM tool (capped at 3 extensions × 10s) alongside `save_prediction`; fall back to JSON-content parsing when the LLM does not emit tool calls.
- Add `participantIdentity` and `suppressCompatibilityWarning` options.
- Warn once when the resolved LLM is not in `EVALUATED_LLM_MODELS`.

Skipped (architectural divergence; see PR description): dedicated AMD STT pipeline, track-subscription wait, and the `start()` / `start_timers()` lifecycle split.

- Gate `save_prediction` and `postpone_termination` tool side effects on the current `detectGeneration`. Stale in-flight classifications now no-op instead of mutating timers, budget, or capturing a verdict that belongs to a superseded transcript window.
- Normalize `save_prediction`'s `label` argument through `parseCategory` before storing, so an off-enum value from a misbehaving LLM (or our manual JSON path that bypasses Zod) is treated as UNCERTAIN rather than producing an `AMDResult` with an invalid category string.
- Fix `warnIfNotEvaluated` substring check to also handle date-suffixed model names (e.g. `openai/gpt-4.1-mini-2025-04-14`).
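
One way the date-suffix handling could look is sketched below. This is an assumption about the approach, not the actual `warnIfNotEvaluated` code: `isEvaluatedModel`, `EVALUATED_MODELS_SKETCH`, and the single list entry are hypothetical, since the real contents of `EVALUATED_LLM_MODELS` are not shown in this PR.

```typescript
// Hypothetical list; the real EVALUATED_LLM_MODELS contents are not shown here.
const EVALUATED_MODELS_SKETCH: readonly string[] = ['openai/gpt-4.1-mini'];

// Matches exactly a trailing date suffix such as "-2025-04-14".
const DATE_SUFFIX = /^-\d{4}-\d{2}-\d{2}$/;

// True when `model` is a known name, optionally followed by a date suffix.
function isEvaluatedModel(
  model: string,
  evaluated: readonly string[] = EVALUATED_MODELS_SKETCH,
): boolean {
  return evaluated.some((known) => {
    if (model === known) return true;
    // Accept "<known>-YYYY-MM-DD" but not arbitrary longer names.
    return model.startsWith(known) && DATE_SUFFIX.test(model.slice(known.length));
  });
}

console.log(isEvaluatedModel('openai/gpt-4.1-mini-2025-04-14')); // true
console.log(isEvaluatedModel('openai/gpt-4.1-minimal')); // false
```

Anchoring the suffix pattern avoids the false positive a plain prefix check would give for unrelated names that merely start with an evaluated model name.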

Without this, a postpone_termination tool call resolved after aclose() would still see isStale() === false (settled was never flipped) and install a fresh silenceTimer that survives cleanup, eventually firing scheduleLLMClassification + tryEmitResult and potentially triggering session.interrupt on a closed AMD.
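
The staleness gate described in the two messages above can be sketched as a generation counter combined with a settled flag. This is an illustration under assumptions: `StaleGateSketch`, `beginClassification`, `close`, `applyToolCall`, and the `applied` log are hypothetical names; only `detectGeneration`, `settled`, and `isStale()` are taken from the text, and the real `isStale()` signature may differ.

```typescript
// Hypothetical sketch of gating tool side effects on generation and shutdown.
class StaleGateSketch {
  private detectGeneration = 0;
  private settled = false;
  // Names of tool calls whose side effects were allowed through.
  readonly applied: string[] = [];

  // A new transcript window starts: bump the generation and hand the
  // token to the in-flight classification.
  beginClassification(): number {
    this.detectGeneration += 1;
    return this.detectGeneration;
  }

  // Shutdown flips `settled`, so tool calls resolving later become no-ops.
  close(): void {
    this.settled = true;
  }

  isStale(generation: number): boolean {
    return this.settled || generation !== this.detectGeneration;
  }

  // Applies the side effect only for the current, unsettled generation.
  applyToolCall(generation: number, toolName: string): boolean {
    if (this.isStale(generation)) return false;
    this.applied.push(toolName);
    return true;
  }
}

const gate = new StaleGateSketch();
const first = gate.beginClassification();
const second = gate.beginClassification();
console.log(gate.applyToolCall(first, 'save_prediction')); // false (superseded)
console.log(gate.applyToolCall(second, 'postpone_termination')); // true
gate.close();
console.log(gate.applyToolCall(second, 'postpone_termination')); // false (closed)
```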

Without a lower bound and NaN guard, a misbehaving LLM passing a negative or non-numeric `seconds` argument would compute a clampedMs of NaN or a negative number, which setTimeout treats as 0 and fires immediately. The manual tool-execution path here bypasses the Zod schema, so this defense lives in execute().
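
A guard along these lines might look like the sketch below. The 10 s cap comes from the "3 extensions × 10s" note above. The 1 s floor, the choice to fall back to the floor for non-numeric input, and the name `clampPostponeMs` are assumptions for illustration and may not match what `execute()` actually does.

```typescript
const MIN_POSTPONE_SECONDS = 1; // assumption: the real lower bound is not stated
const MAX_POSTPONE_SECONDS = 10; // from "capped at 3 extensions × 10s"

// Converts an untrusted `seconds` argument into a safe setTimeout delay in ms.
function clampPostponeMs(seconds: unknown): number {
  const n = typeof seconds === 'number' ? seconds : Number.NaN;
  // NaN and ±Infinity would otherwise flow into setTimeout unchecked.
  if (!Number.isFinite(n)) return MIN_POSTPONE_SECONDS * 1000;
  const clamped = Math.min(MAX_POSTPONE_SECONDS, Math.max(MIN_POSTPONE_SECONDS, n));
  return clamped * 1000;
}

console.log(clampPostponeMs(4)); // 4000
console.log(clampPostponeMs(-5)); // 1000
console.log(clampPostponeMs(60)); // 10000
console.log(clampPostponeMs('abc')); // 1000
```

Taking `unknown` rather than `number` reflects that the manual tool-execution path receives arguments that have not passed through schema validation.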

changeset-bot · 2026-05-04T20:15:11Z

🦋 Changeset detected

Latest commit: a2c8caa

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 29 packages

Name	Type
@livekit/agents	Major
@livekit/agents-plugin-anam	Major
@livekit/agents-plugin-assemblyai	Major
@livekit/agents-plugin-baseten	Major
@livekit/agents-plugin-bey	Major
@livekit/agents-plugin-cartesia	Major
@livekit/agents-plugin-cerebras	Major
@livekit/agents-plugin-deepgram	Major
@livekit/agents-plugin-elevenlabs	Major
@livekit/agents-plugin-google	Major
@livekit/agents-plugin-hedra	Major
@livekit/agents-plugin-inworld	Major
@livekit/agents-plugin-lemonslice	Major
@livekit/agents-plugin-liveavatar	Major
@livekit/agents-plugin-livekit	Major
@livekit/agents-plugin-minimax	Major
@livekit/agents-plugin-mistral	Major
@livekit/agents-plugin-mistralai	Major
@livekit/agents-plugin-neuphonic	Major
@livekit/agents-plugin-openai	Major
@livekit/agents-plugin-phonic	Major
@livekit/agents-plugin-resemble	Major
@livekit/agents-plugin-rime	Major
@livekit/agents-plugin-runway	Major
@livekit/agents-plugin-sarvam	Major
@livekit/agents-plugin-silero	Major
@livekit/agents-plugins-test	Major
@livekit/agents-plugin-trugen	Major
@livekit/agents-plugin-xai	Major

CLAassistant · 2026-05-04T20:15:13Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ toubatbrian
❌ claude

Port of livekit/agents#5637. When a final STT transcript arrives inside the short-speech HUMAN_SILENCE_THRESHOLD window, cancel the pre-baked HUMAN/short_greeting silence timer and replace it with a long_speech timer anchored at speechEndedAt + MACHINE_SILENCE_THRESHOLD_MS so the LLM verdict gets the final word. https://claude.ai/code/session_017SqU9Zxmo439ZtcdwzKZp9

claude and others added 6 commits May 1, 2026 12:06

Merge branch 'main' into claude/quirky-galileo-51AGi

5bd733f

Update amd.ts

76de2fe

toubatbrian changed the base branch from main to claude/quirky-galileo-51AGi May 5, 2026 05:40

toubatbrian force-pushed the claude/quirky-galileo-B4wih branch 2 times, most recently from aba6b80 to 95ddc42 Compare May 5, 2026 06:00

chenghao-mou approved these changes May 5, 2026

chenghao-mou reviewed May 5, 2026

Comment thread agents/src/voice/amd.ts Outdated

toubatbrian force-pushed the claude/quirky-galileo-B4wih branch from 95ddc42 to 15c346a Compare May 5, 2026 09:30

chenghao-mou added 4 commits May 5, 2026 14:19

feat(amd): feature parity with Python AMD implementation

d49ffd0

- added SIP code in the example
- added support for separate STT
- added support for participant wait
- added default models

prepare for AMD remote session event

718750e

address comments

1b96350

add remote session and protocol bump

18eb974

Base automatically changed from claude/quirky-galileo-51AGi to main May 6, 2026 06:42

chenghao-mou and others added 3 commits May 6, 2026 09:46

address comments

f4deb46

update branching example

8e1f6ce

chenghao-mou force-pushed the claude/quirky-galileo-B4wih branch from 15c346a to 4027e25 Compare May 6, 2026 15:02

chenghao-mou added 2 commits May 6, 2026 16:12

Merge branch 'claude/quirky-galileo-B4wih' into chenghao/feat/amd-sip…

9a24e2c

…-and-stt-support

feat(amd): feature parity with Python AMD implementation (#1394)

a2c8caa

toubatbrian merged commit 6df9c28 into main May 7, 2026
8 of 9 checks passed

toubatbrian deleted the claude/quirky-galileo-B4wih branch May 7, 2026 01:26

github-actions Bot mentioned this pull request May 6, 2026

Version Packages #1401

Merged

toubatbrian merged 16 commits into main from claude/quirky-galileo-B4wih

4 participants