fix(voice): skip stale end-of-turn metrics #1803

Merged: chenghao-mou merged 1 commit into 1.5.0 from pansy-gritting-tuning on Jun 16, 2026

rosetta-livekit-bot (bot) commented on Jun 16, 2026

Summary

Testing

  • pnpm --filter @livekit/agents build:types
  • pnpm --filter @livekit/agents lint (passes with existing warnings)
  • pnpm vitest run agents/src/voice/audio_recognition_span.test.ts agents/src/voice/audio_recognition_eou.test.ts
  • pnpm prettier --check agents/src/voice/audio_recognition.ts agents/src/voice/agent_activity.ts

Notes

  • Did not add the new source tests, per porting request.
  • pnpm --filter @livekit/agents api:check is blocked by an existing API Extractor parser error on export * as ___ in agents/dist/index.d.ts.

Ported from livekit/agents#6098

Original PR description

Summary

Fixes #6093.

For some user turns, the reported transcription_delay / end_of_turn_delay metrics on ChatMessage are extremely large (often >200s) even though the session recordings show no real delay, and stopped_speaking_at can precede started_speaking_at. In other turns the fields are missing entirely.

Root cause

The metrics are computed in _bounce_eou_task (audio_recognition.py) from three captured anchors:

started_speaking_at  = speech_start_time
stopped_speaking_at  = last_speaking_time           # the internal _last_speaking_time anchor
transcription_delay  = max(last_final_transcript_time - last_speaking_time, 0)
end_of_turn_delay    = time.time() - last_speaking_time

The guard around this block only checked that the three values are not None. When the turn detector commits a user turn whose _last_speaking_time was never refreshed for that segment — e.g. consecutive same-role turns split from one continuous utterance, with no VAD speech-stop/start cycle between them — the anchor is left over from an earlier point in the session and can predate the start of the current turn.

In that case the not-None guard still passes, so end_of_turn_delay = now - last_speaking_time becomes ~200s and stopped_speaking_at ends up before started_speaking_at, exactly the payload reported in the issue.
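The failure mode can be reproduced with plain arithmetic. The sketch below mirrors the pre-fix formulas quoted above; the function name `naive_end_of_turn_metrics` and all timestamps are hypothetical and chosen only for illustration, they are not taken from audio_recognition.py or from the reported payload.

```python
from typing import Dict, Optional


def naive_end_of_turn_metrics(
    speech_start_time: Optional[float],
    last_speaking_time: Optional[float],
    last_final_transcript_time: Optional[float],
    now: float,
) -> Optional[Dict[str, float]]:
    """Pre-fix behaviour (hypothetical stand-in): only a not-None guard."""
    if (
        speech_start_time is None
        or last_speaking_time is None
        or last_final_transcript_time is None
    ):
        return None
    return {
        "started_speaking_at": speech_start_time,
        "stopped_speaking_at": last_speaking_time,
        "transcription_delay": max(last_final_transcript_time - last_speaking_time, 0),
        "end_of_turn_delay": now - last_speaking_time,
    }


# Illustrative timestamps in seconds. The anchor was last refreshed at t=1000.0,
# but the turn being committed started much later, at t=1219.5.
metrics = naive_end_of_turn_metrics(
    speech_start_time=1219.5,
    last_speaking_time=1000.0,  # stale: left over from an earlier segment
    last_final_transcript_time=1220.0,
    now=1220.25,
)
print(metrics)
```

With these inputs the guard passes, `end_of_turn_delay` comes out as 220.25 seconds, and `stopped_speaking_at` precedes `started_speaking_at`, which is the shape of the symptom described in the issue.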

This is the same class of bug noted in #2361 / #5669 / #4388 (stale/0 anchor), now manifesting as an out-of-order anchor on adjacent turns within one long utterance.

Fix

An anchor that predates the start of the turn (last_speaking_time < speech_start_time) is logically impossible — you cannot stop speaking before the turn started. The existing code already has a policy for unreliable timing (see the in-code comment): skip the calculation and report the metrics as None, because that is better than emitting a likely-wrong value. This change extends that same policy to the out-of-order case.

The computation is extracted into a small pure helper, _compute_end_of_turn_metrics, which:

  • returns None for all four metrics when any anchor is missing or when last_speaking_time < speech_start_time (stale/out-of-order), and
  • otherwise returns the same values as before (with end_of_turn_delay now clamped to >= 0, consistent with the existing transcription_delay clamp).

This makes the behaviour directly unit-testable without audio/STT/VAD.
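A minimal sketch of what such a pure helper could look like, based only on the two rules listed above. The real `_compute_end_of_turn_metrics` signature and return type are not shown in this PR, so the `EndOfTurnMetrics` dataclass, the parameter names, and the explicit `now` argument are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EndOfTurnMetrics:
    """Hypothetical container; every field defaults to None (metric skipped)."""

    started_speaking_at: Optional[float] = None
    stopped_speaking_at: Optional[float] = None
    transcription_delay: Optional[float] = None
    end_of_turn_delay: Optional[float] = None


def compute_end_of_turn_metrics(
    speech_start_time: Optional[float],
    last_speaking_time: Optional[float],
    last_final_transcript_time: Optional[float],
    now: float,
) -> EndOfTurnMetrics:
    """Assumed shape of the helper: skip on missing or out-of-order anchors."""
    if (
        speech_start_time is None
        or last_speaking_time is None
        or last_final_transcript_time is None
    ):
        return EndOfTurnMetrics()  # a missing anchor skips the calculation
    if last_speaking_time < speech_start_time:
        return EndOfTurnMetrics()  # stale anchor predates the turn start
    return EndOfTurnMetrics(
        started_speaking_at=speech_start_time,
        stopped_speaking_at=last_speaking_time,
        # Both delays are clamped to >= 0, as described in the PR.
        transcription_delay=max(last_final_transcript_time - last_speaking_time, 0.0),
        end_of_turn_delay=max(now - last_speaking_time, 0.0),
    )


# Crafted timestamps (seconds), no audio needed.
stale = compute_end_of_turn_metrics(1219.5, 1000.0, 1220.0, now=1220.25)
ok = compute_end_of_turn_metrics(1219.5, 1219.75, 1220.0, now=1220.25)
boundary = compute_end_of_turn_metrics(1219.5, 1219.5, 1220.0, now=1220.25)
```

Passing `now` in explicitly, rather than calling `time.time()` inside the helper, is a design choice of this sketch: it keeps the function deterministic, so the stale, well-ordered, and boundary cases can each be asserted with fixed numbers.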

Testing

New unit test module tests/test_end_of_turn_metrics.py exercises the pure helper with crafted timestamps (no audio):

  • test_normal_turn_produces_small_bounded_delays — well-ordered turn yields the expected sub-second delays.
  • test_stale_anchor_predating_turn_start_is_skipped — regression for this issue, using the exact ~220s numbers from the reported payload; all four metrics must be None.
  • test_anchor_equal_to_start_is_accepted — boundary (last_speaking_time == speech_start_time) stays valid.
  • test_missing_anchor_is_skipped — any missing anchor skips the calculation.
$ uv run pytest tests/test_end_of_turn_metrics.py --unit -q
......                                                                   [100%]
6 passed in 0.02s

Confirmed RED before the fix (reverting the ordering guard): test_stale_anchor_predating_turn_start_is_skipped failed with started_speaking_at=1781342804.815377, end_of_turn_delay=220.28458189964294 — i.e. the bogus >200s value. The existing tests/test_speech_start_time_persistence.py still passes.

ruff check, ruff format --check, and mypy are clean on the changed files.

AI disclosure

This change was AI-assisted; all logic, tests, and verification were reviewed by the author.

changeset-bot (bot) commented on Jun 16, 2026

⚠️ No Changeset found

Latest commit: 7c45be6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

devin-ai-integration (bot) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

chenghao-mou merged commit 2765bf0 into 1.5.0 on Jun 16, 2026 (2 checks passed)
chenghao-mou deleted the pansy-gritting-tuning branch on June 16, 2026 15:15