AMD short-greeting heuristic classifies voicemail as HUMAN without invoking LLM #5477

@wardandr

Bug Description

classifier.py:163–176 uses a duration-only fast path that bypasses the LLM entirely.

Hardcoded thresholds:

HUMAN_SPEECH_THRESHOLD = 2.5
HUMAN_SILENCE_THRESHOLD = 0.5

These are not configurable.

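Real call example: a voicemail greeting paused mid-sentence after ~2.33s of speech, followed by 528ms of silence, and was classified as HUMAN. Both values satisfy the thresholds above (speech under 2.5s, silence over 0.5s), so the fast path returned before the transcript was considered.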

Transcript from that call:

Your call has been forwarded to voicemail. The person you're trying to reach is not available. At the tone, please record your message. When you have finished recording, you may hang up.
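To make the failure concrete, here is a minimal sketch of a duration-only check like the one described above. It is not the code from classifier.py: the function name `short_greeting_fast_path`, the exact comparison operators, and the `None` "defer" return value are assumptions for illustration. Only the two threshold values come from the report.

```python
from typing import Optional

# Threshold values as reported in this issue (seconds).
HUMAN_SPEECH_THRESHOLD = 2.5
HUMAN_SILENCE_THRESHOLD = 0.5


def short_greeting_fast_path(speech_s: float, silence_s: float) -> Optional[str]:
    """Hypothetical duration-only check.

    Returns "HUMAN" when a short utterance is followed by a long enough
    pause, otherwise None (meaning: defer to later classification).
    The transcript is never consulted here, which is the problem.
    """
    if speech_s <= HUMAN_SPEECH_THRESHOLD and silence_s >= HUMAN_SILENCE_THRESHOLD:
        return "HUMAN"
    return None


# Durations from the real call in this report: 2.33s speech, 528ms silence.
result = short_greeting_fast_path(2.33, 0.528)
print(result)  # prints HUMAN
```

Under these assumptions, any greeting that happens to pause within the first 2.5 seconds takes this branch, regardless of what was said.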

Expected Behavior

Voicemail greetings, even short or slightly interrupted ones, should not be classified as HUMAN when transcript evidence clearly indicates a machine.

Reproduction Steps

1. Record an obvious voicemail message (e.g., the words "you have reached a voicemail box") lasting less than 2.5s
2. Pause for more than 0.5s (e.g., 525ms)
3. Review the classification result and the OTel spans

Operating System

macOS Sequoia

Models Used

Deepgram Nova 3, gpt-5.3-chat-latest, Cartesia Sonic-3

Package Versions

livekit==1.1.5
livekit-agents==1.5.2
livekit-api==1.1.0

Session/Room/Call IDs

sessionID: RM_RArW6MYFJw9h

Proposed Solution

Expose `HUMAN_SPEECH_THRESHOLD` and `HUMAN_SILENCE_THRESHOLD` as configurable parameters (e.g., constructor kwargs).
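One possible shape for this, as a sketch only: the class name `ShortGreetingHeuristic` and the kwarg names `human_speech_threshold` / `human_silence_threshold` are hypothetical and do not reflect the existing livekit-agents API. The defaults mirror the current hardcoded values so existing behavior would be unchanged.

```python
class ShortGreetingHeuristic:
    """Hypothetical wrapper that makes the two thresholds configurable."""

    def __init__(
        self,
        *,
        human_speech_threshold: float = 2.5,   # seconds, current hardcoded value
        human_silence_threshold: float = 0.5,  # seconds, current hardcoded value
    ) -> None:
        if human_speech_threshold <= 0 or human_silence_threshold <= 0:
            raise ValueError("thresholds must be positive")
        self.human_speech_threshold = human_speech_threshold
        self.human_silence_threshold = human_silence_threshold

    def is_short_greeting(self, speech_s: float, silence_s: float) -> bool:
        """True when the utterance is short and followed by a long enough pause."""
        return (
            speech_s <= self.human_speech_threshold
            and silence_s >= self.human_silence_threshold
        )


default = ShortGreetingHeuristic()
stricter = ShortGreetingHeuristic(
    human_speech_threshold=1.5, human_silence_threshold=0.8
)
```

With the durations from the reported call (2.33s speech, 528ms silence), `default.is_short_greeting(2.33, 0.528)` is true while `stricter.is_short_greeting(2.33, 0.528)` is false, so a deployment seeing this failure could tune its way out of it.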

If a transcript is available at the short-greeting cutoff, run LLM classification before returning `HUMAN`.
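A rough sketch of that gating, under stated assumptions: `classify_at_cutoff` and the injected `llm_classify` callable are hypothetical names, the `"MACHINE"` label is assumed, and the stub below is a stand-in rather than a real LLM call. The real classifier is likely asynchronous; this is kept synchronous for brevity.

```python
from typing import Callable, Optional


def classify_at_cutoff(
    speech_s: float,
    silence_s: float,
    transcript: Optional[str],
    llm_classify: Callable[[str], str],
    *,
    speech_threshold: float = 2.5,
    silence_threshold: float = 0.5,
) -> Optional[str]:
    """Hypothetical short-greeting path that consults the transcript first."""
    is_short = speech_s <= speech_threshold and silence_s >= silence_threshold
    if not is_short:
        return None  # not a short greeting; defer to the normal flow
    if transcript and transcript.strip():
        # Transcript evidence exists, so let the LLM decide instead of
        # returning HUMAN on duration alone.
        return llm_classify(transcript)
    return "HUMAN"  # no transcript yet: keep the existing duration-only result


def stub_llm(transcript: str) -> str:
    """Stand-in for the real LLM call, for illustration only."""
    return "MACHINE" if "voicemail" in transcript.lower() else "HUMAN"


verdict = classify_at_cutoff(
    2.33, 0.528, "Your call has been forwarded to voicemail.", stub_llm
)
```

The trade-off is latency: the short-greeting path would wait on an LLM round trip whenever a transcript exists, which is the cost of not misclassifying cases like the one above.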

Include fast-path regex classifications for common voicemail greetings.
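As an illustration of what such a fast path might look like, the sketch below checks a transcript against a few voicemail phrases. The pattern list and the function name `looks_like_voicemail` are assumptions, not an existing API, and the phrases are drawn only from the greetings quoted in this issue; a real list would need to be broader and locale-aware.

```python
import re

# Hypothetical, deliberately small pattern set based on the greetings
# quoted in this issue.
VOICEMAIL_PATTERNS = [
    re.compile(r"\bforwarded to (?:an? )?(?:automated )?voice ?mail\b", re.IGNORECASE),
    re.compile(r"\byou(?: have|'ve) reached (?:a|the) voice ?mail ?box\b", re.IGNORECASE),
    re.compile(r"\b(?:at|after) the (?:tone|beep)\b", re.IGNORECASE),
    re.compile(r"\b(?:leave|record) (?:a|your) message\b", re.IGNORECASE),
]


def looks_like_voicemail(transcript: str) -> bool:
    """Return True if any known voicemail phrase appears in the transcript."""
    return any(p.search(transcript) for p in VOICEMAIL_PATTERNS)


reported = (
    "Your call has been forwarded to voicemail. The person you're trying "
    "to reach is not available. At the tone, please record your message."
)
print(looks_like_voicemail(reported))  # prints True
print(looks_like_voicemail("Hello?"))  # prints False
```

A check like this costs no model call, so it could run before the duration heuristic and short-circuit to a machine result when a partial transcript already matches.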

Additional Context

No response

Screenshots and Recordings

No response

Labels

bug