plugins/sarvam: STT hardcodes confidence=1.0; should thread Sarvam's language_probability into SpeechData #5829

@hashirventhodi

Bug

livekit-plugins-sarvam returns SpeechData.confidence=1.0 for every transcript, but Sarvam's Saaras API returns a real language_probability field per response. The field is dropped at two sites in stt.py:

  • REST batch path: livekit/plugins/sarvam/stt.py:701 (v1.5.12)

    alternatives = [
        stt.SpeechData(
            language=detected_language,
            text=transcript_text,
            start_time=start_time,
            end_time=end_time,
            confidence=1.0,  # Sarvam doesn't provide confidence score in this response
        )
    ]

    The comment is inaccurate — language_probability is in the response body. See: https://docs.sarvam.ai/api-reference-docs/speech-to-text/transcribe (response shows language_probability: 0.95).

  • Streaming WS path: livekit/plugins/sarvam/stt.py:1475-1480 (v1.5.12), in _handle_transcript_data

    speech_data = stt.SpeechData(
        language=language,
        text=transcript_text,
        start_time=transcript_data.get("speech_start", 0.0),
        end_time=transcript_data.get("speech_end", 0.0),
    )

    confidence is omitted entirely, so SpeechData's default (1.0) takes over. The WS payload likewise carries language_probability per chunk.

Why it matters

Real per-chunk confidence values enable downstream language stickiness / code-mix handling. Without them, weighted-voting buffers (we built one for a multilingual voice assistant) can't tell a confident Hindi turn from a noisy guess.

Question before PR

@dhruvladia-sarvam — can you confirm language_probability is:

  1. A stable field in Saaras v3 streaming responses (not just REST)?
  2. Present on every transcript chunk, or omitted on some (e.g. final-vs-interim, noisy chunks)?
  3. Bounded to [0.0, 1.0], or could it ever fall outside?

This determines whether the PR should treat language_probability as required (assert) or optional with a fallback (defensive isinstance + 1.0 default).

Proposed fix

Once 1–3 are answered, I'd like to open a PR that:

  • Parses language_probability from the response/payload at both sites.
  • Threads it into SpeechData(confidence=...).
  • Defaults to 1.0 if absent or wrong type (defensive, in case the field is best-effort).
  • Adds a unit test using a mocked WS payload {"language_probability": 0.87, ...} asserting SpeechData.confidence == 0.87.

Patch is ~20 lines per site, mostly the isinstance-guard helper. Happy to draft once field stability is confirmed.
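
To make the proposed isinstance guard concrete, here is a minimal sketch of the helper, not the final patch. The name confidence_from_payload is hypothetical, and the clamp to [0.0, 1.0] is an assumption that depends on the answer to question 3; it could be dropped if Sarvam confirms the bound.

```python
import math
from typing import Any, Mapping

# Fallback mirrors today's behaviour (hardcoded / default confidence of 1.0).
DEFAULT_CONFIDENCE = 1.0


def confidence_from_payload(payload: Mapping[str, Any]) -> float:
    """Hypothetical helper: map Sarvam's language_probability to a confidence.

    Returns DEFAULT_CONFIDENCE when the field is absent or not a real number.
    """
    value = payload.get("language_probability")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return DEFAULT_CONFIDENCE
    if math.isnan(value):
        return DEFAULT_CONFIDENCE
    # Assumption: clamp out-of-range values rather than raise (see question 3).
    return min(1.0, max(0.0, float(value)))


# Example with a mocked WS payload, as in the planned unit test.
mocked = {"transcript": "namaste", "language_probability": 0.87}
print(confidence_from_payload(mocked))
```

Both sites could then pass confidence=confidence_from_payload(...) into stt.SpeechData, using the REST response body at the first site and transcript_data at the second. If the field turns out to be required, the fallback branches could become assertions instead.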

Reproducer

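import asyncio
from livekit.plugins.sarvam import STT

async def main() -> None:
    stt = STT(model="saaras:v3", api_key="...")
    event = await stt.recognize(buffer=...)
    print(event.alternatives[0].confidence)  # always 1.0, regardless of input quality

asyncio.run(main())  # await is only valid inside a coroutine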
