Bug
livekit-plugins-sarvam returns SpeechData.confidence=1.0 for every transcript, but Sarvam's Saaras API returns a real language_probability field per response. The field is dropped at two sites in stt.py:
- REST batch path: livekit/plugins/sarvam/stt.py:701 (v1.5.12)
alternatives = [
    stt.SpeechData(
        language=detected_language,
        text=transcript_text,
        start_time=start_time,
        end_time=end_time,
        confidence=1.0,  # Sarvam doesn't provide confidence score in this response
    )
]
The comment is inaccurate — language_probability is in the response body. See: https://docs.sarvam.ai/api-reference-docs/speech-to-text/transcribe (response shows language_probability: 0.95).
- Streaming WS path: livekit/plugins/sarvam/stt.py:1475-1480 (v1.5.12), in _handle_transcript_data
speech_data = stt.SpeechData(
    language=language,
    text=transcript_text,
    start_time=transcript_data.get("speech_start", 0.0),
    end_time=transcript_data.get("speech_end", 0.0),
)
confidence is omitted entirely, so SpeechData's default (1.0) takes over. The WS payload likewise carries language_probability per chunk.
Why it matters
Real per-chunk confidence values enable downstream language stickiness / code-mix handling. Without them, weighted-voting buffers (we built one for a multilingual voice assistant) can't tell a confident Hindi turn from a noisy guess.
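To illustrate the point, here is a minimal sketch of a confidence-weighted language vote. The class name LanguageVoteBuffer, its methods, and the window size are hypothetical and are not part of the plugin or of our production code; the sketch only shows why a constant 1.0 makes the weighting meaningless.

```python
from collections import defaultdict, deque
from typing import Optional


class LanguageVoteBuffer:
    """Hypothetical sketch: weighted vote over the last `size` chunks."""

    def __init__(self, size: int = 5) -> None:
        # Each entry is a (language, confidence) pair; old entries fall off.
        self._votes = deque(maxlen=size)

    def add(self, language: str, confidence: float) -> None:
        self._votes.append((language, confidence))

    def winner(self) -> Optional[str]:
        # Sum confidence per language and return the heaviest one.
        totals = defaultdict(float)
        for language, confidence in self._votes:
            totals[language] += confidence
        return max(totals, key=totals.get) if totals else None


# With real confidences, one confident Hindi chunk outweighs two noisy guesses.
weighted = LanguageVoteBuffer(size=3)
for lang, conf in [("hi-IN", 0.95), ("en-IN", 0.40), ("en-IN", 0.45)]:
    weighted.add(lang, conf)

# With confidence pinned at 1.0, the same chunks degrade to a plain majority vote.
flat = LanguageVoteBuffer(size=3)
for lang in ["hi-IN", "en-IN", "en-IN"]:
    flat.add(lang, 1.0)
```

Here weighted.winner() picks "hi-IN" while flat.winner() picks "en-IN", which is the failure mode described above.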
Question before PR
@dhruvladia-sarvam — can you confirm language_probability is:
- A stable field in Saaras v3 streaming responses (not just REST)?
- Present on every transcript chunk, or omitted on some (e.g. final-vs-interim, noisy chunks)?
- Bounded to [0.0, 1.0], or could it ever fall outside?
This determines whether the PR should treat language_probability as required (assert) or optional with a fallback (defensive isinstance + 1.0 default).
Proposed fix
Once the three questions above are answered, I'd like to open a PR that:
- Parses language_probability from the response/payload at both sites.
- Threads it into SpeechData(confidence=...).
- Defaults to 1.0 if absent or wrong type (defensive, in case the field is best-effort).
- Adds a unit test using a mocked WS payload {"language_probability": 0.87, ...} asserting SpeechData.confidence == 0.87.
Patch is ~20 lines per site, mostly the isinstance-guard helper. Happy to draft once field stability is confirmed.
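A rough sketch of what the isinstance-guard helper could look like, assuming the field arrives as a top-level language_probability key in a dict-like payload. The helper name extract_confidence and the constant DEFAULT_CONFIDENCE are hypothetical, and clamping to [0.0, 1.0] is an assumption that depends on the answer to the bounds question above.

```python
import math
from typing import Any, Mapping

# Matches the value the plugin currently reports for every transcript.
DEFAULT_CONFIDENCE = 1.0


def extract_confidence(
    payload: Mapping[str, Any], default: float = DEFAULT_CONFIDENCE
) -> float:
    """Return payload["language_probability"] as a float, or `default`."""
    value = payload.get("language_probability")
    # bool is a subclass of int, so reject it explicitly before the numeric check.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return default
    if math.isnan(value):
        return default
    # Assumption: clamp out-of-range values until the bounds are confirmed.
    return min(1.0, max(0.0, float(value)))
```

Both sites would then pass the result as SpeechData(confidence=extract_confidence(...)). If the field turns out to be required and bounded, the fallback and clamp can be dropped in favor of a stricter check.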
Reproducer
import asyncio
from livekit.plugins.sarvam import STT

async def main() -> None:
    stt = STT(model="saaras:v3", api_key="...")
    event = await stt.recognize(buffer=...)
    print(event.alternatives[0].confidence)  # always 1.0, regardless of input quality

asyncio.run(main())