fix(plugins/sarvam): thread language_probability into SpeechData.confidence#5830
Conversation
…idence Saaras returns language_probability on every transcript (REST + WS), but both code paths drop it and hardcode confidence=1.0. This wires the real value through with a defensive 1.0 fallback if the field is absent or has an unexpected type, and a debug log on contract drift. Closes livekit#5829
- REST path: use self._logger instead of module-level logger
(matches every other log in _recognize_impl; consistent with WS site)
- Tests: rewrite test_language_probability.py to actually exercise
the production code:
- target SpeechStream._handle_transcript_data (not STT)
- mark tests `async def` so pytest-asyncio (mode=auto) awaits them
- drop the bogus `request_id="test"` kwarg (method signature is
`(self, data: dict)`)
- use the real WS payload shape: outer {"type": "data", "data": {...}}
with nested keys `transcript`, `language_code`, `language_probability`,
`speech_start`, `speech_end`, `metrics`, `request_id`
- 14 tests pass locally (happy + missing + null + wrong-type + out-of-range).
|
Thanks @devin-ai-integration — addressed all four findings in 🔴 P1 (tests): Rewrote
14 tests pass locally:
🟡 P2 (REST logger): Switched Apologies for the busted first cut — the original tests were written before I had access to read the real method signature and silently never executed. Fresh CI run should reflect the green local result. |
| # Defensive: defaults to 1.0 if the field is absent or has an | ||
| # unexpected type (the field is documented for REST but not | ||
| # explicitly for streaming — API contract drift detection). | ||
| _lang_prob = transcript_data.get("language_probability") |
There was a problem hiding this comment.
let's refactor this code instead of duplicating it in both places.
Per @davidzhao review feedback — collapse the two duplicated language_probability parse blocks (REST + WS) into one module-level helper. Both call sites now read: confidence=_extract_confidence(payload, self._logger) Helper preserves the defensive isinstance guard, 1.0 fallback, and debug-log on contract drift. 14 unit tests still pass.
|
@davidzhao thanks — refactored in |
…e guard Per @devin-ai-integration review: bool is a subclass of int, so a JSON false from Sarvam would slip through isinstance(value, (int, float)) and become confidence=0.0 — wrongly signalling very low confidence for a valid transcript. Same defensive pattern as livekit-plugins-slng/.../stt.py. Tests: added True/False to the bad_value parametrize (16 pass).
|
@devin-ai-integration good catch — fixed in |
|
Thank you @davidzhao @hashirventhodi |
Summary
Fixes #5829. The
livekit-plugins-sarvamSTT silently dropped Sarvam'slanguage_probabilityand reportedconfidence=1.0everywhere. This wires the real value through both the REST and WS paths.Changes
livekit-plugins/livekit-plugins-sarvam/livekit/plugins/sarvam/stt.py:recognize(): parselanguage_probabilityfrom response JSON; thread intoSpeechData(confidence=...)._handle_transcript_data(): parselanguage_probabilityfromtranscript_data; thread intoSpeechData(confidence=...).isinstance(_, (int, float))guard with 1.0 fallback so callers don't crash on null/missing/string values.logger.debugon unexpected types so contract drift is visible.livekit-plugins/livekit-plugins-sarvam/tests/test_language_probability.py(new):SpeechData.confidence == 0.87for a mocked WS payload containinglanguage_probability: 0.87.Validation
Tested in a downstream voice assistant via a vendored fork carrying these exact patches: worker logs show
confidencevarying across utterances rather than the previous flat 1.0. The defensive fallback handles missing/null cases without crashing.Notes / Open question for review
language_probabilityis documented for Sarvam's REST batch endpoint. Open question (see plugins/sarvam: STT hardcodes confidence=1.0; should thread Sarvam's language_probability into SpeechData #5829): is it guaranteed present on Saaras v3 WS chunks, or best-effort? The defensiveisinstance + 1.0 fallbackmakes the PR safe under either answer, but if it's guaranteed, the fallback path becomes dead code we could tighten later.@dhruvladia-sarvam— would appreciate confirmation on field stability so we know whether to keep or remove the fallback in a follow-up.confidencedefensively.