fix(plugins/sarvam): thread language_probability into SpeechData.confidence by hashirventhodi · Pull Request #5830 · livekit/agents

hashirventhodi · 2026-05-24T18:23:30Z

Summary

Fixes #5829. The livekit-plugins-sarvam STT silently dropped Sarvam's language_probability and reported confidence=1.0 everywhere. This wires the real value through both the REST and WS paths.

Changes

livekit-plugins/livekit-plugins-sarvam/livekit/plugins/sarvam/stt.py:
- REST recognize(): parse language_probability from response JSON; thread into SpeechData(confidence=...).
- WS _handle_transcript_data(): parse language_probability from transcript_data; thread into SpeechData(confidence=...).
- Both sites: isinstance(_, (int, float)) guard with 1.0 fallback so callers don't crash on null/missing/string values. logger.debug on unexpected types so contract drift is visible.
livekit-plugins/livekit-plugins-sarvam/tests/test_language_probability.py (new):
- Asserts SpeechData.confidence == 0.87 for a mocked WS payload containing language_probability: 0.87.
- Asserts fallback to 1.0 when the field is absent, null, or has a wrong type.
- REST test stubbed (mocks aiohttp session — same isinstance-guard logic as WS).

Validation

Tested in a downstream voice assistant via a vendored fork carrying these exact patches: worker logs show confidence varying across utterances rather than the previous flat 1.0. The defensive fallback handles missing/null cases without crashing.

Notes / Open question for review

language_probability is documented for Sarvam's REST batch endpoint. Open question (see #5829, "plugins/sarvam: STT hardcodes confidence=1.0; should thread Sarvam's language_probability into SpeechData"): is it guaranteed present on Saaras v3 WS chunks, or best-effort? The defensive isinstance + 1.0 fallback makes the PR safe under either answer, but if it's guaranteed, the fallback path becomes dead code we could tighten later.
@dhruvladia-sarvam — would appreciate confirmation on field stability so we know whether to keep or remove the fallback in a follow-up.
Defensive fallback to 1.0 preserves backward-compatible behaviour for callers that read confidence defensively.
No public API changes.
CLA: will be signed before merge (first-time contributor).

…idence

Saaras returns language_probability on every transcript (REST + WS), but both code paths drop it and hardcode confidence=1.0. This wires the real value through with a defensive 1.0 fallback if the field is absent or has an unexpected type, and a debug log on contract drift.

Closes livekit#5829

CLAassistant · 2026-05-24T18:23:36Z

All committers have signed the CLA.

- REST path: use self._logger instead of module-level logger (matches every other log in _recognize_impl; consistent with WS site)
- Tests: rewrite test_language_probability.py to actually exercise the production code:
  - target SpeechStream._handle_transcript_data (not STT)
  - mark tests `async def` so pytest-asyncio (mode=auto) awaits them
  - drop the bogus `request_id="test"` kwarg (method signature is `(self, data: dict)`)
  - use the real WS payload shape: outer {"type": "data", "data": {...}} with nested keys `transcript`, `language_code`, `language_probability`, `speech_start`, `speech_end`, `metrics`, `request_id`
  - 14 tests pass locally (happy + missing + null + wrong-type + out-of-range).

hashirventhodi · 2026-05-24T18:31:54Z

Thanks @devin-ai-integration — addressed all four findings in eb7a39af:

🔴 P1 (tests): Rewrote test_language_probability.py to:

Target SpeechStream._handle_transcript_data (not STT — that was just wrong).
Use async def so pytest-asyncio (mode=auto, already in pyproject.toml) awaits the call.
Drop the bogus request_id="test" kwarg — method signature is (self, data: dict).
Use the real WS payload shape: outer {"type": "data", "data": {...}} with nested transcript / language_code / language_probability / speech_start / speech_end / metrics / request_id.
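The payload shape described above can be sketched as follows. This is only an illustration: the key names and nesting come from the description, while every value and the `nested_language_probability` helper are hypothetical and not part of the plugin. The diff reads the field from a dict named `transcript_data`, so the sketch assumes that name refers to the inner `"data"` object.

```python
# Illustrative WS message. Key names and nesting follow the PR description;
# all values are made up for the example.
ws_message = {
    "type": "data",
    "data": {
        "transcript": "hello world",
        "language_code": "en-IN",
        "language_probability": 0.87,
        "speech_start": 0.0,
        "speech_end": 1.2,
        "metrics": {},
        "request_id": "req-123",
    },
}


def nested_language_probability(message: dict):
    """Hypothetical helper: return the raw nested value, or None if absent."""
    if message.get("type") != "data":
        return None
    # Assumption: this inner dict is what the diff calls `transcript_data`.
    transcript_data = message.get("data")
    if not isinstance(transcript_data, dict):
        return None
    return transcript_data.get("language_probability")


raw_value = nested_language_probability(ws_message)
```

A test that passes a flat dict, without the outer `"type"` / `"data"` wrapper, would not reach the nested field at all, which is presumably why the rewritten tests use the full shape.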

14 tests pass locally:

5 happy-path values (0.0, 0.123, 0.5, 0.87, 1.0) thread through to SpeechData.confidence.
missing / None / wrong-type (str, list, dict, object) fall back to 1.0.
3 out-of-range values pass through verbatim (not clamping at this layer).

🟡 P2 (REST logger): Switched logger.debug → self._logger.debug to match every other log in _recognize_impl and the equivalent WS site at stt.py:1505.

Apologies for the busted first cut — the original tests were written before I had access to read the real method signature and silently never executed. Fresh CI run should reflect the green local result.

davidzhao · 2026-05-24T20:01:19Z

+            # Defensive: defaults to 1.0 if the field is absent or has an
+            # unexpected type (the field is documented for REST but not
+            # explicitly for streaming — API contract drift detection).
+            _lang_prob = transcript_data.get("language_probability")


let's refactor this code instead of duplicating it in both places.

@davidzhao

Per @davidzhao review feedback: collapse the two duplicated language_probability parse blocks (REST + WS) into one module-level helper.

Both call sites now read:

    confidence=_extract_confidence(payload, self._logger)

Helper preserves the defensive isinstance guard, 1.0 fallback, and debug-log on contract drift. 14 unit tests still pass.

hashirventhodi · 2026-05-25T03:35:21Z

@davidzhao thanks — refactored in 3493a50b (pushed). Both call sites now read confidence=_extract_confidence(payload, self._logger); the parse + isinstance-guard + 1.0 fallback + drift-log live in one module-level helper. 14 tests still pass.

@devin-ai-integration

…e guard

Per @devin-ai-integration review: bool is a subclass of int, so a JSON false from Sarvam would slip through isinstance(value, (int, float)) and become confidence=0.0, wrongly signalling very low confidence for a valid transcript. Same defensive pattern as livekit-plugins-slng/.../stt.py.

Tests: added True/False to the bad_value parametrize (16 pass).

hashirventhodi · 2026-05-25T03:47:18Z

@devin-ai-integration good catch — fixed in 0b1fe91b. Guard now reads isinstance(value, (int, float)) and not isinstance(value, bool), matching the livekit-plugins-slng pattern you cited. Added True/False to the parametrize'd bad-value test (16 cases pass).

dhruvladia-sarvam · 2026-05-26T07:18:36Z

Thank you @davidzhao @hashirventhodi
For context, this was already planned for this week as part of recent updates to our STT model, since language confidence is only populated with the relevant custom value when the language code is unknown.

davidzhao reviewed May 24, 2026

davidzhao approved these changes May 25, 2026

davidzhao merged commit cea762d into livekit:main May 25, 2026
16 checks passed

davidzhao merged 4 commits into livekit:main from hashirventhodi:fix/sarvam-language-probability

4 participants
