
fix(plugins/sarvam): thread language_probability into SpeechData.confidence #5830

Merged
davidzhao merged 4 commits into livekit:main from hashirventhodi:fix/sarvam-language-probability
May 25, 2026

@hashirventhodi
Contributor

@hashirventhodi hashirventhodi commented May 24, 2026

Summary

Fixes #5829. The livekit-plugins-sarvam STT silently dropped Sarvam's language_probability and reported confidence=1.0 everywhere. This wires the real value through both the REST and WS paths.

Changes

  • livekit-plugins/livekit-plugins-sarvam/livekit/plugins/sarvam/stt.py:

    • REST recognize(): parse language_probability from response JSON; thread into SpeechData(confidence=...).
    • WS _handle_transcript_data(): parse language_probability from transcript_data; thread into SpeechData(confidence=...).
    • Both sites: isinstance(_, (int, float)) guard with 1.0 fallback so callers don't crash on null/missing/string values. logger.debug on unexpected types so contract drift is visible.
  • livekit-plugins/livekit-plugins-sarvam/tests/test_language_probability.py (new):

    • Asserts SpeechData.confidence == 0.87 for a mocked WS payload containing language_probability: 0.87.
    • Asserts fallback to 1.0 when the field is absent, null, or has a wrong type.
    • REST test stubbed (mocks aiohttp session — same isinstance-guard logic as WS).

Validation

Tested in a downstream voice assistant via a vendored fork carrying these exact patches: worker logs show confidence varying across utterances rather than the previous flat 1.0. The defensive fallback handles missing/null cases without crashing.

Notes / Open question for review

  • language_probability is documented for Sarvam's REST batch endpoint. Open question (see plugins/sarvam: STT hardcodes confidence=1.0; should thread Sarvam's language_probability into SpeechData #5829): is it guaranteed present on Saaras v3 WS chunks, or best-effort? The defensive isinstance + 1.0 fallback makes the PR safe under either answer, but if it's guaranteed, the fallback path becomes dead code we could tighten later.
  • @dhruvladia-sarvam — would appreciate confirmation on field stability so we know whether to keep or remove the fallback in a follow-up.
  • Defensive fallback to 1.0 preserves backward behaviour for callers that read confidence defensively.
  • No public API changes.
  • CLA: will be signed before merge (first-time contributor).

…idence

Saaras returns language_probability on every transcript (REST + WS), but
both code paths drop it and hardcode confidence=1.0. This wires the real
value through with a defensive 1.0 fallback if the field is absent or
has an unexpected type, and a debug log on contract drift.

Closes livekit#5829
@CLAassistant

CLAassistant commented May 24, 2026

CLA assistant check
All committers have signed the CLA.

devin-ai-integration[bot]

This comment was marked as resolved.

- REST path: use self._logger instead of module-level logger
  (matches every other log in _recognize_impl; consistent with WS site)
- Tests: rewrite test_language_probability.py to actually exercise
  the production code:
  - target SpeechStream._handle_transcript_data (not STT)
  - mark tests `async def` so pytest-asyncio (mode=auto) awaits them
  - drop the bogus `request_id="test"` kwarg (method signature is
    `(self, data: dict)`)
  - use the real WS payload shape: outer {"type": "data", "data": {...}}
    with nested keys `transcript`, `language_code`, `language_probability`,
    `speech_start`, `speech_end`, `metrics`, `request_id`
- 14 tests pass locally (happy + missing + null + wrong-type + out-of-range).
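For reference, here is a minimal sketch of the WS payload shape listed above. It includes a hypothetical `unwrap_transcript_data` helper that returns the nested dict carrying `language_probability`. The field names come from this commit message. The sample values, the helper, and the choice to ignore non-`data` messages are assumptions, not the plugin's code.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical sample message in the shape described above:
# an outer {"type": "data", "data": {...}} envelope around the transcript fields.
RAW_MESSAGE = json.dumps(
    {
        "type": "data",
        "data": {
            "request_id": "req-123",
            "transcript": "namaste",
            "language_code": "hi-IN",
            "language_probability": 0.87,
            "speech_start": 0.0,
            "speech_end": 1.2,
            "metrics": {},
        },
    }
)


def unwrap_transcript_data(raw: str) -> Optional[Dict[str, Any]]:
    """Return the nested transcript dict, or None for any other message type (sketch)."""
    message = json.loads(raw)
    if message.get("type") != "data":
        return None
    data = message.get("data")
    # Guard against a null or non-object "data" field.
    return data if isinstance(data, dict) else None


transcript_data = unwrap_transcript_data(RAW_MESSAGE)
if transcript_data is not None:
    print(transcript_data.get("language_probability"))  # 0.87
```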
@hashirventhodi
Contributor Author

Thanks @devin-ai-integration — addressed all four findings in eb7a39af:

🔴 P1 (tests): Rewrote test_language_probability.py to:

  • Target SpeechStream._handle_transcript_data (not STT — that was just wrong).
  • Use async def so pytest-asyncio (mode=auto, already in pyproject.toml) awaits the call.
  • Drop the bogus request_id="test" kwarg — method signature is (self, data: dict).
  • Use the real WS payload shape: outer {"type": "data", "data": {...}} with nested transcript / language_code / language_probability / speech_start / speech_end / metrics / request_id.

14 tests pass locally:

  • 5 happy-path values (0.0, 0.123, 0.5, 0.87, 1.0) thread through to SpeechData.confidence.
  • missing / None / wrong-type (str, list, dict, object) fall back to 1.0.
  • 3 out-of-range values pass through verbatim (not clamping at this layer).

🟡 P2 (REST logger): Switched logger.debug to self._logger.debug to match every other log in _recognize_impl and the equivalent WS site at stt.py:1505.

Apologies for the busted first cut: the original tests were written before I had read the real method signature, and they silently never executed. A fresh CI run should reflect the green local result.

# Defensive: defaults to 1.0 if the field is absent or has an
# unexpected type (the field is documented for REST but not
# explicitly for streaming — API contract drift detection).
_lang_prob = transcript_data.get("language_probability")
Review comment by @davidzhao (Member) on the lines above:

let's refactor this code instead of duplicating it in both places.

Per @davidzhao review feedback — collapse the two duplicated
language_probability parse blocks (REST + WS) into one module-level
helper. Both call sites now read:

  confidence=_extract_confidence(payload, self._logger)

Helper preserves the defensive isinstance guard, 1.0 fallback, and
debug-log on contract drift. 14 unit tests still pass.
@hashirventhodi
Contributor Author

hashirventhodi commented May 25, 2026

@davidzhao thanks — refactored in 3493a50b (pushed). Both call sites now read confidence=_extract_confidence(payload, self._logger); the parse + isinstance-guard + 1.0 fallback + drift-log live in one module-level helper. 14 tests still pass.

devin-ai-integration[bot]

This comment was marked as resolved.

…e guard

Per @devin-ai-integration review: bool is a subclass of int, so a
JSON false from Sarvam would slip through isinstance(value, (int, float))
and become confidence=0.0 — wrongly signalling very low confidence for
a valid transcript.

Same defensive pattern as livekit-plugins-slng/.../stt.py.

Tests: added True/False to the bad_value parametrize (16 pass).
@hashirventhodi
Contributor Author

@devin-ai-integration good catch — fixed in 0b1fe91b. Guard now reads isinstance(value, (int, float)) and not isinstance(value, bool), matching the livekit-plugins-slng pattern you cited. Added True/False to the parametrize'd bad-value test (16 cases pass).
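The final 16-case matrix can be reproduced with a stdlib-only sketch. The plugin's real tests use pytest-asyncio against `SpeechStream._handle_transcript_data`. This sketch instead exercises a hypothetical stand-in, `extract_confidence`, so it runs without the plugin installed. The case counts follow this thread: 5 happy-path values, missing, `None`, four wrong types, `True`/`False`, and three out-of-range values. The specific wrong-type and out-of-range values are assumptions.

```python
import unittest
from typing import Any, Dict

_MISSING = object()  # sentinel meaning "field absent from the payload"

HAPPY = [0.0, 0.123, 0.5, 0.87, 1.0]
OUT_OF_RANGE = [-0.5, 1.5, 42.0]  # hypothetical values; not clamped at this layer
BAD = [_MISSING, None, "0.87", [0.87], {"p": 0.87}, object(), True, False]


def extract_confidence(payload: Dict[str, Any]) -> float:
    """Hypothetical stand-in for the guarded parse: real numbers pass through, else 1.0."""
    value = payload.get("language_probability")
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    return 1.0


def confidence_for(value: Any) -> float:
    # Build a payload with the field set to `value`, or omit it for the sentinel.
    payload = {} if value is _MISSING else {"language_probability": value}
    return extract_confidence(payload)


class LanguageProbabilityTests(unittest.TestCase):
    def test_numeric_values_pass_through(self) -> None:
        for value in HAPPY + OUT_OF_RANGE:
            with self.subTest(value=value):
                self.assertEqual(confidence_for(value), value)

    def test_bad_values_fall_back_to_one(self) -> None:
        # Includes True/False: without the bool check, False would become 0.0.
        for value in BAD:
            with self.subTest(value=value):
                self.assertEqual(confidence_for(value), 1.0)
```

Run it with `python -m unittest` from the directory containing the file.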

@davidzhao davidzhao merged commit cea762d into livekit:main May 25, 2026
16 checks passed
@dhruvladia-sarvam
Contributor

Thank you @davidzhao @hashirventhodi
For your context, this was coming this week along with recent updates to our STT model, as language confidence is only populated with the relevant custom value when the language code is unknown.

Development

Successfully merging this pull request may close these issues.

plugins/sarvam: STT hardcodes confidence=1.0; should thread Sarvam's language_probability into SpeechData

4 participants