feat: add Sarvam AI STT plugin by mshivam019 · Pull Request #1046 · livekit/agents-js

mshivam019 · 2026-02-12T09:34:33Z

Summary

Adds speech-to-text (STT) support to the Sarvam AI plugin with both REST and WebSocket streaming
REST: _recognize() for single-shot transcription via /speech-to-text and /speech-to-text-translate
WebSocket streaming: SpeechStream class with real-time audio streaming via /speech-to-text/ws and /speech-to-text-translate/ws
Defaults to the recommended saaras:v3 model with 22+ Indian languages and 5 transcription modes (transcribe, translate, verbatim, translit, codemix)
Also supports saarika:v2.5 (deprecated) and saaras:v2.5 (Indic-to-English translation with auto language detection)
Follows the same discriminated-union option pattern as the existing TTS plugin (model-specific types, per-model defaults, safe updateOptions on model switch)
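
The discriminated-union option pattern and the "safe updateOptions on model switch" behaviour can be sketched roughly as follows. This is a minimal illustration, not the plugin's source: the interface names, the `'unknown'` language default, and the exact field sets are assumptions made for the example.

```typescript
// Hypothetical option shapes for illustration; the plugin's exported types may differ.
type STTMode = 'transcribe' | 'translate' | 'verbatim' | 'translit' | 'codemix';

interface SaarasV3Options {
  model: 'saaras:v3';
  languageCode?: string;
  mode?: STTMode;
}
interface SaarikaOptions {
  model: 'saarika:v2.5';
  languageCode?: string;
}
interface SaarasTranslateOptions {
  model: 'saaras:v2.5';
  prompt?: string;
}
type STTOptions = SaarasV3Options | SaarikaOptions | SaarasTranslateOptions;

// Build a fresh object that contains only the fields valid for the selected model.
function resolveOptions(opts: STTOptions): STTOptions {
  switch (opts.model) {
    case 'saaras:v3':
      return {
        model: 'saaras:v3',
        languageCode: opts.languageCode ?? 'unknown', // assumed default
        mode: opts.mode ?? 'transcribe',
      };
    case 'saarika:v2.5':
      return { model: 'saarika:v2.5', languageCode: opts.languageCode ?? 'unknown' };
    case 'saaras:v2.5':
      return opts.prompt === undefined
        ? { model: 'saaras:v2.5' }
        : { model: 'saaras:v2.5', prompt: opts.prompt };
  }
}

// Merge the patch, then re-resolve so fields of the previous model are dropped.
function updateOptions(current: STTOptions, patch: Partial<STTOptions>): STTOptions {
  const merged = { ...current, ...patch } as STTOptions;
  return resolveOptions(merged);
}
```

Because `resolveOptions` copies only the fields that belong to the target model, a stale `mode` from saaras:v3 cannot survive a switch to saaras:v2.5 in this sketch.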

WebSocket streaming details

Tri-task architecture: sendTask (audio input → base64 JSON), listenTask (server messages → SpeechEvents), wsMonitor (disconnect detection)
VAD events: START_SPEECH / END_SPEECH mapped to SpeechEventType
All WS query params supported: high_vad_sensitivity, flush_signal, vad_signals, input_audio_codec
Robust error parsing: handles data.message, data.error, and top-level fallbacks
Retry loop with linear backoff on idle timeout (~20s server-side disconnect)
end_of_stream includes empty audio field per Sarvam protocol requirement
Matches the Python SDK's reconnect-on-idle approach (no keepalive needed)
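
The error-parsing fallback order listed above (data.message, then data.error, then top-level fields) might look something like the sketch below. The function name, the fallback string, and the assumption that the payload is already JSON-parsed are all illustrative; the real server message shape is not shown in this PR description.

```typescript
// Hypothetical helper: pick the first usable error string from a parsed server message.
function extractErrorMessage(payload: unknown): string {
  const fallback = 'Unknown Sarvam STT error'; // assumed fallback text
  if (typeof payload !== 'object' || payload === null) return fallback;

  const top = payload as Record<string, unknown>;
  const data =
    typeof top.data === 'object' && top.data !== null
      ? (top.data as Record<string, unknown>)
      : {};

  // Nested fields take priority; top-level fields are the fallback.
  const candidates = [data.message, data.error, top.message, top.error];
  for (const candidate of candidates) {
    if (typeof candidate === 'string' && candidate.length > 0) return candidate;
  }
  return fallback;
}
```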

Files changed

File	Change
`plugins/sarvam/src/stt.ts`	STT class with `_recognize()` + `SpeechStream` class with WS streaming
`plugins/sarvam/src/stt.test.ts`	Unit tests using `@livekit/agents-plugins-test` harness
`plugins/sarvam/src/models.ts`	Added `STTModels`, `STTModes`, `STTV2Languages`, `STTV3Languages` types
`plugins/sarvam/src/index.ts`	Export STT, SpeechStream, and option types
`plugins/sarvam/package.json`	Added `ws` and `@types/ws` dependencies
`plugins/sarvam/README.md`	Updated with STT usage, language list, and mode reference

Usage

import * as sarvam from '@livekit/agents-plugin-sarvam';

// REST (single-shot)
const stt = new sarvam.STT({
  model: 'saaras:v3',
  languageCode: 'en-IN',
  mode: 'transcribe',
});

// WebSocket streaming (real-time)
const stream = stt.stream();

Test plan

Verify STT.recognize() transcribes audio correctly with saaras:v3
Verify updateOptions() handles model switching (v3 → v2.5 and back) without leaking model-specific fields
Verify WS streaming receives transcripts and VAD events in real-time
Verify WS reconnects gracefully on idle timeout
Verify error parsing handles all server error formats
Run pnpm vitest plugins/sarvam with SARVAM_API_KEY set

Add @livekit/agents-plugin-sarvam with text-to-speech support using Sarvam AI's Bulbul models. Supports 11 Indian languages and 45+ speaker voices via the Sarvam REST API.

- TTS and ChunkedStream classes following existing plugin patterns
- Models/speakers/languages type definitions
- Test file using shared @livekit/agents-plugins-test harness
- SARVAM_API_KEY added to turbo.json globalEnv
- Calls AudioByteStream.flush() to prevent trailing audio truncation

…peaker names

- Split TTSSpeakers into TTSV2Speakers and TTSV3Speakers types
- Add TTSSampleRates and TTSAudioCodecs types to models.ts
- Rewrite TTS with discriminated union options (TTSV2Options/TTSV3Options)
- V2-specific: pitch, loudness, enablePreprocessing
- V3-specific: temperature
- Extract resolveOptions() and buildRequestBody() for SRP
- Fix speaker names to lowercase (API requires lowercase, not capitalized)
- Export new types from index.ts

AudioByteStream requires raw PCM data, which we obtain by stripping the 44-byte WAV header. Allowing user-configurable outputAudioCodec would produce compressed audio (mp3, opus, etc.) that silently breaks the pipeline. Remove outputAudioCodec from public options and hardcode WAV in the API request.
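
As a rough illustration of the header-stripping step described above, the sketch below drops a fixed 44-byte header from a WAV payload. It assumes the canonical 44-byte RIFF header with no extra chunks; the helper name is hypothetical and the plugin's actual code may differ.

```typescript
const WAV_HEADER_BYTES = 44; // canonical PCM WAV header size, as stated in the commit message
const RIFF_MAGIC = [0x52, 0x49, 0x46, 0x46]; // ASCII "RIFF"

// Hypothetical helper: return a view over the raw PCM samples that follow the header.
function stripWavHeader(wav: Uint8Array): Uint8Array {
  if (wav.byteLength < WAV_HEADER_BYTES) {
    throw new Error(`WAV payload too short: ${wav.byteLength} bytes`);
  }
  if (!RIFF_MAGIC.every((byte, i) => wav[i] === byte)) {
    throw new Error('Payload does not start with a RIFF header');
  }
  // subarray creates a view without copying; it respects the source view's offset.
  return wav.subarray(WAV_HEADER_BYTES);
}
```

A compressed codec such as mp3 or opus would fail the RIFF check here, which is the failure mode the commit avoids by hardcoding WAV in the request.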

When updateOptions switches the model (e.g. v2 -> v3), the previous shallow merge kept stale model-specific fields like speaker, pitch, and loudness from the old model. Now delegates to resolveOptions() so model-specific defaults are re-applied correctly.

The spread of ResolvedTTSOptions (model: TTSModels) doesn't satisfy the discriminated union TTSOptions. Cast to TTSOptions before passing to resolveOptions, which handles discrimination internally via isV3 check.

…ioCodecs

- Drop model-specific fields (speaker, pitch, loudness, temperature, enablePreprocessing) when switching models so resolveOptions applies correct defaults for the new model
- Add type assertions for discriminated union compatibility
- Remove unused TTSAudioCodecs type from models.ts

changeset-bot · 2026-02-12T09:34:38Z

🦋 Changeset detected

Latest commit: 8bcf7fb

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 20 packages

Name	Type
@livekit/agents-plugin-sarvam	Patch
@livekit/agents	Patch
@livekit/agents-plugin-anam	Patch
@livekit/agents-plugin-baseten	Patch
@livekit/agents-plugin-bey	Patch
@livekit/agents-plugin-cartesia	Patch
@livekit/agents-plugin-deepgram	Patch
@livekit/agents-plugin-elevenlabs	Patch
@livekit/agents-plugin-google	Patch
@livekit/agents-plugin-hedra	Patch
@livekit/agents-plugin-inworld	Patch
@livekit/agents-plugin-lemonslice	Patch
@livekit/agents-plugin-livekit	Patch
@livekit/agents-plugin-neuphonic	Patch
@livekit/agents-plugin-openai	Patch
@livekit/agents-plugin-resemble	Patch
@livekit/agents-plugin-rime	Patch
@livekit/agents-plugin-silero	Patch
@livekit/agents-plugin-xai	Patch
@livekit/agents-plugins-test	Patch


devin-ai-integration

Devin Review found 1 potential issue.

View 6 additional findings in Devin Review.

plugins/sarvam/src/stt.ts

Add speech-to-text support to the Sarvam plugin using the Sarvam AI speech-to-text REST API. Defaults to the recommended saaras:v3 model with support for 22+ Indian languages and 5 transcription modes (transcribe, translate, verbatim, translit, codemix). Also supports the deprecated saarika:v2.5 model for backward compatibility.

- Add saaras:v2.5 model with /speech-to-text-translate endpoint routing
- Add STTTranslateOptions with prompt param for translate endpoint
- Model-aware buildFormData: language_code for saarika/saaras-v3, mode for saaras-v3, prompt for saaras-v2.5
- Endpoint routing: saaras:v2.5 → /speech-to-text-translate, others → /speech-to-text
- updateOptions handles model switching across all three models
- Language fallback chain: API language_code → configured → 'unknown'
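
The routing and fallback rules in this commit message can be restated as a small sketch. The function and interface names are hypothetical, and only the three model-specific form fields named above are modelled; other request fields (such as the audio file itself) are left out.

```typescript
type STTModel = 'saaras:v3' | 'saarika:v2.5' | 'saaras:v2.5';

// Hypothetical flattened option bag, used only for this illustration.
interface RequestOptions {
  model: STTModel;
  languageCode?: string;
  mode?: string;
  prompt?: string;
}

// saaras:v2.5 goes to the translate endpoint; every other model uses /speech-to-text.
function endpointFor(model: STTModel): string {
  return model === 'saaras:v2.5' ? '/speech-to-text-translate' : '/speech-to-text';
}

// Emit only the form fields that the commit message associates with each model.
function modelFormFields(opts: RequestOptions): Record<string, string> {
  const fields: Record<string, string> = {};
  if (opts.model !== 'saaras:v2.5' && opts.languageCode) fields.language_code = opts.languageCode;
  if (opts.model === 'saaras:v3' && opts.mode) fields.mode = opts.mode;
  if (opts.model === 'saaras:v2.5' && opts.prompt) fields.prompt = opts.prompt;
  return fields;
}

// Fallback chain: language reported by the API, then the configured one, then 'unknown'.
function resolveLanguage(apiLanguageCode: string | null | undefined, configured?: string): string {
  return apiLanguageCode || configured || 'unknown';
}
```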

- Add SpeechStream class with full WS streaming (sendTask, listenTask, wsMonitor)
- Support both /speech-to-text/ws and /speech-to-text-translate/ws endpoints
- Handle VAD events (START_SPEECH, END_SPEECH) and final transcripts
- Add all WS query params: high_vad_sensitivity, flush_signal, vad_signals
- Add prompt and withTimestamps support for REST endpoints
- Robust error parsing (data.message + data.error + top-level fallbacks)
- Retry loop with linear backoff on disconnect (matches idle timeout behavior)
- end_of_stream includes empty audio field per Sarvam protocol requirement
- Add ws and @types/ws dependencies

devin-ai-integration

Devin Review found 1 new potential issue.

View 16 additional findings in Devin Review.

plugins/sarvam/src/stt.ts

…nection

listenTask was created with this.abortController (the SpeechStream's main controller). When listenTask.cancel() was called in the finally block, it permanently aborted the stream's main signal, causing sendTask to exit immediately on every subsequent WS reconnection (an infinite rapid reconnect loop with no audio sent).

Fix: remove the shared abortController argument so Task.from() creates its own internal controller. listenTask.cancel() now only aborts that local controller, leaving the stream's main signal intact for reconnection.

devin-ai-integration

Devin Review found 1 new potential issue.

View 18 additional findings in Devin Review.

plugins/sarvam/src/stt.ts

…m transcripts

…se.all

Task objects are not thenables, so passing wsMonitor directly to Promise.all meant its entry was treated as already resolved, making WS close detection ineffective. The stream would hang on idle timeout instead of triggering the retry loop.
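
The pitfall described in this commit can be reproduced with a stand-in class. `FakeTask` below is a hypothetical object that, like the Task described above, exposes a `result` promise but has no `then` method; it is not the real `Task` from `@livekit/agents`.

```typescript
// Hypothetical stand-in: holds a promise in `result` but is not itself thenable.
class FakeTask<T> {
  readonly result: Promise<T>;
  constructor(fn: () => Promise<T>) {
    this.result = fn();
  }
}

async function demo(): Promise<string[]> {
  const order: string[] = [];
  const monitor = new FakeTask<void>(
    () =>
      new Promise<void>((resolve) =>
        setTimeout(() => {
          order.push('monitor finished');
          resolve();
        }, 20),
      ),
  );

  // A non-thenable entry is treated as an already-resolved value, so this does not wait.
  await Promise.all([monitor]);
  order.push('Promise.all([task]) resolved');

  // Passing the underlying promise makes Promise.all wait for the monitor to finish.
  await Promise.all([monitor.result]);
  order.push('Promise.all([task.result]) resolved');
  return order;
}
```

In this sketch the first `Promise.all` settles before the monitor's timer fires, which mirrors why close detection never took effect until `.result` was passed instead.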

toubatbrian · 2026-02-12T22:30:54Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a55b1ee053

chatgpt-codex-connector · 2026-02-12T22:37:22Z

plugins/sarvam/src/stt.ts

+      await Promise.race([
+        this.#resetWS.await,
+        Promise.all([sendTask(), listenTask.result, wsMonitor.result]),
+      ]);


Cancel stale send loop on websocket reset

In #runWS, Promise.race can resolve via this.#resetWS.await when updateOptions() is called, but sendTask() is a plain async function and is never cancelled. After the finally block closes the websocket, that old send loop can still be blocked on this.input.next() and then consume subsequent audio frames, attempting to write them to a closed socket; this drops user audio right after an option/model change.


chatgpt-codex-connector · 2026-02-12T22:37:22Z

plugins/sarvam/src/stt.ts

+      const listenMessage = new Promise<void>((resolve, reject) => {
+        ws.on('message', (msg: RawData) => {
+          try {


Resolve listener when websocket closes without messages

listenMessage only subscribes to ws.on('message') and only resolves from inside that callback, so a normal shutdown path with no trailing transcript/event message can hang forever. This is reachable when sendTask sets closing = true and cancels wsMonitor (e.g., silent input), because a subsequent socket close then has no code path to settle listenTask, preventing stream completion.
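
One way to address this kind of hang is to settle the listener promise on the socket's close event as well as from the message handler. The sketch below uses Node's `EventEmitter` as a stand-in for the `ws` socket and a string payload instead of `RawData`; the function name is hypothetical and this is not necessarily how the PR fixed it.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical listener: resolves when the socket closes, rejects if a handler throws.
function listenUntilClosed(
  socket: EventEmitter,
  onMessage: (raw: string) => void,
): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const cleanup = () => {
      socket.off('message', handleMessage);
      socket.off('close', handleClose);
    };
    const handleMessage = (raw: string) => {
      try {
        onMessage(raw);
      } catch (err) {
        cleanup();
        reject(err);
      }
    };
    // Without this handler, a close with no trailing message would never settle the promise.
    const handleClose = () => {
      cleanup();
      resolve();
    };
    socket.on('message', handleMessage);
    socket.once('close', handleClose);
  });
}
```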


- Add .gitattributes with `* text=auto eol=lf` to enforce LF endings
- Fix sendTask not cancellable on WS reset (session-scoped AbortController)
- Fix listenMessage hanging on WS close without trailing messages
- Normalize all sarvam plugin files from CRLF to LF
- Fix prettier formatting in index.ts and models.ts

devin-ai-integration

Devin Review found 1 new potential issue.

View 20 additional findings in Devin Review.

plugins/sarvam/src/stt.ts

devin-ai-integration

Devin Review found 1 new potential issue.

View 24 additional findings in Devin Review.

plugins/sarvam/src/stt.ts

Prevents stale #speaking=true from suppressing START_OF_SPEECH events after a WS disconnect that occurred mid-speech.

devin-ai-integration

Devin Review found 1 new potential issue.

View 27 additional findings in Devin Review.

plugins/sarvam/src/stt.ts

….cancel()

listenTask was created without the parent abort controller, causing a deadlock when SpeechStream.close() was called: the finally block couldn't run because Promise.all was stuck waiting for listenTask. Passing this.abortController to Task.from allows listenTask to exit when the stream is closed. Removing listenTask.cancel() from the finally block prevents it from permanently aborting the parent controller on WS reconnection. Instead, ws.close() triggers the ws.once('close') handler in listenMessage for a clean exit.

devin-ai-integration

Devin Review found 1 new potential issue.

View 31 additional findings in Devin Review.

plugins/sarvam/src/stt.ts

Resetting retries to 0 after TCP connect meant that if the session immediately failed (e.g. auth error, server rejection), the counter never reached maxRetry, causing an infinite tight loop with 0ms delay.
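
To make the failure mode concrete, here is a sketch of a reconnect planner with linear backoff. It also folds in the rule from a later commit in this PR (reset the counter only after sessions that ran longer than 5 s). The constant values, the function name, and the return shape are assumptions; only the linear backoff and the 5 s threshold come from the PR text.

```typescript
const MAX_RETRY = 3; // assumed limit
const BASE_DELAY_MS = 1000; // assumed backoff step
const HEALTHY_SESSION_MS = 5000; // threshold mentioned in a later commit

interface ReconnectPlan {
  retries: number; // counter value to carry into the next attempt
  delayMs: number; // wait before reconnecting
}

// Hypothetical planner: returns null when the stream should give up.
function planReconnect(retries: number, sessionMs: number): ReconnectPlan | null {
  // A session that ran long enough is treated as an expected idle timeout, not a failure.
  const effective = sessionMs > HEALTHY_SESSION_MS ? 0 : retries;
  if (effective >= MAX_RETRY) return null;
  // Linear backoff: the delay grows by one step per consecutive failure.
  return { retries: effective + 1, delayMs: effective * BASE_DELAY_MS };
}
```

With an unconditional reset, `effective` would always be 0, so a session that fails immediately would reconnect forever with a 0 ms delay, which is the loop this commit describes.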

Escape curly braces in JSDoc comment that TSDoc parser was interpreting as malformed inline tags.

devin-ai-integration

Devin Review found 1 new potential issue.

View 32 additional findings in Devin Review.

plugins/sarvam/src/stt.ts

…fer slice

- Reset retry counter only after sessions that ran >5s, distinguishing expected idle-timeout reconnections from persistent connection failures
- Use buffer.slice(byteOffset, byteOffset+byteLength) for AudioByteStream to handle typed array views into pooled/shared ArrayBuffers correctly
- Fix TSDoc comment with unescaped braces
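
The buffer.slice fix matters because a typed array can be a window into a larger ArrayBuffer. The sketch below shows the difference; the helper name is hypothetical.

```typescript
// Hypothetical helper: copy exactly the bytes a view covers, ignoring the rest of its buffer.
function exactBytes(view: ArrayBufferView) {
  return view.buffer.slice(view.byteOffset, view.byteOffset + view.byteLength);
}

// A 16-byte pool holding eight 16-bit samples: 0..7.
const pool = new ArrayBuffer(16);
new Int16Array(pool).set([0, 1, 2, 3, 4, 5, 6, 7]);

// A view over samples 2 and 3 only (byte offset 4, length 2 elements).
const frame = new Int16Array(pool, 4, 2);

const wrong = new Int16Array(frame.buffer); // sees the whole pool: 8 samples
const right = new Int16Array(exactBytes(frame)); // sees only the frame: 2 samples
```

Passing `frame.buffer` directly would hand the consumer every sample in the pool, while the sliced copy contains only the two samples the view refers to.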

toubatbrian · 2026-02-13T22:54:36Z

Can you merge main into the branch to resolve some conflicts? Just merged the TTS plugin

mshivam019 and others added 7 commits February 10, 2026 12:37

Create red-cars-warn.md

fdf5ddf

fix: add type assertion in updateOptions for discriminated union compat

13ec7f9

The spread of ResolvedTTSOptions (model: TTSModels) doesn't satisfy the discriminated union TTSOptions. Cast to TTSOptions before passing to resolveOptions, which handles discrimination internally via isV3 check.

devin-ai-integration bot reviewed Feb 12, 2026

plugins/sarvam/src/stt.ts (outdated, resolved)

mshivam019 force-pushed the feat/sarvam-stt-plugin branch 3 times, most recently from fec3009 to 7df750e Compare February 12, 2026 11:45

mshivam019 force-pushed the feat/sarvam-stt-plugin branch from 7df750e to f3fb024 Compare February 12, 2026 11:47

mshivam019 added 2 commits February 12, 2026 17:27

devin-ai-integration bot reviewed Feb 12, 2026

plugins/sarvam/src/stt.ts (outdated, resolved)

devin-ai-integration bot reviewed Feb 12, 2026

plugins/sarvam/src/stt.ts (outdated, resolved)

mshivam019 added 2 commits February 12, 2026 20:11

fix: set interimResults to false — Sarvam STT does not support interi…

51961ce

…m transcripts

chatgpt-codex-connector bot reviewed Feb 12, 2026

devin-ai-integration bot reviewed Feb 13, 2026

plugins/sarvam/src/stt.ts (resolved)

chore: add .gitattributes to enforce LF line endings

c501c43

devin-ai-integration bot reviewed Feb 13, 2026

plugins/sarvam/src/stt.ts (resolved)

fix: reset speaking state on WS reconnection

f41bb61

Prevents stale #speaking=true from suppressing START_OF_SPEECH events after a WS disconnect that occurred mid-speech.

devin-ai-integration bot reviewed Feb 13, 2026

plugins/sarvam/src/stt.ts (outdated, resolved)

fix: use consistent default confidence (0) across REST and WS paths

15c8fa0

devin-ai-integration bot reviewed Feb 13, 2026

plugins/sarvam/src/stt.ts (outdated, resolved)

mshivam019 added 2 commits February 13, 2026 11:30

fix: remove retry counter reset to prevent infinite reconnect loop

1116b29

Resetting retries to 0 after TCP connect meant that if the session immediately failed (e.g. auth error, server rejection), the counter never reached maxRetry, causing an infinite tight loop with 0ms delay.

fix: resolve TSDoc lint warnings in STT plugin

231c14a

Escape curly braces in JSDoc comment that TSDoc parser was interpreting as malformed inline tags.

devin-ai-integration bot reviewed Feb 13, 2026

plugins/sarvam/src/stt.ts (resolved)

mshivam019 and others added 2 commits February 14, 2026 07:10

Merge branch 'main' into feat/sarvam-stt-plugin

0275ab5

Create shiny-lamps-sparkle.md

8bcf7fb

toubatbrian approved these changes Feb 14, 2026

toubatbrian merged commit 47251f4 into livekit:main Feb 14, 2026
4 checks passed

github-actions bot mentioned this pull request Feb 14, 2026

Version Packages #1049

Merged

mshivam019 mentioned this pull request Feb 16, 2026

fix: add explicit Sarvam WS/REST preference toggles #1057

Merged
