Add Inworld STT provider to livekit-plugins-inworld #5451
Merged
tinalenguyen merged 9 commits into livekit:main on Apr 21, 2026
ianbbqzy reviewed Apr 14, 2026
Add streaming speech-to-text support via Inworld's WebSocket STT API alongside the existing TTS plugin. The implementation follows established LiveKit plugin patterns (Deepgram, Soniox, Google) including reconnection logic, stream tracking, START/END_OF_SPEECH events, and request_id.

- STT class with streaming and sync recognition
- SpeechStream with reconnection loop and _reconnect_event
- WeakSet stream tracking for coordinated shutdown
- Pass-through model selection (no hardcoded model list)
- update_options() for runtime config changes
- Voice profile and VAD configuration support
- Example agent and standalone test script
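The WeakSet stream tracking mentioned in the list above can be illustrated with a minimal sketch. It is not the plugin's code: `SketchSTT` and `SketchStream` are hypothetical stand-ins, and the only assumption carried over from the commit message is that the parent object holds its streams in a `weakref.WeakSet` so it can close them together without keeping them alive.

```python
import asyncio
import weakref


class SketchStream:
    """Hypothetical stand-in for a streaming session."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


class SketchSTT:
    """Hypothetical stand-in for the parent STT object."""

    def __init__(self) -> None:
        # Weak references: a stream that is garbage collected
        # drops out of the set without any explicit bookkeeping.
        self._streams = weakref.WeakSet()

    def stream(self) -> SketchStream:
        s = SketchStream()
        self._streams.add(s)
        return s

    async def aclose(self) -> None:
        # Snapshot the set first so it cannot change while iterating.
        live = list(self._streams)
        await asyncio.gather(*(s.aclose() for s in live))
        self._streams.clear()
```

Because the set holds weak references only, callers remain the sole owners of their streams; the parent merely gets a chance to close whatever is still alive at shutdown.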
tinalenguyen (Member) left a comment:
left a few comments, i also think the example would be the most useful and accessible in the plugin README
… parsing helper, inline example in README, restore product-specific docs links
The landing page /agents/integrations/inworld/ 404s. Point at the TTS and STT integration pages instead, matching the README.
A final transcript with an empty transcript string (VAD false positive, unrecognizable noise) was returning early before the END_OF_SPEECH block, leaving _speaking=True. Subsequent speechStarted events were then ignored, wedging the stream. Skip only the transcript event when text is empty; let final events fall through to the end-of-speech emission.
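The control flow of this fix can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the plugin's implementation: `SketchStreamState`, its method names, and the event tuples are hypothetical, and the only behavior taken from the commit message is that an empty final transcript must skip the transcript event yet still reach the end-of-speech emission.

```python
from dataclasses import dataclass, field


@dataclass
class SketchStreamState:
    """Hypothetical model of the speaking flag and emitted events."""

    speaking: bool = False
    events: list = field(default_factory=list)

    def on_speech_started(self) -> None:
        # Ignored while already speaking, which is why a stuck
        # speaking=True flag would wedge the stream.
        if self.speaking:
            return
        self.speaking = True
        self.events.append(("START_OF_SPEECH", None))

    def on_transcript(self, text: str, is_final: bool) -> None:
        # Skip only the transcript event when the text is empty...
        if text:
            kind = "FINAL_TRANSCRIPT" if is_final else "INTERIM_TRANSCRIPT"
            self.events.append((kind, text))
        # ...and let final results fall through to end-of-speech.
        if is_final and self.speaking:
            self.speaking = False
            self.events.append(("END_OF_SPEECH", None))
```

With an early `return` on empty text in `on_transcript`, the second `if` would never run for an empty final result, which reproduces the wedged state the commit describes.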
The base class stt._metrics_monitor_task reports metrics_collected for streaming STT off RECOGNITION_USAGE events; without them no usage was ever surfaced for Inworld sessions. Mirror the pattern used by deepgram, gladia, xai, and elevenlabs: accumulate audio frame durations via a PeriodicCollector and emit a RECOGNITION_USAGE event every 5 seconds. Flush on FlushSentinel and on input-channel close so the last partial window is reported.
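The accumulate-and-flush pattern described here can be sketched without any LiveKit dependency. The class below is a hypothetical stand-in written for illustration (`SketchUsageCollector`, its parameters, and the injectable clock are all assumptions); it is not the `PeriodicCollector` utility the commit refers to, whose exact interface is not shown in this PR text.

```python
import time
from typing import Callable


class SketchUsageCollector:
    """Hypothetical collector: sums audio durations, reports periodically."""

    def __init__(
        self,
        emit: Callable[[float], None],
        interval: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._emit = emit
        self._interval = interval
        self._clock = clock
        self._total = 0.0
        self._last_flush = clock()

    def push(self, duration: float) -> None:
        # Accumulate the frame duration; report once the window elapses.
        self._total += duration
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self) -> None:
        # Called on a flush sentinel and on input close so the
        # last partial window is not lost.
        if self._total > 0.0:
            self._emit(self._total)
            self._total = 0.0
        self._last_flush = self._clock()
```

In a stream, `emit` would wrap whatever produces the usage event, `push` would be called once per audio frame sent, and `flush` would be called when the input channel closes.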
Per review: STT page lives under /agents/models/stt/inworld/ (not /integrations/). Remove the now-redundant standalone example file — the same agent is inlined in the plugin README.
tinalenguyen approved these changes Apr 21, 2026
Summary
- … livekit-plugins-inworld package via Inworld's WebSocket STT API (wss://api.inworld.ai/stt/v1/transcribe:streamBidirectional)
- … tasks_group cleanup, stream tracking, START/END_OF_SPEECH events, and request_id
- … inworld/inworld-stt-1)
- … endOfTurnConfidenceThreshold=0.3, minEndOfTurnSilenceWhenConfident=200ms) for low-latency voice agent use
- … SpeechData.metadata["voice_profile"]
- update_options() for runtime config changes
- … metadata: dict[str, Any] | None field to SpeechData in livekit-agents for plugin-specific data (backwards-compatible, defaults to None)

Changes
- livekit-agents/livekit/agents/stt/stt.py: Add optional metadata field to SpeechData for plugin-specific data
- livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/stt.py: New STT + SpeechStream implementation
- livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/__init__.py: Add STT/SpeechStream exports
- livekit-plugins/livekit-plugins-inworld/pyproject.toml: Update description/keywords for STT+TTS
- livekit-plugins/livekit-plugins-inworld/README.md: Add STT usage docs
- examples/voice_agents/inworld_agent.py: Voice agent using Inworld STT+TTS
- .gitignore: Add .env

Usage
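As an illustration of the optional metadata field listed above, here is a minimal sketch of how a consumer might read plugin-specific data defensively. `SketchSpeechData` is a hypothetical stand-in with only two fields, not the real `SpeechData` class, and the contents of the `"voice_profile"` entry are a made-up placeholder; the only details taken from the PR description are the field type (`dict[str, Any] | None`), its `None` default, and the `"voice_profile"` key.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class SketchSpeechData:
    """Hypothetical stand-in for SpeechData, reduced to two fields."""

    text: str
    # Defaults to None, so code that never sets it keeps working.
    metadata: dict[str, Any] | None = None


def voice_profile_of(data: SketchSpeechData) -> Any:
    """Return the voice profile if a plugin attached one, else None."""
    # Guard against both a missing dict and a missing key.
    return (data.metadata or {}).get("voice_profile")


plain = SketchSpeechData(text="hello")
# Placeholder payload; the real shape of a voice profile is not shown here.
tagged = SketchSpeechData(text="hello", metadata={"voice_profile": {"placeholder": True}})
```

Keeping the default at `None` rather than an empty dict is what makes the change backwards-compatible: providers that do not populate the field construct the object exactly as before.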
Test plan
- update_options() works for runtime model changes
- ruff check and ruff format pass
- mypy type checks pass (resolved all 11 prior errors)
- … SpeechData may need separate PR if CI runs cross-package)