
Add Inworld STT provider to livekit-plugins-inworld #5451

Merged
tinalenguyen merged 9 commits into livekit:main from cshape:inworld-stt on Apr 21, 2026

cshape (Contributor) commented Apr 14, 2026

Summary

  • Add streaming speech-to-text support to the existing livekit-plugins-inworld package via Inworld's WebSocket STT API (wss://api.inworld.ai/stt/v1/transcribe:streamBidirectional)
  • Follows established plugin patterns (Soniox, Deepgram, Google) including reconnection loop with tasks_group cleanup, stream tracking, START/END_OF_SPEECH events, and request_id
  • Pass-through model selection — any model string accepted (default: inworld/inworld-stt-1)
  • Eager end-of-turn defaults (endOfTurnConfidenceThreshold=0.3, minEndOfTurnSilenceWhenConfident=200 ms, set from Python via end_of_turn_confidence_threshold and min_end_of_turn_silence_when_confident) for low-latency voice agent use
  • Voice profile detection enabled by default — age, gender, emotion, accent data surfaced via SpeechData.metadata["voice_profile"]
  • update_options() for runtime config changes
  • Adds metadata: dict[str, Any] | None field to SpeechData in livekit-agents for plugin-specific data (backwards-compatible, defaults to None)
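A minimal sketch of why the new field is backwards-compatible, assuming a trimmed-down dataclass in place of the real SpeechData. The class name SpeechDataSketch, the voice_profile() helper, and the "emotion" key in the sample payload are hypothetical. An optional field with a None default leaves existing constructor calls valid, and consumers can treat a missing profile as None.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SpeechDataSketch:
    # Hypothetical, trimmed-down stand-in for SpeechData; only `metadata`
    # reflects the field added in this PR.
    language: str
    text: str
    confidence: float = 0.0
    # Optional with a None default: callers that never pass it keep working.
    metadata: Optional[dict[str, Any]] = None


def voice_profile(data: SpeechDataSketch) -> Optional[dict[str, Any]]:
    # Hypothetical helper: plugins that attach no metadata simply yield None.
    if not data.metadata:
        return None
    return data.metadata.get("voice_profile")


legacy = SpeechDataSketch(language="en", text="hello")  # pre-existing call style
inworld_like = SpeechDataSketch(
    language="en",
    text="hello",
    metadata={"voice_profile": {"emotion": "calm"}},  # illustrative payload only
)
```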

Changes

  • livekit-agents/livekit/agents/stt/stt.py — Add optional metadata field to SpeechData for plugin-specific data
  • livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/stt.py — New STT + SpeechStream implementation
  • livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/__init__.py — Add STT/SpeechStream exports
  • livekit-plugins/livekit-plugins-inworld/pyproject.toml — Update description/keywords for STT+TTS
  • livekit-plugins/livekit-plugins-inworld/README.md — Add STT usage docs
  • examples/voice_agents/inworld_agent.py — Voice agent using Inworld STT+TTS
  • .gitignore — Add .env

Usage

```python
from livekit.agents import AgentSession
from livekit.plugins import inworld

session = AgentSession(
    stt=inworld.STT(),                          # default: inworld/inworld-stt-1
    tts=inworld.TTS(voice="Clive"),
    llm="openai/gpt-4.1-mini",
)

# Log user transcripts (voice profile data is carried separately in
# SpeechData.metadata["voice_profile"] on the STT's speech events)
@session.on("user_input_transcribed")
def on_input(ev):
    print(ev.transcript)

# Override model or tune end-of-turn behavior
stt = inworld.STT(
    model="assemblyai/universal-streaming-multilingual",
    end_of_turn_confidence_threshold=0.5,
    min_end_of_turn_silence_when_confident=400,  # milliseconds
)
```
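The two end-of-turn knobs can be read as a simple gate. This sketch assumes the turn closes early only when the end-of-turn confidence meets the threshold and the trailing silence meets the minimum; the exact comparison and any fallback timeout are server-side details not covered in this PR, and ends_turn() is a hypothetical helper.

```python
def ends_turn(
    confidence: float,
    silence_ms: int,
    threshold: float = 0.3,     # end_of_turn_confidence_threshold default
    min_silence_ms: int = 200,  # min_end_of_turn_silence_when_confident default
) -> bool:
    # Assumed rule: close the turn early only when the model is confident enough
    # AND the user has already been silent for the minimum window.
    return confidence >= threshold and silence_ms >= min_silence_ms


# Same pause (confidence 0.4, 300 ms of silence) under both configurations:
eager = ends_turn(0.4, 300)                                        # defaults
patient = ends_turn(0.4, 300, threshold=0.5, min_silence_ms=400)  # override above
```

Under this reading, the eager defaults end the turn on that pause while the overridden values keep listening, trading latency for fewer premature cut-offs.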

Test plan

  • Verified imports with SDK 1.5.2
  • Tested streaming transcription end-to-end with a 26 s audio file (interim + final transcripts)
  • Tested full agent workflow via console mode (Inworld STT + TTS + OpenAI LLM)
  • Verified update_options() works for runtime model changes
  • Verified no secrets in committed files
  • ruff check and ruff format pass
  • mypy type checks pass (resolved all 11 prior errors)
  • CI checks (the framework change to SpeechData may need a separate PR if CI runs cross-package)


cshape force-pushed the inworld-stt branch 2 times, most recently from 0daf792 to 35acfa0 (April 14, 2026 20:20)
2 review comment threads on livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/stt.py (outdated)
cshape force-pushed the inworld-stt branch 2 times, most recently from a850403 to 82b9dd7 (April 14, 2026 20:27)
Review comment thread on livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/stt.py (outdated)

Add streaming speech-to-text support via Inworld's WebSocket STT API
alongside the existing TTS plugin. The implementation follows established
LiveKit plugin patterns (Deepgram, Soniox, Google) including reconnection
logic, stream tracking, START/END_OF_SPEECH events, and request_id.

- STT class with streaming and sync recognition
- SpeechStream with reconnection loop and _reconnect_event
- WeakSet stream tracking for coordinated shutdown
- Pass-through model selection (no hardcoded model list)
- update_options() for runtime config changes
- Voice profile and VAD configuration support
- Example agent and standalone test script
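The WeakSet tracking and update_options() items fit together roughly as follows. This is a sketch assuming the pattern used by other plugins, where STT.update_options() forwards changes to every live stream and each stream sets its _reconnect_event so its connection loop reopens the WebSocket with the new options. SketchSTT, SketchStream, and STTOptions are hypothetical names; the connection loop and shutdown path are omitted.

```python
import asyncio
import weakref
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class STTOptions:
    # Hypothetical option bundle; field names follow the PR's keyword arguments.
    model: str = "inworld/inworld-stt-1"
    end_of_turn_confidence_threshold: float = 0.3
    min_end_of_turn_silence_when_confident: int = 200  # milliseconds


class SketchStream:
    def __init__(self, opts: STTOptions) -> None:
        self._opts = opts
        # The (omitted) connection loop would wait on this and reconnect when set.
        self._reconnect_event = asyncio.Event()

    def update_options(self, **changes: Any) -> None:
        self._opts = replace(self._opts, **changes)
        self._reconnect_event.set()


class SketchSTT:
    def __init__(self, **kwargs: Any) -> None:
        self._opts = STTOptions(**kwargs)
        # Streams are held weakly: a garbage-collected stream drops out of the
        # set without explicit bookkeeping.
        self._streams: "weakref.WeakSet[SketchStream]" = weakref.WeakSet()

    def stream(self) -> SketchStream:
        s = SketchStream(self._opts)
        self._streams.add(s)
        return s

    def update_options(self, **changes: Any) -> None:
        # Future streams pick up the new defaults...
        self._opts = replace(self._opts, **changes)
        # ...and every live stream is told to reconnect with them.
        for s in list(self._streams):
            s.update_options(**changes)
```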

tinalenguyen (Member) left a comment

Left a few comments. I also think the example would be most useful and accessible in the plugin README.

2 review comment threads on livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/stt.py (outdated)
cshape added 2 commits April 21, 2026 12:59
… parsing helper, inline example in README, restore product-specific docs links
The landing page /agents/integrations/inworld/ 404s. Point at the TTS
and STT integration pages instead, matching the README.

cshape added 5 commits April 21, 2026 14:19
A final transcript with an empty transcript string (VAD false positive,
unrecognizable noise) was returning early before the END_OF_SPEECH block,
leaving _speaking=True. Subsequent speechStarted events were then ignored,
wedging the stream. Skip only the transcript event when text is empty; let
final events fall through to the end-of-speech emission.
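The fix amounts to separating "emit a transcript" from "close the utterance". A minimal state-machine sketch, assuming a single speaking flag like _speaking; TurnState and its event strings are hypothetical stand-ins for the stream's state and emitted SpeechEvents.

```python
from dataclasses import dataclass, field


@dataclass
class TurnState:
    # Hypothetical stand-in for the stream's speaking flag and emitted events.
    speaking: bool = False
    events: list = field(default_factory=list)

    def on_speech_started(self) -> None:
        # A start is only honored when no utterance is open.
        if not self.speaking:
            self.speaking = True
            self.events.append("START_OF_SPEECH")

    def on_transcript(self, text: str, is_final: bool) -> None:
        # Skip only the transcript event when the text is empty...
        if text.strip():
            kind = "FINAL_TRANSCRIPT" if is_final else "INTERIM_TRANSCRIPT"
            self.events.append(f"{kind}:{text}")
        # ...but always let finals reach end-of-speech, so an empty final
        # (e.g. a VAD false positive) cannot leave speaking stuck at True.
        if is_final and self.speaking:
            self.speaking = False
            self.events.append("END_OF_SPEECH")
```

With the old early return, an empty final would skip END_OF_SPEECH, so a second on_speech_started() call would be ignored; here it opens a new utterance as expected.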
The base class stt._metrics_monitor_task reports metrics_collected for
streaming STT off RECOGNITION_USAGE events; without them no usage was
ever surfaced for Inworld sessions. Mirror the pattern used by deepgram,
gladia, xai, and elevenlabs: accumulate audio frame durations via a
PeriodicCollector and emit a RECOGNITION_USAGE event every 5 seconds.
Flush on FlushSentinel and on input-channel close so the last partial
window is reported.
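A self-contained stand-in for the accumulate-and-flush behavior described above. DurationCollector is hypothetical, not the PeriodicCollector the other plugins use. In the plugin the callback would emit a RECOGNITION_USAGE event carrying the audio duration; here it just records the value, and an injectable clock keeps the example deterministic.

```python
import time
from typing import Callable


class DurationCollector:
    """Hypothetical collector: sums audio durations and reports the running
    total once at least `period` seconds have passed since the last report."""

    def __init__(
        self,
        callback: Callable[[float], None],
        period: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._callback = callback
        self._period = period
        self._clock = clock
        self._total = 0.0
        self._last_flush = clock()

    def push(self, duration: float) -> None:
        self._total += duration
        if self._clock() - self._last_flush >= self._period:
            self.flush()

    def flush(self) -> None:
        # Report any partial window (e.g. on a flush sentinel or input close).
        if self._total > 0:
            self._callback(self._total)
        self._total = 0.0
        self._last_flush = self._clock()


# 12 frames of 0.5 s (exact in binary floating point) over a fake 6 s clock.
now = [0.0]
reports: list = []
collector = DurationCollector(reports.append, period=5.0, clock=lambda: now[0])
for _ in range(12):
    now[0] += 0.5
    collector.push(0.5)
collector.flush()  # final partial window; reports == [5.0, 1.0]
```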
Per review: STT page lives under /agents/models/stt/inworld/ (not
/integrations/). Remove the now-redundant standalone example file —
the same agent is inlined in the plugin README.
tinalenguyen merged commit 3756780 into livekit:main on Apr 21, 2026
14 of 15 checks passed

4 participants