feat(deepgram): support flux-general-multi and language_hint in STTv2 #5480

@robertvy

Feature Request

Please add flux-general-multi and repeatable language_hint support to the Deepgram STTv2 plugin.

Why this belongs in the plugin

livekit-plugins-deepgram already implements the Deepgram v2 streaming API directly over WebSocket. It is not blocked on Deepgram SDK support for this feature.

Relevant plugin files:

  • livekit/plugins/deepgram/stt_v2.py
  • livekit/plugins/deepgram/_utils.py
  • livekit/plugins/deepgram/models.py

Today the plugin already supports Flux v2 for English-only use cases via flux-general-en, but it does not expose the Flux multilingual path.

What is missing today

  1. flux-general-multi is not included in the plugin's v2 model set.
  2. language_hint is not exposed on the STTv2 constructor or on update_options.
  3. language_hint is not forwarded as repeated query parameters on the v2 WebSocket URL.
  4. Auto-detected language information is not surfaced cleanly for Flux multilingual responses.

Requested support

  1. Allow model="flux-general-multi" in STTv2.
  2. Add language_hint: list[str] | None support on:
    • constructor
    • update_options(...)
    • stream reconnection / URL generation
  3. Encode hints as repeated query params on /v2/listen, e.g.:
    • ...?model=flux-general-multi&language_hint=de&language_hint=en
  4. For flux-general-multi, use returned language metadata from the API when emitting transcript events instead of defaulting to a fixed plugin language.
  5. Optionally expose mid-stream reconfiguration for Flux multilingual if that fits the plugin API.
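As a rough illustration of item 3, the repeated-parameter encoding can be sketched with the standard library alone. The base URL and the helper name build_listen_url are assumptions for this sketch, not the plugin's actual code; only the /v2/listen path and the parameter names come from this request.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Assumed host for illustration; this issue only names the /v2/listen path.
BASE_URL = "wss://api.deepgram.com/v2/listen"


def build_listen_url(model: str, language_hint: list[str] | None = None) -> str:
    """Hypothetical helper: build a /v2/listen URL with one language_hint per hint."""
    params: dict[str, object] = {"model": model}
    if language_hint:
        # With doseq=True, a list value is expanded into repeated keys.
        params["language_hint"] = language_hint
    return f"{BASE_URL}?{urlencode(params, doseq=True)}"


print(build_listen_url("flux-general-multi", ["de", "en"]))
# wss://api.deepgram.com/v2/listen?model=flux-general-multi&language_hint=de&language_hint=en
```

Omitting language_hint leaves the parameter out of the URL entirely, which matches the auto-detect case.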

Relevant Deepgram docs

The key points from those docs:

  • flux-general-multi uses the same production v2 endpoint as Flux.
  • language_hint is only supported on flux-general-multi.
  • Hints are repeatable and optional.
  • Without hints, the model auto-detects the spoken language.
  • An EU endpoint is available for the multilingual model as well.
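Given those constraints, the plugin could reject hints on models that do not accept them before opening the socket. The following is only a sketch of that check; the function name and error message are hypothetical, and the real plugin might prefer to log a warning instead of raising.

```python
from __future__ import annotations

MULTILINGUAL_MODEL = "flux-general-multi"


def validate_language_hint(model: str, language_hint: list[str] | None) -> list[str]:
    """Hypothetical check: return the hints to send, or raise if the model cannot take them."""
    if not language_hint:
        # No hints: the multilingual model auto-detects the spoken language.
        return []
    if model != MULTILINGUAL_MODEL:
        raise ValueError(
            f"language_hint is only supported on {MULTILINGUAL_MODEL}, got model={model!r}"
        )
    return list(language_hint)
```

Calling validate_language_hint("flux-general-en", ["de"]) raises ValueError, while passing None or an empty list is accepted for any model.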

Why this matters

Flux multilingual is valuable for multilingual voice agents because it brings the same turn-aware / interruption-aware behavior as flux-general-en to supported non-English languages.

That means users can get Flux-style turn detection without staying on Nova-3 solely for multilingual traffic.

Implementation notes

This looks relatively contained:

  • _utils._to_deepgram_url(...) already uses urlencode(..., doseq=True), so repeatable language_hint query params fit naturally.
  • STTv2 already connects directly to /v2/listen over WebSocket.
  • The main plugin work appears to be:
    • model typing / validation
    • option plumbing
    • event parsing for detected language metadata
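For the event-parsing piece, one possible shape is a small resolver that prefers language metadata from the server message and falls back to the configured language. The field name "languages" is an assumption made for this sketch and should be checked against the Deepgram docs; resolve_language is likewise a hypothetical helper.

```python
from __future__ import annotations

from typing import Any


def resolve_language(message: dict[str, Any], fallback: str, key: str = "languages") -> str:
    """Hypothetical helper: pick the detected language from a message, else use the fallback.

    `key` is an assumed field name, not confirmed by this issue.
    """
    value = message.get(key)
    if isinstance(value, str) and value:
        return value
    if isinstance(value, list) and value and isinstance(value[0], str):
        # If several languages are reported, take the first one.
        return value[0]
    return fallback
```

With this approach, English-only models keep their current behavior because their messages simply lack the field and the fallback is used.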

Concrete example

from livekit.plugins import deepgram

stt = deepgram.STTv2(
    model="flux-general-multi",
    language_hint=["de", "en"],  # proposed parameter, not yet in the plugin
    eot_threshold=0.7,
)

or with auto-detect:

stt = deepgram.STTv2(
    model="flux-general-multi",
)

Thanks.
