
Support for Non-Streaming Mode with Language Detection in Deepgram STT #3142

@ringofhealth

Problem Description

The Deepgram STT implementation in the LiveKit Agents Python SDK does not allow language detection (detect_language=True) to be used with the standard STT class, for three reasons:

  1. The Deepgram STT class always advertises streaming capabilities in its constructor
  2. When detect_language=True is set, the stream() method raises an error stating that language detection is not supported in streaming mode
  3. There's no way to use language detection without creating a custom subclass

Current Behavior

The constructor always passes streaming=True, regardless of the detect_language setting:

        super().__init__(
            capabilities=stt.STTCapabilities(streaming=True, interim_results=interim_results)
        )

Expected Behavior

The STT class should automatically disable streaming capabilities when detect_language=True is set, allowing it to work seamlessly in non-streaming mode without requiring custom implementations.

Proposed Solution

Modify the Deepgram STT __init__ method to conditionally set streaming capabilities based on the detect_language parameter:

def __init__(
    self,
    *,
    model: DeepgramModels | str = "nova-3",
    language: DeepgramLanguages | str = "en-US",
    detect_language: bool = False,
    interim_results: bool = True,
    # ... other parameters
) -> None:
    # Advertise streaming only when language detection is disabled
    super().__init__(
        capabilities=stt.STTCapabilities(
            streaming=not detect_language,  # False when detect_language=True
            interim_results=interim_results,
        )
    )
    # ... rest of initialization
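
To make the intended behavior concrete without depending on the SDK, here is a minimal runnable sketch of the same conditional. The `Capabilities` dataclass and `resolve_capabilities` helper are hypothetical stand-ins written for this illustration; they are not part of livekit-agents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical stand-in for stt.STTCapabilities (not the real class)."""

    streaming: bool
    interim_results: bool


def resolve_capabilities(
    *, detect_language: bool = False, interim_results: bool = True
) -> Capabilities:
    # Streaming is advertised only when language detection is off,
    # mirroring `streaming=not detect_language` in the proposal above.
    return Capabilities(
        streaming=not detect_language,
        interim_results=interim_results,
    )


default_caps = resolve_capabilities()
detect_caps = resolve_capabilities(detect_language=True)
print(default_caps.streaming, detect_caps.streaming)
```

With this rule, existing callers that never set detect_language keep streaming enabled, so the change should only affect configurations that currently fail in stream().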

Use Case

This is particularly important for voice agent applications that need to:

  1. Detect the caller's language automatically at the beginning of a conversation
  2. Use non-streaming recognition for the initial language detection phase
  3. Switch to language-specific agents after detection
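
A rough sketch of step 3, assuming the detection phase yields a language code such as "es" or "fr-CA". The agent names, the `AGENTS_BY_LANGUAGE` mapping, and `pick_agent` are hypothetical and only illustrate the routing idea; they are not SDK APIs.

```python
from typing import Optional

# Hypothetical mapping from primary language subtag to an agent name.
AGENTS_BY_LANGUAGE = {
    "en": "english_agent",
    "es": "spanish_agent",
    "fr": "french_agent",
}


def pick_agent(detected_language: Optional[str], default: str = "english_agent") -> str:
    """Choose an agent from a detected language code, falling back to a default."""
    if not detected_language:
        # Detection returned nothing, so stay with the default agent.
        return default
    # Reduce codes like "fr-CA" or "fr_CA" to their primary subtag "fr".
    primary = detected_language.replace("_", "-").split("-")[0].lower()
    return AGENTS_BY_LANGUAGE.get(primary, default)


print(pick_agent("fr-CA"))
```

Falling back to a default agent keeps the call usable when detection returns nothing or an unsupported language.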

Current Workaround

We currently have to create a custom subclass that overrides the initialization:

class NonStreamingDeepgramSTT(deepgram.STT):
    def __init__(self, *args, **kwargs):
        detect_language = kwargs.get("detect_language", False)
        interim_results = kwargs.get("interim_results", True)
        
        # Initialize parent stt.STT with appropriate capabilities
        stt.STT.__init__(
            self,
            capabilities=stt.STTCapabilities(
                streaming=not detect_language,
                interim_results=interim_results,
            ),
        )
        # ... manually set up all Deepgram attributes

Impact

This change would:

  • Make the Deepgram STT implementation more intuitive
  • Remove the need for custom subclasses when using language detection
  • Align with Deepgram's actual API capabilities (which support language detection in non-streaming mode)

Version Information

  • LiveKit Agents Python SDK version: Latest
  • Deepgram plugin version: Latest

Additional Context

The Deepgram API itself supports language detection in non-streaming mode (prerecorded API), so this is purely a limitation of the LiveKit SDK wrapper, not the underlying service.
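
For reference, a minimal sketch of what a prerecorded request with language detection looks like when built with the standard library only. It constructs the request without sending it. The helper names are hypothetical, the audio is assumed to be WAV, and the response shape (`results.channels[0].detected_language`) reflects my understanding of Deepgram's prerecorded API and should be checked against their documentation.

```python
import urllib.parse
import urllib.request
from typing import Optional

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_detect_language_request(audio: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a prerecorded request with language detection on."""
    query = urllib.parse.urlencode({"detect_language": "true"})
    return urllib.request.Request(
        f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",  # assumption: WAV input
        },
    )


def detected_language(response_json: dict) -> Optional[str]:
    """Read the detected language from the first channel, if present."""
    channels = response_json.get("results", {}).get("channels", [])
    return channels[0].get("detected_language") if channels else None


req = build_detect_language_request(b"\x00\x00", "YOUR_API_KEY")
print(req.full_url)
```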


Labels: bug
