Support for Non-Streaming Mode with Language Detection in Deepgram STT #3142
Description
Problem Description
The current Deepgram STT implementation in the LiveKit Agents Python SDK does not allow language detection (detect_language=True) to be used with the standard STT class, because:
- The Deepgram STT class always advertises streaming capabilities in its constructor, regardless of detect_language
- When detect_language=True is set, the stream() method raises an error stating that language detection is not supported in streaming mode
- There's no way to use language detection without creating a custom subclass
Current Behavior
The constructor advertises streaming unconditionally:
super().__init__(
    capabilities=stt.STTCapabilities(streaming=True, interim_results=interim_results)
)
Expected Behavior
The STT class should automatically disable streaming capabilities when detect_language=True is set, allowing it to work seamlessly in non-streaming mode without requiring custom implementations.
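To make this expected behavior concrete, here is a self-contained sketch of the capability logic together with a caller that picks its code path from capabilities.streaming. Capabilities, DeepgramLikeSTT, and transcribe are hypothetical stand-ins, not LiveKit classes; the real stt.STTCapabilities and deepgram.STT carry more fields and behavior, and the exception type raised by stream() is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    # Hypothetical stand-in for stt.STTCapabilities, reduced to the two fields discussed here
    streaming: bool
    interim_results: bool


class DeepgramLikeSTT:
    # Hypothetical stand-in for deepgram.STT, showing only the expected constructor behavior
    def __init__(self, *, detect_language: bool = False, interim_results: bool = True) -> None:
        self.detect_language = detect_language
        # Advertise streaming only when language detection is off
        self.capabilities = Capabilities(
            streaming=not detect_language,
            interim_results=interim_results,
        )

    def stream(self) -> str:
        # Models the reported behavior: streaming with detect_language=True is rejected
        # (ValueError is an assumed exception type)
        if self.detect_language:
            raise ValueError("language detection is not supported in streaming mode")
        return "streaming session"

    def recognize(self, audio: bytes) -> str:
        # Non-streaming path, where language detection is allowed
        return f"recognized {len(audio)} bytes"


def transcribe(engine: DeepgramLikeSTT, audio: bytes) -> str:
    # A caller that checks the advertised capability never reaches the error in stream()
    if engine.capabilities.streaming:
        return engine.stream()
    return engine.recognize(audio)
```

With detect_language=True, transcribe() falls back to recognize() instead of calling stream() and failing; with the default settings it still streams.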
Proposed Solution
Modify the Deepgram STT __init__ method to conditionally set streaming capabilities based on the detect_language parameter:
def __init__(
    self,
    *,
    model: DeepgramModels | str = "nova-3",
    language: DeepgramLanguages | str = "en-US",
    detect_language: bool = False,
    interim_results: bool = True,
    # ... other parameters
) -> None:
    # Conditionally disable streaming when language detection is enabled
    super().__init__(
        capabilities=stt.STTCapabilities(
            streaming=not detect_language,  # Disable streaming if detect_language=True
            interim_results=interim_results,
        )
    )
    # ... rest of initialization
Use Case
This is particularly important for voice agent applications that need to:
- Detect the caller's language automatically at the beginning of a conversation
- Use non-streaming recognition for the initial language detection phase
- Switch to language-specific agents after detection
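The hand-off in the last two steps reduces to a routing decision on the detected language code. A minimal sketch, in which LANGUAGE_AGENTS, DEFAULT_AGENT, pick_agent, and the agent names are hypothetical placeholders rather than LiveKit APIs; in a real application the returned value would select the agent to switch to.

```python
from typing import Optional

# Hypothetical mapping from a primary language subtag to the agent that should take over
LANGUAGE_AGENTS = {
    "en": "english_agent",
    "es": "spanish_agent",
    "fr": "french_agent",
}
DEFAULT_AGENT = "english_agent"


def pick_agent(detected_language: Optional[str]) -> str:
    """Choose the language-specific agent once initial detection has finished."""
    if not detected_language:
        # Detection produced nothing usable: stay with the default agent
        return DEFAULT_AGENT
    # Normalize codes such as "en-US" or "ES" to a lowercase primary subtag
    primary = detected_language.split("-")[0].lower()
    return LANGUAGE_AGENTS.get(primary, DEFAULT_AGENT)
```

Unknown languages fall back to the default agent here; an application might instead keep the detection agent active and ask the caller to repeat.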
Current Workaround
We currently have to create a custom subclass that overrides the initialization:
from livekit.agents import stt
from livekit.plugins import deepgram


class NonStreamingDeepgramSTT(deepgram.STT):
    def __init__(self, *args, **kwargs):
        detect_language = kwargs.get("detect_language", False)
        interim_results = kwargs.get("interim_results", True)
        # Skip deepgram.STT.__init__ and initialize the base stt.STT directly,
        # so the advertised capabilities can be chosen here
        stt.STT.__init__(
            self,
            capabilities=stt.STTCapabilities(
                streaming=not detect_language,
                interim_results=interim_results,
            ),
        )
        # ... manually set up all Deepgram attributes
Impact
This change would:
- Make the Deepgram STT implementation more intuitive
- Remove the need for custom subclasses when using language detection
- Align with Deepgram's actual API capabilities (the API supports language detection in non-streaming mode)
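Until a change like this lands, a possibly lighter interim workaround than the current one is to run the plugin's own constructor and only replace the advertised capabilities afterwards, so no Deepgram attributes have to be set up by hand. The sketch below shows the pattern with hypothetical stand-in classes (Capabilities, BaseSTT, PluginSTT, NonStreamingPluginSTT). It assumes the base class keeps its capabilities in a private attribute, called _capabilities here; the real stt.STT may name or store it differently, so check the installed livekit-agents source before relying on this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    # Hypothetical stand-in for stt.STTCapabilities
    streaming: bool
    interim_results: bool


class BaseSTT:
    # Hypothetical stand-in for stt.STT; ASSUMES capabilities live in self._capabilities
    def __init__(self, *, capabilities: Capabilities) -> None:
        self._capabilities = capabilities

    @property
    def capabilities(self) -> Capabilities:
        return self._capabilities


class PluginSTT(BaseSTT):
    # Hypothetical stand-in for deepgram.STT: always advertises streaming, like the current plugin
    def __init__(self, *, model: str = "nova-3", detect_language: bool = False,
                 interim_results: bool = True) -> None:
        super().__init__(capabilities=Capabilities(streaming=True, interim_results=interim_results))
        self.model = model
        self.detect_language = detect_language


class NonStreamingPluginSTT(PluginSTT):
    def __init__(self, **kwargs) -> None:
        # Run the plugin constructor in full, so no attributes need to be set up by hand
        super().__init__(**kwargs)
        # Then replace only the advertised capabilities (relies on the private attribute)
        self._capabilities = Capabilities(
            streaming=not kwargs.get("detect_language", False),
            interim_results=self._capabilities.interim_results,
        )
```

The trade-off is coupling to a private attribute instead of duplicating the plugin's setup code; both break if the SDK internals change, which is why a constructor-level fix is preferable.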
Version Information
- LiveKit Agents Python SDK version: Latest
- Deepgram plugin version: Latest
Additional Context
The Deepgram API itself supports language detection in non-streaming mode (prerecorded API), so this is purely a limitation of the LiveKit SDK wrapper, not the underlying service.
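For reference, a prerecorded request with language detection can be built with the standard library alone. This is a sketch based on our reading of Deepgram's REST API: the /v1/listen endpoint, the Token authorization scheme, the detect_language query parameter, and the detected_language and language_confidence fields under results.channels should all be verified against Deepgram's current API reference. The audio/wav content type is an assumption; use the type that matches the audio actually sent. The request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_prerecorded_request(audio: bytes, api_key: str, model: str = "nova-3") -> urllib.request.Request:
    # detect_language=true asks Deepgram to identify the language of the whole recording
    query = urllib.parse.urlencode({"model": model, "detect_language": "true"})
    return urllib.request.Request(
        f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=audio,
        # Content-Type is assumed to be WAV; adjust to the real encoding
        headers={"Authorization": f"Token {api_key}", "Content-Type": "audio/wav"},
        method="POST",
    )


def parse_detected_language(response_body: str) -> Tuple[Optional[str], Optional[float]]:
    # Assumed response shape: results.channels[0].detected_language / .language_confidence
    channel = json.loads(response_body)["results"]["channels"][0]
    return channel.get("detected_language"), channel.get("language_confidence")
```

To actually call the API, pass the request to urllib.request.urlopen() and feed the decoded response body to parse_detected_language().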