
(gladia & soniox): add translation support #5148

Merged
tinalenguyen merged 5 commits into main from tina/support-translations-speechdata
Mar 20, 2026

Conversation

@tinalenguyen
Member

add input_language and input_text to SpeechData, and add translation support for gladia and soniox

closes #4943 and #4402

to test, override stt_node and iterate through the events it yields:

async def stt_node(
    self, audio: AsyncIterable, model_settings: ModelSettings
) -> AsyncIterable[stt_module.SpeechEvent]:
    async for event in Agent.default.stt_node(self, audio, model_settings):
        if isinstance(event, stt_module.SpeechEvent) and event.alternatives:
            alt = event.alternatives[0]
            # input_language is only populated when the STT service translated the speech
            if alt.input_language:
                logger.info(
                    f"[STT translation] input_language={alt.input_language}, "
                    f"language={alt.language}, "
                    f"input_text={alt.input_text!r}, text={alt.text!r}"
                )
        yield event

@chenghao-mou chenghao-mou requested a review from a team March 18, 2026 21:19

MSameerAbbas added a commit to MSameerAbbas/agents that referenced this pull request Mar 20, 2026
…dpoint

Emit END_OF_SPEECH based on speaking state, not final text presence.
Previously both were inside the same conditional, so if an error or
finished message arrived while speaking but before final tokens
accumulated, END_OF_SPEECH was skipped. This left downstream consumers
in speaking state with no turn detection triggered.

Only affects agents using turn_detection=stt (no VAD). Pre-existing
bug also present on main and livekit#5148.
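
The fix described in that commit message can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: `EndpointState`, `EventType`, and `on_endpoint` are hypothetical names, and the real stream handling is assumed to be more involved.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    FINAL_TRANSCRIPT = "final_transcript"
    END_OF_SPEECH = "end_of_speech"


@dataclass
class EndpointState:
    """Hypothetical per-stream state; names are illustrative only."""

    is_speaking: bool = False
    final_text: str = ""


def on_endpoint(state: EndpointState) -> list[EventType]:
    """Handle an endpoint, error, or finished message."""
    events: list[EventType] = []

    # A final transcript is only emitted when final tokens accumulated.
    if state.final_text:
        events.append(EventType.FINAL_TRANSCRIPT)
        state.final_text = ""

    # END_OF_SPEECH depends only on the speaking state, so it still fires
    # when the message arrives before any final tokens.
    if state.is_speaking:
        events.append(EventType.END_OF_SPEECH)
        state.is_speaking = False

    return events
```

With the two checks nested in one conditional instead, a stream that ends with `is_speaking=True` and empty `final_text` would emit nothing, which matches the stuck speaking state described above.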
Member

@chenghao-mou chenghao-mou left a comment

lgtm. One small thing I noticed is that the translation works with code-switching, but the source/input language only supports one value.

input_language: LanguageCode | None = None
"""the detected/input language spoken by the user. populated by STT services that support translation,
where `language` holds the target language and `input_language` holds the original spoken language"""
input_text: str | None = None
Member

nit: borrowing from machine translation terminology, we should name them source_language and source_text.

Member Author

I renamed them to source_languages and source_texts, for the polyglots
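
For readers following the rename, here is a rough sketch of how the plural fields might be consumed. `SpeechDataSketch` and `describe` are hypothetical stand-ins, and the list-of-strings field types are an assumption; the merged SpeechData definition may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SpeechDataSketch:
    """Illustrative stand-in for SpeechData, not the library's definition."""

    language: str  # target language of `text`
    text: str  # translated transcript
    # Assumed shape: one entry per detected source-language segment.
    source_languages: list[str] | None = None
    source_texts: list[str] | None = None


def describe(alt: SpeechDataSketch) -> str:
    """Summarize an alternative, listing every source language if present."""
    if not alt.source_languages:
        # No translation happened, so only the transcript language is known.
        return f"{alt.language}: {alt.text!r}"
    sources = ", ".join(alt.source_languages)
    return f"{sources} -> {alt.language}: {alt.text!r}"
```

Holding a list rather than a single value is what lets a code-switched utterance report more than one spoken language, which was the concern raised in the review.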

    # Reset speaking state, so the next transcript will send START_OF_SPEECH again.
    is_speaking = False
else:
    final_original.reset()
Member

I don't think we need to reset this here


@tinalenguyen tinalenguyen merged commit 5f13474 into main Mar 20, 2026
14 of 22 checks passed
@tinalenguyen tinalenguyen deleted the tina/support-translations-speechdata branch March 20, 2026 20:37

Development

Successfully merging this pull request may close these issues.

Add Soniox Real-Time Translation Support to livekit-plugins-soniox
