
Add vad_threshold parameter to AssemblyAI STT plugin #4880

Merged
chenghao-mou merged 4 commits into livekit:main from AhmadIbrahiim:add-assemblyai-vad-threshold
Feb 19, 2026

Conversation

Contributor

AhmadIbrahiim commented Feb 18, 2026

Summary

  • Add support for the vad_threshold parameter from AssemblyAI's streaming API
  • This allows users to configure VAD sensitivity (0-1 range) for different audio environments

Changes

  • Added vad_threshold to STTOptions dataclass
  • Added vad_threshold parameter to STT.__init__()
  • Added vad_threshold to STT.update_options() and SpeechStream.update_options()
  • Include vad_threshold in WebSocket connection config
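The plumbing above can be sketched as follows. This is a simplified stand-in, assuming the NOT_GIVEN sentinel convention used by livekit-agents; STTOptions is reduced to the new field, build_ws_url is a hypothetical helper rather than the plugin's actual code, and the endpoint URL is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Any
from urllib.parse import urlencode

# Minimal stand-in for the NOT_GIVEN sentinel used by livekit-agents.
class _NotGiven:
    def __repr__(self) -> str:
        return "NOT_GIVEN"

NOT_GIVEN = _NotGiven()

@dataclass
class STTOptions:
    # 0-1; lower values make the VAD more sensitive. When left NOT_GIVEN,
    # the parameter is omitted and AssemblyAI's server-side default applies.
    vad_threshold: Any = NOT_GIVEN

def build_ws_url(
    opts: STTOptions,
    base: str = "wss://streaming.assemblyai.com/v3/ws",  # assumed endpoint
) -> str:
    params: dict[str, Any] = {"sample_rate": 16000}
    if not isinstance(opts.vad_threshold, _NotGiven):
        # Validate locally before sending; the API expects a 0-1 value.
        if not 0.0 <= opts.vad_threshold <= 1.0:
            raise ValueError("vad_threshold must be between 0 and 1")
        params["vad_threshold"] = opts.vad_threshold
    return f"{base}?{urlencode(params)}"
```

Keeping the default as NOT_GIVEN (rather than hard-coding 0.4) means users who never touch the option automatically track whatever default AssemblyAI ships server-side.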

API Reference

From AssemblyAI's docs:

The threshold for voice activity detection (VAD). A value between 0 and 1 that determines how sensitive the VAD is. Lower values make the VAD more sensitive, meaning it will detect quieter speech. Higher values make the VAD less sensitive. The default value is 0.4.

Closes #4879

Add support for the vad_threshold parameter from AssemblyAI's streaming
API. This allows users to configure VAD sensitivity (0-1 range) for
different audio environments.

Closes livekit#4879
Add unit tests covering:
- Default value (NOT_GIVEN)
- Setting value in constructor
- Boundary values (0 and 1)
- Dynamic updates via update_options
- Interaction with other options
- Partial updates
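The listed cases can be condensed into a small self-contained sketch. FakeSTT below is a local stub standing in for the real assemblyai.STT (which needs an API key and a network connection); its names and signatures, including end_of_turn_confidence, are illustrative, not the plugin's actual API.

```python
# Local stub approximating the option-holding behavior under test.
class FakeSTT:
    def __init__(self, vad_threshold=None, end_of_turn_confidence=None):
        self.vad_threshold = vad_threshold
        self.end_of_turn_confidence = end_of_turn_confidence

    def update_options(self, *, vad_threshold=None, end_of_turn_confidence=None):
        # Partial update: only overwrite options that were actually passed
        # (None stands in for NOT_GIVEN in this simplified sketch).
        if vad_threshold is not None:
            self.vad_threshold = vad_threshold
        if end_of_turn_confidence is not None:
            self.end_of_turn_confidence = end_of_turn_confidence

def run_checks() -> bool:
    assert FakeSTT().vad_threshold is None                   # default stays unset
    assert FakeSTT(vad_threshold=0.0).vad_threshold == 0.0   # lower boundary
    assert FakeSTT(vad_threshold=1.0).vad_threshold == 1.0   # upper boundary
    s = FakeSTT(vad_threshold=0.3, end_of_turn_confidence=0.8)
    s.update_options(vad_threshold=0.6)                      # dynamic partial update
    assert s.vad_threshold == 0.6
    assert s.end_of_turn_confidence == 0.8                   # other option untouched
    return True
```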
Contributor Author

AhmadIbrahiim commented Feb 18, 2026

Hi @theomonnom @davidzhao 👋

This PR adds support for the vad_threshold parameter from AssemblyAI's streaming API. We need this to configure VAD sensitivity for different audio environments.

Would appreciate a review when you have a moment. Thanks!

@chenghao-mou
Member

/test-stt

@github-actions
Contributor

STT Test Results

Status: ✗ Some tests failed

Metric Count
✓ Passed 20
✗ Failed 3
× Errors 1
→ Skipped 15
▣ Total 39
⏱ Duration 191.8s
Failed Tests
  • tests.test_stt::test_recognize[livekit.plugins.google]
    self = <livekit.plugins.google.stt.STT object at 0x7f2dc5297cb0>
    buffer = [rtc.AudioFrame(sample_rate=24000, num_channels=1, samples_per_channel=1119168, duration=46.632)]
    
        async def _recognize_impl(
            self,
            buffer: utils.AudioBuffer,
            *,
            language: NotGivenOr[SpeechLanguages | str] = NOT_GIVEN,
            conn_options: APIConnectOptions,
        ) -> stt.SpeechEvent:
            frame = rtc.combine_audio_frames(buffer)
      
            config = self._build_recognition_config(
                sample_rate=frame.sample_rate,
                num_channels=frame.num_channels,
                language=language,
            )
      
            try:
                async with self._pool.connection(timeout=conn_options.timeout) as client:
    >               raw = await client.recognize(
                        self._build_recognition_request(client, config, frame.data.tobytes()),
                        timeout=conn_options.timeout,
                    )
    
    livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py:368: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    self = <google.cloud.speech_v1.services.speech.async_client.SpeechAsyncClient object at 0x7f2dc52cd400>
    request = config {
      encoding: LINEAR16
      sample_rate_hertz: 24000
      language_code: "en-US"
      audio_channel_count: 1
      enable_wo...000\000\000\000\000\000\000\001\000\001\000\001\000\000\000\000\000\001\000\377\377\377\377\000\000\000\000\000\000"
    }
    
    
        async def recognize(
            self,
            request: Optional[Union[cloud_speech.RecognizeRequest, dict]] = None,
            *,
            config: Optional[cloud_speech.RecognitionConfig] = None,
            audio: Optional[cloud_speech.RecognitionAudio] = None,
            retry: OptionalRetry = gapic_v1.method.DEFAULT,
            timeout: Union[float, object] = gapic_v1.method.DEFAULT,
            metadata: Sequence[Tuple[str, Union[str, bytes]]] = (),
        ) -> cloud_speech.RecognizeResponse:
            r"""Performs synchronous speech recognition: receive
    
  • tests.test_stt::test_stream[livekit.plugins.speechmatics]
    def finalizer() -> None:
            """Yield again, to finalize."""
      
            async def async_finalizer() -> None:
                try:
                    await gen_obj.__anext__()
                except StopAsyncIteration:
                    pass
                else:
                    msg = "Async generator fixture didn't stop."
                    msg += "Yield only once."
                    raise ValueError(msg)
      
    >       runner.run(async_finalizer(), context=context)
    
    .venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py:330: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    self = <asyncio.runners.Runner object at 0x7f4f66b55010>
    coro = <coroutine object _wrap_asyncgen_fixture.<locals>._asyncgen_fixture_wrapper.<locals>.finalizer.<locals>.async_finalizer at 0x7f4f66b43c60>
    
        def run(self, coro, *, context=None):
            """Run a coroutine inside the embedded event loop."""
            if not coroutines.iscoroutine(coro):
                raise ValueError("a coroutine was expected, got {!r}".format(coro))
      
            if events._get_running_loop() is not None:
                # fail fast with short traceback
                raise RuntimeError(
                    "Runner.run() cannot be called from a running event loop")
      
            self._lazy_init()
      
            if context is None:
                context = self._context
            task = self._loop.create_task(coro, context=context)
      
            if (threading.current_thread() is threading.main_thread()
                and signal.getsignal(signal.SIGINT) is signal.default_int_handler
            ):
                sigint_handler = functools.partial(self._on_sigint, main_task=task)
                try:
                    signal.signal(signal.SIGINT, sigint_handler)
                except ValueError:
                    # `signal.signal` may throw if `threading.main_thread` does
                    # not support signals (e.g. embedded interpreter with signals
                    # not registered - see gh-91880)
                    sigint_handler = None
    
  • tests.test_stt::test_stream[livekit.agents.inference]
    stt_factory = <function parameter_factory.<locals>.<lambda> at 0x7f2dc520dee0>
    request = <FixtureRequest for <Coroutine test_stream[livekit.agents.inference]>>
    
        @pytest.mark.usefixtures("job_process")
        @pytest.mark.parametrize("stt_factory", STTs)
        async def test_stream(stt_factory: Callable[[], STT], request):
            sample_rate = SAMPLE_RATE
            plugin_id = request.node.callspec.id.split("-")[0]
            frames, transcript, _ = await make_test_speech(chunk_duration_ms=10, sample_rate=sample_rate)
      
            # TODO: differentiate missing key vs other errors
            try:
                stt_instance: STT = stt_factory()
            except ValueError as e:
                pytest.skip(f"{plugin_id}: {e}")
      
            async with stt_instance as stt:
                label = f"{stt.model}@{stt.provider}"
                if not stt.capabilities.streaming:
                    pytest.skip(f"{label} does not support streaming")
      
                for attempt in range(MAX_RETRIES):
                    try:
                        state = {"closing": False}
      
                        async def _stream_input(
                            frames: list[rtc.AudioFrame], stream: RecognizeStream, state: dict = state
                        ):
                            for frame in frames:
                                stream.push_frame(frame)
                                await asyncio.sleep(0.005)
      
                            stream.end_input()
                            state["closing"] = True
      
                        async def _stream_output(stream: RecognizeStream, state: dict = state):
                            text = ""
                            # make sure the events are sent in the right order
                            recv_start, recv_end = False, True
                            start_time = time.time()
                            got_final_transcript = False
      
                            async for event in stream:
                                if event.type == agents.stt.SpeechEventType.START_OF_SPEECH:
    
  • tests.test_stt::test_stream[livekit.plugins.google]
    self = <google.api_core.grpc_helpers_async._WrappedStreamStreamCall object at 0x7f4f75c5af60>
    
        async def _wrapped_aiter(self) -> AsyncGenerator[P, None]:
            try:
                # NOTE(lidiz) coverage doesn't understand the exception raised from
                # __anext__ method. It is covered by test case:
                #     test_wrap_stream_errors_aiter_non_rpc_error
    >           async for response in self._call:  # pragma: no branch
    
    .venv/lib/python3.12/site-packages/google/api_core/grpc_helpers_async.py:107: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    self = <_AioCall of RPC that terminated with:
    	status = Request had invalid authentication credentials. Expected OAuth 2 acce... other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project."}"
    >
    
        async def _fetch_stream_responses(self) -> ResponseType:
            message = await self._read()
            while message is not cygrpc.EOF:
                yield message
                message = await self._read()
      
            # If the read operation failed, Core should explain why.
    >       await self._raise_for_status()
    
    .venv/lib/python3.12/site-packages/grpc/aio/_call.py:364: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    self = <_AioCall of RPC that terminated with:
    	status = Request had invalid authentication credentials. Expected OAuth 2 acce... other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project."}"
    >
    
        async def _raise_for_status(self) -> None:
            if self._cython_call.is_locally_cancelled():
                raise asyncio.CancelledError()
            code = await self.code()
            if code != grpc.StatusCode.OK:
    >           raise _create_rpc_error(
                    await self.initial_metadata(),
                    await self._cython_call.status(),
                )
    E           grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with
    
Skipped Tests
Test Reason
tests.test_stt::test_recognize[livekit.plugins.assemblyai] universal-streaming-english@AssemblyAI does not support batch recognition
tests.test_stt::test_recognize[livekit.plugins.speechmatics] enhanced@Speechmatics does not support batch recognition
tests.test_stt::test_recognize[livekit.plugins.fireworksai] unknown@FireworksAI does not support batch recognition
tests.test_stt::test_recognize[livekit.plugins.cartesia] ink-whisper@Cartesia does not support batch recognition
tests.test_stt::test_recognize[livekit.plugins.soniox] stt-rt-v3@Soniox does not support batch recognition
tests.test_stt::test_recognize[livekit.plugins.aws] unknown@Amazon Transcribe does not support batch recognition
tests.test_stt::test_recognize[livekit.plugins.nvidia] unknown@unknown does not support batch recognition
tests.test_stt::test_recognize[livekit.plugins.deepgram.STTv2] flux-general-en@Deepgram does not support batch recognition
tests.test_stt::test_recognize[livekit.plugins.gradium.STT] unknown@Gradium does not support batch recognition
tests.test_stt::test_recognize[livekit.agents.inference] unknown@livekit does not support batch recognition
tests.test_stt::test_recognize[livekit.plugins.azure] unknown@Azure STT does not support batch recognition
tests.test_stt::test_stream[livekit.plugins.elevenlabs] scribe_v1@ElevenLabs does not support streaming
tests.test_stt::test_stream[livekit.plugins.mistralai] voxtral-mini-latest@MistralAI does not support streaming
tests.test_stt::test_stream[livekit.plugins.openai] gpt-4o-mini-transcribe@api.openai.com does not support streaming
tests.test_stt::test_stream[livekit.plugins.fal] Wizper@Fal does not support streaming

Triggered by workflow run #881

@AhmadIbrahiim
Contributor Author

/test-stt

@chenghao-mou I see 3 failed tests, but I don't think they're related to the changes in this PR?

chenghao-mou self-assigned this Feb 18, 2026
@chenghao-mou
Member

Yeah, it should be fine. I will review and test this by EOW.

Member

chenghao-mou left a comment


LGTM. One minor issue about the docstring.

vad_threshold: The threshold for voice activity detection (VAD). A value between
0 and 1 that determines how sensitive the VAD is. Lower values make the VAD
more sensitive (detects quieter speech). Higher values make it less sensitive.
Defaults to 0.5.
Member


nitpicking: the official default value is 0.4.

Contributor Author


Updated ✅

Contributor

devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.

View 5 additional findings in Devin Review.


vad_threshold: The threshold for voice activity detection (VAD). A value between
0 and 1 that determines how sensitive the VAD is. Lower values make the VAD
more sensitive (detects quieter speech). Higher values make it less sensitive.
Defaults to 0.4.
Contributor


🟡 Docstring states incorrect default value for vad_threshold (0.4 vs 0.5)

The docstring for vad_threshold at line 93 states "Defaults to 0.4" but the AssemblyAI API documentation (referenced in the PR description itself) states the server-side default is 0.5.

Detailed Explanation

The PR description directly quotes AssemblyAI's docs:

The default value is 0.5.

However, the docstring at livekit-plugins/livekit-plugins-assemblyai/livekit/plugins/assemblyai/stt.py:93 says:

Defaults to 0.4.

Since vad_threshold defaults to NOT_GIVEN and is sent as None (thus omitted from the WebSocket URL query parameters), the actual default applied is whatever AssemblyAI's server uses — which is 0.5 per their docs. Users reading this docstring would be misled about the effective default behavior of the VAD sensitivity.

Impact: Users relying on the documented default of 0.4 to understand the VAD sensitivity behavior will have incorrect expectations. The actual server-side default is 0.5 (less sensitive than documented).

Suggested change:
- Defaults to 0.4.
+ Defaults to 0.5.


chenghao-mou merged commit 1cb9c53 into livekit:main Feb 19, 2026
9 checks passed
AhmadIbrahiim deleted the add-assemblyai-vad-threshold branch February 19, 2026 19:41


Development

Successfully merging this pull request may close these issues.

Add vad_threshold parameter support to AssemblyAI STT plugin

2 participants