Add vad_threshold parameter to AssemblyAI STT plugin#4880
chenghao-mou merged 4 commits into livekit:main
Add support for the vad_threshold parameter from AssemblyAI's streaming API. This allows users to configure VAD sensitivity (0-1 range) for different audio environments. Closes livekit#4879
Add unit tests covering:
- Default value (NOT_GIVEN)
- Setting value in constructor
- Boundary values (0 and 1)
- Dynamic updates via update_options
- Interaction with other options
- Partial updates
Hi @theomonnom @davidzhao 👋 This PR adds support for the vad_threshold parameter. Would appreciate a review when you have a moment. Thanks!
/test-stt
STT Test Results
Status: ✗ Some tests failed
Failed Tests
Skipped Tests
Triggered by workflow run #881
@chenghao-mou I see 3 failed tests -- I don't think it's related to the changes in this PR?
Yeah, it should be fine. I will review and test this by EOW.
chenghao-mou left a comment
LGTM. One minor issue about the docstring.
vad_threshold: The threshold for voice activity detection (VAD). A value between
    0 and 1 that determines how sensitive the VAD is. Lower values make the VAD
    more sensitive (detects quieter speech). Higher values make it less sensitive.
    Defaults to 0.5.

vad_threshold: The threshold for voice activity detection (VAD). A value between
    0 and 1 that determines how sensitive the VAD is. Lower values make the VAD
    more sensitive (detects quieter speech). Higher values make it less sensitive.
    Defaults to 0.4.
🟡 Docstring states incorrect default value for vad_threshold (0.4 vs 0.5)
The docstring for vad_threshold at line 93 states "Defaults to 0.4" but the AssemblyAI API documentation (referenced in the PR description itself) states the server-side default is 0.5.
Detailed Explanation
The PR description directly quotes AssemblyAI's docs:
The default value is 0.5.
However, the docstring at livekit-plugins/livekit-plugins-assemblyai/livekit/plugins/assemblyai/stt.py:93 says:
Defaults to 0.4.
Since vad_threshold defaults to NOT_GIVEN and is sent as None (thus omitted from the WebSocket URL query parameters), the actual default applied is whatever AssemblyAI's server uses — which is 0.5 per their docs. Users reading this docstring would be misled about the effective default behavior of the VAD sensitivity.
Impact: Users relying on the documented default of 0.4 to understand the VAD sensitivity behavior will have incorrect expectations. The actual server-side default is 0.5 (less sensitive than documented).
Suggested change:
- Defaults to 0.4.
+ Defaults to 0.5.
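The omission behaviour described in the review comment above (an unset vad_threshold becomes None and is left out of the WebSocket URL query, so the server default applies) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the plugin's code: `_NotGiven`, `NOT_GIVEN`, and `build_query` here are hypothetical stand-ins, and the only library call used is the standard library's `urllib.parse.urlencode`.

```python
from urllib.parse import urlencode


class _NotGiven:
    """Hypothetical stand-in sentinel meaning 'the caller did not set this'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


def build_query(sample_rate: int, vad_threshold=NOT_GIVEN) -> str:
    """Build a query string, leaving out options the caller never set."""
    config = {
        "sample_rate": sample_rate,
        # An unset option is mapped to None ...
        "vad_threshold": None if vad_threshold is NOT_GIVEN else vad_threshold,
    }
    # ... and None entries are dropped, so the server falls back to its own
    # default. The check is `is not None` rather than truthiness because
    # 0.0 is a valid threshold and must still be sent.
    return urlencode({k: v for k, v in config.items() if v is not None})


print(build_query(16000))       # sample_rate=16000
print(build_query(16000, 0.0))  # sample_rate=16000&vad_threshold=0.0
```

Under these assumptions, a docstring can only describe the server-side default, since the client never sends a value of its own when the option is left unset.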
Summary
- Add support for the vad_threshold parameter from AssemblyAI's streaming API

Changes
- Add vad_threshold to STTOptions dataclass
- Add vad_threshold parameter to STT.__init__()
- Add vad_threshold to STT.update_options() and SpeechStream.update_options()
- Include vad_threshold in WebSocket connection config

API Reference
From AssemblyAI's docs:
Closes #4879
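For readers who want a feel for the shape of the change listed under Changes, here is a rough sketch of an options dataclass with an optional vad_threshold, plus update_options on both the STT object and its streams. All class names (`SketchOptions`, `SketchSTT`, `SketchStream`) and the sentinel are hypothetical stand-ins written for this illustration; the real plugin classes have more fields and behaviour than shown here.

```python
from dataclasses import dataclass, replace
from typing import List, Union


class _NotGiven:
    """Hypothetical sentinel for 'option not provided'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()
MaybeFloat = Union[float, _NotGiven]


@dataclass
class SketchOptions:
    sample_rate: int = 16000
    vad_threshold: MaybeFloat = NOT_GIVEN  # unset means "use the server default"


class SketchStream:
    def __init__(self, opts: SketchOptions) -> None:
        self._opts = opts

    def update_options(self, *, vad_threshold: MaybeFloat = NOT_GIVEN) -> None:
        # Partial update: only touch fields the caller actually passed.
        if vad_threshold is not NOT_GIVEN:
            self._opts.vad_threshold = vad_threshold


class SketchSTT:
    def __init__(self, *, sample_rate: int = 16000,
                 vad_threshold: MaybeFloat = NOT_GIVEN) -> None:
        self._opts = SketchOptions(sample_rate=sample_rate,
                                   vad_threshold=vad_threshold)
        self._streams: List[SketchStream] = []

    def stream(self) -> SketchStream:
        # Each stream gets its own copy of the current options.
        s = SketchStream(replace(self._opts))
        self._streams.append(s)
        return s

    def update_options(self, *, vad_threshold: MaybeFloat = NOT_GIVEN) -> None:
        if vad_threshold is not NOT_GIVEN:
            self._opts.vad_threshold = vad_threshold
        # Forward the same (possibly partial) update to live streams.
        for s in self._streams:
            s.update_options(vad_threshold=vad_threshold)
```

In this sketch, calling `update_options()` with no arguments leaves a previously set threshold untouched, which is the "partial updates" case from the unit-test list.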