Dynamic VAD Mode #1318

mitrokun · 2025-10-10T04:12:26Z

Describe the feature

I'm proposing an enhancement to our Voice Activity Detection (VAD) system to make it adapt to the length of the user's spoken input. Currently, we have three predefined settings for detecting the end of speech:

Aggressive mode: This works well for short commands, as it quickly detects silence and ends the Speech-to-Text (STT) phase. However, it handles longer questions poorly, as it may cut the user off prematurely.
Default mode: A compromise setting that offers a balance but is still not ideal for very short or very long inputs. This example shows that detecting the end of speech takes as long as the command itself.
Relaxed mode: Conversely, this is great for long queries, allowing more time for natural pauses without interrupting. But it leads to excessive waiting times for short commands, which can feel unresponsive.

I propose implementing a dynamic VAD mode that adjusts the silence threshold based on the length of the spoken phrase. For example, start with an aggressive threshold (0.25 seconds) for the first 1.5 seconds, then gradually increase it to a relaxed threshold (1.25 seconds). I believe the maximum threshold should be reached around the 4-5 second mark. The exact numbers (thresholds, ramp-up timing, and steps) would need further investigation and testing.
The longer the question, the longer we're willing to wait for a response; an extra second doesn't cause discomfort.
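
To make the idea concrete, here is a minimal sketch of such a ramp. It is only an illustration: the function name, the parameter names, the linear interpolation, and the 4.5-second ramp end (picked from the 4-5 second range above) are all assumptions, and the numbers would still need tuning.

```python
def dynamic_silence_seconds(
    speech_seconds: float,
    min_silence: float = 0.25,  # aggressive threshold from the proposal
    max_silence: float = 1.25,  # relaxed threshold from the proposal
    ramp_start: float = 1.5,    # keep the aggressive threshold until here
    ramp_end: float = 4.5,      # assumption: somewhere in the 4-5 s range
) -> float:
    """Return how much trailing silence should end a phrase of this length.

    Hypothetical helper, not part of any existing API. The linear ramp is
    one possible reading of "gradually increase".
    """
    if speech_seconds <= ramp_start:
        return min_silence
    if speech_seconds >= ramp_end:
        return max_silence
    # Fraction of the way through the ramp, between 0 and 1.
    progress = (speech_seconds - ramp_start) / (ramp_end - ramp_start)
    return min_silence + progress * (max_silence - min_silence)
```

With these placeholder values, a phrase shorter than 1.5 seconds keeps the 0.25-second threshold, a phrase of 4.5 seconds or more uses 1.25 seconds, and anything in between is interpolated.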

Example commands

Affects all voice commands

Use cases

Improving the user experience and response speed

Anything else?

No response

mitrokun · 2025-10-30T18:11:34Z

I've implemented a simple proof of concept by patching the VoiceCommandSegmenter class in assist_pipeline/vad.py, and the initial results are very promising. It provides the snappy feel of 'aggressive' mode for short commands while being as forgiving as 'relaxed' mode for longer sentences.
To be clear, this is just a quick hack to validate the concept. A full implementation would need further work:
More testing to fine-tune the timing thresholds. I've isolated this logic in a new _get_dynamic_silence_seconds method, so it's easy to adjust.
Properly adding a fourth 'dynamic' mode to the UI, which would require changes to other component files beyond vad.py.
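
For readers who want to see the general shape without touching core files, below is a standalone toy sketch of a segmenter whose silence threshold depends on how much speech it has heard. It is not the attached patch and does not reproduce the real VoiceCommandSegmenter: the class, the function names, the stepped schedule, and the 0.75-second middle step are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


def stepped_threshold(speech_seconds: float) -> float:
    """Hypothetical stepped schedule; the 0.75 s middle step is made up."""
    if speech_seconds < 1.5:
        return 0.25
    if speech_seconds < 3.0:
        return 0.75
    return 1.25


@dataclass
class ToySegmenter:
    """Toy stand-in for a voice command segmenter (illustration only)."""

    silence_threshold: Callable[[float], float]
    speech_seconds: float = 0.0   # total speech heard so far
    silence_seconds: float = 0.0  # length of the current trailing silence
    in_command: bool = False

    def process(self, chunk_seconds: float, is_speech: bool) -> bool:
        """Feed one audio chunk; return False once the command has ended."""
        if is_speech:
            self.in_command = True
            self.speech_seconds += chunk_seconds
            self.silence_seconds = 0.0
            return True
        if not self.in_command:
            return True  # leading silence before any speech: keep waiting
        self.silence_seconds += chunk_seconds
        # The required silence depends on how long the user has spoken.
        return self.silence_seconds < self.silence_threshold(self.speech_seconds)


def silent_chunks_needed(speech_chunks: int, chunk_seconds: float = 0.1) -> int:
    """Count the silent chunks consumed before the command is considered over."""
    segmenter = ToySegmenter(stepped_threshold)
    for _ in range(speech_chunks):
        segmenter.process(chunk_seconds, is_speech=True)
    count = 1
    while segmenter.process(chunk_seconds, is_speech=False):
        count += 1
    return count
```

With 0.1-second chunks, silent_chunks_needed(10) (about 1 second of speech) ends after 3 silent chunks, while silent_chunks_needed(50) (about 5 seconds) waits for 13, which is the behaviour described above: quick for short commands, patient for long ones.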

For those who want to experiment, I'm attaching my modified vad.py below. Please note that this is an unsupported proof-of-concept. I'm intentionally omitting installation steps, as it involves modifying core files and is intended for advanced users who are comfortable with the process and understand the risks. Be sure to make a backup.
vad.py
