Dynamic VAD Mode #1318
Replies: 1 comment
I've implemented a simple proof-of-concept by patching the VoiceCommandSegmenter class in assist_pipeline/vad.py, and the initial results are very promising. It provides the snappy feel of 'aggressive' mode for short commands while being forgiving like 'relaxed' mode for longer sentences. For those who want to experiment, I'm attaching my modified vad.py below. Please note that this is an unsupported proof-of-concept. I'm intentionally omitting installation steps, as it involves modifying core files and is intended for advanced users who are comfortable with the process and understand the risks. Be sure to make a backup.
Describe the feature
I'm proposing an enhancement to our Voice Activity Detection system to make it more adaptive based on the length of the user's spoken input. Currently, we have three predefined settings for detecting the end of speech:
I propose implementing a dynamic VAD mode that adjusts the silence threshold based on the length of the spoken phrase. For example, start with an aggressive threshold (0.25 seconds) for the first 1.5 seconds, then gradually increase it to a relaxed threshold (1.25 seconds). I believe the maximum threshold should be reached around the 4-5 second mark. The exact numbers (thresholds, ramp-up timing, and steps) would need further investigation and testing.
The longer our question, the more we're willing to wait for a response; an extra second doesn't cause discomfort.
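To make the idea concrete, here is a rough sketch of the ramp described above. It is only an illustration under assumptions, not the patched vad.py: the linear interpolation, the 4.5-second ramp end (picked from the "4-5 second" range), and the names `dynamic_silence_threshold` and `DynamicSegmenter` are all hypothetical, and the toy class does not mirror the real VoiceCommandSegmenter.

```python
from dataclasses import dataclass

# Illustrative numbers taken from the proposal; none of them are tuned defaults.
AGGRESSIVE_SILENCE = 0.25  # seconds of silence that end a short command
RELAXED_SILENCE = 1.25     # seconds of silence that end a long command
RAMP_START = 1.5           # command length (s) at which the threshold starts growing
RAMP_END = 4.5             # assumed point (s) where the relaxed value is reached


def dynamic_silence_threshold(command_seconds: float) -> float:
    """Return the silence (in seconds) required to end a command of this length."""
    if command_seconds <= RAMP_START:
        return AGGRESSIVE_SILENCE
    if command_seconds >= RAMP_END:
        return RELAXED_SILENCE
    # Linear ramp between the two values (an assumption; discrete steps would also work).
    fraction = (command_seconds - RAMP_START) / (RAMP_END - RAMP_START)
    return AGGRESSIVE_SILENCE + fraction * (RELAXED_SILENCE - AGGRESSIVE_SILENCE)


@dataclass
class DynamicSegmenter:
    """Toy end-of-command detector; hypothetical, not the real VoiceCommandSegmenter."""

    command_seconds: float = 0.0  # time elapsed since speech first started
    silence_seconds: float = 0.0  # length of the current trailing silence
    in_command: bool = False

    def process(self, chunk_seconds: float, is_speech: bool) -> bool:
        """Feed one audio chunk; return False once the command is considered finished."""
        if not self.in_command:
            if not is_speech:
                return True  # still waiting for speech to start
            self.in_command = True
        self.command_seconds += chunk_seconds
        if is_speech:
            self.silence_seconds = 0.0
            return True
        self.silence_seconds += chunk_seconds
        # Base the threshold on the command length before the trailing silence began.
        spoken = self.command_seconds - self.silence_seconds
        return self.silence_seconds < dynamic_silence_threshold(spoken)
```

With these numbers, a one-second command ends after 0.25 seconds of silence, while a five-second sentence is allowed the full 1.25 seconds. The `is_speech` flag is assumed to come from whatever per-chunk VAD decision is already available.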
Example commands
Affects all voice commands
Use cases
Improving the user experience and response speed
Anything else?
No response