[Request for clarification] Meaning of signalMs in VAD messages #144
Comments
Hey, thanks for the report! The value is supposed to be the audio timestamp as set by the matching audio-server. We can think of two ways for you to observe weird values:
You can check what is happening over the MQTT bus directly like that. If the timestamps look right there, it means the issue is in the MQTT / JSON client library.
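As a rough sketch of such a check on the raw bus: messages can be captured with `mosquitto_sub -v -t 'hermes/voiceActivity/#'`, which prints each message as the topic followed by the payload, and the captured lines can then be scanned for suspicious timestamps. The function name `check_bus_lines` is hypothetical, and treating any negative or non-integer `signalMs` as suspicious is an assumption made for illustration; the only payload key taken from this thread is `signalMs`.

```python
import json


def check_bus_lines(lines):
    """Yield (topic, signal_ms, looks_ok) for each 'topic payload' line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # mosquitto_sub -v prints the topic, a space, then the payload,
        # so split on the first space only (the JSON may contain spaces).
        topic, _, raw_payload = line.partition(" ")
        payload = json.loads(raw_payload)
        signal_ms = payload.get("signalMs")
        # Assumption: a sane audio timestamp is a non-negative integer.
        looks_ok = isinstance(signal_ms, int) and signal_ms >= 0
        yield topic, signal_ms, looks_ok
```

Passing `sys.stdin` as `lines` would let the function consume the piped output of `mosquitto_sub` directly. If the values already look wrong at this level, the client library is probably not the culprit.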
Hi @kali, thanks for your response! That makes sense, as I'm seeing these negative values in the But from the documentation of audioFrame in the Hermes protocol I had the impression that the audio server just publishes a binary payload with a WAV of the sound frame. So I don't understand "the timestamps in the audio stream reaching your wakeword engine itself is already broken." How does the audio server publish these timestamps?
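To make the "binary payload with a WAV of the sound frame" reading concrete, here is a minimal sketch that opens such a payload with Python's standard `wave` module and reports its basic parameters. It assumes the payload is a plain, self-contained WAV file. The function names and the test frame's format (mono, 16-bit, 16 kHz, 256 samples) are illustrative assumptions, not values taken from this thread.

```python
import io
import wave


def describe_audio_frame(payload):
    """Read basic WAV parameters from a binary audioFrame payload."""
    with wave.open(io.BytesIO(payload), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "frames": frames,
            "duration_ms": 1000.0 * frames / rate,
        }


def make_test_frame(n_frames=256, rate=16000):
    """Build a silent mono 16-bit WAV in memory (illustrative values only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()
```

A frame read this way exposes only the sample format and the audio data, which is why it is not obvious from the payload where a timestamp would come from.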
Well, the documentation only covers the minimal implementation of a working audio-server. The full implementation is significantly more complex (in addition to the timestamps, there are more MQTT messages to support the rewind and replay features). We have not made the format public so far, as there really is a big gap in complexity and implicit assumptions. Could you tell us what your rationale is for running your own implementation?
OK, now I understand. My rationale is that I wanted to use an open source audio server as a "satellite" for Rhasspy, which understands part of the Hermes protocol. Because
Would it be an option to use the snips-audio-server and pull, reformat and forward its messages to Rhasspy?
I'm not sure what part of snips-platform you are using, but rewind and replay are what allow us to drastically reduce the necessary gap between hotword detection and the start of ASR decoding, so depending on your exact use case, you may lose more than the timestamps.
Well, there's no real issue for me, as my implementation just works with Rhasspy and it seems to work with Snips too. This was just a request for more information because I wanted to stay faithful to the Hermes protocol. But if parts are not fully documented yet and I don't need them for my use case, it's not a big deal that I can't take them into account, so you can close this issue. Thanks for your clarification. But I do hope the complete protocol will be published sometime :-)
Ok, thanks. And yes, we will try to find a way to handle these extensions.
Hi @koenvervloesem, we've just merged a bit of documentation into hermes explaining the exact format used by the audio server. You can have a look at it here: https://github.com/snipsco/hermes-protocol/blob/develop/hermes/src/ontology/audio_server.rs#L25..L87 (note that this may change in the future)
@fredszaq Many thanks for publishing the exact format, much appreciated! I had already figured out the timestamps, but I wouldn't have found the rest by myself...
According to the documentation of the Voice Activity Detection of the wake word detector, setting vad_messages to true will let snips-hotword send messages on the MQTT topics hermes/voiceActivity/<siteid>/vadDown and hermes/voiceActivity/<siteid>/vadUp. I did this and I saw that the JSON payload has a "signalMs" key with weird values like -390. What is the meaning of this value?
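For reference, the topic layout and payload described above can be sketched as follows. The helper names `vad_topics` and `parse_vad_message` are hypothetical, and apart from `signalMs` no payload keys are assumed; the site id is read from the topic itself.

```python
import json

VAD_EVENTS = ("vadUp", "vadDown")


def vad_topics(site_id):
    """Build the two voice-activity topics for a given site id."""
    return [f"hermes/voiceActivity/{site_id}/{event}" for event in VAD_EVENTS]


def parse_vad_message(topic, payload):
    """Return site id, event and signalMs for a VAD message, else None."""
    parts = topic.split("/")
    if (
        len(parts) != 4
        or parts[:2] != ["hermes", "voiceActivity"]
        or parts[3] not in VAD_EVENTS
    ):
        return None  # not a voice-activity topic
    data = json.loads(payload)  # accepts str or bytes
    return {
        "site_id": parts[2],
        "event": parts[3],
        "signal_ms": data.get("signalMs"),  # key name as seen in the report
    }
```

With a payload such as `{"signalMs": -390}` on the `vadDown` topic, the parsed result carries the negative value through unchanged, which is the behaviour this issue asks about.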