[Request for clarification] Meaning of signalMs in VAD messages #144

koenvervloesem · 2019-05-17T13:02:56Z

According to the documentation of the Voice Activity Detection of the wake word detector, setting vad_messages to true will let snips-hotword send messages on the MQTT topics hermes/voiceActivity/<siteid>/vadDown and hermes/voiceActivity/<siteid>/vadUp. I did this and I saw that the JSON payload has a "signalMs" key with weird values like -390. What is the meaning of this value?


kali · 2019-05-17T14:22:44Z

Hey, thanks for the report!

The value is supposed to be the audio timestamp as set by the matching audio-server. We think of two ways for you to observe weird values:

- the client library and parser you're using is confused by the 64-bit value
- the timestamps in the audio stream reaching your wakeword engine itself is already broken.

You can check what is happening on the MQTT bus directly, like this. If the timestamps look right there, the issue is in the MQTT / JSON client library.

$ mosquitto_sub -t '#' -T hermes/audioServer/default/audioFrame
{"siteId":"default","signalMs":1558102591756}
{"siteId":"default","signalMs":1558102592491}
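To tell the two cases above apart on the client side, a quick sanity check along the following lines might help. This is only a sketch: it assumes signalMs is meant to be milliseconds since the Unix epoch (which the sample values above are consistent with), the plausibility window is an arbitrary choice, and the helper names check_signal_ms and wrap_int32 are hypothetical, not part of any Snips or Hermes library.

```python
import json

# Assumed plausibility window for "milliseconds since the Unix epoch":
# 2000-01-01 to 2100-01-01 (UTC). These bounds are an illustrative choice.
MIN_EPOCH_MS = 946_684_800_000
MAX_EPOCH_MS = 4_102_444_800_000


def wrap_int32(value: int) -> int:
    """Simulate a parser that keeps only the low 32 bits as a signed integer."""
    low = value & 0xFFFFFFFF
    return low - 0x1_0000_0000 if low >= 0x8000_0000 else low


def check_signal_ms(payload: str) -> str:
    """Classify the signalMs field of a JSON payload (hypothetical helper)."""
    signal_ms = json.loads(payload).get("signalMs")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(signal_ms, int) or isinstance(signal_ms, bool):
        return "missing or not an integer"
    if MIN_EPOCH_MS <= signal_ms <= MAX_EPOCH_MS:
        return "plausible epoch timestamp"
    return "suspicious value"


if __name__ == "__main__":
    print(check_signal_ms('{"siteId":"default","signalMs":1558102591756}'))
    print(check_signal_ms('{"siteId":"default","signalMs":-390}'))
    # What a 32-bit signed truncation would do to a real timestamp:
    print(wrap_int32(1558102591756))
```

If the raw payloads on the bus pass this check but the values seen in your application do not, the parser is the likely culprit; if the raw payloads already fail it, the timestamps were wrong before they reached the client.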

koenvervloesem · 2019-05-17T14:47:16Z

Hi @kali, thanks for your response! That makes sense, as I'm seeing these negative values in the hermes/voiceActivity/# topics of snips-hotword when using my own reimplementation of the audio server, while I see sensible values like 1558103849185 using snips-audio-server.

But from the documentation of audioFrame in the Hermes protocol I had the impression that the audio server just publishes a binary payload with a WAV of the sound frame. So I don't understand "the timestamps in the audio stream reaching your wakeword engine itself is already broken." How does the audio server publish these timestamps?

kali · 2019-05-17T15:11:33Z

Well, the documentation only covers the minimal implementation of a working audio-server. The full implementation is significantly more complex (in addition to the timestamps, more mqtt messages to support rewind and replay features). We have not made the format public at the current point, as there really is a big gap in complexity and implicit assumptions.

Could you tell us the rationale for running your own implementation?

koenvervloesem · 2019-05-17T15:28:28Z

OK, now I understand. My rationale is that I wanted to use an open source audio server as a "satellite" for Rhasspy, which understands part of the Hermes protocol. Because snips-audio-server is not open source, I had to write a minimal implementation of it myself. But in the spirit of being a good citizen in the world of the protocol I'm using, I want my implementation to play nicely with the Snips services. Which it does (as I can just swap snips-audio-server for my minimal implementation and my Snips assistant keeps working), apart from these weird values I'm seeing in the VAD messages of snips-hotword.

kali · 2019-05-17T15:32:05Z

Would it be an option to use snips-audio-server and somehow pull, reformat, and forward the snips-audio-server messages to Rhasspy?

kali · 2019-05-17T15:34:37Z

I'm not sure which part of snips-platform you are using, but rewind and replay is what allows us to drastically reduce the necessary gap between hotword detection and the start of ASR decoding, so depending on your exact use case, you may lose more than the timestamps.

koenvervloesem · 2019-05-17T15:53:09Z

Well, there's no real issue for me, as my implementation just works with Rhasspy and it seems to work with Snips too. This was just a request for more information because I wanted to stay faithful to the Hermes protocol. But if parts are not fully documented yet and I don't need them for my use case, it's not a big deal that I can't take them into account, so you can close this issue. Thanks for your clarification. But I do hope the complete protocol will be published sometime :-)

kali · 2019-05-17T15:56:04Z

Ok, thanks. And yes, we will try to find a way to handle these extensions.

fredszaq · 2019-05-20T16:14:37Z

Hi @koenvervloesem, we've just merged a bit of documentation in hermes explaining the exact format used by the audio server. You can have a look at it here: https://github.com/snipsco/hermes-protocol/blob/develop/hermes/src/ontology/audio_server.rs#L25..L87

(note that this may change in the future)

koenvervloesem · 2019-05-20T16:18:15Z

@fredszaq Many thanks for publishing the exact format, much appreciated! I had already figured out the timestamps, but I wouldn't have found the rest by myself...

cpoisson added the type: support Support issue ( should not exist in this project ;) label May 17, 2019

cpoisson self-assigned this May 17, 2019

kali self-assigned this May 17, 2019

kali closed this as completed May 17, 2019
