This repository has been archived by the owner on Dec 11, 2019. It is now read-only.

[Request for clarification] Meaning of signalMs in VAD messages #144

Closed
koenvervloesem opened this issue May 17, 2019 · 10 comments
Labels
type: support Support issue ( should not exist in this project ;)

Comments

@koenvervloesem

According to the documentation of the Voice Activity Detection of the wake word detector, setting vad_messages to true will let snips-hotword send messages on the MQTT topics hermes/voiceActivity/<siteid>/vadDown and hermes/voiceActivity/<siteid>/vadUp. I did this and I saw that the JSON payload has a "signalMs" key with weird values like -390. What is the meaning of this value?

@cpoisson cpoisson added the type: support Support issue ( should not exist in this project ;) label May 17, 2019
@cpoisson cpoisson self-assigned this May 17, 2019
@kali

kali commented May 17, 2019

Hey, thanks for the report!

The value is supposed to be the audio timestamp as set by the matching audio-server. We can think of two ways you might end up observing weird values:

  • the client library and parser you're using are confused by the 64-bit value
  • the timestamps in the audio stream reaching your wakeword engine are already broken.

You can check what is happening directly on the MQTT bus like this. If the timestamps look right there, it means the issue is in the MQTT / JSON client library.

$ mosquitto_sub -t '#' -T hermes/audioServer/default/audioFrame
{"siteId":"default","signalMs":1558102591756}
{"siteId":"default","signalMs":1558102592491}
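To tell the two cases apart on the client side, a minimal Python sketch could look like the following. It assumes signalMs is a count of milliseconds since the Unix epoch, which the sample values above are consistent with but which the thread does not state outright. The helper names (parse_signal_ms, as_utc, wrap_to_int32) are hypothetical and not part of any Snips library; wrap_to_int32 only mimics what a parser storing the value in a signed 32-bit integer would produce.

```python
import json
from datetime import datetime, timezone

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def parse_signal_ms(payload: str) -> int:
    """Extract signalMs from a VAD JSON payload as an arbitrary-precision int."""
    return int(json.loads(payload)["signalMs"])


def as_utc(signal_ms: int) -> datetime:
    """Interpret signalMs as milliseconds since the Unix epoch (an assumption)."""
    return datetime.fromtimestamp(signal_ms / 1000, tz=timezone.utc)


def wrap_to_int32(value: int) -> int:
    """Simulate a client that stores the value in a signed 32-bit integer."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


if __name__ == "__main__":
    payload = '{"siteId":"default","signalMs":1558102591756}'
    signal_ms = parse_signal_ms(payload)
    print("parsed:", signal_ms, "->", as_utc(signal_ms).isoformat())
    # If the value your client reports equals the wrapped one, the client
    # library is truncating; otherwise the odd value was already on the bus.
    print("as int32:", wrap_to_int32(signal_ms))
```

If the raw payload seen with mosquitto_sub already carries a small or negative number such as -390, no 64-bit parsing issue is involved and the value was wrong before it reached the client.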

@koenvervloesem
Author

Hi @kali, thanks for your response! That makes sense, as I'm seeing these negative values in the hermes/voiceActivity/# topics of snips-hotword when using my own reimplementation of the audio server, while I see sensible values like 1558103849185 using snips-audio-server.

But from the documentation of audioFrame in the Hermes protocol I had the impression that the audio server just publishes a binary payload with a WAV of the sound frame. So I don't understand "the timestamps in the audio stream reaching your wakeword engine itself is already broken." How does the audio server publish these timestamps?

@kali kali self-assigned this May 17, 2019
@kali

kali commented May 17, 2019

Well, the documentation only covers the minimal implementation of a working audio-server. The full implementation is significantly more complex (in addition to the timestamps, there are more MQTT messages to support the rewind and replay features). We have not made the format public so far, as there really is a big gap in complexity and implicit assumptions.

Could you tell us what the rationale is for running your own implementation?

@koenvervloesem
Author

koenvervloesem commented May 17, 2019

OK, now I understand. My rationale is that I wanted to use an open source audio server as a "satellite" for Rhasspy, which understands part of the Hermes protocol. Because snips-audio-server is not open source, I had to write a minimal reimplementation of it. But in the spirit of being a good citizen in the world of the protocol I'm using, I want my implementation to play nice with the Snips services. And it does (I can just swap snips-audio-server for my minimal implementation and my Snips assistant keeps working), apart from these weird values I'm seeing in the VAD messages of snips-hotword.

@kali

kali commented May 17, 2019

Would it be an option to use snips-audio-server and somehow pull, reformat and forward the snips-audio-server messages to Rhasspy?

@kali

kali commented May 17, 2019

I'm not sure which parts of snips-platform you are using, but rewind and replay are what allow us to drastically reduce the necessary gap between hotword detection and the start of ASR decoding, so depending on your exact use case, you may lose more than the timestamps.

@koenvervloesem
Author

Well, there's no real issue for me, as my implementation just works with Rhasspy and it seems to work with Snips too. This was just a request for more information because I wanted to stay faithful to the Hermes protocol. But if parts are not fully documented yet and I don't need them for my use case, it's not a big deal that I can't take them into account, so you can close this issue. Thanks for your clarification. But I do hope the complete protocol will be published sometime :-)

@kali

kali commented May 17, 2019

Ok, thanks. And yes, we will try to find a way to handle these extensions.

@kali kali closed this as completed May 17, 2019
@fredszaq

Hi @koenvervloesem, we've just merged a bit of documentation into hermes explaining the exact format used by the audio server. You can have a look at it here: https://github.com/snipsco/hermes-protocol/blob/develop/hermes/src/ontology/audio_server.rs#L25..L87

(note that this may change in the future)
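Since the audioFrame payload is a WAV file, one hedged way to compare a reimplementation's frames with those of snips-audio-server is to list the top-level RIFF chunks of each payload. The sketch below relies only on the generic RIFF/WAVE container layout; it makes no assumption about which extra chunks Snips uses, for which the linked file is the reference. The function names and the 16 kHz mono 16-bit demo frame are illustrative choices, not taken from the thread.

```python
import io
import struct
import wave


def list_riff_chunks(payload: bytes):
    """Return (chunk_id, size) pairs for the top-level chunks of a RIFF/WAVE payload."""
    if len(payload) < 12 or payload[:4] != b"RIFF" or payload[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE payload")
    chunks = []
    offset = 12  # skip "RIFF", the 4-byte total size, and "WAVE"
    while offset + 8 <= len(payload):
        chunk_id, size = struct.unpack_from("<4sI", payload, offset)
        chunks.append((chunk_id.decode("ascii", errors="replace"), size))
        offset += 8 + size + (size & 1)  # chunk bodies are padded to an even length
    return chunks


def make_test_frame(n_samples: int = 256) -> bytes:
    """Build a plain WAV frame in memory (demo parameters, chosen for illustration)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * n_samples)
    return buf.getvalue()


if __name__ == "__main__":
    # A minimal frame only has the format and data chunks; any additional
    # chunk IDs in a captured frame point at extra metadata.
    print(list_riff_chunks(make_test_frame()))
```

Running the same function on a payload captured from hermes/audioServer/<siteid>/audioFrame would show whether the frame carries chunks beyond the format and data ones.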

@koenvervloesem
Author

@fredszaq Many thanks for publishing the exact format, much appreciated! I had already figured out the timestamps, but I wouldn't have found the rest by myself...


4 participants