EN Extremely bad wake word recognition #9
Comments
I received my ESP Box today.
Tried a Skainet example which uses the wake word Alexa and this works fine. Compiled the ESP_Box demo to use wakeword model alexa8 instead of hiesp8 and now wakes easily.
Maybe it is the Hi_ESP_Q8 model. I did get it to work, but learning how to position myself and control my speech for wake word recognition took considerable time. I need to get back up to speed with ESP, install the IDF and get to grips with the framework, so I will also compile and test. Currently, though, the supplied firmware gives an extremely bad impression of the box's capabilities; in use it is near useless. @s60sc how did you find the multinet commands? https://github.com/espressif/esp-sr/blob/509cf82658cf2b645aa9ae89b6764beea48e7eae/include/esp_map.h#L23
The multiword model worked fine for me. I did not change the mic arrangement. I will do more work on the ESP-BOX in the new year. I have uploaded a version of the firmware with the Alexa wakeword in this repository
Hi, we have updated a new "Hi,ESP" model by adding more samples.
Tried the skainet en_speech_commands_recognition example after updating the esp-sr component, installed on the ESP-Box. hiesp8 version:
alexa8 version:
Core dump if that helps:
We have fixed the bug in wakenet8_hiesp, please try again.
Now works, and hiesp8 model is significantly improved. Updated factory demo firmwares in this repository |
Seems to be now fixed so will close |
Hi, I downloaded https://github.com/espressif/esp-box/releases/download/v0.2.1/ESP-Box_Demo_EN_V0.2.1.bin and flashed it.
I have spent over ten minutes and several reboots to get only 3 recognitions, worrying that my neighbours may think I have gone mad.
If you do manage the wake word then the command turn on/off light works great.
Actually getting the wake word to activate is another matter, even in a silent room.
I have a feeling it could be your dataset: now that it is actually hearing an English male voice, it doesn't fit well. Either that, or this firmware / model is not good.
I have no problems recording some samples if you tell me format and sampling rate...
Maybe community members could contribute if it's merely an esp32-s3-box dataset accent problem.
Actually, with more playing, it seems you need to be 2-3 meters away (the further the better) for it to work relatively well. If you pull the unit out and place it close to you on your desk it doesn't work well. Maybe AGC?
Maybe it's as simple as the MAP mic spacing being 50mm whilst the box's is 34mm. Dunno, but currently it does not seem to work well.
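To put rough numbers on the mic spacing guess, here is a minimal sketch of how much the largest inter-microphone delay changes between 50mm and 34mm. The 16 kHz sample rate and the 343 m/s speed of sound are my assumptions, not values confirmed in this thread or taken from esp_map.h, and `max_delay_samples` is a hypothetical helper, not part of ESP_SR.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at about 20 degrees C
SAMPLE_RATE_HZ = 16_000     # assumption: common speech front-end rate, not confirmed here


def max_delay_samples(spacing_m, sample_rate_hz=SAMPLE_RATE_HZ, c=SPEED_OF_SOUND_M_S):
    """Largest delay between two mics, in samples, for a source on the array axis."""
    return spacing_m / c * sample_rate_hz


# Compare the spacing the MAP code may assume with the spacing measured on the box.
for label, spacing_mm in (("MAP spacing", 50), ("box spacing", 34)):
    delay = max_delay_samples(spacing_mm / 1000)
    print(f"{label}: {spacing_mm} mm -> {delay:.2f} samples")
```

Under those assumptions the maximum delay is about 2.33 samples at 50mm and about 1.59 samples at 34mm, so a front end tuned for the wider spacing would expect delays the box cannot physically produce. Whether that actually explains the poor recognition is still only a guess.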
The hardware design and software of the esp-box are brilliant and really polished, but the current results of ESP_SR, wow, I really don't want to comment. With the module being a .a with no source code there is little I can do but wait, fingers crossed.
ESP_DL lacks LSTM/GRU support, which for audio is likely to be restrictive, but I always struggle to work out what the xtensa hifi / esp32-s3 are capable of.
Maybe TFLite-Micro will come to the esp32-s3's rescue with vector optimisation, as what is available, and the fact that it's closed source, is a huge stumbling block for anyone wanting to do ML on the S3.