Detect simple voice commands and audio events on small embedded systems like the Pi Zero.
Classify audio with neural nets on embedded systems like the Raspberry Pi using Tensorflow. Libraries are available for Linux, OSX and Raspberry Pi. For other platforms you will have to compile the library yourself.
To run an example:
git clone --depth 1 https://github.com/nyumaya/nyumaya_audio_recognition.git
cd nyumaya_audio_recognition/python
python streaming_example.py
The demo captures audio from the default microphone. For each application, different model architectures are available, each trading accuracy against CPU and memory usage.
An experimental web demo is available here. This has been tested with recent versions of Chrome and Firefox.
- Small model (CPU usage: 20% on a Pi Zero, 6% of one core on a Pi 3)
- Big model (CPU usage: 62% on a Pi Zero, 13% of one core on a Pi 3)
I compiled a list of project ideas here
- Command Objects (music,radio,television,door,water,computer,temperature,light,house)
- Voice-gender (female,male,nospeech)
- Baby-monitor (cry, babble, door-open, music, glass-break, footsteps, fire-alarm)
- Impulse-response (Play tone and interpret echo: Bedroom, Kitchen, Bathroom, Outdoor, Hall, Living Room, Basement)
- Alarm-system (door-open, glass-break, footsteps, fire-alarm, voice)
- Door-monitor (door bell, door knocking, voice)
- Weather (thunder, rain, storm, hail)
- Language detection
- Swear word detection (imagine some inappropriate words)
- Crowd monitoring (screaming, shouting, gunshot, siren, explosion)
- Animal monitoring (dog, cat, chicken, rooster, ...)
- Speaker Verification
If you need a model trained for a special combination of audio classes or a particular model architecture, create an issue and I will try to prioritize or train it.
- Marvin Hotword (marvin)
- Sheila Hotword (sheila)
- Marvin Sheila Hotword (marvin,sheila)
- Command Subset (yes,no,up,down,left,right,on,off,stop,follow,play)
- Command Numbers (one,two,three,four,five,six,seven,eight,nine,zero)
The sensitivity parameter is a tradeoff between missed detections and false positives. Setting the sensitivity to a high value makes it easier to trigger the hotword, but also makes false detections more likely. If you experience a lot of false detections, set the sensitivity to a lower value. All models have a corresponding result.txt file where the test results for different sensitivities are captured. A "false predictions per hour" value of 0 doesn't mean that no false prediction will ever occur. It just means that during the test (~5 hours of audio, mostly speech) no false prediction occurred.
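To see why a reported value of 0 is not a guarantee, one can put a rough upper bound on the true false-prediction rate. The sketch below assumes that false predictions occur independently of each other (a Poisson process), which is an assumption and not something result.txt states. The function name is made up for illustration.

```python
import math

def false_rate_upper_bound(test_hours, confidence=0.95):
    """Upper bound on false predictions per hour when none were observed.

    Assumption: false predictions follow a Poisson process. With zero
    events in `test_hours`, every rate r with
    exp(-r * test_hours) >= 1 - confidence is still plausible.
    """
    if test_hours <= 0:
        raise ValueError("test_hours must be positive")
    return -math.log(1.0 - confidence) / test_hours

# Roughly 5 hours of test audio without a single false prediction
bound = false_rate_upper_bound(5.0)
print(f"true rate could still be up to about {bound:.2f} per hour")
```

Under that assumption, a clean 5 hour test is still compatible with roughly one false prediction every two hours, so it is worth testing in your own environment before relying on a sensitivity setting.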
You can run the audio_check script to get some info about your volume level and a possible DC offset. While it runs, speak as loudly as the loudest input you expect in normal use.
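For readers unfamiliar with the two quantities, here is a minimal sketch of how a DC offset and a peak level can be computed from signed 16-bit samples. It is not the audio_check script itself; the function name and the synthetic test tone are assumptions made for illustration.

```python
import math

def level_report(samples, full_scale=32768):
    """Return (dc_offset, peak_dbfs) for signed 16-bit PCM samples."""
    if not samples:
        raise ValueError("no samples")
    # The DC offset is the mean of the signal; ideally it is close to 0.
    dc_offset = sum(samples) / len(samples)
    # Measure the peak after removing the offset, so a constant bias
    # is not mistaken for loudness.
    peak = max(abs(s - dc_offset) for s in samples)
    peak_dbfs = 20 * math.log10(peak / full_scale) if peak > 0 else float("-inf")
    return dc_offset, peak_dbfs

# Synthetic input: one second of a 440 Hz tone at half scale, sampled at
# 16 kHz, with an artificial bias of 500 added to every sample.
tone = [500 + int(16384 * math.sin(2 * math.pi * 440 * n / 16000))
        for n in range(16000)]
offset, peak = level_report(tone)
print(f"DC offset: {offset:.0f}, peak: {peak:.1f} dBFS")
```

A peak close to 0 dBFS means the loudest input is near clipping, while a large offset relative to the signal suggests a problem with the microphone or sound card.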
The multi_streaming_example.py is a demo of how to chain commands. You can add commands with a list of words and function to call when the command is detected.
Be aware that CPU usage increases when multiple models have to run concurrently. In this case the software has to run the marvin_model (marvin) and the subset_model (stop) at the same time.
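The idea of chaining can be sketched as follows. This is not the code from multi_streaming_example.py: the class, its method names and the timeout are hypothetical, and the detected words are fed in by hand instead of coming from the models.

```python
import time

class CommandChain:
    """Call a function when a list of words is detected in order."""

    def __init__(self, timeout=3.0):
        self.timeout = timeout   # max seconds allowed between two words
        self.commands = []       # list of (words, callback) pairs
        self.history = []        # words detected so far
        self.last_time = None

    def add_command(self, words, callback):
        if not words:
            raise ValueError("a command needs at least one word")
        self.commands.append((list(words), callback))

    def on_word(self, word, now=None):
        """Feed one detected word; return True if a command fired."""
        now = time.monotonic() if now is None else now
        # Forget earlier words if the speaker paused for too long.
        if self.last_time is not None and now - self.last_time > self.timeout:
            self.history = []
        self.last_time = now
        self.history.append(word)
        for words, callback in self.commands:
            if self.history[-len(words):] == words:
                self.history = []
                callback()
                return True
        return False

chain = CommandChain()
chain.add_command(["marvin", "stop"], lambda: print("stopping"))
chain.on_word("marvin")  # would come from the hotword model
chain.on_word("stop")    # would come from the command model
```

Resetting the history after a timeout keeps a stray "stop" from triggering the command long after the hotword was spoken.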
Compiling the library for your own target:
The source code for building the library can be found here. You will most likely have to modify the CMakeLists.txt.
Supporting a new platform is very easy. After compiling the library and moving it to lib/, you have to modify the file src/auto_platform.py. You can run the script with python auto_platform.py and it will output your system and machine information. All you have to do is set the default library path and whether to use arecord or pydub as the audio source.
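As a rough illustration of this kind of platform lookup, the sketch below uses Python's standard platform module to read the system and machine strings and map them to defaults. It does not reproduce the contents of auto_platform.py: the table, the function name and all library paths are placeholders.

```python
import platform

# Placeholder table: keys are (system, machine) pairs as reported by the
# platform module. The paths and audio sources are made up for illustration.
DEFAULTS = {
    ("Linux", "x86_64"): {"library": "lib/my_linux_build.so", "audio_source": "arecord"},
    ("Linux", "armv6l"): {"library": "lib/my_pi_zero_build.so", "audio_source": "arecord"},
    ("Darwin", "x86_64"): {"library": "lib/my_mac_build.dylib", "audio_source": "pydub"},
}

def pick_defaults(system=None, machine=None):
    """Look up library path and audio source for a platform."""
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        return DEFAULTS[(system, machine)]
    except KeyError:
        raise RuntimeError(
            f"no defaults for {system}/{machine}; add an entry"
        ) from None

# Show the strings you would need to add as a new key.
print(platform.system(), platform.machine())
```

Adding a new platform then amounts to one extra table entry keyed by the two strings printed on the target machine.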
You might have to modify the python bindings.