We set out to create an impactful solution for anyone who can benefit from improved access to everyday sound events. Our mobile application uses artificial intelligence to recognize key sound events of interest to the community, such as emergency-vehicle sirens and door knocks, where immediate alerts and continuous logging are critical for the user. While there are many audio-accessibility innovations in the app space, at the time of writing most have focused on sound amplification and text-to-speech/speech-to-text. This app is optimized for Android with low latency so that it works in real time for the user.
The Melon AI app converts the sound wave captured by the microphone into a mel-spectrogram image, which serves as the main feature fed into a convolutional neural network that classifies the sound into one of eight classes. Average inference time is about 15 ms, so the user never has to worry about missing a beat, and the app can also be synced with a wearable device.
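The app's exact audio front end isn't published, but the mel-spectrogram step described above rests on a standard frequency warping. As a minimal sketch (assuming the common HTK mel formula; the function names and band layout here are illustrative, not the app's actual code), this is how raw frequencies map onto the mel scale and how the center frequencies of a small mel filter bank would be placed:

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale: perceptually even spacing of pitch
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    """Band edges (in Hz) for n_bands triangular mel filters.

    Points are evenly spaced on the mel scale, then mapped back to Hz;
    each triangular filter spans three consecutive edges.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]
```

Applying these filters to a short-time Fourier transform of each audio frame, then taking the log, yields the mel-spectrogram image that the CNN consumes.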
Coming soon to Google Play
Resource usage:
- 110 MB Peak Memory Usage
- 5% Average, 10% Peak CPU Usage
- 10-15% Battery Life Penalty
Planned improvements:
- "Wake word" detection based on the user's name
- Cross-platform support
- Sensitivity (threshold tuning)
- General accuracy improvements with minimal power usage penalty
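The sensitivity item above amounts to a per-class confidence gate on the classifier's output. As a hedged sketch of what threshold tuning could look like (the function, class labels, and default value below are hypothetical, not taken from the app), a tunable gate might decide when a prediction is confident enough to trigger an alert:

```python
# Hypothetical per-class alert gate illustrating sensitivity (threshold) tuning.
DEFAULT_THRESHOLD = 0.8  # assumed default; the app's real value is not published

def should_alert(class_probs, thresholds=None):
    """Return (label, prob) for the top class if it clears its threshold, else None.

    class_probs: dict mapping class label -> softmax probability.
    thresholds:  optional dict of per-class overrides, letting the user make
                 individual sounds (e.g. sirens) more or less sensitive.
    """
    thresholds = thresholds or {}
    label = max(class_probs, key=class_probs.get)
    prob = class_probs[label]
    if prob >= thresholds.get(label, DEFAULT_THRESHOLD):
        return (label, prob)
    return None
```

Lowering a class's threshold trades more false alarms for fewer missed events, which is exactly the knob a deaf or hard-of-hearing user would want per sound type.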