# ML on MCU Demo Project

This project uses the TensorFlow Lite for Microcontrollers framework to perform simple keyword recognition on the STM32L475VG-based B-L475E-IOT01A2 development board. It can detect "yes", "no", "up", "down", "left", and "right", and is trained on the speech_commands dataset by P. Warden.

Make sure to clone this repository with `--recursive`, as it contains submodules.

The project covers:
- Model running on PC using TensorFlow and TFLite
- Model running on STM32 using TFLite runtime
- STFT preprocessing ported to STM32
- PDM Microphone readout on STM32
The model running on the microcontroller classifies recordings of spoken keywords into six classes: yes, no, up, down, left, and right. First, a short-time Fourier transform (STFT) is applied to the input waveform: a fixed-size window of the signal is multiplied by a Hanning window function, and an FFT is applied to the result. Each FFT output forms a single column of the spectrogram.
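As a sketch, this preprocessing can be reproduced in NumPy. The 16 kHz sample rate, 256-sample window, and 128-sample hop are assumptions, chosen here because they yield exactly the 124 x 129 spectrogram described below.

```python
import numpy as np

def stft_spectrogram(waveform, frame_len=256, hop=128):
    """STFT magnitude spectrogram; parameters are assumed, not taken from the project."""
    window = np.hanning(frame_len)
    n_frames = (len(waveform) - frame_len) // hop + 1
    columns = []
    for i in range(n_frames):
        frame = waveform[i * hop : i * hop + frame_len] * window
        # rfft of a 256-sample frame yields 129 frequency bins
        columns.append(np.abs(np.fft.rfft(frame)))
    return np.stack(columns)

# A 1 s clip at 16 kHz gives (16000 - 256) // 128 + 1 = 124 frames of 129 bins.
spec = stft_spectrogram(np.random.randn(16000).astype(np.float32))
print(spec.shape)  # (124, 129)
```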
Then, this 124 x 129 spectrogram is fed into a neural network, which was previously trained using the TensorFlow framework. The model consists of simple convolutional and dense layers and was quantized to use only integers, which speeds up inference on the resource-constrained microcontroller. After training, the model is converted to a TFLite model, which can run on top of the TFLite runtime on the STM32.
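For reference, post-training integer quantization with the TFLite converter might look roughly like the sketch below. The layer sizes and the random representative dataset are illustrative assumptions, not the project's actual `train.py`.

```python
import tensorflow as tf

# Illustrative stand-in for the trained model; the real architecture may differ.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(124, 129, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(6),  # yes, no, up, down, left, right
])

def representative_dataset():
    # Calibration samples for quantization; real code would use training data.
    for _ in range(100):
        yield [tf.random.normal((1, 124, 129, 1))]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model.tflite", "wb") as f:
    f.write(converter.convert())
```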
You only need some build essentials and the `arm-none-eabi` toolchain. On Ubuntu 23.10 and similar, this should be enough to build the project:

```sh
sudo apt-get update
sudo apt-get install build-essential cmake gcc-arm-none-eabi python3-numpy python3-pil unzip
```
Making it work on other Linux distros should be possible. Running it on Windows would require major changes, because a lot of Linux specifics are hard-coded, and is probably not worth the trouble.
Create a virtual environment and install the Python dependencies:

```sh
cd ml
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
The model can be trained by running `train.py`, which will also download the dataset and split it into train, test, and validation sets. The model will only be trained if `model.keras` does not exist already, so delete that file to force retraining.

```sh
python train.py
```
Once the model is trained, it is automatically copied to `src/models`, where it will be compiled into the firmware in the next step.
With the model trained, you can proceed to build the code:

```sh
cd stm32-speech-recognition
cmake -B build && make -C build
```
To upload the compiled binary (`demo.elf`) to the board, you can either use st-util, STM32CubeIDE, or any other SWD programmer (e.g., a SEGGER J-Link with Ozone).
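As a sketch, flashing with st-util and GDB might look like the following; the ELF path and the default GDB server port (4242) are assumptions about your setup.

```sh
# Terminal 1: start the stlink GDB server (listens on port 4242 by default)
st-util

# Terminal 2: load and run the ELF (path assumes the build step above)
arm-none-eabi-gdb build/demo.elf \
    -ex "target extended-remote :4242" \
    -ex load \
    -ex "monitor reset" \
    -ex quit
```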
Currently, there is no support for reading out the on-board microphone, so a waveform needs to be sent from the computer.
To evaluate the performance of the model running on the microcontroller, there are some helper scripts in `tools`. These scripts automatically send waveforms from the test set to the STM32 over UART and convert and plot various things. With the `-DPRINT_SPECTROGRAM` build flag, spectrograms can be obtained from the microcontroller. Please note that these scripts are somewhat experimental, i.e., they may need to be adapted to work on your system.
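For illustration, streaming a test waveform to the board over UART with pyserial could look roughly like this; the serial port, baud rate, and framing are assumptions, not the project's actual protocol.

```python
import numpy as np
import serial  # pyserial

# Stand-in for a 1 s test-set recording at 16 kHz (values in [-1, 1]).
waveform = np.random.randn(16000) * 0.1

# Convert to little-endian 16-bit PCM; this framing is an assumed convention.
pcm = (np.clip(waveform, -1.0, 1.0) * 32767).astype("<i2")

with serial.Serial("/dev/ttyACM0", 115200, timeout=5) as port:
    port.write(pcm.tobytes())
    # Read back whatever the firmware prints, e.g. the predicted class.
    print(port.readline().decode(errors="replace").strip())
```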