Build your own Real-time Speech Emotion Recognizer

EmoVoice is a set of tools that allows you to build your own real-time emotion recognizer based on acoustic properties of speech (without using word information).
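The core idea, recognizing emotion from how something is said rather than from what is said, can be illustrated with a minimal sketch. It assumes a mono signal given as a list of floats and computes two simple frame-level descriptors (RMS energy and zero-crossing rate), then summarises them with statistics over the whole utterance. The function names, frame sizes and choice of descriptors are hypothetical and chosen for illustration only; this is not the feature set EmoVoice actually extracts through SSI and the credited components such as openSMILE.

```python
import math
from statistics import mean, stdev


def frame_signal(samples, frame_len, hop):
    """Split a mono signal into overlapping frames (a trailing partial frame is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def frame_energy(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign changes."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def utterance_features(samples, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Hypothetical illustration: acoustic-only features for one utterance.

    No word information is used; only the waveform. Frame-level values are
    summarised with statistics so every utterance maps to a vector of the same
    length, whatever its duration.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    frames = frame_signal(samples, frame_len, hop)
    if len(frames) < 2:
        raise ValueError("utterance too short to compute statistics")
    energy = [frame_energy(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]
    return [mean(energy), stdev(energy), max(energy), min(energy),
            mean(zcr), stdev(zcr)]


if __name__ == "__main__":
    # Half a second of a 200 Hz tone at 16 kHz stands in for a recorded utterance.
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(8000)]
    print(utterance_features(tone))
```

Summarising frame-level values with statistics yields one fixed-length vector per utterance, which is the form a static classifier such as LIBSVM or LIBLINEAR expects as input.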




Make sure the Visual Studio 2015 Redistributable is installed on your machine. Then run install.cmd to download the core binaries and install an embedded version of Python.

If you plan to extract SoundNet features, you will also have to execute install_tensorflow.cmd and download the file sound8.npy into the chains folder.



  • SSI -- Social Signal Interpretation Framework
  • LIBSVM -- A Library for Support Vector Machines
  • LIBLINEAR -- A Library for Large Linear Classification
  • openSMILE -- The Munich Versatile and Fast Open-Source Audio Feature Extractor
  • Emo-DB -- Berlin Database of Emotional Speech
  • SoundNet -- TensorFlow implementation of "SoundNet"


 author = {Wagner, Johannes and Lingenfelser, Florian and Baur, Tobias and Damian, Ionut and Kistler, Felix and Andr{\'e}, Elisabeth},
 title = {The social signal interpretation (SSI) framework: multimodal signal processing and recognition in real-time},
 booktitle = {Proceedings of the 21st ACM international conference on Multimedia},
 series = {MM '13},
 year = {2013},
 isbn = {978-1-4503-2404-5},
 location = {Barcelona, Spain},
 pages = {831--834},
 numpages = {4},
 url = {},
 doi = {10.1145/2502081.2502223},
 acmid = {2502223},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {multimodal fusion, open source framework, real-time pattern recognition, social signal processing},


The framework is released under the LGPL (see LICENSE). Note that some plug-ins are covered by their own license files (see LICENSE.*).


Johannes Wagner, Lab for Human Centered Multimedia, 2018

