kduane/speech_emotion_recognition

The purpose of this study is to examine two primary methods of modeling sound data for speech emotion recognition: a flattened feature transform consistent with telecommunications standards, and a convolutional neural network using stacked spectrogram arrays.
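As a rough illustration of the two representations, the sketch below builds a flattened feature vector (per-coefficient MFCC means) and a 2D mel spectrogram array suitable for stacking into CNN input. It assumes librosa (cited below); the exact features, sample rates, and parameters used in the project's notebooks may differ.

```python
import librosa
import numpy as np

def flat_features(path, sr=16000, n_mfcc=40):
    """Flattened feature vector: per-coefficient MFCC means (illustrative only)."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                        # shape: (n_mfcc,)

def spectrogram_array(path, sr=16000, n_mels=128):
    """2D mel spectrogram in dB, ready to stack into a CNN input tensor."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)     # shape: (n_mels, n_frames)
```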

Phase 1 - Problem Definition
- 1.1 Broad Goals
- 1.2 Data Source
- 1.3 Problem Statement

Phase 2 - Data Gathering
- 2.1 Load Files
- 2.2 Convert Stereo Files to Mono (see the sketch below)
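A minimal sketch of the Phase 2 loading step, assuming librosa and a local data/ directory of RAVDESS .wav files (the path below is illustrative):

```python
import librosa

# Illustrative path only; RAVDESS names files by modality, emotion, intensity, etc.
path = "data/Actor_01/03-01-01-01-01-01-01.wav"

# mono=True averages the two stereo channels into a single waveform,
# so downstream feature extraction sees one channel per clip.
y, sr = librosa.load(path, sr=None, mono=True)
```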

Phase 3 - Exploratory Data Analysis
- 3.1 Waveforms
- 3.2 Spectrograms (see the sketch below)
- 3.3 Speech vs Song
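A minimal EDA sketch for Phase 3, assuming librosa (0.9 or later for waveshow) and matplotlib; it plots one clip's waveform and log-power STFT spectrogram. The notebooks' actual plots and parameters may differ.

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("data/Actor_01/03-01-01-01-01-01-01.wav", sr=None, mono=True)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6))

# Time-domain waveform
librosa.display.waveshow(y, sr=sr, ax=ax1)
ax1.set_title("Waveform")

# Log-power spectrogram from the short-time Fourier transform
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
librosa.display.specshow(D, sr=sr, x_axis="time", y_axis="log", ax=ax2)
ax2.set_title("Spectrogram (dB)")

plt.tight_layout()
plt.show()
```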

Phase 4 - Modeling
- 4.1 Train/Test/Split
- 4.2 Flat Features
- 4.4 Convolutional Neural Net (see the sketch below)
- 4.5 Comparative Modeling
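The framework behind the convolutional net is not stated here, so the sketch below assumes Keras and mel spectrogram inputs shaped (n_mels, n_frames, 1). It is only an illustrative architecture for the 8 RAVDESS emotion classes, not the repository's production model.

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape, n_classes=8):
    """Small CNN over stacked spectrogram arrays (illustrative sketch)."""
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example: 128 mel bands x 300 frames, single channel
model = build_cnn(input_shape=(128, 300, 1))
```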

Phase 5 - Model Analysis
- 5.0 Baseline Score (see the sketch below)
- 5.1 Compare Accuracy Scores
- 5.2 Production Model
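One common choice for the 5.0 baseline is majority-class accuracy; the sketch below assumes that definition and integer emotion labels, and the repository may compute its baseline differently.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def baseline_accuracy(y_train, y_test):
    """Always predict the most common emotion seen in training (naive baseline)."""
    majority = np.bincount(np.asarray(y_train)).argmax()
    return accuracy_score(y_test, np.full_like(np.asarray(y_test), majority))
```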

Phase 6 - Conclusions
- 6.1 Revisit 1.3 Problem Statement
- 6.2 Conclusions
- 6.3 Recommendations for Further Research
- 6.4 Credits/References

Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

The EDA segment and the explanation of the Short-Time Fourier Transform were inspired by https://jackschaedler.github.io/circles-sines-signals

https://www.kdnuggets.com/2017/12/audio-classifier-deep-neural-networks.html

https://github.com/lukas/ml-class/blob/master/videos/cnn-audio/audio.ipynb

Speech intelligibility information courtesy of https://www.dpamicrophones.com/mic-university/facts-about-speech-intelligibility

Lech M, Stolar M, Best C, Bolia R (2020) Real-Time Speech Emotion Recognition Using a Pre-trained Image Classification Network: Effects of Bandwidth Reduction and Companding. Frontiers in Computer Science 2:14. https://doi.org/10.3389/fcomp.2020.00014

McFee B, Raffel C, Liang D, Ellis DPW, McVicar M, Battenberg E, Nieto O (2015) librosa: Audio and music signal analysis in Python. In: Proceedings of the 14th Python in Science Conference, pp 18-25.
