kduane/speech_emotion_recognition

The purpose of this study is to examine two primary methods of modeling sound data for speech emotion recognition: a flattened feature transform consistent with telecommunications standards, and a convolutional neural network using stacked spectrogram arrays.
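As a rough illustration of the two representations, the sketch below builds a flattened feature vector (per-coefficient MFCC means) and a 2D mel spectrogram array suitable for stacking into CNN input. It assumes librosa (cited below); the exact features, sample rates, and parameters used in the project's notebooks may differ.

```python
import librosa
import numpy as np

def flat_features(path, sr=16000, n_mfcc=40):
    """Flattened feature vector: per-coefficient MFCC means (illustrative only)."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                        # shape: (n_mfcc,)

def spectrogram_array(path, sr=16000, n_mels=128):
    """2D mel spectrogram in dB, ready to stack into a CNN input tensor."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)     # shape: (n_mels, n_frames)
```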

Phase 1 - Problem Definition
- 1.1 Broad Goals
- 1.2 Data Source
- 1.3 Problem Statement

Phase 2 - Data Gathering
- 2.1 Load Files
- 2.2 Convert Stereo Files to Mono (see the sketch below)
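A minimal sketch of the Phase 2 loading step, assuming librosa and a local data/ directory of RAVDESS .wav files (the path below is illustrative):

```python
import librosa

# Illustrative path only; RAVDESS names files by modality, emotion, intensity, etc.
path = "data/Actor_01/03-01-01-01-01-01-01.wav"

# mono=True averages the two stereo channels into a single waveform,
# so downstream feature extraction sees one channel per clip.
y, sr = librosa.load(path, sr=None, mono=True)
```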

Phase 3 - Exploratory Data Analysis
- 3.1 Waveforms
- 3.2 Spectrograms (see the sketch below)
- 3.3 Speech vs Song
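A minimal EDA sketch for Phase 3, assuming librosa (0.9 or later for waveshow) and matplotlib; it plots one clip's waveform and log-power STFT spectrogram. The notebooks' actual plots and parameters may differ.

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("data/Actor_01/03-01-01-01-01-01-01.wav", sr=None, mono=True)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6))

# Time-domain waveform
librosa.display.waveshow(y, sr=sr, ax=ax1)
ax1.set_title("Waveform")

# Log-power spectrogram from the short-time Fourier transform
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
librosa.display.specshow(D, sr=sr, x_axis="time", y_axis="log", ax=ax2)
ax2.set_title("Spectrogram (dB)")

plt.tight_layout()
plt.show()
```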

Phase 4 - Modeling
- 4.1 Train/Test/Split
- 4.2 Flat Features
- 4.4 Convolutional Neural Net (see the sketch below)
- 4.5 Comparative Modeling
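The framework behind the convolutional net is not stated here, so the sketch below assumes Keras and mel spectrogram inputs shaped (n_mels, n_frames, 1). It is only an illustrative architecture for the 8 RAVDESS emotion classes, not the repository's production model.

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape, n_classes=8):
    """Small CNN over stacked spectrogram arrays (illustrative sketch)."""
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example: 128 mel bands x 300 frames, single channel
model = build_cnn(input_shape=(128, 300, 1))
```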

Phase 5 - Model Analysis
- 5.0 Baseline Score (see the sketch below)
- 5.1 Compare Accuracy Scores
- 5.2 Production Model
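One common choice for the 5.0 baseline is majority-class accuracy; the sketch below assumes that definition and integer emotion labels, and the repository may compute its baseline differently.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def baseline_accuracy(y_train, y_test):
    """Always predict the most common emotion seen in training (naive baseline)."""
    majority = np.bincount(np.asarray(y_train)).argmax()
    return accuracy_score(y_test, np.full_like(np.asarray(y_test), majority))
```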

Phase 6 - Conclusions
- 6.1 Revisit 1.3 Problem Statement
- 6.2 Conclusions
- 6.3 Recommendations for Further Research
- 6.4 Credits/References

Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

The EDA segment and the explanation of the Short-Time Fourier Transform were inspired by https://jackschaedler.github.io/circles-sines-signals

https://www.kdnuggets.com/2017/12/audio-classifier-deep-neural-networks.html

https://github.com/lukas/ml-class/blob/master/videos/cnn-audio/audio.ipynb

Speech intelligibility information courtesy of https://www.dpamicrophones.com/mic-university/facts-about-speech-intelligibility

Lech M, Stolar M, Best C, Bolia R (2020) Real-Time Speech Emotion Recognition Using a Pre-trained Image Classification Network: Effects of Bandwidth Reduction and Companding. Frontiers in Computer Science 2:14. https://doi.org/10.3389/fcomp.2020.00014

McFee B, Raffel C, Liang D, Ellis DPW, McVicar M, Battenberg E, Nieto O (2015) librosa: Audio and music signal analysis in Python. In: Proceedings of the 14th Python in Science Conference, pp 18-25.
