messAIh

A dataset for Speech Emotion Recognition

license: mit
task_categories:
- audio-classification
language:
- en
tags:
- SER
- Speech Emotion Recognition
- Speech Emotion Classification
- Audio Classification
- Audio
- Emotion
- Emo
- Speech
- Mosei
pretty_name: messiah
size_categories:
- 10K<n<100K

DATASET DESCRIPTION

The MESSAIH dataset is a fork of CMU MOSEI.

Unlike its parent, MESSAIH is intended for unimodal model development and focuses exclusively on audio classification, specifically Speech Emotion Recognition (SER).

It can still be used for bimodal (audio plus text) classification by transcribing each audio track.

MESSAIH currently contains 13,234 speech samples annotated according to the CMU MOSEI scheme:

Each sentence is annotated for sentiment on a [-3,3] Likert scale of: [−3: highly negative, −2 negative, −1 weakly negative, 0 neutral, +1 weakly positive, +2 positive, +3 highly positive]. Ekman emotions of {happiness, sadness, anger, fear, disgust, surprise} are annotated on a [0,3] Likert scale for presence of emotion x: [0: no evidence of x, 1: weakly x, 2: x, 3: highly x].
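
As an illustration of the scheme above, the following sketch maps a numeric score to its label. It assumes that scores may be fractional (for example, if the ratings of several annotators were averaged) and therefore rounds to the nearest step on the scale. The function names are hypothetical and are not part of the dataset or its script.

```python
import math

# Labels copied from the sentiment scale described above.
SENTIMENT_LABELS = {
    -3: "highly negative",
    -2: "negative",
    -1: "weakly negative",
    0: "neutral",
    1: "weakly positive",
    2: "positive",
    3: "highly positive",
}

# Templates copied from the emotion scale described above (x = emotion name).
EMOTION_TEMPLATES = {
    0: "no evidence of {x}",
    1: "weakly {x}",
    2: "{x}",
    3: "highly {x}",
}


def _nearest_step(score, low, high):
    """Round half up to the nearest integer, then clamp to [low, high]."""
    step = math.floor(score + 0.5)
    return max(low, min(high, step))


def sentiment_label(score):
    """Map a sentiment score in [-3, 3] to its textual label."""
    return SENTIMENT_LABELS[_nearest_step(score, -3, 3)]


def emotion_label(emotion, score):
    """Map an emotion score in [0, 3] to its textual label."""
    return EMOTION_TEMPLATES[_nearest_step(score, 0, 3)].format(x=emotion)
```

Whether rounding is appropriate depends on the task: a regression model would normally keep the raw scores, while a classifier needs discrete classes like these.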

The dataset is provided as a parquet file.

For now, the file is hosted on a cloud drive because it is too large for GitHub. Note that the original parquet file from August 10th, 2023 was buggy, as was the accompanying Python script.

To facilitate inspection, a truncated csv sample file is also provided, but it does not contain the audio arrays.
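
A minimal sketch for a first look at the sample file, using only the Python standard library. The helper name `preview_csv` is hypothetical, and the UTF-8 encoding is an assumption; the column names are read from the file rather than hard-coded.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=5):
    """Return the column names and the first n_rows rows of a csv file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(islice(reader, n_rows))
        # fieldnames is None for an empty file, so fall back to an empty list.
        columns = list(reader.fieldnames or [])
    return columns, rows
```

Calling `preview_csv` with the path to the sample file returns a `(columns, rows)` pair, where each row is a dictionary keyed by column name.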

If you train a model on this dataset, you would make us very happy by letting us know.

UNPACKING THE DATASET

A sample Python script (check the top of the script for the requirements) is also provided for illustrative purposes.

The script reads the parquet file and produces the following:

A csv file with file names and MOSEI values (column names are self-explanatory).
A folder named "wavs" containing the audio samples.
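
The two outputs listed above can be sketched with the standard library alone. This is not the provided script, only an illustration under several assumptions: each record is a dictionary, the audio lives under the key "wav2numpy" as mono float samples in [-1.0, 1.0], the file name lives under a hypothetical key "file_name", the csv is written to a hypothetical "labels.csv", and the sample rate must be supplied by the caller because it is not stated here. Please read the legal considerations before writing any wav files.

```python
import csv
import struct
import wave
from pathlib import Path


def write_wav(path, samples, sample_rate):
    """Write mono 16-bit PCM audio from float samples assumed to be in [-1.0, 1.0]."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)  # mono (assumption)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


def unpack(records, out_dir, sample_rate,
           audio_key="wav2numpy", name_key="file_name"):
    """Write a csv of the non-audio fields and one wav file per record."""
    out_dir = Path(out_dir)
    wav_dir = out_dir / "wavs"
    wav_dir.mkdir(parents=True, exist_ok=True)
    writer = None
    count = 0
    with open(out_dir / "labels.csv", "w", newline="", encoding="utf-8") as handle:
        for record in records:
            # Everything except the audio array goes into the csv.
            meta = {k: v for k, v in record.items() if k != audio_key}
            if writer is None:
                writer = csv.DictWriter(handle, fieldnames=list(meta))
                writer.writeheader()
            writer.writerow(meta)
            write_wav(wav_dir / f"{record[name_key]}.wav",
                      record[audio_key], sample_rate)
            count += 1
    return count
```

With pandas, the records could come from something like `pd.read_parquet(path).to_dict("records")`, although that loads every audio array into memory at once.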

LEGAL CONSIDERATIONS

Note that producing the wav files might (or might not) constitute copyright infringement as well as a violation of Google's Terms of Service.

Instead, researchers are encouraged to use the numpy arrays contained in the last column of the dataset ("wav2numpy") directly, without actually extracting any playable audio.
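
A minimal sketch of working on the arrays directly, without writing any audio to disk. It assumes that each entry of "wav2numpy" is a one-dimensional sequence of samples; the sample rate is not stated here, so it is a required parameter. The function names are hypothetical, and `summarize_dataset` additionally assumes pandas with a parquet engine such as pyarrow is installed.

```python
import numpy as np


def summarize_clip(samples, sample_rate):
    """Return duration in seconds, RMS energy and peak amplitude of one clip."""
    audio = np.asarray(samples, dtype=np.float64)
    if audio.size == 0:
        return {"duration_s": 0.0, "rms": 0.0, "peak": 0.0}
    return {
        "duration_s": audio.size / float(sample_rate),
        "rms": float(np.sqrt(np.mean(audio ** 2))),
        "peak": float(np.max(np.abs(audio))),
    }


def summarize_dataset(parquet_path, sample_rate, audio_column="wav2numpy"):
    """Summarize every clip in the parquet file without extracting audio files."""
    import pandas as pd  # imported lazily; only needed for this helper

    # Read only the audio column to keep memory use down.
    frame = pd.read_parquet(parquet_path, columns=[audio_column])
    return [summarize_clip(clip, sample_rate) for clip in frame[audio_column]]
```

The same pattern applies to real feature extraction: replace `summarize_clip` with whatever turns an array into model inputs.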

That, I believe, may keep us in the grey zone.

CAVEATS

As one can appreciate from the charts contained in the "charts" folder, the dataset is biased towards "positive" emotions, namely happiness.

Certain emotions such as fear may be underrepresented, not only in terms of number of occurrences, but, more problematically, in terms of "intensity".
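
Both kinds of imbalance, frequency and intensity, can be checked on the labels alone. The sketch below is a hypothetical helper, not part of the provided script; it assumes each row is a dictionary whose emotion columns hold scores on the [0,3] scale, and the column names in the usage example are assumptions.

```python
def emotion_summary(rows, emotions):
    """For each emotion, count rows where it is present (score > 0),
    the share of all rows that represents, and the mean score when present."""
    rows = list(rows)
    total = len(rows)
    summary = {}
    for emotion in emotions:
        present = [float(row[emotion]) for row in rows if float(row[emotion]) > 0]
        summary[emotion] = {
            "present": len(present),
            "share": len(present) / total if total else 0.0,
            "mean_intensity": sum(present) / len(present) if present else 0.0,
        }
    return summary


# Usage with made-up rows (column names are assumptions):
example_rows = [
    {"happiness": 2, "fear": 0},
    {"happiness": 1, "fear": 0},
    {"happiness": 0, "fear": 1},
    {"happiness": 3, "fear": 0},
]
example = emotion_summary(example_rows, ["happiness", "fear"])
```

A low "share" points to few occurrences, while a low "mean_intensity" points to the weaker signal described above; either may call for resampling or loss weighting during training.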

MOSEI is considered a natural or spontaneous emotion dataset (as opposed to an acted or scripted one) showcasing "genuine" emotions.

However, keep in mind that MOSEI was curated from a popular social network and social networks are notoriously abundant in fake emotions.

Moreover, certain emotions may be intrinsically more difficult to detect than others, even from a human perspective.

Yet, MOSEI is possibly one of the best datasets of its kind currently in the public domain.

Also note that the original MOSEI contains nearly twice as many entries as MESSAIH does.
