This code is a PyTorch implementation of the SpeechFlow model in "How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition", modified for our investigations from the original SpeechFlow.
The code for the ACRNN used in this paper can be found at 3-D ACRNN.
This project is built with Python 3.6. The other required packages can be installed with `pip install -r requirements.txt`.
To prepare training data (VCTK is taken as an example here; for other datasets, you should modify some settings in the files below):

- Prepare your wave files.
- `cd tools`
- Make training and validation data. For better training, you can make an entire training wave file for each speaker with `sh make_cat.sh`, and make separate validation wave files for each speaker with `sh make_valid.sh`. Validation speakers must not appear in the training data.
- Extract spectrograms and F0: `python make_spect_f0_VCTK.py`
  - You should provide d-vectors computed by a pre-trained model. D-vectors for the VCTK speakers computed by our pre-trained model are provided in `./data/VCTK_dvec/dvector_VCTK.npz`.
  - A mapping from speakers to IDs and another from speakers to their genders are needed.
  - For validation data, you should change some settings.
- Generate training metadata: `python make_metasplit_VCTK.py`
- Generate validation metadata: `python make_demodata_VCTK.py`
- Change the settings in `hparams.py` and `run.py`.
- Run the training script: `python run.py`
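As a sketch of the speaker bookkeeping described above (the speaker names, key names, and the 256-dimensional embedding size here are illustrative assumptions, not guaranteed to match this repository's files): d-vectors can be stored in an `.npz` archive keyed by speaker name, and the speaker-to-ID and speaker-to-gender mappings can be plain dictionaries.

```python
import numpy as np

# Hypothetical speaker bookkeeping; the real mappings come from your dataset.
spk2id = {"p225": 0, "p226": 1}       # speaker -> integer ID
spk2gen = {"p225": "F", "p226": "M"}  # speaker -> gender

# Save d-vectors as an .npz archive keyed by speaker name
# (random 256-dim vectors stand in for real embeddings here).
dvecs = {spk: np.random.randn(256).astype(np.float32) for spk in spk2id}
np.savez("dvector_demo.npz", **dvecs)

# Loading mirrors how a file like ./data/VCTK_dvec/dvector_VCTK.npz
# could be consumed: index the archive by speaker name.
loaded = np.load("dvector_demo.npz")
for spk in spk2id:
    dvec = loaded[spk]
    print(spk, spk2id[spk], spk2gen[spk], dvec.shape)
```

The actual key names and embedding dimensionality in `dvector_VCTK.npz` may differ; inspect `loaded.files` to see the stored speakers.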
An inference example is provided in `infer_batch.py`, in which you should define the input pickle file.
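The exact contents of the input pickle are defined by the metadata scripts above; as a purely hypothetical sketch, it could be a list of per-utterance entries (speaker name, d-vector, feature file name) serialized with `pickle`:

```python
import pickle
import numpy as np

# Hypothetical input entries for batch inference; the real format produced by
# make_metasplit_VCTK.py / make_demodata_VCTK.py may differ.
entries = [
    ("p225", np.random.randn(256).astype(np.float32), "p225_001.npy"),
    ("p226", np.random.randn(256).astype(np.float32), "p226_001.npy"),
]

with open("demo_input.pkl", "wb") as f:
    pickle.dump(entries, f)

# The inference script would then be pointed at this pickle path.
with open("demo_input.pkl", "rb") as f:
    restored = pickle.load(f)
print(len(restored), restored[0][0])
```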
The SpeechFlow model is the most important tool in our analysis of how individual speech components affect the performance of modern emotion recognition systems. This code is modified for our task from the original SpeechFlow. We thank Kaizhi Qian for providing the original code, which was very helpful to us.