EmoSym: A Symbiotic Framework for Unified Emotional Understanding
and Generation via Latent Reasoning
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
Great Bay University
*Corresponding author
- [11/2025] The code is released!
- [07/2025] EmoSym has been accepted by ACM MM 2025!
This is the official GitHub repository of *EmoSym: A Symbiotic Framework for Unified Emotional Understanding and Generation via Latent Reasoning*. In this work, we introduce EmoSym, a symbiotic framework that unifies emotional understanding and generation through latent reasoning.
The overall framework of EmoSym:
To create the conda environment needed to run the code, run the following command:
conda env create -f environment.yaml
Training this network requires the EmoSet dataset; use GPT-4o to expand the dataset as described in the paper. Please download the following models from Hugging Face: Qwen2-VL-2B-Instruct, Qwen2-VL-7B-Instruct (or the specific variants used in your experiments), clip-vit-large-patch14, and stable-diffusion-v1-5. Before training, update the dataset-loading code so that the file path points to your local EmoSet directory.
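A small helper can sanity-check that the required model snapshots are in place before training. This is only an illustrative sketch: the Hugging Face repo ids and the flat `local_root/<model-name>` layout below are assumptions, so adjust them to wherever you actually store the checkpoints.

```python
from pathlib import Path

# Assumed Hugging Face repo ids for the models listed above; swap in the
# exact variants used in your experiments if they differ.
REQUIRED_MODELS = [
    "Qwen/Qwen2-VL-2B-Instruct",
    "Qwen/Qwen2-VL-7B-Instruct",
    "openai/clip-vit-large-patch14",
    "runwayml/stable-diffusion-v1-5",
]

def missing_models(local_root):
    """Return the repo ids whose snapshot folders are not found under local_root.

    Assumes each model lives in a folder named after the last path
    component of its repo id (e.g. local_root/Qwen2-VL-7B-Instruct).
    """
    root = Path(local_root)
    return [m for m in REQUIRED_MODELS if not (root / m.split("/")[-1]).exists()]
```

Running `missing_models("/path/to/models")` before launching training gives an early, readable error instead of a mid-run crash in the data/model loaders.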
To train our model, follow these steps:
Step 1 — Emotional Understanding Finetuning
Run the initial finetuning script:
bash code/Qwen2-VL-Finetune/scripts/finetune_emo.sh
Step 2 — Reinforcement Learning Finetuning
Modify `rf_model_path` in the script to point to the checkpoint obtained in Step 1, then run:
bash code/Qwen2-VL-Finetune/scripts/finetune_RL.sh
Step 3 — Joint Training
Run the joint training pipeline:
sh joint_training.sh
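The three steps above can be chained with a small wrapper. The script paths come from this README; the wrapper itself and its `dry_run` flag are just an illustrative sketch, and note that `rf_model_path` in `finetune_RL.sh` still has to be edited by hand between Step 1 and Step 2.

```python
import subprocess

# Commands taken verbatim from the training steps in this README.
STEPS = [
    "bash code/Qwen2-VL-Finetune/scripts/finetune_emo.sh",   # Step 1: emotional understanding finetuning
    "bash code/Qwen2-VL-Finetune/scripts/finetune_RL.sh",    # Step 2: RL finetuning (edit rf_model_path first)
    "sh joint_training.sh",                                  # Step 3: joint training
]

def run_pipeline(dry_run=True):
    """Run the three training stages in order; with dry_run, only print them."""
    for cmd in STEPS:
        if dry_run:
            print(cmd)
        else:
            subprocess.run(cmd, shell=True, check=True)
```

`check=True` makes the wrapper stop at the first failing stage rather than silently continuing with a bad checkpoint.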
To evaluate emotional understanding, run the following script:
python code/Qwen2-VL-Finetune/src/training/evalutaion.py
For emotional generation evaluation, refer to the official EmoGen evaluation pipeline:
- Compute the emotion space using the EmoGen method.
- Evaluate generated samples with the same metrics used in EmoGen (e.g., emotion alignment, diversity, intensity consistency).
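For the emotion-alignment metric listed above, the usual recipe is to run an emotion classifier over the generated images and measure agreement with the target emotions. The sketch below assumes you already have predicted and target labels; it is not the official EmoGen code, just a minimal illustration of the metric.

```python
def emotion_alignment(pred_labels, target_labels):
    """Fraction of generated samples whose predicted emotion matches the target.

    pred_labels: emotion labels from a classifier run on generated images.
    target_labels: the emotions the samples were conditioned on.
    """
    assert len(pred_labels) == len(target_labels), "label lists must align"
    hits = sum(p == t for p, t in zip(pred_labels, target_labels))
    return hits / len(pred_labels)
```

For example, `emotion_alignment(["joy", "fear"], ["joy", "sadness"])` scores 0.5, since one of the two generated samples matches its conditioning emotion.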
If you find this work useful, please kindly cite our paper:
@inproceedings{zhu2025emosym,
title={EmoSym: A Symbiotic Framework for Unified Emotional Understanding and Generation via Latent Reasoning},
author={Zhu, Yijie and Lyu, Yibo and Yu, Zitong and Shao, Rui and Zhou, Kaiyang and Nie, Liqiang},
booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
year={2025}
}
