The official implementation of *StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models*, accepted to IJCV.
- Clone this repository and navigate to the StimuVAR folder:

```shell
git clone https://github.com/EthanG97/StimuVAR.git
cd StimuVAR
```

- Install packages:

```shell
conda create -n stimuvar python=3.9 -y
conda activate stimuvar
pip install --upgrade pip
pip install -r requirements.txt
```

In this work, we conduct experiments on four datasets: VCE, VE-8, YF-6, and EmoSet.
The following examples use the VCE dataset for demonstration purposes.
We provide a preprocessing script to extract motion-salient frames from video clips using dense optical flow analysis. This is particularly useful when preparing frame-level stimuli that capture rapid, key events—often corresponding to dramatic changes in a video's visual dynamics.
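The idea behind motion-salient frame selection can be sketched as follows. This is a simplified stand-in that scores frames by raw frame differencing rather than the dense optical flow the script actually computes, and the function names are illustrative, not the repo's API:

```python
import numpy as np

# NOTE: illustrative sketch only. The repo's extract_frames.py uses dense
# optical flow; here plain frame differencing stands in for flow magnitude.
def motion_saliency_scores(frames):
    """Score each frame by mean absolute change from its predecessor."""
    scores = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(frames, frames[1:]):
        diff = np.abs(cur.astype(np.float32) - prev.astype(np.float32))
        scores.append(float(diff.mean()))
    return scores

def select_salient_frames(frames, total_frames=6):
    """Return indices of the most motion-salient frames, in temporal order."""
    scores = motion_saliency_scores(frames)
    top = np.argsort(scores)[::-1][:total_frames]
    return sorted(int(i) for i in top)
```

Frames with abrupt visual change receive the highest scores, so the selected subset concentrates on the video's key events.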
```shell
python helpers/extract_frames.py \
    --input_json helpers/alltrain.json \
    --video_root /path/to/videos \
    --output_dir /path/to/output_frames \
    --total_frames 6
```

The training pipeline for StimuVAR consists of two sequential stages:
- Stage 1: Visual feature alignment
- Stage 2: Emotion reasoning based on aligned features
You can download the necessary resources from Google Drive:
```shell
# Stage 1: Train for visual feature alignment
torchrun --nproc_per_node=1 train.py --config config/Stage1.yaml

# Stage 2: Train for emotion reasoning
# (Use the Stage 1 model as the base model, specified in the Stage2 config)
torchrun --nproc_per_node=1 train.py --config config/Stage2.yaml
```

To run inference with a trained StimuVAR model:
- Update the configuration: open `config/Inference.yaml` and set `model_path` to the checkpoint of your trained Stage 2 model.
- Run the inference script:

```shell
torchrun --nproc_per_node=1 inference.py --config config/Inference.yaml
```
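For reference, the configuration edit might look like the fragment below; `model_path` is the only key this README confirms, and the value shown simply reuses the Stage 2 checkpoint path from the demo example:

```yaml
# config/Inference.yaml (only model_path is confirmed by this README;
# any other keys in the file are repo-specific)
model_path: checkpoints/stage2/checkpoint-150000
```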
To run the demo on a single video, fill in the model path, config file, and video name:

```shell
python demo.py \
    --video assets/sample_video.mp4 \
    --model_path checkpoints/stage2/checkpoint-150000 \
    --config config/Inference.yaml
```

To compute CLIP scores between model responses and the extracted test-set frames:

```shell
python Metrics/Clip_score/clip_score.py \
    --response_path /path/to/model_responses.jsonl \
    --img_dir /path/to/extracted_test_set_frames/
```
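As background, the standard CLIPScore metric (Hessel et al., 2021) is the cosine similarity between CLIP image and text embeddings, clipped at zero and rescaled by 2.5. A minimal numpy sketch, assuming precomputed embedding vectors (whether the repo's script applies the same rescaling is not confirmed here):

```python
import numpy as np

def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore-style similarity: w * max(cos(image, text), 0).

    image_emb / text_emb are 1-D vectors; in practice they come from a
    CLIP image encoder and text encoder (placeholders here).
    """
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return w * max(float(image_emb @ text_emb), 0.0)
```

Clipping at zero keeps the score non-negative, since negative cosine similarity carries little signal for caption quality.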
To average the resulting CLIP scores:

```shell
python Metrics/Clip_score/ave.py \
    --filename /path/to/_clipscore.json
```

To evaluate responses with an LLM judge:

```shell
python Metrics/LLM_judge/LLMjudge.py </path/to/model_responses.jsonl> <output_file>
```

To run the Doubly evaluation:

```shell
python Metrics/Doubly/gpt_predict.py --response_path /path/to/model_responses.jsonl

# Emotion prediction based on reason using GPT
python Metrics/Doubly/emo_align.py --response_path /path/to/model_responses.jsonl
```
Special thanks to Valley for providing high-quality code, which served as the foundation for our implementation.
If you find this project helpful in your research, please consider citing our paper:
@article{guo2025stimuvar,
title = {StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models},
author = {Guo, Yuxiang and Siddiqui, Faizan and Zhao, Yang and Chellappa, Rama and Lo, Shao-Yuan},
journal = {International Journal of Computer Vision},
pages = {1--17},
year = {2025},
publisher = {Springer}
}