iLearn-Lab/ACL26-OSCBench

OSCBench: Benchmarking Object State Change in Text-to-Video Generation

1National University of Singapore    2Singapore Management University    3Carnegie Mellon University    4Fudan University
Corresponding author

Overview

OSCBench is a benchmark for evaluating whether text-to-video models can generate correct and temporally consistent object state changes.

This repository currently contains prompt resources, action/object taxonomies, frame extraction code, an MLLM-based evaluation script, and a correlation analysis script for comparing automatic judgments with human ratings.

Setup

  1. Install the required dependencies:
     pip install openai opencv-python numpy scipy
  2. Set your OpenAI API key in mllm_eval.py:
     OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"

Pipeline Components

1. Video Generation and Frame Sampling

Generate videos from OSCBench prompts using your target text-to-video model, then extract evenly sampled frames for automatic evaluation.

Generate Videos

Use prompts from prompts.txt or one of the split files under prompt_splits/.
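Loading the prompts can be sketched as below. This is a hypothetical helper, not code from the repository, and it assumes the prompt files hold one prompt per line; check the actual file layout before relying on it.

```python
def parse_prompts(lines):
    """Keep one prompt per non-empty line, stripping surrounding whitespace."""
    return [line.strip() for line in lines if line.strip()]

def load_prompts(path):
    """Load prompts from a text file such as prompts.txt (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return parse_prompts(f)
```

Each returned prompt would then be fed to the text-to-video model under evaluation.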

Sample Frames

This script samples 20 evenly spaced frames from each video and saves them into one subfolder per video.

Usage:

python extract_frames.py
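The even-sampling step amounts to choosing 20 evenly spaced frame indices and decoding those frames (e.g. with OpenCV, one of the listed dependencies). A minimal sketch of the index computation, as a hypothetical helper rather than the actual extract_frames.py implementation:

```python
import numpy as np

def sample_indices(total_frames, num_samples=20):
    """Return up to num_samples evenly spaced frame indices in [0, total_frames)."""
    if total_frames <= 0:
        return []
    # If the video is shorter than num_samples, take every frame once.
    n = min(num_samples, total_frames)
    return np.linspace(0, total_frames - 1, n).round().astype(int).tolist()
```

For a 100-frame video this yields 20 indices starting at frame 0 and ending at frame 99; the selected frames would then be written to the video's subfolder.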

2. Model Evaluation

This script evaluates sampled frames using an MLLM through the OpenAI Responses API. It asks the model to inspect the frames chronologically and return evidence-backed scores for eight dimensions:

  • 1a Subject Alignment
  • 1b Manipulated Object Alignment
  • 2a Action Accuracy
  • 3a Object State Change Accuracy
  • 3b Object Change Continuity and Consistency
  • 4a Scene Alignment
  • 5a Realism
  • 5b Aesthetic

Among these dimensions, 3a and 3b directly measure the object state change ability emphasized by OSCBench.

Usage:

python mllm_eval.py
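On the output side, the evaluator needs to turn the MLLM's reply into per-dimension scores. A minimal parsing sketch, assuming (hypothetically) that the model is prompted to reply with a JSON object keyed by the eight dimension IDs; the actual reply format used by mllm_eval.py may differ:

```python
import json

# The eight dimension IDs listed above.
DIMENSIONS = ["1a", "1b", "2a", "3a", "3b", "4a", "5a", "5b"]

def parse_scores(reply_text):
    """Extract a dimension -> numeric score mapping from a JSON reply string."""
    data = json.loads(reply_text)
    return {dim: float(data[dim]) for dim in DIMENSIONS}
```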

3. Results Analysis

This script analyzes the correlation between MLLM-based evaluation and human evaluation, following the benchmark's automatic-evaluation validation setting described on the project page and in the paper.

For each evaluation dimension, it computes:

  1. Kendall's tau
  2. Spearman's rho

Usage:

python result_analyze.py
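The core of this analysis can be sketched with scipy.stats (scipy is among the listed dependencies). The function name and return format here are illustrative, not taken from result_analyze.py:

```python
from scipy.stats import kendalltau, spearmanr

def rank_correlations(auto_scores, human_scores):
    """Kendall's tau and Spearman's rho between automatic and human scores
    for one dimension, given paired per-video score lists."""
    tau, _ = kendalltau(auto_scores, human_scores)
    rho, _ = spearmanr(auto_scores, human_scores)
    return {"kendall_tau": tau, "spearman_rho": rho}
```

Running this once per dimension over the paired MLLM and human scores reproduces the per-dimension correlation table.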

Citation

If you find our work useful, please cite:

@article{han2026oscbench,
  title={OSCBench: Benchmarking Object State Change in Text-to-Video Generation},
  author={Han, Xianjing and Zhu, Bin and Hu, Shiqi and Li, Franklin Mingzhe and Carrington, Patrick and Zimmermann, Roger and Chen, Jingjing},
  journal={arXiv preprint arXiv:2603.11698},
  year={2026}
}

About

Official Implementation for "OSCBench: Benchmarking Object State Change in Text-to-Video Generation"
