iLearn-Lab/ACL26-OSCBench

OSCBench: Benchmarking Object State Change in Text-to-Video Generation

1National University of Singapore    2Singapore Management University    3Carnegie Mellon University    4Fudan University
Corresponding author

Overview

OSCBench is a benchmark for evaluating whether text-to-video models can generate correct and temporally consistent object state changes.

This repository currently contains prompt resources, action/object taxonomies, frame extraction code, an MLLM-based evaluation script, and a correlation analysis script for comparing automatic judgments with human ratings.

Setup

  1. Install the required dependencies:
     pip install openai opencv-python numpy scipy
  2. Set your OpenAI API key in mllm_eval.py:
     OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"

Pipeline Components

1. Video Generation and Frame Sampling

Generate videos from OSCBench prompts using your target text-to-video model, then extract evenly sampled frames for automatic evaluation.

Generate Videos

Use prompts from prompts.txt or one of the split files under prompt_splits/.
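Loading the prompts can be sketched as below. This is a hypothetical helper, not code from the repository, and it assumes the prompt files hold one prompt per line; check the actual file layout before relying on it.

```python
def parse_prompts(lines):
    """Keep one prompt per non-empty line, stripping surrounding whitespace."""
    return [line.strip() for line in lines if line.strip()]

def load_prompts(path):
    """Load prompts from a text file such as prompts.txt (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return parse_prompts(f)
```

Each returned prompt would then be fed to the text-to-video model under evaluation.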

Sample Frames

This script samples 20 evenly spaced frames from each video and saves them into one subfolder per video.

Usage:

python extract_frames.py
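The even-sampling step amounts to choosing 20 evenly spaced frame indices and decoding those frames (e.g. with OpenCV, one of the listed dependencies). A minimal sketch of the index computation, as a hypothetical helper rather than the actual extract_frames.py implementation:

```python
import numpy as np

def sample_indices(total_frames, num_samples=20):
    """Return up to num_samples evenly spaced frame indices in [0, total_frames)."""
    if total_frames <= 0:
        return []
    # If the video is shorter than num_samples, take every frame once.
    n = min(num_samples, total_frames)
    return np.linspace(0, total_frames - 1, n).round().astype(int).tolist()
```

For a 100-frame video this yields 20 indices starting at frame 0 and ending at frame 99; the selected frames would then be written to the video's subfolder.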

2. Model Evaluation

This script evaluates sampled frames using an MLLM through the OpenAI Responses API. It asks the model to inspect the frames chronologically and return evidence-backed scores for eight dimensions:

  • 1a Subject Alignment
  • 1b Manipulated Object Alignment
  • 2a Action Accuracy
  • 3a Object State Change Accuracy
  • 3b Object Change Continuity and Consistency
  • 4a Scene Alignment
  • 5a Realism
  • 5b Aesthetic

Among these dimensions, 3a and 3b directly measure the object state change ability emphasized by OSCBench.

Usage:

python mllm_eval.py
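On the output side, the evaluator needs to turn the MLLM's reply into per-dimension scores. A minimal parsing sketch, assuming (hypothetically) that the model is prompted to reply with a JSON object keyed by the eight dimension IDs; the actual reply format used by mllm_eval.py may differ:

```python
import json

# The eight dimension IDs listed above.
DIMENSIONS = ["1a", "1b", "2a", "3a", "3b", "4a", "5a", "5b"]

def parse_scores(reply_text):
    """Extract a dimension -> numeric score mapping from a JSON reply string."""
    data = json.loads(reply_text)
    return {dim: float(data[dim]) for dim in DIMENSIONS}
```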

3. Results Analysis

This script analyzes the correlation between MLLM-based evaluation and human evaluation, following the benchmark's automatic-evaluation validation setting described on the project page and in the paper.

For each evaluation dimension, it computes:

  1. Kendall's tau
  2. Spearman's rho

Usage:

python result_analyze.py
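The core of this analysis can be sketched with scipy.stats (scipy is among the listed dependencies). The function name and return format here are illustrative, not taken from result_analyze.py:

```python
from scipy.stats import kendalltau, spearmanr

def rank_correlations(auto_scores, human_scores):
    """Kendall's tau and Spearman's rho between automatic and human scores
    for one dimension, given paired per-video score lists."""
    tau, _ = kendalltau(auto_scores, human_scores)
    rho, _ = spearmanr(auto_scores, human_scores)
    return {"kendall_tau": tau, "spearman_rho": rho}
```

Running this once per dimension over the paired MLLM and human scores reproduces the per-dimension correlation table.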

Citation

If you find our work useful, please cite:

@article{han2026oscbench,
  title={OSCBench: Benchmarking Object State Change in Text-to-Video Generation},
  author={Han, Xianjing and Zhu, Bin and Hu, Shiqi and Li, Franklin Mingzhe and Carrington, Patrick and Zimmermann, Roger and Chen, Jingjing},
  journal={arXiv preprint arXiv:2603.11698},
  year={2026}
}

About

Official Implementation for "OSCBench: Benchmarking Object State Change in Text-to-Video Generation"
