- [09/2025] Release evaluation dataset on [Hugging Face](https://huggingface.co/datasets/Kevinson-lzp/VC-Bench)
- [05/2025] Create 🔥[VC-Bench](https://github.com/kevinson7515/VC-Bench)🔥 repository
We propose VC-Bench, a novel benchmark specifically designed for video connecting. It includes 1,579 high-quality videos collected from public platforms, covering 15 main categories and 72 subcategories to ensure diversity and structure. VC-Bench focuses on three core aspects: Video Quality Score, Start-End Consistency Score, and Transition Smoothness Score.
Additionally, we present per-model radar charts of the evaluation results. The results are normalized per dimension for clearer comparison.
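Per-dimension normalization can be done with simple min-max scaling across models. The sketch below is illustrative; the function name and the exact scaling scheme are our assumptions, not necessarily what VC-Bench uses internally:

```python
def normalize_per_dimension(scores):
    """Min-max normalize raw scores per dimension across models.

    scores: {model_name: {dimension_name: raw_value}}
    Returns the same structure with each dimension scaled to [0, 1],
    which makes radar-chart axes directly comparable.
    """
    dimensions = {d for per_model in scores.values() for d in per_model}
    normalized = {model: {} for model in scores}
    for dim in dimensions:
        values = [scores[m][dim] for m in scores]
        lo, hi = min(values), max(values)
        for model in scores:
            # Guard against a zero range when all models tie on a dimension.
            normalized[model][dim] = (
                0.5 if hi == lo else (scores[model][dim] - lo) / (hi - lo)
            )
    return normalized
```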
| Model | Video Resolution | Supported Tasks | Model Size | Number of Frames | Inference Precision | Frame Rate |
|---|---|---|---|---|---|---|
| Wan-2.1 (1.3B) | 480P ✔️ | T2V, I2V, FLF2V, VC | 1.3B | (default 81) | FP8 | 16fps |
| Wan-2.1 (14B) | 480P ✔️ 720P ✔️ | T2V, I2V, FLF2V, VC | 14B | (default 81) | FP8 | 16fps |
| CogVideoX (2B) | 480P ✔️ | T2V, I2V, FLF2V, VC | 2B | Should be 8N + 1 where N <= 6 (default 49) | FP16 | 8fps |
| CogVideoX (5B) | 480P ✔️ | T2V, I2V, FLF2V, VC | 5B | Should be 8N + 1 where N <= 6 (default 49) | BF16 | 8fps |
| OpenSora-2.0 (11B) | 256P ✔️ 768P ✔️ | T2V, I2V, FLF2V, VC | 11B | (default 120) | FP16 | 24fps |
| Ruyi (7B) | 480P ✔️ 720P ✔️ | T2V, I2V, FLF2V, VC | 7B | (default 120) | FP16 | 24fps |
Clone the repository and install the dependencies:

```shell
git clone https://github.com/kevinson7515/VC-Bench.git
pip install torch torchvision
```

To evaluate some aspects of video generation ability, you also need to download some pretrained modules.
We support evaluating any video. Simply provide the path to the video file, or the path to the folder that contains your videos. There is no requirement on the videos' names.
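Because the input can be either a single file or a folder with arbitrarily named videos, the path handling can be sketched as follows (the helper name and the extension list are assumptions for illustration, not code from the repo):

```python
from pathlib import Path

# Common container formats; extend as needed.
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}

def collect_videos(videos_path):
    """Return a list of video files from a single file or a folder.

    File names are irrelevant; only the extension is checked.
    """
    path = Path(videos_path)
    if path.is_file():
        return [path]
    return sorted(f for f in path.rglob("*") if f.suffix.lower() in VIDEO_EXTS)
```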
To evaluate videos with a customized input prompt, run our script with:

```shell
python evaluate.py \
    --videos_path /path/to/folder_or_video/
```
To evaluate using multiple GPUs, use the following command:

```shell
torchrun --nproc_per_node=${GPUS} --standalone evaluate.py ...args...
```
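Under torchrun, each process can claim a disjoint shard of the videos by reading the `RANK` and `WORLD_SIZE` environment variables that torchrun sets. A minimal sketch (the helper function is hypothetical, not part of the repo):

```python
import os

def shard_for_rank(videos, rank=None, world_size=None):
    """Give each torchrun process a disjoint, interleaved slice of the videos.

    RANK and WORLD_SIZE are set automatically by torchrun; the explicit
    arguments exist only to make the function easy to test.
    """
    if rank is None:
        rank = int(os.environ.get("RANK", 0))
    if world_size is None:
        world_size = int(os.environ.get("WORLD_SIZE", 1))
    return videos[rank::world_size]
```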
To calculate the Total Score, we follow these steps:

- **Normalization**: To standardize evaluations, we convert negative metrics (where lower is better) to positive ones by subtracting them from 1.
- **Video Quality Score**: a weighted average of the following dimensions: subject consistency, background consistency, flickering severity, aesthetic score, and imaging quality.
- **Start-End Consistency Score**: a weighted average of the following dimensions: pixel consistency and optical flow error.
- **Transition Smoothness Score**: a weighted average of the following dimensions: video connecting distance and local perceptual consistency.
- **Total Score for VC-Bench**: a weighted average of the Video Quality Score, Start-End Consistency Score, and Transition Smoothness Score:

  `Total Score = w1 * Video Quality Score + w2 * Start-End Consistency Score + w3 * Transition Smoothness Score`
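The aggregation above can be sketched directly in code. Note that the weights below are placeholders, since the actual values of w1–w3 are not given here:

```python
def to_positive(value):
    """Convert a negative metric (lower is better) by subtracting it from 1."""
    return 1.0 - value

def total_score(video_quality, start_end_consistency, transition_smoothness,
                weights=(0.4, 0.3, 0.3)):
    """Total Score = w1*VQ + w2*SEC + w3*TS, with illustrative weights."""
    w1, w2, w3 = weights
    return (w1 * video_quality
            + w2 * start_end_consistency
            + w3 * transition_smoothness)
```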
🤗Hugging Face
To facilitate future research and to ensure full transparency, we release all the videos we sampled and used for VC-Bench evaluation. You can download them on [Hugging Face](https://huggingface.co/datasets/Kevinson-lzp/VC-Bench).
To perform evaluation on one dimension, run:

```shell
python evaluate.py --videos_path $VIDEOS_PATH
```
The complete list of dimensions:

```
['subject_consistency', 'background_consistency', 'flickering_severity', 'aesthetic_score', 'imaging_quality', 'pixel_consistency', 'optical_flow_error', 'connecting_distance', 'local_perceptual_consistency']
```
Alternatively, you can evaluate multiple models and multiple dimensions using this script:

```shell
bash evaluate.sh
```
If you find our repo useful for your research, please consider citing our paper:
Order is based on the time of joining the project:
This project wouldn't be possible without the following open-sourced repositories: VBench, CLIP, IQA-PyTorch, and LAION Aesthetic Predictor.

