
VC-Bench: Video Connecting Bench

🔥 Updates

📣 Overview

We propose VC-Bench, a novel benchmark specifically designed for video connecting. It includes 1,579 high-quality videos collected from public platforms, covering 15 main categories and 72 subcategories to ensure diversity and structural coverage. VC-Bench focuses on three core aspects: Video Quality Score ($VQS$), Start-End Consistency Score ($SECS$), and Transition Smoothness Score ($TSS$). Additionally, we employ Qwen2-VL to equip each video with a detailed, expressive textual description. We also conduct a human-alignment study on the generated videos for each metric and show that VC-Bench evaluation results align well with human perception. VC-Bench supports a wide range of video generation tasks, including Text-to-Video (T2V), Image-to-Video (I2V), and First-Last-Frame-to-Video (FLF2V), with an adaptive Image Suite for fair evaluation across different settings. It evaluates not only technical quality but also the trustworthiness of generative models, offering a comprehensive view of model performance. We expect VC-Bench to serve as a pioneering benchmark that inspires and guides future research in video connecting.

🎓 Evaluation Results

Additionally, we present radar charts of the evaluation results for each model. The results are normalized per dimension for clearer comparison.

📽️ Model Info

| Model | Video Resolution | Supported Tasks | Model Size | Number of Frames | Inference Precision | Frame Rate |
|---|---|---|---|---|---|---|
| Wan-2.1 (1.3B) | 480P ✔️ | T2V, I2V, FLF2V, VC | 1.3B | default 81 | FP8 | 16 fps |
| Wan-2.1 (14B) | 480P ✔️, 720P ✔️ | T2V, I2V, FLF2V, VC | 14B | default 81 | FP8 | 16 fps |
| CogVideoX (2B) | 480P ✔️ | T2V, I2V, FLF2V, VC | 2B | 8N + 1, N ≤ 6 (default 49) | FP16 | 8 fps |
| CogVideoX (5B) | 480P ✔️ | T2V, I2V, FLF2V, VC | 5B | 8N + 1, N ≤ 6 (default 49) | BF16 | 8 fps |
| OpenSora-2.0 (11B) | 256P ✔️, 768P ✔️ | T2V, I2V, FLF2V, VC | 11B | default 120 | FP16 | 24 fps |
| Ruyi (7B) | 480P ✔️, 720P ✔️ | T2V, I2V, FLF2V, VC | 7B | default 120 | FP16 | 24 fps |

🔨 Installation

Install with pip

pip install torch torchvision 

To evaluate some aspects of video generation ability, you need to download several pretrained modules:


Install with git clone

git clone https://github.com/kevinson7515/VC-Bench.git
pip install torch torchvision

💎 Usage

Evaluate Your Own Videos

We support evaluating any video. Simply provide the path to a video file, or to a folder containing your videos; there is no requirement on the videos' names.

To evaluate videos with a customized input prompt, run our script with:

python evaluate.py \
    --videos_path /path/to/folder_or_video/

To evaluate using multiple GPUs, use the following command:

torchrun --nproc_per_node=${GPUS} --standalone evaluate.py ...args...

How to Calculate Total Score

To calculate the Total Score, we follow these steps:

  1. Normalization: To standardize evaluations, we convert negative metrics (where lower is better) to positive by subtracting them from 1.

  2. Video Quality Score: The Video Quality Score is a weighted average of the following dimensions: subject consistency, background consistency, flickering severity, aesthetic score, and imaging quality.

  3. Start-End Consistency Score: The Start-End Consistency Score is a weighted average of the following dimensions: pixel consistency and optical flow error.

  4. Transition Smoothness Score: The Transition Smoothness Score is a weighted average of the following dimensions: video connecting distance and local perceptual consistency.

  5. Total Score for VC-Bench

    The Total Score is a weighted average of the Video Quality Score, Start-End Consistency Score, and Transition Smoothness Score:

    Total Score = w1 * Video Quality Score + w2 * Start-End Consistency Score + w3 * Transition Smoothness Score
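
The steps above can be sketched in Python. The per-dimension and per-score weights below are illustrative placeholders, not the official VC-Bench values:

```python
# Sketch of the VC-Bench score aggregation described above.
# All weights here are illustrative placeholders, not official values.

def to_positive(score):
    """Step 1: convert a negative metric (lower is better) to positive."""
    return 1.0 - score

def weighted_average(scores, weights):
    """Weighted average used by VQS, SECS, and TSS over their dimensions."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def total_score(vqs, secs, tss, w1=0.4, w2=0.3, w3=0.3):
    """Step 5: Total Score = w1*VQS + w2*SECS + w3*TSS."""
    return w1 * vqs + w2 * secs + w3 * tss
```

For example, the Start-End Consistency Score would be `weighted_average` over the (normalized) pixel consistency and optical flow error dimensions, and the result fed into `total_score`.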
    

📑 Sampled Videos

🤗 Hugging Face

To facilitate future research and to ensure full transparency, we release all the videos we sampled and used for VC-Bench evaluation. You can download them on [Hugging Face](https://huggingface.co/datasets/Kevinson-lzp/VC-Bench).

🏄 Evaluation Method Suite

To perform evaluation on one dimension, run this:

python evaluate.py --videos_path $VIDEOS_PATH

  • The complete list of dimensions:

    ['subject_consistency', 'background_consistency', 'flickering_severity', 'aesthetic_score', 'imaging_quality', 'pixel_consistency', 'optical_flow_error', 'connecting_distance', 'local_perceptual_consistency']
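
As a sketch, the per-dimension runs could be generated programmatically. Note the `--dimension` flag below is an assumption for illustration, not confirmed by this README; check `evaluate.py`'s actual CLI:

```python
# Build one evaluate.py invocation per VC-Bench dimension.
# The --dimension flag is an assumed CLI option; verify it against
# the real evaluate.py before use.
DIMENSIONS = [
    "subject_consistency", "background_consistency", "flickering_severity",
    "aesthetic_score", "imaging_quality", "pixel_consistency",
    "optical_flow_error", "connecting_distance", "local_perceptual_consistency",
]

def build_commands(videos_path):
    """Return one command (as an argv list) per dimension."""
    return [
        ["python", "evaluate.py", "--videos_path", videos_path, "--dimension", dim]
        for dim in DIMENSIONS
    ]
```

Each argv list can then be passed to `subprocess.run`, one process per dimension.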
    

Alternatively, you can evaluate multiple models and multiple dimensions using this script:

bash evaluate.sh

✒️ Citation

If you find our repo useful for your research, please consider citing our paper:


♥️ Acknowledgement

💪 VC-Bench Contributors

Order is based on the time of joining the project:

🤗 Open-Sourced Repositories

This project wouldn't be possible without the following open-sourced repositories: VBench, CLIP, IQA-PyTorch, and LAION Aesthetic Predictor.
