This is the official implementation of our paper
"HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models"
- Holistic skills evaluation. Rather than focusing on isolated metrics such as accuracy, we measure 13 skills, grouped into five critical skill categories: accuracy, robustness, generalization, fairness, and bias.
- Broad scenario coverage. HRS-Bench covers 50 applications, e.g., fashion, animals, transportation, food, and clothes.
- Standardization. We propose a unified benchmark on which existing models are fairly evaluated across a wide range of metrics.
The benchmark currently evaluates the following text-to-image models:
- Stable-Diffusion V1
- Stable-Diffusion V2
- DALL·E V2
- Structure-Diffusion
- CogView V2
- Glide
- Paella
- minDALL-E
- DALL·E Mini
Requirements:
- Python >= 3.7
- PyTorch >= 1.7.0
- Other common packages (numpy, pytorch_transformers, etc.)
- First, download our prompts covering the 13 skills from here.
- Each skill has its own CSV file containing the prompts and the ground truth (GT) used during the evaluation phase.
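As a rough illustration of how the per-skill CSV files might be consumed, the sketch below reads every CSV in a prompt folder into a dictionary keyed by file name. It is a minimal sketch under stated assumptions, not part of the HRS-Bench code: the column names `prompt` and `gt`, the helper names `load_skill_csv` and `load_all_skills`, and the example file `counting.csv` are all hypothetical, so check the header row of the downloaded files and adjust them.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List

# HYPOTHETICAL column names: replace them with the actual headers
# found in the downloaded HRS-Bench CSV files.
PROMPT_COLUMN = "prompt"
GT_COLUMN = "gt"


def load_skill_csv(path, prompt_column: str = PROMPT_COLUMN,
                   gt_column: str = GT_COLUMN) -> List[Dict[str, str]]:
    """Read one skill's CSV and return a list of {"prompt", "gt"} records."""
    records = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early if the assumed columns are not in the header row.
        missing = {prompt_column, gt_column} - set(reader.fieldnames or [])
        if missing:
            raise KeyError(f"{path}: missing columns {sorted(missing)}")
        for row in reader:
            records.append({"prompt": row[prompt_column], "gt": row[gt_column]})
    return records


def load_all_skills(prompt_dir) -> Dict[str, List[Dict[str, str]]]:
    """Map each CSV file name (without extension) to its list of records."""
    return {p.stem: load_skill_csv(p)
            for p in sorted(Path(prompt_dir).glob("*.csv"))}


if __name__ == "__main__":
    # Demo with a HYPOTHETICAL one-row skill file written to a temp folder.
    with tempfile.TemporaryDirectory() as d:
        Path(d, "counting.csv").write_text(
            "prompt,gt\nthree red apples on a table,3\n", encoding="utf-8")
        skills = load_all_skills(d)
        print(skills)  # {'counting': [{'prompt': 'three red apples on a table', 'gt': '3'}]}
```

Using only the standard `csv` module keeps the sketch dependency-free; skills whose CSV stores several ground-truth fields would need extra columns in the returned records.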
The project is inspired by the great language benchmark HELM.
Please consider citing our paper if you find it useful.