facebookresearch/synthetic_data_utility_prompt_complexity

Evaluate text-to-image (T2I) models on diversity, quality, and consistency as functions of prompt complexity.

Environment dependencies

The base environment configuration is in the file envirionment.yml. For faiss search, use the environment in the file environment_faiss.yml. For evaluation, we adopt several external packages, listed in the Evaluation section.

Framework

Use the scripts in the pipeline folder to create datasets of different complexities and to search for clusters.

  1. Run vlm_captioning_multi_compleixity.py to caption the dataset at different lengths.
  2. Run gather_captioning_multi_complexity.py to create a metadata file for image-caption pairs of different lengths.
  3. Run get_siglip_embeddings_img.py and get_siglip_embeddings_text.py to get the SigLIP embeddings of the images and captions.
  4. Run faiss_search.py to find the most similar images for each caption.
  5. Run cluster_formation.py to get the clusters that satisfy the similarity threshold and the minimum cluster size, and to sample the captions needed for generation.
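
As a rough illustration of step 5, the sketch below filters search results by a similarity threshold and a minimum cluster size, then samples captions for generation. It is not the code in cluster_formation.py: the function names, the input layout (a dict from caption id to a list of (image id, similarity) pairs), and the default values are all assumptions made for this example.

```python
import random


def form_clusters(search_hits, sim_threshold=0.5, min_cluster_size=3):
    """Keep, for each caption, the images above the similarity threshold,
    and drop captions whose cluster is smaller than min_cluster_size.

    search_hits: dict mapping caption_id -> list of (image_id, similarity).
    This layout is hypothetical, not the output format of faiss_search.py.
    """
    clusters = {}
    for caption_id, hits in search_hits.items():
        members = [image_id for image_id, sim in hits if sim >= sim_threshold]
        if len(members) >= min_cluster_size:
            clusters[caption_id] = members
    return clusters


def sample_captions(clusters, num_captions, seed=0):
    """Sample caption ids (without replacement) from the surviving clusters."""
    rng = random.Random(seed)
    caption_ids = sorted(clusters)  # sort for a reproducible sampling order
    return rng.sample(caption_ids, min(num_captions, len(caption_ids)))


# Toy search results with made-up ids and similarities.
hits = {
    "cap_a": [("img1", 0.9), ("img2", 0.8), ("img3", 0.75), ("img4", 0.2)],
    "cap_b": [("img5", 0.95), ("img6", 0.3)],
}
clusters = form_clusters(hits, sim_threshold=0.7, min_cluster_size=3)
selected = sample_captions(clusters, num_captions=1)
print(clusters)
print(selected)
```

In this toy input, cap_b is dropped because only one of its images passes the threshold, so the sampling step can only pick cap_a.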

Evaluation

Use the scripts in the evaluation folder to compute the following metrics.

  1. Aesthetic score: Install aesthetic score. Run sbatch aesthetic_evaluator.sh under aesthetic_evaluation.
  2. DSG score: Clone EvalGIM. Configure the dataset following https://github.com/facebookresearch/EvalGIM/?tab=readme-ov-file#add-your-own-datasets. We provide the dataset class for CC12M with different complexities in cc12m_dataset_evalgim.py. Clone DSG and use gen_dsg.py to get the question graphs. Run sbatch run_sbatch_evaluation_conditional.sh under consistency_evaluation.
  3. Vendi score: Clone vendi score. Run python vendi_calculation.py under diversity_evaluation.
  4. Marginal metrics: Clone dgm-eval. Run sbatch run_evaluation_cc12m.sh under marginal_evaluation.
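
For readers unfamiliar with the Vendi score used in step 3, the sketch below computes it directly from its definition: the exponential of the Shannon entropy of the eigenvalues of K/n, where K is an n-by-n similarity matrix. It is neither vendi_calculation.py nor the API of the vendi score package. The function name, the cosine-similarity kernel, and the toy embeddings are assumptions for illustration, and the sketch assumes that NumPy is installed and that no embedding is the zero vector.

```python
import numpy as np


def vendi_score(embeddings):
    """Vendi score of a set of embeddings under a cosine-similarity kernel.

    embeddings: array-like of shape (n, d) with non-zero rows (assumption).
    """
    x = np.asarray(embeddings, dtype=np.float64)
    # Normalize rows so that K[i, j] is the cosine similarity and K[i, i] = 1.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    k = x @ x.T
    n = k.shape[0]
    # K / n is symmetric positive semi-definite with eigenvalues summing to 1.
    eigvals = np.linalg.eigvalsh(k / n)
    # Drop numerically zero eigenvalues (0 * log 0 is treated as 0).
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


# Three identical samples behave like a single effective sample.
identical = [[1.0, 0.0, 0.0]] * 3
# Three mutually orthogonal samples behave like three effective samples.
orthogonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

score_identical = vendi_score(identical)
score_orthogonal = vendi_score(orthogonal)
print(round(score_identical, 6), round(score_orthogonal, 6))
```

The two toy inputs mark the extremes: the score is about 1 when all samples coincide and about n when the n samples are mutually dissimilar, which is why it can be read as an effective number of distinct samples.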

Dataset

  1. Download the CC12M images following https://github.com/google-research-datasets/conceptual-12m. Put the images under metadata/cc12m/images/.
  2. Run the scripts in the Framework section to get the clusters for each complexity. Then create eval_imgs.zip, containing all the remaining real images, under metadata/cc12m/. In our scripts, we sharded eval_imgs into 9 zip files.
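
One way to shard the evaluation images into 9 zip files is sketched below. The function name, the shard naming pattern (eval_imgs_0.zip and so on), and the round-robin assignment are assumptions for illustration; the repository's scripts may name and split the shards differently.

```python
import zipfile
from pathlib import Path


def shard_into_zips(image_paths, out_dir, num_shards=9, stem="eval_imgs"):
    """Distribute image files round-robin over num_shards zip archives.

    The naming scheme <stem>_<index>.zip is hypothetical. Files are stored
    under their base names, so base names are assumed to be unique.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = sorted(Path(p) for p in image_paths)  # deterministic order
    shard_files = []
    for shard_idx in range(num_shards):
        shard_path = out_dir / f"{stem}_{shard_idx}.zip"
        # Images are already compressed, so store them without recompression.
        with zipfile.ZipFile(shard_path, "w", compression=zipfile.ZIP_STORED) as zf:
            for path in paths[shard_idx::num_shards]:
                zf.write(path, arcname=path.name)
        shard_files.append(shard_path)
    return shard_files


# Example call (paths are placeholders, adjust to your layout):
# images = Path("metadata/cc12m/images").glob("*.jpg")
# shard_into_zips(images, "metadata/cc12m", num_shards=9)
```

Round-robin assignment keeps the shards within one file of each other in size, at the cost of not grouping images by cluster or complexity.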

License Info

This benchmark is licensed under CC-BY-NC 4.0.

About

Code base for the paper "The Intricate Dance of Prompt Complexity, Quality, Diversity, and Consistency in T2I Models" (https://arxiv.org/abs/2510.19557).
