The base environment configuration is in `environment.yml` (e.g., create it with `conda env create -f environment.yml`).
For FAISS search, use the environment in `environment_faiss.yml`.
For evaluation, we adopt several packages:
- For `aesthetic_evaluation`, check https://github.com/discus0434/aesthetic-predictor-v2-5
- For `diversity_evaluation`, check https://github.com/vertaix/Vendi-Score
- For `marginal_evaluation`, check https://github.com/layer6ai-labs/dgm-eval/
- For `consistency_evaluation`, check https://github.com/facebookresearch/EvalGIM/
Use the scripts in the `pipeline` folder to create datasets of different complexities and to search for clusters.
- Run `vlm_captioning_multi_compleixity.py` to caption the dataset at different lengths.
- Run `gather_captioning_multi_complexity.py` to create a metadata file for the image-caption pairs of different lengths.
- Run `get_siglip_embeddings_img.py` and `get_siglip_embeddings_text.py` to compute the SigLIP embeddings.
- Run `faiss_search.py` to find the most similar images for each caption.
- Run `cluster_formation.py` to keep the clusters that satisfy the similarity threshold and minimum cluster size, and to sample the captions needed for generation (see the sketch after this list).
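As a rough illustration of the last two steps, the sketch below runs an exact FAISS inner-product search over L2-normalized SigLIP embeddings and keeps only the sufficiently large, sufficiently similar clusters. The file names, `k`, and both thresholds are hypothetical placeholders, not the values used by `faiss_search.py` or `cluster_formation.py`.

```python
# Minimal sketch of the retrieval-and-clustering step (not the exact
# faiss_search.py / cluster_formation.py code). Paths and thresholds
# below are illustrative placeholders.
import faiss
import numpy as np

# Hypothetical inputs: SigLIP embeddings saved by the
# get_siglip_embeddings_* scripts, shape (n, d), dtype float32.
img_emb = np.load("siglip_img_embeddings.npy").astype("float32")
txt_emb = np.load("siglip_text_embeddings.npy").astype("float32")
faiss.normalize_L2(img_emb)
faiss.normalize_L2(txt_emb)

# Exact inner-product index; on normalized vectors this is cosine similarity.
index = faiss.IndexFlatIP(img_emb.shape[1])
index.add(img_emb)

# For each caption, retrieve the k most similar images.
k = 50
sims, ids = index.search(txt_emb, k)  # both have shape (n_captions, k)

# Keep, per caption, only the neighbors above a similarity threshold, and
# keep the caption only if enough images survive (minimum cluster size).
SIM_THRESHOLD = 0.25   # placeholder value
MIN_CLUSTER_SIZE = 10  # placeholder value
clusters = {}
for cap_id, (row_sims, row_ids) in enumerate(zip(sims, ids)):
    members = row_ids[row_sims >= SIM_THRESHOLD]
    if len(members) >= MIN_CLUSTER_SIZE:
        clusters[cap_id] = members.tolist()
print(f"kept {len(clusters)} clusters")
```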
Use the scripts in the `evaluation` folder to compute the different metrics.
- Aesthetic score: install the aesthetic predictor, then run `sbatch aesthetic_evaluator.sh` under `aesthetic_evaluation`.
- DSG score: clone EvalGIM and configure the dataset following https://github.com/facebookresearch/EvalGIM/?tab=readme-ov-file#add-your-own-datasets. We provide the dataset class for CC12M with different complexities in `cc12m_dataset_evalgim.py`. Clone DSG and use `gen_dsg.py` to get the question graphs, then run `sbatch run_sbatch_evaluation_conditional.sh` under `consistency_evaluation`.
- Vendi score: clone Vendi-Score, then run `python vendi_calculation.py` under `diversity_evaluation` (a definition-based sketch follows this list).
- Marginal metrics: clone dgm-eval, then run `sbatch run_evaluation_cc12m.sh` under `marginal_evaluation`.
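For reference, the Vendi Score of a set of samples is the exponential of the Shannon entropy of the eigenvalues of the scaled similarity matrix K/n. The sketch below computes it from that definition, assuming a cosine-similarity kernel over per-image features; `vendi_calculation.py` relies on the authors' package rather than this code.

```python
# Definition-based sketch of the Vendi Score: exp of the Shannon entropy of
# the eigenvalues of K/n, where K is a similarity kernel with K_ii = 1.
import numpy as np

def vendi_score(embeddings: np.ndarray) -> float:
    """embeddings: (n, d) array; rows are per-image features (e.g., SigLIP)."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    k = x @ x.T                           # cosine-similarity kernel, K_ii = 1
    lam = np.linalg.eigvalsh(k / len(x))  # eigenvalues of K/n sum to 1
    lam = lam[lam > 1e-12]                # drop numerical zeros before the log
    return float(np.exp(-np.sum(lam * np.log(lam))))

# Example: 100 random 64-d features; more diverse samples give a higher score.
print(vendi_score(np.random.randn(100, 64)))
```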
- Download the CC12M images following https://github.com/google-research-datasets/conceptual-12m and put them under `metadata/cc12m/images/`.
- Run the scripts following the Framework section to obtain the clusters for each complexity. Then create `eval_imgs.zip`, containing all the remaining real images, under `metadata/cc12m/`. In our scripts, we shard the evaluation images into 9 zip files (see the sketch below).
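A minimal sketch of that sharding step, using only the standard library; the shard naming scheme and the `*.jpg` glob pattern are assumptions, not necessarily the exact layout our scripts produce.

```python
# Illustrative sharding of the remaining real images into 9 zip files under
# metadata/cc12m/ (shard names and file pattern are assumptions).
import zipfile
from pathlib import Path

image_dir = Path("metadata/cc12m/images")
out_dir = Path("metadata/cc12m")
num_shards = 9

images = sorted(image_dir.glob("*.jpg"))  # adjust the pattern to your files
for shard in range(num_shards):
    with zipfile.ZipFile(out_dir / f"eval_imgs_{shard}.zip", "w") as zf:
        # Round-robin assignment keeps the shards roughly equal in size.
        for img in images[shard::num_shards]:
            zf.write(img, arcname=img.name)
```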
This benchmark is licensed under CC-BY-NC 4.0.