The base environment configuration is in `environment.yml` (e.g., create it with `conda env create -f environment.yml`).
For FAISS search, use the environment in `environment_faiss.yml`.
For evaluation, we adopt several packages:
- For `aesthetic_evaluation`, check https://github.com/discus0434/aesthetic-predictor-v2-5
- For `diversity_evaluation`, check https://github.com/vertaix/Vendi-Score
- For `marginal_evaluation`, check https://github.com/layer6ai-labs/dgm-eval/
- For `consistency_evaluation`, check https://github.com/facebookresearch/EvalGIM/
Use the scripts in the `pipeline` folder to create datasets of different complexities and to search for clusters.
- Run `vlm_captioning_multi_compleixity.py` to caption the dataset at different lengths.
- Run `gather_captioning_multi_complexity.py` to create a metadata file for the image-caption pairs of different lengths.
- Run `get_siglip_embeddings_img.py` and `get_siglip_embeddings_text.py` to compute the SigLIP embeddings.
- Run `faiss_search.py` to find the most similar images for each caption.
- Run `cluster_formation.py` to keep the clusters that satisfy the similarity threshold and minimum cluster size, and to sample the captions needed for generation (see the sketch after this list).
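As a rough illustration of the last two steps, the sketch below runs an exact FAISS inner-product search over L2-normalized SigLIP embeddings and keeps only the sufficiently large, sufficiently similar clusters. The file names, `k`, and both thresholds are hypothetical placeholders, not the values used by `faiss_search.py` or `cluster_formation.py`.

```python
# Minimal sketch of the retrieval-and-clustering step (not the exact
# faiss_search.py / cluster_formation.py code). Paths and thresholds
# below are illustrative placeholders.
import faiss
import numpy as np

# Hypothetical inputs: SigLIP embeddings saved by the
# get_siglip_embeddings_* scripts, shape (n, d), dtype float32.
img_emb = np.load("siglip_img_embeddings.npy").astype("float32")
txt_emb = np.load("siglip_text_embeddings.npy").astype("float32")
faiss.normalize_L2(img_emb)
faiss.normalize_L2(txt_emb)

# Exact inner-product index; on normalized vectors this is cosine similarity.
index = faiss.IndexFlatIP(img_emb.shape[1])
index.add(img_emb)

# For each caption, retrieve the k most similar images.
k = 50
sims, ids = index.search(txt_emb, k)  # both have shape (n_captions, k)

# Keep, per caption, only the neighbors above a similarity threshold, and
# keep the caption only if enough images survive (minimum cluster size).
SIM_THRESHOLD = 0.25   # placeholder value
MIN_CLUSTER_SIZE = 10  # placeholder value
clusters = {}
for cap_id, (row_sims, row_ids) in enumerate(zip(sims, ids)):
    members = row_ids[row_sims >= SIM_THRESHOLD]
    if len(members) >= MIN_CLUSTER_SIZE:
        clusters[cap_id] = members.tolist()
print(f"kept {len(clusters)} clusters")
```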
Use the scripts in the `evaluation` folder to compute the different metrics.
- Aesthetic score: install the aesthetic predictor, then run `sbatch aesthetic_evaluator.sh` under `aesthetic_evaluation`.
- DSG score: clone EvalGIM and configure the dataset following https://github.com/facebookresearch/EvalGIM/?tab=readme-ov-file#add-your-own-datasets. We provide the dataset class for CC12M with different complexities in `cc12m_dataset_evalgim.py`. Clone DSG and use `gen_dsg.py` to get the question graphs, then run `sbatch run_sbatch_evaluation_conditional.sh` under `consistency_evaluation`.
- Vendi score: clone Vendi-Score, then run `python vendi_calculation.py` under `diversity_evaluation` (a definition-based sketch follows this list).
- Marginal metrics: clone dgm-eval, then run `sbatch run_evaluation_cc12m.sh` under `marginal_evaluation`.
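For reference, the Vendi Score of a set of samples is the exponential of the Shannon entropy of the eigenvalues of the scaled similarity matrix K/n. The sketch below computes it from that definition, assuming a cosine-similarity kernel over per-image features; `vendi_calculation.py` relies on the authors' package rather than this code.

```python
# Definition-based sketch of the Vendi Score: exp of the Shannon entropy of
# the eigenvalues of K/n, where K is a similarity kernel with K_ii = 1.
import numpy as np

def vendi_score(embeddings: np.ndarray) -> float:
    """embeddings: (n, d) array; rows are per-image features (e.g., SigLIP)."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    k = x @ x.T                           # cosine-similarity kernel, K_ii = 1
    lam = np.linalg.eigvalsh(k / len(x))  # eigenvalues of K/n sum to 1
    lam = lam[lam > 1e-12]                # drop numerical zeros before the log
    return float(np.exp(-np.sum(lam * np.log(lam))))

# Example: 100 random 64-d features; more diverse samples give a higher score.
print(vendi_score(np.random.randn(100, 64)))
```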
- Download the CC12M images following https://github.com/google-research-datasets/conceptual-12m and put them under `metadata/cc12m/images/`.
- Run the scripts following the Framework section to obtain the clusters for each complexity. Then create `eval_imgs.zip`, containing all the remaining real images, under `metadata/cc12m/`. In our scripts, we shard the evaluation images into 9 zip files (see the sketch below).
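A minimal sketch of that sharding step, using only the standard library; the shard naming scheme and the `*.jpg` glob pattern are assumptions, not necessarily the exact layout our scripts produce.

```python
# Illustrative sharding of the remaining real images into 9 zip files under
# metadata/cc12m/ (shard names and file pattern are assumptions).
import zipfile
from pathlib import Path

image_dir = Path("metadata/cc12m/images")
out_dir = Path("metadata/cc12m")
num_shards = 9

images = sorted(image_dir.glob("*.jpg"))  # adjust the pattern to your files
for shard in range(num_shards):
    with zipfile.ZipFile(out_dir / f"eval_imgs_{shard}.zip", "w") as zf:
        # Round-robin assignment keeps the shards roughly equal in size.
        for img in images[shard::num_shards]:
            zf.write(img, arcname=img.name)
```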
This benchmark is licensed under CC-BY-NC 4.0.