We rely on human evaluation using AMT. Please refer to the paper for more details.
First, download the UniDet's weights and configurations.
Then, run the inference code to generate the bounding boxes and save them as follows:
python demo.py --config-file configs/Partitioned_COI_RS101_2x.yaml \
--input "../../../../data/t2i_out/struct_diff/synthetic_counting/*" --pkl_pth "../../counting/struct_diff_pred_synthetic_counting.pkl" \
--output "../../../../data/metrics/det/unified_det" --opts MODEL.WEIGHTS "../../../../weights/unified_det/Partitioned_COI_RS101_2x.pth"
Finally, run calc_counting_acc.py to calculate the counting accuracy, as follows:
python calc_counting_acc.py [Input pkl path] [GT-csv] [Number of Iteration]
Where:
1) Input pkl path: the output file from running UniDet as shown in the steps above.
2) GT-csv: the csv file, which can be generated by running the prompt generation code or downloaded directly from our published prompts.
3) Number of Iteration: an integer specifying the number of runs; we then take the average of their scores (should be between 1 and 3).
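The counting check can be sketched as follows, assuming the pkl maps each image to a list of (class_name, confidence) detections and the GT-csv gives the expected object class and count per prompt. All names and the threshold below are illustrative, not the actual calc_counting_acc.py internals:

```python
def count_detections(detections, target_class, score_thresh=0.5):
    """Count confident detections of the target class in one image."""
    return sum(1 for cls, score in detections
               if cls == target_class and score >= score_thresh)

def counting_accuracy(pred_by_image, gt_by_image, score_thresh=0.5):
    """Fraction of images whose detected count matches the GT count."""
    correct = 0
    for image_id, (target_class, gt_count) in gt_by_image.items():
        pred_count = count_detections(pred_by_image.get(image_id, []),
                                      target_class, score_thresh)
        correct += int(pred_count == gt_count)
    return correct / max(len(gt_by_image), 1)
```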
We adopt TextSnake for text detection and SAR for text recognition from the MMOCR framework.
First, run the inference code to detect and recognize the text in the scene and save them as follows:
python inference.py [Input Images Directory] [Output Images Directory] [Recognition Threshold] [Saving pkl Name]
For instance:
python inference.py "../../data/t2i_out/dalle_v2/writing/" "../../data/metrics/writing/" 60 '../mmocr_pred_dalle_v2_writing.pkl'
Then, run calc_writing_acc.py to calculate the visual-text accuracy, as follows:
python calc_writing_acc.py [GT-csv] [Input pkl path] [Number of Iteration]
For instance:
python calc_writing_acc.py '../../prompt_gen/writing_prompts/synthetic_writing_prompts.csv' 'mmocr_pred_dalle_v2_writing.pkl' 3
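The visual-text check can be sketched like this, assuming the MMOCR pkl maps each image to its list of recognized strings. The matching rule here (case-insensitive containment after stripping punctuation) is an assumption for illustration, not necessarily what calc_writing_acc.py implements:

```python
import re

def normalize(text):
    """Lowercase and drop non-alphanumeric characters for comparison."""
    return re.sub(r'[^a-z0-9 ]', '', text.lower()).strip()

def writing_accuracy(pred_by_image, gt_by_image):
    """Fraction of images whose recognized text contains the GT phrase."""
    correct = 0
    for image_id, gt_phrase in gt_by_image.items():
        recognized = ' '.join(normalize(t)
                              for t in pred_by_image.get(image_id, []))
        correct += int(normalize(gt_phrase) in recognized)
    return correct / max(len(gt_by_image), 1)
```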
AC_T2I --> We will publish the eval script soon, please stay tuned.
CLIPScore --> We adopt CLIPScore, with some adjustments to make it more efficient by evaluating all samples at once. We have added a file called clipscore_group.py for this purpose. All dependencies are the same as the official CLIPScore codebase.
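The batched idea behind clipscore_group.py can be sketched as below, given precomputed image and text embeddings as NumPy arrays. The 2.5 scaling and the clamp at zero follow the CLIPScore definition; the vectorization is illustrative, not the exact code:

```python
import numpy as np

def batched_clipscore(image_embs, text_embs, w=2.5):
    """Per-sample CLIPScore w * max(cos(image, text), 0), for all pairs at once."""
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    cos = np.sum(image_embs * text_embs, axis=1)  # row-wise dot product
    return w * np.clip(cos, 0.0, None)
```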
BLEU and CIDEr --> We use the pycocoevalcap for calculating BLEU and CIDEr scores.
Data --> All data used for evaluation is provided in the benchmark_data directory. The following files are available:
- [skill]_[model_name].json: this file saves all generated captions using the specified model.
- [skill]_[model_name]_cider.json: this file saves all CIDEr scores for each sample.
- [skill]_[model_name]_bleu.json: this file saves all BLEU scores for each sample.
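Aggregating the per-sample files can be done as below, assuming each [skill]_[model_name]_cider.json (or _bleu.json) maps a sample id to its score; the exact schema may differ, so treat this as a sketch:

```python
import json

def mean_score(json_path):
    """Average the per-sample scores stored in one benchmark json file."""
    with open(json_path) as f:
        scores = json.load(f)
    values = list(scores.values())
    return sum(values) / max(len(values), 1)
```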
We will publish the eval script soon, please stay tuned.
We will publish the eval script soon, please stay tuned.
We adopt the LAION fast-retrieval tool to retrieve training data (nearest neighbours) from LAION using the text prompts for the creativity skill.
python retrieve.py --file_path path/to/prompt --save_path path/to/prompt_with_img
Then save the CLIP embedding of each training image:
import clip
import torch
from PIL import Image

model, preprocess = clip.load('ViT-B/32', device=device)  # device assumed defined
image = Image.open(image_path).convert('RGB')
inputs = preprocess(image).unsqueeze(0).to(device)  # preprocess takes a PIL image
with torch.no_grad():
    gen_image_embd = model.encode_image(inputs).half()
Then run:
python eval_creativity.py --name modelname --file_path path/to/prompt --img_emb_path path/to/embedding
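The underlying idea can be sketched as follows: compare each generated image's CLIP embedding against its retrieved LAION nearest-neighbour embeddings, where a lower maximum similarity suggests the output is further from the training data. The scoring rule here is illustrative, not the exact eval_creativity.py logic:

```python
import numpy as np

def novelty_score(gen_emb, neighbour_embs):
    """1 - max cosine similarity to the retrieved training-image embeddings."""
    gen = gen_emb / np.linalg.norm(gen_emb)
    nn = neighbour_embs / np.linalg.norm(neighbour_embs, axis=1, keepdims=True)
    return 1.0 - float(np.max(nn @ gen))
```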
Prepare the environment with pycocoevalcap and transformers (BLIP-2):
conda env create -f environment.yaml
Then run:
python eval_action.py --name modelname --file_path path/to/prompt
First, download the UniDet's weights and configurations.
Then, run the inference code to generate the bounding boxes and save them as follows:
python demo.py --config-file configs/Partitioned_COI_RS101_2x.yaml \
--input "../../../../data/t2i_out/struct_diff/synthetic_counting/*" --pkl_pth "../../counting/struct_diff_pred_synthetic_counting.pkl" \
--output "../../../../data/metrics/det/unified_det" --opts MODEL.WEIGHTS "../../../../weights/unified_det/Partitioned_COI_RS101_2x.pth"
Finally, run calc_spatial_relation_acc.py to calculate the spatial composition accuracy, as follows:
python calc_spatial_relation_acc.py [Input pkl path] [GT-csv] [Number of Iteration]
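A spatial-relation check from the UniDet boxes can be sketched as below, assuming boxes are (x1, y1, x2, y2) in image coordinates with the origin at the top-left. The relation names and the centre-based rule are illustrative assumptions, not the script's exact logic:

```python
def box_center(box):
    """Centre point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def relation_holds(box_a, box_b, relation):
    """Check whether object A is left of / right of / above / below object B."""
    (ax, ay), (bx, by) = box_center(box_a), box_center(box_b)
    return {'left of': ax < bx, 'right of': ax > bx,
            'above': ay < by, 'below': ay > by}[relation]
```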
First, download the UniDet's weights and configurations.
Then, run the inference code to generate the bounding boxes and save them as follows:
python demo.py --config-file configs/Partitioned_COI_RS101_2x.yaml \
--input "../../../../data/t2i_out/struct_diff/synthetic_counting/*" --pkl_pth "../../counting/struct_diff_pred_synthetic_counting.pkl" \
--output "../../../../data/metrics/det/unified_det" --opts MODEL.WEIGHTS "../../../../weights/unified_det/Partitioned_COI_RS101_2x.pth"
Finally, run calc_size_comp_acc.py to calculate the size composition accuracy, as follows:
python calc_size_comp_acc.py [Input pkl path] [GT-csv] [Number of Iteration]
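A size-composition check can be sketched as below, assuming each object's (x1, y1, x2, y2) box comes from the UniDet pkl and relative size is judged by box area. The margin and comparator names are illustrative assumptions:

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return max(x2 - x1, 0) * max(y2 - y1, 0)

def size_relation_holds(box_a, box_b, relation, margin=1.1):
    """Check 'bigger than' / 'smaller than' with a small area margin."""
    area_a, area_b = box_area(box_a), box_area(box_b)
    if relation == 'bigger than':
        return area_a > margin * area_b
    if relation == 'smaller than':
        return margin * area_a < area_b
    raise ValueError(relation)
```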
We adopt MaskDINO [CVPR 2023] for instance segmentation.
First, run the inference code to predict the masks for each instance and save them as follows:
python demo.py [Config File] [Input Images Directory] [Output Images Directory] [Model Weights]
For instance:
python demo.py --config-file 'T2I_benchmark/codes/eval_metrics/colors/MaskDINO/configs/coco/instance-segmentation/swin/maskdino_R50_bs16_50ep_4s_dowsample1_2048.yaml' \
--input 't2i_benchmark/data/t2i_out/sd_v1/colors/*.png' \
--output T2I_benchmark/data/colors/output/sd_v1/ \
--opts MODEL.WEIGHTS T2I_benchmark/weights/mask_dino/maskdino_swinl_50ep_300q_hid2048_3sd1_instance_maskenhanced_mask52.3ap_box59.pth
Then, run hue_based_color_classifier.py to calculate the color composition accuracy, as follows:
python hue_based_color_classifier.py [Generated Masks Directory] [GT-csv] [T2I Model Output Directory]
For instance:
python hue_based_color_classifier.py 'T2I_benchmark/data/colors/output/sd_v1' 'T2I_benchmark/data/colors/colors_composition_prompts.csv' 't2i_benchmark/data/t2i_out/sd_v1/colors'
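The hue-based idea can be sketched as below: average the masked pixels' colour and map its hue to a colour name. The hue bins and saturation threshold are rough illustrative ranges, not the exact thresholds used by hue_based_color_classifier.py:

```python
import colorsys

def classify_hue(r, g, b):
    """Map a mean RGB value (0-255 per channel) to a coarse colour name."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue_deg = h * 360.0
    if s < 0.15:  # low saturation: treat as grey-scale
        return 'white' if v > 0.8 else 'black' if v < 0.2 else 'gray'
    if hue_deg < 20 or hue_deg >= 330:
        return 'red'
    if hue_deg < 45:
        return 'orange'
    if hue_deg < 70:
        return 'yellow'
    if hue_deg < 170:
        return 'green'
    if hue_deg < 260:
        return 'blue'
    return 'purple'
```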
We will publish the eval script soon, please stay tuned.
We will publish the eval script soon, please stay tuned.