We rely on human evaluation using AMT. Please refer to the paper for more details.
First, download the UniDet's weights and configurations.
Then, run the inference code to generate the bounding boxes and save them as follows:
python demo.py --config-file configs/Partitioned_COI_RS101_2x.yaml \
--input "../../../../data/t2i_out/struct_diff/synthetic_counting/*" --pkl_pth "../../counting/struct_diff_pred_synthetic_counting.pkl" \
--output "../../../../data/metrics/det/unified_det" --opts MODEL.WEIGHTS "../../../../weights/unified_det/Partitioned_COI_RS101_2x.pth"
Finally, run calc_counting_acc.py to calculate the counting accuracy, as follows:
python calc_counting_acc.py [Input pkl path] [GT-csv] [Number of Iteration]
Where:
1) Input pkl path: the output file from running UniDet as shown in the steps above.
2) GT-csv: the csv file, which can be generated by running the prompt generation code or downloaded directly from our published prompts.
3) Number of Iteration: an integer specifying the number of runs; we then take the average of their scores (should be between 1 and 3).
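The counting check can be sketched as follows, assuming the pkl maps each image to a list of (class_name, confidence) detections and the GT-csv gives the expected object class and count per prompt. All names and the threshold below are illustrative, not the actual calc_counting_acc.py internals:

```python
def count_detections(detections, target_class, score_thresh=0.5):
    """Count confident detections of the target class in one image."""
    return sum(1 for cls, score in detections
               if cls == target_class and score >= score_thresh)

def counting_accuracy(pred_by_image, gt_by_image, score_thresh=0.5):
    """Fraction of images whose detected count matches the GT count."""
    correct = 0
    for image_id, (target_class, gt_count) in gt_by_image.items():
        pred_count = count_detections(pred_by_image.get(image_id, []),
                                      target_class, score_thresh)
        correct += int(pred_count == gt_count)
    return correct / max(len(gt_by_image), 1)
```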
We adopt TextSnake for text detection and SAR for text recognition from the MMOCR framework.
First, run the inference code to detect and recognize the text in the scene and save them as follows:
python inference.py [Input Images Directory] [Output Images Directory] [Recognition Threshold] [Saving pkl Name]
For instance:
python inference.py "../../data/t2i_out/dalle_v2/writing/" "../../data/metrics/writing/" 60 '../mmocr_pred_dalle_v2_writing.pkl'
Then, run calc_writing_acc.py to calculate the visual-text accuracy, as follows:
python calc_writing_acc.py [GT-csv] [Input pkl path] [Number of Iteration]
For instance:
python calc_writing_acc.py '../../prompt_gen/writing_prompts/synthetic_writing_prompts.csv' 'mmocr_pred_dalle_v2_writing.pkl' 3
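The visual-text check can be sketched like this, assuming the MMOCR pkl maps each image to its list of recognized strings. The matching rule here (case-insensitive containment after stripping punctuation) is an assumption for illustration, not necessarily what calc_writing_acc.py implements:

```python
import re

def normalize(text):
    """Lowercase and drop non-alphanumeric characters for comparison."""
    return re.sub(r'[^a-z0-9 ]', '', text.lower()).strip()

def writing_accuracy(pred_by_image, gt_by_image):
    """Fraction of images whose recognized text contains the GT phrase."""
    correct = 0
    for image_id, gt_phrase in gt_by_image.items():
        recognized = ' '.join(normalize(t)
                              for t in pred_by_image.get(image_id, []))
        correct += int(normalize(gt_phrase) in recognized)
    return correct / max(len(gt_by_image), 1)
```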
AC_T2I --> We will publish the eval script soon, please stay tuned.
CLIPScore --> We adopt CLIPScore, with some adjustments to make it more efficient by evaluating all samples at once. We have added a file called clipscore_group.py for this purpose. All dependencies are the same as the official CLIPScore codebase.
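The batched idea behind clipscore_group.py can be sketched as below, given precomputed image and text embeddings as NumPy arrays. The 2.5 scaling and the clamp at zero follow the CLIPScore definition; the vectorization is illustrative, not the exact code:

```python
import numpy as np

def batched_clipscore(image_embs, text_embs, w=2.5):
    """Per-sample CLIPScore w * max(cos(image, text), 0), for all pairs at once."""
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    cos = np.sum(image_embs * text_embs, axis=1)  # row-wise dot product
    return w * np.clip(cos, 0.0, None)
```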
BLEU and CIDEr --> We use the pycocoevalcap for calculating BLEU and CIDEr scores.
Data --> All data used for evaluation is provided in the benchmark_data directory. The following files are available:
- [skill]_[model_name].json: this file saves all generated captions using the specified model.
- [skill]_[model_name]_cider.json: this file saves all CIDEr scores for each sample.
- [skill]_[model_name]_bleu.json: this file saves all BLEU scores for each sample.
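Aggregating the per-sample files can be done as below, assuming each [skill]_[model_name]_cider.json (or _bleu.json) maps a sample id to its score; the exact schema may differ, so treat this as a sketch:

```python
import json

def mean_score(json_path):
    """Average the per-sample scores stored in one benchmark json file."""
    with open(json_path) as f:
        scores = json.load(f)
    values = list(scores.values())
    return sum(values) / max(len(values), 1)
```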
We will publish the eval script soon, please stay tuned.
We will publish the eval script soon, please stay tuned.
We adopt the LAION fast-retrieval tool to retrieve training data (nearest neighbours) from LAION using the text prompts for the creativity skill.
python retrieve.py --file_path path/to/prompt --save_path path/to/prompt_with_img
Then save the CLIP embedding of each training image:
import clip
import torch
from PIL import Image

model, preprocess = clip.load('ViT-B/32', device=device)  # device assumed defined
image = Image.open(image_path).convert('RGB')
inputs = preprocess(image).unsqueeze(0).to(device)  # preprocess takes a PIL image
with torch.no_grad():
    gen_image_embd = model.encode_image(inputs).half()
Then run:
python eval_creativity.py --name modelname --file_path path/to/prompt --img_emb_path path/to/embedding
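The underlying idea can be sketched as follows: compare each generated image's CLIP embedding against its retrieved LAION nearest-neighbour embeddings, where a lower maximum similarity suggests the output is further from the training data. The scoring rule here is illustrative, not the exact eval_creativity.py logic:

```python
import numpy as np

def novelty_score(gen_emb, neighbour_embs):
    """1 - max cosine similarity to the retrieved training-image embeddings."""
    gen = gen_emb / np.linalg.norm(gen_emb)
    nn = neighbour_embs / np.linalg.norm(neighbour_embs, axis=1, keepdims=True)
    return 1.0 - float(np.max(nn @ gen))
```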
Prepare the environment with pycocoevalcap and transformers (BLIP-2):
conda env create -f environment.yaml
Then run:
python eval_action.py --name modelname --file_path path/to/prompt
First, download the UniDet's weights and configurations.
Then, run the inference code to generate the bounding boxes and save them as follows:
python demo.py --config-file configs/Partitioned_COI_RS101_2x.yaml \
--input "../../../../data/t2i_out/struct_diff/synthetic_counting/*" --pkl_pth "../../counting/struct_diff_pred_synthetic_counting.pkl" \
--output "../../../../data/metrics/det/unified_det" --opts MODEL.WEIGHTS "../../../../weights/unified_det/Partitioned_COI_RS101_2x.pth"
Finally, run calc_spatial_relation_acc.py to calculate the spatial composition accuracy, as follows:
python calc_spatial_relation_acc.py [Input pkl path] [GT-csv] [Number of Iteration]
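A spatial-relation check from the UniDet boxes can be sketched as below, assuming boxes are (x1, y1, x2, y2) in image coordinates with the origin at the top-left. The relation names and the centre-based rule are illustrative assumptions, not the script's exact logic:

```python
def box_center(box):
    """Centre point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def relation_holds(box_a, box_b, relation):
    """Check whether object A is left of / right of / above / below object B."""
    (ax, ay), (bx, by) = box_center(box_a), box_center(box_b)
    return {'left of': ax < bx, 'right of': ax > bx,
            'above': ay < by, 'below': ay > by}[relation]
```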
First, download the UniDet's weights and configurations.
Then, run the inference code to generate the bounding boxes and save them as follows:
python demo.py --config-file configs/Partitioned_COI_RS101_2x.yaml \
--input "../../../../data/t2i_out/struct_diff/synthetic_counting/*" --pkl_pth "../../counting/struct_diff_pred_synthetic_counting.pkl" \
--output "../../../../data/metrics/det/unified_det" --opts MODEL.WEIGHTS "../../../../weights/unified_det/Partitioned_COI_RS101_2x.pth"
Finally, run calc_size_comp_acc.py to calculate the size composition accuracy, as follows:
python calc_size_comp_acc.py [Input pkl path] [GT-csv] [Number of Iteration]
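A size-composition check can be sketched as below, assuming each object's (x1, y1, x2, y2) box comes from the UniDet pkl and relative size is judged by box area. The margin and comparator names are illustrative assumptions:

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return max(x2 - x1, 0) * max(y2 - y1, 0)

def size_relation_holds(box_a, box_b, relation, margin=1.1):
    """Check 'bigger than' / 'smaller than' with a small area margin."""
    area_a, area_b = box_area(box_a), box_area(box_b)
    if relation == 'bigger than':
        return area_a > margin * area_b
    if relation == 'smaller than':
        return margin * area_a < area_b
    raise ValueError(relation)
```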
We adopt MaskDINO [CVPR 2023] for instance segmentation.
First, run the inference code to predict the masks for each instance and save them as follows:
python demo.py [Config File] [Input Images Directory] [Output Images Directory] [Model Weights]
For instance:
python demo.py --config-file 'T2I_benchmark/codes/eval_metrics/colors/MaskDINO/configs/coco/instance-segmentation/swin/maskdino_R50_bs16_50ep_4s_dowsample1_2048.yaml' \
--input 't2i_benchmark/data/t2i_out/sd_v1/colors/*.png' \
--output T2I_benchmark/data/colors/output/sd_v1/ \
--opts MODEL.WEIGHTS T2I_benchmark/weights/mask_dino/maskdino_swinl_50ep_300q_hid2048_3sd1_instance_maskenhanced_mask52.3ap_box59.pth
Then, run hue_based_color_classifier.py to calculate the color composition accuracy, as follows:
python hue_based_color_classifier.py [Generated Masks Directory] [GT-csv] [T2I Model Output Directory]
For instance:
python hue_based_color_classifier.py 'T2I_benchmark/data/colors/output/sd_v1' 'T2I_benchmark/data/colors/colors_composition_prompts.csv' 't2i_benchmark/data/t2i_out/sd_v1/colors'
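The hue-based idea can be sketched as below: average the masked pixels' colour and map its hue to a colour name. The hue bins and saturation threshold are rough illustrative ranges, not the exact thresholds used by hue_based_color_classifier.py:

```python
import colorsys

def classify_hue(r, g, b):
    """Map a mean RGB value (0-255 per channel) to a coarse colour name."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue_deg = h * 360.0
    if s < 0.15:  # low saturation: treat as grey-scale
        return 'white' if v > 0.8 else 'black' if v < 0.2 else 'gray'
    if hue_deg < 20 or hue_deg >= 330:
        return 'red'
    if hue_deg < 45:
        return 'orange'
    if hue_deg < 70:
        return 'yellow'
    if hue_deg < 170:
        return 'green'
    if hue_deg < 260:
        return 'blue'
    return 'purple'
```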
We will publish the eval script soon, please stay tuned.
We will publish the eval script soon, please stay tuned.