This is the official implementation of "Cross-Image Context Matters for Bongard Problems." [Paper] [Project Page]
We introduce two approaches for incorporating cross-image context when solving Bongard problems: support-set standardization and support-set Transformers. We attain state-of-the-art performance on two Bongard datasets. We also attain strong performance on a third Bongard dataset, where direct comparison with prior methods is not possible because the evaluation setups differ.
- State-of-the-art on Bongard-HOI
- State-of-the-art on Bongard-LOGO
- Strong performance on Bongard-Classic
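To make the idea of support-set standardization concrete, here is a minimal sketch in plain Python. It assumes that standardization means subtracting the per-dimension mean and dividing by the per-dimension standard deviation, with both statistics computed only from the support-set features of a single problem. The function name, the `eps` constant, and the use of Python lists in place of encoder features are illustrative assumptions, not the code in this repository.

```python
import math


def standardize_with_support(support, query, eps=1e-6):
    """Standardize features using statistics of one problem's support set.

    support: list of feature vectors (positive and negative examples together).
    query:   a single feature vector from the same problem.
    eps:     small constant to avoid division by zero (an assumed value).
    """
    n = len(support)
    dim = len(support[0])
    # Per-dimension mean and (population) standard deviation over the support set.
    mean = [sum(v[d] for v in support) / n for d in range(dim)]
    std = [
        math.sqrt(sum((v[d] - mean[d]) ** 2 for v in support) / n)
        for d in range(dim)
    ]

    def apply(v):
        return [(v[d] - mean[d]) / (std[d] + eps) for d in range(dim)]

    # The query is transformed with the support statistics, not its own.
    return [apply(v) for v in support], apply(query)


# Toy features standing in for encoder outputs.
support_feats = [[1.0, 10.0], [3.0, 30.0]]
query_feat = [2.0, 40.0]
std_support, std_query = standardize_with_support(support_feats, query_feat)
```

Because the statistics come from the problem's own support set, each problem is rescaled relative to its own images, which is one way cross-image context can enter the pipeline.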
To install all needed packages, run the following command:
pip install -r requirements.txt
For information on dataset setup, please see datasets/README.md.
We include model checkpoints of our best-performing model, SVM-Mimic, on Bongard-HOI and Bongard-LOGO in this Google Drive folder. Please see the commands below to evaluate them. These models attain the following accuracy on Bongard-HOI and Bongard-LOGO test splits:
hoi_checkpoint.pt
unseen object, unseen act | unseen object, seen act | seen object, unseen act | seen object, seen act | Avg |
---|---|---|---|---|
69.70 | 71.04 | 78.43 | 71.30 | 72.62 |
logo_checkpoint.pt
freeform | combinatorial | basic shape | novel | Avg |
---|---|---|---|---|
73.33 | 69.75 | 83.85 | 74.53 | 75.37 |
All command-line arguments for both training and testing are documented in utils/args.py. We briefly mention the most important ones here:
- --dataset can be used to specify which dataset to train/evaluate on. We currently support one of hoi, logo, or classic. You can add more by modifying the get_loaders() function in datasets/__init__.py.
- --arch can be used to specify the encoder backbone. We currently support one of custom, clip, or dino_vit_base. You can add more by modifying the get_encoder() function in model/encoder.py.
- --train_encoder specifies whether to keep the encoder frozen or not. To train with the custom encoder or with the PMF objective as explained in the paper, --train_encoder must be set.
- --balance_dataset specifies whether to use our cleaned Bongard-HOI dataset annotations, as explained in the paper. It is only useful to set this argument at train time as we have no cleaned annotations on the test set.
- --baseline can be used to run one of the baselines in the paper (one of SVM, PROTOTYPE, or KNN). This is only used at test time.
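As a rough illustration of how the flags above fit together, the following sketch declares them with Python's standard argparse module. It is not the contents of utils/args.py: the positional name `data_path`, the defaults, and the help strings are assumptions made for this example, and only the flag names and their allowed values come from the list above.

```python
import argparse


def build_parser():
    """Illustrative subset of the flags; the real definitions live in utils/args.py."""
    parser = argparse.ArgumentParser(description="Sketch of the main flags")
    # Positional dataset path, mirroring `python3 main.py /path/to/hoi/dataset ...`.
    # The name `data_path` is hypothetical.
    parser.add_argument("data_path", help="path to the dataset root")
    # Defaults below are assumptions, not taken from the repository.
    parser.add_argument("--dataset", choices=["hoi", "logo", "classic"], default="hoi")
    parser.add_argument("--arch", choices=["custom", "clip", "dino_vit_base"], default="clip")
    parser.add_argument("--train_encoder", action="store_true",
                        help="train the encoder instead of keeping it frozen")
    parser.add_argument("--balance_dataset", action="store_true",
                        help="use the cleaned Bongard-HOI annotations (train time)")
    parser.add_argument("--baseline", choices=["SVM", "PROTOTYPE", "KNN"], default=None,
                        help="run a baseline (test time only)")
    return parser


# Parse a command line shaped like the test commands in this README.
args = build_parser().parse_args(
    ["/path/to/hoi/dataset", "--arch", "clip", "--baseline", "SVM"]
)
```

Consult utils/args.py for the authoritative list of flags, types, and defaults.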
We log various metrics (accuracy, loss, etc.) to Tensorboard, storing runs data in the runs/ directory. All code for this logic is in eval.py and main.py. Feel free to remove this logging or change the code to log using a different platform.
To train SVM-Mimic on Bongard-HOI with our hyperparameter choices, run the following command:
python3 main.py /path/to/hoi/dataset --batch_size 16 --lr 5e-5 --use_augs --arch clip --balance_dataset --train_steps 10000 --use_scheduler --dropout_support_prob 1.0 --label_noise_support_prob 0.25
To train Prototype-Mimic, run the following command:
python3 main.py /path/to/hoi/dataset --batch_size 16 --lr 5e-5 --use_augs --arch clip --balance_dataset --train_steps 10000 --use_scheduler --dropout_support_prob 1.0 --label_noise_support_prob 0.25 --mimic_kind PROTOTYPE
To test SVM-Mimic, run the following command:
python3 test.py /path/to/hoi/dataset --batch_size 1 --arch clip --load_path checkpoints/name_of_hoi_checkpoint
To test Prototype-Mimic, run the following command:
python3 test.py /path/to/hoi/dataset --batch_size 1 --arch clip --load_path checkpoints/name_of_hoi_checkpoint --mimic_kind PROTOTYPE
To test SVM, Prototype, and KNN baselines with and without standardization, run the following:
python3 test.py /path/to/hoi/dataset --batch_size 1 --arch clip --baseline SVM
python3 test.py /path/to/hoi/dataset --batch_size 1 --arch clip --baseline SVM --eval_standardize
python3 test.py /path/to/hoi/dataset --batch_size 1 --arch clip --baseline PROTOTYPE
python3 test.py /path/to/hoi/dataset --batch_size 1 --arch clip --baseline PROTOTYPE --eval_standardize
python3 test.py /path/to/hoi/dataset --batch_size 1 --arch clip --baseline KNN --k 3
python3 test.py /path/to/hoi/dataset --batch_size 1 --arch clip --baseline KNN --k 3 --eval_standardize
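For readers unfamiliar with the Prototype and KNN baselines, the sketch below shows one common way to implement them on the features of a single Bongard problem. It assumes Euclidean distance, class-mean prototypes, and a majority vote among the k nearest support examples. These choices, and all function names, are assumptions for illustration; the repository's implementation may differ. The SVM baseline is omitted because it would need an external solver.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_mean(vectors):
    """Per-dimension mean of a list of feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def prototype_predict(pos, neg, query):
    """Return 1 if the query is closer to the positive prototype, else 0."""
    d_pos = euclidean(query, class_mean(pos))
    d_neg = euclidean(query, class_mean(neg))
    return 1 if d_pos < d_neg else 0


def knn_predict(pos, neg, query, k=3):
    """Return the majority label among the k nearest support examples."""
    labelled = [(euclidean(query, v), 1) for v in pos]
    labelled += [(euclidean(query, v), 0) for v in neg]
    nearest = sorted(labelled)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


# Toy support features standing in for encoder outputs.
pos_feats = [[1.0, 1.0], [2.0, 2.0], [1.0, 2.0]]
neg_feats = [[8.0, 8.0], [9.0, 9.0], [8.0, 9.0]]
pred_proto = prototype_predict(pos_feats, neg_feats, [1.5, 1.5])
pred_knn = knn_predict(pos_feats, neg_feats, [8.5, 8.5], k=3)
```

Under these assumptions, running a baseline with standardization amounts to transforming the support and query features before calling functions like these.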
To test SVM, Prototype, and KNN baselines with different forms of normalization, run the following:
python3 test.py /path/to/hoi/dataset --batch_size 1 --arch clip --baseline SVM --eval_normalize_l2
python3 test.py /path/to/hoi/dataset --batch_size 1 --arch clip --baseline SVM --eval_standardize_train checkpoints/dataset_statistics.pt
python3 test.py /path/to/hoi/dataset --batch_size 1 --arch clip --baseline PROTOTYPE --eval_standardize_train checkpoints/dataset_statistics.pt
python3 test.py /path/to/hoi/dataset --batch_size 1 --arch clip --baseline KNN --k 3 --eval_standardize_train checkpoints/dataset_statistics.pt
To train PMF, run the following:
python3 main.py /path/to/hoi/dataset --batch_size 4 --lr 5e-7 --use_augs --arch clip --balance_dataset --train_steps 40000 --use_scheduler --dropout_support_prob 1.0 --use_pmf --train_encoder
To train PMF + SVM-Mimic, run the following (for Prototype-Mimic, append --mimic_kind PROTOTYPE):
python3 main.py /path/to/hoi/dataset --batch_size 16 --lr 5e-5 --use_augs --arch clip --balance_dataset --train_steps 10000 --use_scheduler --dropout_support_prob 1.0 --label_noise_support_prob 0.25 --load_path checkpoints/name_of_pmf_checkpoint
To test PMF + SVM-Mimic, note that it is necessary to load both the PMF encoder (with --load_encoder) and the SVM-Mimic Transformer model (with --load_path). You can use the following command:
python3 test.py /path/to/hoi/dataset --batch_size 1 --arch clip --load_encoder checkpoints/name_of_pmf_checkpoint --load_path checkpoints/name_of_hoi_svm_mimic_checkpoint
To test PMF backbones with various baselines, run the following (append --eval_standardize to standardize, and change SVM to the baseline of choice). Note that to obtain the PMF + standardize results, it is sufficient to run with baseline PROTOTYPE and set --eval_standardize.
python3 test.py /path/to/hoi/dataset --batch_size 1 --arch clip --load_path checkpoints/name_of_pmf_checkpoint --use_pmf --train_encoder --baseline SVM
Bongard-LOGO commands are similar. E.g., to train SVM-Mimic on Bongard-LOGO with our hyperparameter choices, run the following:
python3 main.py /path/to/logo/dataset --dataset logo --arch custom --train_encoder --batch_size 2 --train_steps 500000 --use_scheduler --dropout_support_prob 1.0 --lr 5e-5 --temperature 0.1 --resolution 512 --use_augs --weight_decay 0.0001
To test SVM-Mimic on Bongard-LOGO, run the following (note that --train_encoder is set even at test time):
python3 test.py /path/to/logo/dataset --dataset logo --arch custom --train_encoder --resolution 512 --temperature 0.1 --batch_size 1 --load_path checkpoints/name_of_logo_checkpoint
We have no training pipeline on Bongard-Classic, only evaluation pipelines involving encoders and support-set Transformers pre-trained on Bongard-LOGO. To evaluate SVM-Mimic (trained on Bongard-LOGO) on Bongard-Classic, run the following command:
python3 test.py /path/to/classic/dataset --dataset classic --arch custom --train_encoder --batch_size 1 --resolution 512 --temperature 0.1 --load_path checkpoints/name_of_logo_checkpoint
To evaluate with Prototype-Mimic or one of the baselines, append --mimic_kind PROTOTYPE or --baseline BASELINE_NAME to the command. See the Bongard-HOI commands for examples.
If you use our project to inform your work, please consider citing us. Thank you!
@article{raghuraman2023cross,
title={Cross-Image Context Matters for Bongard Problems},
author={Nikhil Raghuraman and Adam W. Harley and Leonidas Guibas},
year={2023},
journal={arXiv:1804.04452},
}