Published at ICPR 2026
This codebase builds upon MERU and HyCoCLIP.
Create and configure the environment using Conda:
```bash
git clone git@github.com:fdibiton/HAC.git
cd HAC
conda create -n hac python=3.9 --yes
conda activate hac
```

Install PyTorch and torchvision by following the official guide at pytorch.org.
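For example, on a Linux machine with CUDA 11.8 the install might look like the sketch below; the CUDA tag and index URL are assumptions, so prefer the exact command the pytorch.org selector generates for your platform:

```bash
# Sketch only: assumes a CUDA 11.8 build. Use the command generated at
# pytorch.org for your OS, package manager, and CUDA version.
python -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```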
Then install the remaining dependencies and the package:

```bash
python -m pip install --pre timm
python -m pip install -r requirements.txt
```

The HAC-B w/ LoRA checkpoint is hosted on Hugging Face. Download it and place the file in the `./checkpoints` directory.
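A minimal download sketch using `huggingface-cli` (shipped with the `huggingface_hub` package) is shown below; the repository id and filename are placeholders, not the actual Hub location:

```bash
# Hypothetical sketch: <hf-repo-id> and the filename are placeholders for
# the actual Hugging Face Hub location of the HAC-B w/ LoRA checkpoint.
mkdir -p checkpoints
huggingface-cli download <hf-repo-id> hac_vit_b_lora.pth --local-dir checkpoints
```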
To run zero-shot VQA evaluation with the HAC-B w/ LoRA model:
```bash
python scripts/evaluate.py \
    --config configs/eval_vqa_all_categories.py \
    --train-config configs/train_hac_vit_b_lora.py \
    --checkpoint-path checkpoints/hac_vit_b_lora.pth
```

Note: the VQA evaluation datasets need to be downloaded and arranged beforehand. Please refer to the instructions in `scripts/vqa/README.md` for details on how to obtain and set up the required datasets.
The GRIT dataset is required for training. Download the raw GRIT data in webdataset format and pre-process it to extract bounding box annotations. For detailed download and preparation steps, refer to the HyCoCLIP repository.
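One possible download route, sketched under assumptions: the GRIT annotations are commonly fetched from the `zzliang/GRIT` dataset on Hugging Face and rendered into webdataset shards with the `img2dataset` tool. The flags below are illustrative; follow the HyCoCLIP repository for the exact, tested procedure.

```bash
# Hedged sketch: assumes the GRIT annotation parquet files have already been
# downloaded into grit_annotations/. Flag values are illustrative defaults.
img2dataset --url_list grit_annotations/ --input_format parquet \
    --url_col url --caption_col caption \
    --output_format webdataset --output_folder grit_webdataset \
    --processes_count 8
```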
To train the HAC-B w/ LoRA model:
```bash
python scripts/train.py \
    --config configs/train_hac_vit_b_lora.py \
    --num-gpus 1 \
    --output-dir <your_output_directory> \
    --checkpoint-period 100000 \
    --log-period 10
```

Note: a Euclidean CLIP checkpoint is needed to initialize the model. You can download the ViT-B/16 checkpoint from the HyCoCLIP repository.
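As a hedged illustration, fetching and placing that initialization checkpoint might look like the following; the URL and filename are placeholders, and the actual link and expected path are given in the HyCoCLIP repository and in `configs/train_hac_vit_b_lora.py`:

```bash
# Hypothetical sketch: <hycoclip_clip_vit_b16_url> stands in for the actual
# link listed in the HyCoCLIP repository; the target filename is an
# assumption -- check the training config for the path it expects.
mkdir -p checkpoints
wget -O checkpoints/clip_vit_b16.pth "<hycoclip_clip_vit_b16_url>"
```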
If you find this work useful, please cite:
```bibtex
@inproceedings{dibiton2026hac,
  title={HAC: Parameter-Efficient Hyperbolic Adaptation of CLIP for Zero-Shot VQA},
  author={Dibitonto, Francesco and Beyan, Cigdem and Murino, Vittorio},
  booktitle={International Conference on Pattern Recognition (ICPR)},
  year={2026}
}
```