Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer (AAAI 2023 Oral)

This is the official repository of our paper Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer.

Setup

pip install -r requirements.txt

Preparation

Download the pretrained VLP model (ViT-B/16) from OpenAI CLIP.
Download the NUS-WIDE images from NUS-WIDE.
Download the annotations, following BiAM, from here.
Download the other required files from here.

The dataset directory should be organized as follows:

NUS-WIDE
  ├── features
  ├── Flickr
  ├── Concepts81.txt
  ├── Concepts925.txt
  ├── img_names.pkl
  ├── label_emb.pt
  └── test_img_names.pkl
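Before training, it can help to confirm that the dataset root matches the tree above. The sketch below checks for each listed entry using only the standard library. The helper `missing_entries` is hypothetical and not part of the repository. It checks only whether the entries exist, not whether their contents are valid.

```python
from pathlib import Path

# Entries taken from the expected NUS-WIDE directory tree above.
EXPECTED_DIRS = ["features", "Flickr"]
EXPECTED_FILES = [
    "Concepts81.txt",
    "Concepts925.txt",
    "img_names.pkl",
    "label_emb.pt",
    "test_img_names.pkl",
]


def missing_entries(data_path):
    """Return the expected directories and files that are absent under data_path.

    Hypothetical helper: checks existence only, not file contents.
    """
    root = Path(data_path)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

Calling `missing_entries("path_to_dataset")` returns an empty list when every listed entry is present. Otherwise it returns the names that still need to be downloaded.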

Training MKT on NUS-WIDE

python3 train_nus_first_stage.py \
        --data-path path_to_dataset \
        --clip-path path_to_clip_model

The checkpoint of the first training stage is here.

python3 -m torch.distributed.launch --nproc_per_node=8 train_nus_second_stage.py \
        --data-path path_to_dataset \
        --clip-path path_to_clip_model \
        --ckpt-path path_to_first_stage_ckpt

The checkpoint of the second training stage is here.

Testing MKT on NUS-WIDE

python3 train_nus_second_stage.py --eval \
        --data-path path_to_dataset \
        --clip-path path_to_clip_model \
        --ckpt-path path_to_first_stage_ckpt \
        --eval-ckpt path_to_second_stage_ckpt
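The README does not say which metrics the evaluation reports. One plausible assumption is that it follows the protocol used in earlier NUS-WIDE zero-shot work such as BiAM, which scores the top-K predicted labels for each image. Under that assumption, micro-averaged precision, recall and F1 at K can be sketched as shown below. The function `f1_at_k` and its input format are hypothetical. Check the repository's evaluation code for the exact metrics it computes.

```python
def f1_at_k(scores, labels, k):
    """Micro-averaged precision, recall and F1 over the top-k labels of each image.

    scores: list of per-image score lists (one score per label).
    labels: list of per-image 0/1 ground-truth lists, same shape as scores.
    Hypothetical helper illustrating a common NUS-WIDE metric, not the repo's code.
    """
    true_pos = predicted = actual = 0
    for s, y in zip(scores, labels):
        # Indices of the k highest-scoring labels for this image.
        top_k = sorted(range(len(s)), key=lambda j: s[j], reverse=True)[:k]
        true_pos += sum(y[j] for j in top_k)
        predicted += len(top_k)
        actual += sum(y)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, `f1_at_k([[0.9, 0.1, 0.8], [0.2, 0.7, 0.6]], [[1, 0, 0], [0, 1, 1]], 2)` returns precision 0.75, recall 1.0 and F1 of about 0.857. Three of the four top-2 predictions are correct, and all three positives are recovered.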

Inference on a Single Image

python3 inference.py \
        --data-path path_to_dataset \
        --clip-path path_to_clip_model \
        --img-ckpt path_to_first_stage_ckpt \
        --txt-ckpt path_to_second_stage_ckpt \
        --image-path figures/test.jpg
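The README documents inference.py only through the command above. The sketch below illustrates the general idea behind open-vocabulary classification with vision-language models such as CLIP. Each candidate label is scored by the cosine similarity between an image embedding and that label's text embedding in a shared space. As a result, any label name that can be embedded can be ranked, including names that were not seen during training. `rank_labels` is a hypothetical helper that works on precomputed embeddings. It is not the repository's implementation, and it assumes that no embedding is the zero vector.

```python
import math


def rank_labels(image_emb, label_embs, top_k=3):
    """Rank candidate labels by cosine similarity to an image embedding.

    image_emb: list of floats.
    label_embs: dict mapping label name -> list of floats (same dimension).
    Returns the top_k (label, similarity) pairs, highest similarity first.
    Hypothetical helper; assumes no embedding is the zero vector.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    img = normalize(image_emb)
    # Dot product of unit vectors = cosine similarity.
    sims = {
        name: sum(a * b for a, b in zip(img, normalize(emb)))
        for name, emb in label_embs.items()
    }
    return sorted(sims.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

For example, `rank_labels([1.0, 0.0], {"sky": [0.9, 0.1], "water": [0.0, 1.0], "clouds": [0.7, 0.7]}, top_k=2)` ranks `sky` first, followed by `clouds`. Because the labels are plain dictionary keys, a new label can be added at inference time as long as it has an embedding.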

Acknowledgement

We would like to thank BiAM and timm for their codebases.

License

MKT is MIT-licensed. The license applies to the pre-trained models as well.

Citation

Please consider citing MKT in your publications if it helps your research.

@article{he2022open,
  title={Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer},
  author={He, Sunan and Guo, Taian and Dai, Tao and Qiao, Ruizhi and Ren, Bo and Xia, Shu-Tao},
  journal={arXiv preprint arXiv:2207.01887},
  year={2022}
}
