gimpong/AAAI21-WSDHQ

WSDHQ: Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval

1. Introduction

This repository provides the code for our paper at AAAI 2021:

Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval. Jinpeng Wang, Bin Chen, Qiang Zhang, Zaiqiao Meng, Shangsong Liang, Shu-Tao Xia. [link].

We propose WSDHQ, a weakly supervised deep quantization approach for image retrieval. Instead of requiring ground-truth labels, WSDHQ uses the informal tags provided by amateur users to guide quantization learning, which can reduce the reliance on manual annotation and make industrial deployment more feasible. In WSDHQ, we propose a correlation-based tag processing mechanism to enhance the weak semantics of such noisy tags. In addition, we learn quantized representations on the hypersphere manifold, on which we design a novel adaptive cosine margin loss for embedding learning and a supervised cosine quantization loss for quantization. Experiments on the Flickr-25K and NUS-WIDE datasets demonstrate the superiority of WSDHQ.
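The hypersphere idea can be illustrated with a small sketch. The Python code below is not the WSDHQ objective and does not implement the adaptive cosine margin loss or the supervised cosine quantization loss. It only shows, under an assumed product-quantization-style setup (the feature is L2-normalized, split into equal sub-vectors, and each sub-vector is matched to a unit-norm codeword by cosine similarity), what encoding on the hypersphere and a cosine-based quantization error can look like. All function names and the toy codebooks are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Codebook = Sequence[Sequence[float]]  # the K codewords of one subspace


def l2_normalize(v: Sequence[float]) -> Vector:
    """Project a non-zero vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def encode(feature: Sequence[float], codebooks: Sequence[Codebook]) -> Tuple[List[int], Vector]:
    """Assign each sub-vector of the normalized feature to its most similar codeword.

    Returns the chosen codeword indices (the compact code) and the
    reconstruction obtained by concatenating the chosen codewords.
    """
    z = l2_normalize(feature)
    m = len(codebooks)
    if len(z) % m != 0:
        raise ValueError("feature dimension must be divisible by the number of codebooks")
    d = len(z) // m
    codes: List[int] = []
    recon: Vector = []
    for i, book in enumerate(codebooks):
        sub = z[i * d:(i + 1) * d]
        scores = [cosine(sub, codeword) for codeword in book]
        best = scores.index(max(scores))
        codes.append(best)
        recon.extend(book[best])
    return codes, recon


def cosine_quantization_error(feature: Sequence[float], recon: Sequence[float]) -> float:
    """1 - cos(feature, reconstruction); 0 when the two directions coincide."""
    return 1.0 - cosine(feature, recon)


if __name__ == "__main__":
    # Two 2-D subspaces, each with two unit-norm codewords.
    books = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [-1.0, 0.0]]]
    x = [3.0, 0.5, -2.0, 0.1]
    codes, recon = encode(x, books)
    print(codes, recon, round(cosine_quantization_error(x, recon), 4))
```

Because only directions matter on the hypersphere, the concatenated reconstruction does not need to be renormalized before it is compared with the feature by cosine similarity.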

In the following, we guide you through using this repository step by step. 🤗

2. Preparation

git clone https://github.com/gimpong/AAAI21-WSDHQ.git
cd AAAI21-WSDHQ/
tar -xvzf data.tar.gz
rm -f data.tar.gz

2.1 Requirements

  • python 3.7.8
  • numpy 1.19.1
  • scikit-learn 0.23.1
  • h5py 2.10.0
  • python-opencv 3.4.2
  • tqdm 4.51.0
  • tensorflow 1.15.0
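A quick way to compare your environment with the pinned versions above is a short Python check like the one below. It is a hypothetical helper, not part of the repository. It relies only on each package exposing `__version__`, and it uses the import names `sklearn` for scikit-learn and `cv2` for OpenCV. It avoids `importlib.metadata` because that module only exists from Python 3.8 onward, while the pinned interpreter is 3.7.8.

```python
import importlib
from typing import Dict, Iterable, Optional, Tuple

# Pinned versions from the requirements list, keyed by *import* name.
PINNED = {
    "numpy": "1.19.1",
    "sklearn": "0.23.1",   # scikit-learn
    "h5py": "2.10.0",
    "cv2": "3.4.2",        # OpenCV
    "tqdm": "4.51.0",
    "tensorflow": "1.15.0",
}


def installed_versions(modules: Iterable[str]) -> Dict[str, Optional[str]]:
    """Import each module and read its __version__ (None if not installed)."""
    found: Dict[str, Optional[str]] = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
            continue
        found[name] = getattr(module, "__version__", "unknown")
    return found


def mismatches(pinned: Dict[str, str],
               found: Dict[str, Optional[str]]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {module: (expected, found)} for every module whose version differs."""
    return {name: (want, found.get(name))
            for name, want in pinned.items()
            if found.get(name) != want}
```

Calling `print(mismatches(PINNED, installed_versions(PINNED)))` prints an empty dict when every pinned package matches.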

2.2 Download the image datasets and pre-trained models, and organize them properly

Before running the code, make sure that everything needed is in place. The working directory is expected to be organized as follows:

AAAI21-WSDHQ/
  • data/
    • flickr25k/
      • tags/
        • FinalTagEmbs.txt
        • TagIdMergeMap.pkl
      • common_tags.txt
      • database_img.txt
      • database_label.txt
      • train_img.txt
      • train_tag.txt
      • test_img.txt
      • test_label.txt
    • nus-wide/
      • tags/
        • FinalTagEmbs.txt
        • TagIdMergeMap.pkl
      • TagList1k.txt
      • database_img.txt
      • database_label.txt
      • train_img.txt
      • train_tag.txt
      • test_img.txt
      • test_label.txt
  • datasets/
    • GoogleNews-vectors-negative300.bin.gz
    • flickr25k/
      • mirflickr/
        • im1.jpg
        • im2.jpg
        • ...
    • nus-wide/
      • Flickr/
        • actor/
          • 0001_2124494179.jpg
          • 0002_174174086.jpg
          • ...
        • administrative_assistant/
          • ...
        • ...
  • scripts/
    • run0001.sh
    • run0002.sh
    • ...
    • tag_processing.sh
  • train.py
  • validation.py
  • net.py
  • net_val.py
  • util.py
  • dataset.py
  • alexnet.npy
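Before launching a long run, it can help to confirm that the layout matches. The hypothetical Python helper below lists the concrete paths from the tree above (relative to AAAI21-WSDHQ/) and reports which ones are absent. Entries shown as "..." are not checked, and the image folders are only checked for existence, not for completeness.

```python
from pathlib import Path
from typing import List

# Files present in data/<dataset>/ for both datasets (from the tree above).
SPLIT_FILES = ["database_img.txt", "database_label.txt", "train_img.txt",
               "train_tag.txt", "test_img.txt", "test_label.txt"]
TAG_FILES = ["tags/FinalTagEmbs.txt", "tags/TagIdMergeMap.pkl"]
# Dataset folder -> its dataset-specific tag vocabulary file.
TAG_LISTS = {"flickr25k": "common_tags.txt", "nus-wide": "TagList1k.txt"}
OTHER_PATHS = [
    "datasets/GoogleNews-vectors-negative300.bin.gz",
    "datasets/flickr25k/mirflickr",  # image folder; contents not checked
    "datasets/nus-wide/Flickr",      # image folder; contents not checked
    "scripts/run0001.sh",
    "scripts/tag_processing.sh",
    "train.py", "validation.py", "net.py", "net_val.py",
    "util.py", "dataset.py", "alexnet.npy",
]


def expected_paths() -> List[str]:
    """All concrete paths named in the tree, relative to the repository root."""
    paths = []
    for folder, tag_list in TAG_LISTS.items():
        for name in TAG_FILES + [tag_list] + SPLIT_FILES:
            paths.append(f"data/{folder}/{name}")
    return paths + OTHER_PATHS


def missing_paths(repo_root: str = ".") -> List[str]:
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in expected_paths() if not (root / p).exists()]
```

Run from the repository root, `print(missing_paths("."))` prints an empty list once everything is in place.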

Notes

  • The data/ folder contains the data splits of the Flickr25K and NUS-WIDE datasets. The raw images of both datasets must be downloaded separately and placed in datasets/flickr25k/ and datasets/nus-wide/, respectively. We provide copies of these image datasets, which you can download via Google Drive or Baidu Wangpan (Web Drive, password: ocmv).

  • The pre-trained AlexNet weights (alexnet.npy) and Word2Vec embeddings (GoogleNews-vectors-negative300.bin.gz) can be downloaded from Baidu Wangpan (Web Drive, password: ocmv). Place them as shown in the directory tree above.
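To confirm that a downloaded alexnet.npy can be read, you can load it with NumPy. The sketch below assumes the file stores a pickled Python dict mapping layer names to lists of parameter arrays, which is the layout commonly used for AlexNet weights converted from Caffe. We have not verified this against the exact file served on Baidu Wangpan, and the function names are hypothetical. `allow_pickle=True` is required for object arrays in NumPy 1.16.3 and later, and `encoding="latin1"` lets Python 3 read arrays that were pickled under Python 2.

```python
import numpy as np


def load_layer_weights(path):
    """Load a dict of {layer_name: [array, ...]} that was saved with np.save."""
    # np.save stores a dict as a 0-d object array; .item() unwraps the dict.
    return np.load(path, allow_pickle=True, encoding="latin1").item()


def describe_layers(weights):
    """Return {layer_name: [shape, ...]} for a quick sanity check."""
    return {name: [tuple(np.shape(p)) for p in params]
            for name, params in sorted(weights.items())}
```

For example, `print(describe_layers(load_layer_weights("alexnet.npy")))` lists every layer with the shapes of its parameters.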

3. Enhance the weak semantic information of tags via preprocessing (Optional)

Enhanced tag embeddings are already provided in this repository (see data/flickr25k/tags/ and data/nus-wide/tags/). To reproduce these files, remove them and run:

cd scripts/
# '0' is the GPU ID
bash tag_processing.sh 0
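Removing the provided files discards the shipped versions. If you would like to compare your regenerated embeddings with them, one option is to move them aside first. The hypothetical Python helper below moves FinalTagEmbs.txt and TagIdMergeMap.pkl from each data/<dataset>/tags/ folder into a sibling tags.bak/ folder, leaving the (now empty) tags/ folders in place. We assume, without having checked tag_processing.sh, that the script writes its outputs back into those tags/ folders.

```python
import shutil
from pathlib import Path
from typing import List

DATASETS = ("flickr25k", "nus-wide")
TAG_FILES = ("FinalTagEmbs.txt", "TagIdMergeMap.pkl")


def back_up_tag_files(data_root: str = "data") -> List[Path]:
    """Move the provided tag files into data/<dataset>/tags.bak/.

    The tags/ folders themselves are kept. An existing backup is never
    overwritten: the function raises FileExistsError instead.
    """
    moved: List[Path] = []
    for dataset in DATASETS:
        tags_dir = Path(data_root) / dataset / "tags"
        backup_dir = tags_dir.parent / "tags.bak"
        for name in TAG_FILES:
            src = tags_dir / name
            if not src.exists():
                continue
            dst = backup_dir / name
            if dst.exists():
                raise FileExistsError(f"backup already exists: {dst}")
            backup_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
            moved.append(dst)
    return moved
```

Call `back_up_tag_files()` from the repository root, then run tag_processing.sh as shown above.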

4. Train and then evaluate

To facilitate reproducibility, we provide a script with the configuration of each experiment under the scripts/ folder. For example, to train and evaluate an 8-bit WSDHQ model on the Flickr25K dataset, run:

cd scripts/
# '0' is the GPU ID
bash run0001.sh 0

The script run0001.sh contains the following commands:

#!/bin/bash

cd ..

##8 bits
#                     dataset  lr      iter  lambda    subspace_num  loss   notes  gpu
python train.py       flickr   0.0003  800   0.0001    1             WSDQH  0001   $1
#                     dataset  model_weight                                                                 gpu
python validation.py  flickr   ./checkpoints/flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001.npy  $1

cd -
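The comment lines in run0001.sh name the positional arguments of train.py. As a reading aid, the hypothetical Python parser below maps one train.py command line onto those names (renaming `lambda`, a Python keyword, to `lam`). The `code_length` helper assumes 256 codewords per subspace, i.e. 8 bits each. This is our inference from the output names, where `subspace_num` 1 yields `nbits=8`; train.py's own argument handling may differ.

```python
import math
from typing import NamedTuple, Sequence


class TrainConfig(NamedTuple):
    dataset: str        # e.g. "flickr"
    lr: float           # learning rate
    iterations: int     # the "iter" column
    lam: float          # the "lambda" column ("lambda" is a Python keyword)
    subspace_num: int
    loss: str           # e.g. "WSDQH"
    notes: str          # run tag, e.g. "0001"
    gpu: str            # forwarded from the script's $1


def parse_train_args(argv: Sequence[str]) -> TrainConfig:
    """Map the eight positional arguments of a train.py call to named fields."""
    if len(argv) != 8:
        raise ValueError(f"expected 8 positional arguments, got {len(argv)}")
    dataset, lr, iterations, lam, subspace_num, loss, notes, gpu = argv
    return TrainConfig(dataset, float(lr), int(iterations), float(lam),
                       int(subspace_num), loss, notes, gpu)


def code_length(cfg: TrainConfig, codewords_per_subspace: int = 256) -> int:
    """Assumed relation: each subspace contributes log2(K) bits to the code."""
    return cfg.subspace_num * int(math.log2(codewords_per_subspace))
```

For the command in run0001.sh, `parse_train_args("flickr 0.0003 800 0.0001 1 WSDQH 0001 0".split())` gives a configuration whose `code_length` is 8 under this assumption.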

After a script finishes, its outputs are saved under logs/ and checkpoints/. For run0001.sh, these are:

AAAI21-WSDHQ/
  • logs/
    • flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001.log
  • checkpoints/
    • flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001.npy
    • flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001_retrieval.h5
  • ...
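The output names follow a pattern that can be read off the file names in this README: dataset, loss, `nbits`, a fixed `adaMargin_gamma=1` segment, `lambda`, and the notes field. The hypothetical helper below rebuilds the three paths for a run, which is handy for locating the .npy weights that validation.py takes as its model_weight argument. It is reverse-engineered from the listed names, not taken from train.py, so runs with other settings may be named differently.

```python
from pathlib import Path
from typing import Dict


def run_stem(dataset: str, loss: str, nbits: int, lam: str, notes: str, gamma: int = 1) -> str:
    """Rebuild the shared file-name stem of a run.

    `lam` is a string so that it is reproduced exactly as written in the
    script (e.g. "0.0001"), without any float formatting.
    """
    return f"{dataset}_{loss}_nbits={nbits}_adaMargin_gamma={gamma}_lambda={lam}_{notes}"


def run_outputs(stem: str, root: str = ".") -> Dict[str, Path]:
    """Paths of the log, the weights, and the retrieval file for one run."""
    base = Path(root)
    return {
        "log": base / "logs" / f"{stem}.log",
        "weights": base / "checkpoints" / f"{stem}.npy",
        "retrieval": base / "checkpoints" / f"{stem}_retrieval.h5",
    }
```

For run0001.sh, `run_outputs(run_stem("flickr", "WSDQH", 8, "0.0001", "0001"))["weights"]` is the checkpoint path passed to validation.py.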

Here we report the results of running these scripts on a GTX 1080 Ti, as shown in the following table. We have also uploaded the logs and checkpoint information for reference; they can be downloaded from Baidu Wangpan (Web Drive, password: ocmv).

Note that some values may deviate slightly from those reported in our original paper. This is caused by the randomness of TensorFlow and by differences in software and hardware.

Script      Dataset    Code length (bits)  MAP    Log
run0001.sh  Flickr25K  8                   0.766  flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001.log
run0002.sh  Flickr25K  16                  0.755  flickr_WSDQH_nbits=16_adaMargin_gamma=1_lambda=0.0001_0002.log
run0003.sh  Flickr25K  24                  0.765  flickr_WSDQH_nbits=24_adaMargin_gamma=1_lambda=0.0001_0003.log
run0004.sh  Flickr25K  32                  0.767  flickr_WSDQH_nbits=32_adaMargin_gamma=1_lambda=0.0001_0004.log
run0005.sh  NUS-WIDE   8                   0.717  nuswide_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0005.log
run0006.sh  NUS-WIDE   16                  0.727  nuswide_WSDQH_nbits=16_adaMargin_gamma=1_lambda=0.0001_0006.log
run0007.sh  NUS-WIDE   24                  0.730  nuswide_WSDQH_nbits=24_adaMargin_gamma=1_lambda=0.0001_0007.log
run0008.sh  NUS-WIDE   32                  0.729  nuswide_WSDQH_nbits=32_adaMargin_gamma=1_lambda=0.0001_0008.log
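MAP here denotes mean average precision. As a reference for how such a number is formed, the Python sketch below computes average precision over a ranked list and averages it over queries. It uses the multi-label convention common for Flickr25K and NUS-WIDE, where a database image counts as relevant if it shares at least one label with the query. The relevance rule and the optional `top_k` cutoff are assumptions: validation.py may rank and truncate differently, so this illustrates the metric rather than re-implementing the evaluation behind the table.

```python
from typing import Optional, Sequence, Set


def average_precision(relevant: Sequence[bool], top_k: Optional[int] = None) -> float:
    """AP over a ranked list, optionally truncated to the first top_k items.

    Precision@rank is averaged over the relevant items found in the
    (truncated) list; a list with no relevant items scores 0.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant[:top_k], start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(query_labels: Sequence[Set[int]],
                           db_labels: Sequence[Set[int]],
                           rankings: Sequence[Sequence[int]],
                           top_k: Optional[int] = None) -> float:
    """rankings[i] lists database indices sorted by distance to query i."""
    aps = []
    for labels, ranking in zip(query_labels, rankings):
        # Relevant if the query and the database item share any label.
        relevant = [bool(labels & db_labels[j]) for j in ranking]
        aps.append(average_precision(relevant, top_k))
    return sum(aps) / len(aps)
```

With the full ranking (`top_k=None`), dividing by the number of relevant items retrieved equals dividing by all relevant items, which is the textbook definition. With a cutoff, it follows one common MAP@k convention of normalizing by the relevant items found in the top k.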

5. References

If you find this code useful or use the toolkit in your work, please consider citing:

@inproceedings{wang2021wsdhq,
  title={Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval},
  author={Wang, Jinpeng and Chen, Bin and Zhang, Qiang and Meng, Zaiqiao and Liang, Shangsong and Xia, Shutao},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={4},
  pages={2755--2763},
  year={2021}
}

6. Acknowledgements

Our implementation is built on the DeepHash code base.

7. Contact

If you have any questions, please open an issue or email Jinpeng Wang (wjp20@mails.tsinghua.edu.cn). We will reply as soon as possible.

About

The code for the paper "Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval" (AAAI'21)
