
Bidirectional Retrieval Made Simple

Code for our CVPR'18 paper Bidirectional Retrieval Made Simple. Since the original code from our work cannot be publicly shared, we adapted the code from VSE++ to provide a public version.

Overview:

  1. Summary
  2. Results
  3. Getting started
  4. Train new models
  5. Evaluate models
  6. Citation
  7. License

Summary

Code for training and evaluating our novel CHAIN-VSE models for efficient multimodal retrieval (image annotation and caption retrieval). In summary, CHAIN-VSE applies convolutional layers directly over character-level inputs, fully replacing RNNs and word embeddings. Despite being lighter and conceptually much simpler, these models achieve state-of-the-art results on MS COCO and on some text classification datasets.

[Figure: CHAIN-VSE retrieval performance under increasing input noise]
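
To illustrate the idea, a character-level convolutional text encoder could be sketched in PyTorch as below. This is a minimal sketch, not the repository's actual architecture: the alphabet size, filter widths, and layer depths are assumptions.

import torch
import torch.nn as nn

class CharConvEncoder(nn.Module):
    """Sketch of a conv-over-characters encoder: no word embeddings, no RNN."""
    def __init__(self, alphabet_size=73, embed_size=2048):
        super(CharConvEncoder, self).__init__()
        # Hypothetical filter sizes; convolutions act directly on characters.
        self.conv = nn.Sequential(
            nn.Conv1d(alphabet_size, 256, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(256, embed_size, kernel_size=5, padding=2),
            nn.ReLU(),
        )

    def forward(self, x):
        # x: (batch, alphabet_size, seq_len) one-hot character tensor
        h = self.conv(x)                    # (batch, embed_size, seq_len)
        h, _ = h.max(dim=2)                 # global max-pooling over time
        return nn.functional.normalize(h)   # unit-norm sentence embedding

enc = CharConvEncoder()
emb = enc(torch.randn(8, 73, 256))  # 8 texts, 73-char alphabet, length 256
print(emb.shape)                    # torch.Size([8, 2048])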

Highlights

  • Independent from word-embeddings and RNNs
  • Naturally suited to multi-language scenarios, with no increase in memory requirements from larger vocabularies
  • Much more robust to input noise
  • Fewer parameters
  • Simple, yet effective

Bidirectional Retrieval Results

Results obtained with this repository on the COCO-1k test set using pre-computed features (note that the image network is not fine-tuned in this experiment):

| Method | Features | Image-to-Text R@1 | Image-to-Text R@10 | Text-to-Image R@1 | Text-to-Image R@10 |
| --- | --- | --- | --- | --- | --- |
| RFF-net (baseline, ICCV'17) | ResNet152 | 56.40 | 91.50 | 43.90 | 88.60 |
| chain-v1 (p=1, d=1024) | resnet152_precomp | 57.80 | 95.60 | 44.18 | 90.66 |
| chain-v1 (p=1, d=2048) | resnet152_precomp | 59.90 | 94.80 | 45.08 | 90.54 |
| chain-v1 (p=1, d=8192) | resnet152_precomp | 61.20 | 95.80 | 46.60 | 90.92 |

Getting Started

To get started, you will need to set up your environment and download the required data.

Dependencies

We recommend using Anaconda to manage the required packages. You will also need NLTK with the punkt tokenizer data:

import nltk
nltk.download('punkt')

Download data

Pre-computed features:

wget http://lsa.pucrs.br/jonatas/seam-data/irv2_precomp.tar.gz
wget http://lsa.pucrs.br/jonatas/seam-data/resnet152_precomp.tar.gz
wget http://lsa.pucrs.br/jonatas/seam-data/vocab.tar.gz
  • The directory containing the extracted *_precomp folders is referred to as $DATA_PATH
  • Extract vocab.tar.gz to the ./vocab directory (required for baselines only).
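
If you prefer to script this step, the following is a minimal sketch, assuming the archives were downloaded to the current directory (the exact archive layout is an assumption; adjust paths as needed):

import os
import tarfile

# Hypothetical target directory; point $DATA_PATH at the same place.
DATA_PATH = os.path.expanduser("~/data")

# Extract the pre-computed feature archives into $DATA_PATH.
for archive in ("irv2_precomp.tar.gz", "resnet152_precomp.tar.gz"):
    with tarfile.open(archive) as tar:
        tar.extractall(DATA_PATH)

# Extract the vocabulary files so they end up under ./vocab
# (assumes the archive contains a top-level vocab/ folder).
with tarfile.open("vocab.tar.gz") as tar:
    tar.extractall(".")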

Training new models

Run train.py:

To train CHAIN-VSE (p=1, d=2048) using resnet152_precomp features, run:

python train.py \
--data_path "$DATA_PATH" \
--data_name resnet152_precomp \
--logger_name runs/chain-v1/resnet152_precomp/  \
--text_encoder chain-v1 \
--embed_size 2048 \
--vocab_path char

Evaluate pre-trained models

from vocab import Vocabulary
import evaluation
evaluation.evalrank("$RUN_PATH/model_best.pth.tar", data_path="$DATA_PATH", split="test")

To evaluate on the 5-fold COCO-1k test set (cross-validation), pass fold5=True to evalrank with a model trained using --data_name coco.
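
For example, the cross-validated evaluation is the same call as above with fold5=True (the paths here are placeholders):

from vocab import Vocabulary
import evaluation

# 5-fold cross-validation over the COCO-1k test splits:
evaluation.evalrank("$RUN_PATH/model_best.pth.tar",
                    data_path="$DATA_PATH", split="test", fold5=True)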

Citation

If you find this code or paper useful, please cite the following papers:

@InProceedings{wehrmann2018cvpr,
  author = {Wehrmann, Jônatas and Barros, Rodrigo C.},
  title = {Bidirectional Retrieval Made Simple},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2018}
}

@article{faghri2017vse++,
  title={VSE++: Improving Visual-Semantic Embeddings with Hard Negatives},
  author={Faghri, Fartash and Fleet, David J and Kiros, Jamie Ryan and Fidler, Sanja},
  journal={arXiv preprint arXiv:1707.05612},
  year={2017}
}

License

Apache License 2.0
