Dual-Path Convolutional Image-Text Embedding
This repository contains the code for our paper Dual-Path Convolutional Image-Text Embedding. Thank you for your kind attention.
What's New: We updated the paper to the second version, adding more explanation of the mechanism of the proposed instance loss.
Extract word2vec weights. Follow the instructions in
Preprocess the dataset. Follow the instructions in ./dataset. You can choose one dataset to run. The three datasets need different preprocessing, so I provide separate instructions for Flickr30k, MSCOCO and CUHK-PEDES.
Download the model pre-trained on ImageNet and put it into './data'.
(bash) wget http://www.vlfeat.org/matconvnet/models/imagenet-resnet-50-dag.mat
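The download step can also be scripted so that the file lands directly in './data'. This is a minimal sketch, assuming `wget` is installed; the names `MODEL_URL`, `DATA_DIR`, `MODEL_PATH`, `fetch_model` and the `FETCH_NOW` switch are illustrative and not part of the repository.

```shell
#!/bin/sh
set -eu

# URL taken from the command above; the variable names are illustrative.
MODEL_URL="http://www.vlfeat.org/matconvnet/models/imagenet-resnet-50-dag.mat"
DATA_DIR="./data"
MODEL_PATH="${DATA_DIR}/$(basename "${MODEL_URL}")"

# Download the ImageNet pre-trained model into ./data unless it is already there.
fetch_model() {
  mkdir -p "${DATA_DIR}"
  if [ -f "${MODEL_PATH}" ]; then
    echo "already present: ${MODEL_PATH}"
  else
    wget -O "${MODEL_PATH}" "${MODEL_URL}"
  fi
}

# Hypothetical switch: the download only starts when FETCH_NOW=1 is set.
if [ "${FETCH_NOW:-0}" = "1" ]; then
  fetch_model
fi
```

Saved as, say, `fetch_model.sh` in the repository root, it would be run with `FETCH_NOW=1 sh fetch_model.sh`. Running it a second time skips the download because the file already exists.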
You may download the three trained models from Google Drive.
- For Flickr30k, run
train_flickr_word2_1_pool.m for Stage I training.
train_flickr_word_Rankloss_shift_hard for Stage II training.
- For MSCOCO, run
train_coco_word2_1_pool.m for Stage I training.
train_coco_Rankloss_shift_hard.m for Stage II training.
- For CUHK-PEDES, run
train_cuhk_word2_1_pool.m for Stage I training.
train_cuhk_word_Rankloss_shift for Stage II training.
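The two training stages listed above can be launched from a terminal instead of the MATLAB desktop. The following is a rough sketch, assuming `matlab` is on the `PATH`, the command is issued from the directory containing the training scripts, and each script runs without arguments. The helper names `stage_scripts` and `run_stages`, the short dataset keys and the `DRY_RUN` switch are illustrative and not part of the repository.

```shell
#!/bin/sh
set -eu

# Map a dataset key to its Stage I and Stage II script names (from the list above).
stage_scripts() {
  case "$1" in
    flickr) echo "train_flickr_word2_1_pool train_flickr_word_Rankloss_shift_hard" ;;
    coco)   echo "train_coco_word2_1_pool train_coco_Rankloss_shift_hard" ;;
    cuhk)   echo "train_cuhk_word2_1_pool train_cuhk_word_Rankloss_shift" ;;
    *)      echo "unknown dataset: $1" >&2; return 1 ;;
  esac
}

# Print (DRY_RUN=1, the default) or execute the MATLAB command for each stage in order.
run_stages() {
  scripts="$(stage_scripts "$1")" || return 1
  for script in ${scripts}; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "matlab -nodisplay -nosplash -r \"${script}; exit\""
    else
      matlab -nodisplay -nosplash -r "${script}; exit"
    fi
  done
}
```

With the default `DRY_RUN=1`, `run_stages flickr` only prints the two commands so they can be checked first. Setting `DRY_RUN=0` executes Stage I and then Stage II.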
Select one model and have fun!
For Flickr30k, run
test/extract_pic_feature_word2_plus_52.m to extract the features from the images and text. Note that you need to change the model path in the code.
For MSCOCO, run
test_coco/extract_pic_feature_word2_plus.m to extract the features from the images and text. Note that you need to change the model path in the code.
For CUHK-PEDES, run
test_cuhk/extract_pic_feature_word2_plus_52.m to extract the features from the images and text. Note that you need to change the model path in the code.
Get word2vec weight
Data Preparation (Flickr30k)
Train on Flickr30k
Test on Flickr30k
Data Preparation (MSCOCO)
Train on MSCOCO
Test on MSCOCO
Data Preparation (CUHK-PEDES)
Train on CUHK-PEDES
Test on CUHK-PEDES
Run the code on another machine