Multi-modal Multi-lingual image sentence ranking

This code is based on the official code base for "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives" (Faghri, Fleet, Kiros, Fidler. 2017).

Dependencies

We recommend using Anaconda to install the following packages.

Python 2.7
PyTorch (>0.2)
NumPy (>1.12.1)
TensorBoard
Punkt Sentence Tokenizer:

import nltk
nltk.download()
> d punkt
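For scripted setups, the interactive downloader above can be replaced by a direct call. The following is a minimal sketch, not part of the original code base: the helper name `ensure_punkt` is hypothetical, and it only relies on `nltk.data.find` and `nltk.download`.

```python
def ensure_punkt():
    """Download the Punkt tokenizer models only if they are not installed yet."""
    # Imported inside the function so the helper can be defined before NLTK is installed.
    import nltk

    try:
        # Raises LookupError when the resource is missing from the NLTK data path.
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        nltk.download("punkt")
```

Calling `ensure_punkt()` once before training should have the same effect as entering `d punkt` in the interactive prompt.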

Download data

Download the Multi30K caption data by cloning the official repo:

git clone https://github.com/multi30k/dataset

The precomputed image features for Multi30K are available on Google Drive.

To run experiments on COCO and Flickr30K (F30K), download the dataset files and pre-trained models. The splits are the same as those used by Andrej Karpathy. The precomputed image features are from here and here. To use full image encoders, download the images from their original sources here, here and here.

wget http://www.cs.toronto.edu/~faghri/vsepp/vocab.tar
wget http://www.cs.toronto.edu/~faghri/vsepp/data.tar
wget http://www.cs.toronto.edu/~faghri/vsepp/runs.tar

Experiments in the paper

Append `--data_name m30k --img_dim 2048 --max_violation --patience 10` to every command below, and pass an explicit `--seed`.
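As an illustration of how the shared arguments combine with the per-method arguments in the tables, here is a small sketch that assembles the full command line for a run. The entry point `train.py`, the helper name `build_command`, and the seed values are assumptions made for the example; substitute the repository's actual training script and your own seeds.

```python
import shlex

# Arguments that the README asks to add to every experiment.
COMMON_ARGS = "--data_name m30k --img_dim 2048 --max_violation --patience 10"


def build_command(method_args, seed, script="train.py"):
    """Return the argument list for one run.

    `script` is an assumed entry point, not confirmed by the README.
    """
    cmd = ["python", script]
    cmd += shlex.split(method_args)   # per-method arguments from a table row
    cmd += shlex.split(COMMON_ARGS)   # shared arguments
    cmd += ["--seed", str(seed)]      # explicit seed for reproducibility
    return cmd


# Example: the "Bilingual + c2c" row, repeated for three arbitrary seeds.
for seed in (1, 2, 3):
    print(" ".join(build_command("--lang en-de --sentencepair", seed)))
```

The returned list can be passed directly to `subprocess.call`, which avoids shell quoting issues.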

Tables 2-3

Method	Arguments
Monolingual English	`--lang en --num_epochs 1000`
Monolingual German	`--lang de --num_epochs 1000`
Bilingual	`--lang en-de`
Bilingual + c2c	`--lang en-de --sentencepair`

Table 4

Method	Arguments
Monolingual English	`--lang en1 --num_epochs 1000`
Monolingual German	`--lang de1 --num_epochs 1000`
Bi-translation	`--lang en1-de1`
Bi-translation + c2c	`--lang en1-de1 --sentencepair`
Bi-comparable	`--lang en-de --undersample`
Bi-comparable + c2c	`--lang en-de --undersample --sentencepair`

Table 5

Method	Arguments
Full Monolingual English	`--lang en --num_epochs 1000`
Full Monolingual German	`--lang de --num_epochs 1000`
Half Monolingual English	`--lang en --half --num_epochs 1000`
Half Monolingual German	`--lang de --half --num_epochs 1000`
Bi-aligned	`--lang en-de --half`
Bi-aligned + c2c	`--lang en-de --half --sentencepair`
Bi-disjoint	`--lang en-de --half --disaligned`

Table 6

Method	Arguments
Monolingual English	`--lang en1 --num_epochs 1000`
Monolingual German	`--lang de1 --num_epochs 1000`
Monolingual French	`--lang fr --num_epochs 1000`
Monolingual Czech	`--lang cs --num_epochs 1000`
Multi-translation	`--lang en1-de1-fr-cs`
Multi-translation + c2c	`--lang en1-de1-fr-cs --sentencepair`
Multi-comparable	`--lang en-de-fr-cs --undersample`
Multi-comparable + c2c	`--lang en-de-fr-cs --undersample --sentencepair`

Table 7

Method	Arguments
Monolingual French	`--lang fr --num_epochs 1000`
Monolingual Czech	`--lang cs --num_epochs 1000`
Multilingual French	`--lang en1-de1-fr-cs --primary fr`
Multilingual Czech	`--lang en1-de1-fr-cs --primary cs`
+ Comparable French	`--lang en-de-fr-cs --primary fr`
+ Comparable Czech	`--lang en-de-fr-cs --primary cs`
+ Comparable + c2c French	`--lang en-de-fr-cs --primary fr --sentencepair`
+ Comparable + c2c Czech	`--lang en-de-fr-cs --primary cs --sentencepair`

Reference

If you find this code useful, please cite the following paper:

@article{kadar2018lessons,
title={Lessons learned in multilingual grounded language learning},
author={K{\'a}d{\'a}r, {\'A}kos and Elliott, Desmond and C{\^o}t{\'e}, Marc-Alexandre and Chrupa{\l}a, Grzegorz and Alishahi, Afra},
journal={arXiv preprint arXiv:1809.07615},
year={2018}
}

License

Apache License 2.0
