Multilingual image sentence ranking

Multi-modal Multi-lingual image sentence ranking

This code is based on the official code base for "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives" (Faghri, Fleet, Kiros, Fidler. 2017).
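The hard-negative ("max violation") ranking objective from VSE++, which this code base inherits, can be sketched as follows. This is a numpy illustration only, not the repository's PyTorch implementation; the function name and margin value are assumptions for the sake of the example:

```python
# Illustrative sketch of the VSE++ max-violation ranking loss (Faghri et
# al., 2017). The real code base implements this in PyTorch; names and the
# default margin here are illustrative, not taken from the repository.
import numpy as np

def max_violation_loss(scores, margin=0.2):
    """Hard-negative triplet loss over an image-caption score matrix.

    scores[i, j] = similarity between image i and caption j;
    the diagonal holds the matching (positive) pairs.
    """
    diag = scores.diagonal()
    # Margin cost of ranking a wrong caption above the true one (rows),
    # and a wrong image above the true one (columns).
    cost_cap = np.maximum(0, margin + scores - diag[:, None])
    cost_img = np.maximum(0, margin + scores - diag[None, :])
    # True pairs are not negatives, so zero out the diagonal.
    np.fill_diagonal(cost_cap, 0)
    np.fill_diagonal(cost_img, 0)
    # "Max violation": keep only the hardest negative per anchor.
    return cost_cap.max(axis=1).sum() + cost_img.max(axis=0).sum()
```

Compared with summing over all negatives, keeping only the hardest negative per anchor is what the `--max_violation` flag used in the experiments below refers to.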


We recommend using Anaconda to manage the following packages.

import nltk
nltk.download('punkt')

Download data

Download the Multi30K caption data by cloning the official repo:

git clone

The pre-computed image features for Multi30K are available on Google Drive.

To run experiments on COCO and F30K, download the dataset files and pre-trained models. The splits are the same as those used by Andrej Karpathy. The pre-computed image features are from here and here. To use full image encoders, download the images from their original sources here, here and here.


Experiments in the paper

All commands below should be combined with --data_name m30k --img_dim 2048 --max_violation --patience 10
and given a --seed.
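For example, a single bilingual English-German run might look like the following. The `train.py` entry point is assumed from the VSE++ code base this repository extends, and the seed value is arbitrary; check the actual script name before running:

```shell
# Hypothetical invocation; script name and seed are illustrative only.
python train.py --data_name m30k --img_dim 2048 --max_violation \
    --patience 10 --seed 42 --lang en-de
```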

Tables 2-3

Method Arguments
Monolingual English --lang en --num_epochs 1000
Monolingual German --lang de --num_epochs 1000
Bilingual --lang en-de
Bilingual + c2c --lang en-de --sentencepair

Table 4

Method Arguments
Monolingual English --lang en1 --num_epochs 1000
Monolingual German --lang de1 --num_epochs 1000
Bi-translation --lang en1-de1
Bi-translation + c2c --lang en1-de1 --sentencepair
Bi-comparable --lang en-de --undersample
Bi-comparable + c2c --lang en-de --undersample --sentencepair

Table 5

Method Arguments
Full Monolingual English --lang en --num_epochs 1000
Full Monolingual German --lang de --num_epochs 1000
Half Monolingual English --lang en --half --num_epochs 1000
Half Monolingual German --lang de --half --num_epochs 1000
Bi-aligned --lang en-de --half
Bi-aligned + c2c --lang en-de --half --sentencepair
Bi-disjoint --lang en-de --half --disaligned

Table 6

Method Arguments
Monolingual English --lang en1 --num_epochs 1000
Monolingual German --lang de1 --num_epochs 1000
Monolingual French --lang fr --num_epochs 1000
Monolingual Czech --lang cs --num_epochs 1000
Multi-translation --lang en1-de1-fr-cs
Multi-translation + c2c --lang en1-de1-fr-cs --sentencepair
Multi-comparable --lang en-de-fr-cs --undersample
Multi-comparable + c2c --lang en-de-fr-cs --undersample --sentencepair

Table 7

Method Arguments
Monolingual French --lang fr --num_epochs 1000
Monolingual Czech --lang cs --num_epochs 1000
Multilingual French --lang en1-de1-fr-cs --primary fr
Multilingual Czech --lang en1-de1-fr-cs --primary cs
+ Comparable French --lang en-de-fr-cs --primary fr
+ Comparable Czech --lang en-de-fr-cs --primary cs
+ Comparable + c2c French --lang en-de-fr-cs --primary fr --sentencepair
+ Comparable + c2c Czech --lang en-de-fr-cs --primary cs --sentencepair


If you found this code useful, please cite the following paper:

@article{kadar2018lessons,
  title={Lessons learned in multilingual grounded language learning},
  author={K{\'a}d{\'a}r, {\'A}kos and Elliott, Desmond and C{\^o}t{\'e}, Marc-Alexandre and Chrupa{\l}a, Grzegorz and Alishahi, Afra},
  journal={arXiv preprint arXiv:1809.07615},
  year={2018}
}


Apache License 2.0