This code implements the algorithms described in the paper:
"Bridging Languages through Images with Deep Partial Canonical Correlation Analysis" by Guy Rotman, Ivan Vulić and Roi Reichart.
Please cite the paper if you are using this code.
The code was implemented in python 3.6.3 with anaconda environment.
All requirements are included in the requirements.txt file.
They can be installed by running the following command from the command line:
pip install -r requirements.txt
.h5 files with all samples of the WIW dataset (including textual and visual features) are available in the following link: WIW Feature Set.
After downloading the "wiw_data.zip" file, please unzip it into the "data" directory.
Lastly, in order to run the models please make sure to first split the WIW dataset to train/val/test by running the following command from the command line:
python split_wiw.py
The full dataset (including the set of images) can be downloaded from the following link: WIW Dataset
The directory contains the following models:
- dpcca_a.py - An implementation for Deep Partial Canonical Correlation Analysis by the NOI optimization algorithm for variant A.
- dpcca_b.py - An implementation for Deep Partial Canonical Correlation Analysis by the NOI optimization algorithm for variant B.
-
Each model can be executed (training + evaluation) by running the following command from the command line:
python model_name.py
(e.g.python dpcca_a.py
). -
The architectures of the models are implemented in the model_architecture.py file.
All hyperparameters and default settings appear in the cfg.py file. A detailed explanation of them appears inside the file.
The cross-lingual word retrieval task can be described as follows: Given a word in one language the goal is to retrieve the correct translation of it from a lexicon of a second language.
- To train and test on the EN-DE version of WIW please set in the cfg.py file:
self.feats = ['eng','ger','vis']
- To train and test on the EN-IT version of WIW please set in the cfg.py file:
self.feats = ['eng','it','vis']
- To train and test on the EN-RU version of WIW please set in the cfg.py file:
self.feats = ['eng','ru','vis']
Evaluation of R@K (Recall at K) for the task is implemented in the retrieval_eval.py file.
This project is licensed under the MIT License - see the LICENSE.txt file for details
- English vectors were taken from: "Pennington, Jeffrey, Richard Socher, and Christopher Manning. 2014. Glove: Global vectors for word representation. In Proceedings of EMNLP."
- German vectors were taken from: "Ivan Vulic and Anna Korhonen. 2014. Is "universal ´syntax" universally useful for learning distributed representations? In Proceedings of ACL, pages 518–524."
- Italian vectors were taken from: "Georgiana Dinu, Angeliki Lazaridou, and Marco Baroni. 2015. Improving zero-shot learning by mitigating the hubness problem. In Proceedings of ICLR: Workshop Papers."
- Russian vectors were taken from: "Andrey Kutuzov and Igor Andreev. 2015. Texts in, meaning out: neural language models in semantic similarity task for Russian. In Proceedings of DIALOG."
- Visual vectors were taken from: "Simonyan, Karen, and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556."