MaRVL

This is the implementation of the approaches described in the paper:

Fangyu Liu*, Emanuele Bugliarello*, Edoardo M. Ponti, Siva Reddy, Nigel Collier and Desmond Elliott. Visually Grounded Reasoning across Languages and Cultures. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov 2021.

We provide the pretrained models, data and the code for reproducing our results.

The code will also be integrated into VOLTA, upon which our repository was originally built.

Repository Setup

You can clone this repository by issuing:
git clone git@github.com:marvl-challenge/marvl-code

1. Create a new virtual environment:

python3 -m venv /path/to/new/virtual/environment/marvl
source /path/to/new/virtual/environment/marvl/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

2. Install apex, for example by issuing:

cd volta/apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

If you use a cluster, you may want to first run commands like the following:

module load cuda/10.1.105
module load gcc/8.3.0-cuda

3. Set up the refer submodule for Referring Expression Comprehension:

cd volta/tools/refer; make
deactivate
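
Before working through the setup steps, it can help to confirm that the command-line tools they invoke are available. The following is a minimal sketch, not part of the repository: the helper name missing_tools is hypothetical, and the list of tools (git, python3, make) is inferred from the commands shown in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository):
# print each named command that is NOT found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Tools assumed from the setup steps: git (clone), python3 (venv), make (refer).
missing=$(missing_tools git python3 make)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All setup tools found."
fi
```

Building apex with the CUDA extensions also needs a working CUDA toolchain, which this sketch does not check.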

Data

We distribute the texts and features under the CC BY 4.0 license.

Text data (original and machine translations) is available under data/.

Preprocessed visual features can be downloaded from ERDA. You will need to convert these H5 files into LMDB format by running:

cd feature_extraction
bash h5_to_lmdb.sh
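
As a quick sanity check before converting, you may want to confirm that the downloaded feature files are where you expect them. This is a minimal sketch under stated assumptions: the helper count_h5 and the FEATURES_DIR variable are hypothetical, and it assumes the downloaded files carry a .h5 extension. It does not reflect where h5_to_lmdb.sh itself reads its input from.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository):
# count the files ending in .h5 directly inside the given directory.
count_h5() {
    dir=$1
    count=0
    for f in "$dir"/*.h5; do
        # If nothing matches, the glob stays literal, so test for existence.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# FEATURES_DIR is an assumed variable; it defaults to the current directory.
echo "Found $(count_h5 "${FEATURES_DIR:-.}") H5 file(s)."
```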

The feature extraction process can be replicated by running bash marvl_proposal.sh. It is based on Hao Tan's Bottom-up Attention with Detectron2.

The MaRVL team does not own the images, and we provide access to them only for (non-commercial) research purposes. They can be downloaded from the Dataverse portal.

Models

Pretrained mUNITER and xUNITER can be downloaded from ERDA.

Model configuration files are stored in volta/config/.

These models were trained with the same hyperparameters and multimodal data as the controlled models in VOLTA. However, they also use text-only Wikipedia data in 104 languages during pretraining, for which we used the Plaintext Wikipedia dump 2018.

Training and Evaluation

We provide sample scripts to train (i.e. pretrain or fine-tune) and evaluate models in experiments/.

Task configuration files are stored in volta/config_tasks/.

Analyses and results can be found under notebooks/.

License

This work is licensed under the MIT license. See LICENSE for details. Third-party software and data sets are subject to their respective licenses.

If you find our code/data/models or ideas useful in your research, please consider citing the paper:

@inproceedings{liu-etal-2021-visually,
    title = "Visually Grounded Reasoning across Languages and Cultures",
    author = "Liu, Fangyu  and
      Bugliarello, Emanuele  and
      Ponti, Edoardo Maria  and
      Reddy, Siva  and
      Collier, Nigel  and
      Elliott, Desmond",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.818",
    pages = "10467--10485",
}
