Code for the ACL 2023 paper "Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination".
conda env create -f environments/full.yml
conda activate UMMT-VSH
pip install -e fairseq/
pip install -e taming-transformers/
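As an optional sanity check, the two editable installs can be verified from Python; note that the taming-transformers repository installs its code under the taming package name.

```bash
# Optional check that both editable installs are importable.
# taming-transformers exposes its code as the "taming" package.
python -c "import fairseq, taming; print(fairseq.__version__)"
```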
- MMT data
  - Multi30k
- NMT data with image source
  - WMT14 En→De, En→Fr
  - WMT16 En→Ro
  - WIT-images
- Binarize the translation data for fairseq (an illustrative fairseq-preprocess call is sketched below):
  bash scripts/multi30k/preproc.sh
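The exact preprocessing options live in scripts/multi30k/preproc.sh; purely as an illustration of the binarization step, a standard fairseq-preprocess call over BPE-encoded Multi30k splits would look like the following (file names, language pair, and options here are assumptions, not the script's actual contents):

```bash
# Illustrative only: binarize BPE-encoded En-De Multi30k splits for fairseq.
# Paths, the language pair, and --joined-dictionary are assumptions.
fairseq-preprocess \
  --source-lang en --target-lang de \
  --trainpref data/multi30k/train.bpe \
  --validpref data/multi30k/valid.bpe \
  --testpref data/multi30k/test.bpe \
  --destdir data-bin/multi30k.en-de \
  --joined-dictionary --workers 8
```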
- Download the Flickr30K and MS-COCO images, then create symbolic links:
  ln -s /xxx/flickr30k
  ln -s /xxx/mscoco
- Download the WIT translation data, with the parallel corpora organized for machine translation; the archive also includes tokenized and BPE-encoded sentences.
- For each translation task, download the images listed in [train|valid|test]_url.txt to the corresponding paths given in [train|valid|test]_img.txt; image filenames are the MD5 hashes of their URLs (a download sketch follows below).
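A minimal download sketch, assuming the *_url.txt and *_img.txt files are line-aligned (as the naming suggests) and that wget and md5sum are available:

```bash
# Fetch each image to the path listed in *_img.txt; the target file name is
# the MD5 hash of the image URL. Assumes the two files are line-aligned.
paste train_url.txt train_img.txt | while read -r url img; do
  mkdir -p "$(dirname "$img")"
  wget -q -O "$img" "$url"
done
# Optional spot check: the MD5 of a URL should match its stored file name.
printf '%s' "$(head -n1 train_url.txt)" | md5sum
```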
- Binarize the translation data for fairseq:
  bash scripts/wit/preproc.sh
- Parse the SG structures for all images and texts with the tools in SG-parsing/VSG and SG-parsing/LSG (hypothetical invocations are sketched below).
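The entry points of the SG-parsing tools are not shown in this README, so the commands below are hypothetical placeholders for how the visual (VSG) and language (LSG) scene-graph parsing passes might be invoked:

```bash
# Hypothetical placeholders: the actual script names and arguments are
# defined inside SG-parsing/VSG and SG-parsing/LSG.
python SG-parsing/VSG/parse_images.py --image-dir /xxx/flickr30k --output vsg/
python SG-parsing/LSG/parse_texts.py --input data/multi30k/train.en --output lsg/
```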
- Run the scripts/multi30k-train.sh script for Multi30k.
- Run the scripts/wmt-train.sh script for WMT.
- Run the scripts/test.sh script for evaluation (an illustrative generation call is sketched below).
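scripts/test.sh is the intended evaluation entry point; for reference, a bare fairseq-generate call over the binarized data would look roughly like the one below (checkpoint path, data directory, and decoding options are illustrative assumptions, not necessarily what the script uses):

```bash
# Illustrative generation call; all paths and options are assumptions.
fairseq-generate data-bin/multi30k.en-de \
  --path checkpoints/checkpoint_best.pt \
  --source-lang en --target-lang de \
  --gen-subset test --beam 5 --batch-size 64 --remove-bpe
```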