ictnlp/SAMMT

Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation

This repository contains the code for the EMNLP 2023 paper "Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation".

Get started

Text-to-image Generation Environment:
conda create -n stable python==3.8
# Activate the new environment so the packages below are installed into it
conda activate stable
pip install torch==2.0.1
pip install Pillow==9.5.0
pip install transformers==4.27.4
pip install diffusers==0.16.1
pip install scipy==1.10.1
pip install accelerate==0.18.0

Training Environment:
conda create -n sammt python==3.6.7
# Activate the new environment so the packages below are installed into it
conda activate sammt
pip install -r requirements.txt
pip install --editable ./

Data

Multi30K texts and images can be downloaded here and here. The Multi30K text data is taken from fairseq_mmt.

cd fairseq_sammt
git clone https://github.com/multi30k/dataset.git
git clone https://github.com/BryanPlummer/flickr30k_entities.git
# Organize the downloaded dataset
flickr30k
├─ flickr30k-images
├─ test_2017_flickr
└─ test_2017_mscoco
multi30k-dataset
└─ data
    └─ task1
        ├─ tok
        └─ image_splits
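
Before moving on, it can help to confirm that the downloaded data matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and it assumes the script is run from the directory that holds both trees. Note that `git clone` names the Multi30K checkout `dataset` by default, so it would need to be renamed or moved to match `multi30k-dataset`.

```python
from pathlib import Path

# Expected sub-directories, copied from the layout listed above.
EXPECTED_DIRS = [
    "flickr30k/flickr30k-images",
    "flickr30k/test_2017_flickr",
    "flickr30k/test_2017_mscoco",
    "multi30k-dataset/data/task1/tok",
    "multi30k-dataset/data/task1/image_splits",
]


def missing_dirs(root):
    """Return the expected directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Assumption: the current directory contains both dataset trees.
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("Dataset layout looks complete.")
```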

Text-to-image Generation

conda activate stable
python train_stable_diffusion_step50.py train

Script parameters:

  • dataset ($1): choices=['train', 'valid', 'test', 'test1', 'test2']
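
For orientation, the core of a caption-to-image step with `diffusers` might look like the sketch below. This is not the repository's script: the function names, the checkpoint `CompVis/stable-diffusion-v1-4`, and the output file naming are assumptions made for illustration, and the 50 denoising steps are only inferred from the script name.

```python
from pathlib import Path


def output_path(out_dir, index):
    """Name the image for the caption at 0-based position `index` (hypothetical scheme)."""
    return Path(out_dir) / f"{index:06d}.png"


def generate_images(captions, out_dir,
                    model_id="CompVis/stable-diffusion-v1-4",  # assumed checkpoint
                    steps=50):  # assumed from the script name "step50"
    """Generate one image per caption and save it under `out_dir`."""
    # Imported lazily so the helper above works without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for i, caption in enumerate(captions):
        # The pipeline returns an output object whose `.images` is a list of PIL images.
        image = pipe(caption, num_inference_steps=steps).images[0]
        image.save(output_path(out_dir, i))
```

Calling `generate_images(["a dog runs on the beach"], "synth_images")` would download the checkpoint on first use and write `synth_images/000000.png`; a GPU is strongly recommended.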

Extract Image Features

conda activate sammt
python image_process.py train synth

Script parameters:

  • dataset ($1): choices=['train', 'valid', 'test', 'test1', 'test2']
  • synthetic or authentic images ($2): choices=['synth', 'authe']
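
The two positional parameters above could be validated as in the following sketch. It only illustrates the documented choices; the names `build_parser`, `dataset`, and `image_source` are hypothetical, and the actual `image_process.py` may read its arguments differently.

```python
import argparse

# Choices copied from the parameter list above.
SPLITS = ["train", "valid", "test", "test1", "test2"]
IMAGE_SOURCES = ["synth", "authe"]


def build_parser():
    """Build a parser for the two positional arguments ($1 and $2)."""
    parser = argparse.ArgumentParser(
        description="Illustrative argument parsing for image feature extraction.")
    parser.add_argument("dataset", choices=SPLITS,
                        help="which split to process ($1)")
    parser.add_argument("image_source", choices=IMAGE_SOURCES,
                        help="synthetic or authentic images ($2)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("split:", args.dataset, "| images:", args.image_source)
```

With this shape, an unsupported value such as `python image_process.py dev synth` is rejected by `argparse` with a usage message instead of failing later in the pipeline.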

The pre-extracted image features can also be downloaded here.

Train and Test

1. Preprocess

conda activate sammt
bash preprocess.sh

2. Train

bash train_mmt.sh

3. Test

# bash translate_mmt.sh $1 $2 $3
bash translate_mmt.sh clip test synth

Script parameters:

  • image feature ($1): choices=['clip']
  • test set ($2): choices=['test', 'test1', 'test2']
  • inference with synthetic or authentic images ($3): choices=['synth', 'authe']
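
To evaluate every test set with both image types, the invocations can be enumerated rather than typed by hand. The sketch below is a convenience wrapper, not part of the repository; `translate_commands` is a hypothetical helper that only builds the command lines from the choices listed above.

```python
import itertools
import shlex

# Choices copied from the parameter list above.
TEST_SETS = ["test", "test1", "test2"]
IMAGE_SOURCES = ["synth", "authe"]


def translate_commands(feature="clip"):
    """Return one translate_mmt.sh command (as an argument list) per combination."""
    return [
        ["bash", "translate_mmt.sh", feature, test_set, source]
        for test_set, source in itertools.product(TEST_SETS, IMAGE_SOURCES)
    ]


if __name__ == "__main__":
    # Print the commands; to execute them, pass each list to
    # subprocess.run(cmd, check=True) instead.
    for cmd in translate_commands():
        print(" ".join(shlex.quote(part) for part in cmd))
```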

Acknowledgements

This project is built on several open-source repositories/codebases, including:
