Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation

This repository contains the code for the EMNLP'23 submission "Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation".

Get started

Text-to-image Generation Environment:
conda create -n stable python==3.8
conda activate stable
pip install torch==2.0.1
pip install Pillow==9.5.0
pip install transformers==4.27.4
pip install diffusers==0.16.1
pip install scipy==1.10.1
pip install accelerate==0.18.0
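
After installation, the pinned versions above can be checked from inside the environment. The following is a minimal sketch, not part of the repository: the names `PINNED` and `version_mismatches` are hypothetical, and it assumes Python 3.8+ (for `importlib.metadata`). Local build suffixes such as `+cu117` are ignored when comparing.

```python
from importlib import metadata

# Versions pinned in the "Text-to-image Generation Environment" commands.
PINNED = {
    "torch": "2.0.1",
    "Pillow": "9.5.0",
    "transformers": "4.27.4",
    "diffusers": "0.16.1",
    "scipy": "1.10.1",
    "accelerate": "0.18.0",
}


def version_mismatches(pinned=PINNED, lookup=metadata.version):
    """Return {package: (wanted, found)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        # Compare without a local build suffix, e.g. "2.0.1+cu117" -> "2.0.1".
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches().items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means every pinned package is installed at the expected version.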

Training environment:
conda create -n sammt python==3.6.7
conda activate sammt
pip install -r requirements.txt
pip install --editable ./

Data

Multi30K texts and images can be downloaded here and here. We get Multi30K text data from fairseq_mmt.

cd fairseq_sammt
git clone https://github.com/multi30k/dataset.git
git clone https://github.com/BryanPlummer/flickr30k_entities.git
# Organize the downloaded data into the following layout
flickr30k
├─ flickr30k-images
├─ test_2017_flickr
└─ test_2017_mscoco
multi30k-dataset
└─ data
    └─ task1
        ├─ tok
        └─ image_splits
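
Before running the later steps, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and the check assumes it is run from the directory that contains `flickr30k` and `multi30k-dataset`.

```python
from pathlib import Path

# Directories taken from the layout above, relative to the working directory.
EXPECTED_DIRS = [
    "flickr30k/flickr30k-images",
    "flickr30k/test_2017_flickr",
    "flickr30k/test_2017_mscoco",
    "multi30k-dataset/data/task1/tok",
    "multi30k-dataset/data/task1/image_splits",
]


def missing_dirs(root="."):
    """Return the expected directories that do not exist under `root`."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("Dataset layout looks complete.")
```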

Text-to-image Generation

conda activate stable
python train_stable_diffusion_step50.py train

script parameters:

dataset: $1: choices=['train','valid','test', 'test1', 'test2']
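
For orientation, the generation step can be sketched with the diffusers library from the environment above. This is not the repository's `train_stable_diffusion_step50.py`: the function names, the one-caption-per-line input file, the `<index>.png` output naming, and the `model_id` argument are all assumptions, and the default of 50 denoising steps is only inferred from the script name.

```python
from pathlib import Path


def read_captions(path):
    """Read one caption per line; line order is kept so indices stay aligned."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]


def generate_images(captions, out_dir, model_id, steps=50):
    """Generate one image per caption and save it as <index>.png (assumed naming)."""
    # Imported lazily so read_captions works without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, caption in enumerate(captions):
        # The pipeline returns a list of PIL images; one prompt gives one image.
        image = pipe(caption, num_inference_steps=steps).images[0]
        image.save(out / f"{i}.png")
```

A call might look like `generate_images(read_captions("captions.en"), "synth/train", "CompVis/stable-diffusion-v1-4")`, where the caption file, output directory, and checkpoint name are placeholders rather than values confirmed by this repository.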

Extract Image Feature

conda activate sammt
python image_process.py train synth

script parameters:

dataset: $1: choices=['train','valid','test', 'test1', 'test2']
synthetic or authentic images: $2: choices=['synth','authe']

The pre-extracted image features can also be downloaded here.
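
As an illustration of what this step produces, the sketch below encodes images with CLIP, the only image feature type listed in the test parameters. It uses the Hugging Face transformers CLIP classes, which may differ from what `image_process.py` does. The function names and the default checkpoint `openai/clip-vit-base-patch32` are assumptions; only the argument choices come from the parameter list above.

```python
SPLITS = ["train", "valid", "test", "test1", "test2"]  # choices for $1
SOURCES = ["synth", "authe"]                           # choices for $2


def validate_args(split, source):
    """Check the two positional arguments against the documented choices."""
    if split not in SPLITS:
        raise ValueError(f"dataset must be one of {SPLITS}, got {split!r}")
    if source not in SOURCES:
        raise ValueError(f"image source must be one of {SOURCES}, got {source!r}")
    return split, source


def extract_clip_features(image_paths, model_id="openai/clip-vit-base-patch32"):
    """Return an (N, D) tensor holding one CLIP image embedding per path."""
    # Imported lazily so validate_args works without these packages installed.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    if not image_paths:
        raise ValueError("image_paths is empty")

    model = CLIPModel.from_pretrained(model_id).eval()
    processor = CLIPProcessor.from_pretrained(model_id)

    features = []
    with torch.no_grad():
        for path in image_paths:
            image = Image.open(path).convert("RGB")
            inputs = processor(images=image, return_tensors="pt")
            features.append(model.get_image_features(**inputs))
    return torch.cat(features, dim=0)
```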

Train and Test

1. Preprocess

conda activate sammt
bash preprocess.sh

2. Train

bash train_mmt.sh

3. Test

# bash translate_mmt.sh $1 $2 $3
bash translate_mmt.sh clip test synth

script parameters:

image feature: $1: choices=['clip']
test set: $2: choices=['test', 'test1', 'test2']
inference with synthetic or authentic images: $3: choices=['synth', 'authe']
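
To evaluate every test set with both image types, the command above can be repeated over all documented parameter combinations. The following is a minimal sketch, not part of the repository: `build_commands` and `run_all` are hypothetical helpers, and `run_all` assumes it is called from the directory containing `translate_mmt.sh` with the `sammt` environment active.

```python
import itertools
import subprocess

TEST_SETS = ["test", "test1", "test2"]  # choices for $2
IMAGE_SOURCES = ["synth", "authe"]      # choices for $3


def build_commands(feature="clip"):
    """Build one translate_mmt.sh invocation per (test set, image source) pair."""
    return [
        ["bash", "translate_mmt.sh", feature, test_set, source]
        for test_set, source in itertools.product(TEST_SETS, IMAGE_SOURCES)
    ]


def run_all(feature="clip"):
    """Run every combination in order, stopping at the first failing command."""
    for cmd in build_commands(feature):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_all()` runs six evaluations, starting with `bash translate_mmt.sh clip test synth`.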

Acknowledgements

This project is built on several open-source repositories/codebases, including:
