LACMA (Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following)

This is the codebase to reproduce LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following (EMNLP 2023).

Cheng-Fu Yang, Yen-Chun Chen, Jianwei Yang, Xiyang Dai, Lu Yuan, Yu-Chiang Frank Wang, Kai-Wei Chang

Installation

Clone repo:

$ git clone https://github.com/joeyy5588/LACMA.git LACMA
$ export LACMA_ROOT=$(pwd)/LACMA
$ export LACMA_LOGS=$LACMA_ROOT/logs
$ export LACMA_DATA=$LACMA_ROOT/data
$ export PYTHONPATH=$PYTHONPATH:$LACMA_ROOT
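
If you want these variables to persist across shell sessions, one option is to append them to your shell profile. This is a minimal sketch assuming bash and a clone location of $HOME/LACMA; adjust the path to wherever you cloned the repo:

$ cat >> ~/.bashrc <<'EOF'
# LACMA environment (assumed clone location: $HOME/LACMA)
export LACMA_ROOT=$HOME/LACMA
export LACMA_LOGS=$LACMA_ROOT/logs
export LACMA_DATA=$LACMA_ROOT/data
export PYTHONPATH=$PYTHONPATH:$LACMA_ROOT
EOF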

Install requirements:

$ conda create -n lacma_env python=3.7
$ conda activate lacma_env
$ cd $LACMA_ROOT
$ pip install --upgrade pip
$ pip install -r requirements.txt

Running on a remote headless server:

$ tmux
$ sudo python scripts/startx.py 0
$ export DISPLAY=:0
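
To check that the X server actually came up before rendering or evaluating, you can query it; this is a quick sanity check assuming xdpyinfo (from x11-utils) is installed:

$ xdpyinfo -display :0 >/dev/null && echo "X server on :0 is up" || echo "X server on :0 is not reachable"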

Preparing data

Download the ALFRED dataset:

$ cd $LACMA_DATA
$ sh download_data.sh json_feat

Download the pretrained detection model from Google Drive or with gdown:

$ gdown 1lXEIX3cM6iVanRiVGYM_E46pJM71bkFQ
$ mkdir $LACMA_LOGS/pretrained
$ unzip pretrained.zip -d $LACMA_LOGS/pretrained
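
If the gdown command is not available in your environment, it can be installed from PyPI first:

$ pip install gdown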

Roll out the trajectories to obtain the field-of-view (FOV) images, or download them with the command below:

# Manually rollout the trajectories
$ python -m alfred.gen.render_trajs

# Or download the rollout images
$ wget https://acvrpublicycchen.blob.core.windows.net/lacma/data.tar
$ tar -xvf data.tar -C $LACMA_DATA

Create an LMDB dataset with natural language annotations:

$ python -m alfred.data.create_lmdb with args.visual_checkpoint=$LACMA_LOGS/pretrained/fasterrcnn_model.pth args.data_output=lmdb_human args.vocab_path=$LACMA_ROOT/files/human.vocab

Note #1: For rendering, you may need to set args.x_display to the display number of an X server running on your machine (see the example below).
Note #2: We do not use the JPG images from the full dataset, as they would differ from the images rendered during evaluation due to JPG compression.
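
For example, if your X server runs on display 1, the rendering command above would presumably take the override like this (a sketch, assuming args.x_display is exposed through the same sacred-style with syntax as the other entry points):

$ python -m alfred.gen.render_trajs with args.x_display=1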

Parse meta-actions from the low-level action sequences (Algorithm 1 and Algorithm 2 in the paper):

$ python scripts/meta_action.py [data_split] # train/valid_seen/valid_unseen
$ python scripts/write_vocab.py
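
For reference, all three splits can be processed with a single loop (equivalent to running meta_action.py once per split):

$ for split in train valid_seen valid_unseen; do python scripts/meta_action.py $split; done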

Pretrained models evaluation

Evaluate LACMA on the ALFRED dataset:

$ python -m alfred.eval.eval_agent with eval.exp=pretrained eval.checkpoint=$LACMA_LOGS/pretrained/lacma_pretrained.pth eval.object_predictor=$LACMA_LOGS/pretrained/maskrcnn_model.pth exp.num_workers=5 eval.eval_range=None eval.split=valid_seen exp.data.valid=lmdb_human

Note: make sure that your LMDB database is named exactly lmdb_human, as the word embeddings will not be loaded otherwise.

Note: For evaluation, you may need to set eval.x_display to the display number of an X server running on your machine (see the example below).
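
For example, if the X server started earlier runs on display 0, the override would presumably be appended to the evaluation command like so (a sketch using the same sacred-style with syntax):

$ python -m alfred.eval.eval_agent with eval.exp=pretrained eval.checkpoint=$LACMA_LOGS/pretrained/lacma_pretrained.pth eval.object_predictor=$LACMA_LOGS/pretrained/maskrcnn_model.pth exp.num_workers=5 eval.eval_range=None eval.split=valid_seen exp.data.valid=lmdb_human eval.x_display=0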

Language pretraining

Language encoder pretraining with the translation objective:

$ python -m alfred.model.train with exp.model=speaker exp.name=translator exp.data.train=lmdb_human

Note: you may need to train up to 5 agents using different random seeds to reproduce the results of the paper.
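
A minimal sketch of training 5 translator agents with different seeds; the per-seed experiment names here are hypothetical, and downstream exp.pretrained_path values would need to point at the corresponding run directory:

$ for seed in 1 2 3 4 5; do python -m alfred.model.train with exp.model=speaker exp.name=translator_seed$seed exp.data.train=lmdb_human train.seed=$seed; done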

Meta-action contrastive pretraining

$ python -m alfred.model.train with exp.model=newmeta exp.name=meta_pretrain exp.data.train=lmdb_human train.seed=42 exp.pretrained_path=logs/translator/model_19.pth

Finetuning with low-level actions

$ python -m alfred.model.train with exp.model=newmeta exp.name=meta_finetune exp.data.train=lmdb_human train.seed=42 exp.pretrained_path=logs/meta_pretrain/model_19.pth

When finetuning, comment out L281 and L290, and uncomment L272 in alfred/model/newmeta.py.

Evaluate the trained LACMA agent:

$ python -m alfred.eval.eval_agent with eval.exp=meta_finetune eval.object_predictor=$LACMA_LOGS/pretrained/maskrcnn_model.pth exp.num_workers=5 eval.eval_range=None eval.split=valid_seen 
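
To evaluate the same agent on the unseen validation split, presumably only eval.split needs to change:

$ python -m alfred.eval.eval_agent with eval.exp=meta_finetune eval.object_predictor=$LACMA_LOGS/pretrained/maskrcnn_model.pth exp.num_workers=5 eval.eval_range=None eval.split=valid_unseen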

Acknowledgements

This project heavily borrows from Episodic Transformer by Alexander Pashevich. Huge thanks to them for their work!

Citation

If you find this repository useful, please cite our work:

@inproceedings{yang2023lacma,
  title = {LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following},
  author = {Yang, Cheng-Fu and Chen, Yen-Chun and Yang, Jianwei and Dai, Xiyang and Yuan, Lu and Wang, Yu-Chiang Frank and Chang, Kai-Wei},
  booktitle = {EMNLP},
  year = {2023}
}
