Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior

This is the code repository for the paper: "Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior", Noriyuki Kojima, Alane Suhr, and Yoav Artzi (TACL 2021, presented at EMNLP 2021).

About

paper | arXiv | talk | project page

We study continual learning for natural language instruction generation by observing human users' instruction execution. We focus on a collaborative scenario, where the system both acts and delegates tasks to human users using natural language. We compare user execution of generated instructions to the original system intent as an indication of the system's success in communicating its intent. We show how to use this signal to improve the system's ability to generate instructions via contextual bandit learning. In interaction with real users, our system demonstrates dramatic improvements in its ability to generate language over time.
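
To make the learning signal concrete, below is a minimal, generic sketch of a contextual-bandit-style (reward-weighted likelihood) update. It is not the exact objective implemented in this repository; the model API, batch fields, and reward scaling are all hypothetical and for illustration only.

import torch

def bandit_step(model, optimizer, batch):
    # batch["context"], batch["instruction"]: logged interaction data.
    # batch["reward"]: per-example scalar, assumed to be derived offline by
    # comparing the human user's execution of the generated instruction to
    # the system's intended plan.
    log_prob = model.instruction_log_prob(batch["context"], batch["instruction"])  # hypothetical API
    loss = -(batch["reward"] * log_prob).mean()  # reward-weighted negative log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()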

Codebase

Installation

  1. Create the conda environment: conda create -n cb_gen python=3.7
  2. Clone the repo.
  3. Install the requirements: pip install -r requirements.txt
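
Put together, installation looks roughly like the following (the clone URL is inferred from the repository name; adjust paths as needed):

# Create and activate the environment, clone the repository, and install dependencies.
conda create -n cb_gen python=3.7
conda activate cb_gen
git clone https://github.com/lil-lab/cerealbar_generation.git
cd cerealbar_generation
pip install -r requirements.txt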

Subdirectories

  • model/ defines the model architectures for instruction generation in CerealBar.
  • agents/ defines information about the CerealBar agent and environment.
  • learning/ is used to train models.
  • data/ contains data.
  • checkpoints/ contains model checkpoints.

Training models

Pre-training models on human-human interaction data

  1. Download the formatted GPT-2 weights from the link and put them under checkpoints/gpt-2.
  2. Follow data/README.md to download the processed human-human interaction data.
python -m learning.training --train_config_file_name learning/configs/pretraining.yml --experiment_name pretraining --max_epochs 400 --checkpoint_step 40000 --turnoff_wandb

Training models from scratch on aggregated human-human & human-system interaction data

  1. Follow data/README.md to download the processed human-system interaction data.
# This example trains a model on the aggregated data after the second round of human-system interactions.
python -m learning.training --train_config_file_name learning/configs/continual_example.yml --experiment_name pretraining --max_epochs 400 --checkpoint_step 40000 --turnoff_wandb

Evaluating models on human-written instructions using automated evaluation metrics

  1. Make sure you have already downloaded the processed human-human interaction data by following data/README.md.
  2. Refer to checkpoints/README.md to download checkpoints for the trained models, and change pretrained_checkpoint_path in learning/configs/eval.yml to your desired checkpoint path (see the example line after this list).
  3. Note that all evaluation scores reported in our paper are based solely on human users actually playing the games; we do not report automated evaluation metrics computed against human-written instructions.
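
For example, the relevant line in learning/configs/eval.yml might look like the following; the value is a placeholder for your checkpoint path:

pretrained_checkpoint_path: <path/to/your/checkpoint>
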
# This example evaluates one of the models deployed at Round 10
python -m learning.training --train_config_file_name learning/configs/eval.yml --experiment_name eval_automated --turnoff_wandb --validate_only

Notes

The current release of the repo does not include the following components:

  • A pipeline for human-system interactions (e.g., the Unity game, the backend server implementation that communicates with the Unity game, the implementation of the deterministic planner that generates the system's path plans, and scripts to evaluate and process interaction data).
  • A visualization tool for interpreting model behavior using the Unity graphics.

License

MIT (partially Apache License 2.0; see model/instruction_generator_model.py)

Citing

If you find our work useful in your research, please consider citing the following paper:

@article{Kojima2021:gen-learn,
  author  = {Kojima, Noriyuki and Suhr, Alane and Artzi, Yoav},
  title   = {Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior},
  journal = {Transactions of the Association for Computational Linguistics},
  volume  = {9},
  pages   = {1303-1319},
  year    = {2021},
  month   = {12},
  issn    = {2307-387X},
  doi     = {10.1162/tacl_a_00428},
  url     = {https://doi.org/10.1162/tacl\_a\_00428},
  eprint  = {https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl\_a\_00428/1976207/tacl\_a\_00428.pdf}
}

Acknowledgements

This research was supported by ARO W911NF21-1-0106, a Google Focused Award, Masason Foundation, a Facebook Fellowship, and NSF under grants No. 1750499 and DGE-1650441. We thank Jonathan Chang, Sasha Rush, the Cornell NLP Group, Robert Hawkins, Dipendra Misra, and John Langford for discussion and comments; Suyi Diao for Unity development; Anna Effenberger for code to compute syntax complexity; Ge Gao, Koji Shiono, and Takayuki Kojima for feedback on our interaction platform; and the crowdsourcing workers for participating in our data collection. Finally, we thank the action editor and the anonymous reviewers for detailed comments.
