The official repo for the paper "Teacher Forcing Recovers Reward Functions for Text Generation"

MANGA-UOFA/LMReward

Teacher Forcing Recovers Reward Functions for Text Generation

This is the official code repo for the paper Teacher Forcing Recovers Reward Functions for Text Generation.

Setup

Download the code

# ensure virtual environment
git clone https://github.com/MANGA-UOFA/LMReward
cd LMReward
pip install -r requirements.txt

Prepare dataset

You can download the deduplicated dialogue datasets here.

For the Quora dataset, you can download it here.

Run

Train a reward model

First, fill in all the variables in scripts/teacher.sh. Executing the script then trains a reward model using teacher forcing. The trained reward model also serves as the initialization point for the next step.
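For readers unfamiliar with the term, the sketch below illustrates teacher forcing on a toy bigram model: at every step the model is conditioned on the reference token rather than on its own prediction, and the loss is the average negative log-likelihood of the reference sequence. This is a minimal illustration under stated assumptions, not the code behind scripts/teacher.sh; the names `bigram_logits`, `teacher_forcing_nll`, and `bos_id` are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def teacher_forcing_nll(bigram_logits, target_ids, bos_id=0):
    """Average negative log-likelihood of target_ids under a toy bigram model.

    bigram_logits[prev][nxt] is the (hypothetical) logit of token `nxt`
    given the previous token `prev`.
    """
    prev = bos_id
    total = 0.0
    for tok in target_ids:
        total -= log_softmax(bigram_logits[prev])[tok]
        # Teacher forcing: the next step is conditioned on the reference
        # token, not on whatever the model would have predicted.
        prev = tok
    return total / len(target_ids)


if __name__ == "__main__":
    vocab_size = 4
    uniform = [[0.0] * vocab_size for _ in range(vocab_size)]
    print(teacher_forcing_nll(uniform, [1, 2, 3]))
```

With uniform logits every token has probability 1/4, so the printed loss equals log(4). A real model would replace the bigram table with a neural sequence model, but the conditioning on reference prefixes is the same idea.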

REINFORCE with the reward

Fill in all the variables in scripts/reinforce.sh. This step should use the non-parallel data and the reward model trained in the previous step.
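As background, the following sketch shows the generic REINFORCE estimator with step-wise rewards: each sampled token's log-probability gradient is weighted by the return accumulated from that step onward. It assumes the rewards are already available as plain numbers and computes gradients with respect to the per-step logits only. It is not the repository's implementation, which may use a different return definition, baseline, or stabilization; the function names and the `gamma` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def returns_to_go(rewards, gamma=1.0):
    """Discounted sum of rewards from each step to the end of the sequence."""
    out = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


def reinforce_logit_grads(step_logits, actions, rewards, gamma=1.0):
    """Gradient-ascent direction on each step's logits.

    For a softmax policy, d log pi(a) / d logits = onehot(a) - softmax(logits),
    so the REINFORCE estimate at step t is G_t * (onehot(a_t) - probs_t).
    """
    grads = []
    returns = returns_to_go(rewards, gamma)
    for logits, action, g in zip(step_logits, actions, returns):
        probs = softmax(logits)
        grads.append(
            [g * ((1.0 if i == action else 0.0) - p) for i, p in enumerate(probs)]
        )
    return grads


if __name__ == "__main__":
    print(returns_to_go([1.0, 2.0, 3.0]))
    print(reinforce_logit_grads([[0.0, 0.0]], [0], [1.0]))
```

A positive return pushes the logit of the sampled token up and the others down; a negative return does the opposite. In practice the logits come from a neural model and an autodiff framework propagates these gradients to its parameters.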

Validation and test

You can run scripts/evaluate.sh to decode all checkpoints.

Cite our work

If you find this repo helpful, please consider citing our work:

@inproceedings{
    hao2022teacher,
    title={Teacher Forcing Recovers Reward Functions for Text Generation},
    author={Yongchang Hao and Yuxin Liu and Lili Mou},
    booktitle={Thirty-Sixth Conference on Neural Information Processing Systems},
    year={2022},
    url={https://openreview.net/forum?id=1_gypPuWUC3}
}

Disclaimer

The code has been refactored for public release and has not been tested extensively. If you run into any problems or have any concerns, please open an issue.
