
DynaOpt: Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation

Code for the paper "Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation" (COLING 2024).

Quickstart

1. Prerequisites

Clone the repository:

git clone https://github.com/mindojune/dynaopt.git
cd dynaopt

Next, create a new environment and install the required packages:

conda create -n dynaopt python=3.9
conda activate dynaopt
pip install -r requirements.txt

2. Download the pretrained weight for the reflection scorer.

Download the weight here. Put the weight inside dynaopt/weights.
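
As an optional sanity check that the checkpoint ended up in the expected location, something like the snippet below can be run from the repository root. The filename reflection_scorer.pt is a placeholder assumption; substitute the actual name of the downloaded file.

from pathlib import Path
import torch

# Placeholder filename; replace with the actual name of the downloaded checkpoint.
weight_path = Path("weights") / "reflection_scorer.pt"
assert weight_path.exists(), f"Expected the reflection scorer checkpoint at {weight_path}"

# Load on CPU just to confirm the file is a readable PyTorch checkpoint.
checkpoint = torch.load(weight_path, map_location="cpu")
print(f"Checkpoint loaded ({type(checkpoint).__name__})")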

3. Running the models

First, train the warm-start model in a supervised manner.

python supervised_train.py --experiment MI --num_epochs 5

Save the path of your trained model, then run the inference step over the test data.

start_dir={your supervised model}
python test_util.py --experiment MI_rl --model_start_dir $start_dir

Next, you can train the RL models. We use the k-self-critical sequence training algorithm in this project (see the sketch after the commands below). Refer to the following table to train the different models.

Model              Script            --learning_mode flag
Round              rl_train.py       round
Uniform Weighted   rl_train.py       weighted
DORB               rl_train.py       bandit
DynaOpt (Ours)     rl_train.py       bandit_weighted
C-Dynaopt (Ours)   con_rl_train.py   (none)
# $i denotes the random seed.
# Round
python rl_train.py --learning_mode round --seed $i --experiment MI_rl --model_start_dir $start_dir
# Uniform Weighted
python rl_train.py --learning_mode weighted --seed $i --experiment MI_rl --model_start_dir $start_dir
# DORB
python rl_train.py --learning_mode bandit --seed $i --experiment MI_rl --model_start_dir $start_dir
# DynaOpt (Ours)
python rl_train.py --learning_mode bandit_weighted --seed $i --experiment MI_rl --model_start_dir $start_dir
# C-Dynaopt (Ours)
python con_rl_train.py --seed $i --experiment MI_rl --model_start_dir $start_dir
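
To make the training objective concrete, here is a minimal sketch of the k-self-critical sequence training loss with a weighted combination of multiple rewards: k responses are sampled per prompt, and the mean reward of the k samples serves as the baseline. The sketch is illustrative only; the reward names, the combine_rewards helper, and the uniform weights are assumptions for the example rather than the repository's actual interface, and the dynamic variants (DORB, DynaOpt) adjust the weights during training instead of keeping them fixed.

import torch

def k_scst_loss(seq_logprobs, rewards):
    """k-self-critical sequence training loss (after Laban et al., 2021).

    seq_logprobs: [batch, k] summed log-probabilities of the k sampled generations
    rewards:      [batch, k] scalar reward of each sampled generation
    The mean reward over the k samples is the baseline, so no greedy rollout is needed.
    """
    baseline = rewards.mean(dim=1, keepdim=True)   # [batch, 1]
    advantage = rewards - baseline                 # [batch, k]
    # Maximizing expected reward = minimizing the negative advantage-weighted log-prob.
    return -(advantage.detach() * seq_logprobs).mean()

def combine_rewards(scores, weights):
    """Weighted sum of per-objective reward tensors."""
    return sum(weights[name] * s for name, s in scores.items())

# Toy usage: batch of 2 prompts, k = 3 sampled responses each, with random numbers
# standing in for model log-probs and reward-model scores.
batch, k = 2, 3
seq_logprobs = torch.randn(batch, k, requires_grad=True)
scores = {
    "reflection": torch.rand(batch, k),
    "fluency": torch.rand(batch, k),
    "coherence": torch.rand(batch, k),
}
weights = {"reflection": 1 / 3, "fluency": 1 / 3, "coherence": 1 / 3}  # uniform weighting
loss = k_scst_loss(seq_logprobs, combine_rewards(scores, weights))
loss.backward()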

Finally, compute the statistics on the test data.

python compute_stats.py --dir ./outputs

License

Our project is licensed under the Apache License 2.0, ensuring open access and collaboration, with due credit given to the original work which forms the backbone of our codebase.

This code makes use of the Keep It Simple code by Laban et al.

Specifically, we use their implementation of the k-self-critical sequence training algorithm.

@inproceedings{laban2021keep_it_simple,
  title={Keep It Simple: Unsupervised Simplification of Multi-Paragraph Text},
  author={Philippe Laban and Tobias Schnabel and Paul N. Bennett and Marti A. Hearst},
  booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics},
  volume={1},
  year={2021}
}

Cite the work

If you make use of the code, models, or algorithm, please cite our paper:

@inproceedings{min-etal-2024-bandit,
  title={Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation},
  author={Min, Do June and P{\'e}rez-Rosas, Ver{\'o}nica and Resnicow, Kenneth and Mihalcea, Rada},
  booktitle={COLING 2024},
  year={2024}
}
