Dialogue Learning With Human-in-the-Loop
This project contains code for the dialog-based learning MemN2N setup used in the paper "Dialogue Learning with Human-in-the-Loop" (see the reference at the bottom of this file).
This code requires Torch7 and its luarocks packages cutorch, cunn, nngraph, torchx, and tds.
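If these packages are missing, they can typically be installed with luarocks, as sketched below (this assumes a working Torch7 installation; cutorch and cunn additionally require CUDA):

    luarocks install cutorch
    luarocks install cunn
    luarocks install nngraph
    luarocks install torchx
    luarocks install tds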
To get the synthetic data, run ./setup_data.sh from this directory (90M download, unpacks to 435M).
After running ./setup_data.sh, ./data/ contains the synthetic data used for simulations. It includes both bAbI tasks ("babi1_*") and WikiMovies ("movieQA_*") data.
A second dataset, containing human-annotated versions of the WikiMovies data, is also available. That data is in a slightly simpler format, so the code here does not yet run on it out of the box. It is a 4M download which unpacks to 14M.
The *.sh scripts in this directory provide examples of how to train the model on each of the datasets. As demonstrated there, to train, run:
th online_simulate.lua [params]
Available options are:
-batch_size (default 32): the batch size for model training
-token_size (default 0): number of tokens
-init_weight (default 0.1): initialization weights
-N_hop (default 3): number of hops
-lr (default 0.01): learning rate
-thres (default 40): threshold for gradient clipping
-gpu_index (default 1): which GPU to use
-dataset (default 'babi'): choose from 'babi' or 'movieQA'
-setting (default 'RBI'): choose from 'RBI' or 'FP'
-randomness (default 0.2): random exploration rate for epsilon-greedy (see the Lua sketch below)
-simulator_batch_size (default 32): the batch size for data generation (distinct from the model batch size)
-task (default 3): which task to test
-nepochs (default 20): number of training iterations
-negative (default 5): number of negative samples for FP
-REINFORCE (default false): whether to train with the REINFORCE algorithm
-REINFORCE_reg (default 0.1): entropy regularizer for the REINFORCE algorithm
-RF_lr (default 0.0005): learning rate used by the REINFORCE baseline
-log_freq (default 200): how often we log
-balance (default false): enable the label-balancing experience replay strategy for FP
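For example, a sample invocation combining several of the options above (the parameter values here are illustrative, not recommendations):

    th online_simulate.lua -dataset babi -setting RBI -task 3 -nepochs 20 -gpu_index 1 -randomness 0.2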
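The -randomness option sets the epsilon of epsilon-greedy exploration during data generation. For intuition, here is a minimal Lua/Torch sketch of epsilon-greedy selection; the function name and the 'scores' tensor are hypothetical stand-ins, not this repository's actual code:

    -- Epsilon-greedy choice over candidate-answer scores.
    -- scores: 1D torch.Tensor of per-candidate scores
    -- eps:    exploration rate, as set by -randomness
    local function epsilonGreedy(scores, eps)
      if torch.uniform() < eps then
        -- explore: pick a uniformly random candidate
        return torch.random(1, scores:size(1))
      else
        -- exploit: pick the highest-scoring candidate
        local _, idx = scores:max(1)
        return idx[1]
      end
    end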
- Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato and Jason Weston, "Dialogue Learning with Human-in-the-Loop", arXiv:1611.09823 [cs.AI].