
Dialogue Learning With Human-in-the-Loop

This project contains code for the dialog-based learning MemN2N setup in the following paper: "Dialogue Learning with Human-in-the-Loop".


This code requires Torch7 and its luarocks packages cutorch, cunn, nngraph, torchx, and tds.
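The luarocks packages above can be installed one at a time. As a sketch (assuming Torch7 and its `luarocks` binary are already on your PATH), the loop below only prints the install commands so they can be reviewed before running:

```shell
# Print the luarocks install commands for each required package.
# Drop the `echo` to actually install them (requires Torch7's luarocks).
for pkg in cutorch cunn nngraph torchx tds; do
  echo "luarocks install $pkg"
done
```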

To get the synthetic data, from this directory first run the provided data setup script, which downloads the data (a 90M download that unpacks to 435M).


After running the setup script, ./data/ contains the synthetic data for the simulations.

The synthetic data includes the bAbI tasks ("babi1_*") and WikiMovies ("movieQA_*") data.

We additionally make available another dataset containing human-annotated versions of the WikiMovies data. That data is in a slightly simpler format, so the code here does not yet run on it out of the box. It is a 4M download that unpacks to 14M.


You can use the *.sh scripts as examples of how to train the model on each of the datasets.

As demonstrated there, to train, run:

th online_simulate.lua [params]

Available options are:

-batch_size     (default 32, the batch size for model training)
-token_size     (default 0, number of tokens)
-init_weight    (default 0.1, scale used to initialize the weights)
-N_hop          (default 3, number of hops)
-lr             (default 0.01, learning rate)
-thres          (default 40, threshold for gradient clipping)
-gpu_index      (default 1, which GPU to use)
-dataset        (default 'babi', choose from 'babi' or 'movieQA')
-setting        (default 'RBI', choose from 'RBI' or 'FP')
-randomness     (default 0.2, random exploration rate for epsilon greedy)
-simulator_batch_size   (default 32, the batch size used for data generation; may differ from the model batch size)
-task           (default 3, which task to test)
-nepochs        (default 20, number of training epochs)
-negative       (default 5, number of negative samples for FP)
-REINFORCE      (default false, whether to train with the REINFORCE algorithm)
-REINFORCE_reg  (default 0.1, entropy regularizer for the REINFORCE algorithm)
-RF_lr          (default 0.0005, learning rate used by the REINFORCE baseline)
-log_freq       (default 200, how often we log)
-balance        (default false, enable the label-balancing experience replay strategy for FP)
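Putting the options together, a typical invocation might look like the following. The flag values here are illustrative examples drawn from the defaults above, not the paper's exact settings; the command is echoed rather than executed so it can be inspected first (actually running it requires Torch7, a GPU, and the downloaded data):

```shell
# Illustrative command line for online_simulate.lua; values are examples only.
# Echoed instead of executed so it can be checked before a real run.
CMD="th online_simulate.lua -dataset babi -setting RBI -task 3 -batch_size 32 -lr 0.01 -nepochs 20 -randomness 0.2 -gpu_index 1"
echo "$CMD"
```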