Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management
This is the codebase for the paper "Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management".
To reproduce our results, follow these three steps:
- Train an autoencoder.
- Train a multi-GAN network based on the trained autoencoder.
- Fix (freeze) the trained reward function and use it to guide dialogue policy learning in ConvLab.
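
As a rough illustration of the last step, the sketch below shows one common way to freeze a trained model so that it is only queried for rewards during policy learning. It assumes the reward function is a PyTorch `nn.Module`; the `reward_model` architecture, the `freeze` helper, and the input dimensions are hypothetical placeholders, not the actual code in this repository.

```python
import torch
from torch import nn

# Hypothetical stand-in for the trained reward model; the real architecture
# and checkpoint come from the first two steps and are not shown here.
reward_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))


def freeze(model: nn.Module) -> nn.Module:
    """Disable gradient updates and switch the model to evaluation mode."""
    for param in model.parameters():
        param.requires_grad_(False)
    return model.eval()


reward_model = freeze(reward_model)

# During policy learning the reward model is only queried, never trained.
state_action = torch.randn(4, 8)  # dummy batch of state-action features
with torch.no_grad():
    rewards = reward_model(state_action)
```

With the parameters frozen, an optimizer built for the dialogue policy cannot accidentally update the reward function, which keeps the reward signal stable across training runs.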