This repository provides multilingual versions of the DailyDialog and DSTC7 datasets, covering seven languages: English, Chinese, German, Russian, Spanish, French and Italian. You can download the datasets from the following link:
We provide hred, vhcr, vhred, hran, transformer and HTransformer baselines; our manuscript reports experimental results only for hred, vhred, transformer and HTransformer.
- prepare the datasets
python run.py \
--corpus DailyDialog \
--do_prepare
- train a model with multilingual learning (set the --model parameter to switch between models)
python run.py \
--corpus DailyDialog \
--do_train \
--hier true \
--model hred \
--bidirectional true
- evaluate
python run.py \
--corpus DailyDialog \
--do_eval \
--hier true \
--model hred \
--bidirectional true
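To run the three steps above for several baselines in one go, the commands can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `plan` are hypothetical, and it assumes that the `--hier true` and `--bidirectional true` flags shown for hred also apply to the other model you pass in (check this for non-hierarchical models before use).

```python
import subprocess


def build_command(corpus, mode, model=None):
    """Build the argument list for one run.py call (hypothetical helper).

    mode is "prepare", "train" or "eval", matching the --do_* flags above.
    """
    if mode not in ("prepare", "train", "eval"):
        raise ValueError(f"unknown mode: {mode}")
    cmd = ["python", "run.py", "--corpus", corpus, f"--do_{mode}"]
    if mode != "prepare":
        if model is None:
            raise ValueError("train and eval need a model name")
        # Assumption: the same flags as in the hred example are reused as-is.
        cmd += ["--hier", "true", "--model", model, "--bidirectional", "true"]
    return cmd


def plan(corpus, models):
    """Prepare the corpus once, then train and evaluate each model in turn."""
    commands = [build_command(corpus, "prepare")]
    for model in models:
        commands.append(build_command(corpus, "train", model))
        commands.append(build_command(corpus, "eval", model))
    return commands


def run_all(corpus, models, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan(corpus, models):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    run_all("DailyDialog", ["hred", "vhred"])
```

By default the script only prints the commands it would run; pass `dry_run=False` to `run_all` from the repository root to actually execute them.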
A case study is provided in Table 9 to demonstrate the value of the augmented data. Without data augmentation, the responses of HRED and VHRED contain context-independent information. Specifically, "like my experience and i have a good idea" in HRED and "i like sports" in VHRED are unrelated to the dialogue context. With data augmentation, the responses generated by HRED and VHRED are more informative and more coherent. This suggests that multilingual data augmentation lets models draw on cross-lingual knowledge to generate more informative and coherent responses.