misonsky/MultilingualDatasets

Zero-shot Dialog Generation

Datasets

This repository provides multilingual versions of the DailyDialog and DSTC7 datasets in seven languages: English, Chinese, German, Russian, Spanish, French, and Italian. You can download the datasets from the following link:

MultilingualDatasets

Models

We provide hred, vhcr, vhred, hran, transformer, and HTransformer baselines. Our manuscript reports experimental results only for hred, vhred, transformer, and HTransformer.

This code uses the mBERT tokenizer.

How to run

  1. Prepare the datasets:
 python run.py \
 	--corpus DailyDialog \
 	--do_prepare
  2. Train a model with multilingual learning (set the --model parameter to switch between models):
 python run.py \
 	--corpus DailyDialog \
 	--do_train \
 	--hier true \
 	--model hred \
 	--bidirectional true
  3. Evaluate:
 python run.py \
 	--corpus DailyDialog \
 	--do_eval \
 	--hier true \
 	--model hred \
 	--bidirectional true
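
To train or evaluate several baselines in turn, the commands above can be generated programmatically. The following is a minimal sketch, not part of the repository: the helper name build_command is hypothetical, the model names are taken from the Models section and may not match the exact strings run.py accepts, and it assumes every model takes the same --hier and --bidirectional settings shown for hred.

```python
import shlex

# Model names as listed in the Models section (assumed to be valid --model values).
MODELS = ["hred", "vhcr", "vhred", "hran", "transformer", "HTransformer"]


def build_command(model, corpus="DailyDialog", mode="train"):
    """Return the argument list for one run.py invocation (hypothetical helper)."""
    if mode not in ("train", "eval"):
        raise ValueError(f"unsupported mode: {mode}")
    return [
        "python", "run.py",
        "--corpus", corpus,
        f"--do_{mode}",            # --do_train or --do_eval
        "--hier", "true",          # assumption: same setting for every model
        "--model", model,
        "--bidirectional", "true", # assumption: same setting for every model
    ]


if __name__ == "__main__":
    # Print one training command per baseline; pass a list to
    # subprocess.run(...) instead if you want to execute it.
    for name in MODELS:
        print(shlex.join(build_command(name)))
```

Printing the commands first makes it easy to check them against the steps above before launching any long-running job.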

Case Study

A case study is provided in Table 9 to demonstrate the value of augmented data. Without data augmentation, the responses of HRED and VHRED contain context-independent information: specifically, "like my experience and i have a good idea" in HRED and "i like sports" in VHRED are unrelated to the context. With data augmentation, the responses generated by HRED and VHRED are more informative and more coherent. This suggests that multilingual data augmentation lets models draw on cross-lingual knowledge to generate more informative and coherent responses.
