PRAL

Code for the paper: A Tailored Pre-Training Model for Task-Oriented Dialog Generation

Pretrain Dataset

For the pre-training dataset, first clone the dataset repository.

# download
git clone https://github.com/qywu/DialogCorpus.git
cd DialogCorpus

You can download and process each dataset manually, for example Daily Dialog:

# download data for daily_dialog
python daily_dialog/download_data.py
# process the data
python daily_dialog/process_data.py
# the processed data is stored as the {folder_name}.json
vi daily_dialog/data/daily_dialog.json

Or you can just use one command.

python prepare_all_data.py \
       --download \
       --process \
       --join

Or you can just download our processed version: https://drive.google.com/file/d/1VS9GndEAsrdiyIzlyhy2LAKyu_bR2Lpz/view?usp=sharing
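If you prefer to fetch the processed file programmatically, a minimal sketch using the third-party gdown package is shown below. The file ID comes from the link above; the output filename dialog_corpus.json is an assumption chosen to match the training step, so rename it if the download turns out to be an archive.

# pip install gdown
import gdown

# File ID taken from the Google Drive link above
file_id = "1VS9GndEAsrdiyIzlyhy2LAKyu_bR2Lpz"

# Output name assumed to match what main.py expects; adjust if needed
gdown.download(f"https://drive.google.com/uc?id={file_id}", "dialog_corpus.json", quiet=False)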

Detailed Dialog Processing for each dataset:

  • Daily Dialog

    • Removed tokenization spaces before punctuation (see the detokenization sketch after the list)
  • Persona Chat

    • Used Hugging Face's version [link]
    • Recovered lower-cased utterances
    • Removed tokenization spaces before punctuation
  • Cornell Movie Corpus

    • Ignored UTF-8 Errors
    • Extracted Names
  • Task Master

    • Nothing specific
  • CCPE

    • Nothing specific
  • Frames

    • Nothing specific
  • Chit-Chat Challenge

    • Nothing specific
  • Self-dialogue

    • Nothing specific
  • Schema Dialog

    • Nothing specific
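The "removed tokenization spaces before punctuation" step listed above for Daily Dialog and Persona Chat amounts to detokenizing text such as "hello , how are you ?". A rough regex-based sketch is given below; it is illustrative and not necessarily the exact logic in the processing scripts.

import re

def remove_space_before_punct(text: str) -> str:
    # Collapse the space that tokenizers insert before punctuation,
    # e.g. "hello , how are you ?" -> "hello, how are you?"
    return re.sub(r"\s+([,.!?;:])", r"\1", text)

print(remove_space_before_punct("hello , how are you ?"))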

Training

After processing or downloading the data, put dialog_corpus.json in the current directory and train the model with the following:

python main.py
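Before launching a long run, a quick sanity check like the sketch below can confirm the corpus file is in place and parses. It only assumes dialog_corpus.json is valid JSON, not any particular internal structure.

import json, os

path = "dialog_corpus.json"
assert os.path.exists(path), "Put dialog_corpus.json in the current directory first"

with open(path, "r", encoding="utf-8") as f:
    corpus = json.load(f)

# Show the top-level layout so you can confirm the file loaded correctly
if isinstance(corpus, dict):
    print("top-level keys:", list(corpus.keys())[:10])
elif isinstance(corpus, list):
    print("number of entries:", len(corpus))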

Evaluation

You can refer to ARDM's evaluation code: https://github.com/qywu/ARDM. For the chatbot demo, you can check out the Colab example and load the pretrained weights: https://colab.research.google.com/drive/1ib7YCeNhkIDAzuOKotSlw1CfIBP_zE4r

Pretrained Weights

We provide our pretrained weights for download: https://drive.google.com/file/d/17S0TYjbUQmMzsNvfgZwY2DFULYlPQZ7h/view?usp=sharing
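To fetch and inspect the checkpoint programmatically, a sketch like the following may help. It assumes the file is a standard PyTorch checkpoint, and the local filename pral_weights.pth is an arbitrary choice rather than a name from this repository.

# pip install gdown torch
import gdown
import torch

# File ID taken from the Google Drive link above
url = "https://drive.google.com/uc?id=17S0TYjbUQmMzsNvfgZwY2DFULYlPQZ7h"
gdown.download(url, "pral_weights.pth", quiet=False)

# Print the checkpoint keys to see how to plug the weights into the model code
checkpoint = torch.load("pral_weights.pth", map_location="cpu")
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys())[:10])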

Citation

You can cite the paper with:

@article{PRAL,
  author    = {Jing Gu and
               Qingyang Wu and
               Chongruo Wu and
               Weiyan Shi and
               Zhou Yu},
  title     = {A Tailored Pre-Training Model for Task-Oriented Dialog Generation},
  journal   = {CoRR},
  volume    = {abs/2004.13835},
  year      = {2020},
  url       = {https://arxiv.org/abs/2004.13835},
  archivePrefix = {arXiv},
  eprint    = {2004.13835},
  timestamp = {Sat, 02 May 2020 19:17:26 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2004-13835.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
