
DialEvalML

License Python 3.10+ Code style: black

This repo implements the paper Towards Multilingual Automatic Open-Domain Dialogue Evaluation. It also includes competition code for DSTC11 Track 4 (Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems), introduced in the paper Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation.

Contents

  • Apply DialEvalML to your own dialogues
  • Training encoder models from scratch
  • Citation

Apply DialEvalML to your own dialogues

1. Download models

Obtain the models used in both papers from Drive.

2. Apply Tokenization

Raw dialogue corpora come in very different data structures, so we leave it to the user to convert their own data to the DialQualityML format.

The format is straightforward:

  • The tokenizer receives res as input and, optionally, ctx (context is needed to evaluate context-dependent metrics).
  • ctx can be multi-turn; the only limitation is the tokenizer's max_length.
  • Who said what is indicated by prepending the corresponding speaker token to each utterance (see the sketch after the example below).
A: Gosh, you took all the word right out of my mouth. Let's go out and get crazy tonight.
B: Let's go to the new club on West Street .
A: I'm afraid I can't.


ctx = "<speaker1>Gosh , you took all the word right out of my mouth . Let's go out and get crazy tonight .</s><s><speaker2>Let's go to the new club on West Street ."
res = "<speaker1>I ' m afraid I can ' t ."
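For reference, here is a minimal sketch of how you might build ctx/res pairs and tokenize them. The to_ctx_res helper and the xlm-roberta-base tokenizer are illustrative assumptions, not part of the repo:

# Illustrative helper (not part of the repo): build ctx/res strings in the
# DialQualityML format from an alternating list of turns, then tokenize them.
from transformers import AutoTokenizer

SPEAKER_TOKENS = ["<speaker1>", "<speaker2>"]

def to_ctx_res(turns):
    # turns: list of utterances alternating between speakers; the last turn is the response
    tagged = [f"{SPEAKER_TOKENS[i % 2]}{t}" for i, t in enumerate(turns)]
    ctx = "</s><s>".join(tagged[:-1])   # multi-turn context joined with </s><s>
    res = tagged[-1]                    # response to be scored
    return ctx, res

turns = [
    "Gosh , you took all the word right out of my mouth . Let's go out and get crazy tonight .",
    "Let's go to the new club on West Street .",
    "I ' m afraid I can ' t .",
]
ctx, res = to_ctx_res(turns)

# Tokenize ctx and res as a pair; truncation keeps the input within max_length.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
tokenizer.add_special_tokens({"additional_special_tokens": SPEAKER_TOKENS})
encoding = tokenizer(ctx, res, truncation=True, max_length=512, return_tensors="pt")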

3. Run prediction code

  • Adjust the script predict.py to your requirements.
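If you just want to see the shape of a prediction pass, the sketch below scores a single (ctx, res) pair with a downloaded encoder checkpoint. The checkpoint path and the sequence-classification head are assumptions for illustration; predict.py is the actual entry point.

# Illustrative sketch only; predict.py is the real entry point. The checkpoint
# path and the regression-style head are assumptions for this example.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_DIR = "checkpoints/dialevalml-encoder"  # hypothetical path to a model downloaded from Drive

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model.eval()

ctx = "<speaker1>Gosh , you took all the word right out of my mouth . Let's go out and get crazy tonight .</s><s><speaker2>Let's go to the new club on West Street ."
res = "<speaker1>I ' m afraid I can ' t ."

with torch.no_grad():
    inputs = tokenizer(ctx, res, truncation=True, max_length=512, return_tensors="pt")
    score = model(**inputs).logits.squeeze().item()  # quality score of res given ctx
print(score)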

Training encoder models from scratch

1. Download/format data

  • You can download the preprocessed data used to train our best models. It comes with separate columns for context and response and follows the DialEvalML format. To train the multilingual models, simply train on the concatenated subsets (see the sketch below).
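As a rough sketch of the concatenation step (the file names and layout below are assumptions, not the released data's actual structure):

# Illustrative sketch: concatenate per-language subsets into one multilingual
# training file. File names and column layout are assumptions.
import pandas as pd

subsets = ["train_en.csv", "train_zh.csv", "train_es.csv"]  # hypothetical subset files
frames = [pd.read_csv(path) for path in subsets]

# Each subset is assumed to already have separate context/response columns,
# so concatenation is all that is needed for multilingual training.
pd.concat(frames, ignore_index=True).to_csv("train_multilingual.csv", index=False)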

2. Training

  • Adjust and run train_xxx.sh.
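As a hedged sketch of the kind of fine-tuning such a script launches (an XLM-RoBERTa encoder with a single-output head trained on ctx/res pairs; the dataset path, column names and hyperparameters below are assumptions, not the script's actual configuration):

# Rough sketch of the kind of encoder fine-tuning train_xxx.sh is expected to run.
# Dataset path, column names ("ctx", "res", "label") and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=1)

data = load_dataset("csv", data_files={"train": "train_multilingual.csv"})

def tokenize(batch):
    enc = tokenizer(batch["ctx"], batch["res"], truncation=True, max_length=512)
    enc["labels"] = [float(l) for l in batch["label"]]  # regression target as float
    return enc

train_set = data["train"].map(tokenize, batched=True,
                              remove_columns=data["train"].column_names)

args = TrainingArguments(output_dir="out", per_device_train_batch_size=16,
                         num_train_epochs=3, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=train_set,
        data_collator=DataCollatorWithPadding(tokenizer)).train()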

Citation

If you use this work, please consider citing:

@inproceedings{mendoncaetal2023towards,
    title = "Towards Multilingual Automatic Open-Domain Dialogue Evaluation",
    author = "Mendonça, John and Lavie, Alon and Trancoso, Isabel",
    booktitle = "Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue",
    month = sep,
    year = "2023",
    address = "Prague, Czechia",
    publisher = "Association for Computational Linguistics",
}
@inproceedings{mendoncaetal2023simple,
    author    = "Mendonça, John and Pereira, Patricia and Moniz, Helena and Carvalho, João Paulo and Lavie, Alon and Trancoso, Isabel",
    title     = "Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation",
    booktitle = "DSTC11: The Eleventh Dialog System Technology Challenge",
    series    = "24th Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)",
    year      = 2023,
    month     = "September",
    address   = "Prague, Czechia"
}
