This repo implements the paper *Towards Multilingual Automatic Open-Domain Dialogue Evaluation*. It also includes our competition code for DSTC11 Track 4, *Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems*, which is introduced in the paper *Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation*.
| Section |
|---|
| Apply DialEvalML to your own dialogues |
| Training encoder models from scratch |
| Citation |
Obtain the models used in both papers from Drive.
- Towards Multilingual Automatic Open-Domain Dialogue Evaluation: VSP-ML5, NSP-ML75;
- Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation:
Raw dialogue corpora may have very different data structures, so we leave it to the user to convert their own data to the DialEvalML format.
The format is straightforward:
- The tokenizer receives as input `res` and optionally `ctx` (context is needed to evaluate context-dependent metrics). `ctx` can be multi-turn; the only limitation relates to `max_length`.
- Who said what is determined by prepending the speaker token to the sentence.
```
A: Gosh, you took all the word right out of my mouth. Let's go out and get crazy tonight.
B: Let's go to the new club on West Street .
A: I'm afraid I can't.
```

```
ctx = "<speaker1>Gosh , you took all the word right out of my mouth . Let's go out and get crazy tonight .</s><s><speaker2>Let's go to the new club on West Street ."
res = "<speaker1>I ' m afraid I can ' t ."
```
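The conversion above can be sketched as a small helper. This is a minimal illustration, not part of the repo: the function name `to_dialevalml` is hypothetical, and the `<speakerN>` tokens and `</s><s>` turn separator are taken from the example strings (the actual special tokens depend on the tokenizer you use).

```python
# Hypothetical helper illustrating the DialEvalML input format.
# Speaker tokens and the "</s><s>" separator follow the example above;
# the real special tokens depend on the tokenizer in use.

def to_dialevalml(turns):
    """Convert a list of (speaker_id, text) turns into (ctx, res) strings.

    The last turn becomes the response `res`; all preceding turns are
    joined into the context `ctx`, separated by "</s><s>".
    """
    tagged = [f"<speaker{sid}>{text}" for sid, text in turns]
    ctx = "</s><s>".join(tagged[:-1])
    res = tagged[-1]
    return ctx, res

turns = [
    (1, "Gosh , you took all the word right out of my mouth . Let's go out and get crazy tonight ."),
    (2, "Let's go to the new club on West Street ."),
    (1, "I ' m afraid I can ' t ."),
]
ctx, res = to_dialevalml(turns)
```

Running this on the example dialogue reproduces the `ctx` and `res` strings shown above.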
- Adjust the script `predict.py` to your requirements.
- You can download the preprocessed data used to train our best models. This data is already preprocessed, with separate columns for context and response, and follows the DialEvalML format. To train the multilingual models, simply train with the subsets concatenated.
- Adjust and run `train_xxx.sh`.
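Concatenating the subsets for multilingual training can be as simple as the following sketch. The toy DataFrames and the `label` column are assumptions for illustration; substitute the downloaded per-language files.

```python
import pandas as pd

# Toy stand-ins for two per-language subsets. Columns follow the
# DialEvalML format (separate context and response); the "label"
# column is an assumption for illustration.
en = pd.DataFrame({"ctx": ["<speaker1>Hi ."], "res": ["<speaker2>Hello ."], "label": [1]})
es = pd.DataFrame({"ctx": ["<speaker1>Hola ."], "res": ["<speaker2>Buenas ."], "label": [1]})

# Multilingual training set = simple concatenation of the subsets.
multilingual = pd.concat([en, es], ignore_index=True)
```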
If you use this work, please consider citing:
@inproceedings{mendoncaetal2023towards,
title = "Towards Multilingual Automatic Open-Domain Dialogue Evaluation",
author = "Mendonça, John and Lavie, Alon and Trancoso, Isabel",
booktitle = "Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue",
month = sep,
year = "2023",
address = "Prague, Czechia",
publisher = "Association for Computational Linguistics",
}
@inproceedings{mendoncaetal2023simple,
author = "Mendonça, John and Pereira, Patricia and Moniz, Helena and Carvalho, João Paulo and Lavie, Alon and Trancoso, Isabel",
title = "Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation",
booktitle = "DSTC11: The Eleventh Dialog System Technology Challenge",
series = "24th Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)",
    year = "2023",
    month = sep,
address = "Prague, Czechia"
}