
DialEvalML

License Python 3.10+ Code style: black

This repo implements the paper Towards Multilingual Automatic Open-Domain Dialogue Evaluation. It also includes competition code for DSTC11 Track 4 (Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems), introduced in the paper Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation.

Contents

  • Apply DialEvalML to your own dialogues
  • Training encoder models from scratch
  • Citation

Apply DialEvalML to your own dialogues

1. Download models

Obtain the models used in both papers from Drive.

2. Apply Tokenization

Raw dialogue corpora come in very different data structures, so we leave it to the user to convert their own data to the DialQualityML format.

The format is straightforward:

  • The tokenizer receives res as input and, optionally, ctx (context is needed to evaluate context-dependent metrics).
  • ctx can be multi-turn; the only limitation is the tokenizer's max_length.
  • Who said what is indicated by prepending the corresponding speaker token to each utterance (see the sketch after the example below).
A: Gosh, you took all the word right out of my mouth. Let's go out and get crazy tonight.
B: Let's go to the new club on West Street .
A: I'm afraid I can't.


ctx = "<speaker1>Gosh , you took all the word right out of my mouth . Let's go out and get crazy tonight .</s><s><speaker2>Let's go to the new club on West Street ."
res = "<speaker1>I ' m afraid I can ' t ."
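For reference, here is a minimal sketch of how you might build ctx/res pairs and tokenize them. The to_ctx_res helper and the xlm-roberta-base tokenizer are illustrative assumptions, not part of the repo:

# Illustrative helper (not part of the repo): build ctx/res strings in the
# DialQualityML format from an alternating list of turns, then tokenize them.
from transformers import AutoTokenizer

SPEAKER_TOKENS = ["<speaker1>", "<speaker2>"]

def to_ctx_res(turns):
    # turns: list of utterances alternating between speakers; the last turn is the response
    tagged = [f"{SPEAKER_TOKENS[i % 2]}{t}" for i, t in enumerate(turns)]
    ctx = "</s><s>".join(tagged[:-1])   # multi-turn context joined with </s><s>
    res = tagged[-1]                    # response to be scored
    return ctx, res

turns = [
    "Gosh , you took all the word right out of my mouth . Let's go out and get crazy tonight .",
    "Let's go to the new club on West Street .",
    "I ' m afraid I can ' t .",
]
ctx, res = to_ctx_res(turns)

# Tokenize ctx and res as a pair; truncation keeps the input within max_length.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
tokenizer.add_special_tokens({"additional_special_tokens": SPEAKER_TOKENS})
encoding = tokenizer(ctx, res, truncation=True, max_length=512, return_tensors="pt")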

3. Run prediction code

  • Adjust the script predict.py to your requirements.
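If you just want to see the shape of a prediction pass, the sketch below scores a single (ctx, res) pair with a downloaded encoder checkpoint. The checkpoint path and the sequence-classification head are assumptions for illustration; predict.py is the actual entry point.

# Illustrative sketch only; predict.py is the real entry point. The checkpoint
# path and the regression-style head are assumptions for this example.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_DIR = "checkpoints/dialevalml-encoder"  # hypothetical path to a model downloaded from Drive

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model.eval()

ctx = "<speaker1>Gosh , you took all the word right out of my mouth . Let's go out and get crazy tonight .</s><s><speaker2>Let's go to the new club on West Street ."
res = "<speaker1>I ' m afraid I can ' t ."

with torch.no_grad():
    inputs = tokenizer(ctx, res, truncation=True, max_length=512, return_tensors="pt")
    score = model(**inputs).logits.squeeze().item()  # quality score of res given ctx
print(score)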

Training encoder models from scratch

1. Download/format data

  • You can download the preprocessed data used to train our best models. It comes with separate columns for context and response and follows the DialEvalML format. To train the multilingual models, simply train on the concatenated subsets (see the sketch below).
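As a rough sketch of the concatenation step (the file names and layout below are assumptions, not the released data's actual structure):

# Illustrative sketch: concatenate per-language subsets into one multilingual
# training file. File names and column layout are assumptions.
import pandas as pd

subsets = ["train_en.csv", "train_zh.csv", "train_es.csv"]  # hypothetical subset files
frames = [pd.read_csv(path) for path in subsets]

# Each subset is assumed to already have separate context/response columns,
# so concatenation is all that is needed for multilingual training.
pd.concat(frames, ignore_index=True).to_csv("train_multilingual.csv", index=False)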

2. Training

  • Adjust and run train_xxx.sh.
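As a hedged sketch of the kind of fine-tuning such a script launches (an XLM-RoBERTa encoder with a single-output head trained on ctx/res pairs; the dataset path, column names and hyperparameters below are assumptions, not the script's actual configuration):

# Rough sketch of the kind of encoder fine-tuning train_xxx.sh is expected to run.
# Dataset path, column names ("ctx", "res", "label") and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=1)

data = load_dataset("csv", data_files={"train": "train_multilingual.csv"})

def tokenize(batch):
    enc = tokenizer(batch["ctx"], batch["res"], truncation=True, max_length=512)
    enc["labels"] = [float(l) for l in batch["label"]]  # regression target as float
    return enc

train_set = data["train"].map(tokenize, batched=True,
                              remove_columns=data["train"].column_names)

args = TrainingArguments(output_dir="out", per_device_train_batch_size=16,
                         num_train_epochs=3, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=train_set,
        data_collator=DataCollatorWithPadding(tokenizer)).train()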

Citation

If you use this work, please consider citing:

@inproceedings{mendoncaetal2023towards,
    title = "Towards Multilingual Automatic Open-Domain Dialogue Evaluation",
    author = "Mendonça, John and Lavie, Alon and Trancoso, Isabel",
    booktitle = "Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue",
    month = sep,
    year = "2023",
    address = "Prague, Czechia",
    publisher = "Association for Computational Linguistics",
}
@inproceedings{mendoncaetal2023simple,
    author    = "Mendonça, John and Pereira, Patricia and Moniz, Helena and Carvalho, João Paulo and Lavie, Alon and Trancoso, Isabel",
    title     = "Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation",
    booktitle = "DSTC11: The Eleventh Dialog System Technology Challenge",
    series    = "24th Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)",
    year      = 2023,
    month     = "September",
    address   = "Prague, Czechia"
}
