This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

ROSCOE: What is the correspondence between the metrics returned in the code and those defined in the paper? #4923

Closed
SahanaRamnath opened this issue Dec 26, 2022 · 2 comments

Comments

@SahanaRamnath

@Golovneva
The code returns 20 scores (namely faithfulness, informativeness_step, informativeness_chain, faithfulness_ww, repetition_word, repetition_step, reasoning_alignment, external_hallucination, redundancy, common_sense_error, missing_step, semantic_coverage_step, semantic_coverage_chain, discourse_representation, coherence_step_vs_step, perplexity_step, perplexity_chain, perplexity_step_max, grammar_step, grammar_step_max). While some of these correspond exactly to scores defined in the paper, others don't (for example, discourse_representation and coherence_step_vs_step).

(1) Can you let me know which score in the code corresponds to which definition in the paper?
(2) For almost all the scores, it looks like a higher score means better model reasoning. However, it looks like the opposite holds for scores such as repetition_step. Can you also clarify what the best score is (i.e., 0 or 1) for each of these 20 scores?

@Golovneva
Contributor

(1) Sorry about the inconvenience; we renamed some scores for the paper in an attempt to make them more user-friendly. The mapping is defined here: https://github.com/facebookresearch/ParlAI/blob/main/projects/roscoe/meta_evaluation/table_file_writing.py#L44

perplexity_step_max and grammar_step_max behaved the same as their mean versions (perplexity_step and grammar_step), so we report only the latter.
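To make the mean/max distinction concrete, here is a minimal sketch that aggregates one chain's per-step values both ways. It assumes, based only on the score names, that `<name>` is the mean over steps and `<name>_max` is the maximum over steps; the helper `step_level_variants` and the per-step values are hypothetical and not part of ParlAI.

```python
from statistics import mean
from typing import Dict, Sequence


def step_level_variants(name: str, per_step: Sequence[float]) -> Dict[str, float]:
    """Aggregate one chain's per-step values by mean and by max.

    Assumption (hypothetical helper): `name` holds the mean over steps
    and `name + "_max"` holds the maximum over steps.
    """
    if not per_step:
        raise ValueError("a chain needs at least one step")
    return {name: mean(per_step), f"{name}_max": max(per_step)}


# Hypothetical per-step grammar values for a three-step chain.
print(step_level_variants("grammar_step", [0.5, 1.0, 0.75]))
# {'grammar_step': 0.75, 'grammar_step_max': 1.0}
```

When two aggregates of the same per-step signal rank chains the same way, as observed for these two pairs, reporting only one of them loses little information.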

(2) For all scores, higher is better. For repetition_step specifically, a score of 1 would mean that all steps are completely different from each other (in practice, this only occurs when there is only one step in the chain and nothing to compare it with), while 0 means that the chain contains two completely identical steps/sentences.
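The repetition_step behavior described above can be illustrated with a minimal sketch. This is not the ROSCOE implementation: it assumes each step has already been embedded as a vector (the toy vectors below are made up), that step similarity is cosine similarity rescaled from [-1, 1] to [0, 1], and that the score is 1 minus the similarity of the most similar pair of steps. `repetition_step_sketch` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("step embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def repetition_step_sketch(step_embeddings: Sequence[Sequence[float]]) -> float:
    """Hypothetical score: 1 - similarity of the most similar pair of steps.

    Assumption: similarity is cosine rescaled to [0, 1] via (1 + cos) / 2.
    """
    if len(step_embeddings) < 2:
        # Only one step: nothing to compare with, so the best possible score.
        return 1.0
    max_sim = max(
        (1.0 + cosine(u, v)) / 2.0
        for u, v in combinations(step_embeddings, 2)
    )
    return 1.0 - max_sim


print(repetition_step_sketch([[0.6, 0.8]]))                          # 1.0
print(repetition_step_sketch([[1.0, 0.0], [0.0, 1.0]]))              # 0.5
print(repetition_step_sketch([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]))  # 0.0
```

Under these assumptions, a single-step chain scores 1 by construction and any pair of identical steps forces the score to 0, so higher still means better. Because sentence embeddings rarely have negative cosine similarity, a multi-step chain almost never reaches exactly 1.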

@SahanaRamnath
Author

Thank you!
