This repository has been archived by the owner on Nov 3, 2023. It is now read-only.
@Golovneva
The code returns 20 scores (namely faithfulness, informativeness_step, informativeness_chain, faithfulness_ww, repetition_word, repetition_step, reasoning_alignment, external_hallucination, redundancy, common_sense_error, missing_step, semantic_coverage_step, semantic_coverage_chain, discourse_representation, coherence_step_vs_step, perplexity_step, perplexity_chain, perplexity_step_max, grammar_step, grammar_step_max). While some of these correspond exactly to the scores defined in the paper, others don't (such as discourse_representation, coherence_step_vs_step, etc.).
(1) Can you let me know which score in the code corresponds to which definition in the paper?
(2) For almost all the scores, it looks like the higher the score, the better the model's reasoning. However, the opposite seems to hold for scores such as repetition_step. Can you also clarify what the best score is (i.e., 0 or 1) for each of these 20 scores?
(1) perplexity_step_max and grammar_step_max showed the same behavior as their mean versions (perplexity_step and grammar_step), so we report only the latter.
(2) For all scores, the higher the better. For repetition_step specifically, a score of 1 would mean that all steps are completely different from each other (in practice, this only occurs when there is a single step in the chain and nothing to compare against), while 0 means that two steps/sentences in the chain are completely identical.
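To make the repetition_step semantics concrete, here is a minimal, hypothetical sketch of that scoring logic: one minus the maximum pairwise similarity between steps, with the degenerate single-step case scored as 1. Note that the function names are illustrative and the Jaccard word overlap below is a stand-in for the actual similarity measure used in the ROSCOE code, which this sketch does not reproduce.

```python
from itertools import combinations

def step_similarity(a: str, b: str) -> float:
    """Jaccard word overlap between two steps.

    Illustrative stand-in for the real similarity measure; returns 1.0 for
    identical steps and 0.0 for steps with no words in common.
    """
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def repetition_step(steps: list[str]) -> float:
    """1 - max pairwise similarity over all step pairs.

    1.0 = no pair is similar (including the single-step case, where there
    is nothing to compare); 0.0 = at least two steps are identical.
    """
    if len(steps) < 2:
        return 1.0
    return 1.0 - max(step_similarity(a, b) for a, b in combinations(steps, 2))

print(repetition_step(["only one step"]))                      # 1.0 (nothing to compare)
print(repetition_step(["the sum is five", "the sum is five"]))  # 0.0 (identical steps)
```

Under this framing, "higher is better" holds: a chain that repeats itself verbatim bottoms out at 0, and any partial overlap lands strictly between 0 and 1.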