
About evaluate_response.py #42

Closed

heyzude opened this issue Sep 23, 2021 · 2 comments

Comments


heyzude commented Sep 23, 2021

Hi,

In `evaluate_response.py`, I see the following snippet:

```python
def parse_response_from_file(input_path):
    """Parses the response from a flattened file.

    Args:
        input_path: Path to read the responses from.
    """
    lines = []
    with open(input_path, "r") as file_id:
        for ii in file_id.readlines():
            split_line = ii.split("<SOR>", 1)
            lines.append(
                (split_line[0].strip("\n"), split_line[1].strip("\n").strip(""))
            )
    return lines
```

Here we have `<SOR>`, but this token is only used in the no-belief mode, while the baseline also uses belief states.
Is it allowed to modify the evaluation code a little for cases like this, or should I conform to this eval script?
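
For illustration, here is a minimal sketch of how that parsing behaves on a single flattened line. The line content below is a made-up example, and `<SOR>` presumably marks the start of the system response:

```python
# Hypothetical flattened line: dialog context and generated response,
# separated by the <SOR> token (presumably "start of response").
line = "User : Show me black jackets. <SOR> Here are some black jackets.\n"

# Same logic as parse_response_from_file, applied to one line.
split_line = line.split("<SOR>", 1)
context = split_line[0].strip("\n")
response = split_line[1].strip("\n")
print((context, response))
# ('User : Show me black jackets. ', ' Here are some black jackets.')
```

Note that a line without the `<SOR>` token would make `split_line[1]` raise an `IndexError`, which is presumably why the output mode matters here.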

@satwikkottur
Contributor

Hello @heyzude,

Thanks for your interest.

The official evaluation script for response generation is `model/utils/response_evaluation.py`.
The file you mention is derived from the official script and is meant only for the baseline, since the baseline produces output in a flattened form. Please use the official script to compute the performance numbers.

Feel free to re-open the issue if you have further questions.


heyzude commented Sep 24, 2021

Thanks for answering my question!
