
About evaluate_response.py #42

Closed

heyzude opened this issue Sep 23, 2021 · 2 comments

Comments


heyzude commented Sep 23, 2021

Hi,

In `evaluate_response.py`, I see the following snippet:

```python
def parse_response_from_file(input_path):
    """Parses the response from a flattened file.

    Args:
        input_path: Path to read the responses from.
    """
    lines = []
    with open(input_path, "r") as file_id:
        for ii in file_id.readlines():
            split_line = ii.split("<SOR>", 1)
            lines.append(
                (split_line[0].strip("\n"), split_line[1].strip("\n").strip(""))
            )
    return lines
```

Here we have `<SOR>`, but this token is only used in the no-belief mode, while the baseline also uses belief states.
Is it allowed to modify the evaluation code a little for cases like this, or should I conform to this eval script?
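
For illustration, here is a minimal sketch of how that parsing behaves on a single flattened line. The line content below is a made-up example, and `<SOR>` presumably marks the start of the system response:

```python
# Hypothetical flattened line: dialog context and generated response,
# separated by the <SOR> token (presumably "start of response").
line = "User : Show me black jackets. <SOR> Here are some black jackets.\n"

# Same logic as parse_response_from_file, applied to one line.
split_line = line.split("<SOR>", 1)
context = split_line[0].strip("\n")
response = split_line[1].strip("\n")
print((context, response))
# ('User : Show me black jackets. ', ' Here are some black jackets.')
```

Note that a line without the `<SOR>` token would make `split_line[1]` raise an `IndexError`, which is presumably why the output mode matters here.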

@satwikkottur
Contributor

Hello @heyzude,

Thanks for your interest.

The official evaluation script for response generation is `model/utils/response_evaluation.py`.
The file you mention is derived from the official script and is meant only for the baseline, since the baseline produces output in a flattened form. Please use the official script to compute the performance numbers.

Feel free to re-open the issue if you have further questions.


heyzude commented Sep 24, 2021

Thanks for answering my question!
