Extract human response from 6k multi-ref dataset #48
Hi, the human reference file has been uploaded. Please find it here:
Thanks for the update. I observed two problems with the file and fixed them with the following script:
and then ran:
The result looks almost correct, except for NIST4, which is 4.25 in the paper.
NIST2 and NIST4 are very close in all other experiments, so 3.5 seems more reasonable.
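Why NIST2 and NIST4 should be close: NIST-n is cumulative, so NIST4 is NIST2 plus the information-weighted 3-gram and 4-gram terms. Each n-gram's weight is log2(count(prefix) / count(n-gram)) over the references, which is never negative and is usually near zero for longer n-grams, because they are mostly predictable from their prefixes. Below is a minimal sketch that illustrates this. It assumes pre-tokenized input with one reference per hypothesis and omits the brevity penalty. The penalty multiplies the whole sum by the same factor, so leaving it out does not change the fact that NIST4 >= NIST2. The function names are hypothetical; this is not the scorer that `batch_eval.py` uses.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def nist(hyps, refs, max_n):
    """Simplified corpus NIST-n (no brevity penalty, one reference per hypothesis).

    hyps, refs: parallel lists of token lists.
    """
    # Reference n-gram counts, used to compute information weights.
    counts = Counter()
    total_words = 0
    for ref in refs:
        total_words += len(ref)
        for n in range(1, max_n + 1):
            counts.update(ngrams(ref, n))

    def info(gram):
        # info(w1..wn) = log2(count(w1..w(n-1)) / count(w1..wn)); the unigram
        # "prefix" count is the total number of reference words. Always >= 0.
        prefix_count = total_words if len(gram) == 1 else counts[gram[:-1]]
        return math.log2(prefix_count / counts[gram])

    score = 0.0
    for n in range(1, max_n + 1):
        matched_info = 0.0
        hyp_total = 0
        for hyp, ref in zip(hyps, refs):
            hyp_counts = Counter(ngrams(hyp, n))
            ref_counts = Counter(ngrams(ref, n))
            hyp_total += sum(hyp_counts.values())
            # Clipped matches, each weighted by its information value.
            for gram, c in hyp_counts.items():
                if gram in ref_counts:
                    matched_info += min(c, ref_counts[gram]) * info(gram)
        if hyp_total:
            score += matched_info / hyp_total
    return score
```

Calling `nist(hyps, refs, 2)` and `nist(hyps, refs, 4)` on the same data always gives NIST4 >= NIST2, and the gap is typically small. A large jump such as 3.5 to 4.25 would therefore be unusual.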
I obtained the same results as you, so it's possible the paper contains an error.
Hi,
I'm trying to reproduce the human-response results in the paper and have run into a problem.
I copied `test.scored_refs.txt` to the `dstc/data` folder and used the first column as the keys. The evaluation result after running `python extract_human.py` and `python batch_eval.py` differs from the paper; even `avg_len` is wrong. I'm wondering which step is wrong and how to reproduce the result.
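For reference, the "first column as keys" step could look like the minimal sketch below. It assumes the file is tab-separated with the key in the first field. The actual layout of `test.scored_refs.txt` is not confirmed here, and `index_by_first_column` is a hypothetical helper, not part of the repo.

```python
def index_by_first_column(lines, sep="\t"):
    """Map each line's first field to the list of remaining fields.

    Assumes `sep`-separated lines; blank lines are skipped.
    If a key repeats, the later line overwrites the earlier one.
    """
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, rest = line.partition(sep)
        table[key] = rest.split(sep) if rest else []
    return table


# Usage (path assumed):
# with open("dstc/data/test.scored_refs.txt", encoding="utf-8") as f:
#     refs_by_key = index_by_first_column(f)
```

Checking whether the keys repeat, or whether the column count per line matches what `extract_human.py` expects, may help narrow down the `avg_len` mismatch.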
Thanks!