
Review evaluation metric is distorted: Qwen is severely underestimated #53

Open
fengzhu1 opened this issue May 14, 2024 · 1 comment

Comments

@fengzhu1

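In teval.evaluators.review_evaluator.py, the answer is extracted by looking for ":" (the ASCII half-width colon), but the instruction in the test samples asks for "Answer：" (with the Chinese full-width colon). Most of Qwen1.5-14B-Chat's outputs follow that instruction and look like "Answer：A", "Answer：B", "Answer：C", and so on. Because such outputs contain no ASCII colon, find returns -1, the slice in the code below starts at index 0, and the extracted answer is "A", the first character of "Answer". As a result, nearly every Qwen1.5-14B-Chat answer on the Review metric is scored as A, which does not reflect what the model actually output. Hard-coded parsing like this distorts the measured results.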

Code:
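```python
# Keep everything after the first ASCII ":" (find returns -1 if there is none, so the slice starts at 0)
pred_data = pred_data[pred_data.find(":") + 1:]
pred_data = pred_data.strip()
# The first remaining character is taken as the chosen option
if len(pred_data) > 0 and pred_data[0] in ['A', 'B', 'C', 'D', 'E']:
    ...  # remainder of the block not quoted in the issue
```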
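To make the failure concrete, here is a minimal sketch that wraps the quoted lines in a hypothetical helper, parse_review_answer_as_quoted (the name is illustrative, and the assumption that the if-branch returns pred_data[0] is ours, since the rest of the block is not quoted). Feeding it an output that uses the full-width colon shows the first letter of "Answer" being taken as the choice.

```python
def parse_review_answer_as_quoted(pred_data: str):
    """Hypothetical wrapper around the quoted lines; returns the parsed option or None."""
    # str.find returns -1 when there is no ASCII ":", so the slice becomes pred_data[0:]
    pred_data = pred_data[pred_data.find(":") + 1:]
    pred_data = pred_data.strip()
    if len(pred_data) > 0 and pred_data[0] in ['A', 'B', 'C', 'D', 'E']:
        return pred_data[0]  # assumption: the unquoted branch uses the first character
    return None  # assumption: unparseable outputs map to None


# Output using the full-width colon (U+FF1A) requested by the prompt template
print(parse_review_answer_as_quoted("Answer：C"))  # prints A, not C
# The same answer written with an ASCII colon is parsed correctly
print(parse_review_answer_as_quoted("Answer:C"))  # prints C
```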

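Test sample instruction (the colon after "Answer" is the full-width "："): "Your output should follow this format:\n```\nAnswer：[Insert your choice here, chosen from A, B, C, D, and E. This should be a single character.]\"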
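One way parsing could tolerate both colon forms is to search for the "Answer" label with a regular expression that accepts either the ASCII ":" or the full-width "：" (U+FF1A). This is only a sketch under that assumption; the function name parse_review_answer and the None-on-failure behaviour are hypothetical and not the maintainers' fix.

```python
import re

# "Answer", optional whitespace, an ASCII colon or a full-width colon (U+FF1A),
# optional whitespace, then a single option letter A-E.
ANSWER_PATTERN = re.compile(r"Answer\s*[:\uff1a]\s*([A-E])")


def parse_review_answer(pred_data: str):
    """Hypothetical tolerant parser: returns the option letter, or None if not found."""
    match = ANSWER_PATTERN.search(pred_data)
    return match.group(1) if match else None


for output in ["Answer：C", "Answer:B", "Answer： D", "```\nAnswer：E\n```", "no answer"]:
    print(repr(output), "->", parse_review_answer(output))
```

Using search rather than a fixed slice also copes with outputs that wrap the answer in the code fence shown in the prompt template.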

@zehuichen123
Collaborator

Thanks for pointing out the problem. We will fix it in the next version of the data.
