
Fine-tuning Codellama2-13b on the Chinese Chase dataset: key error when evaluating model accuracy. KeyError: '电视连续剧' ("TV series") #141

Open
cccatcxy opened this issue Nov 20, 2023 · 2 comments

Comments

@cccatcxy

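The Chase dataset contains both databases with Chinese names and databases with English names. When fine-tuning codellama on Chase, training and prediction both run normally. During the final accuracy evaluation, however, a KeyError is raised once the comparison reaches a database with an English name. All SQL under the Chinese-named databases evaluates without problems.

The error message is as follows: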

compare pred idx 328
compare pred idx 329
compare pred idx 330
compare pred idx 331
compare pred idx 332
compare pred idx 333
Traceback (most recent call last):
  File "/home/bml/storage/DB-GPT-Hub-main/dbgpt_hub/eval/evaluation.py", line 1258, in <module>
    evaluate(
  File "/home/bml/storage/DB-GPT-Hub-main/dbgpt_hub/eval/evaluation.py", line 754, in evaluate
    g_sql = get_sql(schema, g_str)
  File "/home/bml/storage/DB-GPT-Hub-main/dbgpt_hub/eval/process_sql.py", line 635, in get_sql
    _, sql = parse_sql(toks, 0, tables_with_alias, schema)
  File "/home/bml/storage/DB-GPT-Hub-main/dbgpt_hub/eval/process_sql.py", line 577, in parse_sql
    from_end_idx, table_units, conds, default_tables = parse_from(
  File "/home/bml/storage/DB-GPT-Hub-main/dbgpt_hub/eval/process_sql.py", line 451, in parse_from
    idx, table_unit, table_name = parse_table_unit(
  File "/home/bml/storage/DB-GPT-Hub-main/dbgpt_hub/eval/process_sql.py", line 293, in parse_table_unit
    key = tables_with_alias[toks[idx]]
KeyError: '电视连续剧'
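
The last frame of the traceback is a plain dictionary lookup, so the error means the token after FROM is not a key of tables_with_alias. The following is a minimal sketch of that situation, not the actual DB-GPT-Hub code: it assumes tables_with_alias maps each schema table name to itself (as in the Spider-style evaluation scripts), and the schema contents and the lookup_table helper are hypothetical.

```python
# Hypothetical schema for an English-named database: table name -> columns.
schema = {"tv_series": ["id", "title"], "tv_channel": ["id", "name"]}

# Assumed shape of tables_with_alias: every schema table maps to itself.
tables_with_alias = {name: name for name in schema}

# Tokens of a gold query that still uses the untranslated Chinese table name.
toks = ["select", "count", "(", "*", ")", "from", "电视连续剧"]


def lookup_table(toks, idx, tables_with_alias):
    """Resolve the token at idx to a table key, or return None if unknown."""
    return tables_with_alias.get(toks[idx])


# The direct indexing used in the traceback raises KeyError for the
# Chinese token, because only English table names are keys.
try:
    key = tables_with_alias[toks[6]]
except KeyError as exc:
    missing = exc.args[0]
```

Using .get (as in lookup_table) only avoids the crash; the example would still need to be skipped or its gold SQL corrected before it can be scored.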

@Aolius

Aolius commented Dec 7, 2023

Some of the Chase databases were never translated, so this is a problem with the dataset itself.
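
If the cause is untranslated names in the dataset, one way to find the affected examples before running the evaluation is to compare the table names in each gold SQL string against the schema. This is a rough sketch under that assumption: unknown_tables is a hypothetical helper, the regex only looks at the identifier directly after FROM or JOIN, and the sample query and table names are made up for illustration.

```python
import re


def unknown_tables(sql, schema_tables):
    """Return table names after FROM/JOIN that are absent from schema_tables."""
    known = {t.lower() for t in schema_tables}
    # \w matches CJK characters for Python 3 str patterns, so Chinese
    # table names are captured as well. Subqueries ("FROM (") are skipped.
    names = re.findall(r"\b(?:from|join)\s+(\w+)", sql, flags=re.IGNORECASE)
    return [n for n in names if n.lower() not in known]


# Hypothetical gold query mixing an untranslated and a translated table name.
gold_sql = (
    "SELECT count(*) FROM 电视连续剧 AS T1 "
    "JOIN tv_channel AS T2 ON T1.id = T2.id"
)
bad = unknown_tables(gold_sql, ["tv_series", "tv_channel"])
```

Examples for which the returned list is non-empty can then be filtered out or fixed by hand before computing accuracy.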

@wangyongshuai88

How can codellama be trained on a Chinese dataset? I am looking for an open-source project that does this.
