Fine-tuning Codellama2-13b on the Chinese dataset Chase: KeyError when evaluating model accuracy. KeyError: '电视连续剧' #141

cccatcxy · 2023-11-20T05:15:18Z

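The Chase dataset contains both databases with Chinese names and databases with English names. When fine-tuning codellama on Chase, training and prediction both run normally. During the final accuracy evaluation, however, a KeyError is raised as soon as the comparison reaches an English-named database. All SQL under the Chinese-named databases evaluates without problems.

The error output is as follows: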

compare pred idx 328
compare pred idx 329
compare pred idx 330
compare pred idx 331
compare pred idx 332
compare pred idx 333
Traceback (most recent call last):
  File "/home/bml/storage/DB-GPT-Hub-main/dbgpt_hub/eval/evaluation.py", line 1258, in <module>
    evaluate(
  File "/home/bml/storage/DB-GPT-Hub-main/dbgpt_hub/eval/evaluation.py", line 754, in evaluate
    g_sql = get_sql(schema, g_str)
  File "/home/bml/storage/DB-GPT-Hub-main/dbgpt_hub/eval/process_sql.py", line 635, in get_sql
    _, sql = parse_sql(toks, 0, tables_with_alias, schema)
  File "/home/bml/storage/DB-GPT-Hub-main/dbgpt_hub/eval/process_sql.py", line 577, in parse_sql
    from_end_idx, table_units, conds, default_tables = parse_from(
  File "/home/bml/storage/DB-GPT-Hub-main/dbgpt_hub/eval/process_sql.py", line 451, in parse_from
    idx, table_unit, table_name = parse_table_unit(
  File "/home/bml/storage/DB-GPT-Hub-main/dbgpt_hub/eval/process_sql.py", line 293, in parse_table_unit
    key = tables_with_alias[toks[idx]]
KeyError: '电视连续剧'

Aolius · 2023-12-07T08:29:21Z

Some of the Chase databases were never translated, so this is a problem with the dataset itself.
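One way to confirm this before running the evaluation is to check whether each gold query only references tables that actually exist in its database. The following is a minimal sketch, not part of DB-GPT-Hub: it assumes the Chase databases are SQLite files, uses a rough regex rather than the real SQL parser, and the helper names, the `tv_series` table, and the sample queries are all hypothetical.

```python
import re
import sqlite3


def schema_tables(conn):
    """Return the lower-cased table names defined in a SQLite database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name.lower() for (name,) in rows}


def referenced_tables(sql):
    """Roughly extract the identifiers that follow FROM or JOIN.

    This is a heuristic: subqueries are skipped and aliases are ignored.
    """
    names = re.findall(r"\b(?:from|join)\s+([^\s,();]+)", sql, flags=re.IGNORECASE)
    return {name.strip("`\"'").lower() for name in names}


def missing_tables(conn, sql):
    """Tables named in the query that the database schema does not define."""
    return referenced_tables(sql) - schema_tables(conn)


# Hypothetical example: the schema uses an English table name,
# while one gold query still uses the Chinese name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tv_series (id INTEGER, title TEXT)")

gold_ok = "SELECT title FROM tv_series"
gold_bad = "SELECT 名称 FROM 电视连续剧"

print(missing_tables(conn, gold_ok))   # set()
print(missing_tables(conn, gold_bad))  # {'电视连续剧'}
```

Any gold query for which `missing_tables` returns a non-empty set would be expected to fail in the same way as the traceback above, so such samples can be fixed or filtered out before evaluation.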

wangyongshuai88 · 2024-01-27T12:55:09Z

How can I train codellama on a Chinese dataset? I am looking for an open-source project that does this.
