
What format is required to prepare my own dataset for training or testing? #51

Closed
dyc1998108 opened this issue Jul 26, 2023 · 5 comments

Comments

@dyc1998108

If I want to prepare my own dataset, what format requirements does it need to meet?

@lihaoyang-ruc
Contributor

This question was answered in an earlier, similar issue. Here is the original answer:

To train RESDSQL on your own dataset, you need to prepare at least three files (using Spider's files as an example):

  • database, a folder containing the SQLite databases.
  • train_spider.json, a JSON file containing the training examples; each example must contain three fields: db_id, query, and question.
  • tables.json, a JSON file that describes the schemas of all databases.
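As a rough illustration of the training-file requirement, the sketch below checks that each example carries the three fields named above and that its db_id appears in tables.json. The function name check_examples and the sample data are hypothetical, not part of RESDSQL. A real Spider tables.json entry also carries the schema itself (keys such as table_names_original, column_names_original, column_types, primary_keys and foreign_keys), which this sketch ignores.

```python
import json

# Fields the answer above says every training example must contain.
REQUIRED_FIELDS = ("db_id", "query", "question")


def check_examples(examples, known_db_ids):
    """Return a list of problems found in a list of training examples.

    known_db_ids is the set of db_id values described in tables.json
    (assumption of this sketch: every example must point at a database
    whose schema is described there).
    """
    problems = []
    for i, ex in enumerate(examples):
        for field in REQUIRED_FIELDS:
            if not isinstance(ex.get(field), str) or not ex[field].strip():
                problems.append(f"example {i}: missing or empty '{field}'")
        db_id = ex.get("db_id")
        if isinstance(db_id, str) and db_id not in known_db_ids:
            problems.append(f"example {i}: db_id '{db_id}' not found in tables.json")
    return problems


# A minimal, hypothetical training file with a single example.
train = [
    {
        "db_id": "concert_singer",
        "query": "SELECT count(*) FROM singer",
        "question": "How many singers do we have?",
    }
]
tables = [{"db_id": "concert_singer"}]  # real entries also carry the full schema

known = {t["db_id"] for t in tables}
print(check_examples(train, known))  # → []
print(json.dumps(train, indent=2))
```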

To run inference and evaluation, you should also prepare a separate dev_gold.sql file containing each gold SQL query together with its corresponding db_id.
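A minimal sketch of producing dev_gold.sql, assuming Spider's layout of one gold query per line with the SQL and its db_id separated by a tab. The helper names make_gold_lines and write_gold_file are hypothetical; check the evaluation script you use for the exact format it expects.

```python
def make_gold_lines(examples):
    """Build dev_gold.sql lines in the assumed Spider layout: <SQL><TAB><db_id>."""
    lines = []
    for ex in examples:
        # Collapse all whitespace (including newlines) so each query fits on one
        # line; note this also collapses repeated spaces inside string literals.
        sql = " ".join(ex["query"].split())
        lines.append(f"{sql}\t{ex['db_id']}")
    return lines


def write_gold_file(examples, path="dev_gold.sql"):
    """Write one gold query per line to path."""
    with open(path, "w", encoding="utf-8") as f:
        for line in make_gold_lines(examples):
            f.write(line + "\n")


dev = [{"db_id": "concert_singer",
        "query": "SELECT count(*)\nFROM singer",
        "question": "How many singers do we have?"}]
print(make_gold_lines(dev))  # ['SELECT count(*) FROM singer\tconcert_singer']
```

Calling write_gold_file(dev) would write that single line to dev_gold.sql in the current directory.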

Originally posted by @lihaoyang-ruc in #45 (comment)

@dyc1998108
Author

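OK, thanks. If I don't need to execute the SQL against a database and only need to generate it, do I still need the database? And apart from that, what other changes are needed if I only want to generate SQL?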

@lihaoyang-ruc
Contributor

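For now, yes, because I rely on the database to sample generated SQL queries that are free of syntax errors. You can create empty databases in SQLite without inserting any data and use them purely as a syntax checker. Alternatively, you can implement your own syntax checker to replace the step that executes queries against the database.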
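The empty-database approach could look roughly like this with Python's built-in sqlite3 module: create the tables from CREATE TABLE statements only, insert no rows, and treat any error raised while running a query as a failed check. The schema and the function names build_empty_db and is_executable are hypothetical. Spider stores each database at database/<db_id>/<db_id>.sqlite, so passing such a path instead of ":memory:" would produce an on-disk file; whether RESDSQL expects exactly that layout for custom data is an assumption worth verifying.

```python
import sqlite3

# Hypothetical schema: CREATE TABLE statements only, no INSERTs.
SCHEMA = """
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE concert (
    concert_id INTEGER PRIMARY KEY,
    singer_id INTEGER,
    year TEXT,
    FOREIGN KEY (singer_id) REFERENCES singer(singer_id)
);
"""


def build_empty_db(path, schema_sql):
    """Create a SQLite database at path holding only the schema.

    The path must not already contain these tables, or executescript fails.
    """
    conn = sqlite3.connect(path)
    conn.executescript(schema_sql)
    conn.commit()
    return conn


def is_executable(conn, sql):
    """Return True if sql runs without error against the (empty) database."""
    try:
        conn.execute(sql).fetchall()
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False


conn = build_empty_db(":memory:", SCHEMA)
print(is_executable(conn, "SELECT name FROM singer WHERE age > 30"))  # True
print(is_executable(conn, "SELECT nme FROM singer"))                 # False: no such column
print(is_executable(conn, "SELEC name FROM singer"))                 # False: syntax error
```

Because the tables are empty, this check catches syntax errors and references to unknown tables or columns, but says nothing about whether a query returns sensible results. One SQLite quirk: a double-quoted identifier that matches no column may be accepted as a string literal, so such mistakes can slip through.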

@dyc1998108
Author

Great, thank you very much. Which file contains the database execution step? I'd like to see how to replace it with my own checker.

@lihaoyang-ruc
Contributor

In utils/text2sql_decoding_utils.py, at the line results = execute_sql(cursor, pred_sql).
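If you would rather not keep any .sqlite files, one possible replacement for that execute_sql call is to build an in-memory schema straight from a tables.json entry and ask SQLite to compile, but not run, each prediction via EXPLAIN. This is a sketch under assumptions: check_sql and schema_from_tables_entry are hypothetical names, the type mapping is loose, and the thread does not show what execute_sql returns, so the return value would need adapting to whatever the surrounding decoding code expects.

```python
import sqlite3

# Loose mapping from Spider column types to SQLite types (SQLite does not
# enforce declared types, so this only affects type affinity).
TYPE_MAP = {"number": "REAL", "text": "TEXT", "time": "TEXT",
            "boolean": "INTEGER", "others": "TEXT"}


def schema_from_tables_entry(entry):
    """Build CREATE TABLE statements from one Spider-style tables.json entry.

    Assumes column_names_original holds [table_index, column_name] pairs,
    with table_index -1 marking the special '*' column.
    """
    columns = {i: [] for i in range(len(entry["table_names_original"]))}
    for (t_idx, col), col_type in zip(entry["column_names_original"],
                                      entry["column_types"]):
        if t_idx == -1:  # skip the '*' placeholder
            continue
        columns[t_idx].append(f'"{col}" {TYPE_MAP.get(col_type, "TEXT")}')
    return "\n".join(
        f'CREATE TABLE "{table}" ({", ".join(columns[i])});'
        for i, table in enumerate(entry["table_names_original"])
    )


def check_sql(cursor, pred_sql):
    """Hypothetical stand-in for execute_sql: True if SQLite can compile pred_sql.

    EXPLAIN makes SQLite parse the query and resolve its tables and columns
    without executing it.
    """
    try:
        cursor.execute("EXPLAIN " + pred_sql)
        cursor.fetchall()
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False


entry = {  # hypothetical, trimmed tables.json entry
    "db_id": "concert_singer",
    "table_names_original": ["singer"],
    "column_names_original": [[-1, "*"], [0, "Singer_ID"], [0, "Name"], [0, "Age"]],
    "column_types": ["text", "number", "text", "number"],
}
conn = sqlite3.connect(":memory:")
conn.executescript(schema_from_tables_entry(entry))
cur = conn.cursor()
print(check_sql(cur, "SELECT Name FROM singer WHERE Age > 30"))  # True
print(check_sql(cur, "SELECT Nme FROM singer"))                 # False
```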
