Text-to-SQL enhancement references #77

Open

cnsky2016 opened this issue Sep 13, 2023 · 1 comment

cnsky2016 commented Sep 13, 2023

According to the "Leaderboard - Execution with Values" section of https://yale-lily.github.io/spider, the top-three entries that have published papers are DIN-SQL and C3.

DIN-SQL+GPT4

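Paper: https://arxiv.org/abs/2304.11015
Core idea: categorize the errors, then handle each error category with a dedicated sub-task and combine the sub-tasks into the final solution. It consists of the following modules:
Schema-linking: uses CoT (chain-of-thought) to extract the required tables and columns
Classification & Decomposition Module: classifies each query as simple, non-nested complex, or nested complex
SQL Generation Module: handles each class from the previous step differently:
- Non-nested complex queries: uses CoT, adding an intermediate step to the prompt before the SQL; the intermediate representation comes from NatSQL
- Nested complex queries: generates each sub-query first, then combines them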
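The per-class routing described above could be sketched roughly as follows. This is only an illustration: `QueryClass`, `build_generation_prompt`, and the prompt wording are hypothetical names and text, not taken from the DIN-SQL paper, and the handling of the simple class (no intermediate steps) is an assumption, since the list above does not describe it.

```python
from enum import Enum


class QueryClass(Enum):
    # The three classes produced by the classification step.
    EASY = "easy"
    NON_NESTED = "non-nested"
    NESTED = "nested"


def build_generation_prompt(question, schema_links, query_class, sub_questions=()):
    """Return an illustrative generation prompt for the given query class."""
    lines = [f"Q: {question}", f"Schema links: {', '.join(schema_links)}"]
    if query_class is QueryClass.NON_NESTED:
        # CoT: ask for an intermediate representation before the final SQL.
        lines.append("First write the intermediate representation, then the SQL.")
    elif query_class is QueryClass.NESTED:
        # Solve each sub-question first, then combine the partial queries.
        for i, sub in enumerate(sub_questions, start=1):
            lines.append(f"Sub-question {i}: {sub}")
        lines.append("Write SQL for each sub-question, then combine them.")
    # EASY (assumption): no intermediate steps are added.
    lines.append("SQL:")
    return "\n".join(lines)
```

For example, `build_generation_prompt("Who is the oldest singer?", ["singer.name", "singer.age"], QueryClass.EASY)` yields just the question, the schema links, and the `SQL:` cue.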
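Self-correction Module: for the generated SQL, uses a different prompt depending on the model (CodeX or GPT-4):
- Treats the generated SQL as buggy and asks the model to fix the errors
- Asks the model to revise the SQL according to the given tips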
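The two self-correction prompt styles might look something like the sketch below. The function name `build_correction_prompt` and all prompt wording are hypothetical; which style is paired with which model is left to the caller here.

```python
def build_correction_prompt(schema, question, sql, tips=None):
    """Build an illustrative self-correction prompt.

    Without tips, the SQL is presented as buggy and the model is asked to
    fix it. With tips, the model is asked to revise the SQL against them.
    """
    if tips:
        tip_lines = "\n".join(f"- {tip}" for tip in tips)
        header = "Revise the SQL below according to these tips:\n" + tip_lines
    else:
        header = "The SQL below is buggy. Fix it."
    return (
        f"{header}\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\n"
        f"SQL: {sql}\n"
        "Fixed SQL:"
    )
```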

C3

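Paper: https://arxiv.org/abs/2307.07306
Core idea: categorize the errors, then optimize the prompt for each error category
Clear Prompting (CP):
- Splits the prompt into three parts (instruction, context (table schema), and question) to improve accuracy
- Generates the context part with a two-step prompt: table recall, then column recall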
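A three-part prompt of this kind could be assembled as in the sketch below. The layout markers, the wording, and the name `build_clear_prompt` are illustrative assumptions rather than the exact C3 template; the `recalled` argument stands in for the output of the table and column recall steps, which are not implemented here.

```python
def build_clear_prompt(schema, question, recalled=None):
    """Assemble instruction, context (schema), and question, in that order.

    schema:   dict mapping table name -> list of column names
    recalled: optional dict of the same shape holding only the tables and
              columns kept by the recall steps (hypothetical input)
    """
    kept = recalled if recalled is not None else schema
    context = "\n".join(
        f"# {table}({', '.join(columns)})" for table, columns in kept.items()
    )
    return "\n".join(
        [
            "### Instruction: write one SQLite query, with no explanation.",
            "### Context (schema):",
            context,
            f"### Question: {question}",
        ]
    )
```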
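Calibration with Hints (CH): adds hints to the prompt that target common errors such as selecting redundant columns and incorrect joins
Consistency Output (CO): to counter unstable model output, has the model generate multiple SQL queries per call, executes them on the database, discards the ones that fail, and votes on the rest to pick the final SQL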
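The execute, filter, and vote step can be sketched with the standard-library `sqlite3` module. This is a minimal sketch under stated assumptions: the name `pick_by_execution_vote` is hypothetical, candidates are assumed to be read-only queries, votes are grouped by execution result, and rows are compared as an unordered collection (so `ORDER BY` differences are ignored), which is a simplification.

```python
import sqlite3
from collections import Counter


def pick_by_execution_vote(conn, candidates):
    """Return the candidate SQL whose execution result gets the most votes.

    Candidates that raise a database error are discarded. Returns None when
    no candidate executes successfully.
    """
    executed = []
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # drop candidates that fail to execute
        # Order-insensitive key for the result set (simplifying assumption).
        executed.append((sql, tuple(sorted(repr(row) for row in rows))))
    if not executed:
        return None
    winning_key, _ = Counter(key for _, key in executed).most_common(1)[0]
    # Return the first candidate that produced the winning result.
    return next(sql for sql, key in executed if key == winning_key)
```

Usage would be along the lines of `pick_by_execution_vote(sqlite3.connect("spider.db"), sampled_sqls)`, where the database path and the list of sampled queries are placeholders.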

Member

wangzaistone commented Sep 13, 2023

Great suggestion!
