Skip to content

powerfist01/text-to-sql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

3 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🧠 Text-to-SQL with Claude

Convert natural language questions into SQL queries using Claude AI β€” built with progressive prompting techniques from basic to self-improving.

Python Anthropic VoyageAI License


Overview

Text-to-SQL is a project that demonstrates how to build a natural language database interface using Claude AI. Instead of writing SQL manually, you ask questions in plain English and the system generates and executes the SQL for you.

This project is structured as a learning progression β€” five approaches, each building on the last, from a simple one-shot prompt to a fully self-correcting pipeline.

uv run main.py --approach self_improvement --query "Which departments have the highest salary ratio?"
 Attempt 1 of 3
 Claude's Reasoning:
   1. I need MAX and MIN salary per department...
   2. I'll use HAVING to filter ratios above 3...

 Generated SQL:
   SELECT d.name, MAX(e.salary) / MIN(e.salary) AS salary_ratio
   FROM employees e
   JOIN departments d ON e.department_id = d.id
   GROUP BY d.name
   HAVING MAX(e.salary) / MIN(e.salary) > 3;

βœ… Success on attempt 1!

 department        salary_ratio
 Engineering       3.11
 Finance           4.47
 HR                4.54
 ...

Features

  • 5 prompting approaches β€” from basic to self-improving, all runnable from a single CLI
  • Pluggable architecture β€” every approach is self-contained and independently runnable
  • RAG-powered schema retrieval β€” only relevant schema columns are sent to Claude, scales to large databases
  • Self-correcting pipeline β€” Claude sees its own errors and fixes them automatically
  • Chain-of-thought reasoning β€” Claude explains its thinking before writing SQL
  • Persistent vector index β€” embeddings are built once and reused across runs

The 5 Approaches

# Approach Key Technique Best For
01 Basic Schema + question β†’ SQL Simple queries, quick prototyping
02 Few-Shot Add examples to guide style Consistent output formatting
03 Chain of Thought Step-by-step reasoning Complex multi-table queries
04 RAG Semantic schema retrieval Large databases with many tables
05 Self Improvement Automatic retry on failure Production reliability

Each approach builds on the previous one β€” RAG uses CoT, Self Improvement uses RAG.


Project Structure

text-to-sql/
β”œβ”€β”€ .env                         ← API keys (not committed)
β”œβ”€β”€ config.py                    ← Central environment config
β”œβ”€β”€ main.py                      ← Single CLI entry point
β”œβ”€β”€ pyproject.toml               ← Project dependencies (uv)
β”‚
β”œβ”€β”€ db/
β”‚   └── setup.py                 ← Database creation + shared utilities
β”‚                                   (get_schema_info, run_sql)
β”‚
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ 01_basic.py              ← Approach 1: Basic prompt
β”‚   β”œβ”€β”€ 02_few_shot.py           ← Approach 2: Few-shot examples
β”‚   β”œβ”€β”€ 03_chain_of_thought.py   ← Approach 3: CoT reasoning
β”‚   β”œβ”€β”€ 04_rag.py                ← Approach 4: RAG retrieval
β”‚   └── 05_self_improvement.py  ← Approach 5: Self-correcting loop
β”‚
└── data/
    β”œβ”€β”€ data.db                  ← SQLite database (auto-created)
    └── vector_db.pkl            ← VoyageAI vector index (auto-created)

Quickstart

Prerequisites

Installation

# Clone the repository
git clone https://github.com/yourusername/text-to-sql.git
cd text-to-sql

# Install dependencies
uv sync

# Set up environment variables
cp .env.example .env
# Edit .env and add your API keys

Environment Variables

Create a .env file in the project root:

ANTHROPIC_API_KEY=sk-ant-xxxxxxxx
VOYAGE_API_KEY=pa-xxxxxxxx

Set up the database

uv run db/setup.py

This creates data/data.db with:

  • 10 departments across US cities
  • 200 employees with names, ages, salaries, and hire dates

Usage

Run any approach via CLI

uv run main.py --approach <approach> --query "<your question>"

Examples

# Basic
uv run main.py --approach basic \
  --query "Who are the 5 highest paid employees?"

# Few-Shot
uv run main.py --approach few_shot \
  --query "What is the total salary expenditure per department?"

# Chain of Thought
uv run main.py --approach cot \
  --query "Which departments have an average salary above 120000?"

# RAG
uv run main.py --approach rag \
  --query "Which city location has the highest average employee salary?"

# Self Improvement
uv run main.py --approach self_improvement \
  --query "Show salary ratio per department where ratio exceeds 3"

Run an approach directly

Each file is independently runnable with its own test queries:

uv run src/03_chain_of_thought.py
uv run src/04_rag.py

How It Works

Database Layer (db/setup.py)

The shared foundation. Two utilities used by every approach:

  • get_schema_info() β€” reads and formats the database schema for Claude
  • run_sql() β€” executes generated SQL and returns a pandas DataFrame

Approach 1 β€” Basic

The simplest possible pipeline. Give Claude the full schema and the question, ask it to return SQL wrapped in <sql> tags.

Schema + Question β†’ Claude β†’ <sql>...</sql> β†’ Run β†’ Results

Approach 2 β€” Few-Shot

Add handcrafted examples of question β†’ SQL pairs to the prompt. Claude learns preferred SQL style β€” table aliases, JOIN patterns, aggregation formatting.

Approach 3 β€” Chain of Thought

Ask Claude to reason step by step inside <thought_process> tags before writing SQL. Forces deliberate reasoning, significantly improves accuracy on complex queries.

Approach 4 β€” RAG

Uses VoyageAI embeddings to find which schema columns are semantically relevant to the question. Only those columns are sent to Claude β€” keeping prompts small and focused regardless of database size.

Question β†’ VoyageAI embedding β†’ similarity search β†’ top columns β†’ Claude

The vector index is built once and saved to data/vector_db.pkl for reuse.

Approach 5 β€” Self Improvement

Wraps a retry loop around RAG + CoT. If the SQL fails, the error is sent back to Claude with the original question and failed SQL. Claude debugs and corrects itself, up to MAX_ATTEMPTS times.

Generate SQL β†’ Run β†’ βœ… Done
                  β†’ ❌ Send error back to Claude β†’ Generate fixed SQL β†’ Run β†’ ...

Tech Stack

Tool Purpose
Claude (claude-sonnet-4-5) SQL generation and reasoning
VoyageAI Text embeddings for RAG
SQLite Lightweight local database
pandas Query result formatting
NumPy Vector similarity computation
uv Python package management

Acknowledgements

This project is based on the Text to SQL with Claude cookbook by Mahesh Murag at Anthropic.


License

MIT License β€” see LICENSE for details.

About

Natural language to SQL using Claude AI - 5 prompting techniques from basic to self-improving.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages