santiagodsm/sql-agent-loop
SQL Agent Loop — POC

A proof of concept demonstrating that, with a data dictionary and agentic AI, you can answer business questions—even when your schema uses cryptic column names.

The idea

Business data is often stored with technical or obscure column names (e.g., geo_c, chn_x, st_f). A traditional BI tool or analyst needs to know these mappings. This POC shows that an LLM agent, equipped with:

  1. A data dictionary — plain-language definitions of tables and columns
  2. Agentic tools — ability to inspect schema, read the dictionary, and run SQL in a loop

can interpret natural-language questions, figure out the right mappings, write correct SQL, and return clear answers—without hardcoding queries or mappings.

Why metadata and governance matter

Maintaining good metadata — clear definitions, valid values, and examples for each column — is what makes this POC possible. Without it, the agent would be guessing at column meanings and likely produce wrong results or hallucinate mappings.

Data governance — consistent definitions, ownership, and change control — ensures the data dictionary stays accurate as schemas evolve. Poor or outdated metadata leads to AI that confidently gives wrong answers.

This POC illustrates that metadata and governance are the foundation for effective AI use. Invest in that foundation first; then agentic AI, natural-language querying, and self-service analytics become much more reliable and scalable.

How it works

  1. You ask a natural-language question (e.g., "What's the best-selling product?").
  2. The LLM receives the question and decides which tools to call.
  3. Tools the agent can use:
    • get_table_columns — list columns and types for a table
    • get_data_dictionary — fetch plain-language definitions for a table's columns
    • run_sql — execute a SQL statement and return results
  4. Agent loop — the model calls tools, receives results, and can call more tools until it has enough to answer (up to 15 turns).
  5. Output — the agent returns an explanation, the SQL it ran, and the final answer.

The column names in the schema are intentionally cryptic (e.g., st_f, geo_c, chn_x). The agent must use the data dictionary to map your business question to the right columns and values.
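The loop above can be sketched in plain Python. This is a minimal, self-contained illustration of the tool-dispatch side, with the three tools implemented against an in-memory SQLite database; the sample rows, the `dispatch` helper, and its argument format are assumptions for illustration, not the repo's actual implementation (which also wires these tools into the OpenAI API for up to 15 turns).

```python
import json
import sqlite3

# Tiny stand-in database with one sales row and one dictionary entry.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (tx_id INTEGER, sku_z TEXT, u_n INTEGER, st_f TEXT)")
conn.execute("""CREATE TABLE data_dictionary
                (table_name TEXT, column_name TEXT, data_type TEXT,
                 definition TEXT, example TEXT)""")
conn.execute("INSERT INTO sales VALUES (1, 'SKU-001', 3, 'P')")
conn.execute("""INSERT INTO data_dictionary VALUES
                ('sales', 'st_f', 'TEXT', 'Status: P=Paid, R=Refunded', 'P')""")

def get_table_columns(table):
    """List (name, type) pairs for a table via SQLite's PRAGMA."""
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]

def get_data_dictionary(table):
    """Fetch plain-language definitions for a table's columns."""
    rows = conn.execute(
        "SELECT column_name, definition FROM data_dictionary WHERE table_name = ?",
        (table,))
    return dict(rows.fetchall())

def run_sql(sql):
    """Execute a SQL statement and return the result rows."""
    return conn.execute(sql).fetchall()

TOOLS = {"get_table_columns": get_table_columns,
         "get_data_dictionary": get_data_dictionary,
         "run_sql": run_sql}

def dispatch(tool_call):
    # Route one model-requested call (name + JSON-encoded args) to Python.
    fn = TOOLS[tool_call["name"]]
    return fn(**json.loads(tool_call["arguments"]))

# In the real loop, results like this are fed back to the model each turn.
result = dispatch({"name": "get_data_dictionary",
                   "arguments": '{"table": "sales"}'})
```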

Data loaded

The demo loads two tables into SQLite (agent_demo.db):

sales

| Column | Type | Description |
| ------ | ---- | ----------- |
| tx_id | INTEGER | Transaction ID (unique key) |
| d_k | TEXT | Transaction date (YYYY-MM-DD) |
| geo_c | TEXT | Region code: NE, SE, MW, W |
| chn_x | TEXT | Channel code: W=Web, S=Store, M=Mobile |
| sku_z | TEXT | Product SKU (e.g., SKU-001) |
| cst_r | TEXT | Customer reference ID |
| u_n | INTEGER | Units sold |
| v_n | REAL | Net sales (USD, after discounts) |
| v_g | REAL | Gross sales (USD, before discounts) |
| st_f | TEXT | Status: P=Paid, R=Refunded |

~135 rows of synthetic sales data across 4 regions, 3 channels, 4 SKUs, and 5 customers, spanning about 45 days.
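The schema above translates to DDL along these lines. This is a hypothetical sketch built from the column table; the repo's actual loader may use different constraints or insert logic.

```python
import sqlite3

# Hypothetical CREATE TABLE matching the column list above.
DDL = """
CREATE TABLE sales (
    tx_id INTEGER PRIMARY KEY,  -- Transaction ID (unique key)
    d_k   TEXT,                 -- Transaction date (YYYY-MM-DD)
    geo_c TEXT,                 -- Region code: NE, SE, MW, W
    chn_x TEXT,                 -- Channel code: W=Web, S=Store, M=Mobile
    sku_z TEXT,                 -- Product SKU (e.g., SKU-001)
    cst_r TEXT,                 -- Customer reference ID
    u_n   INTEGER,              -- Units sold
    v_n   REAL,                 -- Net sales (USD, after discounts)
    v_g   REAL,                 -- Gross sales (USD, before discounts)
    st_f  TEXT                  -- Status: P=Paid, R=Refunded
)
"""
conn = sqlite3.connect(":memory:")
conn.execute(DDL)
# PRAGMA table_info returns columns in definition order.
columns = [row[1] for row in conn.execute("PRAGMA table_info(sales)")]
```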

data_dictionary

Stores business definitions for each column: table_name, column_name, data_type, definition, and example. The agent uses this to understand what columns mean without prior knowledge.

Example queries you can make

Ask in plain language:

  • "What's the best-selling product? Exclude refunds."
  • "Who is the best customer by revenue?"
  • "Which region had the highest net sales last week?"
  • "Compare sales by channel: Web vs Store vs Mobile."
  • "What was total revenue in February?"
  • "List top 3 products by units sold."
  • "How many refunded transactions are there?"
  • "Which customer bought the most units?"

The agent will call the data dictionary, build the right SQL, and return the answer.
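For the first question, after reading the dictionary (which reveals that `st_f = 'R'` marks refunds and `u_n` is units sold), the agent might produce SQL like the query below. The sample rows are invented here for illustration; the SQL is one plausible answer, not the agent's guaranteed output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sku_z TEXT, u_n INTEGER, st_f TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("SKU-001", 5, "P"),
    ("SKU-002", 9, "P"),
    ("SKU-002", 4, "R"),   # refunded row: should be excluded
    ("SKU-003", 7, "P"),
])

# "What's the best-selling product? Exclude refunds." — the dictionary
# tells the agent that st_f = 'R' means refunded, so it filters those out.
best = conn.execute("""
    SELECT sku_z, SUM(u_n) AS units
    FROM sales
    WHERE st_f != 'R'
    GROUP BY sku_z
    ORDER BY units DESC
    LIMIT 1
""").fetchone()
# best == ('SKU-002', 9)
```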

Requirements

  • Python 3.9+
  • OpenAI API key in .env
  • Packages in requirements.txt

Quick start

  1. Create and activate a virtual environment:
     python -m venv .venv
     source .venv/bin/activate
  2. Install dependencies:
     pip install -r requirements.txt
  3. Configure environment variables:
     cp .env.example .env
     Then edit .env and set your OPENAI_API_KEY.
  4. Run the Gradio app:
     python sql_agent_gradio.py

Or run the original notebook:

jupyter lab

Open 5_extra copy.ipynb and run the cells.

Project layout

  • sql_agent_gradio.py — Gradio app; prompts come from the UI instead of being hardcoded
  • 5_extra copy.ipynb — Original notebook with the agent loop and trace output
  • agent_demo.db — Created at runtime; sample sales data with a data dictionary

Notes

  • The demo creates agent_demo.db automatically on startup.
  • The Gradio app lets you edit the system and user prompts to test different questions and behaviors.
