A proof of concept demonstrating that, given a data dictionary and an agentic AI, you can answer business questions in natural language, even when your schema uses cryptic column names.
Business data is often stored with technical or obscure column names (e.g., geo_c, chn_x, st_f). A traditional BI tool or analyst needs to know these mappings. This POC shows that an LLM agent, equipped with:
- A data dictionary — plain-language definitions of tables and columns
- Agentic tools — ability to inspect schema, read the dictionary, and run SQL in a loop
can interpret natural-language questions, figure out the right mappings, write correct SQL, and return clear answers—without hardcoding queries or mappings.
Maintaining good metadata — clear definitions, valid values, and examples for each column — is what makes this POC possible. Without it, the agent would be guessing at column meanings and likely produce wrong results or hallucinate mappings.
Data governance — consistent definitions, ownership, and change control — ensures the data dictionary stays accurate as schemas evolve. Poor or outdated metadata leads to AI that confidently gives wrong answers.
This POC illustrates that metadata and governance are the foundation for effective AI use. Invest in that foundation first; then agentic AI, natural-language querying, and self-service analytics become much more reliable and scalable.
- You ask a natural-language question (e.g., "What's the best-selling product?").
- The LLM receives the question and decides which tools to call.
- Tools the agent can use:
  - `get_table_columns` — list columns and types for a table
  - `get_data_dictionary` — fetch plain-language definitions for a table's columns
  - `run_sql` — execute a SQL statement and return results
- Agent loop — the model calls tools, receives results, and can call more tools until it has enough to answer (up to 15 turns).
- Output — the agent returns an explanation, the SQL it ran, and the final answer.
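The three tools above can be sketched over SQLite roughly as follows. This is a minimal illustration, not the demo's actual implementation; the table name `sales` and the `data_dictionary` column names are assumptions based on the schema and dictionary descriptions below.

```python
import sqlite3

def get_table_columns(conn, table_name):
    """List column names and types for a table via PRAGMA table_info."""
    rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    return [{"name": r[1], "type": r[2]} for r in rows]

def get_data_dictionary(conn, table_name):
    """Fetch plain-language definitions for a table's columns."""
    rows = conn.execute(
        "SELECT column_name, data_type, definition, example "
        "FROM data_dictionary WHERE table_name = ?",
        (table_name,),
    ).fetchall()
    keys = ("column_name", "data_type", "definition", "example")
    return [dict(zip(keys, r)) for r in rows]

def run_sql(conn, query):
    """Execute a SQL statement and return the result rows."""
    return conn.execute(query).fetchall()
```

In the real agent loop, each function's result is serialized (e.g., as JSON) and fed back to the model as a tool-call response so it can decide what to call next.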
The column names in the schema are intentionally cryptic (e.g., st_f, geo_c, chn_x). The agent must use the data dictionary to map your business question to the right columns and values.
The demo loads two tables into SQLite (agent_demo.db): a sales table and a data dictionary. The sales table's columns:
| Column | Type | Description |
|---|---|---|
| `tx_id` | INTEGER | Transaction ID (unique key) |
| `d_k` | TEXT | Transaction date (YYYY-MM-DD) |
| `geo_c` | TEXT | Region code: NE, SE, MW, W |
| `chn_x` | TEXT | Channel code: W=Web, S=Store, M=Mobile |
| `sku_z` | TEXT | Product SKU (e.g., SKU-001) |
| `cst_r` | TEXT | Customer reference ID |
| `u_n` | INTEGER | Units sold |
| `v_n` | REAL | Net sales (USD, after discounts) |
| `v_g` | REAL | Gross sales (USD, before discounts) |
| `st_f` | TEXT | Status: P=Paid, R=Refunded |
~135 rows of synthetic sales data across 4 regions, 3 channels, 4 SKUs, and 5 customers, spanning about 45 days.
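A hypothetical sketch of how such synthetic rows could be generated; the demo's actual generation logic, date range, and random seed are not shown in this README and are assumed here for illustration.

```python
import random
from datetime import date, timedelta

REGIONS = ["NE", "SE", "MW", "W"]          # geo_c
CHANNELS = ["W", "S", "M"]                 # chn_x
SKUS = [f"SKU-{i:03d}" for i in range(1, 5)]       # sku_z
CUSTOMERS = [f"CUST-{i:03d}" for i in range(1, 6)]  # cst_r (IDs assumed)

def make_sales_rows(n=135, start=date(2024, 2, 1), days=45, seed=42):
    """Generate n synthetic sales rows matching the cryptic schema above."""
    rng = random.Random(seed)
    rows = []
    for tx_id in range(1, n + 1):
        d = start + timedelta(days=rng.randrange(days))
        units = rng.randint(1, 10)                       # u_n
        gross = round(units * rng.uniform(10, 50), 2)    # v_g
        net = round(gross * rng.uniform(0.85, 1.0), 2)   # v_n (after discounts)
        status = "R" if rng.random() < 0.1 else "P"      # st_f, ~10% refunds
        rows.append((tx_id, d.isoformat(), rng.choice(REGIONS),
                     rng.choice(CHANNELS), rng.choice(SKUS),
                     rng.choice(CUSTOMERS), units, net, gross, status))
    return rows
```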
The data dictionary table stores business definitions for each column: table_name, column_name, data_type, definition, and example. The agent uses this to understand what columns mean without prior knowledge.
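The dictionary table can be sketched directly from the column list above. The DDL and sample rows below are an assumed reconstruction (the README does not show the demo's exact DDL), with definitions taken verbatim from the schema table:

```python
import sqlite3

# Assumed DDL for the data dictionary table described above.
DDL = """
CREATE TABLE IF NOT EXISTS data_dictionary (
    table_name  TEXT,
    column_name TEXT,
    data_type   TEXT,
    definition  TEXT,
    example     TEXT
)
"""

# Two sample rows; definitions match the sales schema table.
SAMPLE_ROWS = [
    ("sales", "geo_c", "TEXT", "Region code: NE, SE, MW, W", "NE"),
    ("sales", "st_f",  "TEXT", "Status: P=Paid, R=Refunded", "P"),
]

conn = sqlite3.connect(":memory:")  # the demo uses agent_demo.db
conn.execute(DDL)
conn.executemany("INSERT INTO data_dictionary VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
```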
Ask in plain language:
- "What's the best-selling product? Exclude refunds."
- "Who is the best customer by revenue?"
- "Which region had the highest net sales last week?"
- "Compare sales by channel: Web vs Store vs Mobile."
- "What was total revenue in February?"
- "List top 3 products by units sold."
- "How many refunded transactions are there?"
- "Which customer bought the most units?"
The agent will call the data dictionary, build the right SQL, and return the answer.
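For instance, for "What's the best-selling product? Exclude refunds," the agent might map "product" to sku_z and "refunds" to st_f = 'R' via the dictionary and emit SQL along these lines. This is an illustrative sketch with made-up rows, not the agent's literal output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sku_z TEXT, u_n INTEGER, st_f TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("SKU-001", 5, "P"), ("SKU-002", 9, "P"),
     ("SKU-002", 3, "R"), ("SKU-001", 2, "P")],
)

# The kind of query the agent could derive from the data dictionary.
query = """
SELECT sku_z, SUM(u_n) AS total_units
FROM sales
WHERE st_f = 'P'            -- dictionary: P = Paid, R = Refunded
GROUP BY sku_z
ORDER BY total_units DESC
LIMIT 1
"""
best = conn.execute(query).fetchone()
print(best)  # ('SKU-002', 9)
```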
- Python 3.9+
- OpenAI API key in `.env`
- Packages in `requirements.txt`
- Create and activate a virtual environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure environment variables:

  ```bash
  cp .env.example .env
  ```

  Then edit `.env` and set your `OPENAI_API_KEY`.

- Run the Gradio app:

  ```bash
  python sql_agent_gradio.py
  ```

  Or run the original notebook:

  ```bash
  jupyter lab
  ```

  Open `5_extra copy.ipynb` and run the cells.
- `sql_agent_gradio.py` — Gradio app; prompts come from the UI instead of being hardcoded
- `5_extra copy.ipynb` — Original notebook with the agent loop and trace output
- `agent_demo.db` — Created at runtime; sample sales data with a data dictionary
- The demo creates `agent_demo.db` automatically on startup.
- The Gradio app lets you edit the system and user prompts to test different questions and behaviors.