A natural language to SQL chatbot that generates and executes queries against PostgreSQL and Redshift databases.
- Natural language to SQL query generation
- Database schema analysis and context
- Safe query execution (SELECT only)
- Interactive Web UI (Streamlit) - Modern chat interface
- Interactive CLI interface
- Conversation history support
- Support for PostgreSQL and Redshift
- Local LLM support via Ollama (or OpenAI)
To get started, launch the local Ollama service:

```bash
# Start Ollama in Docker (pulls model on first run)
docker-compose up -d

# Wait for model to download (first time only, ~1-2 minutes)
docker-compose logs -f ollama
```

Install the Python dependencies:

```bash
pip install -r requirements.txt
```

Copy `.env.example` to `.env` and configure your settings:
```powershell
Copy-Item .env.example .env
```

Edit `.env` with your database credentials:
```
# LLM (already configured for local Ollama)
LLM_PROVIDER=ollama
OLLAMA_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2:1b

# Database
DB_TYPE=redshift              # or postgresql
DB_HOST=your-cluster.region.redshift.amazonaws.com
DB_PORT=5439
DB_NAME=your_database
DB_USER=your_username
DB_PASSWORD=your_password
DB_SCHEMA=public              # Optional: specify schema to analyze
SSLMODE=require
```

**Option A: Web UI (Recommended)**
```powershell
.\start_ui.ps1
```

Or directly:

```bash
streamlit run src\streamlit_app.py
```

The app will open in your browser at http://localhost:8501.

**Option B: CLI Interface**

```bash
python src\main.py
```

Once running, type your questions in natural language:
👤 You: Show me all customers who made purchases last month

🤖 Generating query...

📝 SQL Query:
```sql
SELECT DISTINCT c.customer_id, c.name, c.email
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= date_trunc('month', current_date) - INTERVAL '1 month'
  AND o.order_date < date_trunc('month', current_date);
```

💡 Explanation: This query finds customers with orders in the previous month.

Execute this query? (Y/n): y
## Streamlit Web UI
The Streamlit interface provides a modern, user-friendly chat experience:
### Features
- **Chat Interface**: Natural conversation flow with message history
- **SQL Preview**: See generated queries before executing them
- **Interactive Execution**: Click to execute or cancel queries
- **Result Visualization**: Results displayed in formatted tables
- **Sidebar Controls**:
  - View connection status
  - Clear chat history
  - View database schema
  - Reconnect to database
- **Real-time Feedback**: Loading states and error handling
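
The chat flow described by these features can be approximated with a short Streamlit sketch. The `generate_sql` and `run_query` helpers below are placeholders for the project's query generator and executor, not the actual functions in `src/streamlit_app.py`:

```python
# Minimal sketch of the chat loop, assuming placeholder helpers; the real
# logic lives in src/streamlit_app.py and may differ.
import pandas as pd
import streamlit as st


def generate_sql(question):
    # Placeholder for the project's LLM-based query generator
    return "SELECT 1 AS answer;"


def run_query(sql):
    # Placeholder for the project's read-only executor
    return [{"answer": 1}]


st.title("SQL Chatbot")

if "messages" not in st.session_state:
    st.session_state.messages = []        # chat history survives reruns
if "pending_sql" not in st.session_state:
    st.session_state.pending_sql = None   # query awaiting confirmation

for msg in st.session_state.messages:     # replay the conversation
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if question := st.chat_input("Ask a question about your data"):
    st.session_state.messages.append({"role": "user", "content": question})
    with st.chat_message("user"):
        st.markdown(question)
    st.session_state.pending_sql = generate_sql(question)

if st.session_state.pending_sql:
    sql = st.session_state.pending_sql
    with st.chat_message("assistant"):
        st.code(sql, language="sql")      # SQL preview before execution
        col1, col2 = st.columns(2)
        if col1.button("▶️ Execute"):
            st.dataframe(pd.DataFrame(run_query(sql)))  # results as a table
            st.session_state.messages.append({"role": "assistant", "content": sql})
            st.session_state.pending_sql = None
        if col2.button("❌ Cancel"):
            st.session_state.pending_sql = None
```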
### Usage
1. Launch the web UI:

   ```powershell
   .\start_ui.ps1
   ```

2. Click "Connect to Database" to initialize the connection.

3. Start chatting! Type questions like:
   - "Show me all tables"
   - "List the top 10 customers by revenue"
   - "What are the columns in the orders table?"

4. When a query is generated:
   - Review the SQL and explanation
   - Click "▶️ Execute" to run it
   - Click "❌ Cancel" to skip execution

5. Use sidebar buttons to:
   - View database schema
   - Clear chat history
   - Reconnect if needed
The CLI also supports a few built-in commands (a sketch of the command loop follows this list):

- `schema` - View the database schema for the current schema
- `schemas` - List all available schemas in the database
- `clear` - Clear conversation history
- `quit` or `exit` - Exit the application
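
For reference, the command handling can be structured as below. The `bot` helper methods (`describe_schema`, `list_schemas`, `generate_sql`, `execute`) are hypothetical stand-ins; the real loop lives in `src/main.py` and may differ:

```python
# Sketch of a CLI loop handling the built-in commands listed above.
def repl(bot):
    history = []
    while True:
        user_input = input("You: ").strip()
        command = user_input.lower()

        if command in {"quit", "exit"}:       # leave the application
            break
        if command == "clear":                # reset conversation history
            history.clear()
            continue
        if command == "schema":               # describe the current schema
            print(bot.describe_schema())
            continue
        if command == "schemas":              # list all available schemas
            print(bot.list_schemas())
            continue

        sql = bot.generate_sql(user_input, history)   # NL -> SQL
        print(sql)
        if input("Execute this query? (Y/n): ").strip().lower() != "n":
            print(bot.execute(sql))           # read-only execution
        history.append(user_input)
```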
Database configuration examples:

SQLite (default):

```
DB_TYPE=sqlite
DB_NAME=example.db
```

PostgreSQL:

```
DB_TYPE=postgresql
DB_HOST=localhost
DB_PORT=5432
DB_NAME=mydb
DB_USER=readonly_user
DB_PASSWORD=your_password
DB_SCHEMA=public       # Optional: specify which schema to analyze
```

Redshift:

```
DB_TYPE=redshift
DB_HOST=your-cluster.region.redshift.amazonaws.com
DB_PORT=5439
DB_NAME=mydb
DB_USER=readonly_user
DB_PASSWORD=your_password
DB_SCHEMA=analytics    # Optional: specify which schema to analyze
SSLMODE=require
```

By default, the chatbot analyzes the default schema (`public` for PostgreSQL/Redshift). You can specify a different schema using the `DB_SCHEMA` environment variable:
```
DB_SCHEMA=your_schema_name
```

This is useful when:

- You have multiple schemas in your database
- You want to focus on a specific schema (e.g., `analytics`, `staging`, `production`)
- You want to reduce the amount of schema information sent to the LLM
Commands:

- Use the `schemas` command in the CLI to see all available schemas
- Use the "List All Schemas" button in the Streamlit UI to browse schemas
- The current schema is marked in the schema list
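
To illustrate what per-schema analysis can look like, here is a rough sketch that reads column metadata from `information_schema` with SQLAlchemy. It is not necessarily how `src/database/schema_analyzer.py` is implemented; the connection URL is assembled directly from the `.env` values shown above:

```python
# Sketch: collecting schema context for one schema via information_schema.
import os
from sqlalchemy import create_engine, text

# Build a connection URL from the documented .env values (assumes psycopg2)
url = (
    f"postgresql+psycopg2://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
    f"@{os.environ['DB_HOST']}:{os.environ.get('DB_PORT', '5432')}/{os.environ['DB_NAME']}"
)
engine = create_engine(url)
schema = os.environ.get("DB_SCHEMA", "public")

columns_sql = text("""
    SELECT table_name, column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = :schema
    ORDER BY table_name, ordinal_position
""")

with engine.connect() as conn:
    rows = conn.execute(columns_sql, {"schema": schema}).fetchall()

# Condense into one line per table to keep the LLM prompt small
tables = {}
for table, column, dtype in rows:
    tables.setdefault(table, []).append(f"{column} {dtype}")

schema_text = "\n".join(f"{t}({', '.join(cols)})" for t, cols in tables.items())
print(schema_text)
```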
Besides Ollama, OpenAI is supported as an LLM provider:

```
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
LLM_MODEL=gpt-4
```
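
As a rough sketch of the generation step, the schema context and the user's question can be combined into a chat completion request. The prompt wording below is illustrative and not the exact code in `src/agent/query_generator.py`:

```python
# Sketch: NL -> SQL with the OpenAI client (illustrative prompt).
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


def generate_sql(question, schema_text):
    """Ask the model for a single read-only SELECT statement."""
    response = client.chat.completions.create(
        model=os.environ.get("LLM_MODEL", "gpt-4"),
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "Translate questions into a single SELECT statement. "
                           "Use only these tables and columns:\n" + schema_text,
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()
```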
Project structure:

```
sql-chatbot/
├── src/
│   ├── main.py                 # CLI entry point
│   ├── streamlit_app.py        # Streamlit web UI
│   ├── database/
│   │   ├── connector.py        # Database connection
│   │   ├── schema_analyzer.py  # Schema extraction
│   │   └── executor.py         # Query execution
│   ├── agent/
│   │   └── query_generator.py  # LLM-based query generation
│   └── utils/
│       └── formatters.py       # Result formatting
├── config/
├── requirements.txt
├── start_ui.ps1                # Start Streamlit UI script
├── .env.example
└── README.md
```
To quickly test the chatbot, create a sample SQLite database:

```python
import sqlite3
conn = sqlite3.connect('example.db')
cursor = conn.cursor()
# Create sample tables
cursor.execute('''
CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
cursor.execute('''
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER,
order_date DATE,
order_total DECIMAL(10,2),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
)
''')
# Insert sample data
cursor.execute("INSERT INTO customers (name, email) VALUES ('John Doe', 'john@example.com')")
cursor.execute("INSERT INTO orders (customer_id, order_date, order_total) VALUES (1, '2025-10-15', 99.99)")
conn.commit()
conn.close()
```

Safety features:

- Read-only mode: Only SELECT queries are allowed
- Query validation: SQL is parsed and validated before execution
- Result limits: Maximum row count to prevent memory issues
- Timeout protection: Query timeout configuration
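
As an illustration of the first three points, a generated statement can be parsed and checked before it is run. The sketch below uses the `sqlparse` package (an assumption; it is not listed in the requirements) and is not a copy of `src/database/executor.py`:

```python
# Sketch: enforce SELECT-only execution and a row cap before running a query.
import sqlparse

FORBIDDEN = {"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE", "CREATE", "GRANT"}
MAX_ROWS = 1000  # example result limit


def validate_select(sql):
    statements = [s for s in sqlparse.parse(sql) if s.token_first() is not None]
    if len(statements) != 1:
        raise ValueError("Exactly one SQL statement is allowed")
    statement = statements[0]
    if statement.get_type() != "SELECT":
        raise ValueError("Only SELECT queries are allowed")

    keywords = {tok.normalized for tok in statement.flatten() if tok.is_keyword}
    if keywords & FORBIDDEN:
        raise ValueError(f"Forbidden keyword(s): {sorted(keywords & FORBIDDEN)}")

    # Cap the result size if the generated query has no LIMIT of its own
    if "LIMIT" not in keywords:
        return sql.rstrip().rstrip(";") + f" LIMIT {MAX_ROWS}"
    return sql


print(validate_select("SELECT name, email FROM customers"))
# SELECT name, email FROM customers LIMIT 1000
```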
Requirements:

- Python 3.8+
- SQLAlchemy 2.0+
- OpenAI API key (or other LLM provider)
- Database access (SQLite, PostgreSQL, or MySQL)
This repository includes a `docker-compose.yml` that runs an Ollama service
alongside the chatbot. Ollama provides a local model server that exposes a
simple HTTP API. The `ollama` service in the compose file is a placeholder:
replace the image with your preferred Ollama image, or run Ollama on the host
and point `OLLAMA_URL` at it.
To run the chatbot and Ollama together:

```bash
docker compose up --build
```

The chatbot container is configured to use:

```
LLM_PROVIDER=ollama
OLLAMA_URL=http://ollama:11434
OLLAMA_MODEL=mosaicml/mpt-7b-instruct   # example
```
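
For reference, Ollama's `/api/generate` endpoint can be called with a plain POST. The prompt and helper name below are illustrative, and the `requests` package is assumed to be installed:

```python
# Sketch: asking a local Ollama server for SQL over its HTTP API.
import os
import requests

OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "llama3.2:1b")


def generate_sql(question, schema_text):
    response = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={
            "model": OLLAMA_MODEL,
            "prompt": (
                "Translate the question into a single SELECT statement.\n"
                f"Schema:\n{schema_text}\n\nQuestion: {question}\nSQL:"
            ),
            "stream": False,  # return one JSON object instead of a stream
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"].strip()
```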
Model recommendation (local):

- Small/CPU-only: `ggml/gpt4all-*` or small llama-family models (fast, lower quality)
- Medium/GPU: `mosaicml/mpt-7b-instruct`, `meta-llama/Llama-2-7b-chat` (better quality, needs a GPU for low latency)
- Large/high-quality (production-grade): `meta-llama/Llama-2-13b-chat` or larger (requires multi-GPU or specialized infra)
Notes:
- Always check the model's license before deploying.
- For production traffic, consider using a GPU-backed inference server (TGI) or a managed API.
Import errors for database drivers:

- PostgreSQL: `pip install psycopg2-binary`
- MySQL: `pip install pymysql`
OpenAI API errors:

- Verify your API key in `.env`
- Check your OpenAI account has credits
Database connection errors:
- Verify database credentials
- Ensure database server is running
- Check network connectivity
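
To rule out problems in the chatbot itself, you can test the connection directly with SQLAlchemy using the same `.env` values (assuming `psycopg2-binary` for PostgreSQL/Redshift):

```python
# Standalone connectivity check, independent of the chatbot.
import os
from sqlalchemy import create_engine, text

url = (
    f"postgresql+psycopg2://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
    f"@{os.environ['DB_HOST']}:{os.environ.get('DB_PORT', '5439')}/{os.environ['DB_NAME']}"
    f"?sslmode={os.environ.get('SSLMODE', 'require')}"
)

engine = create_engine(url)
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())  # prints 1 on success
```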
## License

MIT