A robust system for converting natural language queries into precise SQL statements using a multi-stage pipeline that combines Retrieval-Augmented Generation (RAG) and Large Language Model (LLM) processing.
This project implements an advanced natural language to SQL conversion system that enhances query generation through a specialized pipeline:
- Table Selection: Identifies the most relevant tables needed for a given query
- Column Pruning: Determines the essential columns from those tables
- SQL Generation: Constructs accurate SQL queries using the filtered schema and similar examples
The system combines vector database similarity search (RAG) with strategic LLM prompting to achieve improved results compared to single-step conversion approaches.
The pipeline consists of the following components:
- RAG-Based Table Selection: Uses vector similarity to identify potentially relevant tables
- LLM-Based Table Refinement: Determines which tables are truly necessary for the query (a hedged sketch of this step follows the list)
- Column Pruning: Filters columns to include only those needed for the query
- Similar Query Retrieval: Finds analogous queries from the training data to serve as examples
- Final SQL Generation: Creates the SQL query using the pruned schema and example queries
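As a hedged illustration of how one of these LLM stages might be prompted (the actual prompts live in single_pipeline.py and may differ), a simplified table-refinement call against the Groq API could look like this:

```python
# Illustrative sketch only: the prompt wording, model name, and simplified
# return type are assumptions, not the project's exact implementation
# (the real table_agent also returns a token count).
from groq import Groq

client = Groq(api_key="your_groq_api_key")

def refine_tables(nl_query: str, candidate_tables: list[str]) -> list[str]:
    prompt = (
        "Given the question and candidate tables below, return only the "
        "tables strictly required to answer it, as a comma-separated list.\n"
        f"Question: {nl_query}\n"
        f"Candidate tables: {', '.join(candidate_tables)}"
    )
    resp = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
    )
    return [t.strip() for t in resp.choices[0].message.content.split(",")]
```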
- Vector database for storing embeddings of table schemas and example queries
- PostgreSQL integration for storing and retrieving data
- Embedding generation using text-embedding-ada-002
- LLM-powered processing via the Groq API with Qwen and Llama models
- Multi-stage prompting strategy for improved query accuracy
- Schema embedding and retrieval (see the retrieval sketch after this list)
- Similar query identification based on vector similarity
- Integration of retrieved context into LLM prompting
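For intuition, here is a minimal sketch of that retrieval step; the connection string, the schema_embeddings table, and the function shapes are invented for illustration, and the project's actual logic lives in RAG/embedding_creator.py:

```python
# Minimal sketch: embed the query with text-embedding-ada-002 and rank table
# schemas by cosine distance in pgvector. All names here are illustrative.
from openai import AzureOpenAI
import psycopg2

client = AzureOpenAI(
    api_key="your_azure_openai_key",
    api_version="2023-05-15",
    azure_endpoint="https://your-resource.openai.azure.com",
)

def embed(text: str) -> list[float]:
    # text-embedding-ada-002 produces a 1536-dimensional vector
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return resp.data[0].embedding

def search_tables(nl_query: str, top_k: int = 5) -> list[str]:
    query_vec = embed(nl_query)
    with psycopg2.connect("dbname=textsql user=postgres") as conn:
        with conn.cursor() as cur:
            # "<=>" is pgvector's cosine-distance operator
            cur.execute(
                "SELECT table_name FROM schema_embeddings "
                "ORDER BY embedding <=> %s::vector LIMIT %s",
                (str(query_vec), top_k),
            )
            return [row[0] for row in cur.fetchall()]
```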
- main.py: Orchestrates the complete pipeline and evaluates query accuracy
- single_pipeline.py: Implements the individual pipeline components (table selection, column pruning, SQL generation)
- RAG/embedding_creator.py: Handles embedding generation and vector search for tables and example queries
- RAG/db_conf.py: Database connection configuration
- test_schema_embedding.py: Utilities for schema embedding and vector database operations
| Metric Type | Component | Performance |
|---|---|---|
| Tokens | Table Selection | ~500 |
| Tokens | Column Pruning | ~800 |
| Tokens | SQL Generation | ~1200 |
| Time | Table Selection | 0.8 s |
| Time | Column Pruning | 1.2 s |
| Time | SQL Generation | 1.5 s |
| Total | Tokens/Query | ~2500 |
| Total | Time/Query | 3.5 s |
The project features an interactive web interface built with Streamlit, allowing users to easily:
- Input natural language queries
- Visualize each stage of the Text-to-SQL pipeline
- See generated SQL queries and performance metrics in real-time
The interface is implemented in view.py, which:
- Provides a clean dashboard for query input
- Shows step-by-step processing with visual cues
- Displays performance metrics for each stage
- Presents the final SQL query with syntax highlighting
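A minimal sketch of such a dashboard, wired to the pipeline functions documented below, might look like this (illustrative only; the shipped view.py adds visual cues and per-stage metrics and may be structured differently):

```python
# Illustrative Streamlit dashboard built on the pipeline's public functions;
# the actual view.py may differ.
import streamlit as st
from single_pipeline import table_agent, prune_agent, final_sql_query_generator
import RAG.embedding_creator as search_api

st.title("Text-to-SQL Pipeline")
nl_query = st.text_input("Enter a natural language query")

if nl_query:
    with st.spinner("Selecting tables..."):
        candidates = search_api.search_tables(nl_query)
        tables, _ = table_agent(nl_query, candidates)
    st.write("Selected tables:", tables)

    with st.spinner("Pruning columns..."):
        schema, _ = prune_agent(nl_query, tables)
    st.write("Pruned schema:", schema)

    with st.spinner("Generating SQL..."):
        examples = search_api.search_similar_query(nl_query)
        sql, _ = final_sql_query_generator(nl_query, schema, examples)
    st.code(sql, language="sql")
```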
- Python 3.10+
- PostgreSQL database with pgvector extension
- Required Python packages (see requirements.txt)
- API keys for LLM services (Groq, Azure OpenAI)
- Clone the repository:

  ```bash
  git clone https://github.com/dash-dash-org/Adobe.git
  cd Adobe
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:

  ```bash
  # Create a .env file with the necessary API keys
  GROQ_API_KEY_RAG=your_api_key
  AZURE_OPENAI_API_KEY_GPT4=your_api_key
  AZURE_OPENAI_ENDPOINT=your_endpoint
  ```
Launch the user-friendly web interface:

```bash
streamlit run view.py
```

This opens an interactive dashboard where you can:
- Enter natural language queries in plain English
- Watch as the system processes each stage of the pipeline
- View the selected tables and columns
- Get the final SQL query with syntax highlighting
- See detailed performance metrics for each step
Run the main script with your test data:

```bash
python main.py
```

Or use the single pipeline directly in your own code:
```python
from single_pipeline import table_agent, prune_agent, final_sql_query_generator
import RAG.embedding_creator as search_api

nl_query = "Your natural language question here"

# Get relevant tables through RAG
relevant_tables = search_api.search_tables(nl_query)

# Refine tables through LLM
refined_tables, tokens = table_agent(nl_query, relevant_tables)

# Prune columns
relevant_schema, tokens = prune_agent(nl_query, refined_tables)

# Get similar queries as examples
similar_queries = search_api.search_similar_query(nl_query)

# Generate SQL query
sql_query, tokens = final_sql_query_generator(nl_query, relevant_schema, similar_queries)
```

The system processes natural language queries through multiple stages:
- Initial RAG Search: Converts the query to an embedding and finds similar tables in the vector database
- Table Agent: Uses an LLM to analyze which tables are actually required
- Column Pruning: Determines which columns from the selected tables are necessary
- Similar Query Search: Finds examples of similar queries and their SQL translations
- SQL Generation: Combines the pruned schema and examples to generate the final SQL
This multi-step process achieves better results than single-step conversion by breaking down the complex task into manageable components.
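For intuition, here is how a query might flow through those stages; the table names, columns, and resulting SQL below are invented for illustration:

```python
nl_query = "How many orders did each customer place in 2023?"

# Stages 1-2: RAG search plus the table agent narrow the schema down to, say:
refined_tables = ["customers", "orders"]

# Stage 3: column pruning keeps only the fields the question needs:
relevant_schema = {
    "customers": ["customer_id", "name"],
    "orders": ["order_id", "customer_id", "order_date"],
}

# Stage 5: the generator combines the pruned schema with retrieved examples:
sql_query = """
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_date >= '2023-01-01' AND o.order_date < '2024-01-01'
GROUP BY c.name;
"""
```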
Contributions are welcome! Please feel free to submit a Pull Request.
- Groq for providing fast LLM inference
- Azure OpenAI for embedding generation
- pgvector for efficient vector similarity search


