kalki-eshwar/Project-I
AI-Powered Database Assistant

A database assistant that uses artificial intelligence to understand natural language queries and execute complex database operations, available through both a terminal UI and a Streamlit web interface. LangGraph orchestrates the AI-powered workflow: it determines user intent, plans multi-step database operations, and produces clear, contextual responses.

Features

  • Natural Language Processing: Understands and processes natural language database queries
  • Intelligent Intent Recognition: Automatically classifies queries as questions, context updates, or combined requests
  • Ambiguity Resolution: Iteratively clarifies ambiguous queries through AI-powered conversation
  • Automated Task Planning: Generates detailed, step-by-step execution plans for complex database operations
  • Multi-Step Execution: Executes database operations with proper schema exploration and relationship handling
  • Context Management: Maintains conversation context across interactions
  • LangGraph Orchestration: Uses LangGraph for stateful, graph-based workflow management
  • Web Interface: Clean Streamlit web interface
  • Comprehensive Logging: Detailed execution logging for debugging and monitoring

Architecture

The system has been migrated to use LangGraph, a framework for building stateful, multi-actor applications with LLMs. The application is structured as a directed graph where each node represents a processing step, and edges define the flow of execution based on the current state.

LangGraph Nodes

  • intent_determination: Analyzes user queries to classify intent as QUESTION, CONTEXT, or BOTH
  • context_update: Updates conversation context when needed (for CONTEXT or BOTH intents)
  • task_planning: Generates detailed execution plans for database operations using AI
  • task_execution: Executes planned tasks step-by-step using database tools
  • output_generation: Creates natural language responses from execution results

State Flow

START → intent_determination → context_update → [CONDITIONAL ROUTING]
                                                        │
                              CONTEXT ──────────────────┴───────────── QUESTION/BOTH
                                 │                                          │
                                 │                          task_planning → task_execution
                                 │                                          │
                                 └───────────→ output_generation ←──────────┘
                                                        ↓
                                                       END

Core Modules

  • langgraph_app.py: LangGraph workflow definition and node implementations
  • main.py: Entry point with terminal UI loop
  • context_manager.py: Manages conversation context and information persistence
  • model_inference.py: Handles AI model interactions (Google Gemini API)
  • logger.py: Comprehensive logging system
  • terminal.py: Terminal UI management and display functions

Database Tools

The system provides three primary database interaction tools:

  • query_database(query: str): Execute SQL queries and return formatted results
  • get_tables(): Retrieve list of all database tables
  • get_table_schema(table_name: str): Get detailed schema information for specific tables

Data Storage

  • Database: SQLite database (datasets/chinook.db) - sample database with music store data
  • Context: Persistent context storage in data/context.txt
  • Logs: Execution logs stored in logs/ directory

How It Works

The LangGraph workflow processes user queries through a series of orchestrated nodes:

  1. Query Input: User enters natural language queries through the terminal interface
  2. Intent Determination: The intent_determination node analyzes whether the input is a QUESTION, CONTEXT, or BOTH
  3. Context Update: The context_update node updates conversation context for CONTEXT/BOTH intents
  4. Conditional Routing: Based on intent, routes to either output generation (for context-only) or task planning
  5. Task Planning: The task_planning node generates detailed step-by-step execution plans using AI
  6. Task Execution: The task_execution node executes each plan step, exploring schemas and running queries
  7. Output Generation: The output_generation node creates natural language responses from results
  8. State Persistence: Results update the conversation context for future interactions

Installation

Prerequisites

  • Python 3.13+
  • Google AI Studio API key
  • SQLite3

Setup

  1. Clone the repository:
git clone <repository-url>
cd project-i
  2. Create a virtual environment:
python -m venv project-1-venv
source project-1-venv/bin/activate  # On Windows: project-1-venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

The key dependencies include:

  • langgraph: Framework for building stateful, multi-actor applications with LLMs
  • langchain-core: Core LangChain abstractions and interfaces
  • google-generativeai: Google Gemini AI API client
  4. Create a .env file with your API credentials:
API_KEY=your_google_ai_api_key_here
DEFAULT_MODEL=gemini-1.5-flash  # or your preferred model
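For illustration, a stdlib-only loader for that .env format might look like the following (the project may well use python-dotenv instead; the parsing rules here are assumptions):

```python
# Minimal sketch of loading KEY=VALUE pairs from a .env file into the
# environment, skipping comments and preserving already-set variables.
import os


def load_env(path: str = ".env") -> None:
    """Populate os.environ from simple KEY=VALUE lines."""
    with open(path) as fh:
        for line in fh:
            line = line.split("#", 1)[0].strip()  # drop inline comments
            if "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```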

Usage

Starting the Assistant

streamlit run app.py

The application uses Streamlit to provide a web-based interface for natural language database queries.

Example Interactions

Simple Query:

What is the capital of France?

Response: The capital of France is Paris.

Context + Query:

My name is John. What albums do I have?

Response: Based on the context that your name is John, here are your albums: ...

Complex Database Query:

Show me all albums by artists from Canada, including the artist name and album title.

The LangGraph workflow will automatically:

  1. Classify the query as a QUESTION
  2. Generate a detailed execution plan exploring database schemas
  3. Execute multi-step database operations with JOIN queries
  4. Generate a natural language response

Commands

  • Enter natural language queries
  • Type quit to exit the application

LangGraph Workflow Execution

The system processes each query through the following LangGraph nodes:

  • intent_determination: Classifies query type
  • context_update: Updates conversation context
  • task_planning: Creates execution plans (for questions)
  • task_execution: Runs database operations
  • output_generation: Formats responses

Database Schema

The system works with the Chinook database, which includes tables for:

  • Artists: Artist information
  • Albums: Album details with artist relationships
  • Tracks: Individual track information
  • Genres: Music genre classifications
  • Customers: Customer data
  • Invoices: Purchase records
  • Employees: Staff information

Configuration

Environment Variables

  • API_KEY: Google AI API key (required)
  • COHERE_API_KEY: Cohere API key (optional, for Cohere models)
  • BASE_URL: LM Studio base URL (optional, default: http://localhost:1234)
  • DATABASE_PATH: Path to SQLite database (default: datasets/chinook.db)
  • DEFAULT_MODEL: AI model to use (default: gemini-1.5-flash)
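These variables and their documented defaults could be resolved with a small helper like the following (a sketch only; the project's actual loader lives in src/config/settings.py):

```python
# Sketch of resolving the environment variables above with their
# documented defaults. Variable names come from this README.
import os


def load_settings() -> dict:
    return {
        "api_key": os.getenv("API_KEY"),                # required
        "cohere_api_key": os.getenv("COHERE_API_KEY"),  # optional
        "base_url": os.getenv("BASE_URL", "http://localhost:1234"),
        "database_path": os.getenv("DATABASE_PATH", "datasets/chinook.db"),
        "default_model": os.getenv("DEFAULT_MODEL", "gemini-1.5-flash"),
    }
```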

Database Configuration

The application supports flexible database selection through the web interface:

  1. Enter Path Method: Manually enter the path to your SQLite database

    • Supports relative paths: datasets/chinook.db
    • Supports absolute paths: /Users/username/Documents/mydb.db
  2. Upload Database Method: Upload a SQLite database file directly through the UI

    • Supported formats: .db, .sqlite, .sqlite3
    • Uploaded files are stored in data/uploaded_databases/

See Database Configuration Documentation for detailed information.

To configure the database:

  1. Open the Streamlit application
  2. Click Settings in the sidebar
  3. Under Database Configuration, choose your method
  4. Click Save Settings to persist your configuration

Project Structure

Project-I/
├── src/                        # Main source code (organized by concern)
│   ├── config/                # ✅ Configuration (single source of truth)
│   │   ├── config.yaml       # Main configuration
│   │   ├── model_config.json # Model selection persistence
│   │   ├── settings.py       # Configuration loader
│   │   └── defaults.py       # Default values
│   ├── core/                 # Core business logic & agents
│   ├── services/             # External service integrations
│   ├── tools/                # MCP server & tool registry
│   ├── workflow/             # LangGraph workflow definitions
│   └── shared/               # Shared utilities (DI, paths)
├── data/                      # ✅ Runtime data only (gitignored)
│   ├── context.txt           # Runtime context storage
│   ├── chroma_db/            # Vector database
│   ├── vectors/              # Vector embeddings
│   └── uploaded_databases/   # User uploads
├── datasets/                  # Development datasets (gitignored)
├── tests/                     # Comprehensive test suite
├── docs/                      # Detailed documentation
├── logs/                      # Runtime logs (gitignored)
├── app.py                     # Streamlit web interface
├── requirements.txt           # Python dependencies
├── ARCHITECTURE.md            # ✅ Detailed architecture guide
└── README.md                  # This file

See ARCHITECTURE.md for detailed architecture documentation.

Adding New Features

  1. LangGraph Nodes: Add new nodes to langgraph_app.py for additional processing steps
  2. Database Tools: Extend database functions in langgraph_app.py with new operations
  3. State Management: Modify the GraphState TypedDict to include new state fields
  4. Conditional Routing: Update routing logic in the workflow for new execution paths
  5. UI Components: Modify terminal.py for interface enhancements
  6. AI Capabilities: Update prompt engineering in LangGraph node functions
  7. Context Management: Enhance context_manager.py for better information handling

Logging

The system provides comprehensive logging through logger.py:

  • Execution Stages: Tracks major operation phases
  • Database Operations: Logs all database queries and results
  • AI Interactions: Records model inference calls and responses
  • Error Handling: Captures and logs exceptions with context

Logs are stored in the logs/ directory with timestamps and detailed metadata.

Troubleshooting

Common Issues

  1. API Key Errors: Ensure API_KEY is set in .env file
  2. Database Connection: Verify chinook.db exists in datasets/ directory
  3. Permission Errors: Check write permissions for data/ and logs/ directories
  4. LangGraph Import Errors: Ensure all LangGraph dependencies are installed (pip install -r requirements.txt)
  5. Graph Compilation Errors: Check for syntax errors in langgraph_app.py node definitions
  6. State Flow Issues: Verify that all required state fields are properly initialized and passed between nodes

LangGraph-Specific Troubleshooting

  • Node Execution Failures: Check individual node logs in the logs/ directory for specific error messages
  • Conditional Routing Problems: Verify that state fields used in routing conditions are correctly set
  • Async Operation Errors: Ensure all node functions are properly defined as async and handle exceptions
  • State Persistence: Check that state updates are correctly applied and persisted between nodes

Debug Mode

Enable debug output by modifying logging levels in logger.py or checking terminal debug prints. For LangGraph-specific debugging, examine the state transitions and node execution logs.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

[Add license information here]

Acknowledgments

  • Built using Google's Gemini AI models
  • Uses the Chinook sample database for demonstrations
  • Terminal UI powered by Python curses library
