kalki-eshwar/Project-I
AI-Powered Database Assistant

A database assistant that uses artificial intelligence to understand natural language queries and execute complex database operations, available through both a terminal UI and a Streamlit web interface. LangGraph orchestrates the AI-powered workflow: it determines user intent, plans multi-step database operations, and produces clear, contextual responses.

Features

  • Natural Language Processing: Understands and processes natural language database queries
  • Intelligent Intent Recognition: Automatically classifies queries as questions, context updates, or combined requests
  • Ambiguity Resolution: Iteratively clarifies ambiguous queries through AI-powered conversation
  • Automated Task Planning: Generates detailed, step-by-step execution plans for complex database operations
  • Multi-Step Execution: Executes database operations with proper schema exploration and relationship handling
  • Context Management: Maintains conversation context across interactions
  • LangGraph Orchestration: Uses LangGraph for stateful, graph-based workflow management
  • Web Interface: Clean Streamlit web interface
  • Comprehensive Logging: Detailed execution logging for debugging and monitoring

Architecture

The system has been migrated to use LangGraph, a framework for building stateful, multi-actor applications with LLMs. The application is structured as a directed graph where each node represents a processing step, and edges define the flow of execution based on the current state.

LangGraph Nodes

  • intent_determination: Analyzes user queries to classify intent as QUESTION, CONTEXT, or BOTH
  • context_update: Updates conversation context when needed (for CONTEXT or BOTH intents)
  • task_planning: Generates detailed execution plans for database operations using AI
  • task_execution: Executes planned tasks step-by-step using database tools
  • output_generation: Creates natural language responses from execution results

State Flow

START → intent_determination → context_update → [CONDITIONAL ROUTING]
                                                        │
                              CONTEXT ──────────────────┴───────────── QUESTION/BOTH
                                 │                                          │
                                 │                          task_planning → task_execution
                                 │                                          │
                                 └───────────→ output_generation ←──────────┘
                                                        ↓
                                                       END

Core Modules

  • langgraph_app.py: LangGraph workflow definition and node implementations
  • main.py: Entry point with terminal UI loop
  • context_manager.py: Manages conversation context and information persistence
  • model_inference.py: Handles AI model interactions (Google Gemini API)
  • logger.py: Comprehensive logging system
  • terminal.py: Terminal UI management and display functions

Database Tools

The system provides three primary database interaction tools:

  • query_database(query: str): Execute SQL queries and return formatted results
  • get_tables(): Retrieve list of all database tables
  • get_table_schema(table_name: str): Get detailed schema information for specific tables

Data Storage

  • Database: SQLite database (datasets/chinook.db) - sample database with music store data
  • Context: Persistent context storage in data/context.txt
  • Logs: Execution logs stored in logs/ directory

How It Works

The LangGraph workflow processes user queries through a series of orchestrated nodes:

  1. Query Input: User enters natural language queries through the terminal interface
  2. Intent Determination: The intent_determination node analyzes whether the input is a QUESTION, CONTEXT, or BOTH
  3. Context Update: The context_update node updates conversation context for CONTEXT/BOTH intents
  4. Conditional Routing: Based on intent, routes to either output generation (for context-only) or task planning
  5. Task Planning: The task_planning node generates detailed step-by-step execution plans using AI
  6. Task Execution: The task_execution node executes each plan step, exploring schemas and running queries
  7. Output Generation: The output_generation node creates natural language responses from results
  8. State Persistence: Results update the conversation context for future interactions

Installation

Prerequisites

  • Python 3.13+
  • Google AI Studio API key
  • SQLite3

Setup

  1. Clone the repository:
git clone <repository-url>
cd project-i
  2. Create a virtual environment:
python -m venv project-1-venv
source project-1-venv/bin/activate  # On Windows: project-1-venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

The key dependencies include:

  • langgraph: Framework for building stateful, multi-actor applications with LLMs
  • langchain-core: Core LangChain abstractions and interfaces
  • google-generativeai: Google Gemini AI API client
  4. Create a .env file with your API credentials:
API_KEY=your_google_ai_api_key_here
DEFAULT_MODEL=gemini-1.5-flash  # or your preferred model
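For illustration, a stdlib-only loader for that .env format might look like the following (the project may well use python-dotenv instead; the parsing rules here are assumptions):

```python
# Minimal sketch of loading KEY=VALUE pairs from a .env file into the
# environment, skipping comments and preserving already-set variables.
import os


def load_env(path: str = ".env") -> None:
    """Populate os.environ from simple KEY=VALUE lines."""
    with open(path) as fh:
        for line in fh:
            line = line.split("#", 1)[0].strip()  # drop inline comments
            if "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```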

Usage

Starting the Assistant

streamlit run app.py

The application uses Streamlit to provide a web-based interface for natural language database queries.

Example Interactions

Simple Query:

What is the capital of France?

Response: The capital of France is Paris.

Context + Query:

My name is John. What albums do I have?

Response: Based on the context that your name is John, here are your albums: ...

Complex Database Query:

Show me all albums by artists from Canada, including the artist name and album title.

The LangGraph workflow will automatically:

  1. Classify the query as a QUESTION
  2. Generate a detailed execution plan exploring database schemas
  3. Execute multi-step database operations with JOIN queries
  4. Generate a natural language response

Commands

  • Enter natural language queries
  • Type quit to exit the application

LangGraph Workflow Execution

The system processes each query through the following LangGraph nodes:

  • intent_determination: Classifies query type
  • context_update: Updates conversation context
  • task_planning: Creates execution plans (for questions)
  • task_execution: Runs database operations
  • output_generation: Formats responses

Database Schema

The system works with the Chinook database, which includes tables for:

  • Artists: Artist information
  • Albums: Album details with artist relationships
  • Tracks: Individual track information
  • Genres: Music genre classifications
  • Customers: Customer data
  • Invoices: Purchase records
  • Employees: Staff information

Configuration

Environment Variables

  • API_KEY: Google AI API key (required)
  • COHERE_API_KEY: Cohere API key (optional, for Cohere models)
  • BASE_URL: LM Studio base URL (optional, default: http://localhost:1234)
  • DATABASE_PATH: Path to SQLite database (default: datasets/chinook.db)
  • DEFAULT_MODEL: AI model to use (default: gemini-1.5-flash)
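These variables and their documented defaults could be resolved with a small helper like the following (a sketch only; the project's actual loader lives in src/config/settings.py):

```python
# Sketch of resolving the environment variables above with their
# documented defaults. Variable names come from this README.
import os


def load_settings() -> dict:
    return {
        "api_key": os.getenv("API_KEY"),                # required
        "cohere_api_key": os.getenv("COHERE_API_KEY"),  # optional
        "base_url": os.getenv("BASE_URL", "http://localhost:1234"),
        "database_path": os.getenv("DATABASE_PATH", "datasets/chinook.db"),
        "default_model": os.getenv("DEFAULT_MODEL", "gemini-1.5-flash"),
    }
```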

Database Configuration

The application supports flexible database selection through the web interface:

  1. Enter Path Method: Manually enter the path to your SQLite database

    • Supports relative paths: datasets/chinook.db
    • Supports absolute paths: /Users/username/Documents/mydb.db
  2. Upload Database Method: Upload a SQLite database file directly through the UI

    • Supported formats: .db, .sqlite, .sqlite3
    • Uploaded files are stored in data/uploaded_databases/

See Database Configuration Documentation for detailed information.

To configure the database:

  1. Open the Streamlit application
  2. Click Settings in the sidebar
  3. Under Database Configuration, choose your method
  4. Click Save Settings to persist your configuration

Project Structure

Project-I/
├── src/                        # Main source code (organized by concern)
│   ├── config/                # ✅ Configuration (single source of truth)
│   │   ├── config.yaml       # Main configuration
│   │   ├── model_config.json # Model selection persistence
│   │   ├── settings.py       # Configuration loader
│   │   └── defaults.py       # Default values
│   ├── core/                 # Core business logic & agents
│   ├── services/             # External service integrations
│   ├── tools/                # MCP server & tool registry
│   ├── workflow/             # LangGraph workflow definitions
│   └── shared/               # Shared utilities (DI, paths)
├── data/                      # ✅ Runtime data only (gitignored)
│   ├── context.txt           # Runtime context storage
│   ├── chroma_db/            # Vector database
│   ├── vectors/              # Vector embeddings
│   └── uploaded_databases/   # User uploads
├── datasets/                  # Development datasets (gitignored)
├── tests/                     # Comprehensive test suite
├── docs/                      # Detailed documentation
├── logs/                      # Runtime logs (gitignored)
├── app.py                     # Streamlit web interface
├── requirements.txt           # Python dependencies
├── ARCHITECTURE.md            # ✅ Detailed architecture guide
└── README.md                  # This file

See ARCHITECTURE.md for detailed architecture documentation.

Adding New Features

  1. LangGraph Nodes: Add new nodes to langgraph_app.py for additional processing steps
  2. Database Tools: Extend database functions in langgraph_app.py with new operations
  3. State Management: Modify the GraphState TypedDict to include new state fields
  4. Conditional Routing: Update routing logic in the workflow for new execution paths
  5. UI Components: Modify terminal.py for interface enhancements
  6. AI Capabilities: Update prompt engineering in LangGraph node functions
  7. Context Management: Enhance context_manager.py for better information handling

Logging

The system provides comprehensive logging through logger.py:

  • Execution Stages: Tracks major operation phases
  • Database Operations: Logs all database queries and results
  • AI Interactions: Records model inference calls and responses
  • Error Handling: Captures and logs exceptions with context

Logs are stored in the logs/ directory with timestamps and detailed metadata.

Troubleshooting

Common Issues

  1. API Key Errors: Ensure API_KEY is set in .env file
  2. Database Connection: Verify chinook.db exists in datasets/ directory
  3. Permission Errors: Check write permissions for data/ and logs/ directories
  4. LangGraph Import Errors: Ensure all LangGraph dependencies are installed (pip install -r requirements.txt)
  5. Graph Compilation Errors: Check for syntax errors in langgraph_app.py node definitions
  6. State Flow Issues: Verify that all required state fields are properly initialized and passed between nodes

LangGraph-Specific Troubleshooting

  • Node Execution Failures: Check individual node logs in the logs/ directory for specific error messages
  • Conditional Routing Problems: Verify that state fields used in routing conditions are correctly set
  • Async Operation Errors: Ensure all node functions are properly defined as async and handle exceptions
  • State Persistence: Check that state updates are correctly applied and persisted between nodes

Debug Mode

Enable debug output by modifying logging levels in logger.py or checking terminal debug prints. For LangGraph-specific debugging, examine the state transitions and node execution logs.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

[Add license information here]

Acknowledgments

  • Built using Google's Gemini AI models
  • Uses the Chinook sample database for demonstrations
  • Terminal UI powered by Python curses library
