# Automated Text-to-SQL Agent Demo

This notebook demonstrates the **FULLY AUTOMATED** workflow of the Text-to-SQL Agent.

## Key Features

✅ **Fully Automated** - No manual steps required

✅ **Intelligent** - Uses LLM for understanding and generation

✅ **Resilient** - Built-in error handling with retries

✅ **Recoverable** - Session persistence for interrupted workflows

✅ **Flexible** - Handles corrections and ambiguity

## What You'll Learn

1. Basic automated queries
2. Query with BigQuery execution
3. Handling ambiguity and corrections
4. Session persistence and recovery
5. Comparison with manual approach

## Setup and Imports

First, let's import the necessary modules. The agent automatically loads its configuration from a `.env` file.

In [None]:
import sys
from pathlib import Path
import json

# Add parent directory to path
sys.path.insert(0, str(Path.cwd().parent))

from src import Text2SQLAgent, settings
from src.utils import setup_logger

logger = setup_logger(__name__)

print("✅ Imports successful!")

## Helper Functions

These functions help format and display results nicely.

In [None]:
def print_section(title: str):
    """Print a formatted section header."""
    print("\n" + "=" * 70)
    print(f"  {title}")
    print("=" * 70)


def print_result(result: dict):
    """Print query result in a formatted way."""
    if result["success"]:
        print("✅ SUCCESS!")
        print(f"\n📝 Generated SQL:")
        print("-" * 70)
        print(result["sql"])
        print("-" * 70)

        # Guard against a missing or None "results" value (e.g. when execute=False)
        results = result.get("results")
        if results is not None:
            print(f"\n📊 Results: {result.get('row_count', 0)} rows")
            if results.get("rows"):
                print("\nFirst few rows:")
                for i, row in enumerate(results["rows"][:5], 1):
                    print(f"  {i}. {row}")
    else:
        print(f"❌ FAILED: {result.get('error')}")
        print(f"Message: {result.get('message')}")

        if "failure_summary" in result:
            print("\n📋 Failure Summary:")
            summary = result["failure_summary"]
            print(f"  - Identified tables: {summary.get('identified_tables', [])}")
            print(f"  - Iterations: {summary.get('attempted_iterations', 0)}")
            print(f"  - Recommendations:")
            for rec in summary.get("recommendations", []):
                print(f"    • {rec}")

print("✅ Helper functions loaded!")

## Check Configuration

Before we begin, let's verify that the necessary configuration is in place.

### Required Configuration:

Create a `.env` file in the project root with:

```bash
# Azure OpenAI Configuration
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your-api-key-here
AZURE_OPENAI_DEPLOYMENT=gpt-4

# Google Cloud BigQuery Configuration
GCP_PROJECT_ID=your-gcp-project-id
BIGQUERY_DATASET=your_dataset_name
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Schema Configuration
SCHEMA_DIRECTORY=/path/to/your/schema_directory
```
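
Before running the cells below, it can help to confirm that the variables listed above are actually set. The following is a minimal sketch, not part of the agent's API: `missing_env_vars` is a hypothetical helper, and it assumes the `.env` file has already been loaded into the process environment. If you use ConnectChain instead of direct Azure OpenAI, the Azure entries may not apply.

```python
import os

# Variable names copied from the .env example above.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT",
    "GCP_PROJECT_ID",
    "BIGQUERY_DATASET",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "SCHEMA_DIRECTORY",
]


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or blank in `env`.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing configuration:", ", ".join(missing))
else:
    print("All required variables are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.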

## ConnectChain Support (AMEX Enterprise)

This system supports **ConnectChain**, AMEX's enterprise AI framework, which provides:

- ✅ **EAS (Enterprise Auth Service)** integration
- ✅ **Proxy configuration** support for corporate networks
- ✅ **Certificate management** for secure connections
- ✅ **Centralized LLM configuration** via YAML

### Using ConnectChain

To use ConnectChain instead of direct Azure OpenAI:

**1. Set environment variables:**
```bash
USE_CONNECTCHAIN=true
CONFIG_PATH=connectchain.config.yml
WORKDIR=.
```

**2. Configure `connectchain.config.yml`** with your model settings

**3. Add EAS credentials** (if required):
```bash
CONSUMER_ID=your-consumer-id
CONSUMER_SECRET=your-consumer-secret
```

**4. No code changes needed!** The agent automatically uses ConnectChain when configured.

For detailed setup instructions, see **[CONNECTCHAIN_SETUP.md](../CONNECTCHAIN_SETUP.md)**

For more on ConnectChain: https://github.com/americanexpress/connectchain
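
To see which mode your environment would select, a small check like the one below can be useful. This is a sketch under assumptions: the helper names are hypothetical, the set of accepted truthy strings is a guess, and joining `WORKDIR` with `CONFIG_PATH` is an assumed convention rather than confirmed ConnectChain behavior.

```python
import os

# Assumption: which strings count as "true" is not specified in this notebook.
TRUTHY = {"1", "true", "yes", "on"}


def connectchain_enabled(env=None):
    """Hypothetical helper: interpret USE_CONNECTCHAIN as a boolean flag."""
    env = os.environ if env is None else env
    return str(env.get("USE_CONNECTCHAIN", "")).strip().lower() in TRUTHY


def connectchain_config_path(env=None):
    """Join WORKDIR and CONFIG_PATH, defaulting to the example values above.

    Assumption: CONFIG_PATH is interpreted relative to WORKDIR.
    """
    env = os.environ if env is None else env
    workdir = env.get("WORKDIR", ".")
    config = env.get("CONFIG_PATH", "connectchain.config.yml")
    return os.path.normpath(os.path.join(workdir, config))


if connectchain_enabled():
    path = connectchain_config_path()
    print(f"ConnectChain mode, config at {path} (exists: {os.path.exists(path)})")
else:
    print("Direct Azure OpenAI mode")
```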

In [None]:
# Check configuration
try:
    endpoint = settings.get("azure_openai.endpoint")
    project_id = settings.get("bigquery.project_id")
    schema_dir = settings.get("schema.schema_directory")
    
    print("✅ Configuration check passed!")
    print(f"   Azure OpenAI Endpoint: {endpoint}")
    print(f"   GCP Project ID: {project_id}")
    print(f"   Schema Directory: {schema_dir}")
    config_ok = True
except ValueError as e:
    print(f"⚠️  Configuration incomplete: {e}")
    print("\n💡 Please set up your .env file before continuing.")
    config_ok = False

---

# DEMO 1: Basic Automated Queries

## Overview

This demo shows how the agent automatically handles simple queries without any manual intervention.

## What Happens Automatically:

1. ✅ **Schema Loading** - Loads all tables from directory
2. ✅ **Query Understanding** - Uses LLM to understand the query
3. ✅ **Table Identification** - Identifies relevant tables automatically
4. ✅ **Column Selection** - Selects appropriate columns
5. ✅ **SQL Generation** - Generates BigQuery-compliant SQL
6. ✅ **Validation** - Validates SQL syntax
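
The steps above can be pictured as a chain of stages that pass a shared context along. The sketch below only illustrates that shape: the stage functions are stand-in stubs with hard-coded values, not the agent's actual implementation, and every name in it is hypothetical.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Stage = Callable[[Context], Context]


def run_pipeline(stages: List[Stage], context: Context) -> Context:
    """Run each stage in order, recording the names of completed stages."""
    context = dict(context, completed=[])
    for stage in stages:
        context = stage(context)
        context["completed"].append(stage.__name__)
    return context


# Stub stages: each one only adds a placeholder value to the context.
def load_schema(ctx: Context) -> Context:
    return {**ctx, "tables": ["Customers"]}


def identify_tables(ctx: Context) -> Context:
    return {**ctx, "selected": ctx["tables"][:1]}


def generate_sql(ctx: Context) -> Context:
    return {**ctx, "sql": f"SELECT * FROM {ctx['selected'][0]}"}


result = run_pipeline(
    [load_schema, identify_tables, generate_sql],
    {"query": "Show me all customers"},
)
print(result["sql"])        # SELECT * FROM Customers
print(result["completed"])  # ['load_schema', 'identify_tables', 'generate_sql']
```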

In [None]:
print_section("DEMO 1: Basic Automated Queries")

print("\n🤖 Initializing Text-to-SQL Agent...")
print("   (This automatically loads schema from directory)")

try:
    # Initialize agent - it loads schema automatically
    agent = Text2SQLAgent()

    print(f"\n✅ Agent initialized with {len(agent.schema.tables)} tables")
    print(f"   Tables: {list(agent.schema.tables.keys())}")

except Exception as e:
    print(f"\n⚠️  Could not initialize agent: {e}")
    print("\n💡 To run this demo:")
    print("   1. Set SCHEMA_DIRECTORY in your .env file")
    print("   2. Place your Excel schema files in that directory")
    print("   3. Configure Azure OpenAI credentials")

### Example Query 1: Simple Table Query

Let's start with a simple query that requires only one table.

In [None]:
query = "Show me all customers"

print(f"\n💬 Query: \"{query}\"")
print("\n🔄 Processing automatically...")

try:
    # THIS IS IT - Just call query()!
    # Everything else happens automatically:
    # - Identifies tables (Customers)
    # - Generates SQL (SELECT * FROM Customers)
    result = agent.query(query, execute=False)

    print_result(result)

except Exception as e:
    print(f"❌ Error: {e}")

### Example Query 2: Aggregation Query

Now let's try a query that requires aggregation, ordering, and a row limit.

In [None]:
query = "What are the top 5 products by sales?"

print(f"\n💬 Query: \"{query}\"")
print("\n🔄 The agent will automatically:")
print("   - Identify the Products/Sales table")
print("   - Add ORDER BY and LIMIT clauses")
print("   - Apply appropriate aggregation")

try:
    result = agent.query(query, execute=False)
    print_result(result)

except Exception as e:
    print(f"❌ Error: {e}")

### Example Query 3: Filtered Query

A query with specific filtering criteria.

In [None]:
query = "List customers from the North region"

print(f"\n💬 Query: \"{query}\"")
print("\n🔄 The agent will automatically:")
print("   - Identify the Customers table")
print("   - Add WHERE clause for region filter")

try:
    result = agent.query(query, execute=False)
    print_result(result)

except Exception as e:
    print(f"❌ Error: {e}")

### Example Query 4: Date Aggregation

A more complex query with date extraction and grouping.
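
To make "date extraction and grouping" concrete, here is the same computation in plain Python on a few made-up rows. It is only a sketch for cross-checking what a generated query should return; the sample data and the `revenue_by_month` helper are hypothetical and unrelated to your actual schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (order_date, revenue).
orders = [
    (date(2023, 12, 31), 50.0),
    (date(2024, 1, 5), 100.0),
    (date(2024, 1, 20), 25.0),
    (date(2024, 3, 2), 40.0),
]


def revenue_by_month(rows, year):
    """Sum revenue per month for one year: filter, extract month, group."""
    totals = defaultdict(float)
    for order_date, revenue in rows:
        if order_date.year == year:              # like a WHERE filter on the year
            totals[order_date.month] += revenue  # like GROUP BY month with SUM
    return dict(sorted(totals.items()))


print(revenue_by_month(orders, 2024))  # {1: 125.0, 3: 40.0}
```

Note that months without orders (February here) are simply absent, which is also what a plain `GROUP BY` would produce.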

In [None]:
query = "Show total revenue by month for 2024"

print(f"\n💬 Query: \"{query}\"")
print("\n🔄 The agent will automatically:")
print("   - Identify the Orders/Sales table")
print("   - Extract month from date column")
print("   - Add GROUP BY clause")
print("   - Filter for year 2024")

try:
    result = agent.query(query, execute=False)
    print_result(result)

except Exception as e:
    print(f"❌ Error: {e}")

---

# DEMO 2: Query with BigQuery Execution

## Overview

This demo shows how to execute queries directly on BigQuery and retrieve results.

## What Happens:

1. ✅ All steps from Demo 1
2. ✅ **SQL Validation** - Dry-run on BigQuery
3. ✅ **Cost Estimation** - Estimates bytes to be processed
4. ✅ **Query Execution** - Runs on BigQuery
5. ✅ **Result Retrieval** - Fetches and formats results

**Note:** This requires BigQuery credentials to be properly configured.
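
For reference, a dry run can also be issued directly with the `google-cloud-bigquery` client, which reports the bytes a query would scan without running it. The sketch below is an assumption about how such a check could look, not a description of what the agent does internally; `dry_run_bytes` and `format_bytes` are hypothetical helpers, and calling `dry_run_bytes` requires the package and valid credentials.

```python
def dry_run_bytes(sql: str, project: str) -> int:
    """Ask BigQuery how many bytes `sql` would scan, without running it."""
    # Imported lazily so format_bytes below works without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed


def format_bytes(num_bytes: int) -> str:
    """Render a byte count with binary units, e.g. 1536 -> '1.50 KiB'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


print(format_bytes(1536))  # 1.50 KiB
```

With credentials in place, `format_bytes(dry_run_bytes(result["sql"], "your-gcp-project-id"))` would give a readable size estimate before you set `execute=True`.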

In [None]:
print_section("DEMO 2: Automated Query with Execution")

print("\n🎯 This example executes the query on BigQuery")

query = "Show me the top 10 customers by total orders"
print(f"\n💬 Query: \"{query}\"")
print("\n🔄 Processing...")
print("   1. Understanding query with LLM...")
print("   2. Identifying relevant tables...")
print("   3. Inferring joins automatically...")
print("   4. Generating SQL...")
print("   5. Validating SQL...")
print("   6. Executing on BigQuery...")

try:
    # Set execute=True to run on BigQuery
    result = agent.query(query, execute=True)

    print_result(result)

except Exception as e:
    print(f"\n⚠️  {e}")
    print("\n💡 This requires:")
    print("   - BigQuery credentials configured")
    print("   - GCP_PROJECT_ID and BIGQUERY_DATASET set")

---

# DEMO 3: Handling Ambiguity and Corrections

## Overview

Sometimes queries can be ambiguous (e.g., multiple ways to join tables). The agent automatically:

1. ✅ **Detects Ambiguity** - Identifies when multiple interpretations exist
2. ✅ **Pauses Execution** - Stops before generating incorrect SQL
3. ✅ **Saves Session** - Persists state for recovery
4. ✅ **Requests Clarification** - Presents options to user
5. ✅ **Applies Correction** - Resumes with user's clarification

## Correction Flow:

```
Query → Ambiguity Detected → Session Saved → User Clarifies → Resume
```
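
The flow above can be read as a small state machine. The sketch below shows one way to encode it; the state names and allowed transitions are hypothetical and chosen for illustration, since the agent's real session states are not shown in this notebook.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names; the agent's own names may differ.
    PROCESSING = "processing"
    AWAITING_CLARIFICATION = "awaiting_clarification"
    COMPLETED = "completed"
    FAILED = "failed"


# Which states may follow which.
ALLOWED = {
    State.PROCESSING: {State.AWAITING_CLARIFICATION, State.COMPLETED, State.FAILED},
    State.AWAITING_CLARIFICATION: {State.PROCESSING},
    State.COMPLETED: set(),
    State.FAILED: set(),
}


def transition(current: State, target: State) -> State:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target


# Walk through: query -> ambiguity detected -> user clarifies -> resume -> done.
state = State.PROCESSING
for step in (State.AWAITING_CLARIFICATION, State.PROCESSING, State.COMPLETED):
    state = transition(state, step)
print(state.value)  # completed
```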

In [None]:
print_section("DEMO 3: Automated Handling of Ambiguity")

print("\n🔀 When ambiguity is detected, the agent automatically:")
print("   1. Pauses execution")
print("   2. Saves session state")
print("   3. Asks for user clarification")
print("   4. Resumes with correction")

# Query that might have ambiguous joins
query = "Show me sales data with customer information"
print(f"\n💬 Query: \"{query}\"")

try:
    # First attempt
    result = agent.query(query, execute=False, return_session=True)

    if result["success"]:
        print_result(result)
    else:
        if result["error"] == "ambiguity":
            print(f"\n⚠️  Ambiguity detected!")
            print(f"Message: {result['message']}")
            print(f"\nOptions:")
            for i, option in enumerate(result.get("options", []), 1):
                print(f"  {i}. {option}")

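            # Save session ID for later (fall back to the nested "session" key if needed)
            session_id = result.get("session_id") or result.get("session", {}).get("session_id")
            print(f"\n💾 Session saved: {session_id}")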
        else:
            print_result(result)

except Exception as e:
    print(f"\n⚠️  {e}")

### Applying Correction

If ambiguity was detected above, let's apply a correction.
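
The branching used in this demo (success, ambiguity, or other failure) can be factored into a small helper if you handle results in several places. This is a sketch based only on the result keys that appear in this notebook (`success`, `error`, `options`); the helper names and the example dictionary are hypothetical.

```python
def next_action(result: dict) -> str:
    """Classify a result dict as 'done', 'clarify', or 'failed'."""
    if result.get("success"):
        return "done"
    if result.get("error") == "ambiguity":
        return "clarify"
    return "failed"


def format_options(result: dict) -> list:
    """Number the clarification options, one string per option."""
    return [f"{i}. {option}" for i, option in enumerate(result.get("options", []), 1)]


# Hand-written example result, not real agent output.
example = {
    "success": False,
    "error": "ambiguity",
    "message": "Multiple join paths found",
    "options": ["Join on customer_id", "Join on account_id"],
}
print(next_action(example))                 # clarify
print("\n".join(format_options(example)))
```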

In [None]:
# Check if we have a session_id from previous cell
if 'session_id' in locals() and session_id:
    print(f"\n👤 User provides correction: 'Use customer_id to join tables'")

    try:
        correction_result = agent.query_with_correction(
            session_id=session_id,
            correction="Use customer_id to join tables",
            execute=False
        )

        print(f"\n🔄 Reprocessing with correction...")
        print_result(correction_result)

    except Exception as e:
        print(f"❌ Error applying correction: {e}")
else:
    print("\n💡 No ambiguity was detected, so no correction needed.")
    print("   This cell is only relevant if the previous query had ambiguity.")

---

# DEMO 4: Session Persistence & Recovery

## Overview

Sessions are automatically saved and can be resumed later. This is useful for:

- **API Timeouts** - Resume once the API is available again
- **Review Before Execution** - Generate SQL, review, then execute
- **Incremental Corrections** - Apply multiple corrections iteratively
- **Audit Trail** - Track all attempts and decisions

## Session Contents:

Each session stores:
- Original query
- Identified tables and columns
- Inferred joins
- Generated SQL (all attempts)
- Corrections applied
- State transitions
- Timestamps
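
To illustrate what persisting such a session could look like, here is a minimal sketch that stores the fields listed above as JSON and reads them back. The `SessionRecord` layout, field names, and file naming are all assumptions made for this example; the agent's real storage format is not shown in this notebook.

```python
import json
import tempfile
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import List


@dataclass
class SessionRecord:
    """Hypothetical session layout mirroring the list above."""
    query: str
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    tables: List[str] = field(default_factory=list)
    joins: List[str] = field(default_factory=list)
    sql_attempts: List[str] = field(default_factory=list)
    corrections: List[str] = field(default_factory=list)
    states: List[str] = field(default_factory=list)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def save_session(record: SessionRecord, directory: Path) -> Path:
    """Write the record to <directory>/<session_id>.json and return the path."""
    path = directory / f"{record.session_id}.json"
    path.write_text(json.dumps(asdict(record), indent=2), encoding="utf-8")
    return path


def load_session(session_id: str, directory: Path) -> SessionRecord:
    """Rebuild a record from its JSON file."""
    path = directory / f"{session_id}.json"
    return SessionRecord(**json.loads(path.read_text(encoding="utf-8")))


with tempfile.TemporaryDirectory() as tmp:
    record = SessionRecord(query="Analyze customer purchase patterns")
    record.sql_attempts.append("SELECT 1")
    save_session(record, Path(tmp))
    restored = load_session(record.session_id, Path(tmp))
    print(restored == record)  # True
```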

In [None]:
print_section("DEMO 4: Session Persistence & Recovery")

print("\n💾 Sessions are automatically saved and can be resumed")
print("   Useful when:")
print("   - API calls timeout")
print("   - User wants to review before executing")
print("   - Need to apply corrections later")

query = "Analyze customer purchase patterns"
print(f"\n💬 Query: \"{query}\"")

try:
    # Generate SQL without executing
    result = agent.query(query, execute=False, return_session=True)

    if result["success"]:
        print(f"\n✅ SQL generated and session saved")
        print(f"Session ID: {result['session']['session_id']}")
        print(f"\n📝 Generated SQL:")
        print("-" * 70)
        print(result["sql"])
        print("-" * 70)

        # Save the session ID for the next cell
        saved_session_id = result['session']['session_id']

        print(f"\n💡 You can now:")
        print(f"   - Review the SQL")
        print(f"   - Execute it later")
        print(f"   - Apply corrections if needed")
    else:
        print_result(result)

except Exception as e:
    print(f"\n⚠️  {e}")

### Resume Session and Execute

Now let's resume the saved session and execute the query (or apply corrections).

In [None]:
if 'saved_session_id' in locals() and saved_session_id:
    print(f"\n🔄 Resuming session: {saved_session_id[:16]}...")
    print("\nOption 1: Execute without changes")
    print(f"  result = agent.query_with_correction(")
    print(f"      session_id='{saved_session_id[:16]}...',")
    print(f"      correction='',  # empty for no changes")
    print(f"      execute=True")
    print(f"  )")

    print("\nOption 2: Apply correction first")
    print(f"  result = agent.query_with_correction(")
    print(f"      session_id='{saved_session_id[:16]}...',")
    print(f"      correction='Include only active customers',")
    print(f"      execute=True")
    print(f"  )")

    # Example: resume without changes
    try:
        print("\n🚀 Resuming the saved query (execute=False, so nothing runs on BigQuery)...")
        execution_result = agent.query_with_correction(
            session_id=saved_session_id,
            correction="",  # No correction
            execute=False  # Set to True if you want to run on BigQuery
        )
        if execution_result.get("success"):
            print("\n✅ Session resumed successfully!")
        print_result(execution_result)
    except Exception as e:
        print(f"❌ Error resuming session: {e}")
else:
    print("\n💡 No session was saved in the previous cell.")

---

# DEMO 5: Comparison - Manual vs Automated

## Manual Workflow (Before Agent)

Using the building blocks manually requires ~15 lines of code:

```python
# Step 1: Load schema (manual)
schema = schema_loader.load_from_excel()

# Step 2: Identify tables (manual inspection)
understanding = query_understanding.analyze(query)
tables = understanding["tables"]  # You need to review this

# Step 3: Infer joins (manual selection)
selected_joins = []
if len(tables) >= 2:
    joins = join_inference.infer_joins(tables[0], tables[1])
    if joins:
        # With several candidates, a manual choice is needed!
        selected_joins = [joins[0]]

# Step 4: Generate SQL (manual)
sql = sql_generator.generate(query, tables, selected_joins)

# Step 5: Validate (manual)
validation = bigquery_client.validate_query(sql)

# Step 6: Execute (manual)
if validation["success"]:
    result = bigquery_client.execute_query(sql)
```

## Automated Workflow (With Agent)

The agent handles everything in 2 lines:

```python
agent = Text2SQLAgent()
result = agent.query("Your query here")
```

### Benefits:

- ✅ **About 87% less code** (2 lines instead of ~15)
- ✅ **Fewer manual decisions** - the agent asks only when a query is ambiguous
- ✅ **Automatic error handling**
- ✅ **Built-in recovery** from failures
- ✅ **Session persistence** for interrupted workflows
- ✅ **Detects ambiguity** automatically and requests clarification

In [None]:
print_section("DEMO 5: Automated vs Manual Approach")

print("\n🔄 BEFORE (Manual - from Jupyter notebook):")
print("─" * 70)
print("""
# Step 1: Load schema (manual)
schema = schema_loader.load_from_excel()

# Step 2: Identify tables (manual)
tables = ["Customers", "Orders"]  # You decide

# Step 3: Infer joins (manual)
joins = join_inference.infer_joins(tables[0], tables[1])

# Step 4: Generate SQL (manual)
sql = sql_generator.generate(...)

# Step 5: Execute (manual)
result = bigquery_client.execute_query(sql)
""")

print("\n✨ AFTER (Automated - with Agent):")
print("─" * 70)
print("""
# That's it - TWO lines!
agent = Text2SQLAgent()
result = agent.query("Show me top customers by orders")

# Everything happens automatically:
# ✓ Schema loading
# ✓ Table identification via LLM
# ✓ Join inference
# ✓ SQL generation via LLM
# ✓ Validation & execution
""")

print("\n💡 The agent handles:")
print("   ✓ Multi-table queries automatically")
print("   ✓ Complex joins without manual specification")
print("   ✓ Ambiguity detection and resolution")
print("   ✓ Error recovery with retries")
print("   ✓ Session persistence")
print("   ✓ Correction application")

---

# Summary

## Key Takeaways

1. **The agent handles the whole pipeline automatically** - No manual table identification or join inference
2. **Just provide natural language → get SQL + results** - Simple 2-line usage
3. **Built-in error handling and recovery** - Automatic retries with exponential backoff
4. **Session persistence for interrupted workflows** - Resume from any point
5. **Automatic ambiguity detection and correction** - Pauses and asks for clarification when needed
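
For readers unfamiliar with the retry pattern mentioned in point 3, here is a generic sketch of retrying with exponential backoff. It is not the agent's implementation: the attempt count, delays, and the choice to retry on any exception are assumptions picked for illustration, and `retry_with_backoff` is a hypothetical helper.

```python
import time


def retry_with_backoff(func, max_attempts=3, base_delay=1.0, factor=2.0, sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.

    The wait between attempts grows as base_delay, base_delay * factor, ...
    The last failure is re-raised once the attempts are used up.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= factor


# Demo: a call that fails twice, then succeeds. Delays are recorded, not slept.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


recorded_delays = []
print(retry_with_backoff(flaky_call, sleep=recorded_delays.append))  # ok
print(recorded_delays)  # [1.0, 2.0]
```

Injecting the `sleep` function keeps the example fast and makes the backoff schedule easy to inspect.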

## Quick Reference

### Basic Usage

```python
from src import Text2SQLAgent

agent = Text2SQLAgent()
result = agent.query("Your natural language query here")

print(result["sql"])      # Generated SQL
print(result["results"])  # Query results
```

### Query Without Execution

```python
result = agent.query("Show me sales data", execute=False)
# Review SQL before running
```

### With Session Return

```python
result = agent.query("Complex query", execute=False, return_session=True)
session_id = result["session"]["session_id"]
# Resume or apply corrections later
```

### Apply Corrections

```python
result = agent.query_with_correction(
    session_id="abc-123",
    correction="Use customer_id to join tables",
    execute=True
)
```

## Next Steps

- Read detailed documentation: `AGENT_GUIDE.md`
- Explore the API: Check `src/agent/orchestrator.py`
- Customize prompts: See `src/llm/prompts.py`
- Try with your own data: Update `.env` with your configuration