# Explainable NL Query Database Agents - Interactive Notebook



In [1]:

import sqlite3
import json
import pandas as pd
import os

In [None]:
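# LangChain imports for AI functionality.
# Recent LangChain versions ship these integrations as separate packages:
#   pip install langchain-openai langchain-community faiss-cpu
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
# LLMChain is deprecated; compose a prompt and a model with `prompt | llm` instead.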

In [2]:

# Never hard-code API keys in a notebook: they end up saved in the .ipynb file.
# Read the key from the environment, or prompt for it if it is not set.
from getpass import getpass

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")

In [3]:
from openai import OpenAI
client = OpenAI()


In [7]:
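response = client.responses.create(
    model="gpt-5-mini",
    input="how much gold would it take to coat the statue of liberty in a 1mm layer?",
    reasoning={"effort": "minimal"},
)

# `output_text` joins the text of every message item in the response. Indexing
# `response.output[1]` only works while a reasoning item happens to come first.
print(response.output_text)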

Good question; we can estimate it. Steps: get the statue’s surface area, convert a 1 mm thick layer to volume, then multiply by gold’s density to get mass (and optionally convert to value).

1) Surface area
- The commonly cited height of the Statue of Liberty (from base to torch) is about 93 m; the statue itself (from heel to top of head) is about 46 m. Published estimates for the statue’s external surface area vary; a reasonable value for the outer copper skin’s surface area is about 1,000–1,500 m². Many sources use about 1,200 m² as a practical estimate. I’ll use 1,200 m² for the calculation; you can scale proportionally if you prefer a different area.

2) Volume of a 1 mm layer
- Thickness t = 1 mm = 0.001 m.
- Volume V = area × thickness = 1,200 m² × 0.001 m = 1.2 m³.

3) Mass of gold
- Density of gold ρ ≈ 19,320 kg/m³ (sometimes rounded to 19,300 kg/m³).
- Mass m = ρ × V = 19,320 kg/m³ × 1.2 m³ ≈ 23,184 kg ≈ 23.2 metric tons.

4) Optional: appro
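
Steps 2 and 3 of the answer reduce to one formula, m = ρ · A · t. A minimal Python sketch of that arithmetic, so other area estimates can be plugged in; `gold_coating_mass_kg` is a hypothetical helper, and the 1,000 to 1,500 m² range and the 19,320 kg/m³ density are taken from the model's answer above rather than from an independent source.

```python
def gold_coating_mass_kg(
    area_m2: float,
    thickness_m: float = 0.001,       # 1 mm layer
    density_kg_m3: float = 19_320.0,  # gold density quoted in the answer above
) -> float:
    """Mass of a uniform layer: m = density * area * thickness."""
    volume_m3 = area_m2 * thickness_m
    return density_kg_m3 * volume_m3


# Scale across the range of surface-area estimates quoted in the answer.
for area in (1_000, 1_200, 1_500):
    print(f"{area:,} m^2 -> {gold_coating_mass_kg(area) / 1000:.1f} t")
```

Because the formula is linear in both area and thickness, doubling either one doubles the mass.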

## 5. Database Setup (SQLite)

Let's set up a sample database to demonstrate the NL query functionality.

In [None]:
# Create a sample database for testing
db_path = "sample_database.db"
conn = sqlite3.connect(db_path)
cursor = conn.cursor()

# Create a sample table
cursor.execute('''
CREATE TABLE IF NOT EXISTS employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department TEXT,
    salary REAL,
    hire_date DATE
)
''')

# Insert sample data
sample_data = [
    (1, 'Alice Johnson', 'Engineering', 75000, '2023-01-15'),
    (2, 'Bob Smith', 'Marketing', 65000, '2023-02-20'),
    (3, 'Charlie Brown', 'Engineering', 80000, '2022-11-10'),
    (4, 'Diana Prince', 'HR', 70000, '2023-03-05'),
    (5, 'Eve Wilson', 'Engineering', 85000, '2022-08-12')
]

cursor.executemany('INSERT OR REPLACE INTO employees VALUES (?, ?, ?, ?, ?)', sample_data)
conn.commit()

print("✅ Sample database created with employee data")
print(f"Database location: {db_path}")

In [None]:
# Verify the data was inserted
df = pd.read_sql_query("SELECT * FROM employees", conn)
print("📊 Sample Data:")
print(df)
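
To connect this table to the notebook's NL-query goal, here is a hand-written example of the kind of SQL a natural-language question might be translated into. This is a sketch, not output from an agent: the question-to-SQL mapping is written by hand, and the cell builds its own in-memory copy of the same five rows (as `demo_conn`) so it does not touch `sample_database.db`. Values are bound through `?` placeholders and the `params` argument of `pd.read_sql_query` rather than formatted into the SQL string.

```python
import sqlite3

import pandas as pd

# Separate in-memory copy of the sample rows so this cell is self-contained.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "department TEXT, salary REAL, hire_date DATE)"
)
demo_conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Alice Johnson", "Engineering", 75000, "2023-01-15"),
        (2, "Bob Smith", "Marketing", 65000, "2023-02-20"),
        (3, "Charlie Brown", "Engineering", 80000, "2022-11-10"),
        (4, "Diana Prince", "HR", 70000, "2023-03-05"),
        (5, "Eve Wilson", "Engineering", 85000, "2022-08-12"),
    ],
)

# Hypothetical question: "What is the average salary in each department?"
summary = pd.read_sql_query(
    """
    SELECT department, AVG(salary) AS avg_salary, COUNT(*) AS headcount
    FROM employees
    GROUP BY department
    ORDER BY avg_salary DESC
    """,
    demo_conn,
)
print(summary)

# Hypothetical question: "Who in Engineering was hired after 2022?"
# ISO-8601 date strings compare correctly as text.
recent = pd.read_sql_query(
    "SELECT name, hire_date FROM employees WHERE department = ? AND hire_date > ?",
    demo_conn,
    params=("Engineering", "2022-12-31"),
)
print(recent)

demo_conn.close()
```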

## 6. Vector Database Setup (FAISS)

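Now let's index short text descriptions of the schema in FAISS, so that a natural-language question can be matched to the relevant table and columns by semantic similarity.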

In [None]:
# Initialize OpenAI embeddings
embeddings = OpenAIEmbeddings()

# Create sample text documents about our database schema
schema_docs = [
    "employees table contains employee information",
    "employees table has columns: id, name, department, salary, hire_date",
    "department column contains values like Engineering, Marketing, HR",
    "salary column contains numeric salary values",
    "hire_date column contains employment start dates"
]

# Create FAISS vector store
try:
    vectorstore = FAISS.from_texts(schema_docs, embeddings)
    print("✅ FAISS vector store created successfully")
    print(f"Number of documents indexed: {len(schema_docs)}")
except Exception as e:
    print(f"❌ Error creating vector store: {e}")
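
The `schema_docs` list above is written by hand, so it can drift from the real table. A minimal sketch of generating similar descriptions directly from SQLite's catalog: `sqlite_master` lists the tables, and `PRAGMA table_info` returns one row per column as `(cid, name, type, notnull, dflt_value, pk)`. The helper name `describe_schema` and the choice to sample up to three distinct values from each TEXT column are assumptions made for illustration.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection, max_values: int = 3) -> list[str]:
    """Build short natural-language descriptions of every user table (hypothetical helper)."""
    docs = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = [(row[1], row[2]) for row in conn.execute(f'PRAGMA table_info("{table}")')]
        col_list = ", ".join(f"{name} ({ctype or 'untyped'})" for name, ctype in columns)
        docs.append(f"{table} table has columns: {col_list}")
        for name, ctype in columns:
            if ctype.upper() != "TEXT":
                continue
            # Identifiers come from SQLite's own catalog, not user input;
            # names that contain double quotes are not handled here.
            values = [
                value
                for (value,) in conn.execute(
                    f'SELECT DISTINCT "{name}" FROM "{table}" '
                    f'WHERE "{name}" IS NOT NULL ORDER BY 1 LIMIT ?',
                    (max_values,),
                )
            ]
            if values:
                docs.append(f"{name} column contains values like {', '.join(map(str, values))}")
    return docs
```

In this notebook, `describe_schema(conn)` could be called before `FAISS.from_texts` in place of the hand-written list, so the index always reflects the current schema.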

## 7. Interactive Testing

This is where notebooks shine! You can now test different queries and see results immediately.

In [None]:
# Test semantic search
query = "show me information about employee salaries"
try:
    results = vectorstore.similarity_search(query, k=2)
    print(f"🔍 Query: '{query}'")
    print("📝 Most relevant schema information:")
    for i, doc in enumerate(results, 1):
        print(f"{i}. {doc.page_content}")
except Exception as e:
    print(f"‚ùå Error in similarity search: {e}")
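
`similarity_search` embeds the query and returns the `k` stored texts whose vectors lie closest to it. A minimal, dependency-free sketch of that idea: it swaps the OpenAI embeddings for a toy bag-of-words embedding (a hypothetical stand-in that only matches exact words, so "salaries" would not match "salary") and performs an exact, brute-force Euclidean nearest-neighbour search, the same kind of search a flat FAISS index such as `IndexFlatL2` performs. It is not how LangChain or FAISS are implemented internally.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; underscores are kept so "hire_date" stays one token.
    return re.findall(r"[a-z_]+", text.lower())


def embed(text: str, vocab: list[str]) -> list[float]:
    # Toy stand-in for a real embedding model: a unit-length word-count vector.
    counts = Counter(tokenize(text))
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def l2_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def toy_similarity_search(query: str, docs: list[str], k: int = 2) -> list[str]:
    vocab = sorted({word for doc in docs for word in tokenize(doc)})
    doc_vectors = [embed(doc, vocab) for doc in docs]
    query_vector = embed(query, vocab)
    # Exact search: compare the query with every stored vector and keep the k closest.
    ranked = sorted(range(len(docs)), key=lambda i: l2_distance(query_vector, doc_vectors[i]))
    return [docs[i] for i in ranked[:k]]


toy_docs = [
    "employees table contains employee information",
    "employees table has columns: id, name, department, salary, hire_date",
    "department column contains values like Engineering, Marketing, HR",
    "salary column contains numeric salary values",
    "hire_date column contains employment start dates",
]

for i, doc in enumerate(toy_similarity_search("salary values", toy_docs, k=2), 1):
    print(f"{i}. {doc}")
```

With the query `"salary values"`, the salary description ranks first because it shares both words; a real embedding model is what lets the notebook's query about "salaries" match it as well.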

## 8. Cleanup

Don't forget to close database connections!

In [None]:
# Close database connection
conn.close()
print("✅ Database connection closed")
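
Closing by hand is easy to skip when a cell raises an exception partway through. A minimal standard-library sketch: `contextlib.closing` guarantees `close()` is called when the block exits, while `with conn:` on a `sqlite3` connection only commits (or rolls back on an exception) and does not close it. The in-memory database and the table `t` are placeholders.

```python
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as demo_conn:
    # Transaction scope: commits on success, rolls back on an exception.
    with demo_conn:
        demo_conn.execute("CREATE TABLE t (x INTEGER)")
        demo_conn.execute("INSERT INTO t VALUES (1)")
    row_count = demo_conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# closing() has now called demo_conn.close(); further use raises sqlite3.ProgrammingError.
print(row_count)
```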

## Summary: Script vs Notebook Execution

### Python Script (`.py` file):
- ✅ Runs all code at once
- ✅ Good for production/final code
- ❌ Harder to debug interactively
- ❌ Intermediate results are visible only through prints, logging, or a debugger
- ❌ Must re-run everything to test changes

### Jupyter Notebook (`.ipynb` file):
- ✅ Run cell by cell
- ✅ See intermediate results
- ✅ Easy debugging and experimentation
- ✅ Variables persist between cells
- ✅ Can add documentation with Markdown
- ✅ Well suited to data science and AI prototyping

**Recommendation**: Use notebooks for development and experimentation, then convert to scripts for production deployment!
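
For the notebook-to-script step, Jupyter's own `jupyter nbconvert --to script` is the usual tool. To show what that conversion involves, here is a minimal standard-library sketch that relies only on the `.ipynb` file being JSON with a `cells` list whose entries carry `cell_type` and `source`. The function names are hypothetical, and unlike nbconvert it simply comments out Markdown and IPython magic or shell lines (`%...`, `!...`) instead of translating them.

```python
import json
from pathlib import Path


def notebook_json_to_script(nb: dict) -> str:
    """Turn a parsed .ipynb document into Python source (hypothetical helper)."""
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "code":
            # Magics and shell escapes are not valid Python, so comment them out.
            lines = [
                f"# {line}" if line.lstrip().startswith(("%", "!")) else line
                for line in text.splitlines()
            ]
            chunks.append("\n".join(lines))
        elif cell.get("cell_type") == "markdown":
            chunks.append("\n".join(f"# {line}".rstrip() for line in text.splitlines()))
    return "\n\n".join(chunks) + "\n"


def convert_notebook(ipynb_path: str, py_path: str) -> None:
    """Read a notebook file and write the converted script next to it."""
    nb = json.loads(Path(ipynb_path).read_text(encoding="utf-8"))
    Path(py_path).write_text(notebook_json_to_script(nb), encoding="utf-8")
```

Usage would look like `convert_notebook("notebook.ipynb", "notebook.py")`, with both file names as placeholders.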