# 01: Basic Text-to-SQL with LlamaIndex

Welcome to the first notebook in the Text-to-SQL series! In this notebook, you'll learn the fundamentals of converting natural language questions into SQL queries using LlamaIndex.

## Learning Objectives

By the end of this notebook, you will be able to:
- Understand text-to-SQL fundamentals
- Use `NLSQLTableQueryEngine` for structured queries
- Work with SQLite databases
- Inspect and understand generated SQL
- Handle basic error cases
- Apply security best practices

## What is Text-to-SQL?

**Text-to-SQL** is the process of converting natural language questions into SQL queries that can be executed against a database. This enables non-technical users to query databases without knowing SQL.

**Example:**
- Natural Language: "Who are the top 3 highest paid employees?"
- SQL Query: `SELECT name, salary FROM employees ORDER BY salary DESC LIMIT 3`
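
To make the example concrete, the sketch below runs that SQL against a throwaway in-memory SQLite table using Python's built-in `sqlite3` module. The table and its four shortened rows are assumptions for illustration only; the notebook's real database is created in Section 2.

```python
import sqlite3

# Illustrative in-memory table; these rows are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [("Alice", 95000), ("Bob", 87000), ("Grace", 105000), ("Jack", 92000)],
)

# The SQL a text-to-SQL system might produce for the question above.
sql = "SELECT name, salary FROM employees ORDER BY salary DESC LIMIT 3"
top_three = conn.execute(sql).fetchall()
print(top_three)
```

The natural-language question never mentions `ORDER BY` or `LIMIT`; mapping "top 3 highest paid" onto those clauses is the part the LLM contributes.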

---

**Security Warning:** Executing arbitrary, LLM-generated SQL is a security risk. Run any Text-to-SQL application against a read-only database, with a restricted database role, or inside a sandbox.

## Section 1: Setup and Installation

### 1.1 Import Required Libraries

In [1]:
# Standard library imports
import os
from pathlib import Path

# Third-party imports
from dotenv import load_dotenv
from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer, Float, insert
import pandas as pd

# LlamaIndex imports
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine
from llama_index.llms.openai import OpenAI

print("✓ All libraries imported successfully")

✓ All libraries imported successfully


### 1.2 Load Environment Variables

In [2]:
# Load environment variables from .env file
load_dotenv()

# Verify API key is loaded
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise ValueError(
        "OPENAI_API_KEY not found in environment variables. "
        "Please create a .env file with your API key. "
        "See .env.example for template."
    )

print("✓ Environment loaded successfully")
print(f"✓ API key found (length: {len(OPENAI_API_KEY)} characters)")

✓ Environment loaded successfully
✓ API key found (length: 164 characters)


### 1.3 Initialize OpenAI LLM

In [3]:
# Initialize OpenAI LLM
# Using gpt-4o-mini for cost-effectiveness
# temperature=0.1 for more deterministic SQL generation
llm = OpenAI(
    temperature=0.1, 
    model="gpt-4o-mini",
    api_key=OPENAI_API_KEY
)

print("✓ OpenAI LLM initialized")
print(f"  Model: {llm.model}")
print(f"  Temperature: {llm.temperature}")

✓ OpenAI LLM initialized
  Model: gpt-4o-mini
  Temperature: 0.1


## Section 2: Create Sample Database

We'll create a simple company database with two tables:
- `employees`: Information about employees
- `departments`: Information about departments

### 2.1 Create SQLite Database and Schema

In [4]:
# Create SQLite database
DB_PATH = "basic_company.db"
engine = create_engine(f"sqlite:///{DB_PATH}")
metadata_obj = MetaData()

# Define employees table
employees = Table(
    'employees',
    metadata_obj,
    Column('id', Integer, primary_key=True),
    Column('name', String(50), nullable=False),
    Column('department_id', Integer, nullable=False),
    Column('salary', Float, nullable=False),
    Column('hire_date', String(10), nullable=False),
    Column('position', String(50), nullable=False)
)

# Define departments table
departments = Table(
    'departments',
    metadata_obj,
    Column('id', Integer, primary_key=True),
    Column('name', String(50), nullable=False),
    Column('budget', Float, nullable=False),
    Column('location', String(50), nullable=False)
)

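# Drop tables left over from a previous run, then create them fresh,
# so re-running the notebook does not fail on duplicate primary keys
metadata_obj.drop_all(engine)
metadata_obj.create_all(engine)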

print(f"✓ Database created: {DB_PATH}")
print("✓ Tables created: employees, departments")

✓ Database created: basic_company.db
✓ Tables created: employees, departments


### 2.2 Insert Sample Data

In [5]:
# Sample data for departments
department_data = [
    {'id': 1, 'name': 'Engineering', 'budget': 500000, 'location': 'San Francisco'},
    {'id': 2, 'name': 'Sales', 'budget': 300000, 'location': 'New York'},
    {'id': 3, 'name': 'Marketing', 'budget': 200000, 'location': 'Los Angeles'},
    {'id': 4, 'name': 'Human Resources', 'budget': 150000, 'location': 'Chicago'}
]

# Sample data for employees
employee_data = [
    {'id': 1, 'name': 'Alice Johnson', 'department_id': 1, 'salary': 95000, 'hire_date': '2020-01-15', 'position': 'Senior Engineer'},
    {'id': 2, 'name': 'Bob Smith', 'department_id': 1, 'salary': 87000, 'hire_date': '2021-03-22', 'position': 'Software Engineer'},
    {'id': 3, 'name': 'Carol White', 'department_id': 2, 'salary': 75000, 'hire_date': '2019-07-10', 'position': 'Sales Manager'},
    {'id': 4, 'name': 'David Brown', 'department_id': 2, 'salary': 82000, 'hire_date': '2020-11-05', 'position': 'Senior Sales Rep'},
    {'id': 5, 'name': 'Eve Davis', 'department_id': 3, 'salary': 68000, 'hire_date': '2022-02-14', 'position': 'Marketing Specialist'},
    {'id': 6, 'name': 'Frank Miller', 'department_id': 3, 'salary': 78000, 'hire_date': '2021-05-20', 'position': 'Marketing Manager'},
    {'id': 7, 'name': 'Grace Lee', 'department_id': 1, 'salary': 105000, 'hire_date': '2018-09-01', 'position': 'Lead Engineer'},
    {'id': 8, 'name': 'Henry Wilson', 'department_id': 4, 'salary': 72000, 'hire_date': '2021-08-15', 'position': 'HR Manager'},
    {'id': 9, 'name': 'Iris Chen', 'department_id': 2, 'salary': 65000, 'hire_date': '2022-06-01', 'position': 'Sales Representative'},
    {'id': 10, 'name': 'Jack Thompson', 'department_id': 1, 'salary': 92000, 'hire_date': '2020-10-12', 'position': 'Senior Engineer'}
]

# Insert data
with engine.begin() as conn:
    # Insert departments
    for dept in department_data:
        stmt = insert(departments).values(**dept)
        conn.execute(stmt)
    
    # Insert employees
    for emp in employee_data:
        stmt = insert(employees).values(**emp)
        conn.execute(stmt)

print(f"✓ Inserted {len(department_data)} departments")
print(f"✓ Inserted {len(employee_data)} employees")

✓ Inserted 4 departments
✓ Inserted 10 employees


### 2.3 Verify Data

In [6]:
# Query to verify data
with engine.connect() as conn:
    print("Departments:")
    depts_df = pd.read_sql("SELECT * FROM departments", conn)
    print(depts_df)
    
    print("\nEmployees (first 5):")
    emps_df = pd.read_sql("SELECT * FROM employees LIMIT 5", conn)
    print(emps_df)

Departments:
   id             name    budget       location
0   1      Engineering  500000.0  San Francisco
1   2            Sales  300000.0       New York
2   3        Marketing  200000.0    Los Angeles
3   4  Human Resources  150000.0        Chicago

Employees (first 5):
   id           name  department_id   salary   hire_date              position
0   1  Alice Johnson              1  95000.0  2020-01-15       Senior Engineer
1   2      Bob Smith              1  87000.0  2021-03-22     Software Engineer
2   3    Carol White              2  75000.0  2019-07-10         Sales Manager
3   4    David Brown              2  82000.0  2020-11-05      Senior Sales Rep
4   5      Eve Davis              3  68000.0  2022-02-14  Marketing Specialist


## Section 3: Basic NLSQLTableQueryEngine

Now we'll use LlamaIndex's `NLSQLTableQueryEngine` to query our database using natural language.

### 3.1 Create SQL Database Object

In [7]:
# Create SQLDatabase object
# This wraps the SQLAlchemy engine and provides LlamaIndex integration
sql_database = SQLDatabase(
    engine, 
    include_tables=["employees", "departments"]
)

print("✓ SQL Database object created")
print(f"  Tables: {sql_database.get_usable_table_names()}")

# Inspect table schema
print("\nEmployees table schema:")
print(sql_database.get_single_table_info("employees"))

print("\nDepartments table schema:")
print(sql_database.get_single_table_info("departments"))

✓ SQL Database object created
  Tables: ['departments', 'employees']

Employees table schema:
Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .

Departments table schema:
Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .


### 3.2 Initialize NLSQLTableQueryEngine

In [8]:
# Create query engine
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["employees", "departments"],
    llm=llm,
    verbose=True  # Set to True to see generated SQL
)

print("✓ NLSQLTableQueryEngine initialized")
print("  Ready to process natural language queries!")

✓ NLSQLTableQueryEngine initialized
  Ready to process natural language queries!


### 3.3 Run Simple Queries

Let's test our query engine with various natural language questions.

In [9]:
# Query 1: Basic count
query = "How many employees are there in total?"
print(f"Query: {query}")
print("=" * 60)

response = query_engine.query(query)
print(f"\nAnswer: {response}")
print("\n" + "=" * 60)

Query: How many employees are there in total?
> Table Info: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .
> Table Info: Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Table desc str: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .

Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Predicted SQL query: SELECT COUNT(*) AS total_employees FROM employees;

Answer: There are a total of 10 employees.



In [11]:
# Query 2: Aggregation by group
query = "How many employees are in each department?"
print(f"Query: {query}")
print("=" * 60)

response = query_engine.query(query)
print(f"\nAnswer: {response}")
print("\n" + "=" * 60)

Query: How many employees are in each department?
> Table Info: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .
> Table Info: Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Table desc str: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .

Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Predicted SQL query: SELECT departments.name, COUNT(employees.id) AS employee_count FROM departments LEFT JOIN employees ON departments.id = employees.department_id GROUP BY departments.name ORDER BY employee_count DESC;

Answer: The number of employees in each department is as follows:

- Engineering: 4 employees
- Sales: 3 employees
- Marketing: 2 employees
- Human

In [12]:
# Query 3: Average calculation
query = "What is the average salary by department?"
print(f"Query: {query}")
print("=" * 60)

response = query_engine.query(query)
print(f"\nAnswer: {response}")
print("\n" + "=" * 60)

Query: What is the average salary by department?
> Table Info: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .
> Table Info: Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Table desc str: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .

Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Predicted SQL query: SELECT d.name AS department_name, AVG(e.salary) AS average_salary FROM employees e JOIN departments d ON e.department_id = d.id GROUP BY d.name ORDER BY average_salary DESC;

Answer: The average salary by department is as follows:

- Engineering: $94,750
- Sales: $74,000
- Marketing: $73,000
- Human Resources: $72,000
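
Because the final answer is written by the LLM, it is worth checking it against the database now and then. The sketch below is one way to do that: it rebuilds the Section 2.2 salary data in an in-memory SQLite database (the in-memory copy and the hand-typed `llm_answer` dictionary are assumptions of this sketch), runs the predicted SQL shown above directly, and compares the numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, department_id INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "Engineering"), (2, "Sales"), (3, "Marketing"), (4, "Human Resources")],
)
# (department_id, salary) pairs copied from the sample data in Section 2.2.
conn.executemany(
    "INSERT INTO employees (department_id, salary) VALUES (?, ?)",
    [(1, 95000), (1, 87000), (2, 75000), (2, 82000), (3, 68000),
     (3, 78000), (1, 105000), (4, 72000), (2, 65000), (1, 92000)],
)

# The SQL predicted by the query engine, run without the LLM in the loop.
predicted_sql = """
SELECT d.name AS department_name, AVG(e.salary) AS average_salary
FROM employees e JOIN departments d ON e.department_id = d.id
GROUP BY d.name ORDER BY average_salary DESC
"""
averages = dict(conn.execute(predicted_sql).fetchall())

# Figures typed in by hand from the natural-language answer above.
llm_answer = {"Engineering": 94750, "Sales": 74000,
              "Marketing": 73000, "Human Resources": 72000}

mismatches = {
    name: (averages.get(name), claimed)
    for name, claimed in llm_answer.items()
    if averages.get(name) != claimed
}
print(mismatches or "LLM answer matches the query result")
```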



In [13]:
# Query 4: Top N with ordering
query = "Who are the top 3 highest paid employees?"
print(f"Query: {query}")
print("=" * 60)

response = query_engine.query(query)
print(f"\nAnswer: {response}")
print("\n" + "=" * 60)

Query: Who are the top 3 highest paid employees?
> Table Info: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .
> Table Info: Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Table desc str: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .

Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Predicted SQL query: SELECT employees.name, employees.salary FROM employees ORDER BY employees.salary DESC LIMIT 3

Answer: The top three highest paid employees are Grace Lee with a salary of $105,000, Alice Johnson earning $95,000, and Jack Thompson with a salary of $92,000.



In [14]:
# Query 5: Single-table ordering with LIMIT (no JOIN needed)
query = "Which department has the highest budget?"
print(f"Query: {query}")
print("=" * 60)

response = query_engine.query(query)
print(f"\nAnswer: {response}")
print("\n" + "=" * 60)

Query: Which department has the highest budget?
> Table Info: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .
> Table Info: Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Table desc str: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .

Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Predicted SQL query: SELECT departments.name, departments.budget FROM departments ORDER BY departments.budget DESC LIMIT 1

Answer: The department with the highest budget is Engineering, with a budget of $500,000.



In [15]:
# Query 6: Complex query with JOIN
query = "Show me employees in the Engineering department with their salaries"
print(f"Query: {query}")
print("=" * 60)

response = query_engine.query(query)
print(f"\nAnswer: {response}")
print("\n" + "=" * 60)

Query: Show me employees in the Engineering department with their salaries
> Table Info: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .
> Table Info: Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Table desc str: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .

Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Predicted SQL query: SELECT employees.name, employees.salary FROM employees JOIN departments ON employees.department_id = departments.id WHERE departments.name = 'Engineering'

Answer: Here are the employees in the Engineering department along with their salaries:

- Alice Johnson: $95,000
- Bob Smith: $87,000
- Grace Lee: $105,000
- Jack Tho

## Section 4: Understanding the Process

### 4.1 Inspect Generated SQL

Let's examine the SQL that was generated for our queries.

In [16]:
# Example of extracting metadata
query = "What is the total salary cost for the Engineering department?"
print(f"Query: {query}")
print("=" * 60)

response = query_engine.query(query)
print(f"\nAnswer: {response}")

# Access metadata if available
if hasattr(response, 'metadata'):
    print("\nMetadata:")
    for key, value in response.metadata.items():
        print(f"  {key}: {value}")

Query: What is the total salary cost for the Engineering department?
> Table Info: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .
> Table Info: Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Table desc str: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .

Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Predicted SQL query: SELECT SUM(employees.salary) AS total_salary FROM employees JOIN departments ON employees.department_id = departments.id WHERE departments.name = 'Engineering';

Answer: The total salary cost for the Engineering department is $379,000.

Metadata:
  e1a7863a-1b12-4d4b-b5d2-202597d3beff: {'sql_query': "SELECT SUM(employees.salary
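
To pull just the SQL string out of the metadata, a small helper can search for the `sql_query` key. The sketch below is an assumption-laden illustration: `find_sql_query` is a hypothetical helper, not a LlamaIndex API, and `example_metadata` is a hand-written stand-in shaped like the printed output above (an ID mapping to a dictionary). The exact metadata layout may differ between LlamaIndex versions, so the helper checks both the top level and one level down.

```python
from typing import Any, Optional


def find_sql_query(metadata: dict[str, Any]) -> Optional[str]:
    """Return the first 'sql_query' string found at the top level or one level down."""
    top_level = metadata.get("sql_query")
    if isinstance(top_level, str):
        return top_level
    for value in metadata.values():
        if isinstance(value, dict) and isinstance(value.get("sql_query"), str):
            return value["sql_query"]
    return None


# Hypothetical metadata, shaped like the output printed above.
example_metadata = {
    "node-id-placeholder": {"sql_query": "SELECT COUNT(*) FROM employees;"},
}
extracted_sql = find_sql_query(example_metadata)
print(extracted_sql)
```

In the notebook this would be called as `find_sql_query(response.metadata)`, after which the returned string can be logged or validated.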

### 4.2 Error Handling

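Let's see what happens with ambiguous or problematic queries. Note that neither example below raises an exception: the LLM picks a plausible interpretation and returns an answer, so the `except` branch never runs. In Example 2 the model quietly substitutes `hire_date` for the nonexistent `age` column, which is why inspecting the generated SQL matters.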

In [17]:
# Example 1: Ambiguous query
try:
    query = "Show me everyone"
    print(f"Query: {query}")
    response = query_engine.query(query)
    print(f"Answer: {response}")
except Exception as e:
    print(f"Error: {e}")

Query: Show me everyone
> Table Info: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .
> Table Info: Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Table desc str: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .

Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Predicted SQL query: SELECT employees.name, employees.position FROM employees ORDER BY employees.name;
Answer: Here is the list of all employees along with their positions:

1. Alice Johnson - Senior Engineer
2. Bob Smith - Software Engineer
3. Carol White - Sales Manager
4. David Brown - Senior Sales Rep
5. Eve Davis - Marketing Specialist
6. Frank Miller - Marketing Manager
7. Grace Lee - L

In [18]:
# Example 2: Non-existent column
try:
    query = "What is the age of each employee?"
    print(f"Query: {query}")
    response = query_engine.query(query)
    print(f"Answer: {response}")
except Exception as e:
    print(f"Error: {e}")

Query: What is the age of each employee?
> Table Info: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .
> Table Info: Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Table desc str: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .

Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Predicted SQL query: SELECT employees.name, employees.hire_date FROM employees
Answer: To determine the age of each employee, we can calculate the difference between the current date and their hire date. Here are the employees along with their hire dates:

1. Alice Johnson - Hired on January 15, 2020
2. Bob Smith - Hired on March 22, 2021
3. Carol White - Hired on July 10, 20
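
For contrast, here is what a genuine database-level failure looks like. The sketch below uses the standard-library `sqlite3` module and a two-column stand-in table (an assumption for illustration) to request the `age` column directly. How LlamaIndex would surface such an error depends on its version and settings, so this only shows the behaviour of SQLite itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, hire_date TEXT)")

# Columns the table really has, read from SQLite's schema pragma.
# Each row is (cid, name, type, notnull, dflt_value, pk); index 1 is the name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employees)")]

try:
    conn.execute("SELECT name, age FROM employees")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(f"Known columns: {columns}")
print(f"Database error: {error_message}")
```

Comparing the column list against what a question asks for is a cheap way to notice when the model has answered a different question than the one posed.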

### 4.3 Query Refinement

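For better results, make your queries more specific: questions that name the columns and filters you want tend to produce more predictable SQL than open-ended ones.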
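
Date phrases are a common source of ambiguity. The sketch below copies the names and hire dates from Section 2.2 into an in-memory SQLite table (the in-memory copy is an assumption of this sketch) and shows two defensible readings of "hired after 2020". Because `hire_date` is stored as an ISO `YYYY-MM-DD` string, plain string comparison orders the dates correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, hire_date TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Alice Johnson", "2020-01-15"), ("Bob Smith", "2021-03-22"),
        ("Carol White", "2019-07-10"), ("David Brown", "2020-11-05"),
        ("Eve Davis", "2022-02-14"), ("Frank Miller", "2021-05-20"),
        ("Grace Lee", "2018-09-01"), ("Henry Wilson", "2021-08-15"),
        ("Iris Chen", "2022-06-01"), ("Jack Thompson", "2020-10-12"),
    ],
)

# Reading 1: hired after the year 2020 has ended.
after_year_end = conn.execute(
    "SELECT name FROM employees WHERE hire_date > '2020-12-31'"
).fetchall()

# Reading 2: hired from the start of 2020 onward.
from_year_start = conn.execute(
    "SELECT name FROM employees WHERE hire_date >= '2020-01-01'"
).fetchall()

print(len(after_year_end), len(from_year_start))
```

The two readings return different sets of employees, so a question such as "hired on or after January 1, 2021" leaves the model less room to guess.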

In [19]:
# Less specific query
print("Less specific query:")
query1 = "Show me some employees"
response1 = query_engine.query(query1)
print(f"Q: {query1}")
print(f"A: {response1}")

print("\n" + "=" * 60 + "\n")

# More specific query
print("More specific query:")
query2 = "List all employees hired after 2020 with their names, positions, and salaries"
response2 = query_engine.query(query2)
print(f"Q: {query2}")
print(f"A: {response2}")

Less specific query:
> Table Info: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .
> Table Info: Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Table desc str: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .

Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Predicted SQL query: SELECT employees.name, employees.position FROM employees LIMIT 5;
Q: Show me some employees
A: Here are some employees from our records:

1. **Alice Johnson** - Senior Engineer
2. **Bob Smith** - Software Engineer
3. **Carol White** - Sales Manager
4. **David Brown** - Senior Sales Rep
5. **Eve Davis** - Marketing Specialist


More specific query:
> Table Info: Table 'employ

## Section 5: Best Practices

### 5.1 Table Specification

Always specify which tables to use to avoid sending unnecessary schema information to the LLM.

In [20]:
# Good: Specify tables explicitly
sql_database_specific = SQLDatabase(
    engine, 
    include_tables=["employees", "departments"]
)

# This focuses the LLM on relevant tables only
print("Tables in context:")
print(sql_database_specific.get_usable_table_names())

Tables in context:
['departments', 'employees']


### 5.2 Security Considerations

Important security practices:

1. **Use Read-Only Connections**: In production, use database users with SELECT-only privileges
2. **Validate Queries**: Check generated SQL before execution
3. **Set Timeouts**: Prevent long-running queries
4. **Sanitize Inputs**: Treat user questions as untrusted input, since they can steer the LLM toward harmful SQL
5. **Monitor Usage**: Log all queries for audit purposes
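
For item 3, one possible approach with SQLite is a progress handler that interrupts a statement once a deadline passes. The sketch below uses only the standard-library `sqlite3` module; `run_with_timeout` is a hypothetical helper written for this example, and the runaway query is deliberately endless. Other databases usually offer a server-side statement timeout instead.

```python
import sqlite3
import time


def run_with_timeout(conn, sql, timeout_s):
    """Run `sql`, aborting it if it is still executing after `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s

    def past_deadline():
        # A non-zero return value tells SQLite to interrupt the running statement.
        return 1 if time.monotonic() > deadline else 0

    # SQLite invokes the handler roughly every 1000 virtual-machine instructions.
    conn.set_progress_handler(past_deadline, 1000)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler again


conn = sqlite3.connect(":memory:")

# A deliberately endless query: a recursive CTE with no stopping condition.
runaway_sql = """
WITH RECURSIVE counter(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM counter)
SELECT COUNT(*) FROM counter
"""

try:
    run_with_timeout(conn, runaway_sql, timeout_s=0.2)
    timed_out = False
except sqlite3.OperationalError as exc:
    timed_out = True
    print(f"Query aborted: {exc}")
```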

In [21]:
# Example: Simple SQL validation
import re

DANGEROUS_KEYWORDS = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'ALTER', 'TRUNCATE', 'EXEC', 'EXECUTE']
# \b matches whole words only, so a column such as `updated_at` is not flagged
DANGEROUS_PATTERN = re.compile(r'\b(' + '|'.join(DANGEROUS_KEYWORDS) + r')\b', re.IGNORECASE)

def is_query_safe(sql: str) -> bool:
    """
    Basic SQL safety check based on a keyword denylist.
    In production, use more sophisticated validation.
    """
    return DANGEROUS_PATTERN.search(sql) is None

# Test
safe_sql = "SELECT * FROM employees WHERE salary > 80000"
unsafe_sql = "DROP TABLE employees"

print(f"Is '{safe_sql}' safe? {is_query_safe(safe_sql)}")
print(f"Is '{unsafe_sql}' safe? {is_query_safe(unsafe_sql)}")

Is 'SELECT * FROM employees WHERE salary > 80000' safe? True
Is 'DROP TABLE employees' safe? False


### 5.3 Context Management

Provide helpful context to improve query accuracy.

In [22]:
# You can add custom context/instructions
# This is helpful for domain-specific terminology
query = """
In our company, 'IC' means Individual Contributor (non-management positions).
Show me all ICs with salaries above 90000.
"""

print(f"Query: {query.strip()}")
print("=" * 60)

response = query_engine.query(query)
print(f"\nAnswer: {response}")

Query: In our company, 'IC' means Individual Contributor (non-management positions).
Show me all ICs with salaries above 90000.
> Table Info: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .
> Table Info: Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Table desc str: Table 'employees' has columns: id (INTEGER), name (VARCHAR(50)), department_id (INTEGER), salary (FLOAT), hire_date (VARCHAR(10)), position (VARCHAR(50)), .

Table 'departments' has columns: id (INTEGER), name (VARCHAR(50)), budget (FLOAT), location (VARCHAR(50)), .
> Predicted SQL query: SELECT employees.name, employees.salary FROM employees WHERE employees.salary > 90000 AND employees.position NOT LIKE '%Manager%';

Answer: The following Individual Contributors (ICs) have salaries above $90,000:

1. Alice Johnson - $95,000
2. Grace Lee - $105,000
3. Ja
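
If the same definitions are needed for many questions, they can be prepended programmatically. The sketch below is a plain-Python illustration; `with_glossary` and its prompt wording are hypothetical and not part of LlamaIndex.

```python
def with_glossary(question: str, glossary: dict[str, str]) -> str:
    """Prefix a question with company-specific definitions for the LLM."""
    definitions = "\n".join(
        f"- '{term}' means {meaning}." for term, meaning in glossary.items()
    )
    return f"Definitions used in our company:\n{definitions}\n\nQuestion: {question}"


# Hypothetical glossary, reusing the 'IC' definition from the cell above.
glossary = {"IC": "Individual Contributor (non-management positions)"}
prompt = with_glossary("Show me all ICs with salaries above 90000.", glossary)
print(prompt)
```

The resulting string can be passed to `query_engine.query(prompt)` in place of the hand-written multi-line question.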

## Section 6: Exercises

Try these exercises to practice what you've learned:

### Exercise 1: Basic Queries
Write natural language queries to answer these questions:
1. How many departments are there?
2. What is the highest salary in the company?
3. Which department is located in San Francisco?

### Exercise 2: Aggregations
1. What is the total budget across all departments?
2. What is the average salary for each position?
3. How many employees were hired in 2021?

### Exercise 3: Complex Queries
1. Which department has the highest average employee salary?
2. List all employees who earn more than the average salary
3. Show departments with more than 2 employees

### Exercise 4: Query Optimization
1. Try rephrasing queries to get better results
2. Experiment with different levels of specificity
3. Test edge cases and error handling

In [None]:
# Your code here for Exercise 1
# Example:
# query = "How many departments are there?"
# response = query_engine.query(query)
# print(response)

In [None]:
# Your code here for Exercise 2

In [None]:
# Your code here for Exercise 3

In [None]:
# Your code here for Exercise 4

## Summary

In this notebook, you learned:

✓ How to set up a basic text-to-SQL system with LlamaIndex
✓ Using `NLSQLTableQueryEngine` for natural language queries
✓ Working with SQLite databases
✓ Inspecting generated SQL queries
✓ Error handling and query refinement
✓ Security best practices

## Next Steps

Continue to **Notebook 02: Intermediate Text-to-SQL with DuckDB** to learn:
- DuckDB integration
- Dynamic table retrieval with ObjectIndex
- Scaling to multiple tables
- Advanced query patterns

## Resources

- [LlamaIndex Documentation](https://docs.llamaindex.ai/)
- [Text-to-SQL Guide](https://developers.llamaindex.ai/python/examples/index_structs/struct_indices/sqlindexdemo/)
- [SQLAlchemy Documentation](https://docs.sqlalchemy.org/)

---

**Great job!** You've completed the basic text-to-SQL notebook. 🎉