# Databricks Text2SQL Agent with LangChain

Demonstrates how to use LangChain and Databricks SQL as tools to build a simple Text2SQL agent. Run this notebook on Databricks using Serverless compute.

In [0]:
%pip install --upgrade langchain databricks-langchain langchain-community --quiet
%pip install --upgrade databricks-sql-connector sqlalchemy databricks-sqlalchemy --quiet

In [0]:
dbutils.library.restartPython()

## Setup parameters and prompt

This demo uses two tables in a demo schema on Databricks. Update the catalog and schema below to point to your own tables, and adjust the system prompt so that it describes them.

In [0]:
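CATALOG = "dhuang"  # TODO: replace with your Unity Catalog catalog
SCHEMA = "text2sql"  # TODO: replace with the schema that holds your tables
LLM_ENDPOINT = "databricks-claude-sonnet-4-5"  # TODO: replace with your model serving endpoint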

In [0]:
system_prompt = """
You are an expert in financial data analysis using LangChain's SQL agent. Your primary role is to interpret and respond to user queries using the data in the catalog and schema provided below.

## Important Guidelines

When working with queries, employ an efficient thought process and provide factual answers based on available data without making up responses. Carefully analyze the user's question before responding to ensure accuracy, and optimize your use of context tokens to remain within limits while ensuring effective reasoning. Avoid verbosity to maintain clarity and conciseness in your responses.

### SQL Query Best Practices

When crafting SQL queries, work efficiently by selecting only necessary columns. All SQL statements must be validated with `sql_db_query_checker` before execution to ensure SQL syntax is correct and ends with a semicolon. Keep in mind that users can refer to a company by its name or ticker symbol (AAPL vs. Apple), so be sure to use the ticker symbol when filtering the table if applicable.

## Available Tables

You currently have access to two tables to query, and you should **only query these tables and nothing else**:

### Balance Sheet Table

The `balance_sheet` table contains financial data related to a company's assets including information on cash and equivalents, short-term investments, net receivables, inventory, other current assets, property, plant and equipment, goodwill, intangible assets, long-term investments, tax assets, other non-current assets, total current assets, total non-current assets, and other miscellaneous assets that are crucial for analyzing a company's financial health and making informed business decisions.

### Cashflow Table

The `cashflow` table contains financial data related to a company's cash flow activities, providing information on net income, depreciation and amortization, deferred income tax, stock-based compensation, change in working capital, accounts receivables, inventory, accounts payables, other working capital, other non-cash items, net operating activities, investments in property, plant and equipment, acquisitions, purchases of investments, sales of investments, and other investing activities that are crucial for analyzing the company's financial performance and understanding its cash flow position.

## Response Format

Your response should include all of the following components:

1. **QUERY**: The SQL query used to answer the question
2. **QUERY OUTPUT**: The raw results of the SQL query
3. **RESPONSE**: The text response to the user's question

### Example

**INPUT**: What are the top 3 years in terms of net income for Apple? Include the year and the net income value

**QUERY**: 
```sql
SELECT year, net_income FROM cashflow WHERE ticker = 'AAPL' ORDER BY net_income DESC LIMIT 3
```

**QUERY OUTPUT**: 
```
[(2022, 99803000000), (2021, 94680000000), (2018, 59531000000)]
```

**RESPONSE**: 
The top 3 years in terms of net income for Apple are:
1. 2022 with a net income of $99,803,000,000
2. 2021 with a net income of $94,680,000,000
3. 2018 with a net income of $59,531,000,000
"""

### Databricks SQL Agent with LangChain

This example shows how to interact with a specific schema in Unity Catalog. Note that the agent can't create or delete tables; it can only query them.

The database instance is created with:
```python
db = SQLDatabase.from_databricks(catalog="...", schema="...")
```
The agent and the tools it requires are created with:
```python
sql_tool = SQLDatabaseToolkit(db=db, llm=llm).get_tools()
agent = create_agent(model=llm, tools=sql_tool, system_prompt=system_prompt)
```

In [0]:
# ChatDatabricks comes from the databricks-langchain package installed above.
from databricks_langchain import ChatDatabricks
from langchain.agents import create_agent
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain_community.utilities import SQLDatabase

llm = ChatDatabricks(endpoint=LLM_ENDPOINT, temperature=0.1)
db = SQLDatabase.from_databricks(catalog=CATALOG, schema=SCHEMA)
sql_tool = SQLDatabaseToolkit(db=db, llm=llm).get_tools()
agent = create_agent(model=llm, tools=sql_tool, system_prompt=system_prompt)
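The query-only behaviour described above is an expectation of this demo, and one way to make it more explicit is a conservative check on the generated SQL. The sketch below is a heuristic, not a LangChain feature: `is_read_only` is a hypothetical helper that accepts only a single statement beginning with `SELECT`, so it also rejects valid read-only queries that start with a CTE. Restricting the privileges of the identity that runs the queries is likely the more reliable safeguard.

```python
import re


def is_read_only(sql: str) -> bool:
    """Heuristically accept a single statement that starts with SELECT."""
    # Drop surrounding whitespace and one or more trailing semicolons.
    statement = sql.strip().rstrip(";").strip()
    # A remaining semicolon suggests several statements; reject to be safe.
    # (This also rejects semicolons inside string literals.)
    if ";" in statement:
        return False
    first_word = re.match(r"[A-Za-z]+", statement)
    return bool(first_word) and first_word.group(0).lower() == "select"


print(is_read_only("SELECT year, net_income FROM cashflow WHERE ticker = 'AAPL';"))
print(is_read_only("DROP TABLE cashflow"))
```

A check like this could run on a query string before it is passed to `db.run(...)`; how to wire it into the agent's tools is left open here.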

In [0]:
response = agent.invoke(
    {
        "messages": [
            ("user", "What are the top 3 years in terms of net income for Apple?")
        ]
    }
)

In [0]:
print(response["messages"][-1].content)
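Printing only the last message hides the intermediate tool calls. The sketch below summarizes every message in the list, assuming each message exposes `type`, `content`, and (for model messages) `tool_calls` attributes, as LangChain message objects do. `summarize_messages` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs without a Databricks connection.

```python
from types import SimpleNamespace


def summarize_messages(messages):
    """Return one line per message: its type plus tool names or a content preview."""
    lines = []
    for message in messages:
        kind = getattr(message, "type", type(message).__name__)
        tool_calls = getattr(message, "tool_calls", None) or []
        if tool_calls:
            names = ", ".join(call["name"] for call in tool_calls)
            lines.append(f"{kind}: calls {names}")
        else:
            # Content may be a string or a list of blocks; str() handles both.
            preview = str(getattr(message, "content", ""))[:80]
            lines.append(f"{kind}: {preview}")
    return lines


# Stand-in messages that mimic the shape of an agent run (illustrative only).
fake_messages = [
    SimpleNamespace(type="human", content="What are the top 3 years in terms of net income for Apple?"),
    SimpleNamespace(
        type="ai",
        content="",
        tool_calls=[{"name": "sql_db_query", "args": {"query": "SELECT 1"}, "id": "call_1"}],
    ),
    SimpleNamespace(type="tool", content="[(1,)]"),
]

for line in summarize_messages(fake_messages):
    print(line)
```

In the notebook, the same function could be called as `summarize_messages(response["messages"])` to see which SQL tools the agent used before producing its final answer.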

## TODO: Log, Register, Deploy agent