> **This notebook uses LangChain's `create_sql_agent` to query a SQL database.**
> - **Thought:** Use this agent for various purposes: querying data, generating reports, authoring documents, and possibly updating data.
> - **Observations:**
>   - LangChain `SQLDatabase` can only work with the default schema until [this issue](https://github.com/langchain-ai/langchain/issues/3036) is resolved.
>   - The agent first queries the available tables and their schemas.
>   - It then generates SQL queries to try and keeps retrying when a query fails, until it reaches the model token limit.
>   - In trials, it often took many attempts to get a query to work, and it can fail on complex joins.
> - Sample Data: [Northwind Traders](https://www.dofactory.com/sql/sample-database), because AdventureWorks uses a `SalesLT` schema rather than the default one.

## Environment Setup

In [2]:
import os
from dotenv import load_dotenv

# Set environment variables in the .env file.
load_dotenv()

COMPLETION_MODEL = os.environ["OPENAI_COMPLETION_MODEL"]
COMPLETION_DEPLOYMENT = os.environ["OPENAI_COMPLETION_DEPLOYMENT"]
CHAT_MODEL = os.environ["OPENAI_CHAT_MODEL"]
CHAT_DEPLOYMENT = os.environ["OPENAI_CHAT_DEPLOYMENT"]
OPENAI_API_VERSION = os.environ["OPENAI_API_VERSION"]

SQL_DB_USER = os.environ["SQL_DB_USER"]
SQL_DB_PASSWORD = os.environ["SQL_DB_PASSWORD"]
SQL_DB_SERVER_NAME = os.environ["SQL_DB_SERVER_NAME"]
SQL_DB_NAME = os.environ["SQL_DB_NAME"]
SQL_CONNECTIONSTRING_FORMAT = os.environ["SQL_CONNECTIONSTRING_FORMAT"]
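
Each `os.environ[...]` lookup above raises a `KeyError` at the first missing variable, so an incomplete `.env` file is discovered one name at a time. A minimal sketch of an upfront check follows, assuming the variable names used in the cell above; `missing_env_vars` is a hypothetical helper, not part of the notebook or of `python-dotenv`.

```python
import os

# Variable names copied from the cell above.
REQUIRED_ENV_VARS = [
    "OPENAI_COMPLETION_MODEL",
    "OPENAI_COMPLETION_DEPLOYMENT",
    "OPENAI_CHAT_MODEL",
    "OPENAI_CHAT_DEPLOYMENT",
    "OPENAI_API_VERSION",
    "SQL_DB_USER",
    "SQL_DB_PASSWORD",
    "SQL_DB_SERVER_NAME",
    "SQL_DB_NAME",
    "SQL_CONNECTIONSTRING_FORMAT",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Report every missing name at once instead of failing on the first one.
missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Running this right after `load_dotenv()` lists all gaps in the configuration in a single pass.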

In [3]:
from langchain.llms import AzureOpenAI
from langchain.chat_models import AzureChatOpenAI

llm = AzureOpenAI(
    model_name=COMPLETION_MODEL,
    deployment_name=COMPLETION_DEPLOYMENT,
    #temperature=0.3,
    verbose=True
)

chat = AzureChatOpenAI(
    deployment_name=CHAT_DEPLOYMENT,
    openai_api_version=OPENAI_API_VERSION,
    #temperature=0.3,
    verbose=True
)

## Code

In [4]:
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase

connection_string = SQL_CONNECTIONSTRING_FORMAT.format(
    database_user=SQL_DB_USER,
    database_password=SQL_DB_PASSWORD,
    database_server=SQL_DB_SERVER_NAME,
    database_db=SQL_DB_NAME)

db = SQLDatabase.from_uri(connection_string)
toolkit = SQLDatabaseToolkit(db=db, llm=chat)
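
The notebook does not show the value of `SQL_CONNECTIONSTRING_FORMAT`, only its placeholder names. The sketch below shows one plausible template, assuming SQL Server accessed through SQLAlchemy's `mssql+pyodbc` dialect; the driver name, the sample server, and the sample credentials are all assumptions for illustration. The user and password are percent-encoded because characters such as `@` or `/` would otherwise break the URL.

```python
from urllib.parse import quote_plus

# Assumed template: the real value lives in the .env file and is not shown.
EXAMPLE_FORMAT = (
    "mssql+pyodbc://{database_user}:{database_password}"
    "@{database_server}/{database_db}"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)


def build_connection_string(template, user, password, server, db):
    """Fill the template, percent-encoding the user and password."""
    return template.format(
        database_user=quote_plus(user),
        database_password=quote_plus(password),
        database_server=server,
        database_db=db,
    )


# Made-up values, for illustration only.
example = build_connection_string(
    EXAMPLE_FORMAT,
    "reader",
    "p@ss/word",
    "example.database.windows.net",
    "Northwind",
)
print(example)
```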

In [5]:
from langchain.agents import create_sql_agent

sqlagent = create_sql_agent(
    llm=chat,
    toolkit=toolkit,
    verbose=True
)

In [6]:
metaprompt = """You are an AI assistant that helps me answer questions about data in a database.
The database contains the following tables, columns and datatypes:
Customer
- Id (int)
- FirstName (string)
- LastName (string)
- City (string)
- Country (string)
- Phone (string)

Order
- Id (int)
- OrderDate (datetime)
- OrderNumber (string)
- CustomerId (int), joins to Customer.Id
- TotalAmount (decimal)

OrderItem
- Id (int)
- OrderId (int), joins to Order.Id
- ProductId (int), joins to Product.Id
- UnitPrice (decimal)
- Quantity (int)

Supplier
- Id (int)
- CompanyName (string)
- ContactName (string)
- ContactTitle (string)
- City (string)
- Country (string)
- Phone (string)
- Fax (string)

Product
- Id (int)
- ProductName (string)
- SupplierId (int), joins to Supplier.Id
- UnitPrice (decimal)
- Package (string)
- IsDiscontinued (bit)

Question: """
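
A hand-typed schema description like the one above can drift from the actual database. As a minimal sketch, the table section of the prompt could instead be rendered from a dictionary; `SCHEMA` (abbreviated to two tables here), `HEADER`, and `render_schema` are hypothetical names introduced for illustration.

```python
# Hypothetical, abbreviated description:
# table -> {column: (type, optional join target)}
SCHEMA = {
    "Customer": {
        "Id": ("int", None),
        "FirstName": ("string", None),
        "LastName": ("string", None),
    },
    "Order": {
        "Id": ("int", None),
        "CustomerId": ("int", "Customer.Id"),
        "TotalAmount": ("decimal", None),
    },
}

HEADER = (
    "You are an AI assistant that helps me answer questions about data in a database.\n"
    "The database contains the following tables, columns and datatypes:\n"
)


def render_schema(schema):
    """Render each table as a name followed by '- column (type)' lines."""
    blocks = []
    for table, columns in schema.items():
        lines = [table]
        for name, (dtype, joins_to) in columns.items():
            line = f"- {name} ({dtype})"
            if joins_to:
                line += f", joins to {joins_to}"
            lines.append(line)
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


example_prompt = HEADER + render_schema(SCHEMA) + "\n\nQuestion: "
print(example_prompt)
```

Keeping the description in one data structure means a column rename or type change is made in a single place.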

In [7]:
sqlagent.run(metaprompt + "list top 5 customers by highest order amount")

> Entering new AgentExecutor chain...
Action: sql_db_list_tables
Action Input: ""
Observation: Customer, Order, OrderItem, Product, Supplier
Thought: I can query the Customer and Order tables to find the customers with the highest order amounts.
Action: sql_db_schema
Action Input: "Customer, Order"
Observation:
CREATE TABLE [Customer] (
	[Id] INTEGER NOT NULL IDENTITY(1,1), 
	[FirstName] NVARCHAR(40) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, 
	[LastName] NVARCHAR(40) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, 
	[City] NVARCHAR(40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, 
	[Country] NVARCHAR(40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, 
	[Phone] NVARCHAR(20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, 
	CONSTRAINT [PK_CUSTOMER] PRIMARY KEY ([Id])
)

/*
3 rows from Customer table:
Id	FirstName	LastName	City	Country	Phone
1	Maria	Anders	Berlin	Germany	030-0074321
2	Ana	Trujillo	México D.F.	Mexico	(5

'Horst Kloss, Jose Pavarotti, Mario Pontes, Lúcia Carvalho, Jean Fresnière'
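
One way to sanity-check an answer like this is to compare it with a hand-written query. The sketch below runs such a query against an in-memory SQLite database with made-up rows (not the Northwind data), using the `Customer` and `Order` columns from the schema above. It reads "highest order amount" as the sum of `TotalAmount` per customer, which is an assumption; the SQL the agent actually ran is not shown in the captured output. On SQL Server the equivalent would use `SELECT TOP 5` and `[Order]` instead of `LIMIT 5` and `"Order"`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy tables and rows, invented for illustration only.
conn.executescript("""
CREATE TABLE Customer (Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE "Order" (Id INTEGER PRIMARY KEY, CustomerId INTEGER, TotalAmount REAL);
INSERT INTO Customer VALUES (1, 'Ann', 'Example'), (2, 'Bob', 'Sample'), (3, 'Cy', 'Placeholder');
INSERT INTO "Order" VALUES (1, 1, 100.0), (2, 2, 250.0), (3, 1, 200.0), (4, 3, 50.0);
""")

# Sum order totals per customer and keep the five largest.
TOP_CUSTOMERS_SQL = """
SELECT c.FirstName || ' ' || c.LastName AS Name,
       SUM(o.TotalAmount) AS Total
FROM Customer AS c
JOIN "Order" AS o ON o.CustomerId = c.Id
GROUP BY c.Id, c.FirstName, c.LastName
ORDER BY Total DESC
LIMIT 5
"""

rows = conn.execute(TOP_CUSTOMERS_SQL).fetchall()
for name, total in rows:
    print(name, total)
```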

In [9]:
sqlagent.run(metaprompt + "Get the Product with the top orders by Amount, and then group by Country.")

> Entering new AgentExecutor chain...
Action: sql_db_list_tables
Action Input: ""
Observation: Customer, Order, OrderItem, Product, Supplier
Thought: I can query the tables Customer, Order, OrderItem, Product, and Supplier. I should query the schema of the Order table and the Product table to see what columns I can use to get the top orders by Amount and group by Country.
Action: sql_db_schema
Action Input: "Order, Product"
Observation:
CREATE TABLE [Order] (
	[Id] INTEGER NOT NULL IDENTITY(1,1), 
	[OrderDate] DATETIME NOT NULL DEFAULT (getdate()), 
	[OrderNumber] NVARCHAR(10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, 
	[CustomerId] INTEGER NOT NULL, 
	[TotalAmount] DECIMAL(12, 2) NULL DEFAULT ((0)), 
	CONSTRAINT [PK_ORDER] PRIMARY KEY ([Id]), 
	CONSTRAINT [FK_ORDER_REFERENCE_CUSTOMER] FOREIGN KEY([CustomerId]) REFERENCES [Customer] ([Id])
)

/*
3 rows from Order table:
Id	OrderDate	OrderNumber	CustomerId	To

"The Product with the top orders by Amount, grouped by Country are as follows:\n1. Product: Original Frankfurter grüne Soße, Country: USA, Total Amount: $6155.90\n2. Product: Escargots de Bourgogne, Country: USA, Total Amount: $4157.00\n3. Product: Sir Rodney's Marmalade, Country: Germany, Total Amount: $4031.00\n4. Product: Chef Anton's Gumbo Mix, Country: Belgium, Total Amount: $3730.00\n5. Product: Manjimup Dried Apples, Country: Ireland, Total Amount: $3127.00\n6. Product: Chartreuse verte, Country: Germany, Total Amount: $3016.00\n7. Product: Gudbrandsdalsost, Country: USA, Total Amount: $2835.00\n8. Product: Pâté chinois, Country: Belgium, Total Amount: $2708.80\n9. Product: Northwoods Cranberry Sauce, Country: Switzerland, Total Amount: $2490.50\n10. Product: Pavlova, Country: Austria, Total Amount: $2464.80"