# Agents
LangChain has a SQL Agent that provides a more flexible way of interacting with SQL databases than a chain does.

The main advantages of using the SQL Agent are:

1. It can answer questions based on the database's schema as well as on its content (like describing a specific table).
2. It can recover from errors by running a generated query, catching the traceback, and regenerating the query correctly.
3. It can query the database as many times as needed to answer the user's question.
4. It can save tokens by retrieving the schema only for the relevant tables.
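The error recovery in point 2 amounts to a generate, execute, and retry loop. The sketch below shows that loop with Python's built-in `sqlite3`. In the real agent an LLM reads the error and rewrites the query; here a hypothetical rule-based `naive_fix` function plays that role, and `run_with_retries` is likewise an illustrative name, not a LangChain API. The `Artist` table is a two-row toy copy.

```python
import sqlite3

def run_with_retries(conn, query, fix_query, max_attempts=3):
    """Run `query`; on a database error, hand the error text to `fix_query`
    and retry with the rewritten query, up to `max_attempts` times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return conn.execute(query).fetchall()
        except sqlite3.Error as exc:
            last_error = exc
            # In the agent, an LLM would read this error and rewrite the query.
            query = fix_query(query, str(exc))
    raise RuntimeError(f"query still failing after {max_attempts} attempts: {last_error}")

def naive_fix(query, error):
    # Hypothetical stand-in for the LLM: swap a wrong column name for the right one.
    if "no such column: ArtistName" in error:
        return query.replace("ArtistName", "Name")
    return query

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Artist VALUES (?, ?)", [(1, "AC/DC"), (2, "Accept")])

# First attempt fails on the unknown column; the retry uses the corrected query.
rows = run_with_retries(conn, "SELECT ArtistName FROM Artist ORDER BY ArtistId", naive_fix)
print(rows)  # [('AC/DC',), ('Accept',)]
```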



In [1]:
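import os
from dotenv import load_dotenv

# Load API keys (e.g. OPENAI_API_KEY, LANGCHAIN_API_KEY) from a local .env file into os.environ
load_dotenv()

# Enable LangSmith tracing; LANGCHAIN_API_KEY must already be set (by load_dotenv() or the shell)
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Q&A_over_SQL_agent"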

In [2]:
from langchain_community.utilities import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///Chinook.db")
print(db.dialect)
print(db.get_usable_table_names())
db.run("SELECT * FROM Artist LIMIT 10;")

sqlite
['Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track']


"[(1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith'), (4, 'Alanis Morissette'), (5, 'Alice In Chains'), (6, 'Antônio Carlos Jobim'), (7, 'Apocalyptica'), (8, 'Audioslave'), (9, 'BackBeat'), (10, 'Billy Cobham')]"

In [3]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")


# Initialize the agent 

We'll use the `SQLDatabaseToolkit` to create a set of tools that can:

* Create and execute queries
* Check query syntax
* Retrieve table descriptions
  
... and more
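To make "retrieve table descriptions" concrete: in SQLite, table names and `CREATE TABLE` statements live in the `sqlite_master` catalog. The sketch below uses only the standard library to return roughly what the list-tables and schema tools report (table names; then the schema plus a few sample rows, as the `InfoSQLDatabaseTool` description in the tool list says). It is an approximation for illustration, not LangChain's implementation, and `list_tables` and `table_schema` are hypothetical names.

```python
import sqlite3

def list_tables(conn):
    """Comma-separated table names, similar in spirit to a 'list tables' tool."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return ", ".join(name for (name,) in rows)

def table_schema(conn, table_names):
    """CREATE TABLE statement plus up to 3 sample rows for each requested table."""
    parts = []
    for name in (t.strip() for t in table_names.split(",")):
        row = conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
        ).fetchone()
        if row is None:
            parts.append(f"Error: table {name!r} not found")
            continue
        # The name is known to exist at this point, so it is safe to interpolate.
        sample = conn.execute(f'SELECT * FROM "{name}" LIMIT 3').fetchall()
        parts.append(f"{row[0]}\n/* sample rows: {sample} */")
    return "\n\n".join(parts)

# Toy in-memory database with two Chinook-style tables (made-up rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT)")
conn.execute("INSERT INTO Genre VALUES (1, 'Rock')")

print(list_tables(conn))  # Album, Genre
print(table_schema(conn, "Genre"))
```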

In [4]:
from langchain_community.agent_toolkits import SQLDatabaseToolkit
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
tools = toolkit.get_tools()
tools

[QuerySQLDataBaseTool(description="Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use sql_db_schema to query the correct table fields.", db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x000001A80269CBB0>),
 InfoSQLDatabaseTool(description='Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables. Be sure that the tables actually exist by calling sql_db_list_tables first! Example Input: table1, table2, table3', db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x000001A80269CBB0>),
 ListSQLDatabaseTool(db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x000001A80269CBB0>),
 QuerySQLCheckerTool(description='Use this tool to 

In [5]:
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

from langgraph.prebuilt import create_react_agent


# Dealing with high-cardinality columns
To filter on columns that contain proper nouns, such as addresses, song names, or artists, we first need to check the spelling so that the filter matches the stored values exactly.

We can achieve this by creating a vector store containing all the distinct proper nouns in the database. The agent can then query that vector store whenever the user's question includes a proper noun, to find the correct spelling of that word.

In this way, the agent can make sure it understands which entity the user is referring to before building the target query.
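Why the spelling matters: SQL equality filters match exactly, so a misspelled value silently returns no rows instead of raising an error. A minimal illustration with a toy in-memory `Artist` table (the two names come from the Chinook output above; the rest is made up, and `artist_ids` is a hypothetical helper):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Artist VALUES (?, ?)",
    [(1, "Alice In Chains"), (2, "Alanis Morissette")],
)

def artist_ids(name):
    # Exact-match filter, the kind of WHERE clause an agent would write
    return conn.execute("SELECT ArtistId FROM Artist WHERE Name = ?", (name,)).fetchall()

print(artist_ids("alis in chain"))    # [] : the misspelled value matches nothing
print(artist_ids("Alice In Chains"))  # [(1,)] : the canonical spelling matches
```

An empty result here is indistinguishable from "no such albums", which is why the agent should look up the canonical spelling before filtering.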


# Parse the entities
First, we need the unique values for each entity we care about. We define a function that runs a query and parses the result into a list of distinct strings:

In [6]:
import ast
import re

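def query_as_list(db, query):
    # db.run returns the rows as a string, e.g. "[('AC/DC',), ('Accept',)]"
    res = db.run(query)
    # Parse that string back into tuples, flatten them, and drop empty/NULL values
    res = [el for sub in ast.literal_eval(res) for el in sub if el]
    # Remove standalone numbers (this also strips digits that are part of a title)
    res = [re.sub(r"\b\d+\b", "", string).strip() for string in res]
    # Deduplicate
    return list(set(res))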

In [7]:
artists = query_as_list(db, "SELECT Name FROM Artist")
albums = query_as_list(db, "SELECT Title FROM Album")
albums[:5]

['Misplaced Childhood',
 'Purcell: Music for the Queen Mary',
 'Jota Quest-',
 'Powerslave',
 'Black Sabbath Vol.  (Remaster)']
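The parsing steps can be checked without a database by feeding the same logic a result string of the kind `db.run` returns. The input titles below are illustrative; the output reproduces the artifacts visible above (`Jota Quest-` and the doubled space in `Black Sabbath Vol.  (Remaster)`), which come from the `\b\d+\b` regex removing digits that belong to titles. `parse_rows` is a hypothetical helper, and it sorts only to make the output deterministic.

```python
import ast
import re

def parse_rows(raw):
    """Same steps as query_as_list, applied to an already-fetched result string."""
    values = [el for row in ast.literal_eval(raw) for el in row if el]  # flatten, drop empty/None
    values = [re.sub(r"\b\d+\b", "", v).strip() for v in values]       # drop standalone numbers
    return sorted(set(values))                                          # dedupe, stable order

raw = "[('AC/DC',), ('Black Sabbath Vol. 4 (Remaster)',), ('Jota Quest-1995',), ('AC/DC',)]"
print(parse_rows(raw))
# ['AC/DC', 'Black Sabbath Vol.  (Remaster)', 'Jota Quest-']
```

Whether to keep the regex step is a judgment call; without it, the stored values match the database spellings exactly.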

# Create a Retriever tool
Using these values, we can create a retriever tool that the agent can call at its discretion.



In [8]:
from langchain.agents.agent_toolkits import create_retriever_tool
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

vector_db = FAISS.from_texts(artists + albums, OpenAIEmbeddings())
retriever = vector_db.as_retriever(search_kwargs={"k": 5})
description = """Use to look up values to filter on. Input is an approximate spelling of the proper noun, output is \
valid proper nouns. Use the noun most similar to the search."""
retriever_tool = create_retriever_tool(
    retriever,
    name="search_proper_nouns",
    description=description,
)

Test the retriever:

In [9]:
print(retriever_tool.invoke("Alice Chains"))

Alice In Chains

Alanis Morissette

Pearl Jam

Pearl Jam

Audioslave


This way, if the agent determines it needs to write a filter based on an artist along the lines of "Alice Chains", it can first use the retriever tool to observe relevant values of a column.
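Under the hood, the retriever embeds the query and returns the `k` stored strings whose vectors are closest. The sketch below shows that top-k nearest-neighbour lookup without external services, swapping OpenAI embeddings for simple character-trigram count vectors and FAISS for a brute-force cosine-similarity sort. It only illustrates the lookup; it is not how `OpenAIEmbeddings` or FAISS compute similarity, and `trigrams`, `cosine`, and `top_k` are hypothetical names.

```python
import math
from collections import Counter

def trigrams(text):
    # Character-trigram counts as a crude stand-in for an embedding vector
    t = f"  {text.lower()} "
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query, candidates, k=5):
    # Brute-force nearest neighbours: score every candidate, keep the best k
    q = trigrams(query)
    return sorted(candidates, key=lambda c: cosine(q, trigrams(c)), reverse=True)[:k]

names = ["Alice In Chains", "Alanis Morissette", "Pearl Jam", "Audioslave"]
print(top_k("Alice Chains", names, k=1))  # ['Alice In Chains']
```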

Putting this together:

# System Prompt
We will also want to create a system prompt for our agent. This will consist of instructions for how to behave.

In [10]:
system = """You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct SQLite query to run, then look at the results of the query and return the answer.
Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most 5 results.
You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the relevant columns given the question.
You have access to tools for interacting with the database.
Only use the given tools. Only use the information returned by the tools to construct your final answer.
You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.

You have access to the following tables: {table_names}

If you need to filter on a proper noun, you must ALWAYS first look up the filter value using the "search_proper_nouns" tool!
Do not try to guess at the proper name - use this function to find similar ones.""".format(
    table_names=db.get_usable_table_names()
)
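The "DO NOT make any DML statements" line is an instruction to the model, not an enforcement mechanism. If you control the connection, one complementary safeguard is to open the database read-only so that SQLite itself rejects writes. A minimal sketch with the standard library, using a throwaway file as a stand-in for `Chinook.db`:

```python
import sqlite3
import tempfile
from pathlib import Path

# Build a throwaway database file with a normal read-write connection
path = Path(tempfile.mkdtemp()) / "demo.db"
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
rw.execute("INSERT INTO Artist VALUES (1, 'AC/DC')")
rw.commit()
rw.close()

# Reopen it read-only through an SQLite URI: reads work, writes are rejected
ro = sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)
print(ro.execute("SELECT Name FROM Artist").fetchall())  # [('AC/DC',)]

try:
    ro.execute("DELETE FROM Artist")
except sqlite3.OperationalError as exc:
    print("write blocked:", exc)
```

SQLAlchemy's SQLite dialect documents a URI form along the lines of `sqlite:///file:Chinook.db?mode=ro&uri=true`, which should let `SQLDatabase.from_uri` open the database the same way; treat that as an assumption to verify against your SQLAlchemy version.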


# Initializing agent
First, install the required package, LangGraph (for example, `pip install -U langgraph`).

We will use LangGraph's prebuilt ReAct agent, `create_react_agent`, to build our agent.
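`create_react_agent` builds a graph that alternates between the model and a tool-executing step until the model replies without requesting a tool. The framework-free loop below sketches that control flow; it is not LangGraph's implementation. The scripted model and the canned tool outputs are hypothetical stand-ins (the tool names mirror `search_proper_nouns` and `sql_db_query`, and `ArtistId = 5` comes from the `Artist` output earlier).

```python
def react_loop(model, tools, question, max_steps=5):
    """Alternate between the model and the tools until the model answers without a tool call."""
    messages = [("user", question)]
    for _ in range(max_steps):
        reply = model(messages)
        if reply.get("tool") is None:
            return reply["content"], messages               # no tool requested: final answer
        observation = tools[reply["tool"]](reply["args"])   # run the requested tool
        messages.append(("tool", f"{reply['tool']} -> {observation}"))  # feed the result back
    raise RuntimeError("agent did not finish within max_steps")

# Hypothetical canned tools
tools = {
    "search_proper_nouns": lambda text: "Alice In Chains",
    "sql_db_query": lambda sql: "<query result>",
}

def scripted_model(messages):
    """Hypothetical stand-in for the LLM: picks its next step from how many tool results it has seen."""
    seen = sum(1 for role, _ in messages if role == "tool")
    if seen == 0:
        return {"tool": "search_proper_nouns", "args": "alis in chain"}
    if seen == 1:
        return {"tool": "sql_db_query", "args": "SELECT COUNT(*) FROM Album WHERE ArtistId = 5"}
    return {"tool": None, "content": f"Final answer built from: {messages[-1][1]}"}

answer, history = react_loop(scripted_model, tools, "How many albums does alis in chain have?")
print(answer)  # Final answer built from: sql_db_query -> <query result>
```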



In [11]:
system_message = SystemMessage(content=system)
tools.append(retriever_tool)

In [12]:
# `messages_modifier` is deprecated; `state_modifier` takes the same system message
# (newer LangGraph releases rename this parameter again, to `prompt`)
agent_executor = create_react_agent(llm, tools, state_modifier=system_message)


In [13]:
for s in agent_executor.stream(
    {"messages": [HumanMessage(content="How many albums does alis in chain have?")]}
):
    print(s)
    print("----")

{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_amBwEjBaRq412vkRdoeLRpIy', 'function': {'arguments': '{"query":"alis in chain"}', 'name': 'search_proper_nouns'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 665, 'total_tokens': 684}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_f33667828e', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-21d770a1-e483-499d-83ac-e90b7a68749b-0', tool_calls=[{'name': 'search_proper_nouns', 'args': {'query': 'alis in chain'}, 'id': 'call_amBwEjBaRq412vkRdoeLRpIy', 'type': 'tool_call'}], usage_metadata={'input_tokens': 665, 'output_tokens': 19, 'total_tokens': 684})]}}
----
{'tools': {'messages': [ToolMessage(content='Alice In Chains\n\nAisha Duo\n\nXis\n\nDa Lama Ao Caos\n\nA-Sides', name='search_proper_nouns', tool_call_id='call_amBwEjBaRq412vkRdoeLRpIy')]}}
----
{'agent': {'messages': [AIMessage(content='', 

# Printing only the final output

In [14]:
def print_final_generation(agent_executor, user_input):
    """Stream the agent's steps and print only its final text answer."""
    last_agent_message = None

    # Stream the output from the agent executor
    for s in agent_executor.stream(
        {"messages": [HumanMessage(content=user_input)]}
    ):
        # Check if the message is from the agent and contains content
        if 'agent' in s and 'messages' in s['agent']:
            for message in s['agent']['messages']:
                if isinstance(message, AIMessage) and message.content:
                    last_agent_message = message.content

    # Print the last agent message
    if last_agent_message:
        print(last_agent_message)
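The filtering logic in `print_final_generation` can be checked without calling a model by running it over hand-written stream chunks shaped like the output printed earlier (`{'agent': {'messages': [...]}}` and `{'tools': {...}}`). Tool-calling turns have empty `content`, so skipping empty messages leaves only the final answer. `FakeAIMessage` and `last_agent_content` are hypothetical stand-ins, not LangChain classes.

```python
from dataclasses import dataclass

@dataclass
class FakeAIMessage:
    # Hypothetical stand-in for AIMessage; only `content` is used here
    content: str

def last_agent_content(chunks):
    """Content of the last non-empty agent message in a stream of update chunks."""
    last = None
    for chunk in chunks:
        for message in chunk.get("agent", {}).get("messages", []):
            if isinstance(message, FakeAIMessage) and message.content:
                last = message.content
    return last

chunks = [
    {"agent": {"messages": [FakeAIMessage(content="")]}},  # tool-call turn: empty content
    {"tools": {"messages": ["...tool output..."]}},
    {"agent": {"messages": [FakeAIMessage(content="final answer")]}},
]
print(last_agent_content(chunks))  # final answer
```

If you don't need the intermediate steps, calling `agent_executor.invoke({"messages": [HumanMessage(content=user_input)]})` returns the final state, and the last entry in its `"messages"` list holds the agent's answer.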

In [15]:
while True:
    user_input = input("User: ")
    if user_input.lower() in ["quit", "exit", "q"]:
        print("Goodbye!")
        break
    print_final_generation(agent_executor, user_input)

The customers from the USA spent the most, totaling $523.06. The top five countries with the highest spending are:

1. **USA**: $523.06
2. **Canada**: $303.96
3. **France**: $195.10
4. **Brazil**: $190.10
5. **Germany**: $156.48
Goodbye!


Note that the agent executes multiple queries until it has the information it needs:

* It lists the available tables.
* It retrieves the schema for three tables.
* It queries several of the tables via a join operation.

The agent is then able to use the result of the final query to generate an answer to the original question.
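The final join query for the spending question is not printed above. A query of roughly this shape would produce that kind of per-country ranking, assuming Chinook's `Customer.Country` and `Invoice.Total` columns. The sketch runs it against a tiny in-memory copy with made-up rows, so the totals are not Chinook's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total NUMERIC(10,2));
INSERT INTO Customer VALUES (1, 'USA'), (2, 'USA'), (3, 'Canada'), (4, 'France');
INSERT INTO Invoice VALUES (1, 1, 10.00), (2, 2, 5.50), (3, 3, 12.00), (4, 4, 3.25), (5, 3, 1.00);
""")

# Total spend per country: join customers to their invoices, aggregate, rank, keep the top 5
query = """
SELECT c.Country, ROUND(SUM(i.Total), 2) AS TotalSpent
FROM Customer AS c
JOIN Invoice AS i ON i.CustomerId = c.CustomerId
GROUP BY c.Country
ORDER BY TotalSpent DESC
LIMIT 5;
"""
print(conn.execute(query).fetchall())
# [('USA', 15.5), ('Canada', 13.0), ('France', 3.25)]
```

The `LIMIT 5` mirrors the system prompt's rule of returning at most 5 results unless the user asks for more.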

The agent can similarly handle qualitative questions:

In [16]:
for s in agent_executor.stream(
    {"messages": [HumanMessage(content="Describe the playlisttrack table")]}
):
    print(s)
    print("----")

{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_Vv6AO48LGip9mA5hu2KGReDg', 'function': {'arguments': '{"table_names":"PlaylistTrack"}', 'name': 'sql_db_schema'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 660, 'total_tokens': 677}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_f33667828e', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-111e1d12-3a4b-4144-ada6-e24839df2359-0', tool_calls=[{'name': 'sql_db_schema', 'args': {'table_names': 'PlaylistTrack'}, 'id': 'call_Vv6AO48LGip9mA5hu2KGReDg', 'type': 'tool_call'}], usage_metadata={'input_tokens': 660, 'output_tokens': 17, 'total_tokens': 677})]}}
----
{'tools': {'messages': [ToolMessage(content='\nCREATE TABLE "PlaylistTrack" (\n\t"PlaylistId" INTEGER NOT NULL, \n\t"TrackId" INTEGER NOT NULL, \n\tPRIMARY KEY ("PlaylistId", "TrackId"), \n\tFOREIGN KEY("TrackId") REFERENCES "Track" ("TrackI