# Many tables

One of the main pieces of information we need to include in our prompt is the schemas of the relevant tables. 

When a database has many tables, their schemas may not all fit in a single prompt. 

What we can do in such cases is first extract the names of the tables related to the user input, and then include only their schemas.

One easy and reliable way to do this is using tool-calling. 

Below, we show how we can use this feature to obtain output conforming to a desired format (in this case, a list of table names). 

We use the chat model's ```.bind_tools``` method to bind a tool defined as a Pydantic model, and feed the result into an output parser to reconstruct the object from the model's response.
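To make the "reconstruct the object" step concrete, here is a minimal standard-library sketch. It is not LangChain's parser: the `Table` dataclass stands in for the Pydantic model defined later, and `fake_tool_calls` and `parse_tool_calls` are hypothetical names with a simplified payload shape (tool name plus arguments).

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Stand-in for the Pydantic tool schema (assumption: one string field)."""

    name: str


# Hypothetical, simplified tool-call payload: one entry per tool invocation,
# each carrying the tool's name and its keyword arguments.
fake_tool_calls = [
    {"name": "Table", "args": {"name": "Artist"}},
    {"name": "Table", "args": {"name": "Track"}},
]


def parse_tool_calls(tool_calls, tools):
    """Rebuild typed objects from raw tool calls, skipping unknown tools."""
    registry = {tool.__name__: tool for tool in tools}
    parsed = []
    for call in tool_calls:
        tool_cls = registry.get(call["name"])
        if tool_cls is not None:
            parsed.append(tool_cls(**call["args"]))
    return parsed


tables = parse_tool_calls(fake_tool_calls, [Table])
print(tables)  # [Table(name='Artist'), Table(name='Track')]
```

The point of the typed object is that downstream code can rely on a `name` attribute instead of parsing free-form model text.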

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()
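# load_dotenv() has already populated os.environ; fail early with a clear
# message if the key is missing instead of assigning None (a TypeError).
if not os.environ.get("LANGCHAIN_API_KEY"):
    raise RuntimeError("LANGCHAIN_API_KEY is not set")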
os.environ["LANGCHAIN_TRACING_V2"]="true"
os.environ["LANGCHAIN_PROJECT"]="Q&A_over_SQL_data"

# SQL DB Creation
Connect to the Chinook SQLite database so that we can query it.



In [2]:
from langchain_community.utilities import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///Chinook.db")
print(db.dialect)
print(db.get_usable_table_names())
db.run("SELECT * FROM Artist LIMIT 10;")

sqlite
['Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track']


"[(1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith'), (4, 'Alanis Morissette'), (5, 'Alice In Chains'), (6, 'Antônio Carlos Jobim'), (7, 'Apocalyptica'), (8, 'Audioslave'), (9, 'BackBeat'), (10, 'Billy Cobham')]"

# Hook the DB to LLM

In [3]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")

# Table definitions and example rows and dialect

We will add more placeholders:

1) ```{dialect}```
2) ```{top_k}```

In [4]:
system = """You are a {dialect} expert. Given an input question, create a syntactically correct {dialect} query to run.
Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per {dialect}. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Pay attention to use date('now') function to get the current date, if the question involves "today".

Only use the following tables:
{table_info}

Question: {input}

Write an initial draft of the query. Then double check the {dialect} query for the common mistakes listed below. Make sure that these mistakes do NOT happen at all.
- Using NOT IN with NULL values
- Using UNION when UNION ALL should have been used
- Using BETWEEN for exclusive ranges
- Data type mismatch in predicates
- Properly quoting identifiers
- Using the correct number of arguments for functions
- Casting to the correct data type
- Using the proper columns for joins
- Giving output of query in backticks like ```query```

If there are any of the above mistakes, rewrite the query.
Give only the SQL query and no other characters. Not even header like SQLQuery: etc
Never give output of query in backticks like ```query```
If there are backticks. Remove them before giving the final output query.
If there are no mistakes, just reproduce the original query with no further commentary.

Output the final SQL query only.

"""


In [5]:
db.dialect

'sqlite'

In [6]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [("system", system), ("human", "{input}")]
).partial(dialect=db.dialect)
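As a rough picture of what pre-filling one placeholder means, the sketch below partially formats a plain Python template and leaves the other fields for a later step. It is an illustration only, not how `ChatPromptTemplate.partial` is implemented; `placeholders` and `partial_format` are hypothetical helpers.

```python
from string import Formatter


def placeholders(template):
    """Return the set of {named} fields found in a format string."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def partial_format(template, **known):
    """Fill only the given fields; keep the rest as {field} for a later step."""
    remaining = {
        name: "{" + name + "}" for name in placeholders(template) - set(known)
    }
    return template.format(**known, **remaining)


template = "You are a {dialect} expert. Return at most {top_k} rows for: {input}"

# Step 1: fix the dialect once, as the notebook does with db.dialect.
step_one = partial_format(template, dialect="sqlite")
print(step_one)

# Step 2: the remaining fields are supplied at call time.
print(step_one.format(top_k=5, input="How many artists are there?"))
```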

In [7]:
from langchain.chains import create_sql_query_chain

query_chain = create_sql_query_chain(llm, db, prompt=prompt)
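The {table_info} placeholder is where the table schemas end up, so limiting the tables limits the prompt size. The following sketch shows the underlying idea with only `sqlite3`: list the tables, then fetch the CREATE statements for a chosen subset. It uses a tiny in-memory database with made-up columns rather than Chinook, and `list_tables` and `schemas_for` are hypothetical helpers, not LangChain functions.

```python
import sqlite3

# Tiny in-memory stand-in for a real database (columns are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, Total REAL);
    """
)


def list_tables(conn):
    """Return all table names, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def schemas_for(conn, table_names):
    """Return the CREATE statements for the selected tables only."""
    if not table_names:
        return ""
    marks = ", ".join("?" for _ in table_names)
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        f"WHERE type = 'table' AND name IN ({marks}) ORDER BY name",
        list(table_names),
    )
    return "\n\n".join(sql for (sql,) in rows)


print(list_tables(conn))
print(schemas_for(conn, ["Artist", "Album"]))  # the Invoice schema is left out
```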

In [8]:
from langchain_core.output_parsers.openai_tools import PydanticToolsParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field


class Table(BaseModel):
    """Table in SQL database."""

    name: str = Field(description="Name of table in SQL database.")


table_names = "\n".join(db.get_usable_table_names())
system = f"""Return the names of ALL the SQL tables that MIGHT be relevant to the user question. \
The tables are:

{table_names}

Remember to include ALL POTENTIALLY RELEVANT tables, even if you're not sure that they're needed."""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{input}"),
    ]
)
llm_with_tools = llm.bind_tools([Table])
output_parser = PydanticToolsParser(tools=[Table])

table_chain = prompt | llm_with_tools | output_parser

table_chain.invoke({"input": "What are all the genres of Alanis Morisette songs"})

[Table(name='Artist'),
 Table(name='Album'),
 Table(name='Genre'),
 Table(name='Track')]

This works pretty well! However, as we'll see below, we actually need a few other tables as well, which would be difficult for the model to infer from the user question alone. In this case, we can simplify the model's job by grouping the tables together: we ask the model to choose between the categories "Music" and "Business", and then select all the relevant tables for the chosen categories ourselves:

In [9]:
system = """Return the names of any SQL tables that are relevant to the user question.
The tables are:

Music
Business
"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{input}"),
    ]
)

category_chain = prompt | llm_with_tools | output_parser
category_chain.invoke({"input": "What are all the genres of Alanis Morisette songs"})

[Table(name='Music')]

In [10]:
from typing import List


def get_tables(categories: List[Table]) -> List[str]:
    tables = []
    for category in categories:
        if category.name == "Music":
            tables.extend(
                [
                    "Album",
                    "Artist",
                    "Genre",
                    "MediaType",
                    "Playlist",
                    "PlaylistTrack",
                    "Track",
                ]
            )
        elif category.name == "Business":
            tables.extend(["Customer", "Employee", "Invoice", "InvoiceLine"])
    return tables


table_chain = category_chain | get_tables
table_chain.invoke({"input": "What are all the genres of Alanis Morisette songs"})

['Album', 'Artist', 'Genre', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track']
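If the number of categories grows, the if/elif mapping can become awkward. One possible alternative, sketched here under the assumption that the same two categories apply, is a dictionary lookup that also ignores unknown categories and avoids duplicate table names. `CategoryChoice`, `CATEGORY_TABLES`, and `tables_for_categories` are hypothetical names chosen so they do not shadow the notebook's `Table` and `get_tables`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class CategoryChoice:
    """Stand-in for the Table objects returned by category_chain."""

    name: str


# Category-to-table mapping, copied from the function above.
CATEGORY_TABLES: Dict[str, List[str]] = {
    "Music": [
        "Album",
        "Artist",
        "Genre",
        "MediaType",
        "Playlist",
        "PlaylistTrack",
        "Track",
    ],
    "Business": ["Customer", "Employee", "Invoice", "InvoiceLine"],
}


def tables_for_categories(categories: Iterable[CategoryChoice]) -> List[str]:
    """Expand categories into table names, preserving order without repeats."""
    tables: List[str] = []
    for category in categories:
        for table in CATEGORY_TABLES.get(category.name, []):
            if table not in tables:
                tables.append(table)
    return tables


print(tables_for_categories([CategoryChoice("Music"), CategoryChoice("Music")]))
print(tables_for_categories([CategoryChoice("Unknown")]))  # []
```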

Now that we've got a chain that can output the relevant tables for any query, we can combine this with our ```create_sql_query_chain```, which can accept a list of ```table_names_to_use``` to determine which table schemas are included in the prompt:

In [26]:
query = query_chain.invoke(
    {"question": "What are all the genres of Alanis Morisette songs"}
)
print(query)

SELECT DISTINCT "Genre"."Name" 
FROM "Track" 
JOIN "Genre" ON "Track"."GenreId" = "Genre"."GenreId" 
WHERE "Track"."Composer" LIKE '%Morisette%' 
LIMIT 5


In [15]:
from operator import itemgetter

from langchain.chains import create_sql_query_chain
from langchain_core.runnables import RunnablePassthrough

table_chain = {"input": itemgetter("question")} | table_chain
# Set table_names_to_use using table_chain.
full_chain = RunnablePassthrough.assign(table_names_to_use=table_chain) | query_chain
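Conceptually, the assign step keeps the incoming dictionary and adds one computed key before the next step runs. The plain-Python sketch below mimics that data flow; it is a simplification rather than the LangChain implementation, and `assign`, `pick_tables`, and `write_query` are hypothetical stand-ins for the real chains.

```python
def assign(**steps):
    """Return a function that copies its input dict and adds computed keys."""

    def run(inputs):
        outputs = dict(inputs)
        for key, step in steps.items():
            # Each step sees the original input, not the other new keys.
            outputs[key] = step(inputs)
        return outputs

    return run


def pick_tables(inputs):
    """Hypothetical stand-in for table_chain: choose tables by keyword."""
    if "album" in inputs["question"].lower():
        return ["Artist", "Album"]
    return ["Employee"]


def write_query(inputs):
    """Hypothetical stand-in for query_chain: show which schemas it would see."""
    return "-- schemas available: " + ", ".join(inputs["table_names_to_use"])


with_tables = assign(table_names_to_use=pick_tables)
state = with_tables({"question": "How many albums are there?"})
print(state)
print(write_query(state))
```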

In [22]:
query = full_chain.invoke(
    {"question": "What are all the genres of Alanis Morisette songs"}
)
print(query)

```sql
SELECT DISTINCT "Genre"."Name"
FROM "Track"
JOIN "Genre" ON "Track"."GenreId" = "Genre"."GenreId"
WHERE "Track"."Composer" LIKE '%Morisette%'
LIMIT 5;
```
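Note that this output is wrapped in a Markdown code fence even though the system prompt asks the model not to do that, and a fenced string is not valid SQL. One possible safeguard, shown as a sketch and not part of the original chain, is to strip such a fence before execution. `strip_sql_fences` is a hypothetical helper.

```python
import re

# Three backticks, built this way so the example itself stays easy to read.
FENCE = "`" * 3

# Matches an optional language tag after the opening fence and captures
# everything up to the closing fence.
_FENCED_BLOCK = re.compile(
    rf"^\s*{FENCE}[a-zA-Z]*[ \t]*\n(.*?)\n?[ \t]*{FENCE}\s*$",
    re.DOTALL,
)


def strip_sql_fences(text: str) -> str:
    """Return the query without a surrounding Markdown fence, if one exists."""
    match = _FENCED_BLOCK.match(text)
    cleaned = match.group(1) if match else text
    return cleaned.strip()


fenced = FENCE + "sql\nSELECT 1;\n" + FENCE
print(strip_sql_fences(fenced))          # SELECT 1;
print(strip_sql_fences("  SELECT 1;  "))  # SELECT 1;
```

Assuming the helper is acceptable for your use case, it could be placed between the query-writing step and the execution step, for example as a plain function piped after the query chain.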


In [27]:
db.run("""SELECT DISTINCT "Genre"."Name" 
FROM "Track" 
JOIN "Genre" ON "Track"."GenreId" = "Genre"."GenreId" 
WHERE "Track"."Composer" LIKE '%Morisette%' 
LIMIT 5""")

''
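The empty result is most likely a spelling issue rather than a chain issue: the question (and therefore the generated filter) says "Morisette", while the `Artist` rows printed earlier store the name as "Alanis Morissette". The sketch below reproduces the effect on a small in-memory table; the three rows are copied from that earlier output, and `names_like` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Artist (ArtistId, Name) VALUES (?, ?)",
    [(3, "Aerosmith"), (4, "Alanis Morissette"), (5, "Alice In Chains")],
)


def names_like(pattern):
    """Return artist names matching a LIKE pattern."""
    rows = conn.execute(
        'SELECT "Name" FROM "Artist" WHERE "Name" LIKE ?', (pattern,)
    )
    return [name for (name,) in rows]


print(names_like("%Morisette%"))   # [] because the pattern is misspelled
print(names_like("%Morissette%"))  # ['Alanis Morissette']
print(names_like("%Alanis%"))      # ['Alanis Morissette']
```

A looser pattern such as the first name is more forgiving of typos, at the cost of possibly matching more rows than intended.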

In [23]:
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool

execute_query = QuerySQLDataBaseTool(db=db)
execute_query_chain = full_chain | execute_query

In [24]:
query_execution_result = execute_query_chain.invoke(
    {
        "question": "What are all the genres of Alanis Morisette songs"
    }
)
print(query_execution_result)




# Answer the question in Natural Language

Now that we've got a way to automatically generate and execute queries, we just need to combine the original question and SQL query result to generate a final answer. We can do this by passing question and result to the LLM once more:

In [20]:
from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

answer_prompt = PromptTemplate.from_template(
    """Given the following user question, corresponding SQL query, and SQL result, answer the user question. 

Question: {question}
SQL Query: {query}
SQL Result: {result}
Answer: """
)

sql_qa_chain = (
    RunnablePassthrough.assign(query=full_chain).assign(
        result=itemgetter("query") | execute_query
    )
    | answer_prompt
    | llm
    | StrOutputParser()
)


In [None]:
sql_qa_chain.invoke({"question": "How many employees are there?"})
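To see what the final LLM call receives, the sketch below walks the same three stages (write query, run query, fill the answer template) without any model. Everything here is a stand-in: the in-memory `Employee` table has made-up rows, and `fake_write_query`, `run_query`, and `build_answer_prompt` are hypothetical functions replacing the real chains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, LastName TEXT)")
conn.executemany(
    "INSERT INTO Employee (LastName) VALUES (?)",
    [("Adams",), ("Edwards",), ("Peacock",)],  # illustrative rows only
)

ANSWER_TEMPLATE = """Question: {question}
SQL Query: {query}
SQL Result: {result}
Answer: """


def fake_write_query(question):
    """Stand-in for full_chain: always returns the same counting query."""
    return 'SELECT COUNT(*) FROM "Employee"'


def run_query(query):
    """Stand-in for the query tool: execute and return the rows as text."""
    return str(conn.execute(query).fetchall())


def build_answer_prompt(question):
    """Collect question, query, and result, then fill the answer template."""
    query = fake_write_query(question)
    result = run_query(query)
    return ANSWER_TEMPLATE.format(question=question, query=query, result=result)


prompt_text = build_answer_prompt("How many employees are there?")
print(prompt_text)
```

In the real chain, the text printed here is what the model turns into a natural-language answer.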