## Hierarchical Agents with LlamaIndex

The goal of this notebook is to demonstrate hierarchical agents in LlamaIndex and compare them with a flat agent. We will set up a few tools, then compare how an agent performs when its retrieval resources are exposed as sub-agents versus as plain tools.

### Setup OpenAI Agent
To start, we import the OpenAI agent used across the examples and authenticate with our key:

In [1]:
# Set up OpenAI
import openai
from llama_index.agent import OpenAIAgent

openai.api_key = "sk-your-key"

### Setup Database Tool

The first data source we will set up as a tool is the database. The tool spec connects to a SQL database, provides information about its tables, and runs SQL queries against it. One of its tools, `describe_tables`, lists all of the tables and their columns, as seen below:

In [2]:
# Import and initialize our tool spec
from llama_hub.tools.database.base import DatabaseToolSpec
from sqlalchemy import create_engine

db_tool_spec = DatabaseToolSpec(engine=create_engine("sqlite:///countries.db"))
print(db_tool_spec.describe_tables())
print(db_tool_spec.load_data("SELECT * from airports LIMIT 1")[0].text)
database_tools = db_tool_spec.to_tool_list()


CREATE TABLE airports (
	"Orig" TEXT, 
	"Name" TEXT, 
	"AirportCountry" TEXT, 
	"Airport1Latitude" TEXT, 
	"Airport1Longitude" TEXT, 
	"TotalTravelers" INTEGER
)




CREATE TABLE gdp (
	"CountryName" TEXT, 
	"MillionsGDP" INTEGER
)




CREATE TABLE population (
	"Country" TEXT, 
	"ThousandsPopulation" INTEGER
)



HEA, Herat, Afghanistan, 34.20690155, 62.22669983, 22041


For the purposes of this demo, we are using a database of airports, GDP and population from the [World Bank](https://datacatalog.worldbank.org/home). The airports table contains international airports, with the country each is located in and the number of travelers that passed through in 2019. The population and GDP tables contain the population and GDP of each country.
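The `countries.db` file itself is not included here. As a rough stand-in for experimenting, the sketch below builds an in-memory SQLite database with the same three tables using only the standard library. The handful of rows are copied from outputs printed elsewhere in this notebook, and the helper names `build_sample_db` and `list_tables` are hypothetical, not part of LlamaHub.

```python
import sqlite3

SCHEMA = """
CREATE TABLE airports (
    "Orig" TEXT,
    "Name" TEXT,
    "AirportCountry" TEXT,
    "Airport1Latitude" TEXT,
    "Airport1Longitude" TEXT,
    "TotalTravelers" INTEGER
);
CREATE TABLE gdp ("CountryName" TEXT, "MillionsGDP" INTEGER);
CREATE TABLE population ("Country" TEXT, "ThousandsPopulation" INTEGER);
"""


def build_sample_db() -> sqlite3.Connection:
    """Hypothetical helper: an in-memory stand-in for countries.db."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Sample rows taken from outputs printed in this notebook.
    conn.execute(
        "INSERT INTO airports VALUES (?, ?, ?, ?, ?, ?)",
        ("HEA", "Herat", "Afghanistan", "34.20690155", "62.22669983", 22041),
    )
    conn.executemany(
        "INSERT INTO gdp VALUES (?, ?)",
        [("United States", 25462700), ("China", 17963171), ("Switzerland", 807706)],
    )
    conn.executemany(
        "INSERT INTO population VALUES (?, ?)",
        [("India", 1417173), ("China", 1412175), ("Switzerland", 8770)],
    )
    conn.commit()
    return conn


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the table names recorded in SQLite's catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


sample_conn = build_sample_db()
print(list_tables(sample_conn))
print(sample_conn.execute("SELECT * FROM airports LIMIT 1").fetchone())
```

Because the sample holds only a few rows, query results against it illustrate the schema and query shapes rather than reproduce the notebook's answers.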

In addition to the airports database, we will be using the Wikipedia tool.

### Wikipedia Query Engine Tool

The Wikipedia tool allows our agent to search Wikipedia and pull entire articles based on the search. This is very useful for general knowledge retrieval; however, most Wikipedia articles are close to or beyond the size of a context window, and if our agent retrieves multiple articles we quickly run out of tokens.

Therefore, we combine two tools to give our agent access to Wikipedia. We take the Wikipedia `search_data` tool and wrap it with `LoadAndSearchToolSpec`. This results in two tools: the first loads the Wikipedia pages into a vector store, and the second acts as a query engine to retrieve data from the vector store. We will set this up and test it out below.

In [3]:
# Import and initialize our tool spec
from llama_hub.tools.wikipedia.base import WikipediaToolSpec
from llama_index.tools.tool_spec.load_and_search.base import LoadAndSearchToolSpec

wiki_spec = WikipediaToolSpec()
# Get the search wikipedia tool
search_wikipedia = wiki_spec.to_tool_list()[1]
# Wrap the search tool so results are loaded into a vector store and then queried
wikipedia_tool_spec = LoadAndSearchToolSpec.from_defaults(search_wikipedia)

print(wikipedia_tool_spec.load("NBA 2023 playoffs"))
print(wikipedia_tool_spec.read("Who won the NBA 2023 playoffs"))
wikipedia_tools = wikipedia_tool_spec.to_tool_list()

Content loaded! You can now search the information using read_search_data
The Denver Nuggets won the NBA 2023 playoffs.
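
To make the load-then-read split concrete, here is a toy sketch in plain Python. It is not how `LoadAndSearchToolSpec` is implemented: the class `ToyLoadAndSearch`, the `fake_search` loader, and the word-overlap scoring (standing in for a vector store) are all assumptions made for illustration.

```python
from typing import Callable


class ToyLoadAndSearch:
    """Hypothetical stand-in: cache documents on load, query them on read."""

    def __init__(self, loader: Callable[[str], list[str]]) -> None:
        self._loader = loader
        self._store: list[str] = []

    def load(self, query: str) -> str:
        # Fetch documents once and keep them out of the agent's context.
        self._store.extend(self._loader(query))
        return "Content loaded! You can now search the information using read"

    def read(self, question: str) -> str:
        # Return only the best-matching passage, not the whole article.
        if not self._store:
            return "Nothing loaded yet."
        words = set(question.lower().split())
        return max(
            self._store,
            key=lambda doc: len(words & set(doc.lower().split())),
        )


def fake_search(query: str) -> list[str]:
    """Hypothetical loader returning short passages instead of real articles."""
    return [
        "The Denver Nuggets won the NBA 2023 playoffs.",
        "This passage is unrelated filler text.",
    ]


toy_tool = ToyLoadAndSearch(fake_search)
print(toy_tool.load("NBA 2023 playoffs"))
print(toy_tool.read("Who won the NBA 2023 playoffs"))
```

The point of the split is that the agent only ever sees the short status message from `load` and the selected passage from `read`, never the full article text.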


### Flat Agent

We will now create two agents, one that uses the database tool directly and another that has a dedicated sub-agent for using the database tool, and compare their effectiveness.

The "flat" agent is relatively simple to set up: we just pass the tools to our `OpenAIAgent` abstraction.

In [4]:
flat_agent = OpenAIAgent.from_tools(
    [*wikipedia_tools, *database_tools],
    system_prompt="""
        You are a specialized agent with access to wikipedia and a database containing information on
        populations, gdp and the airports of countries. 
        When asked questions you should always attempt to use the database to gather information, and
        wikipedia as a backup when the database does not contain the desired information.
        Break any questions down into steps so that you can answer using a mix of the database and wikipedia.
    """,
    verbose=True,
)

### Creating the Hierarchical Agent

Creating the hierarchical agent is somewhat more involved. One advantage of hierarchical agents is that we can provide more specific prompts, both to the sub-agents and in the descriptions of the agents when they are exposed as tools. For each sub-agent, we create an agent with the tools it should have available and give it a specific system prompt describing how to use them. Then we wrap the sub-agent with the `QueryEngineTool` abstraction so that it can be passed as a tool to the higher-level agent.

We repeat this process to create database and Wikipedia agent tools, which are passed to the hierarchical agent:

In [5]:
from llama_index.tools.query_engine import QueryEngineTool

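# Recreate the wikipedia tool spec so that our agents have separate vector stores.
wiki_spec = WikipediaToolSpec()
search_wikipedia = wiki_spec.to_tool_list()[1]
wikipedia_tool_spec = LoadAndSearchToolSpec.from_defaults(search_wikipedia)
# Rebind the tool list so the Wikipedia agent below uses the new spec;
# flat_agent keeps the tools it was given when it was created.
wikipedia_tools = wikipedia_tool_spec.to_tool_list()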


# Create the Database Agent
database_agent = OpenAIAgent.from_tools(
    [*database_tools],
    system_prompt=f"""
        You are a specialized agent designed to help users query a SQL database.
        
        Here is a complete description of the SQL database you have access to:
        
        {db_tool_spec.describe_tables()}
    """,
    verbose=True,
)

# Convert the Database Agent to a Tool
database_agent_tool = QueryEngineTool.from_defaults(
    database_agent,
    name="database_agent",
    description="""
        This agent can answer questions about airport traffic, country GDP and population using a database
    """,
)

# Create the Wikipedia Agent
wikipedia_agent = OpenAIAgent.from_tools(
    [*wikipedia_tools],
    system_prompt="""
        You are a specialized agent designed to search for wikipedia pages and then 
        query the retrieved articles to answer questions passed to you by the user.
        
        This agent should be passed a specific question to be answered using the wikipedia knowledge sources
    """,
    verbose=True,
)

# Convert the Wikipedia Agent to a Tool
wikipedia_agent_tool = QueryEngineTool.from_defaults(
    wikipedia_agent,
    name="wikipedia_agent",
    description="""
        This agent can search wikipedia and parse articles to answer any questions on your behalf
    """,
)

# Create the hierarchical agent from the Wikipedia and database agent tools
hierarchical_agent = OpenAIAgent.from_tools(
    [wikipedia_agent_tool, database_agent_tool],
    system_prompt="""
        You are a specialized agent with access to wikipedia and a database containing information on
        populations, gdp and the airports of countries. 
        When asked questions you should always attempt to use the database to gather information, and
        wikipedia as a backup when the database does not contain the desired information.
        Break any questions down into steps so that you can answer using a mix of the database and wikipedia.
    """,
    verbose=True,
)
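
The wrapping step can be pictured without any LLM at all. The sketch below is a toy model of the pattern, not LlamaIndex code: `ToyTool`, `ToyAgent`, and the keyword routing are hypothetical stand-ins for the tool abstraction, `OpenAIAgent`, and the model's function-calling decision.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyTool:
    name: str
    description: str
    fn: Callable[[str], str]


class ToyAgent:
    """Hypothetical agent: picks the tool whose description best matches."""

    def __init__(self, tools: list[ToyTool], system_prompt: str = "") -> None:
        self.tools = tools
        self.system_prompt = system_prompt

    def chat(self, message: str) -> str:
        words = set(message.lower().split())
        # Keyword overlap stands in for the LLM's choice of tool.
        best = max(
            self.tools,
            key=lambda t: len(words & set(t.description.lower().split())),
        )
        return best.fn(message)

    def as_tool(self, name: str, description: str) -> ToyTool:
        # The parent sees only a name and a description; the sub-agent's
        # own prompt and tools stay hidden behind chat().
        return ToyTool(name, description, self.chat)


sql_tool = ToyTool("load_data", "run sql query", lambda q: f"[sql] {q}")
wiki_tool = ToyTool("search", "search wikipedia articles", lambda q: f"[wiki] {q}")

toy_database_agent = ToyAgent([sql_tool], system_prompt="Schema: gdp, population, airports")
toy_wikipedia_agent = ToyAgent([wiki_tool])

toy_parent = ToyAgent(
    [
        toy_database_agent.as_tool(
            "database_agent", "questions about gdp population airports"
        ),
        toy_wikipedia_agent.as_tool(
            "wikipedia_agent", "general knowledge questions about leaders history"
        ),
    ]
)
print(toy_parent.chat("which country has the highest gdp"))
```

The design choice this highlights is that each sub-agent carries its own system prompt (here, the schema), while the parent only has to choose between two short descriptions.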

### Comparing the Agents

One of the first issues the flat agent runs into is that it has no knowledge of the SQL schema. An advantage of the hierarchical agent is that we were able to supply the schema in the system prompt of `database_agent`. As a result, the flat agent struggles to use the database until it calls the `describe_tables` tool to see the schema:

In [6]:
print(hierarchical_agent.chat("what country has the third highest gdp"))

=== Calling Function ===
Calling function: database_agent with args: {
  "input": "country with third highest gdp"
}
=== Calling Function ===
Calling function: load_data with args: {
  "query": "SELECT CountryName, MillionsGDP FROM gdp ORDER BY MillionsGDP DESC LIMIT 3"
}
Got output: [Document(id_='dfd1ced4-9c6c-478d-9999-fb7c95dd23eb', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='d85186287ae3c721bb63e6bacf6442aa9f3355d9a68ddb936ebcf1a6f50657d0', text='United States, 25462700', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), Document(id_='0b4b3b36-b9b1-4d83-9f9f-9000d95c95d5', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='52ab443d5c85d54558d1e749735171a48c35ef431492df0867fe8ab0dc0f4214', text='China, 17963171', start_char_idx=None, end_char_idx=Non

In [7]:
print(flat_agent.chat("what country has the third highest gdp"))

=== Calling Function ===
Calling function: load_data with args: {
  "query": "SELECT country_name, gdp FROM countries ORDER BY gdp DESC LIMIT 3"
}


OperationalError: (sqlite3.OperationalError) no such table: countries
[SQL: SELECT country_name, gdp FROM countries ORDER BY gdp DESC LIMIT 3]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

### Hierarchical Agent Advantage: Additional specific context

The flat agent was unable to form the SQL query properly because it had not yet retrieved the table schema. In comparison, we provided the database agent with the database schema as part of its system prompt, since its main focus is interacting with the database. We can ask the flat agent to load the table schema and then ask the question again to get the correct answer:

In [8]:
print(flat_agent.chat("describe all of the available tables"))
print(flat_agent.chat("what country has the third highest gdp"))

=== Calling Function ===
Calling function: describe_tables with args: {}
Got output: 
CREATE TABLE airports (
	"Orig" TEXT, 
	"Name" TEXT, 
	"AirportCountry" TEXT, 
	"Airport1Latitude" TEXT, 
	"Airport1Longitude" TEXT, 
	"TotalTravelers" INTEGER
)




CREATE TABLE gdp (
	"CountryName" TEXT, 
	"MillionsGDP" INTEGER
)




CREATE TABLE population (
	"Country" TEXT, 
	"ThousandsPopulation" INTEGER
)



There are three available tables in the database:

1. `airports` table:
   - Columns: "Orig" (origin), "Name", "AirportCountry", "Airport1Latitude", "Airport1Longitude", "TotalTravelers"
   - This table contains information about airports, including their names, countries, coordinates, and total number of travelers.

2. `gdp` table:
   - Columns: "CountryName", "MillionsGDP"
   - This table contains information about the GDP (Gross Domestic Product) of countries, including the country name and the GDP in millions.

3. `population` table:
   - Columns: "Country", "ThousandsPopulation"
   - Th

The flat agent was able to form the correct query after being explicitly asked by the user to describe the tables; however, it still made an extra query to the Wikipedia tool when the data could have been retrieved from the database.
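
The database agent needed no `describe_tables` call because its system prompt already contained the schema. A minimal sketch of that idea using only `sqlite3` follows. The function name `schema_prompt` and the two-table sample database are assumptions for illustration; in the notebook, `db_tool_spec.describe_tables()` plays this role.

```python
import sqlite3


def schema_prompt(conn: sqlite3.Connection) -> str:
    """Hypothetical helper: embed every CREATE TABLE statement in a prompt."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    schema = "\n\n".join(sql for (sql,) in rows)
    return (
        "You are a specialized agent designed to help users query a SQL database.\n\n"
        "Here is a complete description of the SQL database you have access to:\n\n"
        f"{schema}"
    )


# Assumed sample database with two of the notebook's tables.
prompt_conn = sqlite3.connect(":memory:")
prompt_conn.executescript(
    """
    CREATE TABLE gdp ("CountryName" TEXT, "MillionsGDP" INTEGER);
    CREATE TABLE population ("Country" TEXT, "ThousandsPopulation" INTEGER);
    """
)
print(schema_prompt(prompt_conn))
```

With the column names in the prompt from the first turn, the model has no reason to guess at names such as `countries` or `country_name`.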

### Comparing ability to write Join queries

We ask the agents to execute a SQL query that requires a join:

In [9]:
print(
    hierarchical_agent.chat(
        "What country with under 10 million people has the highest gdp"
    )
)

=== Calling Function ===
Calling function: database_agent with args: {
  "input": "country with highest gdp and population under 10 million"
}
=== Calling Function ===
Calling function: load_data with args: {
  "query": "SELECT gdp.CountryName, gdp.MillionsGDP, population.ThousandsPopulation FROM gdp JOIN population ON gdp.CountryName = population.Country WHERE population.ThousandsPopulation < 10000 ORDER BY gdp.MillionsGDP DESC LIMIT 1"
}
Got output: [Document(id_='726b0010-5f0b-4843-b38a-55e51d32fdd7', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='d176dff8313dab084435594a2f2c4071819c5ccf5b6abb2d03d4889d0ae4ef8e', text='Switzerland, 807706, 8770', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')]
Got output: The country with the highest GDP and a population under 10 million is Switzerland, with a GDP of 807,706 million 

In [10]:
print(flat_agent.chat("What country with under 10 million people has the highest gdp"))

=== Calling Function ===
Calling function: load_data with args: {
  "query": "SELECT CountryName, MillionsGDP FROM gdp WHERE MillionsGDP IS NOT NULL AND ThousandsPopulation < 10000 ORDER BY MillionsGDP DESC LIMIT 1"
}


OperationalError: (sqlite3.OperationalError) no such column: ThousandsPopulation
[SQL: SELECT CountryName, MillionsGDP FROM gdp WHERE MillionsGDP IS NOT NULL AND ThousandsPopulation < 10000 ORDER BY MillionsGDP DESC LIMIT 1]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

### Hierarchical Agent Advantage: Better ability to craft queries

The flat agent was unable to come up with the correct join query to combine the population and gdp tables, whereas the hierarchical agent executed the query correctly and obtained the right answer. One reason the hierarchical agent performed better here may be that the schema was closer to the question in the context window.
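
For reference, the two queries can be replayed against a small in-memory database. This is a sketch under assumptions: the sample rows are limited to values that appear in outputs in this notebook, so the result only illustrates the query shapes, not the full dataset.

```python
import sqlite3

join_conn = sqlite3.connect(":memory:")
join_conn.executescript(
    """
    CREATE TABLE gdp ("CountryName" TEXT, "MillionsGDP" INTEGER);
    CREATE TABLE population ("Country" TEXT, "ThousandsPopulation" INTEGER);
    """
)
# Sample rows copied from outputs shown in this notebook.
join_conn.executemany(
    "INSERT INTO gdp VALUES (?, ?)",
    [("United States", 25462700), ("China", 17963171), ("Switzerland", 807706)],
)
join_conn.executemany(
    "INSERT INTO population VALUES (?, ?)",
    [("India", 1417173), ("China", 1412175), ("Switzerland", 8770)],
)

# The hierarchical agent's query: join the tables, then filter and rank.
join_query = """
SELECT gdp.CountryName, gdp.MillionsGDP, population.ThousandsPopulation
FROM gdp
JOIN population ON gdp.CountryName = population.Country
WHERE population.ThousandsPopulation < 10000
ORDER BY gdp.MillionsGDP DESC
LIMIT 1
"""
join_result = join_conn.execute(join_query).fetchone()
print(join_result)

# The flat agent's query: filters on a column that the gdp table lacks.
flat_query = (
    "SELECT CountryName, MillionsGDP FROM gdp "
    "WHERE MillionsGDP IS NOT NULL AND ThousandsPopulation < 10000 "
    "ORDER BY MillionsGDP DESC LIMIT 1"
)
try:
    join_conn.execute(flat_query)
    flat_error = None
except sqlite3.OperationalError as exc:
    flat_error = str(exc)
print(flat_error)
```

The failing query treats `ThousandsPopulation` as if it were a column of `gdp`; the join is what makes that column available to the `WHERE` clause.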

In [11]:
print(
    hierarchical_agent.chat(
        "Who is the current leader of the country with the third highest population"
    )
)

=== Calling Function ===
Calling function: database_agent with args: {
  "input": "country with third highest population"
}
=== Calling Function ===
Calling function: load_data with args: {
  "query": "SELECT Country, ThousandsPopulation FROM population ORDER BY ThousandsPopulation DESC LIMIT 3"
}
Got output: [Document(id_='b3e294be-0cea-446e-b161-06d760c93f06', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='3eb098f7aa2d847b30fa9837d8798f637148c3c6a120ae12612114412660bfce', text='India, 1417173', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), Document(id_='e43f63ed-2a10-44af-9983-a37fd42c437b', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='72c2fca9e61ac0cbbfb69f533b765fc241006300363ad524bc140aa9f903b807', text='China, 1412175', start_char_idx=None, 

In [12]:
print(
    flat_agent.chat(
        "Who is the current leader of the country with the third highest population"
    )
)

=== Calling Function ===
Calling function: load_data with args: {
  "query": "SELECT Country FROM population ORDER BY ThousandsPopulation DESC LIMIT 3"
}
Got output: [Document(id_='af84a01d-816f-45ed-bc14-b8ba61b82455', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='abbc17675c6627a122d3c0160f6cce46a9f841a6469eb47ab5d87a6499fd86b7', text='India', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), Document(id_='6a6948bb-c64d-4d07-906e-ba3b04291302', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='99d48fd32be248a61bf8694eee10f141a2ce750e9d5daad17ac71946b71a7a26', text='China', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), Document(id_='23f035b3-4663-4e67-b68d

### Hierarchical Agent Advantage: Ability to interpret results

The hierarchical agent correctly took the last row of the query result as the third highest population, whereas the flat agent took the first row, which is actually the highest population. Both appear to have used the Wikipedia tool equally well.
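
One way to remove this ambiguity, which the notebook does not use, is to have the query return only the requested rank with `LIMIT 1 OFFSET 2`, so that there is a single row to interpret. The sketch below assumes a three-row sample built from population values seen in this notebook's outputs; the ranking is therefore only valid within the sample.

```python
import sqlite3

rank_conn = sqlite3.connect(":memory:")
rank_conn.execute(
    'CREATE TABLE population ("Country" TEXT, "ThousandsPopulation" INTEGER)'
)
# Sample rows copied from outputs shown in this notebook.
rank_conn.executemany(
    "INSERT INTO population VALUES (?, ?)",
    [("Switzerland", 8770), ("India", 1417173), ("China", 1412175)],
)

# Approach used by both agents: fetch the top three, then pick a row.
top_three = rank_conn.execute(
    "SELECT Country, ThousandsPopulation FROM population "
    "ORDER BY ThousandsPopulation DESC LIMIT 3"
).fetchall()
third_from_list = top_three[-1]  # the last row is the third highest

# Alternative: skip two rows and return exactly one.
third_direct = rank_conn.execute(
    "SELECT Country, ThousandsPopulation FROM population "
    "ORDER BY ThousandsPopulation DESC LIMIT 1 OFFSET 2"
).fetchone()

print(third_from_list)
print(third_direct)
```

Both approaches agree when the result list is read correctly; the `OFFSET` form simply leaves nothing for the model to misread.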

In [13]:
print(hierarchical_agent.chat("when was the fifth busiest airport in Spain built?"))

=== Calling Function ===
Calling function: database_agent with args: {
  "input": "fifth busiest airport in Spain"
}
=== Calling Function ===
Calling function: load_data with args: {
  "query": "SELECT Name, TotalTravelers FROM airports WHERE AirportCountry = 'Spain' ORDER BY TotalTravelers DESC LIMIT 5"
}
Got output: [Document(id_='a1d7c549-4fa0-4ea8-8e6f-e536bdf692ee', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='d3633a93029fc54b48f25e444778f0dbe985b57f76f36e8e79e6e734b14aee90', text='Adolfo Suarez-Barajas, 26811681', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), Document(id_='7a638f65-7dc9-432b-80ea-6db76e2c9d46', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='cec3baa9dbbb70249033fb6c6995520286d54b3144e50e33db45f3f2bf4b87bb', text='Barcelona, 2

In [14]:
print(flat_agent.chat("when was the fifth busiest airport in Spain built?"))

=== Calling Function ===
Calling function: load_data with args: {
  "query": "SELECT Name FROM airports WHERE AirportCountry = 'Spain' ORDER BY TotalTravelers DESC LIMIT 5"
}
Got output: [Document(id_='1c3bc3aa-9fc1-42b6-b97a-18e7504b497e', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='6e5f8e00b3cfc65c4147a4a548ed00dba6ee4756bb29604b58ed5fbb707d3fe1', text='Adolfo Suarez-Barajas', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), Document(id_='5e7c5f25-e2eb-47be-bd82-6fe675df1b70', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='b7ab1ee21e956cea7f8214063fc92c17c8be117642ba67a19f7d1fe4d7736080', text='Barcelona', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n

### No agent advantage

Neither agent answered this question properly. The flat agent again took the first result instead of the fifth. The hierarchical agent selected the correct airport, but it was not able to use the Wikipedia tool effectively to determine when the airport was built, instead providing the date an additional terminal was constructed. With more prompt tuning, we could possibly get the hierarchical agent to use the Wikipedia sub-agent more effectively.