# Lesson 1: Router Engine

Welcome to Lesson 1.

To access the `requirements.txt` file, the data/pdf file required for this lesson, and the `helper` and `utils` modules, go to the `File` menu and select `Open...`.

I hope you enjoy this course!

## Setup

First, we import our OpenAI key with a helper function. Next, we import `nest_asyncio`: Jupyter already runs an event loop behind the scenes, and many of the modules we use rely on async, so we call `nest_asyncio.apply()` to make async play nicely with Jupyter notebooks.

In [1]:
from helper import get_openai_api_key

OPENAI_API_KEY = get_openai_api_key()

In [2]:
import nest_asyncio

nest_asyncio.apply()
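
To see the problem `nest_asyncio` addresses, the following standard-library sketch reproduces the error raised when code tries to start an event loop while one is already running, which is the situation inside Jupyter. The coroutine names `inner` and `outer` are illustrative only and are not part of the lesson code.

```python
import asyncio


async def inner():
    # Any async work; the return value is irrelevant to the demonstration.
    return 42


async def outer():
    # We are now inside a running event loop, as code in a Jupyter cell would be.
    coro = inner()
    try:
        # Starting a second loop from inside a running one is rejected by asyncio.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


message = asyncio.run(outer())
print(message)
```

With `nest_asyncio.apply()` in effect, this kind of nested call is permitted instead of raising.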

## Load Data

The next step is to load a sample document, `metagpt.pdf` (the MetaGPT research paper). To download the paper, uncomment and run the command below:

#!wget "https://openreview.net/pdf?id=VtmBAGCN7o" -O metagpt.pdf

**Note**: The PDF file is included with this lesson. To access it, go to the `File` menu and select `Open...`.

In [3]:
from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(input_files=["metagpt.pdf"]).load_data()

## Define LLM and Embedding model

We have used the `SimpleDirectoryReader` module from LlamaIndex to read the PDF into a parsed document representation. Next, we import the sentence splitter, which splits these documents into even-sized chunks along sentence boundaries. We set `chunk_size=1024` and call `splitter.get_nodes_from_documents()` to split the documents into nodes.

The next step is optional: it allows us to define an LLM and embedding model. We do this through a global config setting, where we specify the LLM and embedding model that we want to inject as part of the global config. By default, we use `gpt-3.5-turbo` and `text-embedding-ada-002` in this course.

We import the `Settings` object and set `Settings.llm = OpenAI()` and `Settings.embed_model = OpenAIEmbedding()`.

In [4]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
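
As a rough illustration of what "splitting on the order of sentences" means, here is a simplified pure-Python sketch. It is not LlamaIndex's implementation: the function `chunk_by_sentences` is hypothetical, it measures size in characters rather than tokens, and it uses a naive regular expression to find sentence boundaries.

```python
import re


def chunk_by_sentences(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    # Naive sentence boundary: whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # The next sentence does not fit, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "MetaGPT assigns roles. Agents follow SOPs. They share a message pool."
chunks = chunk_by_sentences(sample, max_chars=45)
for chunk in chunks:
    print(repr(chunk))
```

The point of the sketch is that a chunk boundary never falls in the middle of a sentence, so each node stays readable on its own.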

In [5]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

## Define Summary Index and Vector Index over the Same Data

Now we will start building some indexes. We define two indexes over these nodes: a summary index and a vector index. We can think of an index as a set of metadata over our data. We can query an index, and different indexes have different retrieval behaviors.

A vector index indexes nodes via text embeddings. It is a core abstraction in LlamaIndex, and a core abstraction for building any sort of RAG system. Querying a vector index returns the most similar nodes by embedding similarity.

A summary index is also a very simple index, but querying it returns all the nodes currently in the index, regardless of the user query.

To set up both the summary and vector index, we will just import these two simple modules.

In [6]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)
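
The difference in retrieval behavior can be sketched without LlamaIndex. In the toy example below, the three "nodes", their three-dimensional "embeddings", and the functions `summary_retrieve` and `vector_retrieve` are all made up for illustration; real embeddings come from the embedding model and have many more dimensions.

```python
import math

# Hypothetical nodes with hand-written toy embeddings.
toy_nodes = {
    "roles": [1.0, 0.0, 0.0],
    "message pool": [0.0, 1.0, 0.0],
    "benchmarks": [0.0, 0.0, 1.0],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def summary_retrieve(query_vec: list[float]) -> list[str]:
    # Summary-style retrieval ignores the query and returns every node.
    return list(toy_nodes)


def vector_retrieve(query_vec: list[float], top_k: int = 2) -> list[str]:
    # Vector-style retrieval ranks nodes by similarity and keeps the top_k.
    ranked = sorted(
        toy_nodes,
        key=lambda name: cosine(query_vec, toy_nodes[name]),
        reverse=True,
    )
    return ranked[:top_k]


query = [0.1, 0.9, 0.2]  # toy query vector, closest to "message pool"
print(summary_retrieve(query))
print(vector_retrieve(query))
```

The summary-style call returns all three nodes, while the vector-style call returns only the two nodes closest to the query.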

## Define Query Engines and Set Metadata

Now let's turn these indexes into query engines and then query tools. Each query engine represents an overall query interface over the data stored in its index, and combines retrieval with LLM synthesis. Each query engine is good for a certain type of question, which makes this a great use case for a router that can route dynamically between the different query engines.

Here we define `summary_query_engine = summary_index.as_query_engine()` and `vector_query_engine = vector_index.as_query_engine()`, so each query engine is derived from one of the indexes. For the summary query engine, we set `use_async=True` to make querying faster by leveraging async capabilities.

Next, a query tool is just a query engine with metadata, specifically a description of what types of questions the tool can answer. We define a query tool for both the summary and vector query engines. In the code below, the summary tool is described as useful for summarization questions related to MetaGPT, and the vector tool as useful for retrieving specific context from the MetaGPT paper.

In [7]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()

In [8]:
from llama_index.core.tools import QueryEngineTool

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to MetaGPT"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the MetaGPT paper."
    ),
)

## Define Router Query Engine

Now that we have our query engines and tools, we are ready to define our router. LlamaIndex provides several different types of selectors to enable us to build a router, and each of these selectors has distinct attributes. 

The LLM selector is one option. It prompts an LLM to output JSON, which is then parsed, and the corresponding indexes are queried.

Another option is to use the Pydantic selectors. Instead of directly prompting the LLM with text, these use the function-calling APIs supported by models such as OpenAI's to produce Pydantic selection objects, rather than parsing raw JSON.

Each of these selector types can also dynamically select either a single index to route to or multiple indexes.
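
To make the "output JSON that is then parsed" step concrete, here is a minimal sketch of parsing and validating a selection. The JSON shape (keys `choice` and `reason`, with a 1-based choice number), the `Selection` dataclass, and `parse_selection` are assumptions made for illustration and are not LlamaIndex's actual schema or API.

```python
import json
from dataclasses import dataclass


@dataclass
class Selection:
    index: int   # zero-based position of the chosen tool
    reason: str  # the model's stated justification


def parse_selection(raw: str, num_choices: int) -> Selection:
    """Parse an (assumed) JSON selection and convert it to a zero-based index."""
    payload = json.loads(raw)
    choice = int(payload["choice"])  # assumed to be 1-based in this sketch
    if not 1 <= choice <= num_choices:
        raise ValueError(f"choice {choice} is outside 1..{num_choices}")
    return Selection(index=choice - 1, reason=str(payload["reason"]))


# Hypothetical raw model output for a router with two tools.
raw_output = '{"choice": 2, "reason": "The question needs specific context."}'
selection = parse_selection(raw_output, num_choices=2)
print(selection)
```

Parsing free-form text like this can fail if the model returns malformed JSON, which is one motivation for the function-calling route that yields structured objects directly.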

Let's try an LLM-powered single selector called `LLMSingleSelector`. We import two classes: `RouterQueryEngine` and `LLMSingleSelector`. The `RouterQueryEngine` takes a `selector` and a set of `query_engine_tools`. The `selector` is created with `LLMSingleSelector.from_defaults()`, which means that it prompts the LLM and makes a single selection. The `query_engine_tools` include the summary tool and the vector tool.

In [9]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)
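
Conceptually, a router asks a selector for an index and then dispatches the query to the tool at that index. The toy sketch below shows only this control flow: `ToyTool`, `keyword_selector`, and `route` are hypothetical names, and the keyword check is a deliberately crude stand-in for the LLM call that `LLMSingleSelector` makes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyTool:
    description: str
    run: Callable[[str], str]


def keyword_selector(query: str, tools: list[ToyTool]) -> int:
    # Stand-in for the LLM: a real selector would read the tool descriptions.
    return 0 if "summar" in query.lower() else 1


def route(query: str, tools: list[ToyTool], selector) -> tuple[int, str]:
    """Pick one tool with the selector, then run the query through it."""
    index = selector(query, tools)
    return index, tools[index].run(query)


tools = [
    ToyTool("Useful for summarization questions", lambda q: "summary answer"),
    ToyTool("Useful for retrieving specific context", lambda q: "specific answer"),
]

print(route("What is the summary of the document?", tools, keyword_selector))
print(route("How do agents share information?", tools, keyword_selector))
```

Because the router only depends on the selector's returned index, the selection strategy can be swapped without changing the tools.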

Let's now test some queries. The first question we will ask is "What is the summary of the document?"

The verbose output lets us view the intermediate steps. The output includes `Selecting query engine 0: Useful for summarization questions related to MetaGPT`, which means that the first option, the summary tool, was picked to answer this question. The response says that the document introduces MetaGPT, a meta-programming framework for LLM-based multi-agent collaboration. This gives an overall summary of the paper, synthesized over all the context in the paper.

The response comes with sources, which we can inspect through `response.source_nodes`. The length of `response.source_nodes` is 34, which is exactly the number of chunks in the entire document. This confirms that the summary query engine was called, because the summary query engine returns all the chunks in its index.

In [10]:
response = query_engine.query("What is the summary of the document?")
print(str(response))

Selecting query engine 0: Useful for summarization questions related to MetaGPT.
The document introduces MetaGPT, a meta-programming framework that enhances multi-agent collaboration using Large Language Models (LLMs) in software development. It incorporates Standardized Operating Procedures (SOPs) to streamline workflows, assign specialized roles to agents, and improve communication. MetaGPT achieves state-of-the-art performance in code generation benchmarks, offers an executable feedback mechanism for debugging and executing code during runtime, and emphasizes the importance of role specialization and efficient communication mechanisms. The framework presents a promising approach to developing efficient LLM-based multi-agent systems with a focus on high-quality results and iterative code improvement.


In [11]:
print(len(response.source_nodes))

34


Let's take a look at another example. We will ask the question: "How do agents share information with other agents?" So let's ask this against the overall router query engine, and take a look at both the verbose output as well as the response. 

Here, query engine 1 is selected, and the LLM gives some reasoning for picking the vector search tool rather than the summary tool: the focus is on retrieving specific context from the MetaGPT paper, and how agents share information with other agents is probably described within a paragraph of that paper. The query engine finds that context and generates a response: agents use a shared message pool where they can publish structured messages.

In [12]:
response = query_engine.query(
    "How do agents share information with other agents?"
)
print(str(response))

Selecting query engine 1: This choice is more relevant as it focuses on retrieving specific context from the MetaGPT paper, which may contain information on how agents share information with other agents..
Agents share information with other agents by utilizing a shared message pool where they can publish structured messages. This shared message pool allows all agents to exchange messages directly, enabling them to both publish their own messages and access messages from other agents transparently. Additionally, agents can subscribe to relevant messages based on their role profiles, ensuring that they receive only the information that is pertinent to their tasks and responsibilities.


## Let's put everything together

That concludes Lesson 1. To put everything together, all the code above can be consolidated into a single helper function that takes in a file path and builds the router query engine with both vector search and summarization over it.

This helper is the `get_router_query_engine` function in the `utils` module. We can use it to query the MetaGPT PDF and test an example question: "Tell me about the ablation study results?" We then get back the response from the query engine.

In this case, the router again selects query engine 1: the ablation study results are specific context from the MetaGPT paper, so we want a vector search. And we are able to get back a final answer.
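
The consolidation described above might look roughly like the sketch below. It simply chains the notebook cells from this lesson into one function; the name `build_router_query_engine` is hypothetical, and the actual `get_router_query_engine` in `utils` may differ in signature and details. The sketch assumes the LLM and embedding model are configured through `Settings` (or its defaults) and that an OpenAI key is available when the function is called.

```python
def build_router_query_engine(file_path: str):
    """Hypothetical consolidation of the cells above; not the course's utils code."""
    # Imports are local so that defining this sketch does not require
    # LlamaIndex to be importable until the function is actually called.
    from llama_index.core import SimpleDirectoryReader, SummaryIndex, VectorStoreIndex
    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
    from llama_index.core.selectors import LLMSingleSelector
    from llama_index.core.tools import QueryEngineTool

    # Load the document and split it into nodes.
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    nodes = SentenceSplitter(chunk_size=1024).get_nodes_from_documents(documents)

    # Build both indexes over the same nodes.
    summary_index = SummaryIndex(nodes)
    vector_index = VectorStoreIndex(nodes)

    # Wrap each index's query engine in a tool with a description.
    summary_tool = QueryEngineTool.from_defaults(
        query_engine=summary_index.as_query_engine(
            response_mode="tree_summarize",
            use_async=True,
        ),
        description="Useful for summarization questions related to MetaGPT",
    )
    vector_tool = QueryEngineTool.from_defaults(
        query_engine=vector_index.as_query_engine(),
        description="Useful for retrieving specific context from the MetaGPT paper.",
    )

    # Route each query to exactly one of the two tools.
    return RouterQueryEngine(
        selector=LLMSingleSelector.from_defaults(),
        query_engine_tools=[summary_tool, vector_tool],
        verbose=True,
    )
```

Note that the tool descriptions here are specific to the MetaGPT paper; a reusable helper would likely need more general descriptions or take them as parameters.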

In [13]:
from utils import get_router_query_engine

query_engine = get_router_query_engine("metagpt.pdf")

In [14]:
response = query_engine.query("Tell me about the ablation study results?")
print(str(response))

Selecting query engine 1: Ablation study results are specific context from the MetaGPT paper, making choice 2 the most relevant..
The ablation study results show that MetaGPT effectively addresses challenges related to context utilization, code hallucinations, and information overload in software development. By accurately unfolding natural language descriptions and maintaining information validity, MetaGPT eliminates ambiguity and allows Language Models (LLMs) to focus on relevant data. Additionally, by focusing on granular tasks like requirement analysis and package selection, MetaGPT guides LLMs in software generation, reducing code hallucination issues. Furthermore, MetaGPT uses a global message pool and subscription mechanism to tackle information overload, ensuring efficient communication and filtering out irrelevant contexts to enhance the relevance and utility of information.
