# Lesson 1: Router Engine

We will start with the simplest form of agentic RAG, a router. Given a query, a router will pick one of several query engines that execute the query. We will build a simple router over a single document that can handle question answering and summarization.

![Router Engine](img/router_engine.jpg)

Welcome to Lesson 1.

The first three lessons build agentic capabilities over a single document. The fourth lesson shows how to build a multi-document agent.
To access the `requirements.txt` file, the data/pdf file required for this lesson, and the `helper` and `utils` modules, go to the `File` menu and select `Open...`.

I hope you enjoy this course!

## Setup

We must import `nest_asyncio` because Jupyter already runs an asyncio event loop behind the scenes. Some of the code in this lesson starts its own async calls, and `nest_asyncio` patches asyncio so that these nested calls work inside a notebook.

In [1]:
from helper import get_openai_api_key

OPENAI_API_KEY = get_openai_api_key()

In [2]:
import nest_asyncio

nest_asyncio.apply()
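
To see why the patch is needed, the sketch below reproduces the problem with the standard library only. It assumes a plain Python script rather than a notebook; `inner` and `outer` are illustrative names, with `outer` standing in for a notebook cell that is already running inside an event loop.

```python
import asyncio


async def inner():
    # Any coroutine that a library might want to run for us.
    return 42


async def outer():
    # We are already inside a running event loop here, much like code
    # executed in a Jupyter cell.
    coro = inner()
    try:
        # Starting a second loop from inside a running one is refused.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # tidy up the coroutine that never ran
        return f"RuntimeError: {exc}"
    return "no error"


message = asyncio.run(outer())
print(message)
```

After `nest_asyncio.apply()` has been called, the same nested call is allowed to run instead of raising, which is why the notebook applies the patch before anything else.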

## Load Data

We will use the MetaGPT paper in the first three lessons.
The code below downloads it:

#!wget "https://openreview.net/pdf?id=VtmBAGCN7o" -O metagpt.pdf

**Note**: The PDF file is included with this lesson. To access it, go to the `File` menu and select `Open...`.

The `SimpleDirectoryReader` class reads a PDF file into a parsed document representation. You can find more about this class [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/).

In [3]:
from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(input_files=["metagpt.pdf"]).load_data()

## Define LLM and Embedding model

We split the document into chunks, respecting sentence boundaries where possible, with a chunk size limit of 1024 tokens. More information on `SentenceSplitter` can be found [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules/?h=sentencesplitter#sentencesplitter).

In [4]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
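
The idea of sentence-aware chunking can be illustrated with a small standalone sketch. This is not how `SentenceSplitter` is implemented: the function name `split_by_sentence` is hypothetical, size is measured in whitespace-separated words rather than tokens, and details handled by the real splitter are ignored.

```python
import re


def split_by_sentence(text, chunk_size):
    """Greedily pack whole sentences into chunks of at most chunk_size words.

    Assumption: size is counted in words, not tokens. A single sentence
    longer than chunk_size is kept whole as its own oversized chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Close the current chunk if this sentence would overflow it.
        if current and current_len + n_words > chunk_size:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


text = (
    "MetaGPT assigns roles to agents. "
    "Agents publish structured messages. "
    "Each agent subscribes to relevant messages."
)
chunks = split_by_sentence(text, chunk_size=9)
for chunk in chunks:
    print(repr(chunk))
```

No chunk ends in the middle of a sentence, which is the property the 1024-token limit above is combined with.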

LlamaIndex, by default, uses `gpt-3.5-turbo` as the LLM and `text-embedding-ada-002` as the embedding model, but this can be fully customized. Customization is done through the `Settings` object from `llama_index.core`, by setting its `.llm` and `.embed_model` fields. The code below shows how to change the LLM and the embeddings.

**Question**: is it only possible to consider the LLMs and embedding models stored in the `llms` and `embeddings` modules of `llama_index`?

In [5]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

## Define Summary Index and Vector Index over the Same Data

Here we build two indexes, one for Q&A and one for summarization. An index can be thought of as metadata over our data. Indexes can be queried, and different indexes will have different retrieval behaviors.

A vector index is based on embeddings, and retrieval is based on embedding similarity. It is a core abstraction for any RAG system.

A summary index returns all the nodes currently in the index, so its retrieval does not depend on the user query.

In [6]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)
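
The difference between the two retrieval behaviors can be sketched without LlamaIndex. Everything here is a toy under stated assumptions: `embed` is a bag-of-words count rather than a learned embedding model, and `vector_retrieve` and `summary_retrieve` are hypothetical names, not library functions.

```python
import math
from collections import Counter


def embed(text):
    # Toy "embedding": word counts (assumption; real systems use a model).
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_retrieve(nodes, query, top_k=2):
    # Vector-index behavior: rank nodes by similarity to the query.
    q = embed(query)
    ranked = sorted(nodes, key=lambda n: cosine(embed(n), q), reverse=True)
    return ranked[:top_k]


def summary_retrieve(nodes, query):
    # Summary-index behavior: the query is ignored, every node is returned.
    return list(nodes)


nodes = [
    "agents publish messages to a shared pool",
    "the ablation study removes individual roles",
    "benchmarks include code generation tasks",
]
query = "how do agents share messages"

top = vector_retrieve(nodes, query, top_k=1)
everything = summary_retrieve(nodes, query)
print(top)
print(len(everything))
```

The vector-style retriever returns only the node closest to the query, while the summary-style retriever returns every node regardless of what was asked.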

## Define Query Engines and Set Metadata

The next step is to turn these indexes into query engines and query tools. Each query engine is good for a certain type of query, and a router lets us send each query to the most suitable engine. Below we define the summary and vector query engines.

A query tool is just a query engine with metadata: a description of the types of questions the tool can answer. We will define one tool for summarization and one for Q&A.

In [7]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()
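
The `tree_summarize` response mode builds its answer bottom-up: chunks are summarized in groups, then the summaries are summarized, until one result remains. The sketch below illustrates that idea and why `use_async=True` helps. It is a simplified stand-in, not the library's implementation: `fake_llm_summarize` is a hypothetical placeholder for an LLM call, and `fan_in` is an illustrative parameter.

```python
import asyncio


async def fake_llm_summarize(texts):
    # Hypothetical stand-in for an LLM call that merges several texts.
    await asyncio.sleep(0)
    return "summary(" + " + ".join(texts) + ")"


async def tree_summarize(chunks, fan_in=2):
    """Summarize chunks level by level until a single text remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(chunks)
    while len(level) > 1:
        groups = [level[i:i + fan_in] for i in range(0, len(level), fan_in)]
        # All groups on one level are independent, so they can run concurrently.
        level = list(
            await asyncio.gather(*(fake_llm_summarize(g) for g in groups))
        )
    return level[0]


result = asyncio.run(tree_summarize(["c1", "c2", "c3", "c4"]))
print(result)
```

Inside a notebook you could write `await tree_summarize([...])` directly instead of calling `asyncio.run`.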

In [8]:
from llama_index.core.tools import QueryEngineTool


summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to MetaGPT"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the MetaGPT paper."
    ),
)

## Define Router Query Engine

LlamaIndex provides several types of *selectors* to build a router, each of which has distinct attributes.

The LLM selectors use the LLM to output a JSON that is parsed, and the corresponding indexes are queried.

The Pydantic selectors, instead of prompting the LLM with text and parsing raw JSON, use the OpenAI Function Calling API to produce Pydantic selection objects. Let's try an LLM-powered single selector called `LLMSingleSelector`.

The `RouterQueryEngine` takes a selector and a list of query engine tools.

In [9]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector


query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)
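
Conceptually, a router is a selector plus a list of described tools. The standalone sketch below shows that control flow under loud assumptions: `Tool`, `keyword_selector`, and `route` are hypothetical names, and the selector counts shared words instead of asking an LLM, which is what `LLMSingleSelector` actually does.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tool:
    description: str
    run: Callable[[str], str]


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_selector(query: str, tools: List[Tool]) -> int:
    """Stand-in for an LLM selector: pick the tool whose description
    shares the most words with the query (first tool wins ties)."""
    scores = [len(tokens(query) & tokens(t.description)) for t in tools]
    return scores.index(max(scores))


def route(query: str, tools: List[Tool]) -> Tuple[int, str]:
    index = keyword_selector(query, tools)
    return index, tools[index].run(query)


tools = [
    Tool("Useful for summarization questions related to MetaGPT",
         lambda q: "answer from the summary engine"),
    Tool("Useful for retrieving specific context from the MetaGPT paper.",
         lambda q: "answer from the vector engine"),
]

print(route("Write a summarization", tools))
print(route("retrieving specific context about agents", tools))
```

Word overlap is brittle, since common words such as "the" can tip the choice, which is one reason to delegate the selection to an LLM that reads the descriptions.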

Let's test some queries. The first query asks for a summary of the document. The router correctly selects the summarization tool. The verbose output lets us view the intermediate steps taken.

In [10]:
response = query_engine.query("What is the summary of the document?")
print(str(response))

Selecting query engine 0: Useful for summarization questions related to MetaGPT.
The document introduces MetaGPT, a meta-programming framework that enhances multi-agent systems using Large Language Models (LLMs) by incorporating human-like Standardized Operating Procedures (SOPs) to streamline workflows and improve problem-solving processes. MetaGPT assigns specific roles to agents, facilitates structured communication, and employs an executable feedback mechanism to iteratively improve code quality. It achieves state-of-the-art performance in code generation benchmarks, emphasizing role specialization, workflow management, and efficient sharing mechanisms. The framework also explores self-improvement mechanisms, recursive self-improvement in software development teams, and the concept of multi-agent economies. Additionally, it discusses the software development process for creating a "Drawing App" using MetaGPT, highlighting the roles of different agents and the imp

The response comes with sources that we can inspect with `response.source_nodes`. Here, the number of source nodes equals the number of chunks the document was split into. This is what we meant when we said that the summary index returns all its nodes.

In [11]:
print(len(response.source_nodes))

34


The query below, instead, activates the vector tool. Again, the verbose output shows the steps followed.

In [12]:
response = query_engine.query(
    "How do agents share information with other agents?"
)
print(str(response))

Selecting query engine 1: This choice is more relevant as it specifically mentions retrieving specific context, which is necessary for understanding how agents share information with other agents..
Agents share information with other agents by utilizing a shared message pool where they can publish structured messages. This shared message pool allows all agents to exchange messages directly, enabling them to not only publish their own messages but also access messages from other entities transparently. Additionally, agents can subscribe to relevant messages based on their role profiles, allowing them to extract the information they need based on their specific roles and interests.


## Let's put everything together

The above steps can be summarized in a single helper function that is already available in the `utils` module: the `get_router_query_engine()` function.

In [13]:
from utils import get_router_query_engine

query_engine = get_router_query_engine("metagpt.pdf")

In [14]:
response = query_engine.query("Tell me about the ablation study results?")
print(str(response))

Selecting query engine 1: Ablation study results are specific context from the MetaGPT paper, making choice 2 the most relevant..
The ablation study results show that MetaGPT effectively addresses challenges related to context utilization, code hallucinations, and information overload in software development. By focusing on unfolding natural language descriptions accurately, maintaining information validity, and using a global message pool with a subscription mechanism, MetaGPT enhances the efficiency and relevance of communication and task-solving processes.
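
The log above mentions "choice 2" while the router reports "query engine 1". This suggests that the selector numbers the options from 1 in its prompt and that the answer is then converted to a zero-based index. A minimal sketch of such a parsing step follows, assuming a hypothetical JSON shape with `choice` and `reason` fields; the actual format used by LlamaIndex may differ, and `parse_selection` is an illustrative name.

```python
import json


def parse_selection(raw_output, num_tools):
    """Turn a selector's JSON text into a zero-based tool index and a reason.

    Assumption: the JSON has a 1-based "choice" and an optional "reason".
    """
    data = json.loads(raw_output)
    index = int(data["choice"]) - 1  # convert 1-based choice to 0-based index
    if not 0 <= index < num_tools:
        raise ValueError(f"choice {data['choice']} is out of range")
    return index, data.get("reason", "")


raw = '{"choice": 2, "reason": "Ablation study results are specific context."}'
index, reason = parse_selection(raw, num_tools=2)
print(f"Selecting query engine {index}: {reason}")
```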
