A router query engine is the component of the agent that decides which data source or tool is the best fit for a given query. Routing has two benefits:

*   It ensures that each query is directed to the right source.
*   It minimizes query time by avoiding calls to sources that cannot answer the query.
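Before wiring up the real components, the routing idea can be sketched in plain Python. This is a toy illustration, not how LlamaIndex implements routing: the `Tool` type, the `keyword_selector` function, and the word-overlap scoring are all hypothetical stand-ins for the LLM-based selector used later in this notebook.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    """A handler plus the description a selector reads (hypothetical type)."""
    name: str
    description: str
    handler: Callable[[str], str]


def keyword_selector(query: str, tools: List[Tool]) -> int:
    """Stand-in for an LLM selector: pick the tool whose description shares
    the most words with the query; fall back to the last tool on no overlap."""
    query_words = set(query.lower().split())
    scores = [len(query_words & set(t.description.lower().split())) for t in tools]
    best = max(scores)
    return scores.index(best) if best > 0 else len(tools) - 1


def route(query: str, tools: List[Tool]) -> str:
    """Choose one tool, then call only that tool."""
    chosen = tools[keyword_selector(query, tools)]
    return chosen.handler(query)


tools = [
    Tool("papers", "retrieval augmented generation rag techniques", lambda q: "papers"),
    Tool("recipes", "cooking recipes ingredients", lambda q: "recipes"),
    Tool("web", "general internet search", lambda q: "web"),
]

print(route("Explain modular RAG", tools))
print(route("What ingredients do I need?", tools))
print(route("How tall is the Eiffel Tower?", tools))
```

Only the selected handler runs, which is where the time saving comes from: the other sources are never queried.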

In [None]:
!pip install -q llama_index llama-index-readers-web llama-index-tools-google llama-index-embeddings-huggingface llama-index-llms-anthropic

In [None]:
from urllib.request import urlretrieve

urlretrieve("https://arxiv.org/pdf/2312.10997.pdf", "2312.10997.pdf")

('2312.10997.pdf', <http.client.HTTPMessage at 0x7d1909151e70>)

In [None]:
from llama_index.core import SimpleDirectoryReader

paper_documents = SimpleDirectoryReader(input_files=["2312.10997.pdf"]).load_data()

In [None]:
paper_documents[0].text[:300]

'1\nRetrieval-Augmented Generation for Large\nLanguage Models: A Survey\nYunfan Gaoa, Yun Xiongb, Xinyu Gaob, Kangxiang Jiab, Jinliu Panb, Yuxi Bic, Yi Daia, Jiawei Suna, Meng\nWangc, and Haofen Wanga,c\naShanghai Research Institute for Intelligent Autonomous Systems, Tongji University\nbShanghai Key Labor'

In [None]:
from llama_index.readers.web import SimpleWebPageReader

recipe_documents = SimpleWebPageReader(html_to_text=True).load_data(["https://tasty.co/recipe/chicken-gyros"])

In [None]:
recipe_documents[0].text[10000:12000]

'_00001.jpg?output-\nformat=auto&output-quality=auto&resize=600:*)\n\n##### Total Time\n\n3 hr 30 min\n\n3 hr 30 min\n\n##### Prep Time\n\n20 minutes\n\n20 min\n\n##### Cook Time\n\n1 hr 30 min\n\n1 hr 30 min\n\n## Ingredients\n\nfor 8 servings\n\nMarinade\n\n  * 2 cups plain full-fat greek yogurt (570 g)\n  * ¼ cup lemon juice (60 mL)\n  * ¾ cup olive oil (180 mL)\n  * 1 tablespoon kosher salt\n  * 1 tablespoon minced garlic\n  * 1 tablespoon ground coriander\n  * 1 tablespoon paprika\n  * 1 tablespoon ground cumin\n  * ½ teaspoon cayenne pepper\n  * 1 teaspoon cinnamon\n  * 1 teaspoon freshly ground black pepper\n  * 2 lb boneless, skinless chicken thighs (910 g), pounded flat\n\nTzatziki Sauce\n\n  * 1 large cucumber, shredded\n  * 2 cups plain full-fat greek yogurt (570 g)\n  * 1 tablespoon minced garlic\n  * ¼ cup lemon juice (60 mL)\n  * 2 tablespoons finely chopped fresh dill\n  * 2 tablespoons finely chopped fresh parsley\n  * kosher salt, to taste\n  * freshly ground black pep

In [None]:
from llama_index.core import Settings

Settings.chunk_size = 500
paper_nodes = Settings.node_parser.get_nodes_from_documents(paper_documents)
recipe_nodes = Settings.node_parser.get_nodes_from_documents(recipe_documents)

In [None]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.anthropic import Anthropic
from google.colab import userdata

anthropic_api_key = userdata.get('ANTHROPIC_API_KEY')

embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)
llm = Anthropic(model="claude-3-5-sonnet-20240620", api_key=anthropic_api_key)

paper_vector_index = VectorStoreIndex(paper_nodes, embed_model=embed_model)
recipe_vector_index = VectorStoreIndex(recipe_nodes, embed_model=embed_model)

paper_query_engine = paper_vector_index.as_query_engine(llm=llm)
recipe_query_engine = recipe_vector_index.as_query_engine(llm=llm)

In [None]:
from llama_index.tools.google import GoogleSearchToolSpec
import json

google_search_api_key = userdata.get('GOOGLE_SEARCH_API_KEY')
google_search_engine = userdata.get('GOOGLE_SEARCH_ENGINE')
google_search_tool = GoogleSearchToolSpec(key=google_search_api_key, engine=google_search_engine)

test_results = google_search_tool.google_search("potato")
print(json.loads(test_results[0].text)["queries"]["request"][0]["totalResults"])

1040000000


In [None]:
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.response_synthesizers import BaseSynthesizer
from llama_index.core import PromptTemplate

qa_prompt = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)

class GoogleSearchQueryEngine(CustomQueryEngine):
    """Google Search Query Engine."""

    llm: Anthropic
    tool: GoogleSearchToolSpec

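    def custom_query(self, query_str: str):
        # Run the search; the first element of the result holds the raw JSON text.
        response = self.tool.google_search(query_str)
        response_obj = json.loads(response[0].text)
        # Use the snippets of the top five results as context for the LLM.
        context_str = "\n\n".join(n["snippet"] for n in response_obj["items"][:5])
        # Ask the LLM to answer from the search snippets only.
        output = self.llm.complete(
            qa_prompt.format(context_str=context_str, query_str=query_str)
        )
        return str(output)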
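The engine above indexes `response_obj["items"]` directly, so it would raise a `KeyError` if a search response came back without that key. A minimal sketch of a more defensive extraction step follows. The helper name `extract_snippets` is hypothetical, and the idea that a response with no hits may omit `items` is an assumption rather than something this notebook demonstrates.

```python
import json
from typing import List


def extract_snippets(raw_response: str, limit: int = 5) -> List[str]:
    """Return up to `limit` snippets from a search response JSON string.

    Assumes the payload shape used in this notebook: an optional "items"
    list whose entries may carry a "snippet" field.
    """
    payload = json.loads(raw_response)
    items = payload.get("items", [])  # a missing key is treated as "no results"
    return [item["snippet"] for item in items[:limit] if "snippet" in item]


# Hand-written sample payloads (not real API output).
sample = json.dumps({"items": [{"snippet": "a"}, {"title": "no snippet"}, {"snippet": "b"}]})
empty = json.dumps({"queries": {"request": [{"totalResults": "0"}]}})

print(extract_snippets(sample))
print(extract_snippets(empty))
```

Inside `custom_query`, the join could then become `"\n\n".join(extract_snippets(response[0].text))`, leaving an empty context when nothing is found.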

In [None]:
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool


paper_vector_tool = QueryEngineTool.from_defaults(
    query_engine=paper_query_engine,
    description="Useful for retrieving information about Retrieval Augmented Generation or RAG techniques",
)
recipe_vector_tool = QueryEngineTool.from_defaults(
    query_engine=recipe_query_engine,
    description="Useful for retrieving information about cooking recipes",
)
google_query_engine = GoogleSearchQueryEngine(llm=llm, tool=google_search_tool)
google_tool = QueryEngineTool.from_defaults(
    query_engine=google_query_engine,
    description="Useful for retrieving information from the internet",
)

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(llm=llm),
    query_engine_tools=[
        paper_vector_tool,
        recipe_vector_tool,
        google_tool
    ],
    llm=llm
)

In [None]:
result = query_engine.query("Explain Modular RAG in one paragraph")
result

Response(response='Modular RAG represents an evolution in the Retrieval-Augmented Generation (RAG) approach, offering greater flexibility and adaptability compared to its predecessors. This architecture incorporates various strategies to enhance its components, such as introducing a search module for similarity searches and refining the retriever through fine-tuning. It supports both sequential processing and integrated end-to-end training across its components. While building upon the principles of Advanced and Naive RAG, Modular RAG introduces specialized modules like the Search module and RAG-Fusion to improve retrieval and processing capabilities. These innovations allow for more efficient handling of diverse data sources and complex query scenarios. The overall structure of Modular RAG is not limited to sequential retrieval and generation but includes methods such as iterative and adaptive retrieval, making it a more versatile and powerful tool for information retrieval and genera

In [None]:
result = query_engine.query("What ingredients do I need to make chicken gyros?")
result

Response(response="To make chicken gyros, you'll need several ingredients for the marinade, tzatziki sauce, and serving.\n\nFor the marinade, gather Greek yogurt, lemon juice, olive oil, kosher salt, minced garlic, ground coriander, paprika, ground cumin, cayenne pepper, cinnamon, black pepper, and boneless, skinless chicken thighs.\n\nThe tzatziki sauce requires cucumber, Greek yogurt, minced garlic, lemon juice, fresh dill, fresh parsley, kosher salt, black pepper, and yellow onion.\n\nFor serving, you'll need pita breads, sliced onion, and sliced tomato.\n\nAdditionally, you'll want to have a sturdy 10-inch wooden skewer on hand as part of the special equipment needed for this recipe.", source_nodes=[NodeWithScore(node=TextNode(id_='d5191c57-c5c1-4f45-8738-35119bacfce7', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='https://tasty.co/recipe/chicken-gyros', node_type=

The source nodes come from the `tasty.co` recipe page, so the router correctly chose the chicken gyros recipe vector store to answer that question.

Finally, let's ask a question that neither vector store covers, so the router should send it to the Google Search tool.

In [None]:
result = query_engine.query("How tall is the Eiffel Tower?")
result

Response(response="According to the context information provided, the Eiffel Tower is 330 metres (1,083 ft) tall. This is equivalent to the height of an 81-storey building, and it is described as the tallest structure in Paris. \n\nIt's worth noting that there is a slight discrepancy in the information provided, as one source mentions a height of 984 feet. However, the more specific measurement of 330 metres (1,083 ft) is likely the more accurate and up-to-date figure.\n\nAdditionally, the context mentions that 6 meters were recently added to the tower's height due to the installation of a new antenna for digital terrestrial radio. This suggests that the current height might be slightly greater than 330 metres, but an exact updated measurement is not provided in the given information.", source_nodes=[], metadata={'selector_result': MultiSelection(selections=[SingleSelection(index=2, reason="The question 'How tall is the Eiffel Tower?' requires retrieving factual information from a gene
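The printed response carries a `selector_result` entry in its metadata, with `index=2` pointing at the third tool in the list (the Google tool). The helper below is a small sketch for reading that choice programmatically. The function name `chosen_tools` is hypothetical, and the stand-in objects only mimic the shape visible in the output above (a `selections` list whose entries have `index` and `reason`); the reason string is invented for the example because the real one is truncated.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple


def chosen_tools(metadata: Dict[str, Any], tool_names: List[str]) -> List[Tuple[str, str]]:
    """Map each selection in metadata["selector_result"] to (tool name, reason)."""
    selector_result = metadata.get("selector_result")
    if selector_result is None:
        return []
    return [(tool_names[s.index], s.reason) for s in selector_result.selections]


# Stand-in metadata mimicking the printed structure (not real router output).
fake_metadata = {
    "selector_result": SimpleNamespace(
        selections=[SimpleNamespace(index=2, reason="needs general web information")]
    )
}
# Same order as the query_engine_tools list passed to the router.
tool_names = ["paper_vector_tool", "recipe_vector_tool", "google_tool"]

print(chosen_tools(fake_metadata, tool_names))
# → [('google_tool', 'needs general web information')]
```

In the notebook, calling `chosen_tools(result.metadata, tool_names)` after each query should make the routing decision explicit instead of leaving it to be inferred from the answer text.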