# Router Query Engine

In this tutorial, we use a router query engine, which chooses one of several candidate query engines to execute a user query.

[Documentation](https://gpt-index.readthedocs.io/en/stable/examples/query_engine/RouterQueryEngine.html)

# Setup

In [None]:
!pip install llama-index


In [None]:
# NOTE: This is ONLY necessary in a Jupyter notebook.
# Details: Jupyter runs an event loop behind the scenes.
#          Starting another event loop to make async queries results in nested event loops.
#          This is normally not allowed, so we use nest_asyncio to allow it for convenience.
import nest_asyncio

nest_asyncio.apply()
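
To see the restriction that `nest_asyncio` works around, the following standard-library sketch tries to start a second event loop from code that is already running inside one. It assumes a plain Python interpreter in which `nest_asyncio.apply()` has not been called; the function names `inner` and `outer` are illustrative only.

```python
import asyncio


async def inner() -> str:
    return "inner finished"


async def outer() -> str:
    # We are already inside a running loop here, much like code in a Jupyter cell.
    coro = inner()
    try:
        return asyncio.run(coro)  # tries to start a second, nested loop
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


result = asyncio.run(outer())
print(result)
```

Without the patch, the nested `asyncio.run` call is rejected with a `RuntimeError`; after `nest_asyncio.apply()`, the same nested call is expected to be allowed.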

In [None]:
import logging
import sys

# Set up the root logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)  # Set logger level to INFO

# Clear out any existing handlers
logger.handlers = []

# Set up the StreamHandler to output to sys.stdout (Colab's output)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)  # Set handler level to INFO

# Add the handler to the logger
logger.addHandler(handler)

from llama_index import (
    VectorStoreIndex,
    SummaryIndex,
    SimpleDirectoryReader,
    ServiceContext,
    StorageContext,
)

import os

import openai

# Read the key from the environment instead of hardcoding it in the notebook.
openai.api_key = os.environ["OPENAI_API_KEY"]

NumExpr defaulting to 2 threads.


## Download Data

In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/jerryjliu/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2024-02-12 06:24:39--  https://raw.githubusercontent.com/jerryjliu/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’


2024-02-12 06:24:39 (8.16 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]



## Load Data

In [None]:
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

In [None]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
# HF_TOKEN = "<your Hugging Face token>"

In [None]:
from llama_index.llms.anyscale import Anyscale
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings import HuggingFaceEmbedding
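import os

import openai

# Read credentials from the environment instead of hardcoding them.
# The variable name ANYSCALE_ENDPOINT_TOKEN is an assumption; use the name you exported.
ANYSCALE_ENDPOINT_TOKEN = os.environ["ANYSCALE_ENDPOINT_TOKEN"]
openai.api_key = os.environ["OPENAI_API_KEY"]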

# Define the LLMs
llm = Anyscale(
    model="meta-llama/Llama-2-70b-chat-hf",
    api_key=ANYSCALE_ENDPOINT_TOKEN,
)

mistral_llm = Anyscale(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    api_key=ANYSCALE_ENDPOINT_TOKEN,
)

# model = 'mistralai/Mistral-7B-Instruct-v0.1'

# Load the BAAI/bge-small-en-v1.5 embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Bundle each LLM with the embedding model in its own service context
service_context = ServiceContext.from_defaults(
    context_window=6000,
    llm=llm,
    embed_model=embed_model,
)
service_context_mistral = ServiceContext.from_defaults(
    context_window=6000,
    llm=mistral_llm,
    embed_model=embed_model,
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



# Define Summary Index and Vector Index over the Same Data

In [None]:
summary_index = SummaryIndex.from_documents(documents, service_context=service_context_mistral)
vector_index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Define Query Engines and Set Metadata

In [None]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()
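
The `tree_summarize` response mode can be pictured as a bottom-up reduction: chunks are summarized in groups, and the group summaries are summarized again until one answer remains. The sketch below illustrates that idea with a stand-in for the LLM call. It is not LlamaIndex's implementation, and the names `tree_summarize`, `fake_llm_summarize`, and `fan_in` are hypothetical.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    fan_in: int = 2,
) -> str:
    """Combine chunks bottom-up until a single summary remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    level = list(chunks)
    while len(level) > 1:
        # Summarize each group of `fan_in` neighbours into one parent node.
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def fake_llm_summarize(texts: List[str]) -> str:
    # Stand-in for an LLM call: joins the inputs so the tree shape stays visible.
    return "(" + " + ".join(texts) + ")"


summary = tree_summarize(["c1", "c2", "c3", "c4", "c5"], fake_llm_summarize)
print(summary)
```

Each call to `summarize` within one level is independent of its siblings, which is why a mode like this can benefit from asynchronous execution.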

In [None]:
from llama_index.tools.query_engine import QueryEngineTool


summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description="Useful for summarization questions related to Paul Graham essay on What I Worked On.",
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for retrieving specific context from Paul Graham essay on What I Worked On.",
)

# Define Router Query Engine

There are several selectors available, each with some distinct attributes.

The LLM selectors use the LLM to output a JSON that is parsed, and the corresponding indexes are queried.
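
As a rough illustration of the parse step, the sketch below turns a JSON reply into a zero-based engine index. The JSON shape (`choice`, `reason`) and the names `Selection` and `parse_selection` are assumptions made for this example, not the library's exact schema. The conversion from a 1-based choice to a 0-based index mirrors the logs in this notebook, where "choice 1" corresponds to "query engine 0".

```python
import json
from dataclasses import dataclass


@dataclass
class Selection:
    index: int   # zero-based position of the chosen query engine
    reason: str


def parse_selection(raw: str, num_choices: int) -> Selection:
    """Parse a hypothetical selector reply such as {"choice": 1, "reason": "..."}."""
    data = json.loads(raw)
    choice = int(data["choice"])
    if not 1 <= choice <= num_choices:
        raise ValueError(f"choice {choice} is outside 1..{num_choices}")
    return Selection(index=choice - 1, reason=str(data.get("reason", "")))


raw_reply = '{"choice": 1, "reason": "The question asks for a summary."}'
selection = parse_selection(raw_reply, num_choices=2)
print(selection.index, selection.reason)
```

Validating the choice range before routing guards against a model reply that names an option that does not exist.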

The Pydantic selectors use the OpenAI Function Calling API to produce pydantic selection objects rather than parsing raw JSON. The upstream documentation lists `gpt-4-0613` and `gpt-3.5-turbo-0613` (the default) as the supported models; this notebook passes `gpt-3.5-turbo-0125` explicitly.

For each type of selector, you can route to a single index or to multiple indexes.
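
The difference between single and multiple selection can be sketched without any LLM. In the toy example below, a word-overlap score stands in for the selector's judgment; `Tool`, `score`, `route_single`, and `route_multi` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    description: str
    run: Callable[[str], str]


def score(query: str, description: str) -> int:
    # Toy relevance score: number of words shared by the query and the description.
    return len(set(query.lower().split()) & set(description.lower().split()))


def route_single(query: str, tools: List[Tool]) -> str:
    # Single selection: dispatch to the one best-matching tool.
    best = max(tools, key=lambda t: score(query, t.description))
    return best.run(query)


def route_multi(query: str, tools: List[Tool]) -> List[str]:
    # Multiple selection: dispatch to every tool with a non-zero score.
    chosen = [t for t in tools if score(query, t.description) > 0]
    return [t.run(query) for t in chosen]


tools = [
    Tool("summarization questions about the essay", lambda q: "summary engine"),
    Tool("retrieving specific context from the essay", lambda q: "vector engine"),
]
question = "give me a summarization of the essay"
single = route_single(question, tools)
multi = route_multi(question, tools)
print(single, multi)
```

With multiple selection, the individual answers still have to be merged into one response, which is an extra step that single selection avoids.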

## PydanticSingleSelector

In [None]:
from llama_index.query_engine.router_query_engine import RouterQueryEngine
from llama_index.selectors.llm_selectors import LLMSingleSelector
from llama_index.selectors.pydantic_selectors import (
    PydanticSingleSelector,
)
# from llama_index.llms import OpenAI
from llama_index.llms.openai import OpenAI
from IPython.display import display, HTML

# The Pydantic selector needs an OpenAI function-calling model, so pass the LLM explicitly.
query_engine = RouterQueryEngine(
    selector=PydanticSingleSelector.from_defaults(llm = OpenAI(model="gpt-3.5-turbo-0125")),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
)

In [None]:
# query_engine.get_prompts()

In [None]:
from IPython.display import Markdown, display


def display_prompt_dict(prompts_dict):
    for k, p in prompts_dict.items():
        text_md = f"**Prompt Key**: {k}<br>" f"**Text:** <br>"
        display(Markdown(text_md))
        print(p.get_template())
        display(Markdown("<br><br>"))

In [None]:

# display_prompt_dict(query_engine.get_prompts())

In [None]:
display_prompt_dict(vector_query_engine.get_prompts())

**Prompt Key**: response_synthesizer:text_qa_template<br>**Text:** <br>

Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: 


<br><br>

**Prompt Key**: response_synthesizer:refine_template<br>**Text:** <br>

The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
Refined Answer: 


<br><br>
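
The placeholders in these templates are ordinary format fields. As a minimal sketch, the QA template printed above can be filled with plain `str.format`; the chunk strings are placeholders, and joining chunks with blank lines is an assumption about how context is packed.

```python
TEXT_QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Placeholder text standing in for the chunks a retriever would return.
retrieved_chunks = ["<retrieved chunk 1>", "<retrieved chunk 2>"]

prompt = TEXT_QA_TEMPLATE.format(
    context_str="\n\n".join(retrieved_chunks),
    query_str="When was Lisp created?",
)
print(prompt)
```

Because the model is told to answer from the context only, the quality of the retrieved chunks directly bounds the quality of the answer.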

In [None]:
display_prompt_dict(summary_query_engine.get_prompts())

**Prompt Key**: response_synthesizer:summary_template<br>**Text:** <br>

Context information from multiple sources is below.
---------------------
{context_str}
---------------------
Given the information from multiple sources and not prior knowledge, answer the query.
Query: {query_str}
Answer: 


<br><br>

In [None]:
response = query_engine.query("What is the summary of the document?")

HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Selecting query engine 0: The choice is specifically tailored for summarization questions related to Paul Graham's essay on What I Worked On..
HTTP Request: POST https://api.endpoints.anyscale.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.endpoints.anyscale.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.endpoints.anyscale.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.endpoints.anyscale.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.endpoints.anyscale.com/v1/chat/completions "HTTP/1.1 200 OK"


In [None]:
print(response.response)

 The document is a compilation of information about Paul Graham, a programmer, writer, and investor. Graham started his career as a co-founder of Viaweb, a startup that created software for building online stores and was later acquired by Yahoo. After the acquisition, he decided to pursue his passion for painting but faced difficulties and returned to New York. He then created a new dialect of Lisp called Arc and founded Y Combinator, an investment firm that provides seed funding to startups. The document also includes excerpts from Graham's essays, covering topics such as technology, independent thinking, painting, and programming languages. Throughout his career, Graham has emphasized the importance of building things that last, learning through doing, and being the "entry level" option in a market.


In [None]:
response.response

' The document is a compilation of information about Paul Graham, a programmer, writer, and investor. Graham started his career as a co-founder of Viaweb, a startup that created software for building online stores and was later acquired by Yahoo. After the acquisition, he decided to pursue his passion for painting but faced difficulties and returned to New York. He then created a new dialect of Lisp called Arc and founded Y Combinator, an investment firm that provides seed funding to startups. The document also includes excerpts from Graham\'s essays, covering topics such as technology, independent thinking, painting, and programming languages. Throughout his career, Graham has emphasized the importance of building things that last, learning through doing, and being the "entry level" option in a market.'

In [None]:
response = query_engine.query("When was Lisp created?")
response.response

HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Selecting query engine 1: The question 'When was Lisp created?' requires retrieving specific context from Paul Graham's essay on What I Worked On..
HTTP Request: POST https://api.endpoints.anyscale.com/v1/chat/completions "HTTP/1.1 200 OK"


'  The creation of Lisp is not explicitly mentioned in the given context information. However, it is mentioned that Paul Graham, the author of the essay, was working on a new dialect of Lisp called Arc in the summer of 1990. It is also mentioned that Lisp was associated with AI at the time, and that Graham had experience with Lisp hacking. Therefore, it can be inferred that Lisp was created sometime before the summer of 1990, likely in the late 1980s or early 1990s.'

In [None]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
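
Interpolating model output straight into HTML can break rendering if the text contains characters such as `<` or `&`. A small helper, sketched here with the standard-library `html` module, escapes the text first; the name `as_html_paragraph` is hypothetical.

```python
import html


def as_html_paragraph(text: str, font_size_px: int = 20) -> str:
    """Wrap text in a styled <p>, escaping characters that HTML treats specially."""
    return f'<p style="font-size:{font_size_px}px">{html.escape(text)}</p>'


snippet = as_html_paragraph('Use a < b & "quotes" safely')
print(snippet)
```

In the notebook this would be used as `display(HTML(as_html_paragraph(response.response)))`.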

## LLMSingleSelector

In [None]:
# Rebind query_engine so that the queries below are routed by the LLM selector.
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(service_context=service_context),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
)

In [None]:
display_prompt_dict(query_engine.get_prompts())

**Prompt Key**: summarizer:summary_template<br>**Text:** <br>

Context information from multiple sources is below.
---------------------
{context_str}
---------------------
Given the information from multiple sources and not prior knowledge, answer the query.
Query: {query_str}
Answer: 


<br><br>

**Prompt Key**: selector:prompt<br>**Text:** <br>

Some choices are given below. It is provided in a numbered list (1 to {num_choices}), where each item in the list corresponds to a summary.
---------------------
{context_list}
---------------------
Using only the choices above and not prior knowledge, return the choice that is most relevant to the question: '{query_str}'



<br><br>
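
The selector prompt above is filled from the tool descriptions. The sketch below shows one way the numbered list could be built and substituted; the `(1) ...` item format and the blank-line separator are assumptions, since the notebook output shows only the template itself.

```python
SELECTOR_TEMPLATE = (
    "Some choices are given below. It is provided in a numbered list "
    "(1 to {num_choices}), where each item in the list corresponds to a summary.\n"
    "---------------------\n"
    "{context_list}\n"
    "---------------------\n"
    "Using only the choices above and not prior knowledge, return the choice "
    "that is most relevant to the question: '{query_str}'\n"
)

descriptions = [
    "Useful for summarization questions related to Paul Graham essay on What I Worked On.",
    "Useful for retrieving specific context from Paul Graham essay on What I Worked On.",
]

# Number the tool descriptions starting at 1 (the item format is an assumption).
context_list = "\n\n".join(f"({i}) {d}" for i, d in enumerate(descriptions, start=1))

selector_prompt = SELECTOR_TEMPLATE.format(
    num_choices=len(descriptions),
    context_list=context_list,
    query_str="What is the summary of the document?",
)
print(selector_prompt)
```

Since the selector sees only these descriptions, clear and distinct tool descriptions are what make routing reliable.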

In [None]:
response = query_engine.query("What is the summary of the document?")

HTTP Request: POST https://api.endpoints.anyscale.com/v1/chat/completions "HTTP/1.1 200 OK"
Selecting query engine 0: The question asks for a summary of the document, and choice 1 states that it is useful for summarization questions related to Paul Graham's essay on What I Worked On, which suggests that it may provide a summary of the essay..
HTTP Request: POST https://api.endpoints.anyscale.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.endpoints.anyscale.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.endpoints.anyscale.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.endpoints.anyscale.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.endpoints.anyscale.com/v1/chat/completions "HTTP/1.1 200 OK"


In [None]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))

In [None]:
response = query_engine.query("What did Paul Graham do after RICS?")

HTTP Request: POST https://api.endpoints.anyscale.com/v1/chat/completions "HTTP/1.1 200 OK"
Selecting query engine 1: This choice is relevant to the question because it mentions Paul Graham's essay on What I Worked On, which is likely to contain information about his activities after RICS..
HTTP Request: POST https://api.endpoints.anyscale.com/v1/chat/completions "HTTP/1.1 200 OK"


In [None]:
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))