# Query Routing

Sometimes it's not possible to keep all information in a single data source. Maybe your pipeline needs to handle many different kinds of queries, or maybe you have a lot of data and want to split it across several sources to make it easier to find and retrieve the right documents.

Imagine that you need to answer questions about the current weather, but also questions about geography. Or you want to narrow the search space by searching only certain categories of documents.

In these cases you can use **query routing** to dynamically select the right data source for a query, or to combine multiple data sources to answer it. One way to achieve this is to use the decision-making capabilities of LLMs to decide on the fly where to retrieve data from.
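
As a rough, library-independent illustration of the idea, the sketch below routes a query to one or more data sources based on their descriptions. All names here (`DataSource`, `keyword_selector`, `route`) are hypothetical, and the keyword match only stands in for the decision an LLM would make in a real router.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataSource:
    name: str
    description: str  # what the selector reads to judge relevance
    answer: Callable[[str], str]  # stand-in for a real query engine


def keyword_selector(query: str, sources: List[DataSource]) -> List[int]:
    """Stand-in for an LLM selector: pick every source whose
    description shares at least one word with the query."""
    query_words = set(query.lower().replace("?", "").split())
    return [
        i
        for i, source in enumerate(sources)
        if query_words & set(source.description.lower().split())
    ]


def route(query: str, sources: List[DataSource], selector=keyword_selector) -> str:
    """Query every selected source and join the answers."""
    selected = selector(query, sources)
    if not selected:
        return "No suitable data source found."
    return "\n".join(
        f"[{sources[i].name}] {sources[i].answer(query)}" for i in selected
    )


sources = [
    DataSource("weather", "weather forecast rain temperature", lambda q: "stub forecast"),
    DataSource("geography", "geography cities countries sights", lambda q: "stub encyclopedia answer"),
]

print(route("Will it rain in Aarhus today?", sources))
```

Swapping `keyword_selector` for a function that asks an LLM to pick indices, given the descriptions, gives the kind of routing this notebook builds with LlamaIndex.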

## Setup libraries and environment

In [None]:
%pip install python-dotenv
%pip install python-weather==2.0.3
%pip install llama-index==0.10.33
%pip install llama-index-llms-openai==0.1.16
%pip install pydantic

In [None]:
import os

from typing import Optional
from dotenv import load_dotenv
from util.helpers import get_weather, get_wiki_pages, create_and_save_wiki_md_files

from llama_index.core.query_engine import RouterQueryEngine, CustomQueryEngine
from llama_index.core.selectors import LLMMultiSelector, LLMSingleSelector
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, PromptTemplate
from llama_index.llms.openai import OpenAI

In [None]:
# NOTE: This is ONLY necessary in jupyter notebook.
# Details: Jupyter runs an event-loop behind the scenes.
#          This results in nested event-loops when we start an event-loop to make async queries.
#          This is normally not allowed, we use nest_asyncio to allow it for convenience.
import nest_asyncio
nest_asyncio.apply()

In [None]:
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
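
`os.getenv` returns `None` when the variable is missing, which only surfaces later as an authentication error. A small sketch of a fail-fast check is shown below; `require_env` is a hypothetical helper, not part of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file."
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `load_dotenv()` would then stop the notebook early if the key is missing.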

## Example with `RouterQueryEngine`

In the following example we build a `RouterQueryEngine` that will decide on the fly which data source to use to answer the query. We will have two data sources: 

- **Weather data source** - Query engine that can answer questions using today's weather forecast
- **Wiki pages for cities data source** - Query engine that can answer questions using Wikipedia pages for cities

The `RouterQueryEngine` uses an LLM to decide which data source to use to answer the query. We use the `LLMMultiSelector` so that the LLM can select multiple data sources when it needs to.

#### Define custom `WeatherQueryEngine` 

In [None]:
class WeatherQueryEngine(CustomQueryEngine):
    llm: Optional[OpenAI] = OpenAI(api_key=OPENAI_API_KEY, model="gpt-4-turbo")

    def custom_query(self, query_str: str) -> str:
        return "Not implemented yet."

    async def acustom_query(self, query_str: str) -> str:
        cities_prompt = PromptTemplate(
            """Given the following query make a comma separated list of the cities mentioned in it
        Query: {query_str}
        Cities:"""
        )
        cities = self.llm.complete(cities_prompt.format(query_str=query_str))

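        # Strip whitespace around each name and skip empty entries before looking up the forecast
        context = [
            await get_weather(city.strip())
            for city in str(cities).split(",")
            if city.strip()
        ]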

        res_prompt = PromptTemplate(
            """You're a helpful assistant that helps answer questions using weather forecasts as context
        Question: {query_str}
        
        Forecasts: {context}
        
        Answer:""",
        )
        res = self.llm.complete(
            res_prompt.format(context="\n".join(context), query_str=query_str)
        )
        return str(res)


weather_query_engine = WeatherQueryEngine()
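
The engine above relies on the LLM returning a clean comma-separated list of cities. As a sketch of a more defensive way to turn that completion into city names, the hypothetical helper `parse_cities` below trims whitespace, drops empty entries, and removes duplicates while keeping the original order.

```python
from typing import List


def parse_cities(completion: str) -> List[str]:
    """Split a comma-separated LLM completion into clean, unique city names."""
    cities: List[str] = []
    for part in completion.split(","):
        city = part.strip()
        # Skip empty fragments (e.g. from ",,") and repeated names
        if city and city not in cities:
            cities.append(city)
    return cities


print(parse_cities(" Aarhus, London ,, Aarhus"))
```

Removing duplicates also avoids fetching the same forecast twice when a city is mentioned more than once in the query.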

#### Define query engine for Wiki pages for cities

In [None]:
cities_pages = get_wiki_pages(
    [
        "Aarhus",
        "London",
        "Paris",
        "Berlin",
        "Tokyo",
        "Beijing",
        "Moscow",
        "Sydney",
    ]
)

In [None]:
create_and_save_wiki_md_files(cities_pages, path="./data/docs/cities/")
cities_documents = SimpleDirectoryReader("./data/docs/cities").load_data()
cities_index = VectorStoreIndex.from_documents(cities_documents)

#### Define `RouterQueryEngine`

In [None]:
weather_tool = QueryEngineTool.from_defaults(
    query_engine=weather_query_engine,
    description="Useful for getting today's weather forecast for a given city",
)
cities_tool = QueryEngineTool.from_defaults(
    query_engine=cities_index.as_query_engine(),
    name="Cities Wiki Pages",
    description="Useful for getting information about cities",
)

llm = OpenAI(api_key=OPENAI_API_KEY, model="gpt-4-turbo")
query_engine = RouterQueryEngine(
    selector=LLMMultiSelector.from_defaults(llm=llm),
    llm=llm,
    verbose=True,
    query_engine_tools=[
        weather_tool,
        cities_tool,
    ],
)

The following is the prompt used by the selector:

In [None]:
DEFAULT_MULTI_SELECT_PROMPT_TMPL = (
    "Some choices are given below. It is provided in a numbered list (1 to {num_choices}), "
    "where each item in the list corresponds to a summary.\n"
    "---------------------\n"
    "{context_list}"
    "\n---------------------\n"
    "Using only the choices above and not prior knowledge, return the top choices "
    "(no more than {max_outputs}, but only select what is needed) that "
    "are most relevant to the question: '{query_str}'\n"
)
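
To see how such a template turns into the text the LLM actually reads, the sketch below fills in the four placeholders with the tool descriptions. `render_selector_prompt` and the shortened `TEMPLATE` are hypothetical stand-ins, and the `(1) ...` numbering of the choices is an assumption about the layout rather than the library's exact format.

```python
from typing import List

# Shortened stand-in that uses the same placeholders as the selector prompt.
TEMPLATE = (
    "Choices (1 to {num_choices}):\n"
    "{context_list}\n"
    "Return at most {max_outputs} choices relevant to: '{query_str}'\n"
)


def render_selector_prompt(
    template: str, descriptions: List[str], query_str: str, max_outputs: int
) -> str:
    """Fill the selector template with numbered tool descriptions."""
    # Assumed numbering format; the library may lay out the choices differently.
    context_list = "\n\n".join(
        f"({i}) {description}" for i, description in enumerate(descriptions, start=1)
    )
    return template.format(
        num_choices=len(descriptions),
        context_list=context_list,
        max_outputs=max_outputs,
        query_str=query_str,
    )


prompt = render_selector_prompt(
    TEMPLATE,
    [
        "Useful for getting today's weather forecast for a given city",
        "Useful for getting information about cities",
    ],
    "What are the best sights to see in Aarhus?",
    max_outputs=2,
)
print(prompt)
```

Because the selector only sees these descriptions, their wording largely determines which query engine gets picked.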

In [58]:
response = await query_engine.aquery("At what time should I go running today in Aarhus?")
print(str(response))

Selecting query engine 0: This choice is relevant because knowing today's weather forecast for Aarhus can help determine the best time to go running based on weather conditions..
To choose the best time for running in Aarhus today, considering the weather conditions, it would be ideal to go when the temperature is comfortable and there is minimal chance of rain. According to the forecast, the early morning hours show no rain and moderate temperatures. Specifically, the times from 00:00 to 06:00 have 0% chance of rain with temperatures ranging from 9°C to 11°C and relatively low cloud cover.

Therefore, the best time to go running in Aarhus today would be between 03:00 and 06:00, when the temperature is around 9°C to 10°C, which is cool and comfortable for running, and there is no rain expected.


In [59]:
response = await query_engine.aquery("What are the best sights to see in Aarhus?")
print(str(response))

Selecting query engine 1: This choice is relevant because it involves getting information about cities, which would include details about sights and attractions in a specific city like Aarhus..
The best sights to see in Aarhus include the ARoS Art Museum, the Old Town Museum (Den Gamle By), Tivoli Friheden, Moesgård Museum, Kvindemuseet, and the various festivals like NorthSide and SPOT. These attractions, along with the city's extensive shopping facilities, make Aarhus a popular destination for tourists.


In [60]:
response = await query_engine.aquery(
    "What are the best sights to see in Aarhus and at what time is it best to go there with regards to weather?"
)
print(str(response))

Selecting query engine 0: This choice is relevant because it provides today's weather forecast for a given city, which is necessary to determine the best time to visit sights in Aarhus..
Selecting query engine 1: This choice is relevant as it provides information about cities, which can include details on the best sights to see in Aarhus..
The best sights to see in Aarhus include the ARoS Art Museum, the Old Town (Den Gamle By), Moesgaard Museum, Aarhus Botanical Gardens, and the Latin Quarter. For the best experience considering the weather, it is advisable to visit these attractions early in the morning before 09:00 AM on May 5, 2024, especially the outdoor locations like the Old Town, Moesgaard Museum, the Botanical Gardens, and the Latin Quarter, to avoid the rain forecasted to start at noon. The ARoS Art Museum, being an indoor venue, can be visited any time during the day, but an early morning visit is recommended to avoid the heavier rain ex