# Router Retriever
In this guide, we define a custom router retriever that selects one or more candidate retrievers in order to execute a given query.

The router (`BaseSelector`) module uses the LLM to decide dynamically which underlying retrieval tools to use. This is helpful for selecting one data source out of a diverse range. With a multi-selector module, it can also aggregate retrieval results across several data sources.
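To make the routing idea concrete, here is a minimal, library-free sketch. `ToyTool`, `route`, and `toy_select` are hypothetical names invented for illustration; they are not part of LlamaIndex, and the keyword rule in `toy_select` merely stands in for the LLM's decision.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ToyTool:
    """Hypothetical stand-in for a retriever tool: a description plus a callable."""

    description: str
    retrieve: Callable[[str], List[str]]


def route(
    query: str,
    tools: Sequence[ToyTool],
    select: Callable[[str, Sequence[str]], List[int]],
) -> List[str]:
    """Ask `select` for tool indices, then run only the chosen tools."""
    chosen = select(query, [tool.description for tool in tools])
    results: List[str] = []
    for index in chosen:
        results.extend(tools[index].retrieve(query))
    return results


def toy_select(query: str, descriptions: Sequence[str]) -> List[int]:
    # Stand-in for the LLM decision: use the "everything" tool for broad queries only.
    return [0] if "all" in query.lower().split() else [1]


tools = [
    ToyTool("Retrieves all context.", lambda q: ["chunk-1", "chunk-2", "chunk-3"]),
    ToyTool("Retrieves specific context.", lambda q: ["chunk-2"]),
]

broad = route("Give me all the context", tools, toy_select)
narrow = route("What happened after RISD?", tools, toy_select)
```

A single-selector returns one index, while a multi-selector may return several; the dispatch loop is the same in both cases.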

This notebook is very similar to the RouterQueryEngine notebook.

### Setup

In [1]:
# NOTE: This is ONLY necessary in jupyter notebook.
# Details: Jupyter runs an event-loop behind the scenes.
#          This results in nested event-loops when we start an event-loop to make async queries.
#          This is normally not allowed, we use nest_asyncio to allow it for convenience.
import nest_asyncio

nest_asyncio.apply()

In [2]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().handlers = []
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import (
    VectorStoreIndex,
    ListIndex,
    SimpleDirectoryReader,
    ServiceContext,
    StorageContext,
    SimpleKeywordTableIndex
)
from llama_index.llms import OpenAI



### Load Data

We first convert the loaded documents into a set of Nodes and insert them into a DocumentStore.

In [3]:
# load documents
documents = SimpleDirectoryReader("../data/paul_graham").load_data()

In [22]:
# initialize service context (set chunk size)
llm = OpenAI(model="gpt-4")
service_context = ServiceContext.from_defaults(chunk_size=1024, llm=llm)
nodes = service_context.node_parser.get_nodes_from_documents(documents)

In [23]:
# initialize storage context (by default it's in-memory)
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

In [24]:
# define a list, vector, and keyword index over the same nodes
list_index = ListIndex(nodes, storage_context=storage_context)
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
keyword_index = SimpleKeywordTableIndex(nodes, storage_context=storage_context)

In [25]:
list_retriever = list_index.as_retriever()
vector_retriever = vector_index.as_retriever()
keyword_retriever = keyword_index.as_retriever()

In [32]:
from llama_index.tools import RetrieverTool

list_tool = RetrieverTool.from_defaults(
    retriever=list_retriever,
    description="Will retrieve all context from Paul Graham's essay on What I Worked On. Don't use if the question only requires more specific context.",
)
vector_tool = RetrieverTool.from_defaults(
    retriever=vector_retriever,
    description="Useful for retrieving specific context from Paul Graham essay on What I Worked On.",
)
keyword_tool = RetrieverTool.from_defaults(
    retriever=keyword_retriever,
    description="Useful for retrieving specific context from Paul Graham essay on What I Worked On (using entities mentioned in query)",
)
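The descriptions matter because they are what the selector is given to tell the tools apart. The sketch below shows one way such descriptions could be rendered into a selection prompt. The function names and the prompt wording are assumptions made for illustration; the template the library actually uses may differ.

```python
from typing import List


def format_choices(descriptions: List[str]) -> str:
    """Render tool descriptions as a numbered list, one choice per line."""
    return "\n".join(
        f"({number}) {text}" for number, text in enumerate(descriptions, start=1)
    )


def build_selection_prompt(query: str, descriptions: List[str]) -> str:
    # Hypothetical prompt wording, not the library's actual template.
    return (
        "Some choices are given below.\n"
        f"{format_choices(descriptions)}\n"
        f"Pick the choice most relevant to the question: {query!r}"
    )


prompt = build_selection_prompt(
    "What did Paul Graham do after RISD?",
    [
        "Will retrieve all context from the essay.",
        "Useful for retrieving specific context from the essay.",
    ],
)
```

Because the numbered list is all the model has to go on, descriptions that state when not to use a tool (as the list tool's does) give the selector a clearer basis for its decision.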

### Define Selector Module for Routing

There are several selectors available, each with some distinct attributes.

The LLM selectors use the LLM to output a JSON that is parsed, and the corresponding indexes are queried.
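As a rough illustration of the parsing step, the sketch below extracts a JSON list from raw model text and converts it into tool indices. The schema (a one-based `"choice"` number plus a `"reason"` string) and the `parse_selection` helper are assumptions for this example, not the library's actual output format or parser.

```python
import json
from typing import List, Tuple


def parse_selection(raw: str, num_choices: int) -> List[Tuple[int, str]]:
    """Parse selector output into (zero-based index, reason) pairs.

    Assumes the model was asked for a JSON list of objects with a one-based
    "choice" number and a "reason" string; this schema is illustrative only.
    """
    # Models often wrap JSON in extra prose, so cut out the outermost list.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON list found in selector output")
    selections = []
    for item in json.loads(raw[start : end + 1]):
        index = int(item["choice"]) - 1
        if not 0 <= index < num_choices:
            raise ValueError(f"choice {item['choice']} is out of range")
        selections.append((index, str(item.get("reason", ""))))
    return selections


raw_output = 'Here is my answer:\n[{"choice": 2, "reason": "asks for specific context"}]'
parsed = parse_selection(raw_output, num_choices=2)
```

Parsing free-form text like this can fail when the model returns malformed JSON or an invalid choice number, which is why the sketch validates both.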

The Pydantic selectors use the OpenAI Function Call API to produce pydantic selection objects, rather than parsing raw JSON. They are currently supported only by `gpt-4-0613` and `gpt-3.5-turbo-0613` (the default).

Here we use `PydanticSingleSelector` and `PydanticMultiSelector`, but you can use the LLM equivalents (`LLMSingleSelector` and `LLMMultiSelector`) as well.

In [28]:
from llama_index.selectors.llm_selectors import LLMSingleSelector, LLMMultiSelector
from llama_index.selectors.pydantic_selectors import (
    PydanticMultiSelector,
    PydanticSingleSelector,
)
from llama_index.retrievers import RouterRetriever
from llama_index.response.notebook_utils import display_source_node

#### PydanticSingleSelector

In [10]:
retriever = RouterRetriever(
    selector=PydanticSingleSelector.from_defaults(llm=llm),
    retriever_tools=[
        list_tool,
        vector_tool,
    ],
)

In [11]:
# will retrieve all context from the author's life
nodes = retriever.retrieve("Can you give me all the context regarding the author's life?")
for node in nodes:
    display_source_node(node)

Selecting retriever 0: This choice is most relevant as it mentions retrieving all context from the essay, which could include information about the author's life..


**Node ID:** 7d07d325-489e-4157-a745-270e2066a643<br>**Similarity:** None<br>**Text:** What I Worked On

February 2021

Before college the two main things I worked on, outside of schoo...<br>

**Node ID:** 01f0900b-db83-450b-a088-0473f16882d7<br>**Similarity:** None<br>**Text:** showed Terry Winograd using SHRDLU. I haven't tried rereading The Moon is a Harsh Mistress, so I ...<br>

**Node ID:** b2549a68-5fef-4179-b027-620ebfa6e346<br>**Similarity:** None<br>**Text:** Science is an uneasy alliance between two halves, theory and systems. The theory people prove thi...<br>

**Node ID:** 4f1e9f0d-9bc6-4169-b3b6-4f169bbfa391<br>**Similarity:** None<br>**Text:** been explored. But all I wanted was to get out of grad school, and my rapidly written dissertatio...<br>

**Node ID:** e20c99f9-5e80-4c92-8cc0-03d2a527131e<br>**Similarity:** None<br>**Text:** stop there, of course, or you get merely photographic accuracy, and what makes a still life inter...<br>

**Node ID:** dbdf341a-f340-49f9-961f-16b9a51eea2d<br>**Similarity:** None<br>**Text:** that big, bureaucratic customers are a dangerous source of money, and that there's not much overl...<br>

**Node ID:** ed341d3a-9dda-49c1-8611-0ab40d04f08a<br>**Similarity:** None<br>**Text:** about money, because I could sense that Interleaf was on the way down. Freelance Lisp hacking wor...<br>

**Node ID:** d69e02d3-2732-4567-a360-893c14ae157b<br>**Similarity:** None<br>**Text:** a web app, is common now, but at the time it wasn't clear that it was even possible. To find out,...<br>

**Node ID:** df9e00a5-e795-40a1-9a6b-8184d1b1e7c0<br>**Similarity:** None<br>**Text:** have to integrate with any other software except Robert's and Trevor's, so it was quite fun to wo...<br>

**Node ID:** 38f2699b-0878-499b-90ee-821cb77e387b<br>**Similarity:** None<br>**Text:** all too keenly aware of the near-death experiences we seemed to have every few months. Nor had I ...<br>

**Node ID:** be04d6a9-1fc7-4209-9df2-9c17a453699a<br>**Similarity:** None<br>**Text:** for a second still life, painted from the same objects (which hopefully hadn't rotted yet).

Mean...<br>

**Node ID:** 42344911-8a7c-4e9b-81a8-0fcf40ab7690<br>**Similarity:** None<br>**Text:** which I'd created years before using Viaweb but had never used for anything. In one day it got 30...<br>

**Node ID:** 9ec3df49-abf9-47f4-b0c2-16687882742a<br>**Similarity:** None<br>**Text:** I didn't know but would turn out to like a lot: a woman called Jessica Livingston. A couple days ...<br>

**Node ID:** d0cf6975-5261-4fb2-aae3-f3230090fb64<br>**Similarity:** None<br>**Text:** of readers, but professional investors are thinking "Wow, that means they got all the returns." B...<br>

**Node ID:** 607d0480-7eee-4fb4-965d-3cb585fda62c<br>**Similarity:** None<br>**Text:** to the "YC GDP," but as YC grows this becomes less and less of a joke. Now lots of startups get t...<br>

**Node ID:** 730a49c9-55f7-4416-ab91-1d0c96e704c8<br>**Similarity:** None<br>**Text:** So this set me thinking. It was true that on my current trajectory, YC would be the last thing I ...<br>

**Node ID:** edbe8c67-e373-42bf-af98-276b559cc08b<br>**Similarity:** None<br>**Text:** operators you need? The Lisp that John McCarthy invented, or more accurately discovered, is an an...<br>

**Node ID:** 175a4375-35ec-45a0-a90c-15611505096b<br>**Similarity:** None<br>**Text:** Like McCarthy's original Lisp, it's a spec rather than an implementation, although like McCarthy'...<br>

**Node ID:** 0cb367f9-0aac-422b-9243-0eaa7be15090<br>**Similarity:** None<br>**Text:** must tell readers things they don't already know, and some people dislike being told such things....<br>

**Node ID:** 67afd4f1-9fa1-4e76-87ac-23b115823e6c<br>**Similarity:** None<br>**Text:** 1960 paper.

But if so there's no reason to suppose that this is the limit of the language that m...<br>

In [12]:
nodes = retriever.retrieve("What did Paul Graham do after RISD?")
for node in nodes:
    display_source_node(node)

Selecting retriever 1: The question asks for specific information from Paul Graham's essay on 'What I Worked On'. Therefore, the second choice is more relevant as it is useful for retrieving specific context..


**Node ID:** dbdf341a-f340-49f9-961f-16b9a51eea2d<br>**Similarity:** 0.8017176790752668<br>**Text:** that big, bureaucratic customers are a dangerous source of money, and that there's not much overl...<br>

**Node ID:** 730a49c9-55f7-4416-ab91-1d0c96e704c8<br>**Similarity:** 0.7935885352785799<br>**Text:** So this set me thinking. It was true that on my current trajectory, YC would be the last thing I ...<br>

#### PydanticMultiSelector

In [33]:
retriever = RouterRetriever(
    selector=PydanticMultiSelector.from_defaults(llm=llm),
    retriever_tools=[
        list_tool,
        vector_tool,
        keyword_tool
    ],
)

In [18]:
nodes = retriever.retrieve(
    "What were noteable events and people from the authors time at Interleaf and YC?"
)
for node in nodes:
    display_source_node(node)

Selecting retriever 1: This choice allows for retrieving specific context from the essay, which is necessary for answering the question about notable events and people during the author's time at Interleaf and YC..
Selecting retriever 2: This choice also allows for retrieving specific context using keywords, which could be useful for finding information about Interleaf and YC..
> Starting query: What were noteable events and people from the authors time at Interleaf and YC?
query keywords: ['yc', 'people', 'notable', 'events', 'interleaf']
> Extracted keywords: ['yc', 'people', 'interleaf']



**Node ID:** 43f1c43b-8c97-44ae-9471-32fa39f34596<br>**Similarity:** None<br>**Text:** So this set me thinking. It was true that on my current trajectory, YC would be the last thing I ...<br>

**Node ID:** fd114f92-036a-4d79-9be6-47610fbcf0e7<br>**Similarity:** 0.8029640095509958<br>**Text:** that big, bureaucratic customers are a dangerous source of money, and that there's not much overl...<br>

**Node ID:** 6b40af73-8713-4bcf-9102-0e72890c479a<br>**Similarity:** None<br>**Text:** must tell readers things they don't already know, and some people dislike being told such things....<br>

**Node ID:** dfe09c34-89c0-4c11-89a5-b7ea171be184<br>**Similarity:** None<br>**Text:** stop there, of course, or you get merely photographic accuracy, and what makes a still life inter...<br>

**Node ID:** 32f6ff3f-ad1b-4327-bee0-13e87b2bef07<br>**Similarity:** None<br>**Text:** to the "YC GDP," but as YC grows this becomes less and less of a joke. Now lots of startups get t...<br>

**Node ID:** ff6a0845-0427-4a88-aaa0-f747cac3673c<br>**Similarity:** None<br>**Text:** of readers, but professional investors are thinking "Wow, that means they got all the returns." B...<br>

**Node ID:** e6315375-a99d-46a3-9353-2c7fc28f439a<br>**Similarity:** None<br>**Text:** which I'd created years before using Viaweb but had never used for anything. In one day it got 30...<br>

**Node ID:** 02690740-cc9e-4a24-a0f6-693a6987c310<br>**Similarity:** None<br>**Text:** been explored. But all I wanted was to get out of grad school, and my rapidly written dissertatio...<br>

**Node ID:** afbecc43-125c-466d-b92d-18d9d9ebd5ed<br>**Similarity:** None<br>**Text:** Like McCarthy's original Lisp, it's a spec rather than an implementation, although like McCarthy'...<br>

**Node ID:** dd3de208-c0f9-4e2e-82cc-188e83868076<br>**Similarity:** None<br>**Text:** Science is an uneasy alliance between two halves, theory and systems. The theory people prove thi...<br>
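The output above mixes scored nodes (from the vector retriever) with unscored ones (from the keyword retriever). One plausible way to combine results from several retrievers is to keep a single entry per node ID, as in the sketch below. `ToyResult` and `merge_results` are hypothetical names, and preferring the scored entry is a design choice made for this example, not confirmed library behavior.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class ToyResult:
    """Hypothetical stand-in for a retrieved node with an optional score."""

    node_id: str
    score: Optional[float] = None


def merge_results(result_lists: Iterable[List[ToyResult]]) -> List[ToyResult]:
    """Combine results from several retrievers, keeping one entry per node ID.

    When a node appears more than once, prefer an entry that carries a score.
    """
    merged: Dict[str, ToyResult] = {}
    for results in result_lists:
        for result in results:
            existing = merged.get(result.node_id)
            if existing is None or (
                existing.score is None and result.score is not None
            ):
                merged[result.node_id] = result
    return list(merged.values())


vector_hits = [ToyResult("a", 0.80), ToyResult("b", 0.79)]
keyword_hits = [ToyResult("b"), ToyResult("c")]
merged = merge_results([keyword_hits, vector_hits])
```

Deduplicating by node ID keeps a chunk that several retrievers return from appearing twice in the context passed downstream.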

In [34]:
nodes = await retriever.aretrieve(
    "What were noteable events and people from the authors time at Interleaf?"
)
for node in nodes:
    display_source_node(node)

Selecting retriever 1: This choice is most relevant as it allows for retrieving specific context from the essay, which is necessary to answer the question about notable events and people during the author's time at Interleaf..
message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=44 request_id=c1ecc198a9b42919edb7e15c6868a89d response_code=200
Selecting retriever 2: This choice is also relevant as it allows for retrieving specific context using entities mentioned in the query, which could help in identifying notable events and people from the author's time at Interleaf..


**Node ID:** b872283b-558e-40d9-8764-7f1294ff3783<br>**Similarity:** 0.806937699429088<br>**Text:** that big, bureaucratic customers are a dangerous source of money, and that there's not much overl...<br>

**Node ID:** b4b9ddf8-711b-49c7-82cb-a525ad943832<br>**Similarity:** 0.8030677459417556<br>**Text:** about money, because I could sense that Interleaf was on the way down. Freelance Lisp hacking wor...<br>
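The benefit of the async path is that the selected retrievers can wait on network calls at the same time instead of one after another. The sketch below shows that pattern with `asyncio.gather`. The `fan_out`, `toy_vector`, and `toy_keyword` names are invented for illustration, and whether the library schedules its retrievers exactly this way is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fan_out(
    query: str, retrievers: List[Callable[[str], Awaitable[List[str]]]]
) -> List[str]:
    """Run every selected retriever concurrently and flatten results in order."""
    per_retriever = await asyncio.gather(*(r(query) for r in retrievers))
    return [item for results in per_retriever for item in results]


async def toy_vector(query: str) -> List[str]:
    await asyncio.sleep(0.01)  # simulate a network call such as an embedding request
    return ["vector-hit"]


async def toy_keyword(query: str) -> List[str]:
    await asyncio.sleep(0.01)  # simulate a second, independent call
    return ["keyword-hit"]


# In a plain script; inside a notebook, use `await fan_out(...)` instead.
results = asyncio.run(fan_out("Interleaf", [toy_vector, toy_keyword]))
```

`asyncio.gather` returns results in the order the awaitables were passed, so the flattened list is deterministic even though the calls overlap in time.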