# Router Retriever
In this guide, we define a custom router retriever that selects one or more candidate retrievers in order to execute a given query.

The router (`BaseSelector`) module uses the LLM to dynamically decide which underlying retrieval tools to use. This is helpful for selecting one data source out of a diverse range, or, if a multi-selector module is used, for aggregating retrieval results across several data sources.
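To make the routing idea concrete before bringing in the library, here is a minimal offline sketch. Everything in it (`ToySource`, `toy_select`, `toy_route`) is a hypothetical stand-in, not part of LlamaIndex, and simple word overlap replaces the LLM's decision so that the example runs without an API key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToySource:
    """Hypothetical stand-in for a retriever tool: a description plus a lookup function."""

    description: str
    retrieve: Callable[[str], List[str]]


def toy_select(query: str, sources: Dict[str, ToySource]) -> List[str]:
    """Pick every source whose description shares a word with the query.

    A real selector would ask an LLM to make this choice; word overlap is
    only a placeholder so the example runs offline.
    """
    query_words = set(query.lower().split())
    return [
        name
        for name, source in sources.items()
        if query_words & set(source.description.lower().split())
    ]


def toy_route(query: str, sources: Dict[str, ToySource]) -> List[str]:
    """Run the query against each selected source and concatenate the results."""
    results: List[str] = []
    for name in toy_select(query, sources):
        results.extend(sources[name].retrieve(query))
    return results


sources = {
    "summary": ToySource("summary overview", lambda q: ["summary chunk"]),
    "facts": ToySource("specific facts details", lambda q: ["fact chunk"]),
}
print(toy_route("give me a summary", sources))
```

The point of the sketch is the shape of the flow: descriptions go in, a selection comes out, and only the selected sources are queried.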

This notebook is very similar to the RouterQueryEngine notebook.

### Setup

In [1]:
# NOTE: This is ONLY necessary in jupyter notebook.
# Details: Jupyter runs an event-loop behind the scenes.
#          This results in nested event-loops when we start an event-loop to make async queries.
#          This is normally not allowed; we use nest_asyncio to allow it for convenience.
import nest_asyncio

nest_asyncio.apply()

In [2]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().handlers = []
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import (
    VectorStoreIndex,
    ListIndex,
    SimpleDirectoryReader,
    ServiceContext,
    StorageContext,
    SimpleKeywordTableIndex
)


### Load Data

We first show how to convert a Document into a set of Nodes and insert them into a DocumentStore.

In [3]:
# load documents
documents = SimpleDirectoryReader("../data/paul_graham").load_data()

In [4]:
# initialize service context (set chunk size)
service_context = ServiceContext.from_defaults(chunk_size=1024)
nodes = service_context.node_parser.get_nodes_from_documents(documents)
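The `chunk_size` setting above bounds how large each Node can be. As a rough illustration only, the sketch below splits text into fixed-size pieces. It assumes whitespace-separated words as a stand-in for tokens, and `toy_chunk` is a hypothetical helper, not the LlamaIndex node parser, which works differently.

```python
from typing import List


def toy_chunk(text: str, chunk_size: int) -> List[str]:
    """Split text into chunks of at most chunk_size whitespace-separated words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[start : start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


print(toy_chunk("one two three four five", chunk_size=2))
```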

In [5]:
# initialize storage context (by default it's in-memory)
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

In [6]:
# define one index of each type over the same nodes
list_index = ListIndex(nodes, storage_context=storage_context)
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
keyword_index = SimpleKeywordTableIndex(nodes, storage_context=storage_context)

In [7]:
list_retriever = list_index.as_retriever()
vector_retriever = vector_index.as_retriever()
keyword_retriever = keyword_index.as_retriever()

In [8]:
from llama_index.tools import RetrieverTool

list_tool = RetrieverTool.from_defaults(
    retriever=list_retriever,
    description="Useful for summarization questions related to Paul Graham essay on What I Worked On.",
)
vector_tool = RetrieverTool.from_defaults(
    retriever=vector_retriever,
    description="Useful for retrieving specific context from Paul Graham essay on What I Worked On.",
)
keyword_tool = RetrieverTool.from_defaults(
    retriever=keyword_retriever,
    description="Useful for retrieving specific context using keywords from Paul Graham essay on What I Worked On.",
)

### Define Selector Module for Routing

There are several selectors available, each with some distinct attributes.

The LLM selectors use the LLM to output a JSON that is parsed, and the corresponding indexes are queried.

The Pydantic selectors (currently supported only by `gpt-4-0613` and `gpt-3.5-turbo-0613`, the default) use the OpenAI Function Call API to produce pydantic selection objects, rather than parsing raw JSON.

Here we use PydanticSingleSelector/PydanticMultiSelector, but you can use the LLM equivalents as well.
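To illustrate what "output a JSON that is parsed" involves, here is a minimal sketch of turning raw model output into validated selections. The JSON shape (a list of objects with a 1-based `choice` and a `reason`) is an assumption made for this example, and `ToySelection` and `parse_selection_json` are hypothetical names. The actual format used by the LLM selectors may differ.

```python
import json
from typing import List, NamedTuple


class ToySelection(NamedTuple):
    """Hypothetical parsed selection: which choice was picked, and why."""

    index: int
    reason: str


def parse_selection_json(raw: str, num_choices: int) -> List[ToySelection]:
    """Parse a JSON list of {"choice": <1-based int>, "reason": <str>} objects."""
    selections = []
    for item in json.loads(raw):
        index = int(item["choice"]) - 1  # convert to a 0-based index
        if not 0 <= index < num_choices:
            raise ValueError(f"choice {item['choice']} is out of range")
        selections.append(ToySelection(index=index, reason=str(item.get("reason", ""))))
    return selections


raw_output = '[{"choice": 2, "reason": "asks for a specific fact"}]'
print(parse_selection_json(raw_output, num_choices=2))
```

Parsing free-form text like this can fail on malformed output, which is one reason a structured function-calling interface can be attractive.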

In [9]:
from llama_index.selectors.llm_selectors import LLMSingleSelector, LLMMultiSelector
from llama_index.selectors.pydantic_selectors import (
    PydanticMultiSelector,
    PydanticSingleSelector,
)
from llama_index.retrievers import RouterRetriever

#### PydanticSingleSelector

In [10]:
retriever = RouterRetriever(
    selector=PydanticSingleSelector.from_defaults(),
    retriever_tools=[
        list_tool,
        vector_tool,
    ],
)

In [11]:
retriever.retrieve("What is the summary of the document?")

In [None]:
retriever.retrieve("What did Paul Graham do after RISD?")

#### PydanticMultiSelector

In [None]:
retriever = RouterRetriever(
    selector=PydanticMultiSelector.from_defaults(),
    retriever_tools=[
        list_tool,
        vector_tool,
        keyword_tool,
    ],
)

In [None]:
retriever.retrieve(
    "What were notable events and people from the author's time at Interleaf and YC?"
)
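When a multi-selector picks several retrievers, the same node can come back from more than one of them. This notebook does not show how `RouterRetriever` combines such results, so the following is only a sketch of one possible merge strategy: keep each node once, with its best score. The `(node_id, score)` tuples and the `merge_results` helper are hypothetical simplifications, not LlamaIndex types.

```python
from typing import Dict, List, Tuple

# (node_id, score) pairs, a hypothetical simplification of retrieval results
ScoredId = Tuple[str, float]


def merge_results(result_lists: List[List[ScoredId]]) -> List[ScoredId]:
    """Merge several result lists, keeping each node once with its best score."""
    best: Dict[str, float] = {}
    for results in result_lists:
        for node_id, score in results:
            if node_id not in best or score > best[node_id]:
                best[node_id] = score
    # Highest score first; ties are broken by node id for a stable order.
    return sorted(best.items(), key=lambda pair: (-pair[1], pair[0]))


vector_hits = [("node-1", 0.9), ("node-2", 0.7)]
keyword_hits = [("node-2", 0.8), ("node-3", 0.5)]
print(merge_results([vector_hits, keyword_hits]))
```

Scores from different retrievers are not necessarily comparable, so a real pipeline might normalize or re-rank them instead of comparing raw values.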