To evaluate different indexing methods, the following queries were used:

- Query 1: Outline the key points of each chapter.
- Query 2: Among the LLaMA 3 models, which ones support tool use?


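The findings were:
- SummaryIndex produced a chapter-by-chapter summary for Query 1 but was less reliable for Query 2: its answer added Pal, Mutox, and Tora to the list of Llama 3 models.
- VectorStoreIndex answered Query 2 accurately but gave a weak summary for Query 1: its output began with "Chapter 70" and "Chapter 71" instead of walking through the chapters in order.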
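One likely reason for this split is how each index retrieves context: by default a summary index hands every node to the LLM, while a vector index keeps only the top-k most similar nodes. The sketch below illustrates that difference in plain Python. It is not LlamaIndex code: the node strings are made up, and a bag-of-words cosine similarity stands in for real embeddings.

```python
import math
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Toy 'embedding': token counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_all(nodes: List[str], query: str) -> List[str]:
    """Summary-style retrieval: every node is passed on, whatever the query."""
    return list(nodes)


def retrieve_top_k(nodes: List[str], query: str, k: int = 2) -> List[str]:
    """Vector-style retrieval: only the k nodes most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(nodes, key=lambda n: cosine(bag_of_words(n), q), reverse=True)
    return ranked[:k]


# Hypothetical node texts, invented for this illustration only.
nodes = [
    "chapter one introduces foundation models",
    "chapter two describes the model architecture",
    "the larger models support tool use such as search",
    "chapter four covers post-training",
]

query = "which models support tool use"
all_nodes = retrieve_all(nodes, query)
top_nodes = retrieve_top_k(nodes, query, k=1)
print(len(all_nodes), top_nodes)
```

A whole-document question needs every node, so top-k retrieval sees too little of it; a narrow factual question needs one or two relevant nodes, so passing everything mostly adds noise and cost.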


**Logical routing** lets the LLM read a description of each available data source and choose the appropriate one for each query.

A Router Query Engine was implemented with two selectors, LLMSingleSelector and PydanticSingleSelector. Both selectors routed Query 1 to the summary index and Query 2 to the vector index.
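The routing idea can be sketched without any LLM. In the minimal example below, `Tool`, `route`, and `toy_selector` are hypothetical names introduced for illustration, and a hard-coded keyword rule stands in for the LLM's choice. The tool descriptions mirror the ones used later in this notebook.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Tool:
    """A query engine plus the description the selector reads (hypothetical type)."""
    description: str
    run: Callable[[str], str]


# A selector maps (query, tool descriptions) to the index of the chosen tool.
Selector = Callable[[str, Sequence[str]], int]


def route(query: str, tools: Sequence[Tool], selector: Selector) -> str:
    """Ask the selector for one tool index, then send the query to that tool."""
    index = selector(query, [t.description for t in tools])
    return tools[index].run(query)


def toy_selector(query: str, descriptions: Sequence[str]) -> int:
    """Hard-coded stand-in for the LLM: summary-style wording picks the summary tool."""
    wants_summary = any(
        w in query.lower() for w in ("summarize", "outline", "key points")
    )
    for i, description in enumerate(descriptions):
        if ("summarization" in description.lower()) == wants_summary:
            return i
    return 0  # fall back to the first tool


tools = [
    Tool(
        "Useful for summarization questions related to the data source",
        lambda q: f"summary engine <- {q}",
    ),
    Tool(
        "Useful for retrieving specific context related to the data source",
        lambda q: f"vector engine <- {q}",
    ),
]

answer_1 = route("Outline the key points of each chapter", tools, toy_selector)
answer_2 = route(
    "Among the LLaMA 3 models, which ones support tool use?", tools, toy_selector
)
print(answer_1)
print(answer_2)
```

In the real router the selector is an LLM call, so the quality of the tool descriptions matters far more than any keyword list.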

In [1]:
import os

# Set your OpenAI API key here (intentionally left blank).
os.environ["OPENAI_API_KEY"] = ""

In [2]:
# NOTE: This is ONLY necessary in a Jupyter notebook.
# Details: Jupyter runs an event loop behind the scenes.
#          Starting another event loop to make async queries results in nested event loops.
#          This is normally not allowed, so we use nest_asyncio to permit it for convenience.

import nest_asyncio

nest_asyncio.apply()
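The restriction described in the comments above can be reproduced with the standard library alone. The sketch below does not use `nest_asyncio`; it only shows the error that appears when `asyncio.run` is called while an event loop is already running, which is the situation inside a notebook cell. The function names are hypothetical.

```python
import asyncio


async def fetch_answer() -> str:
    """Stand-in for an async query."""
    return "ok"


async def notebook_like_cell() -> str:
    """Runs inside an event loop, as notebook code does, and tries to start another."""
    coro = fetch_answer()
    try:
        asyncio.run(coro)  # a second loop inside the running one
    except RuntimeError as exc:
        coro.close()  # avoid a 'never awaited' warning for the unused coroutine
        return f"RuntimeError: {exc}"
    return "no error"


result = asyncio.run(notebook_like_cell())
print(result)
```

Calling `nest_asyncio.apply()` first patches asyncio so that this kind of nested call is allowed.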

In [3]:
import pprint

from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    SummaryIndex,
    VectorStoreIndex,
)
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector, PydanticSingleSelector
from llama_index.core.tools import QueryEngineTool

In [4]:
# load documents
documents = SimpleDirectoryReader("./data").load_data()

# initialize settings (set chunk size)
Settings.chunk_size = 1024
nodes = Settings.node_parser.get_nodes_from_documents(documents)

# initialize storage context (by default it's in-memory)
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

In [5]:
# Define Summary Index and Vector Index
summary_index = SummaryIndex(nodes, storage_context=storage_context)
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)

summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize", use_async=True
)
vector_query_engine = vector_index.as_query_engine(
    response_mode="tree_summarize", use_async=True
)

### Compare the Summary Index alone and the Vector Index alone with a logical router that chooses the suitable data source
- Query 1: Outline the key points of each chapter
- Query 2: Among the LLaMA 3 models, which ones support tool use?

#### summary_query_engine

In [6]:
response = summary_query_engine.query("Outline the key points of each chapter")
pprint.pprint(str(response))

('Chapter 1: Introduces foundation models, discusses the importance of data, '
 'scale, and managing complexity in AI systems, and introduces Llama 3 as a '
 'new set of foundation models for language.\n'
 '\n'
 'Chapter 2: Details the standard dense Transformer architecture used in Llama '
 '3, highlights modifications made in Llama 3 compared to previous versions, '
 'and provides information on key hyperparameters of Llama 3 models.\n'
 '\n'
 'Chapter 3: Provides an overview of the language model pre-training process, '
 'details pre-training data curation, scaling laws, and determining the data '
 'mix, and explains annealing and its impact on model performance.\n'
 '\n'
 'Chapter 4: Discusses the post-training approach for Llama 3, focusing on '
 'techniques such as rejection sampling, supervised fine-tuning, and direct '
 "preference optimization to enhance the model's performance.\n"
 '\n'
 'Chapter 5: Explores safety measures implemented in Llama 3, including safety '
 'finetun

In [7]:
response = summary_query_engine.query("Among the LLaMA 3 models, which ones support tool use?")
pprint.pprint(str(response))

('The LLaMA 3 models that support tool use are the Llama 3 8B, Llama 3 70B, '
 'Llama 3 405B, Pal, Mutox, and Tora.')


#### vector_query_engine

In [8]:
response = vector_query_engine.query("Outline the key points of each chapter")
pprint.pprint(str(response))

('Chapter 70 outlines the key factors that contributed to the successful '
 'development of the Llama 3 model family, emphasizing the importance of '
 'high-quality data, scale, and simplicity in achieving the best results. It '
 'also discusses the deep technical problems and clever organizational '
 'decisions involved in developing a flagship foundation model like Llama 3, '
 'such as preventing overfitting on commonly used benchmarks and ensuring '
 'trustworthy human evaluations. Additionally, it mentions preliminary '
 'experiments on integrating multimodal capabilities into Llama 3 to '
 'accelerate research in that direction.\n'
 '\n'
 'Chapter 71 discusses the decision to publicly release the Llama 3 language '
 'models to accelerate the development of AI systems for various societal use '
 'cases and enable the research community to scrutinize and improve the '
 'models. It emphasizes the role of open, responsible development of AGI '
 'models and the belief that sharing foun

In [9]:
response = vector_query_engine.query("Among the LLaMA 3 models, which ones support tool use?")
pprint.pprint(str(response))

'Llama 3.1 8B, Llama 3.1 70B, and Llama 3.1 405B support tool use.'


### Router Query Engine 

In [10]:
# initialize tools
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description="Useful for summarization questions related to the data source",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for retrieving specific context related to the data source",
)

# initialize router query engine (single selection, llm)
query_engine_single_selector = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
)

# initialize router query engine (single selection, pydantic)
query_engine_pydantic_selector = RouterQueryEngine(
    selector=PydanticSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
)

#### LLM Single Selector

In [11]:
response = query_engine_single_selector.query("Outline the key points of each chapter")
pprint.pprint(str(response))

('Chapter 1: Development of Llama 3 foundation models, focusing on '
 'pre-training and post-training stages, data optimization, scaling laws, and '
 'model architecture.\n'
 '\n'
 'Chapter 2: Model architecture of Llama 3, detailing the use of standard '
 'dense Transformer architecture with modifications for improved training '
 'stability and efficiency, and covering scaling laws for determining optimal '
 'model size.\n'
 '\n'
 'Chapter 3: Training infrastructure, scaling, and efficiency of Llama 3, '
 'including details on hardware, storage, network, and parallelism methods '
 'used for training, and challenges faced in maintaining reliability during '
 'large-scale training.\n'
 '\n'
 'Chapter 4: Optimization of Llama 3 through pre-training and post-training '
 'stages, showcasing competitive results on various benchmarks and robustness '
 'in multiple-choice question setups.\n'
 '\n'
 'Chapter 5: Safety measures in Llama 3 development, including safety '
 'benchmark construction

In [12]:
response.metadata['selector_result']

MultiSelection(selections=[SingleSelection(index=0, reason='Useful for summarization questions related to the data source')])

In [13]:
response = query_engine_single_selector.query("Among the LLaMA 3 models, which ones support tool use?")
pprint.pprint(str(response))

'Llama 3.1 8B, Llama 3.1 70B, and Llama 3.1 405B support tool use.'


In [14]:
response.metadata['selector_result']

MultiSelection(selections=[SingleSelection(index=1, reason='The question is asking for specific context related to the data source, in this case, the LLaMA 3 models and their support for tool use.')])
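The selector result printed above exposes a `selections` list whose items carry an `index` and a `reason`. A small helper can turn that into plain tuples for logging or comparison. This is a sketch: `chosen_tools` is a hypothetical helper, and the two dataclasses are stand-ins that copy only the attribute names visible in the output so the example runs without LlamaIndex.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class FakeSelection:
    """Stand-in with the same attribute names as the printed SingleSelection."""
    index: int
    reason: str


@dataclass
class FakeSelectorResult:
    """Stand-in with the same attribute name as the printed MultiSelection."""
    selections: List[FakeSelection]


def chosen_tools(selector_result: Any) -> List[Tuple[int, str]]:
    """Return (tool index, reason) pairs from an object exposing `.selections`."""
    return [(s.index, s.reason) for s in selector_result.selections]


demo = FakeSelectorResult([FakeSelection(index=1, reason="specific context")])
picked = chosen_tools(demo)
print(picked)
```

In this notebook the same helper would be called as `chosen_tools(response.metadata['selector_result'])`, where index 0 is the summary tool and index 1 is the vector tool, following the order of `query_engine_tools`.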

#### Pydantic Single Selector

In [15]:
response = query_engine_pydantic_selector.query("Outline the key points of each chapter")
pprint.pprint(str(response))

('Chapter 1: Introduction to foundation models and the development of Llama 3, '
 'emphasizing its multilinguality, coding, reasoning, and tool usage support. '
 'Details on data, scale, and complexity management in Llama 3 development.\n'
 '\n'
 'Chapter 2: Overview of the dense Transformer architecture in Llama 3, key '
 'hyperparameters for different language models, and scaling laws for '
 'determining optimal model size.\n'
 '\n'
 'Chapter 3: Information on training infrastructure, parallelism methods for '
 'model scaling, and efficiency in Llama 3. \n'
 '\n'
 'Chapter 4: Safety measures during pre-training and fine-tuning stages in '
 'Llama 3, focusing on data cleaning, safety fine-tuning, benchmark '
 'construction, and safety pre-training.\n'
 '\n'
 'Chapter 5: Safety evaluations in Llama 3, including safety fine-tuning, '
 'cybersecurity evaluation results, and safety at the system level. Discussion '
 'on safety training data, risk mitigation, and safety performance across 

In [16]:
response.metadata['selector_result']

MultiSelection(selections=[SingleSelection(index=0, reason='Summarization questions related to the data source require outlining key points of each chapter.')])

In [17]:
response = query_engine_pydantic_selector.query("Among the LLaMA 3 models, which ones support tool use?")
pprint.pprint(str(response))

'Llama 3.1 8B, Llama 3.1 70B, and Llama 3.1 405B support tool use.'


In [18]:
response.metadata['selector_result']

MultiSelection(selections=[SingleSelection(index=1, reason='The question is asking for specific context related to the data source, which is about the LLaMA 3 models and their support for tool use.')])