### LlamaIndex: Router Query Engine

Here, I experiment with LlamaIndex routers, which direct each query to the most suitable query engine.

See this [DeepLearning.AI](https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/it0jz/router-query-engine) course for more details.

### Load Environment

Load environment variables and allow asyncio to be used in the notebook.



In [12]:
from dotenv import load_dotenv
load_dotenv()
import textwrap 
import nest_asyncio
nest_asyncio.apply()
import llama_index.core
llama_index.core.__version__

'0.12.36'
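
The notebook does not say which variables the `.env` file holds. As a minimal sketch, assuming the OpenAI key is supplied through `OPENAI_API_KEY` (the variable the OpenAI client reads by default), a small check like the one below can fail fast before any API call is made. The helper name `missing_env_vars` is hypothetical.

```python
import os


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Assumption: the OpenAI key is provided via OPENAI_API_KEY.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Running this right after `load_dotenv()` makes a missing or empty key visible immediately instead of surfacing later as an authentication error.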

### Retrieve and Load Documents

In [2]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
import httpx
import os

files = ['https://arxiv.org/pdf/2505.10543']

os.makedirs('./data', exist_ok=True)

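for url in files:
    # arXiv URLs end in the paper ID, which becomes the local file name.
    file_name = f"./data/{url.split('/')[-1]}.pdf"
    if not os.path.exists(file_name):
        r = httpx.get(url, timeout=20, follow_redirects=True)
        # Fail loudly instead of saving an error page as a PDF.
        r.raise_for_status()
        with open(file_name, 'wb') as out:
            out.write(r.content)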

documents = SimpleDirectoryReader(input_files=["./data/2505.10543.pdf"]).load_data()
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
print(f"Number of nodes after splitting: {len(nodes)}")
print(nodes[0].get_content(metadata_mode="all"))

Number of nodes after splitting: 20
page_label: 1
file_name: 2505.10543.pdf
file_path: data\2505.10543.pdf
file_type: application/pdf
file_size: 328049
creation_date: 2025-05-21
last_modified_date: 2025-05-21

arXiv:2505.10543v1  [cs.AI]  15 May 2025
Towards a Deeper Understanding of Reasoning
Capabilities in Large Language Models
Annie Wong ,*, Thomas Bäck, Aske Plaat, Niki van Stein and Anna V . Kononova
Leiden Institute of Advanced Computer Science
Abstract. While large language models demonstrate impressive
performance on static benchmarks, the true potential of large lan-
guage models as self-learning and reasoning agents in dynamic envi-
ronments remains unclear. This study systematically evaluates the ef-
ficacy of self-reflection, heuristic mutation, and planning as prompt-
ing techniques to test the adaptive capabilities of agents. We con-
duct experiments with various open-source language models in dy-
namic environments and find that larger models generally outperform
smalle
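
To make the splitting step more concrete, here is a simplified sketch of sentence-aware chunking. It is not `SentenceSplitter`'s implementation: the real splitter measures chunk size in tokens and supports overlap between chunks, while this version counts whitespace-separated words and uses no overlap. The function name `split_into_chunks` and the `max_words` parameter are hypothetical.

```python
import re


def split_into_chunks(text, max_words=50):
    """Greedily pack whole sentences into chunks of at most `max_words` words.

    A single sentence longer than `max_words` becomes its own oversized chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


print(split_into_chunks("One two three. Four five six. Seven eight nine.", max_words=6))
```

The key property shared with the real splitter is that chunk boundaries fall between sentences rather than in the middle of one.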

### Configure LlamaIndex

In [3]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="o4-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

### Create indices

Vector and Summary indices are created below.

In [4]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)
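
The two indices differ mainly in what they retrieve at query time: a summary index hands every node to the LLM, while a vector index returns only the nodes most similar to the query. The toy sketch below illustrates that contrast in plain Python. It is an assumption-laden stand-in, using bag-of-words cosine similarity where the notebook uses OpenAI embeddings, and all function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Very rough stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_all(chunks, query):
    """Summary-style retrieval: every chunk is returned regardless of the query."""
    return list(chunks)


def retrieve_top_k(chunks, query, k=2):
    """Vector-style retrieval: only the k chunks most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(bag_of_words(c), q), reverse=True)
    return ranked[:k]


chunks = [
    "the bandit task rewards exploration",
    "tower of hanoi needs spatial planning",
    "larger models outperform smaller models",
]
print(len(retrieve_all(chunks, "bandit rewards")))    # 3
print(retrieve_top_k(chunks, "bandit rewards", k=1))  # ['the bandit task rewards exploration']
```

This is why a summarization query is expensive (all chunks are read) while a targeted question touches only a handful of chunks.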

### Create Query Engines

Each query engine combines a lookup on its index with a query to the LLM.

In [6]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()

Let's get a summary of the document.

In [18]:
response = summary_query_engine.query("Please provide a concise sentence that summarizes the document.")
wrapped_text = textwrap.fill(str(response), width=140, replace_whitespace=False)
print(wrapped_text)

This paper systematically evaluates self-reflection, heuristic mutation, and planning prompts on open-source LLMs in dynamic decision-making
tasks, showing that while strategic prompting can boost smaller models, it yields unstable gains and highlights persistent reasoning,
planning, and spatial coordination limitations.


### Create The Tools 

Wrap each query engine in a tool whose description tells the router when to use it.

In [12]:
from llama_index.core.tools import QueryEngineTool


summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to the paper."
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the paper."
    ),
)

### Create A Router

Below, a router query engine is created with an LLM-based selector that picks a single tool for each query.

In [13]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector


query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)
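
The routing idea itself can be sketched without any LLM. In the toy version below, the selector picks the tool whose description shares the most words with the query. This keyword overlap is only a stand-in for `LLMSingleSelector`, which asks the LLM to read the descriptions and choose. The `Tool` class, `select_tool`, and `route` are hypothetical names, not LlamaIndex APIs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    description: str
    run: Callable[[str], str]


def select_tool(query: str, tools: List[Tool]) -> int:
    """Return the index of the tool whose description best overlaps the query.

    Stand-in for an LLM selector; ties go to the earliest tool.
    """
    query_words = set(query.lower().split())
    overlaps = [len(query_words & set(t.description.lower().split())) for t in tools]
    return overlaps.index(max(overlaps))


def route(query: str, tools: List[Tool]) -> str:
    """Select one tool for the query and return that tool's answer."""
    return tools[select_tool(query, tools)].run(query)


tools = [
    Tool("summary of the whole paper", lambda q: "summary engine"),
    Tool("specific results and details from the paper", lambda q: "vector engine"),
]
print(route("give me a summary", tools))              # summary engine
print(route("what are the specific results", tools))  # vector engine
```

The sketch shows why the tool descriptions matter: they are the only information the selector has about each engine.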

### Submitting Queries

#### Request A Summary Of The Document

In [20]:
response = query_engine.query("Provide a summary of the document and its conclusions.")

[1;3;38;5;200mSelecting query engine 0: The question asks for a summary of the document and its conclusions, which aligns with choice 1's focus on summarization..
[0m

In [None]:
wrapped_text = textwrap.fill(str(response), width=150, replace_whitespace=False)
print(wrapped_text)

This paper investigates how well open-source large language models (LLMs) can learn, reason, and plan on the fly when faced with simple dynamic tasks,
without any additional fine-tuning.  The authors assemble a unified “agent” framework that, at each time step, can be equipped with one or more of
three in-context modules:  
• Self-Reflection, which reviews the sequence of past states, actions and rewards to suggest how to improve future
choices;  
• Oracle (heuristic mutation), which evolves rule-like heuristics across episodes via a simple evolutionary strategy;  
• Planner, which
looks ahead a few steps by simulating possible action sequences and scoring their expected cumulative rewards.  

They evaluate four models of
increasing size (8 B to 70 B parameters) on four “SmartPlay” environments:  
1. Two-armed bandit (exploration/exploitation)  
2. Rock-Paper-Scissors
(adapting to an opponent’s biased play)  
3. Tower of Hanoi with three disks (spatial planning)  
4. Messenger (navigat
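
In the output above, bullets and list items run into the preceding text. A likely cause is that `textwrap.fill` with `replace_whitespace=False` keeps the response's own newlines but still counts them as ordinary characters when measuring line width. A minimal sketch of one workaround is to wrap each existing line separately; the helper name `wrap_preserving_lines` is hypothetical.

```python
import textwrap


def wrap_preserving_lines(text, width=150):
    """Wrap each existing line on its own so the original line breaks survive."""
    wrapped = []
    for line in text.splitlines():
        # textwrap.fill returns "" for an empty line, which keeps blank lines.
        wrapped.append(textwrap.fill(line, width=width))
    return "\n".join(wrapped)


print(wrap_preserving_lines("aaa bbb ccc\n\n• item one", width=7))
```

Calling `print(wrap_preserving_lines(str(response)))` in place of the `textwrap.fill` call would keep each bullet on its own line.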

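Let's look at some of the metadata for the results. As expected, the router selected the summary query engine to service the query, so all 20 nodes were used as sources. The selector's reasoning appears to number the choices from 1, so its "choice 1" corresponds to query engine 0 in the log line above.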

In [32]:
print(response.metadata['selector_result'].reason)
print(len(response.source_nodes))

The question asks for a summary of the document and its conclusions, which aligns with choice 1's focus on summarization.
20


#### Ask A More Specific Question

In [37]:
response = query_engine.query("What are the results of the Two-armed bandit evaluation?")

[1;3;38;5;200mSelecting query engine 1: The question seeks specific experimental results from the paper, so retrieving context is most relevant..
[0m

In [38]:
wrapped_text = textwrap.fill(str(response), width=150, replace_whitespace=False)
print(wrapped_text)

In the two-armed bandit experiments the key findings were:

• Baseline (simple count-and-exploit) wins for smaller and mid-sized models.  
  – LLAMA
3-8B: Baseline median ≈ 40.35 (CI 37.45–41.65) → Reflection+Planner drops to ≈ 34.00 (30.00–35.00).  
  – DEEPSEEK-R1-14B: Baseline ≈ 41.00
(40.55–41.40) → Reflection+Planner ≈ 32.05 (29.00–33.00).

• Only the largest model benefits from more complex prompting.  
  – LLAMA 3.3-70B:
Baseline max ≈ 41.90 → Reflection+Planner max ≈ 48.00.

• Why complexity hurts smaller models:  
  1. Extra prompt text dilutes the reward‐count
signal, lowering signal-to-noise.  
  2. Reflection/Oracle/Planner encourage continued exploration even when one arm is clearly better, causing the
agent to “overthink” and converge more slowly.

• Overall, sheer model size drives the strongest performance; in-context prompting alone cannot fully
bridge the gap.


Because this was a specific question, the router selected the vector query engine, and only 2 nodes were retrieved to generate the LLM response. That count matches the vector retriever's default `similarity_top_k` of 2.

In [None]:
print(response.metadata['selector_result'].reason)
print(len(response.source_nodes))


The question seeks specific experimental results from the paper, so retrieving context is most relevant.
2
