# Ensemble Retrieval


When building retrieval-augmented generation (RAG) applications, various retrieval parameters and strategies must be considered, such as chunk size, vector search, keyword search, and hybrid search.

**Concept**: What if we could try multiple strategies simultaneously and use a reranker or LLM to prune the pooled results?

This approach serves two main purposes:

1. **Enhanced Retrieval**: By pooling results from multiple strategies, we can achieve better (though more costly) retrieved results, assuming the reranker is effective.

2. **Benchmarking**: It provides a way to benchmark different retrieval strategies against each other with respect to the reranker.

## Key Purposes of the Ensemble Retriever

1. **Multi-Strategy Retrieval**: Try multiple retrieval strategies simultaneously, such as different chunk sizes and index types (vector, keyword, hybrid search). This allows for comparing the effectiveness of various approaches in one shot.

2. **Pooling Results**: Pool results from different retrieval strategies, leading to better overall retrieved results, assuming an effective reranker is used to prune the ensemble results.

3. **Benchmarking Performance**: Benchmark and compare the performance of different retrieval strategies against each other with respect to the reranker.
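
The pool-then-prune idea can be sketched without any retrieval library. The snippet below is a minimal illustration only: the corpus, both retrievers (`keyword_retriever`, `first_k_retriever`), and the word-overlap scorer are hypothetical stand-ins for real retrieval strategies and a real reranker.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, str]  # {"strategy": ..., "text": ...}

# Hypothetical toy corpus, used only for illustration.
CORPUS = [
    "Build your brand by giving away free work.",
    "Technology creates new opportunities every generation.",
    "Sleep and exercise improve focus.",
]


def tokenize(text: str) -> set:
    """Lowercase, drop the trailing period, and split on whitespace."""
    return set(text.lower().rstrip(".").split())


def keyword_retriever(query: str) -> List[Candidate]:
    """Hypothetical strategy 1: texts sharing at least one word with the query."""
    return [
        {"strategy": "keyword", "text": text}
        for text in CORPUS
        if tokenize(query) & tokenize(text)
    ]


def first_k_retriever(query: str, k: int = 2) -> List[Candidate]:
    """Hypothetical strategy 2: a naive stand-in that returns the first k texts."""
    return [{"strategy": "first_k", "text": text} for text in CORPUS[:k]]


def overlap_score(query: str, text: str) -> int:
    """Toy reranker score: number of query words found in the text."""
    return len(tokenize(query) & tokenize(text))


def ensemble_retrieve(
    query: str,
    retrievers: List[Callable[[str], List[Candidate]]],
    top_n: int = 2,
) -> List[Candidate]:
    # 1. Pool candidates from every strategy.
    pooled = [cand for retrieve in retrievers for cand in retrieve(query)]
    # 2. Score the whole pool with one reranker and keep the best top_n.
    ranked = sorted(
        pooled, key=lambda c: overlap_score(query, c["text"]), reverse=True
    )
    return ranked[:top_n]


results = ensemble_retrieve(
    "new opportunities in technology", [keyword_retriever, first_k_retriever]
)
for r in results:
    print(r["strategy"], "->", r["text"])
```

Because each candidate keeps a `strategy` label, the reranked list doubles as a comparison of the strategies: whichever labels survive near the top are the ones the reranker favors.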


In [None]:
%%capture
!pip install llama-index==0.10.37 llama-index-embeddings-openai==0.1.9 qdrant-client==1.9.1 llama-index-vector-stores-qdrant==0.2.8 llama-index-llms-openai==0.1.19 llama-index-postprocessor-cohere-rerank

In [2]:
import os
import sys
from getpass import getpass
import nest_asyncio

from IPython.display import Markdown, display

from dotenv import load_dotenv

nest_asyncio.apply()

load_dotenv("")

sys.path.append('../helpers')

from utils import setup_llm, setup_embed_model, setup_vector_store

In [3]:
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY') or getpass("Enter your OpenAI API key: ")

In [18]:
CO_API_KEY = os.environ.get('CO_API_KEY') or getpass("Enter your Cohere API key: ")

In [4]:
# QDRANT_URL = os.environ.get('QDRANT_URL') or getpass("Enter your Qdrant URL:")

QDRANT_URL=":memory:"

In [5]:
QDRANT_API_KEY = os.environ.get('QDRANT_API_KEY') or getpass("Enter your Qdrant API Key:")

In [6]:
from llama_index.core.settings import Settings
from utils import setup_llm, setup_embed_model

setup_llm(
    provider="openai",
    api_key=OPENAI_API_KEY, 
    model="gpt-4o", 
    temperature=0.75, 
    system_prompt="""Use ONLY the provided context and generate a complete, coherent answer to the user's query. 
    Your response must be grounded in the provided context and relevant to the essence of the user's query.
    """
    )

setup_embed_model(
    provider="openai", 
    model="text-embedding-3-small",
    api_key=OPENAI_API_KEY
    )

In [7]:
import random
from llama_index.core.storage.docstore import SimpleDocumentStore
from utils import get_documents_from_docstore, group_documents_by_author, sample_documents

documents = get_documents_from_docstore("../data/words-of-the-senpais")

random.seed(42)

documents_by_author = group_documents_by_author(documents)

senpai_documents = sample_documents(documents_by_author, num_samples=10)

In [12]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import VectorStoreIndex

chunk_sizes = [128, 256, 512, 1024]

nodes_list = []

vector_indices = []

for chunk_size in chunk_sizes:
    print(f"Chunk Size: {chunk_size}")
    splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=8)
    nodes = splitter.get_nodes_from_documents(senpai_documents)

    # add chunk size to nodes to track later
    for node in nodes:
        node.metadata["chunk_size"] = chunk_size
        node.excluded_embed_metadata_keys = ["chunk_size"]
        node.excluded_llm_metadata_keys = ["chunk_size"]

    nodes_list.append(nodes)

    # build vector index
    vector_index = VectorStoreIndex(nodes)
    vector_indices.append(vector_index)

Chunk Size: 128
Chunk Size: 256
Chunk Size: 512
Chunk Size: 1024
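
To make the effect of chunk size concrete, here is a simplified sketch of splitting one text at several sizes and tagging each chunk with the size that produced it. It assumes a plain word-based splitter (`split_words`, a hypothetical helper); `SentenceSplitter` itself counts tokens and respects sentence boundaries, so real chunk counts will differ.

```python
from typing import Dict, List


def split_words(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words. This is a word-based
    simplification, not the token-based logic of a real splitter.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Hypothetical 20-word text: "w0 w1 ... w19".
text = " ".join(f"w{i}" for i in range(20))

chunks_by_size: Dict[int, List[dict]] = {}
for size in (4, 8, 16):
    chunks_by_size[size] = [
        # Record the chunk size as metadata so it can be tracked later.
        {"text": chunk, "metadata": {"chunk_size": size}}
        for chunk in split_words(text, chunk_size=size, overlap=2)
    ]

for size, chunks in chunks_by_size.items():
    print(f"chunk_size={size}: {len(chunks)} chunks")
```

Smaller sizes yield many narrow chunks and larger sizes yield a few broad ones, which is the trade-off the ensemble lets the reranker arbitrate.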


## How to Create an Ensemble Retriever

1. **Define Index Nodes**: Create separate index nodes for each retrieval strategy you want to try (e.g., retrievers for different chunk sizes).

2. **Create a Summary Index**: Combine all the index nodes into a single summary index.

3. **Set Up a Recursive Retriever**: Define a recursive retriever with the root node being the summary index retriever. This will fetch results from all the underlying retrievers when a query is run.

4. **Define a Reranker**: Use a reranker (e.g., LLM-based, Cohere, Sentence Transformer) to process and prune the final retrieved set of nodes.

5. **Integrate with a Query Engine**: Create a retriever query engine that combines the recursive retriever and the reranker.

6. **Run Queries**: Execute queries through the retriever query engine to get results that leverage the ensemble of retrievers, postprocessed by the reranker.


### Walking Through the Code

**Define Index Nodes**: Create a separate `IndexNode` for the vector retriever corresponding to each chunk size (e.g., a retriever for chunk size 128, another for chunk size 256, etc.).



In [13]:
from llama_index.core.schema import IndexNode

retriever_dict = {}

retriever_nodes = []

for chunk_size, vector_index in zip(chunk_sizes, vector_indices):
    node_id = f"chunk_{chunk_size}"
    node = IndexNode(
        text=(
            "Retrieves relevant context from the sampled documents (chunk size"
            f" {chunk_size})"
        ),
        index_id=node_id,
    )
    retriever_nodes.append(node)
    retriever_dict[node_id] = vector_index.as_retriever()

**Aggregate Index Nodes**: Combine all `IndexNodes` into a single `SummaryIndex`. 

[`SummaryIndex`](https://github.com/run-llama/llama_index/blob/7849b1a851d88ee28e1bfd05d19f18e40d5b8e10/llama-index-core/llama_index/core/indices/list/base.py#L33) is a simple list-based data structure. During index construction, `SummaryIndex` takes a set of text documents as input, splits them into smaller chunks, and stores those chunks in a list. In this notebook it is built directly from the `IndexNode` objects, so there is nothing to chunk: the list simply holds one node per retriever.


*Index Construction*

1. **Chunking**: The document texts are divided into chunks.
2. **Node Creation**: Each chunk is converted into a node.
3. **Storage**: These nodes are stored in a list.

*Query Time*

When a summary index is used for answer synthesis, an initial answer to the query is constructed from the first text chunk. The answer is then refined by feeding in each subsequent chunk as additional context. Refinement can mean keeping the original answer, making small edits to it, or rewriting it completely.

1. **Iteration**: The summary index iterates through the nodes.
2. **Optional Filtering**: Some filter parameters can be applied.
3. **Answer Synthesis**: An answer is synthesized from all the nodes.

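When the summary index retriever is called, **all** of its nodes are returned, so every `IndexNode` (and therefore every chunk-size retriever) takes part in the query.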
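
The create-and-refine behavior described above can be sketched as a simple loop. This is an illustrative outline, not the library's implementation: the prompt wording is invented, and `fake_llm` is a hypothetical stand-in that records prompts instead of calling a model.

```python
from typing import Callable, List


def create_and_refine(query: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Answer from the first chunk, then refine with each later chunk."""
    if not chunks:
        return ""
    # Initial answer: only the first chunk is used as context.
    answer = llm(f"Context: {chunks[0]}\nQuestion: {query}\nAnswer:")
    # Refinement: each later chunk may keep, edit, or replace the answer.
    for chunk in chunks[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context: {chunk}\n"
            f"Question: {query}\nRefined answer:"
        )
    return answer


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical LLM stub: records the prompt and returns a versioned answer."""
    calls.append(prompt)
    return f"answer-v{len(calls)}"


final = create_and_refine("What matters?", ["chunk A", "chunk B", "chunk C"], fake_llm)
print(final, "after", len(calls), "LLM calls")
```

One LLM call is made per chunk, which is why this synthesis mode gets expensive on long lists. In this notebook the summary index is only used as a retriever, so that cost does not apply.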

**Recursive Retriever Setup**: Define a Recursive Retriever with the root node being the summary index retriever. This retriever will first fetch all nodes from the summary index and then recursively call the vector retriever for each chunk size.

1. **Recursive Exploration**: The retriever will follow links from nodes to other retrievers or query engines.

2. **IndexNode Handling**: For any retrieved nodes that are `IndexNodes`, it will:
    - Explore the linked retriever or query engine.
    - Query the linked retriever or query engine.
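
The link-following behavior can be sketched in plain Python. The `Link` class and the lambda retrievers below are hypothetical stand-ins for `IndexNode` and real retrievers; the sketch only shows the control flow, not how `RecursiveRetriever` is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Link:
    """Hypothetical stand-in for an IndexNode: points to another retriever by id."""
    index_id: str


Item = Union[str, Link]
Retriever = Callable[[str], List[Item]]


def recursive_retrieve(
    query: str, retriever_id: str, retrievers: Dict[str, Retriever]
) -> List[str]:
    """Query one retriever and follow every Link it returns."""
    results: List[str] = []
    for item in retrievers[retriever_id](query):
        if isinstance(item, Link):
            # A link is not content: query the retriever it points to instead.
            results.extend(recursive_retrieve(query, item.index_id, retrievers))
        else:
            results.append(item)
    return results


retrievers: Dict[str, Retriever] = {
    # The root plays the role of the summary index: it returns every link.
    "root": lambda q: [Link("chunk_128"), Link("chunk_256")],
    "chunk_128": lambda q: [f"128-token chunk for: {q}"],
    "chunk_256": lambda q: [f"256-token chunk for: {q}"],
}

found = recursive_retrieve("tech opportunities", "root", retrievers)
print(found)
```

The final result contains only content from the leaf retrievers; the links themselves are consumed during traversal.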


In [14]:
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core import SummaryIndex

summary_index = SummaryIndex(retriever_nodes)

retriever = RecursiveRetriever(
    root_id="root",
    retriever_dict={"root": summary_index.as_retriever(), **retriever_dict},
)

In [15]:
nodes = await retriever.aretrieve(
    "How can I effectively identify and leverage new opportunities in the tech industry?"
)

In [16]:
print(f"Number of nodes: {len(nodes)}")
for node in nodes:
    print(node.node.metadata["chunk_size"])
    print(node.node.get_text())

Number of nodes: 8
128
Think about what product or service society wants but does not yet know how to get. You want to become the person who delivers it and delivers it at scale. That is really the challenge of how to make money. Now, the problem is becoming good at whatever it is. It moves around from generation to generation, but a lot of it happens to be in technology.
128
You are waiting for your moment when something emerges in the world, they need a skill set, and youre uniquely qualified. You build your brand in the meantime on Twitter, on YouTube, and by giving away free work. You make a name for yourself, and you take some risk in the process. When it is time to move
256
It moves around from generation to generation, but a lot of it happens to be in technology. You are waiting for your moment when something emerges in the world, they need a skill set, and youre uniquely qualified. You build your brand in the meantime on Twitter, on YouTube, and by giving away free work. You ma

**Rerank Final Results**: Rerank the results obtained from all vector retrievers.


In [19]:
from llama_index.postprocessor.cohere_rerank import CohereRerank

reranker = CohereRerank(top_n=5, api_key=CO_API_KEY)

Define a retriever query engine that combines the recursive retriever with the reranker.

In [20]:
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine(retriever, node_postprocessors=[reranker])

In [23]:
response = query_engine.query(
    "How can I effectively identify and leverage new opportunities in the tech industry?"
)

In [24]:
from llama_index.core.response.notebook_utils import display_response

display_response(
    response, show_source=True, source_length=500, show_source_metadata=True
)

**`Final Response:`** To effectively identify and leverage new opportunities in the tech industry, focus on acquiring unique skills that are in demand but not easily replicable. Stay vigilant for emerging trends and technologies that require these specialized skills. In the meantime, build your personal brand through platforms like Twitter and YouTube, and offer free work to gain visibility and reputation. By doing so, you position yourself as an expert who can provide solutions that are not readily available elsewhere. When the right opportunity arises, you will be ready to take advantage of it and deliver at scale.

---

**`Source Node 1/5`**

**Node ID:** 39d200ea-1bdf-4b97-a9f3-d48dfc26ac69<br>**Similarity:** 0.37433314<br>**Text:** You are waiting for your moment when something emerges in the world, they need a skill set, and youre uniquely qualified. You build your brand in the meantime on Twitter, on YouTube, and by giving away free work. You make a name for yourself, and you take some risk in the process. When it is time to move<br>**Metadata:** {'page_number': 27, 'file_name': '../data/almanack_of_naval_ravikant.pdf', 'title': 'The Almanack of Naval Ravikant', 'author': 'Naval Ravikant', 'chunk_size': 128}<br>

---

**`Source Node 2/5`**

**Node ID:** b3570882-0fa8-4483-b34d-f1b24323ccf4<br>**Similarity:** 0.15546273<br>**Text:** It moves around from generation to generation, but a lot of it happens to be in technology. You are waiting for your moment when something emerges in the world, they need a skill set, and youre uniquely qualified. You build your brand in the meantime on Twitter, on YouTube, and by giving away free work. You make a name for yourself, and you take some risk in the process. When it is time to move<br>**Metadata:** {'page_number': 27, 'file_name': '../data/almanack_of_naval_ravikant.pdf', 'title': 'The Almanack of Naval Ravikant', 'author': 'Naval Ravikant', 'chunk_size': 256}<br>

---

**`Source Node 3/5`**

**Node ID:** e44026ab-e457-4527-b4d8-c6656472393c<br>**Similarity:** 0.09351231<br>**Text:** Youre more likely to have skills society does not yet know how to train other people to do. If someone can train other people how to do something, then they can replace you. If they can replace you, then they dont have to pay you a lot. You want to know how to do something other people dont know how to do at the time period when those skills are in demand. If they can train you to do it, then eventually they will train a computer to do it. You get rewarded by society for giving it what it wan...<br>**Metadata:** {'page_number': 27, 'file_name': '../data/almanack_of_naval_ravikant.pdf', 'title': 'The Almanack of Naval Ravikant', 'author': 'Naval Ravikant', 'chunk_size': 256}<br>

---

**`Source Node 4/5`**

**Node ID:** b323849e-faa2-4021-80e1-0ae2e397fd69<br>**Similarity:** 0.07382971<br>**Text:** Think about what product or service society wants but does not yet know how to get. You want to become the person who delivers it and delivers it at scale. That is really the challenge of how to make money. Now, the problem is becoming good at whatever it is. It moves around from generation to generation, but a lot of it happens to be in technology.<br>**Metadata:** {'page_number': 27, 'file_name': '../data/almanack_of_naval_ravikant.pdf', 'title': 'The Almanack of Naval Ravikant', 'author': 'Naval Ravikant', 'chunk_size': 128}<br>

---

**`Source Node 5/5`**

**Node ID:** e35d4bea-471b-4768-a011-668d6a2694e3<br>**Similarity:** 0.06804042<br>**Text:** Youre more likely to have skills society does not yet know how to train other people to do. If someone can train other people how to do something, then they can replace you. If they can replace you, then they dont have to pay you a lot. You want to know how to do something other people dont know how to do at the time period when those skills are in demand. If they can train you to do it, then eventually they will train a computer to do it. You get rewarded by society for giving it what it wan...<br>**Metadata:** {'page_number': 27, 'file_name': '../data/almanack_of_naval_ravikant.pdf', 'title': 'The Almanack of Naval Ravikant', 'author': 'Naval Ravikant', 'chunk_size': 512}<br>

### Analyzing the Relative Importance of Each Chunk

An interesting feature of ensemble-based retrieval is that reranking lets us assess the importance of each chunk size from the positions its chunks take in the final retrieved set.

For example, if certain chunk sizes consistently rank near the top, they are likely more relevant to the query.

### Purpose

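The goal is to evaluate the relative importance or relevance of different metadata values (such as chunk sizes) by analyzing their positions in the reranked list.

The metric is the reciprocal rank of each value's first appearance; averaged over many queries, it becomes the Mean Reciprocal Rank (MRR). This notebook evaluates a single query, so the reported MRR is simply that query's reciprocal rank. A higher MRR indicates that a specific metadata value tends to appear earlier in the ranking, implying higher relevance or importance.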


1. **Input Parameters**:
    - **metadata_values**: A list of unique values for a specific metadata key that you want to evaluate (e.g., different chunk sizes).

    - **metadata_key**: The specific metadata key to check in each node (e.g., "chunk_size").

    - **source_nodes**: A ranked list of nodes, each containing metadata.

2. **Process**:

    - For each metadata value, iterate through the ranked list of source nodes.

    - Identify the position of the first occurrence of the metadata value in the list.

    - Compute the reciprocal rank for that value: 1 / (zero-based position index + 1), or 0 if the value never appears in the list.

    - Store these reciprocal ranks in a dictionary.

3. **Output**:

    - Convert the dictionary of MRR values into a Pandas DataFrame.
    
    - Return the DataFrame, which displays the MRR for each metadata value.



In [25]:
# Compute the reciprocal rank of each chunk size from its position in the combined ranking
import pandas as pd


def mrr_all(metadata_values, metadata_key, source_nodes):
    # source_nodes is a ranked list (best first).
    # For each value, find the position of its first occurrence in source_nodes.
    value_to_mrr_dict = {}
    for metadata_value in metadata_values:
        mrr = 0
        for idx, source_node in enumerate(source_nodes):
            if source_node.node.metadata.get(metadata_key) == metadata_value:
                # Reciprocal rank: 1 / (1-based position of the first match)
                mrr = 1 / (idx + 1)
                break

        # Values that never appear in the ranking keep a score of 0
        value_to_mrr_dict[metadata_value] = mrr

    # One row labelled "MRR", one column per metadata value
    return pd.DataFrame(value_to_mrr_dict, index=["MRR"])

In [26]:
# Compute the Mean Reciprocal Rank for each chunk size (higher is better)
# we can see that chunk size of 128 has the highest ranked results.
print("Mean Reciprocal Rank for each Chunk Size")
mrr_all(chunk_sizes, "chunk_size", response.source_nodes)

Mean Reciprocal Rank for each Chunk Size


|     | 128 | 256 | 512 | 1024 |
|-----|-----|-----|-----|------|
| MRR | 1.0 | 0.5 | 0.2 | 0    |
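
These values can be cross-checked by hand from the order of the five source nodes shown earlier, whose chunk sizes are 128, 256, 256, 128, 512. The sketch below reproduces the calculation in plain Python and then shows how averaging over several queries would give a true mean. The helper names are illustrative, and the second ranking is a hypothetical example, not a result from this notebook.

```python
from typing import Dict, List


def reciprocal_ranks(values: List[int], ranking: List[int]) -> Dict[int, float]:
    """Reciprocal rank of each value's first occurrence (0.0 if absent)."""
    return {
        value: 1 / (ranking.index(value) + 1) if value in ranking else 0.0
        for value in values
    }


def mean_reciprocal_rank(
    values: List[int], rankings: List[List[int]]
) -> Dict[int, float]:
    """Average the per-query reciprocal ranks across several rankings."""
    per_query = [reciprocal_ranks(values, ranking) for ranking in rankings]
    return {v: sum(rr[v] for rr in per_query) / len(per_query) for v in values}


sizes = [128, 256, 512, 1024]

# Chunk sizes of Source Nodes 1 to 5, in reranked order.
observed = [128, 256, 256, 128, 512]
rr = reciprocal_ranks(sizes, observed)
print(rr)  # {128: 1.0, 256: 0.5, 512: 0.2, 1024: 0.0}

# Hypothetical ranking for a second query, for illustration only.
hypothetical = [256, 128, 1024]
print(mean_reciprocal_rank(sizes, [observed, hypothetical]))
```

With a single query the 128 chunks look best, but a conclusion about chunk size in general needs the average over a representative set of queries.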
