
<a href="https://colab.research.google.com/github/edumunozsala/llamaindex-RAG-techniques/blob/main/reciprocal_rerank_fusion.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Reciprocal Rerank Fusion Retriever and Hybrid Search 

In this example, we walk through how to combine retrieval results from multiple queries and multiple retrievers.

The retrieved nodes are reranked with the Reciprocal Rank Fusion (RRF) algorithm described in this [paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) (LlamaIndex calls it "reciprocal rerank"). RRF is an efficient way to rerank retrieval results without heavy computation or reliance on external models.

*"In the search for such a method we came up with Reciprocal Rank Fusion (RRF) to serve as a baseline. We found that RRF, when used to combine the results of IR methods (including learning to rank), almost invariably improved on the best of the combined results. We also found that RRF consistently equaled or bettered other methods we tried, including established metaranking standards Condorcet Fuse and CombMNZ."*
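For reference, the paper scores each document $d$ in a document set $D$ by summing reciprocal ranks over the set $R$ of rankings being fused. Here $r(d)$ is the 1-based position of $d$ in ranking $r$, and the constant $k = 60$ (fixed in the paper) damps the influence of top positions produced by outlier systems:

```latex
\mathrm{RRFscore}(d \in D) = \sum_{r \in R} \frac{1}{k + r(d)}, \qquad k = 60
```

A document that does not appear in a ranking contributes nothing for that ranking. For example, a document ranked 1st by one retriever and 3rd by another scores $\frac{1}{61} + \frac{1}{63} \approx 0.0164 + 0.0159 = 0.0323$.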

Full credit goes to the following GitHub repository for its [example implementation](https://github.com/Raudaschl/rag-fusion).

In [None]:
import os
import openai

Load the API keys from a `.env` file:

In [1]:
from dotenv import load_dotenv

# Load environment variables (such as the OpenAI API key) from a .env file
load_dotenv()

True

In [12]:
# NOTE: This is ONLY necessary in Jupyter notebooks.
# Details: Jupyter runs an event loop behind the scenes.
#          Starting another event loop to make async queries therefore creates nested event loops.
#          This is normally not allowed, so we use nest_asyncio to allow it for convenience.
import nest_asyncio

nest_asyncio.apply()
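To see why the patch is needed, the sketch below reproduces the underlying failure in plain Python: once an event loop is running (as it always is inside a Jupyter kernel), `asyncio.run()` refuses to start a second one. `fetch_answer` is a hypothetical stand-in for an async retrieval call; this illustrates the problem `nest_asyncio.apply()` works around and is not LlamaIndex code. Run it as a standalone script, since in a notebook where `nest_asyncio` is already applied the inner call would succeed.

```python
import asyncio


async def fetch_answer() -> int:
    """Hypothetical stand-in for an async retrieval call."""
    return 42


async def call_run_inside_loop() -> str:
    """Try to start a second event loop while one is already running."""
    coro = fetch_answer()
    try:
        asyncio.run(coro)  # fails: an event loop is already running here
    except RuntimeError as exc:
        coro.close()  # the coroutine never started; close it to avoid a warning
        return str(exc)
    return "no error"


message = asyncio.run(call_run_inside_loop())
print(message)
```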

## Setup


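If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙. The code uses the pre-0.10 import paths (for example `from llama_index import VectorStoreIndex`), so it expects a pre-0.10 release such as 0.9.x.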

### Load the data

In [2]:
from pathlib import Path
from llama_index import download_loader

PDFReader = download_loader("PDFReader")

loader = PDFReader()
documents = loader.load_data(file=Path('./data/Attention is all you need.pdf'))

Next, we set up a vector index over the loaded documents, splitting them into chunks of 512 tokens.

In [3]:
from llama_index import VectorStoreIndex, ServiceContext

service_context = ServiceContext.from_defaults(chunk_size=512)

index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)
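Conceptually, a vector index embeds each chunk and, at query time, returns the chunks whose embeddings are closest to the query embedding. The sketch below assumes a toy bag-of-words "embedding" in place of a real embedding model and uses plain cosine similarity; `embed`, `cosine`, and `vector_retrieve` are hypothetical helpers that only illustrate the top-k idea behind `similarity_top_k`.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[tuple[str, float]]:
    """Return the top_k chunks ranked by cosine similarity to the query."""
    q = embed(query)
    scored = [(chunk, cosine(q, embed(chunk))) for chunk in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


chunks = [
    "the transformer relies on self attention",
    "convolutional networks use local filters",
    "bytenet and convs2s are convolutional models",
]
for chunk, score in vector_retrieve("convolutional models", chunks):
    print(f"{score:.3f}  {chunk}")
```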

## Create a Hybrid Fusion Retriever

In this step, we fuse our vector index with a BM25-based retriever. This lets us capture both semantic relations and exact keyword matches in our input queries.
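BM25 scores a chunk by how often the query terms occur in it, weighted by how rare each term is across the corpus and normalized by chunk length. Below is a minimal Okapi BM25 sketch in plain Python, assuming whitespace tokenization and the common parameter values `k1 = 1.5` and `b = 0.75`. It is not the `rank_bm25` implementation that `BM25Retriever` relies on, and the IDF formula used here is one of several variants in use.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every doc (non-empty list) against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a larger inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by doc length via b.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


docs = [
    "bytenet uses convolutions",
    "the transformer uses attention",
    "attention is all you need",
]
print(bm25_scores("transformer attention", docs))
```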

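Since both retrievers return ranked lists of nodes, we can use the reciprocal rerank algorithm to re-sort the combined results without an additional model or heavy computation. RRF only uses each node's position within a list, not its raw score, so it does not matter that BM25 scores and embedding similarities are on different scales.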
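A minimal sketch of the fusion step, following the paper's formula with 1-based ranks and `k = 60`. The node IDs are made up for illustration, and LlamaIndex's internal implementation may differ in details such as how it indexes ranks and deduplicates nodes.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: float = 60.0, top_n: int = 2
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of node IDs (best first) with RRF."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            # Each list a node appears in adds 1 / (k + rank) to its score.
            fused[node_id] += 1.0 / (k + rank)
    ordered = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ordered[:top_n]


vector_ranking = ["node_a", "node_b"]  # e.g. top-2 by cosine similarity
bm25_ranking = ["node_c", "node_a"]    # e.g. top-2 by BM25
print(reciprocal_rank_fusion([vector_ranking, bm25_ranking]))
```

Here `node_a` comes out on top because it appears in both lists, even though it is first in only one of them.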

This setup also queries 4 times: once with your original query and once with each of 3 additional queries generated by the LLM.

By default, it uses the following prompt to generate extra queries:

```python
QUERY_GEN_PROMPT = (
    "You are a helpful assistant that generates multiple search queries based on a "
    "single input query. Generate {num_queries} search queries, one on each line, "
    "related to the following input query:\n"
    "Query: {query}\n"
    "Queries:\n"
)
```
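To make the flow concrete, here is a sketch of how the prompt above could be filled in and the LLM's reply turned into the list of queries to run. Since the original query is always run too, the sketch asks for `num_queries - 1` new ones. It assumes the LLM returns one query per line, as the prompt requests; `fake_llm` is a hypothetical stand-in for a real completion call, and the regex that strips leading "1." style numbering is our own addition rather than LlamaIndex behavior.

```python
import re

QUERY_GEN_PROMPT = (
    "You are a helpful assistant that generates multiple search queries based on a "
    "single input query. Generate {num_queries} search queries, one on each line, "
    "related to the following input query:\n"
    "Query: {query}\n"
    "Queries:\n"
)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM completion call."""
    return (
        "1. What is the role of transformers in CNNs?\n"
        "2. Can transformers replace CNNs?\n"
        "\n"
        "3. How do transformers and CNNs differ?"
    )


def generate_queries(query: str, num_queries: int = 4, llm=fake_llm) -> list[str]:
    """Return the original query followed by num_queries - 1 generated ones."""
    prompt = QUERY_GEN_PROMPT.format(num_queries=num_queries - 1, query=query)
    lines = llm(prompt).split("\n")
    # Drop blank lines and any leading "1." / "2)" numbering the LLM adds.
    generated = [re.sub(r"^\s*\d+[.)]\s*", "", line).strip() for line in lines]
    generated = [q for q in generated if q]
    return [query] + generated[: num_queries - 1]


for q in generate_queries("How are transformers related to convolutional neural networks?"):
    print(q)
```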

First, we create our retrievers. Each will retrieve the top-2 most similar nodes:

In [11]:
from llama_index.retrievers import BM25Retriever
from llama_index.retrievers import QueryFusionRetriever

**Note:** `BM25Retriever` requires the `rank_bm25` package (`pip install rank_bm25`).

In [7]:
vector_retriever = index.as_retriever(similarity_top_k=2)

bm25_retriever = BM25Retriever.from_defaults(
    docstore=index.docstore, similarity_top_k=2
)

Next, we create our fusion retriever. It runs both retrievers for each of the 4 queries, fuses the results (up to 4 × 2 × 2 = 16 candidate nodes before duplicates are merged), and returns the top-2 nodes:

In [13]:
# Create the Fusion Retriever
retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=2,
    num_queries=4,  # set this to 1 to disable query generation
    mode="reciprocal_rerank",
    use_async=True,
    verbose=True,
    # query_gen_prompt="...",  # we could override the query generation prompt here
)
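With `use_async=True`, the retrieval calls can run concurrently instead of one after another. The sketch below shows such a fan-out with `asyncio.gather`, assuming, as a simplification, that every query is sent to every retriever; `vector_stub` and `bm25_stub` are hypothetical async retrievers that return fixed node IDs. It runs as-is as a standalone script, and in this notebook it relies on the `nest_asyncio.apply()` call made earlier.

```python
import asyncio


async def vector_stub(query: str) -> list[str]:
    """Hypothetical async retriever returning top-2 node IDs."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return ["node_a", "node_b"]


async def bm25_stub(query: str) -> list[str]:
    """Hypothetical async retriever returning top-2 node IDs."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return ["node_c", "node_a"]


async def fan_out(queries: list[str], retrievers) -> dict[tuple[str, int], list[str]]:
    """Run every retriever on every query concurrently; key results by (query, retriever index)."""
    pairs = [(q, i) for q in queries for i in range(len(retrievers))]
    # gather returns results in the same order as the awaitables passed in.
    results = await asyncio.gather(*(retrievers[i](q) for q, i in pairs))
    return dict(zip(pairs, results))


queries = ["original query", "generated 1", "generated 2", "generated 3"]
results = asyncio.run(fan_out(queries, [vector_stub, bm25_stub]))
print(len(results), "ranked lists,", sum(len(r) for r in results.values()), "candidate nodes")
```

The 8 ranked lists produced this way are what the reciprocal rerank step then fuses into a single top-k list.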

Now we can test our fusion retriever:

In [14]:
nodes_with_scores = retriever.retrieve(
    "How are transformers related to convolutional neural networks?"
)

Generated queries:
1. What is the role of transformers in convolutional neural networks?
2. Can transformers be used as an alternative to convolutional neural networks?
3. How do transformers and convolutional neural networks differ in their approach to processing data?


In [15]:
for node in nodes_with_scores:
    print(f"Score: {node.score:.2f} - {node.text}...\n-----\n")

Score: 0.08 - The Transformer allows for signiﬁcantly more parallelization and can reach a new state of the art in
translation quality after being trained for as little as twelve hours on eight P100 GPUs.
2 Background
The goal of reducing sequential computation also forms the foundation of the Extended Neural GPU
[16], ByteNet [ 18] and ConvS2S [ 9], all of which use convolutional neural networks as basic building
block, computing hidden representations in parallel for all input and output positions. In these models,
the number of operations required to relate signals from two arbitrary input or output positions grows
in the distance between positions, linearly for ConvS2S and logarithmically for ByteNet. This makes
it more difﬁcult to learn dependencies between distant positions [ 12]. In the Transformer this is
reduced to a constant number of operations, albeit at the cost of reduced effective resolution due
to averaging attention-weighted positions, an effect we counteract with Mult

## Use in a Query Engine!

Now, we can plug our retriever into a query engine to synthesize natural language responses.

In [16]:
from llama_index.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(retriever)
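Conceptually, the query engine retrieves nodes and then asks the LLM to answer using their text as context. The sketch below assumes the simplest "put all retrieved text into one prompt" strategy; `QA_TEMPLATE` and `build_qa_prompt` are hypothetical and do not reproduce LlamaIndex's actual response synthesizer or its default prompt.

```python
QA_TEMPLATE = (
    "Use the following context to answer the question.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Question: {question}\n"
    "Answer: "
)


def build_qa_prompt(question: str, node_texts: list[str]) -> str:
    """Join retrieved node texts into one context block and fill the template."""
    context = "\n\n".join(text.strip() for text in node_texts)
    return QA_TEMPLATE.format(context=context, question=question)


prompt = build_qa_prompt(
    "How are transformers related to convolutional neural networks?",
    [
        "ByteNet and ConvS2S use convolutional neural networks.",
        "The Transformer relies entirely on attention.",
    ],
)
print(prompt)
```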

In [17]:
response = query_engine.query("How are transformers related to convolutional neural networks?")

Generated queries:
1. What is the role of transformers in convolutional neural networks?
2. How do transformers enhance the performance of convolutional neural networks?
3. Can transformers be used as an alternative to convolutional neural networks?


In [18]:
from llama_index.response.notebook_utils import display_response

display_response(response)

**`Final Response:`** Transformers and convolutional neural networks (CNNs) are both used in the field of deep learning, but they have different architectures and purposes. Transformers rely entirely on self-attention mechanisms to compute representations of input and output sequences, while CNNs use convolutional layers to extract local features from input data. Transformers are designed to capture global dependencies between positions in a sequence, while CNNs are well-suited for tasks that require spatial hierarchies and translation invariance. While both models have their strengths and weaknesses, they are not directly related in terms of their architecture or functionality.