# Hands-on 1: QueryFusionRetriever

### 🎯 Problem
Given Paul Graham's essay, build a QueryFusionRetriever that combines a VectorStoreIndex retriever with a BM25Retriever to retrieve the context nodes most relevant to the question:

"Why was the author in Florence?"

### 🔍 Suggested tasks:
- 🔤 Create & validate a VectorStoreIndex
- 🔤 Create & validate a BM25Retriever
- 🚀 Create a QueryFusionRetriever merging vector and BM25 retrieval strategies
- 📊 Compare Retrieval Strategies

## Code

In [None]:
# If running on Colab, run this cell to install the dependencies and download the essay
%pip install httpx==0.27.2
%pip install llama-index==0.12.3
%pip install "llama-index-retrievers-bm25>=0.5.0"
%pip install openai==1.57.0
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

In [None]:
import os
os.environ["OPENAI_API_KEY"] = "sk-" # set your openai api key here

In [None]:
import nest_asyncio

nest_asyncio.apply()

In [None]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.base.base_retriever import BaseRetriever
from rich import print as rprint
from llama_index.core.schema import MetadataMode
from llama_index.retrievers.bm25 import BM25Retriever
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.core.retrievers.fusion_retriever import FUSION_MODES

In [None]:
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
splitter = SentenceSplitter(chunk_size=256)

index = VectorStoreIndex.from_documents(
    documents, transformations=[splitter], show_progress=True
)

In [None]:
QUESTION = "Why was the author in Florence?"

### Vector retrieval

In [None]:
def init_vector_retriever(index: VectorStoreIndex, similarity_top_k: int) -> BaseRetriever:
    # write here the code to initialize the vector retriever
    # ...
    raise NotImplementedError("Vector retriever initialization not implemented")

In [None]:
vector_retriever = init_vector_retriever(index, similarity_top_k=3)
vector_nodes = vector_retriever.retrieve(QUESTION)

for node in vector_nodes:
    rprint(node.get_content(
        metadata_mode=MetadataMode.LLM
    ))
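To build intuition for what the vector retriever returns, here is a minimal sketch of top-k retrieval by embedding similarity. It assumes cosine similarity as the ranking metric and uses tiny hand-made 2-dimensional vectors in place of real embeddings; the names `cosine_similarity`, `top_k_by_cosine`, and the node IDs are hypothetical and not part of LlamaIndex.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(
    query_vec: list[float], node_vecs: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k (node_id, score) pairs most similar to the query vector."""
    scored = [
        (node_id, cosine_similarity(query_vec, vec))
        for node_id, vec in node_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (assumption: real ones have hundreds of dimensions).
toy_nodes = {"n1": [1.0, 0.0], "n2": [0.0, 1.0], "n3": [1.0, 1.0]}
print(top_k_by_cosine([1.0, 0.0], toy_nodes, k=2))
```

In the notebook, `similarity_top_k` plays the role of `k`: it caps how many of the highest-scoring nodes come back.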

### Set up the BM25 retriever

In [None]:
def init_bm25_retriever(index: VectorStoreIndex, similarity_top_k: int) -> BM25Retriever:
    # write here the code to initialize the BM25Retriever
    # ...
    raise NotImplementedError("BM25Retriever is not implemented yet")

In [None]:
bm25_retriever = init_bm25_retriever(index, similarity_top_k=3)
bm25_nodes = bm25_retriever.retrieve(QUESTION)
for node in bm25_nodes:
    rprint(node.get_content(
        metadata_mode=MetadataMode.LLM
    ))
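BM25 ranks by keyword overlap rather than by embeddings. The sketch below scores toy documents with the Okapi BM25 formula (using the common `ln(1 + ...)` IDF variant). It is an illustration only: the tokenizer is a naive whitespace split, the defaults `k1=1.5` and `b=0.75` are assumed, and the sentences are made up rather than taken from the essay. A production retriever would typically also apply stemming and stop-word removal.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and trim surrounding punctuation."""
    tokens = (raw.strip(".,;:!?\"'()").lower() for raw in text.split())
    return [tok for tok in tokens if tok]


def bm25_scores(
    query: str, docs: list[str], k1: float = 1.5, b: float = 0.75
) -> list[float]:
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq: Counter = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization penalizes longer-than-average documents.
            length_norm = 1.0 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Made-up sentences, used only to exercise the scoring function.
toy_docs = [
    "The painter travelled to Florence to study art",
    "The startup raised money in Boston",
    "Lisp is a programming language",
]
print(bm25_scores("Florence", toy_docs))
```

Because only exact term matches contribute, a misspelled query term (for example "Florance") would score zero against every document here, whereas an embedding-based retriever may still find related chunks.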

### Merge the results of the two retrievers

In [None]:
def init_query_fusion_retriever(
    retrievers: list[BaseRetriever], similarity_top_k: int, mode: FUSION_MODES, num_queries: int = 1
) -> QueryFusionRetriever:
    # write here the code to initialize the QueryFusionRetriever
    # ...
    raise NotImplementedError("QueryFusionRetriever initialization not implemented")

In [None]:
retriever = init_query_fusion_retriever([vector_retriever, bm25_retriever], similarity_top_k=3, mode=FUSION_MODES.RECIPROCAL_RANK)
nodes = retriever.retrieve(QUESTION)
for node in nodes:
    rprint(node.get_content(
        metadata_mode=MetadataMode.LLM
    ))
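The `RECIPROCAL_RANK` mode used above merges ranked lists by position rather than by raw score, which helps because vector and BM25 scores live on different scales. Here is a minimal sketch of reciprocal rank fusion on plain lists of node IDs. It assumes 1-based ranks and the commonly used smoothing constant `k = 60`; the function name and the IDs are hypothetical, and the library's own implementation may differ in details.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: float = 60.0, top_k: int = 3
) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists; each hit adds 1 / (k + rank) to its ID."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            fused[node_id] += 1.0 / (k + rank)
    ordered = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ordered[:top_k]


# Hypothetical result lists from two retrievers, best match first.
vector_ids = ["a", "b", "c"]
bm25_ids = ["b", "d", "a"]
for node_id, score in reciprocal_rank_fusion([vector_ids, bm25_ids]):
    print(node_id, round(score, 5))
```

A node that appears in both lists accumulates two contributions, so items that both retrievers agree on tend to rise above items that only one retriever ranks highly.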

Compare the results of the two individual retrievers with the merged results!
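One simple way to make the comparison concrete is to measure how much the result sets overlap. The sketch below computes pairwise Jaccard overlap between lists of node IDs; the helper names and the example IDs are hypothetical. In the notebook, the ID lists could be built from the retrieved results, for example with `[n.node.node_id for n in vector_nodes]`.

```python
from itertools import combinations


def jaccard(a: list[str], b: list[str]) -> float:
    """Size of the intersection divided by size of the union (1.0 if both empty)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def pairwise_overlap(results: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Jaccard overlap for every pair of named result lists."""
    return {
        (first, second): jaccard(results[first], results[second])
        for first, second in combinations(results, 2)
    }


# Hypothetical node IDs standing in for real retrieval results.
example = {
    "vector": ["a", "b", "c"],
    "bm25": ["b", "c", "d"],
    "fusion": ["b", "c", "a"],
}
for pair, overlap in pairwise_overlap(example).items():
    print(pair, overlap)
```

A low overlap between the vector and BM25 lists suggests the two strategies surface different evidence, which is the situation where fusing them is most likely to help.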