# ReRanking + Hybrid BM25 + FAISS

* BM25 is a classical information-retrieval scoring function.
* Intuition: "A document is relevant if it contains the same words as the query, especially rare words."
* BM25 scores documents using **term frequency (TF)**, **inverse document frequency (IDF)**, and **document length normalization**.
* Limitation: it does not understand meaning/semantics.
* BM25 looks for exact word matches, so it is very precise on keywords.
* FAISS takes the query embedding (query -> embedding) and finds documents that are close in context/semantics.
* FAISS -> semantic recall
* BM25 + FAISS -> a strong combination for real-world data
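
To make the three ingredients concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. The whitespace tokenizer, the values `k1=1.5` and `b=0.75`, the "+1" IDF variant, and the toy documents are all assumptions for illustration; `BM25Retriever` may differ in these details.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # exact matching only: unseen terms contribute nothing
            # IDF: rare terms get a larger weight
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # TF saturates through k1; b controls document length normalization
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy documents, made up for illustration
docs = [
    "harry caught the golden snitch",
    "the owl delivered the post",
    "the train left the station",
]
scores = bm25_scores("golden snitch", docs)
```

Only the first document shares words with the query, so it is the only one with a non-zero score. A query such as "flying ball" would score zero everywhere, which is the semantic gap that the embedding search covers.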

* Retrieval -> Which documents *might* be relevant?
* Reranking -> Which of the retrieved documents is *actually* the best for this query?
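
The two stages can be sketched with toy scorers. Everything here is hypothetical: `retrieve` uses plain word overlap as the cheap first stage, and `toy_pair_score` (shared word bigrams) only stands in for a real cross-encoder, which would read the query and document together.

```python
def retrieve(query, corpus, k):
    """Stage 1 (cheap, recall-oriented): keep the k docs sharing the most words with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def bigrams(text):
    words = text.lower().split()
    return set(zip(words, words[1:]))

def toy_pair_score(query, doc):
    # Hypothetical stand-in for a cross-encoder: counts shared word bigrams
    return len(bigrams(query) & bigrams(doc))

def rerank(query, candidates, score_fn, top_n):
    """Stage 2 (expensive, precision-oriented): rescore only the retrieved candidates."""
    ranked = sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)
    return ranked[:top_n]

# Toy corpus, made up for illustration
corpus = [
    "who said the match was won by luck",
    "gryffindor won the match",
    "the owl delivered the post",
]
query = "who won the match"
candidates = retrieve(query, corpus, k=2)
best = rerank(query, candidates, toy_pair_score, top_n=1)
```

Stage 1 ranks the "luck" sentence first because it shares more individual words with the query; stage 2 looks at word order and promotes the sentence that actually answers the question.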

In [18]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter, CharacterTextSplitter, SentenceTransformersTokenTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_classic.schema import Document
from langchain_community.retrievers import BM25Retriever
from langchain_classic.retrievers import EnsembleRetriever
from langchain_classic.retrievers import ContextualCompressionRetriever 
from langchain_classic.retrievers.document_compressors import CrossEncoderReranker 
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from sentence_transformers import CrossEncoder
import json, os
from langchain_huggingface import HuggingFacePipeline 
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
from langchain_classic.chains import RetrievalQA

In [2]:
#Load pdf
pdf_path = "book1.pdf"
pdf = PyPDFLoader(pdf_path)
docs = pdf.load() 
print("Number of loaded documents are ", len(docs))
print("Example of one document \n ", docs[35].page_content[:250])

Number of loaded documents are  228
Example of one document 
  34 Harry Potter  
 
been trying to do. He shouted at Harry for about half an hour and 
then told him to go and make a cup of tea. Harry shuffled miser-
ably off into the kitchen, and by the time he got back, the post  
had arrived, right into Uncle V


In [3]:
#Create chunks 
chunk_size = 600 
chunk_overlap = 150 

splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size,
                                          chunk_overlap=chunk_overlap,
                                          separators=["\n\n", "\n", " ", ""]
                                          )
chunked_docs = splitter.split_documents(docs)
print("Chunks created ", len(chunked_docs))

Chunks created  1057
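
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-width sliding-window chunker. This is an assumption-laden sketch, not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to cut at the listed separators, so its chunks vary in length.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-width windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks

chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
# chunks == ["abcd", "cdef", "efgh", "ghij"]
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, at the cost of storing some text twice.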


In [4]:
#Load embeddings model
embeddings_model = HuggingFaceEmbeddings(model_name = "KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5",
                                         model_kwargs = {"device" : "cuda"},
                                         encode_kwargs = {"normalize_embeddings" : True})

#Create vector store from embeddings model 
vectorstore = FAISS.from_documents(chunked_docs, embedding=embeddings_model)
vectorstore.save_local("faiss_book1_index")

In [5]:
# Creating retriever for chain operations 
faiss_retriever = vectorstore.as_retriever(search_type="similarity",
                                           search_kwargs = {"k":3})

#Load BM25 Retriever
bm25_retriever = BM25Retriever.from_documents(chunked_docs)
bm25_retriever.k = 5  # number of BM25 results to return

In [6]:
# Weights follow the order of `retrievers`: BM25 = 0.4, FAISS = 0.6
# TODO: check why FAISS should get the larger weight
hybrid_retriever = EnsembleRetriever(retrievers=[bm25_retriever, faiss_retriever],
                                     weights=[0.4, 0.6])

In [7]:
# Hybrid retriever test 

query = "What was HArry's house"
docs = hybrid_retriever.invoke(query)

for i in docs:
    print(i.page_content[0:100])

chasing him as usual when, as much to Harry’s surprise as anyone 
else’s, there he was sitting on th
was screaming! Harry snapped it shut, but the shriek went on and 
on, one high, unbroken, ear-splitt
mouldy blankets in the second room and made up a bed for 
Dudley on the moth-eaten sofa. She and Unc
was dreading the dawn. What would happen when the rest of 
Gryffindor found out what they’d done? 
A
90 Harry Potter  
 
Sometimes, Harry noticed, the hat shouted out the house at 
once, but at others 
156 Harry Potter  
 
‘Why not?’ 
‘I dunno, I’ve just got a bad feeling about it – and anyway, 
you’v
tank had vanished. The great snak e was uncoiling itself rapidly, 
slithering out on to the floor – 
will have classes with the rest of your house, sleep in your house 
dormitory and spend free time in


Only the retrieval step has been augmented so far, by combining embedding search with BM25 scoring.
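
The merge performed by the hybrid retriever above can be sketched as weighted Reciprocal Rank Fusion (RRF), the kind of rank-based fusion `EnsembleRetriever` applies. The constant `c=60`, the document ids, and the two toy rankings are assumptions for illustration.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion."""
    fused = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes weight / (rank + c); raw scores are never compared
            fused[doc_id] += weight / (rank + c)
    return sorted(fused, key=fused.get, reverse=True)

# Toy rankings, made up for illustration
bm25_ranking = ["d1", "d2", "d3"]
faiss_ranking = ["d3", "d1", "d4"]
fused = weighted_rrf([bm25_ranking, faiss_ranking], weights=[0.4, 0.6])
# fused == ["d1", "d3", "d4", "d2"]
```

Because only ranks are used, BM25 scores and embedding similarities never need to be put on a common scale. Documents found by both retrievers (`d1`, `d3`) rise to the top, and the fused list is the union of both result sets, which is why the test query above returns more than five documents.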

In [8]:
#Create the reranker
reranker_model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-v2-m3",
                                         model_kwargs = {"device" : "cuda"})
reranker =  CrossEncoderReranker(model = reranker_model,
                                top_n=3)

In [10]:
#Wrap hybrid retriever 
compression_retriever = ContextualCompressionRetriever(base_retriever=hybrid_retriever,
                                                       base_compressor=reranker)

In [21]:
query = "Did Harry win his Quidditch match ?"
docs = compression_retriever.invoke(query)
for i in docs:
    print(i.page_content[:150])

The Quidditch season had begun. On Saturday, Harry would be 
playing in his first match after weeks of training: Gryffindor ver-
sus Slytherin. If Gry
‘Honestly, Hermione, you think all teachers are saints or some-
thing,’ snapped Ron. ‘I’m with Harry. I wouldn’t put anything past 
Snape. But what’s 
‘Ron! Ron! Where are you? The game’s over! Harry’s won! 
We’ve won! Gryffindor are in the lead!’ shrieked Hermione, danc-
ing up and down on her seat 


In [17]:
# Load the LLM that generates answers on top of this retrieval pipeline
model_name = "Qwen/Qwen3-0.6B" 

tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="models")
model = AutoModelForCausalLM.from_pretrained(model_name, 
                                             device_map="cuda",
                                             cache_dir="models",
                                             dtype=torch.bfloat16)

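# Note: a text-generation pipeline returns prompt + completion by default
# (return_full_text=True), which is why the response printed below starts
# with the prompt template. Pass return_full_text=False to get only the answer.
pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer,
                max_new_tokens=32,
                do_sample=True,   # explicit, so that temperature is actually applied
                temperature=0.6)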

llm = HuggingFacePipeline(pipeline=pipe)

Device set to use cuda


In [19]:
qa_chain = RetrievalQA.from_chain_type(llm=llm, 
                                       chain_type="stuff", 
                                       retriever=compression_retriever,
                                       return_source_documents=True)

In [22]:
response = qa_chain.invoke({"query" : query})
print("Response : \n", response['result'])

Response : 
 Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

The Quidditch season had begun. On Saturday, Harry would be 
playing in his first match after weeks of training: Gryffindor ver-
sus Slytherin. If Gryffindor won, they would move up into second 
place in the House Championship. 
Hardly anyone had seen Harry play because Wood had decided 
that, as their secret weapon, Harry should be kept, well, secret.  
But the news that he was playing Seeker had leaked out somehow, 
and Harry didn’t know which was worse – people telling him he’d 
be brilliant or people telling him they’d be running around under-
neath him, holding a mattress.

‘Honestly, Hermione, you think all teachers are saints or some-
thing,’ snapped Ron. ‘I’m with Harry. I wouldn’t put anything past 
Snape. But what’s he after? What’s that dog guarding?’ 
Harry went to bed with his head  buzzing with the

User Query -> [BM25 Retriever + FAISS Retriever, run side by side] -> Weighted Rank Fusion -> Cross-Encoder Reranker -> Top-N Context -> LLM -> Answer + Sources
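
Since the chain was built with `return_source_documents=True`, the response dictionary carries a `source_documents` list next to `result`. A small hypothetical helper, `format_answer`, could print both. The stand-in response below only mimics that shape with placeholder text, and the `page` metadata key is assumed to be the one set by `PyPDFLoader`.

```python
from types import SimpleNamespace

def format_answer(response, preview_chars=80):
    """Render the answer followed by a short preview of each source chunk."""
    lines = ["Answer: " + response["result"].strip(), "Sources:"]
    for doc in response.get("source_documents", []):
        page = doc.metadata.get("page", "?")
        # Collapse whitespace and line breaks so each source fits on one line
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"  [page {page}] {preview}")
    return "\n".join(lines)

# Stand-in with the same shape as the chain output; the contents are placeholders
fake_response = {
    "result": " Yes. ",
    "source_documents": [
        SimpleNamespace(page_content="first line\nsecond line", metadata={"page": 7}),
    ],
}
text = format_answer(fake_response)
```

With the real chain this would be called as `print(format_answer(response))`.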