In this notebook, we implement RAG (retrieval-augmented generation) using hybrid search. **First**, after splitting our documents into chunks, we create a vector-similarity retriever from their embeddings and a keyword (lexical) retriever. We then combine the two retrievers with an Ensemble retriever, so that for each user query the results of both retrievers are merged and passed to the LLM for answer generation.

## Installing dependencies

In [1]:
!pip install --quiet langchain langchain_community # LangChain framework and its community integrations
%pip install --upgrade --quiet huggingface_hub
!pip install --quiet faiss-cpu # FAISS vector store
!pip install --quiet pypdf # PDF parser used by PyPDFLoader
!pip install --quiet langchain_huggingface # Hugging Face endpoints, embeddings and chat models
!pip install --quiet chromadb # Chroma vector store
!pip install --quiet langchain_core
!pip install --quiet rank_bm25 # BM25 implementation used by BM25Retriever

Note: you may need to restart the kernel to use updated packages.


In [2]:
from langchain_huggingface import HuggingFaceEndpoint # access to models hosted on the Hugging Face Hub
from langchain_huggingface import HuggingFaceEmbeddings # embeds the documents for the vector store
from langchain_huggingface import ChatHuggingFace # chat-model wrapper around the endpoint
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.document_loaders import PyPDFLoader, PyMuPDFLoader
from langchain_text_splitters import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS, Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.retrievers import BM25Retriever # keyword (BM25) retriever
from langchain.retrievers import EnsembleRetriever # fuses the results of several retrievers

## Let's first load our data; it's a PDF, so we use PyPDFLoader

In [3]:
pdfloader = PyPDFLoader('/kaggle/input/intro-todatascience/datascience.pdf')
docs = pdfloader.load() # one Document per PDF page
# There are many document loaders; for more info see https://python.langchain.com/docs/integrations/document_loaders/

## We split our document into chunks

In [4]:
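# Split the pages into chunks of at most ~1000 characters, with no overlap between consecutive chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = splitter.split_documents(docs)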
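`RecursiveCharacterTextSplitter` tries coarse separators first (by default paragraphs `"\n\n"`, then lines `"\n"`, then words `" "`, then single characters `""`) and only falls back to a finer separator when a piece is still longer than `chunk_size`. The sketch below illustrates that strategy in plain Python. It is a simplified assumption-level version, not LangChain's implementation: the function name `recursive_split` is hypothetical, it supports no `chunk_overlap`, and it drops the separators at chunk boundaries.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split `text` into pieces of at most `chunk_size` characters,
    preferring the coarsest separator that works (simplified sketch)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut into fixed-size character windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        # Try to greedily merge this piece into the chunk being built.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too long on its own: recurse with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks

chunks = recursive_split("aaaa bbbb\n\ncccc dddd eeee", chunk_size=10)
print(chunks)
```

Here the first paragraph fits in one chunk, while the second is too long and gets split again on spaces.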

## Now we create embeddings for vector-similarity search retriever

In [5]:
# Embed each chunk with all-mpnet-base-v2 and index the vectors in a Chroma collection
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
db = Chroma.from_documents(texts, embedding=embeddings)

In [6]:
vector_retriever = db.as_retriever(search_type='similarity',search_kwargs = {'k':5})
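Conceptually, the similarity retriever embeds the query and returns the `k` stored chunks whose vectors are closest to it. The sketch below shows that idea with cosine similarity over made-up 3-dimensional vectors (real all-mpnet-base-v2 embeddings have 768 dimensions). The helper names `cosine` and `top_k` are hypothetical. Note that Chroma's default distance metric is L2 rather than cosine; for unit-length vectors both give the same ranking.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, doc_vecs, k=5):
    """Return the indices of the k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy "embeddings" of three chunks and one query (illustrative values only).
doc_vecs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query_vec = [0.9, 0.1, 0.0]
print(top_k(query_vec, doc_vecs, k=2))
```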

## Then we create our keyword (lexical) retriever, using BM25Retriever
**BM25** (Best Matching 25) is a probabilistic ranking function that scores documents by their relevance to a given query. It improves on the traditional TF-IDF (Term Frequency-Inverse Document Frequency) model by adding term-frequency saturation and document-length normalization: it counts how often the query terms appear in a document, gives diminishing returns for repeated occurrences, and adjusts for the document's length. BM25Retriever works as follows:

1. It precomputes term frequencies, document lengths, and inverse document frequencies for the corpus.
2. When a query is issued, it calculates the BM25 score of every document in the corpus.
3. It ranks the documents by their BM25 scores.
4. It returns the top k most relevant documents.

![Online Image](https://www.kopp-online-marketing.com/wp-content/uploads/2024/05/Screenshot-2024-06-14-093457-e1718354274506.png)
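To make the scoring described above concrete, here is the standard Okapi BM25 formula, where $f(q, D)$ is the frequency of term $q$ in document $D$, $|D|$ is the document length, and $\text{avgdl}$ is the average document length:

$$\text{score}(D, Q) = \sum_{q \in Q} \text{IDF}(q) \cdot \frac{f(q, D)\,(k_1 + 1)}{f(q, D) + k_1 \left(1 - b + b \frac{|D|}{\text{avgdl}}\right)}$$

The sketch below implements it from scratch. Assumptions: `k1=1.5` and `b=0.75` are common defaults; the IDF is the non-negative variant $\ln\left(\frac{N - n_q + 0.5}{n_q + 0.5} + 1\right)$ used by Lucene (rank_bm25, which BM25Retriever relies on, uses a slightly different IDF); tokenization is a naive lowercase whitespace split; the function name `bm25_scores` and the toy corpus are made up for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` (list of strings) against `query` with Okapi BM25."""
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term (precomputed once).
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequencies within this document
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant: rare terms weigh more than common ones.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # k1 controls term-frequency saturation, b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

corpus = [
    "data science explores the world using data",
    "cloud computing democratizes analysis",
    "neuroscience and psychology research",
]
scores = bm25_scores("data science", corpus)
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print(ranking[0])  # index of the top-ranked document
```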

In [7]:
keyword_retriever = BM25Retriever.from_documents(documents=texts,k=5)

## Now we combine the two retrievers using Ensemble Retriever
**Ensemble Retriever** is a retrieval mechanism that combines multiple retrievers to improve document retrieval accuracy. Instead of relying on a single retrieval method (like BM25 or dense retrieval), it aggregates results from different retrievers to enhance performance.

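There are several ways to fuse the results of multiple retrievers; common ones are **Rank Fusion (Reciprocal Rank Fusion - RRF)**, which combines documents by their rank positions in each result list, and **Score Fusion (Weighted Average)**, which combines the retrievers' normalized relevance scores.

LangChain's `EnsembleRetriever` implements **weighted Reciprocal Rank Fusion**: every retriever that returns a document contributes `weight / (rank + c)` to that document's score (with `c = 60` by default), and the merged results are re-ranked by the summed score. The `weights` argument controls how much each retriever counts; in this notebook both get equal weight.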

In [8]:
ensemble_retriever = EnsembleRetriever(
    retrievers=[keyword_retriever, vector_retriever],
    weights=[0.5, 0.5]
)
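The weighted Reciprocal Rank Fusion performed by `EnsembleRetriever` can be sketched in plain Python as follows. This is an illustration of the idea, not LangChain's code: documents are represented by hypothetical string IDs (LangChain deduplicates actual `Document` objects), ranks start at 1, and `c=60` mirrors the default constant.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each document scores sum(weight / (rank + c)) over the lists it appears in,
    with ranks starting at 1; documents are returned by descending fused score.
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top-3 results from the keyword and vector retrievers.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
fused = weighted_rrf([bm25_hits, vector_hits], weights=[0.5, 0.5])
print(fused)
```

Documents returned by both retrievers (`doc_a`, `doc_c`) rise to the top, and every unique document from either list is kept, which is why the LLM sees the combined output of both retrievers. Because RRF uses only ranks, it sidesteps the fact that BM25 scores and embedding similarities live on different scales.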

## We load our LLM and define a prompt for it

In [9]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
hf_tkn = user_secrets.get_secret("HUGGINGFACEHUB_API_TOKEN")

llm = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    max_new_tokens=512,
    do_sample=False,
    repetition_penalty=1.03,
    huggingfacehub_api_token=hf_tkn
)

chat_model = ChatHuggingFace(llm=llm)


In [10]:
prompt_text = """You are an assistant who is an expert in question-answering tasks.
                Answer the following question using only the following pieces of 
                retrieved context.
                If the answer is not in the context, do not make up answers, just 
                say that you don't know.
                Keep the answer detailed and well formatted based on the 
                information from the context.
                Do not begin with 'Based on the provided context...' and capitalize
                the first letter.
                
                Question:
                {question}
                
                Context:
                {context}
                
                Answer:
            """

rag_prompt_template = ChatPromptTemplate.from_template(prompt_text)

In [14]:
def format_docs(docs):
    # Concatenate the retrieved chunks into one context string, separated by blank lines
    return "\n\n".join(doc.page_content for doc in docs)

# The dict runs both branches on the same input: the ensemble retriever (then format_docs)
# fills {context}, while RunnablePassthrough forwards the user's question unchanged to {question}
retrieval_chain = (
    {"context": ensemble_retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt_template
    | chat_model
    | StrOutputParser()
)
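The data flow of the first two steps of this chain can be mimicked in plain Python, which may help when debugging what the prompt actually receives. This sketch does not use LangChain: `fake_retriever` is a hypothetical stand-in for `ensemble_retriever` returning canned strings instead of `Document` objects, and `TEMPLATE` is a shortened version of the prompt above.

```python
def format_docs(docs):
    """Join retrieved chunks (plain strings here) with blank lines."""
    return "\n\n".join(docs)

def fake_retriever(question):
    # Hypothetical stand-in for ensemble_retriever: ignores the question, returns canned chunks.
    return ["Chunk about probing reality.", "Chunk about predicting future events."]

TEMPLATE = "Question:\n{question}\n\nContext:\n{context}\n\nAnswer:"

def build_prompt(question):
    # Mirrors {"context": retriever | format_docs, "question": RunnablePassthrough()}:
    # the same question feeds both branches; the passthrough returns it unchanged.
    inputs = {
        "context": format_docs(fake_retriever(question)),
        "question": question,
    }
    # Mirrors piping the dict into the prompt template.
    return TEMPLATE.format(**inputs)

prompt = build_prompt("What strategies does data science use?")
print(prompt)
```

In the real chain, the filled prompt is then sent to `chat_model`, and `StrOutputParser` extracts the reply text.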

## Let's test our RAG pipeline

In [15]:
query = "Data science allows us to adopt four different strategies to explore the world using data,state them"
retrieval_chain.invoke(query)

'Data science allows us to explore the world using data through four different strategies: 1. Probing reality, where data can be gathered using passive or active methods; 2. Understanding people and the world, which is currently beyond the scope of most companies and individuals but is being heavily researched by large companies and governments in areas such as natural language understanding, computer vision, psychology, and neuroscience; 3. Predicting future events by using past data to reveal patterns and natural clusters that simplify problem-solving; and 4. Making informed decisions by adopting evidence-based approaches that consider all available data. These four strategies are not mutually exclusive, and they can be used individually or in combination with each other. The democratization of data analysis, facilitated by cloud computing and open-source development, has made it possible for individuals and small companies to access the same analytical tools and techniques previousl