# What is Re-ranking?

Re-ranking in retrieval-augmented generation (RAG) refines and reorders the initially retrieved passages before they are passed to the generative model. It acts as a second-stage filter, so that the most relevant, highest-quality content is prioritized for generation.

# Key aspects:
1. Relevance optimization: Improves the quality of information used by the LLM.
2. Intelligent sorting: Uses advanced algorithms to reassess and reorder retrieved passages.
3. Context consideration: Takes into account the query intent and user context.
4. Integration point: Sits between retrieval and generation components in the RAG pipeline.

By effectively re-ranking retrieved information, RAG systems can significantly enhance the accuracy, relevance, and overall quality of the generated AI responses.
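
The idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the FlashRank algorithm: `rerank` and `token_overlap` are hypothetical names, and the token-overlap score is only a toy stand-in for the learned relevance model a real re-ranker would use.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    passages: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and keep the top_n best passages."""
    scored = [(passage, score_fn(query, passage)) for passage in passages]
    # Sort by descending score; the sort is stable, so ties keep
    # their original retrieval order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def token_overlap(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query tokens found in the passage."""
    query_tokens = set(query.lower().split())
    passage_tokens = set(passage.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & passage_tokens) / len(query_tokens)


# Passages in the order an initial retriever might have returned them (made-up data).
retrieved = [
    "AI is used in finance and logistics.",
    "Machine learning is a subfield of AI research.",
    "Healthcare uses AI for diagnosis.",
]
reranked = rerank("machine learning and AI", retrieved, token_overlap, top_n=2)
for passage, score in reranked:
    print(f"{score:.2f}  {passage}")
```

The second passage moves to the top because it matches more of the query, and the third is dropped by `top_n`. In the pipeline below, FlashRank plays the role of `score_fn`.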

# Re-ranking RAG Implementation:

1. **Initial Retrieval:** We use the Chroma vector store's retriever to get relevant documents.
2. **Re-ranking:** We employ FlashRank (via FlashrankRerank) to re-rank the initially retrieved documents.
3. **Context Formation:** We combine the top re-ranked documents into a single context string.
4. **Response Generation:** Using the Gemini Pro model, we generate a final response based on the re-ranked context and the query.

# Setup

1. **[LLM](https://deepmind.google/technologies/gemini/pro/):** Google's free gemini-pro API endpoint ([Google API key](https://console.cloud.google.com/apis/credentials))
2. **[Vector Store](https://www.pinecone.io/learn/vector-database/):** [ChromaDB](https://www.trychroma.com/)
3. **[Embedding Model](https://qdrant.tech/articles/what-are-embeddings/):** [nomic-embed-text-v1.5](https://www.nomic.ai/blog/posts/nomic-embed-text-v1)
4. **[LLM Framework](https://python.langchain.com/v0.2/docs/introduction/):** LangChain
5. **[Huggingface API Key](https://huggingface.co/settings/tokens)**

# Install required libraries

In [1]:
!pip install -q -U \
     sentence-transformers==3.0.1 \
     langchain==0.2.11 \
     langchain-google-genai==1.0.7 \
     langchain-chroma==0.1.2 \
     langchain-community==0.2.10 \
     langchain-huggingface==0.0.3 \
     einops==0.8.0 \
     flashrank==0.2.8
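
After installation, it can help to confirm which versions were actually installed. The sketch below uses only the standard library; `installed_versions` is a hypothetical helper, not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# A few of the packages pinned in the install cell above.
pinned = ["langchain", "langchain-community", "langchain-chroma", "flashrank"]
for name, ver in installed_versions(pinned).items():
    print(f"{name}: {ver or 'not installed'}")
```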


# Import LangChain and Hugging Face embedding libraries

In [17]:
# Import Libraries
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_google_genai import (
    ChatGoogleGenerativeAI,
    HarmBlockThreshold,
    HarmCategory,
)
from langchain_chroma import Chroma  # provided by the langchain-chroma package installed above
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors.flashrank_rerank import FlashrankRerank
from langchain.schema import HumanMessage, SystemMessage

In [3]:
import getpass
import os

# Provide your Google API key. You can create one at the following link:

[Google Gemini-Pro API Creation Link](https://console.cloud.google.com/apis/credentials)

[YouTube Video](https://www.youtube.com/watch?v=ZHX7zxvDfoc)



In [4]:
os.environ["GOOGLE_API_KEY"] = getpass.getpass()

··········


# Provide your Hugging Face API key. You can create one at the following link:

[Hugging Face API Creation Link](https://huggingface.co/settings/tokens)




In [5]:
os.environ["HF_TOKEN"] = getpass.getpass()

··········
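
When the notebook is re-run, prompting for keys that are already set is unnecessary. A small sketch of one way to avoid that follows; `ensure_env_key` is a hypothetical helper, and its `prompt_fn` parameter exists only so the function can be exercised without an interactive prompt.

```python
import getpass
import os
from typing import Callable


def ensure_env_key(name: str, prompt_fn: Callable[[str], str] = getpass.getpass) -> str:
    """Return os.environ[name], prompting for it only when it is unset or empty."""
    if not os.environ.get(name):
        os.environ[name] = prompt_fn(f"{name}: ")
    return os.environ[name]
```

With this in place, the two cells above could be written as `ensure_env_key("GOOGLE_API_KEY")` and `ensure_env_key("HF_TOKEN")`.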


In [6]:
# Helper function for printing docs
def pretty_print_docs(docs):
    # Iterate through each document and format the output
    for i, d in enumerate(docs):
        print(f"{'-' * 50}\nDocument {i + 1}:")
        print(f"Content:\n{d.page_content}\n")
        print("Metadata:")
        for key, value in d.metadata.items():
            print(f"  {key}: {value}")
    print(f"{'-' * 50}")  # Final separator for clarity

# Example usage (assuming `docs` is a list of Document objects):
# pretty_print_docs(docs)

# Step 1: Load and preprocess data

In [14]:
def load_and_process_data(url):
    loader = WebBaseLoader(url)
    data = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = text_splitter.split_documents(data)

    for idx, chunk in enumerate(chunks):
        chunk.metadata["id"] = idx

    return chunks
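
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-width splitter. It is an illustration only: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line boundaries, which this sketch does not attempt, and `chunk_with_overlap` is a hypothetical name.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into fixed-width character windows that share an overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
pieces = chunk_with_overlap(sample, chunk_size=12, chunk_overlap=4)
for piece in pieces:
    print(repr(piece))
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.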

# Step 2: Create vector store

In [8]:
def create_vector_store(chunks):
    embeddings = HuggingFaceEmbeddings(
        model_name="nomic-ai/nomic-embed-text-v1.5",
        model_kwargs={"trust_remote_code": True},
    )
    vectorstore = Chroma.from_documents(chunks, embeddings)
    return vectorstore
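
Conceptually, the vector store answers a query by comparing the query embedding with every stored chunk embedding and returning the closest ones. The sketch below shows that step with cosine similarity on made-up 2-D vectors. It is an assumption-laden illustration: the metric Chroma actually applies depends on how the collection is configured, and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 4,
) -> List[Tuple[int, float]]:
    """Return (index, similarity) for the k stored vectors closest to the query."""
    scored = [(idx, cosine_similarity(query_vec, vec)) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real nomic-embed-text vectors have hundreds of dimensions.
doc_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
nearest = top_k([1.0, 0.0], doc_vectors, k=2)
print(nearest)
```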

# Step 3: Re-ranking RAG

1. **Initial Retrieval:** We use the Chroma vector store's retriever to get relevant documents.
2. **Re-ranking:** We employ FlashRank (via FlashrankRerank) to re-rank the initially retrieved documents.
3. **Context Formation:** We combine the top re-ranked documents into a single context string.
4. **Response Generation:** Using the Gemini Pro model, we generate a final response based on the re-ranked context and the query.

In [20]:
def reranking_rag(query, vectorstore, llm):
    # Set up the re-ranker; FlashrankRerank keeps the top 3 documents by default (top_n=3)
    compressor = FlashrankRerank()

    # Wrap the base retriever so its results are re-ranked before being returned.
    # With default settings the Chroma retriever returns 4 candidates (k=4).
    compression_retriever = ContextualCompressionRetriever(
        base_compressor=compressor,
        base_retriever=vectorstore.as_retriever()
    )

    # Retrieve and re-rank documents (invoke replaces the deprecated get_relevant_documents)
    docs = compression_retriever.invoke(query)
    context = "\n\n".join(doc.page_content for doc in docs)

    # Generate response
    prompt = f"{context}\n\nQuestion: {query}\nAnswer:"
    response = llm.invoke(prompt)

    return {
        "query": query,
        "final_answer": response.content,
        "retrieval_method": "Re-ranking with FlashRank"
    }
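
The function above builds its prompt by joining the passages and appending the question. A slightly more structured variant is sketched below: it numbers the passages in ranked order, stops once a character budget is reached, and asks the model to stay within the context. The instruction wording, the `max_chars` budget, and the name `build_prompt` are all assumptions for illustration, not part of the original notebook.

```python
from typing import List


def build_prompt(query: str, passages: List[str], max_chars: int = 4000) -> str:
    """Number the ranked passages, stay within max_chars, then append the question."""
    parts = []
    used = 0  # counts passage text only, not the separators between passages
    for rank, passage in enumerate(passages, start=1):
        block = f"[{rank}] {passage.strip()}"
        if used + len(block) > max_chars:
            break  # passages are in ranked order, so the least relevant are dropped
        parts.append(block)
        used += len(block)
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


prompt = build_prompt(
    "What is ML?",
    ["ML learns from data.", "AI is broad."],
    max_chars=30,
)
print(prompt)
```

Because the passages arrive already re-ranked, truncating from the end keeps the most relevant material when the budget is tight.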

# Step 4: Load web data into the Chroma vector store

In [16]:
# Initialize the gemini-pro language model (adjust temperature and other parameters as needed)
llm = ChatGoogleGenerativeAI(
    model="gemini-pro",
    temperature=0.3,
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    },
)

# Load and process data
url = "https://en.wikipedia.org/wiki/Artificial_intelligence"
chunks = load_and_process_data(url)

# Create vector store
vectorstore = create_vector_store(chunks)



# Step 5: Run Re-ranking RAG

This implementation shows the key parts of Re-ranking RAG:

1. Initial broad retrieval of potentially relevant documents
2. Re-ranking of retrieved documents to prioritize the most relevant ones
3. Generation of a response using the re-ranked and refined context

In [21]:
# Example queries
queries = [
    "What are the main applications of artificial intelligence in healthcare?",
    "Explain the concept of machine learning and its relationship to AI.",
    "Discuss the ethical implications of AI in decision-making processes.",
]

# Run Re-ranking RAG for each query
for query in queries:
    print(f"\nQuery: {query}")
    result = reranking_rag(query, vectorstore, llm)
    print("Final Answer:")
    print(result["final_answer"])
    print("\nRetrieval Method:")
    print(result["retrieval_method"])



Query: What are the main applications of artificial intelligence in healthcare?
Final Answer:
The main applications of artificial intelligence in healthcare are to increase patient care and quality of life.

Retrieval Method:
Re-ranking with FlashRank

Query: Explain the concept of machine learning and its relationship to AI.
Final Answer:
Machine learning is the study of programs that can improve their performance on a given task automatically. It has been a part of AI from the beginning. Machine learning is a subfield of AI that focuses on developing algorithms that can learn from data. These algorithms can be used to solve a wide variety of problems, such as image recognition, natural language processing, and speech recognition.

Retrieval Method:
Re-ranking with FlashRank

Query: Discuss the ethical implications of AI in decision-making processes.
Final Answer:
The provided text does not discuss the ethical implications of AI in decision-making processes.

Retrieval Method:
Re-ranking with FlashRank

# Demonstrate retrieval and re-ranking

In [22]:
demo_query = "Explain the concept of machine learning and its relationship to AI"
print(f"\nDemonstration Query: {demo_query}")

# Retrieve documents before re-ranking
docs_before = vectorstore.similarity_search(demo_query)
print("\nDocuments before re-ranking:")
pretty_print_docs(docs_before)

# Retrieve and re-rank documents
compressor = FlashrankRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever()
)
# invoke replaces the deprecated get_relevant_documents
docs_after = compression_retriever.invoke(demo_query)
print("\nDocuments after re-ranking:")
pretty_print_docs(docs_after)


Demonstration Query: Explain the concept of machine learning and its relationship to AI

Documents before re-ranking:
--------------------------------------------------
Document 1:
Content:
Learning
Machine learning is the study of programs that can improve their performance on a given task automatically.[46] It has been a part of AI from the beginning.[e]

Metadata:
  language: en
  source: https://en.wikipedia.org/wiki/Artificial_intelligence
  title: Artificial intelligence - Wikipedia
--------------------------------------------------
Document 2:
Content:
Learning
Machine learning is the study of programs that can improve their performance on a given task automatically.[46] It has been a part of AI from the beginning.[e]

Metadata:
  id: 39
  language: en
  source: https://en.wikipedia.org/wiki/Artificial_intelligence
  title: Artificial intelligence - Wikipedia
--------------------------------------------------
Document 3:
Content:
No established unifying theory or paradigm has g