<a href="https://colab.research.google.com/drive/1Z_FEocd1xLg1KEh8FCs8ho61H5ItJwHA?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>

### What is Re-ranking?

Re-ranking in RAG is a second-pass step that rescores and reorders the initially retrieved passages before they are fed into a generative AI model. It acts as a filter, ensuring that the most relevant, highest-quality content is prioritized for the generation task.

### Key aspects:
1. Relevance optimization: Improves the quality of information used by the LLM.
2. Intelligent sorting: Rescores each retrieved passage against the query with a dedicated ranking model (FlashRank in this notebook) and reorders the passages by that score.
3. Context consideration: Takes into account the query intent and user context.
4. Integration point: Sits between retrieval and generation components in the RAG pipeline.

By effectively re-ranking retrieved information, RAG systems can significantly enhance the accuracy, relevance, and overall quality of the generated AI responses.
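
The two-stage idea described above can be sketched in plain Python. This is a toy illustration only: `cheap_score` and `fine_score` are hypothetical stand-ins for the vector search and the re-ranking model, and the passages are made up. It is not how FlashRank scores documents.

```python
# Toy two-stage pipeline: a cheap first pass retrieves candidates,
# a finer second pass re-ranks them. Both scorers are illustrative
# stand-ins (assumptions), not real retrieval or ranking models.

def cheap_score(query: str, passage: str) -> int:
    """Stage 1: number of distinct query words found in the passage."""
    passage_words = set(passage.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in passage_words)


def fine_score(query: str, passage: str) -> float:
    """Stage 2: share of the passage's words that are query words."""
    query_words = set(query.lower().split())
    words = passage.lower().split()
    if not words:
        return 0.0
    return sum(1 for word in words if word in query_words) / len(words)


def retrieve(query: str, passages: list[str], k: int) -> list[str]:
    """Keep the k passages with the highest cheap score."""
    return sorted(passages, key=lambda p: cheap_score(query, p), reverse=True)[:k]


def rerank(query: str, candidates: list[str], top_n: int) -> list[str]:
    """Reorder the candidates by the finer score and keep the top_n."""
    return sorted(candidates, key=lambda p: fine_score(query, p), reverse=True)[:top_n]


passages = [
    "machine learning is a subfield of artificial intelligence",
    "the washing machine manual has a section on learning the control panel settings",
    "chess engines have a long history",
    "machine learning systems improve with data",
]
query = "machine learning"

candidates = retrieve(query, passages, k=3)  # first pass: broad and cheap
top = rerank(query, candidates, top_n=2)     # second pass: narrow and precise

for position, passage in enumerate(top, start=1):
    print(position, passage)
```

The first pass cannot tell the washing-machine passage from the relevant ones, because all three contain both query words. The second pass separates them, which is the role the re-ranker plays between retrieval and generation.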


### Setup

1. **[LLM](https://groq.com/):** Groq's free open-source LLM endpoints ([Groq API Key](https://console.groq.com/keys))
2. **[Vector Store](https://www.pinecone.io/learn/vector-database/):** [ChromaDB](https://www.trychroma.com/)
3. **[Embedding Model](https://qdrant.tech/articles/what-are-embeddings/):** [nomic-embed-text-v1.5](https://www.nomic.ai/blog/posts/nomic-embed-text-v1)
4. **[LLM Framework](https://python.langchain.com/v0.2/docs/introduction/):** LangChain
5. **[Huggingface API Key](https://huggingface.co/settings/tokens)**

### Install required libraries

In [None]:
!pip install -q -U \
     Sentence-transformers==3.0.1 \
     langchain==0.3.19 \
     langchain-groq==0.2.4 \
     langchain-community==0.3.18 \
     langchain-huggingface==0.1.2 \
     einops==0.8.1 \
     chromadb==0.6.3 \
     flashrank==0.2.10

### Import the LangChain and Hugging Face embedding libraries

In [2]:
# Import libraries
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_groq import ChatGroq

# Third-party integrations are imported from langchain_community
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.document_compressors.flashrank_rerank import FlashrankRerank
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_text_splitters import CharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import (
    LLMChainExtractor,
    EmbeddingsFilter,
)
# Import the RetrievalQA chain for question-answering tasks
from langchain.chains import RetrievalQA



In [3]:
import getpass
import os

### Provide a Groq API key. You can create one at the following link to access free open-source models.

[Groq API Creation Link](https://console.groq.com/keys)




In [4]:
os.environ["GROQ_API_KEY"] = getpass.getpass()



### Provide a Hugging Face API key. You can create one at the following link.

[Huggingface API Creation Link](https://huggingface.co/settings/tokens)




In [5]:
os.environ["HF_TOKEN"] = getpass.getpass()



In [6]:
# Helper function for printing docs
def pretty_print_docs(docs):
    # Iterate through each document and format the output
    for i, d in enumerate(docs):
        print(f"{'-' * 50}\nDocument {i + 1}:")
        print(f"Content:\n{d.page_content}\n")
        print("Metadata:")
        for key, value in d.metadata.items():
            print(f"  {key}: {value}")
    print(f"{'-' * 50}")  # Final separator for clarity

### Step 1: Load and preprocess the data

In [7]:
def load_and_process_data(url):
    loader = WebBaseLoader(url)
    data = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = text_splitter.split_documents(data)

    for idx, chunk in enumerate(chunks):
        chunk.metadata["id"] = idx

    return chunks
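
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-width splitter. It is a sketch under assumptions: `sliding_window_chunks` is a hypothetical helper, and it cuts at fixed character offsets, whereas `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph breaks and spaces. Small numbers are used so the overlap is visible.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width chunks whose edges overlap.

    Simplified illustration only; not the algorithm used by
    RecursiveCharacterTextSplitter.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(text, chunk_size=10, chunk_overlap=3)

for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one. With the notebook's settings (500 and 50), the same overlap keeps a sentence that straddles a chunk boundary retrievable from either side.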

### Step 2: Create the vector store

In [8]:
def create_vector_store(chunks):
    # nomic-embed-text-v1.5 ships custom model code, so trust_remote_code is required
    embeddings = HuggingFaceEmbeddings(
        model_name="nomic-ai/nomic-embed-text-v1.5",
        model_kwargs={"trust_remote_code": True},
    )
    vectorstore = Chroma.from_documents(chunks, embeddings)
    return vectorstore
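
Conceptually, the vector store answers a query by returning the chunks whose embeddings lie closest to the query embedding. The sketch below shows that idea with hand-made three-dimensional vectors. The vectors, the `top_k` helper, and the choice of cosine similarity are all assumptions for illustration; the metric Chroma actually applies depends on how the collection is configured, and real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Return the k stored texts most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up embeddings: each axis loosely stands for one topic
toy_store = [
    ("chunk about machine learning", [0.9, 0.1, 0.0]),
    ("chunk about healthcare", [0.1, 0.9, 0.0]),
    ("chunk about ethics", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.0]  # pretend embedding of a machine-learning question

nearest = top_k(query_vec, toy_store, k=2)
print(nearest)
```

This first-pass ordering is what the re-ranker later receives as its candidate list.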

### Step 3: Define the re-ranking RAG pipeline

In [9]:
def reranking_rag(query, vectorstore, llm):
    # Set up the document compressor using FlashRank
    compressor = FlashrankRerank()

    # Create a compression retriever
    compression_retriever = ContextualCompressionRetriever(
        base_compressor=compressor,
        base_retriever=vectorstore.as_retriever()
    )

    # Create a RetrievalQA chain
    chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)

    # Generate response
    response = chain.invoke(query)

    return {
        "query": query,
        "final_answer": response["result"],
        "retrieval_method": "Re-ranking with FlashRank"
    }

### Step 4: Chunk the web data and load it into the Chroma vector store

In [None]:
# Initialize the language model (change the temperature and other parameters as required)
llm = ChatGroq(
    model="llama3-8b-8192",
    temperature=0.5
)

# Load and process data
url = "https://en.wikipedia.org/wiki/Artificial_intelligence"
chunks = load_and_process_data(url)

# Create vector store
vectorstore = create_vector_store(chunks)



In [None]:
# Example queries
queries = [
    "What are the main applications of artificial intelligence in healthcare?",
    "Explain the concept of machine learning and its relationship to AI.",
    "Discuss the ethical implications of AI in decision-making processes.",
]

# Run Re-ranking RAG for each query
for query in queries:
    print(f"\nQuery: {query}")
    result = reranking_rag(query, vectorstore, llm)
    print("Final Answer:")
    print(result["final_answer"])
    print("\nRetrieval Method:")
    print(result["retrieval_method"])

# Demonstrate retrieval and re-ranking
demo_query = "Explain the concept of machine learning and its relationship to AI"
print(f"\nDemonstration Query: {demo_query}")

# Retrieve documents before re-ranking
docs_before = vectorstore.similarity_search(demo_query)
print("\nDocuments before re-ranking:")
pretty_print_docs(docs_before)

# Retrieve and re-rank documents
compressor = FlashrankRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever()
)
# invoke() replaces the deprecated get_relevant_documents()
docs_after = compression_retriever.invoke(demo_query)
print("\nDocuments after re-ranking:")
pretty_print_docs(docs_after)
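
Reading two long printouts side by side makes it hard to see what the re-ranker changed. Since Step 1 stores an `id` in each chunk's metadata, one option is to compare the order of those ids before and after. The helper below is a hypothetical sketch, and the ids in the example are made up rather than taken from a real run.

```python
def rank_changes(ids_before: list[int], ids_after: list[int]) -> list[tuple]:
    """Return (chunk_id, old_rank, new_rank) for each chunk kept after re-ranking.

    Ranks are 1-based. old_rank is None if the chunk was not in the
    original list.
    """
    old_rank = {chunk_id: pos for pos, chunk_id in enumerate(ids_before, start=1)}
    return [
        (chunk_id, old_rank.get(chunk_id), new_pos)
        for new_pos, chunk_id in enumerate(ids_after, start=1)
    ]


# Made-up chunk ids for illustration only
ids_before = [12, 40, 7, 33]  # order returned by the vector store
ids_after = [7, 12, 40]       # order returned by the re-ranker

changes = rank_changes(ids_before, ids_after)
for chunk_id, old, new in changes:
    print(f"chunk {chunk_id}: rank {old} -> {new}")
```

In the notebook, the two lists could be built with `[d.metadata["id"] for d in docs_before]` and the same expression over `docs_after`. This assumes the installed re-ranker passes the original metadata through unchanged; if it does not, match the chunks by `page_content` instead.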