##### Self-RAG: A Dynamic Approach to Retrieval-Augmented Generation

Self-RAG combines retrieval-based and generation-based approaches in natural language processing. Rather than always retrieving, it decides per query whether retrieval is needed, filters the retrieved documents for relevance, and scores its own candidate responses, with the aim of producing more accurate, relevant, and useful outputs.

##### Method Details
1. Retrieval Decision: The algorithm first decides if retrieval is necessary for the given query. This step prevents unnecessary retrieval for queries that can be answered directly.

2. Document Retrieval: If retrieval is deemed necessary, the algorithm fetches the top-k most similar documents from a vector store.

3. Relevance Evaluation: Each retrieved document is evaluated for its relevance to the query. This step filters out irrelevant information, ensuring that only pertinent context is used for generation.

4. Response Generation: The algorithm generates responses using the relevant contexts. If no relevant contexts are found, it generates a response without retrieval.

5. Support Assessment: Each generated response is evaluated to determine how well it is supported by the context. This step helps in identifying responses that are grounded in the provided information.

6. Utility Evaluation: The utility of each response is rated, considering how well it addresses the original query.

7. Response Selection: The final step involves selecting the best response based on the support assessment and utility evaluation.
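
The seven steps above can be sketched as plain control flow before any LLM is involved. This is a minimal sketch, not the notebook's implementation: `self_rag_sketch` and every callable it takes (`needs_retrieval`, `retrieve`, `is_relevant`, `generate`, `support_score`, `utility_score`) are hypothetical stand-ins for the LLM chains and the vector store, and the numeric support score is an assumption made so that candidates can be ranked.

```python
from typing import Callable, List, Tuple


def self_rag_sketch(
    query: str,
    needs_retrieval: Callable[[str], bool],
    retrieve: Callable[[str, int], List[str]],
    is_relevant: Callable[[str, str], bool],
    generate: Callable[[str, str], str],
    support_score: Callable[[str, str], int],
    utility_score: Callable[[str, str], int],
    top_k: int = 3,
) -> str:
    # Step 1: skip retrieval when the query can be answered directly.
    if not needs_retrieval(query):
        return generate(query, "")

    # Steps 2-3: fetch the top-k documents and keep only the relevant ones.
    contexts = [c for c in retrieve(query, top_k) if is_relevant(query, c)]
    if not contexts:
        # Nothing relevant survived filtering, so generate without context.
        return generate(query, "")

    # Steps 4-6: one candidate per context, scored for support and utility.
    candidates: List[Tuple[int, int, str]] = []
    for context in contexts:
        answer = generate(query, context)
        candidates.append(
            (support_score(answer, context), utility_score(query, answer), answer)
        )

    # Step 7: prefer better-supported answers, then higher utility.
    return max(candidates, key=lambda c: (c[0], c[1]))[2]
```

Passing the steps in as callables keeps the control flow testable with simple stubs; in practice each callable would wrap an LLM call or a vector-store query.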

In [1]:
import os

from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_groq import ChatGroq
from pydantic import BaseModel, Field

# Helper from this notebook's local utility module; used below to build a vector store from a PDF
from utility import encode_pdf

# Load GROQ_API_KEY (and any other settings) from a .env file
load_dotenv()



In [2]:
# Encode the PDF into a vector store using the helper from utility
file_path = "data/Understanding_Climate_Change.pdf"
vector_store = encode_pdf(file_path)

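
`encode_pdf` comes from the local `utility` module and its internals are not shown here. The later retrieval step asks the resulting vector store for the top-k most similar documents. As a rough illustration of that idea only, the sketch below ranks toy vectors by cosine similarity. The `cosine` and `top_k` helpers and the hand-written vectors are assumptions for this example; a real vector store would use learned embeddings and an index rather than a full sort.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    doc_vecs: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[str]:
    # Rank documents by similarity to the query and keep the k best texts.
    ranked = sorted(doc_vecs, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings" written by hand, purely for illustration.
toy_docs = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
nearest = top_k([1.0, 0.0], toy_docs, k=2)
```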


In [3]:
from pydantic import BaseModel, Field

# Read the Groq API key from the environment and create the chat model
groq_api_key = os.getenv("GROQ_API_KEY")
llm = ChatGroq(groq_api_key=groq_api_key, model_name="llama-3.1-8b-instant")

##### Define prompts and chain for each step

In [11]:
#Schema and prompt for retrieval response 
class RetrievalResponse(BaseModel):
    response:str = Field(description="Determine if retrieval is necessary. The answer should be 'yes' or 'no'")

retrieval_prompt = PromptTemplate(
    template = "Given the query '{query}', determine if retrieval is necessary. Output only 'Yes' or 'No'.",
    input_variables=["query"]
)

# Schema and prompt for the relevance decision.
# The prompt asks for 'Yes'/'No' so that it matches the schema description.
class RelevanceResponse(BaseModel):
    response: str = Field(description="Determine if context is relevant. The answer should be 'yes' or 'no'")

relevance_prompt = PromptTemplate(
    template="Given the query '{query}' and the context '{context}', determine if the context is relevant. Output only 'Yes' or 'No'.",
    input_variables=["query", "context"]
)

#Schema and prompt for Generation response 
#class GenerationResponse(BaseModel):
#    response:str = Field(description="Generated response from llm. Output should be string")

generation_prompt = PromptTemplate(
    template = "Given the query '{query}' and the context '{context}', generate a response.",
    input_variables=["query","context"]
)

# Schema and prompt for the support assessment
class SupportResponse(BaseModel):
    response: str = Field(description="Determines if response is supported. Output 'Fully supported', 'Partially supported', or 'No support'")

# Ask for exactly one label so the model does not echo the response text instead
support_prompt = PromptTemplate(
    template=(
        "Given the response '{response}' and the context '{context}', "
        "determine if the response is supported by the context. "
        "Output only one of 'Fully supported', 'Partially supported', or 'No support'."
    ),
    input_variables=["response", "context"]
)

# Schema and prompt for the utility rating
class UtilityResponse(BaseModel):
    response: int = Field(description="Rate the utility of the response from 1 to 5")

utility_prompt = PromptTemplate(
    input_variables=["query", "response"],
    template="Given the query '{query}' and the response '{response}', rate the utility of the response from 1 to 5."
)

# Build a chain (prompt piped into the model) for each step
retrieval_chain = retrieval_prompt | llm.with_structured_output(RetrievalResponse)
relevance_chain = relevance_prompt | llm.with_structured_output(RelevanceResponse)
#generation_chain = generation_prompt | llm.with_structured_output(GenerationResponse)
generation_chain = generation_prompt | llm | StrOutputParser()
support_chain = support_prompt | llm.with_structured_output(SupportResponse)
utility_chain = utility_prompt | llm.with_structured_output(UtilityResponse)
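
The yes/no decisions returned by these chains are compared as plain strings further down, so a stray capital letter, quote, or trailing period can flip a decision. A minimal sketch of a normalizing helper follows. `normalize_label`, `is_positive`, and the `POSITIVE_LABELS` set are hypothetical names introduced here, not part of the notebook or of LangChain.

```python
# Labels treated as a positive decision; an assumption for this sketch.
POSITIVE_LABELS = {"yes", "relevant"}


def normalize_label(raw: str) -> str:
    # Trim whitespace, then surrounding quotes and periods, then lowercase.
    return raw.strip().strip("'\".").lower()


def is_positive(raw: str) -> bool:
    # True when the cleaned label is one of the accepted positive labels.
    return normalize_label(raw) in POSITIVE_LABELS
```

A helper like this could wrap the `.response` value of the retrieval and relevance chains before the comparison.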


##### Defining the Self-RAG logic flow

In [14]:
def self_rag(query, vectorstore, top_k=3):
    print(f"\n Processing query : {query}")

    # Step 1: Determine if retrieval is necessary
    print("Step 1: Determining if retrieval is necessary...")
    input_data = {"query": query}
    retrieval_decision = retrieval_chain.invoke(input_data).response.strip().lower()
    print(f"Retrieval decision: {retrieval_decision}")

    if retrieval_decision == 'yes':
        # Step 2: Retrieve relevant documents
        print("Step 2: Retrieving relevant documents...")
        docs = vectorstore.similarity_search(query, k=top_k)
        contexts = [doc.page_content for doc in docs]
        print(f"Retrieved {len(contexts)} documents")

        # Step 3: Evaluate relevance of retrieved documents
        print("Step 3: Evaluating relevance of retrieved documents...")
        relevant_contexts = []
        for i, context in enumerate(contexts):
            input_data = {"query": query, "context": context}
            relevance = relevance_chain.invoke(input_data).response.strip().lower()
            print(f"Document {i+1} relevance: {relevance}")
            # Accept either label the model may return for a relevant document
            if relevance in ("yes", "relevant"):
                relevant_contexts.append(context)
        
        print(f"Number of relevant contexts: {len(relevant_contexts)}")

        # If no relevant contexts found, generate without retrieval
        if not relevant_contexts:
            print("No relevant contexts found. Generating without retrieval...")
            input_data = {"query": query, "context": "No relevant context found."}
            return generation_chain.invoke(input_data)
        
        # Step 4: Generate response using relevant contexts
        print("Step 4: Generating responses using relevant contexts...")
        responses = []
        for i, context in enumerate(relevant_contexts):
            print(f"Generating response for context {i+1}...")
            input_data = {"query": query, "context": context}
            response = generation_chain.invoke(input_data)

            # Step 5: Assess support
            print(f"Step 5: Assessing support for response {i+1}...")
            input_data = {"response": response, "context": context}
            support = support_chain.invoke(input_data).response.strip().lower()
            print(f"Support assessment: {support}")

            # Step 6: Evaluate utility
            print(f"Step 6: Evaluating utility for response {i+1}...")
            input_data = {"query": query, "response": response}
            utility = int(utility_chain.invoke(input_data).response)
            print(f"Utility score: {utility}")
            
            responses.append((response, support, utility))

        # Select the best response based on support and utility
        print("Selecting the best response...")
        best_response = max(responses, key=lambda x: (x[1] == 'fully supported', x[2]))
        print(f"Best response support: {best_response[1]}, utility: {best_response[2]}")
        return best_response[0]
    else:
        # Generate without retrieval
        print("Generating without retrieval...")
        input_data = {"query": query, "context": "No retrieval necessary."}
        return generation_chain.invoke(input_data)
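
The selection step in `self_rag` only distinguishes 'fully supported' from everything else, so a partially supported answer and an unsupported one are ranked by utility alone. One possible refinement, sketched here under the assumption that the three labels from the support prompt are the only valid ones, is to map each label to a rank. `SUPPORT_RANK` and `select_best` are hypothetical names, and ranking unrecognized labels lowest is a design choice for this sketch.

```python
from typing import List, Tuple

# Higher rank means stronger support. Unknown labels fall below 'no support'.
SUPPORT_RANK = {"fully supported": 2, "partially supported": 1, "no support": 0}


def select_best(responses: List[Tuple[str, str, int]]) -> str:
    # Each item is (response_text, support_label, utility_score).
    def sort_key(item: Tuple[str, str, int]) -> Tuple[int, int]:
        _, support, utility = item
        return (SUPPORT_RANK.get(support.strip().lower(), -1), utility)

    # Compare by support rank first, then by utility.
    return max(responses, key=sort_key)[0]
```

With this ordering, a partially supported answer with utility 2 beats an unsupported answer with utility 5, whereas the original key would pick the latter.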

In [15]:
# Test Self-RAG on a query the document can answer
query = "What is the impact of climate change on the environment?"
response = self_rag(query, vector_store)

print("\nFinal response:")
print(response)


 Processing query : What is the impact of climate change on the environment?
Step 1: Determining if retrieval is necessary...
Retrieval decision: yes
Step 2: Retrieving relevant documents...
Retrieved 3 documents
Step 3: Evaluating relevance of retrieved documents...
Document 1 relevance: yes
Document 2 relevance: yes
Document 3 relevance: yes
Number of relevant contexts: 3
Step 4: Generating responses using relevant contexts...
Generating response for context 1...
Step 5: Assessing support for response 1...
Support assessment: the impact of climate change on the environment is multifaceted and far-reaching.
Step 6: Evaluating utility for response 1...
Utility score: 5
Generating response for context 2...
Step 5: Assessing support for response 2...
Support assessment: content='the impact of climate change on the environment is multifaceted and far-reaching. climate change is altering terrestrial ecosystems by shifting habitat ranges, changing species distributions, and impacting ecosy

In [16]:
query = "how did harry beat quirrell?"
response = self_rag(query, vector_store)

print("\nFinal response:")
print(response)


 Processing query : how did harry beat quirrell?
Step 1: Determining if retrieval is necessary...
Retrieval decision: yes
Step 2: Retrieving relevant documents...
Retrieved 3 documents
Step 3: Evaluating relevance of retrieved documents...
Document 1 relevance: irrelevant
Document 2 relevance: irrelevant
Document 3 relevance: irrelevant
Number of relevant contexts: 0
No relevant contexts found. Generating without retrieval...

Final response:
content="I'd be happy to help you with that.  However, I must inform you that I couldn't find any relevant context for your question. Nonetheless, I can provide a general response.\n\nHarry Potter managed to defeat Lord Voldemort (not Quirrell) in their final battle. However, if you are asking about the battle between Harry Potter and Quirrell, it's because Quirrell possessed Voldemort's physical form at the time.\n\nIn the book 'Harry Potter and the Philosopher's Stone,' Harry Potter defeats Quirrell with the help of his mother's love. When Vold