<a href="https://colab.research.google.com/drive/10V7Xf6Jcyz_yTml4KNXjyn7IrSbuaZpU?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>

### ✅ What is Corrective-RAG?

Corrective RAG adds a verification-and-correction step between retrieval and the final response. The goal is to reduce errors and inconsistencies in the generated output by cross-checking the retrieved information against known facts or trusted sources. This often involves a separate model or module dedicated to fact-checking and error correction. In this notebook, the verification step is a self-critique by the same LLM, followed by a second retrieval pass.

### 🔧 Corrective RAG Implementation

1. **Initial Retrieval**: We retrieve the top 3 most relevant documents for the query.
2. **Initial Response Generation**: Using the retrieved context, we generate an initial response.
3. **Critique Generation**: We ask the model to critique its own response, identifying potential errors or missing information.
4. **Additional Retrieval**: Based on the critique, we retrieve additional relevant documents.
5. **Final Response Generation**: We generate an improved response considering the initial response, critique, and additional context.
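The five steps above can be sketched as plain control flow, independent of any library. This is a minimal illustration, not the notebook's implementation: `corrective_rag_flow`, `toy_retrieve`, `toy_generate`, and `TOY_CORPUS` are hypothetical names, and the retriever and generator are toy stand-ins for the vector store and the LLM.

```python
from typing import Callable, Dict, List

def corrective_rag_flow(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
) -> Dict[str, str]:
    """Run the five Corrective RAG steps with caller-supplied functions."""
    # 1. Initial retrieval: top 3 documents for the query.
    initial_docs = retrieve(query, 3)
    # 2. Initial response from the retrieved context.
    initial = generate(f"Context: {' '.join(initial_docs)}\nQuery: {query}")
    # 3. Critique of the initial response.
    critique = generate(f"Critique this response.\nQuery: {query}\nResponse: {initial}")
    # 4. Additional retrieval, using the critique as the search text.
    extra_docs = retrieve(critique, 2)
    # 5. Final response from the initial response, critique, and extra context.
    final = generate(
        f"Initial Response: {initial}\nCritique: {critique}\n"
        f"Additional Context: {' '.join(extra_docs)}\nQuery: {query}"
    )
    return {"initial": initial, "critique": critique, "final": final}

# Toy stand-ins (assumptions for illustration only).
TOY_CORPUS = [
    "AI powers web search engines",
    "AI drives recommendation systems",
    "AI is used in healthcare",
    "AI plays strategy games",
]

def toy_retrieve(text: str, k: int) -> List[str]:
    # Rank documents by the number of words they share with the search text.
    words = set(text.lower().split())
    ranked = sorted(
        TOY_CORPUS,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def toy_generate(prompt: str) -> str:
    # A real LLM call would go here; this only reports the prompt size.
    return f"[model saw {len(prompt)} characters]"

result = corrective_rag_flow("What is AI used for?", toy_retrieve, toy_generate)
print(result["final"])
```

Passing the retriever and generator in as functions keeps the flow testable without network access or API keys.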

# ⚙️ Setup

1. **[LLM](https://groq.com/):** Groq's free endpoints for open-source LLMs ([Groq API Key](https://console.groq.com/keys))
2. **[Vector Store](https://www.pinecone.io/learn/vector-database/):** [ChromaDB](https://www.trychroma.com/)
3. **[Embedding Model](https://qdrant.tech/articles/what-are-embeddings/):** [nomic-embed-text-v1.5](https://www.nomic.ai/blog/posts/nomic-embed-text-v1)
4. **[LLM Framework](https://python.langchain.com/v0.2/docs/introduction/):** LangChain
5. **[Huggingface API Key](https://huggingface.co/settings/tokens)**



### Install required libraries

In [1]:
!pip install -q -U \
     sentence-transformers==3.0.1 \
     langchain==0.3.19 \
     langchain-groq==0.2.4 \
     langchain-chroma==0.2.2 \
     langchain-community==0.3.18 \
     langchain-huggingface==0.1.2 \
     einops==0.8.1


### Import the required LangChain and Hugging Face embedding libraries

In [2]:
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings
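# Import from the partner packages installed above rather than the deprecated `langchain.*` paths
from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.document_loaders import WebBaseLoader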



In [3]:
import os
import getpass

#### Provide a Groq API key. You can create one to access free open-source models at the following link.

[Groq API Creation Link](https://console.groq.com/keys)




In [4]:
os.environ["GROQ_API_KEY"] = getpass.getpass()

··········


#### Provide a Hugging Face API key. You can create one at the following link.

[Huggingface API Creation Link](https://huggingface.co/settings/tokens)




In [6]:
os.environ["HF_TOKEN"] = getpass.getpass()

··········
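Re-running the key cells prompts for the secrets again each time. A minimal sketch of one way to avoid that is to prompt only when the variable is not already set. `ensure_env` is a hypothetical helper, not part of the notebook or of any library.

```python
import getpass
import os

def ensure_env(name, prompt_fn=getpass.getpass):
    """Return the value of environment variable `name`, prompting only if it is unset."""
    value = os.environ.get(name)
    if not value:
        # Ask for the secret without echoing it, then cache it for later cells.
        value = prompt_fn(f"Enter {name}: ")
        os.environ[name] = value
    return value
```

In the notebook this would be called as `ensure_env("GROQ_API_KEY")` and `ensure_env("HF_TOKEN")`. The `prompt_fn` parameter exists only so the helper can be exercised without an interactive prompt.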


### Step 1: Load and preprocess the data

In [7]:
def load_and_process_data(url):
    # Load data from web
    loader = WebBaseLoader(url)
    data = loader.load()

    # Split text into chunks (Experiment with Chunk Size and Chunk Overlap to get optimal chunking)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = text_splitter.split_documents(data)

    return chunks
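To see what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and sentences); `sliding_chunks` is a hypothetical helper that only illustrates the size and overlap arithmetic.

```python
from typing import List

def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into fixed-size windows where consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "".join(chr(97 + i % 26) for i in range(1200))
pieces = sliding_chunks(sample)
print([len(p) for p in pieces])
```

With the defaults, each window starts 450 characters after the previous one, so the last 50 characters of one chunk reappear at the start of the next. Larger overlaps preserve more context across chunk boundaries at the cost of more stored text.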

### Step 2: Create the vector store

In [8]:
def create_vector_store(chunks):
    embeddings = HuggingFaceEmbeddings(model_name="nomic-ai/nomic-embed-text-v1.5", model_kwargs = {'trust_remote_code': True})
    vectorstore = Chroma.from_documents(chunks, embeddings)
    return vectorstore
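Conceptually, a vector store keeps one embedding per chunk and answers a query by ranking chunks by vector similarity. The sketch below illustrates that idea with hand-written 2-D vectors and cosine similarity. It is an assumption-laden toy: Chroma's actual index structure and distance metric may differ, and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: Sequence[float],
          indexed: List[Tuple[str, Sequence[float]]],
          k: int) -> List[str]:
    """Return the k texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy "embeddings" (made up for illustration; real ones come from the embedding model).
toy_index = [
    ("search engines", [1.0, 0.0]),
    ("healthcare", [0.0, 1.0]),
    ("recommendation systems", [0.8, 0.6]),
]
print(top_k([1.0, 0.1], toy_index, k=2))
```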

### Step 3: Implement Corrective RAG

1. **Initial Retrieval:** We retrieve the top 3 most relevant documents for the query.
2. **Initial Response Generation:** Using the retrieved context, we generate an initial response.
3. **Critique Generation:** We ask the model to critique its own response, identifying potential errors or missing information.
4. **Additional Retrieval:** Based on the critique, we retrieve additional relevant documents.
5. **Final Response Generation:** We generate an improved response considering the initial response, critique, and additional context.

In [9]:
def corrective_rag(query, vectorstore, llm):
    # Initial retrieval
    initial_docs = vectorstore.similarity_search(query, k=3)
    initial_context = "\n".join([doc.page_content for doc in initial_docs])

    # Generate initial response
    initial_prompt = ChatPromptTemplate.from_template(
        "Based on the following context, please answer the query:\nContext: {context}\nQuery: {query}"
    )
    initial_chain = initial_prompt | llm
    initial_response = initial_chain.invoke({"context": initial_context, "query": query})

    # Generate critique
    critique_prompt = ChatPromptTemplate.from_template(
        "Please critique the following response to the query. Identify any potential errors or missing information:\nQuery: {query}\nResponse: {response}"
    )
    critique_chain = critique_prompt | llm
    critique = critique_chain.invoke({"response": initial_response.content, "query": query})

    # Retrieve additional information based on critique
    additional_docs = vectorstore.similarity_search(critique.content, k=2)
    additional_context = "\n".join([doc.page_content for doc in additional_docs])

    # Generate final response
    final_prompt = ChatPromptTemplate.from_template(
        "Based on the initial response, critique, and additional context, please provide an improved answer to the query:\nInitial Response: {initial_response}\nCritique: {critique}\nAdditional Context: {additional_context}\nQuery: {query}"
    )
    final_chain = final_prompt | llm
    final_response = final_chain.invoke({
        "initial_response": initial_response.content,
        "critique": critique.content,
        "additional_context": additional_context,
        "query": query
    })

    return final_response.content
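Because the critique discusses the same query, the second `similarity_search` may return chunks that were already part of the initial context. One optional refinement, which is not part of the original notebook, is to drop those repeats before building the additional context. `drop_seen_chunks` is a hypothetical helper operating on plain strings such as `doc.page_content`.

```python
from typing import Iterable, List

def drop_seen_chunks(initial_texts: Iterable[str], additional_texts: Iterable[str]) -> List[str]:
    """Return the additional texts that do not repeat an initial text or each other, in order."""
    seen = set(initial_texts)
    unique = []
    for text in additional_texts:
        if text not in seen:
            seen.add(text)
            unique.append(text)
    return unique

print(drop_seen_chunks(["chunk A", "chunk B"], ["chunk B", "chunk C", "chunk C"]))
```

Exact string matching only catches identical chunks; overlapping chunks produced by `chunk_overlap` would still pass through.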

### Step 4: Chunk the web data and load it into the Chroma vector store

In [10]:
llm = ChatGroq(
    model="llama3-8b-8192",
    temperature=0.5
)

# Load and process data
url = "https://en.wikipedia.org/wiki/Artificial_intelligence"
chunks = load_and_process_data(url)

# Create vector store
vectorstore = create_vector_store(chunks)


### Step 5: Run Corrective RAG

This implementation demonstrates the key aspects of Corrective RAG:

1. Initial retrieval and response generation
2. Self-critique to identify potential improvements
3. Additional retrieval based on the critique
4. Final response generation incorporating all available information

In [11]:
# Example query
query = "What are the main applications of artificial intelligence?"

response = corrective_rag(query, vectorstore, llm)

print("Final Response:")
print(response)

Final Response:
Based on the initial response, critique, and additional context, an improved answer to the query would be:

Artificial intelligence (AI) has a wide range of applications across various industries and domains. Some of the main applications of AI include:

1. Advanced web search engines (e.g., Google Search)
2. Recommendation systems (used by YouTube, Amazon, and Netflix)
3. Virtual assistants (e.g., Google Assistant, Siri, and Alexa)
4. Autonomous vehicles (e.g., Waymo)
5. Generative and creative tools (e.g., ChatGPT, AI art, content generation, image synthesis, and music composition)
6. Superhuman play and analysis in strategy games (e.g., chess, Go, game playing, game theory, and sports analytics)

In addition to these applications, AI is also being used to solve specific problems for specific industries or institutions, such as:

* Healthcare: AI is being used for medical imaging analysis, diagnosis, and treatment planning, as well as for personalized medicine and pat