# 🧠 Corrective RAG (CRAG) Implementation from Scratch

Welcome to this notebook! Here, you'll find a **from-scratch implementation** of the Corrective RAG (CRAG) framework, inspired by the research paper:  
[Corrective Retrieval Augmented Generation](https://arxiv.org/pdf/2401.15884)

---

## 🚀 What is Corrective RAG?

Corrective RAG (CRAG) is a Retrieval-Augmented Generation (RAG) approach that aims to keep LLM-based question answering factually grounded even when retrieval returns weak or irrelevant passages. It adds a **corrective step** that evaluates the retrieved passages, refines them, and supplements them with web search when needed, before generating the final answer.

---

## 🛠️ Implementation Highlights

- **Built from scratch**: No high-level CRAG libraries used—every component is custom-coded for transparency and flexibility.
- **Google Gemini & Embeddings**: Uses `langchain-google-genai` and `llama_index.embeddings.google_genai` for LLM and embedding models.
- **Vector Store Indexing**: Documents are split, embedded, and indexed for efficient retrieval.
- **Factuality Evaluation**: Combines similarity scoring and LLM-based factual checks to classify the retrieved passages as `CORRECT`, `INCORRECT`, or `AMBIGUOUS`.
- **Web Search Integration**: For ambiguous or incorrect cases, queries are rewritten and external web search is performed (via Serper API).
- **Knowledge Refinement**: Retrieved passages are split into fine-grained "knowledge strips" and filtered for relevance using the LLM.
- **Final Answer Generation**: The LLM generates a response strictly based on the refined internal and/or external knowledge.
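
The similarity-scoring half of the factuality evaluation can be sketched as a small pure function. This is a minimal sketch rather than the notebook's exact code: the name `classify_by_similarity` is hypothetical, the thresholds mirror the defaults used by `evaluator_agent` further down, and treating an empty score list as `INCORRECT` is an assumption made here.

```python
from typing import Optional, Sequence


def classify_by_similarity(
    scores: Sequence[float],
    correct_thresh: float = 0.75,
    incorrect_thresh: float = 0.4,
) -> Optional[str]:
    """Return a label when similarity alone is decisive, otherwise None."""
    if not scores:
        # Assumption: no retrieved passages means no internal support at all.
        return "INCORRECT"
    if all(s >= correct_thresh for s in scores):
        return "CORRECT"
    if all(s <= incorrect_thresh for s in scores):
        return "INCORRECT"
    # Scores fall in the uncertain band: defer to the LLM-based check.
    return None


print(classify_by_similarity([0.82, 0.79]))
print(classify_by_similarity([0.64, 0.59]))
```

A `None` result is the signal to fall back to the LLM-based check, which is the only place where `AMBIGUOUS` can be produced.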

---

## 📂 Project Structure

- **Document Loading & Indexing**: Loads documents from the `data/` directory, splits them into nodes, and builds a vector index.
- **Factuality Agent**: Evaluates if retrieved passages support the query, invoking LLM checks when uncertain.
- **Query Rewriting & Web Search**: Rewrites queries for optimal web search and fetches up-to-date information.
- **Knowledge Refinement**: Extracts and recomposes only the most relevant knowledge strips for answer generation.
- **CRAG Inference Pipeline**: Orchestrates the above steps to deliver factually accurate and context-aware answers.
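
How the pipeline chooses its knowledge source can be illustrated with a short routing sketch. The function `route_knowledge` and its callable parameters are hypothetical stand-ins for the refinement and web-search steps listed above, and routing any unexpected label to the combined branch is an assumption of this sketch.

```python
from typing import Callable


def route_knowledge(
    verdict: str,
    refine_internal: Callable[[], str],
    fetch_external: Callable[[], str],
) -> str:
    """Pick the knowledge source(s) according to the evaluator's verdict."""
    if verdict == "CORRECT":
        # Retrieval is trusted: use refined internal knowledge only.
        return refine_internal()
    if verdict == "INCORRECT":
        # Retrieval is rejected: use web search results only.
        return fetch_external()
    # AMBIGUOUS, or (by assumption) any unexpected label: combine both sources.
    return refine_internal() + "\n\n" + fetch_external()


# Usage with stub callables standing in for the real steps
print(route_knowledge("AMBIGUOUS", lambda: "internal strips", lambda: "web results"))
```

Passing callables rather than precomputed strings means the web search only runs when the verdict actually requires it.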

---

## 📊 Example Usage

- Ask a question like:  
    *"What is the difference between Flat white and cappuccino?"*
- The system retrieves, verifies, and refines knowledge, then generates a grounded answer.
- If internal knowledge is insufficient, it automatically supplements with web search.

---

## Citation
```bibtex
@misc{yan2024correctiveretrievalaugmentedgeneration,
      title={Corrective Retrieval Augmented Generation}, 
      author={Shi-Qi Yan and Jia-Chen Gu and Yun Zhu and Zhen-Hua Ling},
      year={2024},
      eprint={2401.15884},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2401.15884}, 
}
```
---

## 📖 References

- [Corrective RAG Paper (arXiv:2401.15884)](https://arxiv.org/pdf/2401.15884)

---

## 💡 Get Started

1. Place your documents in the `data/` folder.
2. Set your API keys for Google and Serper.
3. Run the notebook cells in order.
4. Try your own queries and explore the corrective reasoning process!
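
For step 2, one option is to read the keys from the environment and prompt only when they are missing, so they never end up saved in the notebook. This is a minimal sketch: the helper `ensure_api_key` is hypothetical, while `GOOGLE_API_KEY` and `SERPER_API_KEY` are the variable names used in the cells below.

```python
import os
from getpass import getpass


def ensure_api_key(name: str) -> str:
    """Return the key from the environment, prompting once if it is unset."""
    value = os.environ.get(name, "").strip()
    if not value:
        # getpass hides the input so the key is not echoed into the cell output.
        value = getpass(f"Enter {name}: ").strip()
        os.environ[name] = value
    return value


# Typical usage (left commented out so the cell does not prompt on import):
# ensure_api_key("GOOGLE_API_KEY")
# ensure_api_key("SERPER_API_KEY")
```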

---

**Happy experimenting with CRAG!** 🦾✨

![Corrective RAG Algorithm](https://raw.githubusercontent.com/prasanna00019/RAG-Playground/main/Corrective_RAG/flow.png)

The flowchart of the algorithm, from the [Corrective RAG paper](https://arxiv.org/pdf/2401.15884)

In [None]:
%pip install -U langchain-google-genai

In [None]:
# !pip install llama_index.llms.langchain
# !pip install langchain_community
# %pip install llama-index-embeddings-google-genai
# !pip install llama_index
import os
GOOGLE_API_KEY = ""
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)

In [5]:
from llama_index.core import VectorStoreIndex, Settings, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.google_genai import GoogleGenAIEmbedding

In [None]:
# 1. Load documents and split them into nodes
documents = SimpleDirectoryReader("data").load_data()
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

# 2. Set up the embedding model and LLM
embed_model = GoogleGenAIEmbedding(
    model_name="text-embedding-004",
    embed_batch_size=100,
    api_key=os.environ["GOOGLE_API_KEY"],  # reuse the key set earlier instead of hardcoding it
)

Settings.embed_model = embed_model
Settings.llm = llm

# 3. Create vector index
vector_index = VectorStoreIndex(nodes)

In [64]:
def llm_based_factual_check(query, chunks, web_context=""):
    """
    Asks the LLM whether the retrieved passages (plus optional web context) support the query.
    Returns the model's one-word verdict in upper case.
    """
    # Join list input into readable text instead of interpolating a Python list repr
    passages = "\n\n".join(chunks) if isinstance(chunks, (list, tuple)) else chunks
    if web_context == "":
        prompt = f"""
    You are a factual evaluator. Given the query:
    "{query}"

    And the following retrieved passages:
    {passages}

    Decide if the retrieved passages sufficiently support the query.
    If you think you don't know the answer to this query, return "DONTKNOW".
    Otherwise return only one word: CORRECT, INCORRECT, or AMBIGUOUS.
    """
    else:
        prompt = f"""
    You are a factual evaluator. Given the query:
    "{query}"

    And the following retrieved passages:
    {passages}

    And the following retrieved web context:
    {web_context}

    Decide if the retrieved passages, together with the web context, sufficiently support the query.
    Return only one word: CORRECT, INCORRECT, or AMBIGUOUS.
    """
    result = llm.invoke(prompt)
    return result.content.strip().upper()
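
Because the verdict is compared against exact strings later on, a reply such as `Correct.` or `**INCORRECT**` would not match any branch. A small normalizer can guard against that. This is a sketch under assumptions: `normalize_verdict` and `VALID_VERDICTS` are hypothetical names, and defaulting to `AMBIGUOUS` for unrecognized output is a choice made here, not something the notebook does.

```python
import re

VALID_VERDICTS = ("CORRECT", "INCORRECT", "AMBIGUOUS", "DONTKNOW")


def normalize_verdict(raw: str, default: str = "AMBIGUOUS") -> str:
    """Map free-form model output onto one of the expected labels."""
    # Tokenize on letters only, so punctuation and markdown markers are ignored
    # and "INCORRECT" is never mistaken for "CORRECT" via substring matching.
    for token in re.findall(r"[A-Za-z]+", raw):
        label = token.upper()
        if label in VALID_VERDICTS:
            return label
    return default


print(normalize_verdict("**Incorrect**"))
```

The first matching token wins, so a long free-text reply can still be misread; the one-word instruction in the prompt remains the main safeguard.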

In [65]:
# tool
def evaluator_agent(query, vector_index, top_k=5, correct_thresh=0.75, incorrect_thresh=0.4):
    """
    Takes in a query and returns the similarity scores, a confidence label
    (CORRECT, INCORRECT, or AMBIGUOUS), and the query engine response.
    Note: query_rewriter and search_web are defined in later cells; run those cells first.
    """
    query_engine = vector_index.as_query_engine(similarity_top_k=top_k)
    response = query_engine.query(query)
    similarity_scores = [node.score for node in response.source_nodes if node.score is not None]

    # Nothing retrieved: all() on an empty list is True, so handle this case explicitly
    if not similarity_scores:
        return similarity_scores, "INCORRECT", response
    # Similarity alone is decisive
    if all(score >= correct_thresh for score in similarity_scores):
        return similarity_scores, "CORRECT", response
    if all(score <= incorrect_thresh for score in similarity_scores):
        return similarity_scores, "INCORRECT", response

    # Similarity is uncertain: ask the LLM to verify
    passages = [node.node.text for node in response.source_nodes]
    llm_verdict = llm_based_factual_check(query, passages)
    print("llmverdict: ", llm_verdict)
    if llm_verdict == "DONTKNOW":
        # The LLM cannot judge on its own: fetch web context and ask again
        rewritten_q = query_rewriter(query)
        web_context = search_web(rewritten_q)
        final_confidence = llm_based_factual_check(query, passages, web_context)
        return similarity_scores, final_confidence, response
    return similarity_scores, llm_verdict, response

s,c,response=evaluator_agent("What is the difference between Flat white and cappuccino?",vector_index)
print(s,c,response)

llmverdict:  CORRECT
[0.6414507745301866, 0.6274194287931348, 0.6058399572764629, 0.5943758868876117, 0.5675433654179854] CORRECT The preparation method is a key difference between a Flat White and a Cappuccino. For a Flat White, the coffee is added to the cup first, followed by warm milk, and the milk foam is prepared last, lying under the crema and taking on its color and taste. In contrast, for a Cappuccino, the hot milk and milk foam are prepared first, and then the coffee is added, flowing through the milk foam to create a characteristic white foam top.

Regarding ingredients and machine settings, a Flat White typically uses 60 ml of espresso with machine settings for 2 seconds of milk foam and 14 seconds of warm milk, along with normal coffee strength and temperature. A Cappuccino uses 50 ml of espresso, with machine settings for 14 seconds of milk foam, and strong coffee strength and high temperature.


In [49]:
query_2="What is the history of the flat white and how did it originate in Italy?"
scores_2,c,response=evaluator_agent(query_2,vector_index)
print(scores_2,c,response)

[0.6022679957581402, 0.5665247476663722, 0.566452335371869, 0.5534099582177929, 0.5119777212800196] INCORRECT The flat white originated in Australia. There is no information indicating that it originated in Italy.


In [52]:
# tool
def generator_agent(query,knowledge):
  """
  Takes in a query and knowledge and returns a response.
  """
  prompt = f"""
  You are a helpful and factually accurate assistant.
  Answer the following user query based strictly on the provided context.
  Avoid making up information, and mention if something is not found in the context.

  User Query:
  {query}

  Context:
  {knowledge}

  Answer:"""
  response = llm.invoke(prompt)
  return response.content.strip()

In [53]:
#tool
def query_rewriter(query):
  """
  Takes in a query and returns a rewritten effective query for a web search engine.
  """
  prompt = f"""
You are a helpful assistant whose job is to rewrite user queries to make them more effective for a web search engine.

Rewrite the following user query to make it clear, unambiguous, and optimized for retrieving accurate and up-to-date information from the web.

Example 1:
User Query: how did Tesla start?
Rewritten Web Search Query: when and by whom was Tesla Motors founded?

Example 2:
User Query: effects of covid
Rewritten Web Search Query: what are the long-term health effects of COVID-19 according to WHO?

User Query:
"{query}"
Rewritten Web Search Query:
"""
  response = llm.invoke(prompt)
  rewritten_query = response.content.strip().split("\n")[-1].replace("Rewritten Web Search Query:", "").strip()
  return rewritten_query
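
The last line of `query_rewriter` extracts the rewritten query from the model's reply. The same step can be written as a standalone helper that also drops surrounding quotes, which the model may echo because the prompt quotes the user query. This is a sketch: `extract_rewritten_query` is a hypothetical name, and it only strips the label when it starts the last line.

```python
def extract_rewritten_query(raw: str, label: str = "Rewritten Web Search Query:") -> str:
    """Return the last non-empty line of the reply, without the label or quotes."""
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return ""
    last = lines[-1]
    if last.startswith(label):
        last = last[len(label):]
    # Remove whitespace and any double quotes wrapped around the query.
    return last.strip().strip('"').strip()


print(extract_rewritten_query('Rewritten Web Search Query: "flat white origin history"'))
```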


In [54]:
#tool
def knowledge_refine(query, source_nodes=None, strip_size=2):
    """
    Takes in the query and the query engine response holding the retrieved nodes.
    Asks the LLM to split each document into strips, score them in bulk, and return the relevant ones.
    Note: strip_size is currently unused; the strip length is set in the prompt.
    """
    # A default of `source_nodes=response` would be bound once, when the function is defined.
    # Looking up the global `response` at call time picks up the latest evaluator result instead.
    if source_nodes is None:
        source_nodes = response

    combined_docs = ""
    for i, node in enumerate(source_nodes.source_nodes):
        combined_docs += f"[Document {i+1}]\n{node.text.strip()}\n<end-of-document>\n"

    prompt = f"""
You are an expert at extracting fine-grained knowledge segments from retrieved documents.

Given the user query: "{query}"

Instructions:
1. For each document, break it into smaller "knowledge strips" of 1–3 sentences each.
2. Score each strip on a scale from 0 to 1 for how relevant it is to the query.
3. Keep only those strips with score >= 0.6.
4. Recompose the selected strips **in order**, without repeating the score or document ID.

Your output should be a clean, ordered recomposition of the relevant strips.

Documents:
{combined_docs}

Final Refined Knowledge:
"""
    # Named `result` so it does not shadow the global `response` read above
    result = llm.invoke(prompt)
    return result.content.strip()
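
In `knowledge_refine`, the LLM does both the splitting and the scoring, and the `strip_size` parameter is not used. If the strips should be built deterministically instead, the splitting half could look like the sketch below. The helper `split_into_strips` is hypothetical, and the regex-based sentence splitting is a rough assumption that will mis-split abbreviations such as "e.g.".

```python
import re
from typing import List


def split_into_strips(text: str, strip_size: int = 2) -> List[str]:
    """Group consecutive sentences into strips of `strip_size` sentences."""
    if strip_size < 1:
        raise ValueError("strip_size must be at least 1")
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    return [
        " ".join(sentences[i : i + strip_size])
        for i in range(0, len(sentences), strip_size)
    ]


strips = split_into_strips("A flat white uses espresso. It has warm milk. Foam comes last.")
print(strips)
```

Each strip could then be scored separately, which makes the relevance threshold auditable at the cost of more LLM calls.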


In [13]:
a=knowledge_refine("What is the difference between Flat white and cappuccino?")

In [14]:
print(a)

For cappuccino, the coffee is prepared after the hot milk and milk foam, then flows through the milk foam at the top. The milk foam, which weighs less than the coffee, floats above it, creating the characteristic white foam top. The word cappuccino probably comes from the Capuchin monks, and came to be used because the milk foam resembles the monk’s hood, with the colour of the monks’ hoods also recalling the brown hue of the beverage. A cappuccino typically uses 50 ml espresso and milk foam, with machine settings often involving 14 seconds for milk foam.

In contrast, the flat white is prepared by adding the coffee to the cup first, followed by warm milk. For flat white, the milk foam is prepared after the coffee and lies under the crema. This results in the milk foam at the top taking on the colour and flavour of the crema. A flat white typically uses 60 ml espresso, warm milk, and milk foam.


In [None]:
import os
os.environ["SERPER_API_KEY"] = ""

In [None]:
from langchain_community.utilities import GoogleSerperAPIWrapper
def search_web(inputs: str) -> str:
    """
    Searches the web using Serper.dev.
    """
    search = GoogleSerperAPIWrapper()
    return search.run(inputs)

![Corrective RAG Algorithm](https://raw.githubusercontent.com/prasanna00019/RAG-Playground/main/Corrective_RAG/algorithm.png)

**This algorithm, from the [Corrective RAG paper](https://arxiv.org/pdf/2401.15884), is implemented below**


In [66]:
def algorithm_CRAG_inference(query):
    """
    Runs CRAG inference: evaluate retrieval, select the knowledge source(s), generate the answer.
    """
    # Evaluate retrieval for this query rather than reusing a global `c` left over from an earlier cell
    _, c, response = evaluator_agent(query, vector_index)
    print(c, " :c")
    print(" ")
    if c == "CORRECT":
        # Retrieval is trusted: use refined internal knowledge only
        k = knowledge_refine(query, response)
    elif c == "INCORRECT":
        # Retrieval is rejected: use web search results only
        web_query = query_rewriter(query)
        k = search_web(web_query)
    else:
        # AMBIGUOUS (or any unexpected label): combine internal and external knowledge
        internal_knowledge = knowledge_refine(query, response)
        web_query = query_rewriter(query)
        external_knowledge = search_web(web_query)
        k = internal_knowledge + "\n\n" + external_knowledge
    print(" ")
    return generator_agent(query, k)

In [58]:
final_CRAG_output_2=algorithm_CRAG_inference(query_2)
print(final_CRAG_output_2)

INCORRECT  :c
 
 
The origin of the flat white is disputed, but it is widely accepted to have originated in Oceania, specifically either New Zealand or Australia, in the 1980s. Coffee historian Ian Bersten suggests it may have originated in England in the 1950s. In Australia, it was introduced as a balance between an intense espresso and a milky latte. Australian barista Alan Preston and New Zealander Derek Townsend are among those who claim to have invented the drink.

The provided context does not state that the flat white originated in Italy. While it mentions "Italian sugar growers in the Sunshine State are said to have inspired the 'invention' of the flat white," this does not indicate an Italian origin for the drink itself.


In [67]:
final_CRAG_output=algorithm_CRAG_inference("What is the difference between Flat white and cappuccino?")
print(final_CRAG_output)

CORRECT  :c
 
 
The main difference between a Flat white and a cappuccino, according to the context, lies in their preparation method and the resulting appearance of the milk foam:

*   **Flat White:** The coffee is added to the cup first, followed by warm milk. The milk foam is prepared in the final stage and lies *under* the crema, taking on its colour and taste. This results in the milk foam at the top having the colour and flavour of the crema.
*   **Cappuccino:** The milk is prepared first, then the coffee. The coffee is prepared after the hot milk and milk foam, flowing *through* the milk foam at the top. The milk foam, being lighter than the coffee, floats above it, creating a characteristic white foam top.
