# Corrective RAG (CRAG)

An advanced RAG approach that **evaluates** retrieved documents and **takes corrective actions** to improve answer quality.

![CRAG Architecture](https://miro.medium.com/v2/resize:fit:2000/format:webp/1*qKV_BQ4X2cFVhU1DIMRtKw.png)

**Problem with Traditional RAG:** It blindly uses all retrieved documents, even irrelevant ones, leading to poor answers or hallucinations.

**CRAG Solution:** Evaluate document relevance first using an LLM, then decide which corrective action to take.

## Architecture Overview

```
Query -> Embed (Metis) -> Retrieve (ChromaDB) -> Evaluate (LLM) -> Corrective Action -> Generate
                                                       |
                                                [Relevance Check]
                                                       |
                                          +------------+------------+
                                          |            |            |
                                       CORRECT     AMBIGUOUS    INCORRECT
                                       (>=50%)     (30-50%)      (<=30%)
                                          |            |            |
                                       Filter     Filter+Web    Web Search
                                                                 (Tavily)
```

## Step 1: Document Loading & Chunking

Large documents need to be split into smaller chunks for effective retrieval.

**Strategy:** `RecursiveCharacterTextSplitter`
- Tries to split on natural boundaries (paragraphs -> sentences -> words)
- Maintains semantic coherence within chunks
- Uses overlap to preserve context between chunks

**Parameters:**
- `chunk_size=500`: each chunk holds at most 500 characters
- `chunk_overlap=50`: consecutive chunks share up to 50 characters

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", " ", ""]  # Priority order
)
chunks = text_splitter.split_documents(documents)
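
To make the two parameters concrete, here is a deliberately simplified sliding-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers paragraph, line, and word boundaries); the function name `sliding_window_chunks` and the synthetic sample text are assumptions made for illustration only.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Synthetic 1200-character text (hypothetical sample data)
sample = "".join(str(i % 10) for i in range(1200))
pieces = sliding_window_chunks(sample)

print([len(p) for p in pieces])
print(pieces[0][-50:] == pieces[1][:50])  # the tail of one chunk opens the next
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of storing some text twice.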

## Step 2: Embedding

Convert text chunks into dense vector representations for semantic search.

**Model:** `text-embedding-3-small` (via Metis AI API)
- OpenAI's embedding model
- 1536-dimensional vectors
- Good balance between performance and cost

**How it works:**
- Text -> Neural Network -> Fixed-size vector
- Similar texts have similar vectors (close in vector space)

In [None]:
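import requests

# METIS_API_KEY is assumed to be defined earlier (e.g. loaded from the environment)

def embed_documents(texts):
    """Return one embedding vector per input text."""
    response = requests.post(
        "https://api.metisai.ir/openai/v1/embeddings",
        headers={"Authorization": f"Bearer {METIS_API_KEY}"},
        json={
            "model": "text-embedding-3-small",
            "input": texts
        },
        timeout=30,  # avoid hanging forever on a stalled connection
    )
    response.raise_for_status()  # surface HTTP errors instead of a KeyError below
    return [item["embedding"] for item in response.json()["data"]]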

## Step 3: Vector Store & Retrieval

Store embeddings and retrieve relevant documents using similarity search.

**Vector Database:** ChromaDB
- Open-source, lightweight
- Supports persistence to disk
- Built-in similarity search

**Retrieval Method:** Cosine Similarity
- Finds k most similar documents to query
- Returns documents ranked by similarity score
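
The ranking idea can be sketched with a brute-force scan over toy vectors. This is only an illustration: ChromaDB uses an approximate nearest-neighbour index rather than a full scan, the vectors here are 3-dimensional instead of 1536-dimensional, and the names `cosine_similarity` and `top_k` are assumptions, not part of any library used in this notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return (doc_index, score) pairs for the k most similar vectors, best first."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy embeddings (hypothetical data)
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 0.2, 0.0]

ranked = top_k(query_vec, doc_vecs, k=2)
for index, score in ranked:
    print(index, round(score, 3))
```

Document 1 ranks above document 0 because its direction is closer to the query's, even though document 0 has the larger first component; cosine similarity compares angles, not magnitudes.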

In [None]:
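from langchain_community.vectorstores import Chroma

# `embeddings` must be a LangChain Embeddings object
# (e.g. a thin wrapper around embed_documents from Step 2)
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db",
    # Chroma defaults to L2 distance; request cosine to match the description above
    collection_metadata={"hnsw:space": "cosine"},
)

# Retrieve the top-k most similar chunks
docs = vectorstore.similarity_search(query, k=4)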

## Step 4: LLM-based Relevance Evaluation

Evaluate how relevant each retrieved document is to the query using an LLM.

**Model:** `qwen2.5-vl-3b-instruct` (via Aval AI API)
- Lightweight and fast
- Good at following instructions
- Batch evaluation (all documents at once)

**Why LLM over traditional methods?**

| Traditional (Bi-Encoder) | LLM Evaluation |
|--------------------------|----------------|
| Limited semantic understanding | Deep semantic understanding |
| Can't handle complex queries | Understands nuanced queries |
| Requires fine-tuning | Works out of the box |

**Output:** For each document: `yes` (relevant) or `no` (not relevant)

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(
    model="qwen2.5-vl-3b-instruct",
    temperature=0,
    openai_api_key=AVALAI_API_KEY,
    openai_api_base="https://api.avalai.ir/v1"
)

prompt = ChatPromptTemplate.from_messages([
    ("human", """Check which documents are relevant to the question.

Question: {question}

Documents:
{documents}

Answer 'yes' or 'no' for each document. Format: yes, no, yes, no""")
])

grader = prompt | llm | StrOutputParser()

# Example: "no, no, yes, no" -> [0.0, 0.0, 1.0, 0.0]
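
The conversion from the grader's text reply to numeric scores is only hinted at in the comment above. A minimal sketch of that step follows, assuming the model sticks to the comma-separated format; the helper name `parse_grades` and the choice to treat anything other than `yes` (including missing answers) as not relevant are assumptions, not the notebook's confirmed behaviour.

```python
def parse_grades(raw, expected):
    """Turn a reply such as 'no, no, yes, no' into one float score per document."""
    # Normalise each token: trim whitespace, lowercase, drop stray quotes or periods
    tokens = [t.strip().lower().strip(".'\"") for t in raw.split(",")]
    scores = [1.0 if t == "yes" else 0.0 for t in tokens]

    # Keep exactly one score per retrieved document:
    # drop extra answers, and count missing answers as not relevant
    scores = scores[:expected]
    scores += [0.0] * (expected - len(scores))
    return scores


print(parse_grades("no, no, yes, no", expected=4))
print(parse_grades("Yes, NO.", expected=4))
```

Treating unparseable or missing answers as `0.0` is the conservative choice: it lowers the relevance ratio and nudges the pipeline toward web search rather than toward trusting an ungraded document.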

## Step 5: Path Determination

Based on **relevance ratio** (% of relevant documents), choose one of three corrective paths.

**Thresholds:**
- `>= 50%` relevant -> **Correct**: Most documents are useful
- `<= 30%` relevant -> **Incorrect**: Documents are not relevant
- Between `30%` and `50%` (exclusive) -> **Ambiguous**: Uncertain, need additional sources

In [None]:
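def determine_path(scores, threshold_correct=0.5, threshold_incorrect=0.3):
    # Nothing retrieved: avoid dividing by zero and fall back to web search
    if not scores:
        return "incorrect"

    # A document counts as relevant when its score is at least 0.5
    relevant_count = sum(1 for s in scores if s >= 0.5)
    relevance_ratio = relevant_count / len(scores)

    if relevance_ratio >= threshold_correct:
        return "correct"
    elif relevance_ratio <= threshold_incorrect:
        return "incorrect"
    else:
        return "ambiguous"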

# Example: scores = [0.0, 0.0, 1.0, 0.0]
# relevance_ratio = 1/4 = 0.25 -> "incorrect"
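
Because the grades are binary, the relevance ratio can only take the values `0/k, 1/k, ..., k/k`, so which paths can fire depends on how many documents are retrieved. The sketch below enumerates the possibilities; it repeats the threshold logic of `determine_path` so that it runs on its own, and the helper name `reachable_paths` is an assumption introduced here.

```python
def determine_path(scores, threshold_correct=0.5, threshold_incorrect=0.3):
    # Same threshold logic as the function above, repeated so this block is self-contained
    relevant_count = sum(1 for s in scores if s >= 0.5)
    relevance_ratio = relevant_count / len(scores)
    if relevance_ratio >= threshold_correct:
        return "correct"
    elif relevance_ratio <= threshold_incorrect:
        return "incorrect"
    return "ambiguous"


def reachable_paths(k):
    """Paths that can occur when k documents each receive a yes/no grade."""
    paths = set()
    for relevant in range(k + 1):
        scores = [1.0] * relevant + [0.0] * (k - relevant)
        paths.add(determine_path(scores))
    return paths


for k in (3, 4, 5):
    print(k, sorted(reachable_paths(k)))
```

With `k=4`, as used in the retrieval step, the possible ratios are 0, 0.25, 0.5, 0.75 and 1.0, none of which falls strictly between 0.3 and 0.5, so the ambiguous path is never selected. Retrieving 3 or 5 documents, or adjusting the thresholds, makes it reachable.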

## Step 6: Corrective Actions

### 6.1 Correct Path (relevance >= 50%)

Documents are relevant -> **Decompose, Filter, Recompose**

- Remove documents with individual score < 0.3
- Keep only high-quality, relevant documents
- Use filtered documents for generation

In [None]:
def correct_path(scored_docs, min_score=0.3):
    # Filter out low-relevance documents
    filtered = [doc for doc, score in scored_docs if score >= min_score]
    return filtered, "Retrieved Documents (Filtered)"

### 6.2 Incorrect Path (relevance <= 30%)

Documents are not relevant -> **Web Search Fallback**

- Discard all retrieved documents
- Rewrite question for better web search
- Search the web using Tavily API
- Use web results for generation

**Search Engine:** Tavily (fast, accurate, designed for LLM apps)

In [None]:
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.documents import Document

# Question rewriter for better web search
rewriter = ChatPromptTemplate.from_messages([
    ("human", "Rewrite this question for web search:\n\n{question}")
]) | llm | StrOutputParser()

# Tavily search (the result limit is set with max_results)
search_tool = TavilySearchResults(max_results=3)

def incorrect_path(query, max_results=3):
    # Rewrite question
    rewritten = rewriter.invoke({"question": query})
    
    # Search web
    results = search_tool.invoke({"query": rewritten})
    
    web_docs = [
        Document(page_content=r["content"], metadata={"url": r["url"]})
        for r in results[:max_results]
    ]
    return web_docs, "Web Search (Tavily)"

### 6.3 Ambiguous Path (30% < relevance < 50%)

Uncertain relevance -> **Hybrid Approach**

- Keep filtered documents (some might be useful)
- Also search the web for additional context
- Combine both sources for generation

This provides the best of both worlds when confidence is low.

In [None]:
def ambiguous_path(query, scored_docs):
    # Get filtered documents
    filtered_docs, _ = correct_path(scored_docs)
    # Get web results
    web_docs, _ = incorrect_path(query, max_results=2)
    # Combine both
    return filtered_docs + web_docs, "Hybrid (Documents + Web)"
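
How the path from Step 5 connects to the three actions is not shown explicitly, so here is one possible way to wire them together. It is a sketch under assumptions: `select_context` and the `handlers` mapping are hypothetical names, and the three handlers are stand-ins that return plain strings so the block runs without a vector store or web access.

```python
def select_context(path, query, scored_docs, handlers):
    """Call the handler registered for `path` and return (docs, source_label)."""
    if path not in handlers:
        raise ValueError(f"Unknown path: {path!r}")
    return handlers[path](query, scored_docs)


# Stand-in handlers (hypothetical); each returns (documents, knowledge source label)
def keep_relevant(query, scored_docs):
    kept = [doc for doc, score in scored_docs if score >= 0.3]
    return kept, "Retrieved Documents (Filtered)"


def web_only(query, scored_docs):
    return [f"web result for: {query}"], "Web Search (Tavily)"


def hybrid(query, scored_docs):
    kept, _ = keep_relevant(query, scored_docs)
    found, _ = web_only(query, scored_docs)
    return kept + found, "Hybrid (Documents + Web)"


handlers = {"correct": keep_relevant, "incorrect": web_only, "ambiguous": hybrid}

docs = ["chunk A", "chunk B", "chunk C"]
scores = [1.0, 0.0, 0.0]
scored_docs = list(zip(docs, scores))  # pair each document with its grade

context_docs, source = select_context("ambiguous", "What is CRAG?", scored_docs, handlers)
print(source)
print(context_docs)
```

In the notebook, the stand-ins would be replaced by thin wrappers around `correct_path`, `incorrect_path`, and `ambiguous_path`, for example `lambda query, scored: correct_path(scored)`.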

## Step 7: Answer Generation

Generate the final answer using an LLM, with the selected documents as context.

**Model:** `gpt-4o-mini` (via Aval AI API)
- Fast and cost-effective
- Good at following instructions
- Handles context well

**Prompt includes:**
- Context from selected documents
- Knowledge source (so the LLM knows where the information came from)
- User's question

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.7,
    openai_api_key=AVALAI_API_KEY,
    openai_api_base="https://api.avalai.ir/v1"
)

# knowledge_source, context and question come from the earlier steps
prompt = f"""
Context (from {knowledge_source}):
{context}

Question: {question}

Answer based on the context. If context is insufficient, say so.
"""

# invoke() returns an AIMessage; the answer text is in .content
answer = llm.invoke(prompt).content
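
The `{context}` placeholder has to be built from the selected documents. One way to do that is sketched below, assuming each document exposes `page_content` and `metadata` like a LangChain `Document`. The `Doc` dataclass is a stand-in so the block runs without LangChain, and `format_context` and its `max_chars` budget are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (same attribute names)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs, max_chars=4000):
    """Join documents into one numbered context string, stopping at max_chars."""
    parts = []
    used = 0
    for i, doc in enumerate(docs, start=1):
        # Web results carry a "url"; local chunks may carry a "source" path
        origin = doc.metadata.get("url") or doc.metadata.get("source", "local document")
        block = f"[{i}] ({origin})\n{doc.page_content.strip()}"
        if used + len(block) > max_chars:
            break  # keep the prompt within a rough size budget
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)


docs = [
    Doc("CRAG grades retrieved documents.", {"source": "paper.pdf"}),
    Doc("Tavily is a search API.", {"url": "https://example.com"}),
]
context = format_context(docs)
print(context)
```

Numbering the blocks and naming their origin gives the model something to point at when it explains where an answer came from.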

## Step 8: Streamlit Interface

A web-based chat interface for interacting with PDFs.

**Features:**
- PDF upload and automatic processing
- Real-time chat with the uploaded document
- Displays which path was taken (Correct/Incorrect/Ambiguous)
- Shows relevance scores and knowledge source

**Components:**
- `st.file_uploader`: Upload PDF files
- `st.chat_input`: User question input
- `st.chat_message`: Display conversation
- Session state: Maintain chat history

In [None]:
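import tempfile

import streamlit as st
from langchain_community.document_loaders import PyPDFLoader

# PDF Upload
pdf_file = st.file_uploader("Upload PDF", type=['pdf'])

if pdf_file:
    # PyPDFLoader expects a file path, so write the upload to a temporary file first
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(pdf_file.getvalue())
    loader = PyPDFLoader(tmp.name)
    pages = loader.load()
    # vector_store and rag are the project's own wrapper objects, created elsewhere
    vector_store.load_documents([p.page_content for p in pages])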

# Chat interface
if prompt := st.chat_input("Ask a question..."):
    result = rag.query(prompt, return_metadata=True)
    
    st.chat_message("user").write(prompt)
    st.chat_message("assistant").write(result["answer"])
    
    # Show path info
    st.info(f"Path: {result['path_type']} | Score: {result['avg_relevance_score']:.2f}")
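
The snippet above renders only the latest exchange; keeping the history across reruns needs session state. Below is a minimal sketch of that bookkeeping, written against a plain dict so it runs outside Streamlit. The helper name `append_turn` and the `"messages"` key are assumptions; in the app, `st.session_state` would be passed in place of the dict.

```python
def append_turn(state, role, content):
    """Append one chat message to the history stored under state['messages']."""
    if "messages" not in state:
        state["messages"] = []  # first message of the session
    state["messages"].append({"role": role, "content": content})
    return state["messages"]


state = {}  # stands in for st.session_state
append_turn(state, "user", "What is CRAG?")
append_turn(state, "assistant", "A RAG variant that grades retrieved documents.")

for message in state["messages"]:
    print(f'{message["role"]}: {message["content"]}')
```

Inside the app, the stored messages would be replayed at the top of each run with `st.chat_message(message["role"]).write(message["content"])`, since Streamlit re-executes the script on every interaction.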

## Summary

| Step | Component | Technology | Provider |
|------|-----------|------------|----------|
| 1 | Chunking | RecursiveCharacterTextSplitter | LangChain |
| 2 | Embedding | text-embedding-3-small | Metis AI |
| 3 | Storage | ChromaDB | Local |
| 4 | Evaluation | qwen2.5-vl-3b-instruct (LLM) | Aval AI |
| 5 | Path Selection | Threshold-based | - |
| 6 | Web Search | Tavily + Question Rewriter | Tavily + Aval AI |
| 7 | Generation | gpt-4o-mini | Aval AI |
| 8 | Interface | Streamlit | - |

**Key Benefits:**
- Reduces hallucination from irrelevant context
- Falls back to web when knowledge base fails
- Adaptive strategy based on confidence
- Batch evaluation (single LLM call for all docs)

**Run the app:**
```bash
streamlit run app.py
```