# Corrective RAG (CRAG)

An advanced RAG approach that **evaluates** retrieved documents and **takes corrective actions** to improve answer quality.

**Problem with Traditional RAG:** It blindly uses all retrieved documents, even irrelevant ones, leading to poor answers or hallucinations.

**CRAG Solution:** Evaluate document relevance first, then decide what action to take.

## Architecture Overview

```
Query → Embed → Retrieve → Evaluate → Corrective Action → Generate
                              ↓
                       [Relevance Score]
                              ↓
                 ┌───────────┼───────────┐
                 ↓           ↓           ↓
              CORRECT    AMBIGUOUS   INCORRECT
              (≥0.5)     (0.3-0.5)    (≤0.3)
```

## Step 1: Document Loading & Chunking

Large documents need to be split into smaller chunks for effective retrieval.

**Strategy:** `RecursiveCharacterTextSplitter`
- Tries to split on natural boundaries (paragraphs → sentences → words)
- Maintains semantic coherence within chunks
- Uses overlap to preserve context between chunks

**Parameters:**
- `chunk_size=500`: Each chunk ~500 characters
- `chunk_overlap=50`: 50 characters overlap between consecutive chunks
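
To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a plain sliding-window splitter. It is a simplified stand-in, not how `RecursiveCharacterTextSplitter` works internally: it ignores natural boundaries and cuts purely by character count. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: boundaries are chosen by character
    count, not by paragraphs or sentences.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # How far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # This window already reached the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(1200))  # 1200 dummy characters
chunks = sliding_window_chunks(text)
print([len(c) for c in chunks])
```

With 1200 characters, the windows start at offsets 0, 450, and 900, so the chunk lengths are 500, 500, and 300, and the last 50 characters of each chunk reappear at the start of the next one.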

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", " ", ""]  # Priority order
)
chunks = text_splitter.split_documents(documents)

## Step 2: Embedding

Convert text chunks into dense vector representations for semantic search.

**Model:** `text-embedding-3-small` (via Aval AI API)
- Part of OpenAI's `text-embedding-3` model family
- 1536-dimensional vectors by default
- Good balance between performance and cost

**How it works:**
- Text → Neural Network → Fixed-size vector
- Similar texts have similar vectors (close in vector space)

In [None]:
import requests

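def embed_documents(texts):
    """Return one embedding vector per input text."""
    # API_KEY is assumed to be defined earlier and to hold your Aval AI key
    response = requests.post(
        "https://api.avalai.ir/v1/embeddings",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "text-embedding-3-small",
            "input": texts
        },
        timeout=30,
    )
    response.raise_for_status()  # Fail loudly on HTTP errors
    return [item["embedding"] for item in response.json()["data"]]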

## Step 3: Vector Store & Retrieval

Store embeddings and retrieve relevant documents using similarity search.

**Vector Database:** ChromaDB
- Open-source, lightweight
- Supports persistence to disk
- Built-in similarity search

**Retrieval Method:** Cosine Similarity
- Finds k most similar documents to query
- Returns documents ranked by similarity score
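
As an illustration of what a cosine-similarity search computes, the sketch below ranks a few toy vectors by brute force. It is not how ChromaDB searches internally (vector stores typically use an approximate index); the names `cosine_similarity` and `top_k` and the three-dimensional vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return (doc_index, similarity) pairs for the k most similar vectors."""
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # Highest similarity first
    return [(idx, score) for score, idx in scored[:k]]


# Toy 3-dimensional "embeddings"; real ones have 1536 dimensions
toy_docs = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query = [0.8, 0.6, 0.0]

ranking = top_k(toy_query, toy_docs, k=2)
print(ranking)
```

Here document 1 ranks first (similarity about 0.96) and document 0 second (about 0.8), because the query vector points mostly in their directions.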

In [None]:
from langchain_community.vectorstores import Chroma

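# Store embeddings; `embeddings` is assumed to be a LangChain Embeddings object
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db",
    # Chroma defaults to L2 distance; request cosine explicitly
    collection_metadata={"hnsw:space": "cosine"},
)

# Retrieve top-k similar documents
docs = vectorstore.similarity_search(query, k=4)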

## Step 4: Relevance Evaluation

Score how relevant each document is to the query.

**Model:** Cross-Encoder (`ms-marco-MiniLM-L-6-v2`)
- Fine-tuned on MS MARCO, a passage-ranking dataset built from real Bing search queries
- ~22M parameters, fast inference

**Why Cross-Encoder over Bi-Encoder?**

| Bi-Encoder | Cross-Encoder |
|------------|---------------|
| Encodes query and doc separately | Encodes query + doc together |
| Fast (can pre-compute) | Slower (must compute per pair) |
| Less accurate | More accurate |

Cross-Encoder sees the **interaction** between query and document, leading to better relevance judgments.
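
The speed trade-off in the table can be put in rough numbers by counting model forward passes. The sketch below is back-of-the-envelope arithmetic under the assumption that every encoding costs one pass; the function name `encoder_passes` and the corpus sizes are hypothetical.

```python
def encoder_passes(num_docs, num_queries, rerank_k=None):
    """Count model forward passes for each scoring strategy."""
    # Bi-encoder: embed every document once, then one pass per query
    bi_encoder = num_docs + num_queries
    # Cross-encoder over the whole corpus: one pass per (query, doc) pair
    cross_encoder_all = num_docs * num_queries
    result = {
        "bi_encoder": bi_encoder,
        "cross_encoder_all_pairs": cross_encoder_all,
    }
    if rerank_k is not None:
        # Retrieve with the bi-encoder, then cross-encode only the top k
        k = min(rerank_k, num_docs)
        result["retrieve_then_rerank"] = bi_encoder + num_queries * k
    return result


costs = encoder_passes(num_docs=10_000, num_queries=100, rerank_k=4)
print(costs)
```

For 10,000 chunks and 100 queries this gives 10,100 passes for the bi-encoder alone, 1,000,000 for cross-encoding every pair, and 10,500 when the cross-encoder only scores the 4 retrieved chunks per query, which is the arrangement this pipeline uses.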

In [None]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model = AutoModelForSequenceClassification.from_pretrained(
    "cross-encoder/ms-marco-MiniLM-L-6-v2"
)
tokenizer = AutoTokenizer.from_pretrained(
    "cross-encoder/ms-marco-MiniLM-L-6-v2"
)

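def score_relevance(query, document):
    # Encode query and document together as one sequence pair
    inputs = tokenizer(query, document, return_tensors="pt", truncation=True)
    with torch.no_grad():  # Inference only, so skip gradient tracking
        outputs = model(**inputs)
    # The model emits a single logit; sigmoid maps it into (0, 1)
    score = torch.sigmoid(outputs.logits).item()
    return score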
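
To see how the sigmoid turns a raw logit into the 0 to 1 score used by the thresholds in the next step, here is a small standalone sketch using only the standard library. The helper names `sigmoid` and `logit` are illustrative, and the resulting score should be read as a relevance ranking signal rather than a calibrated probability.

```python
import math


def sigmoid(x):
    """Map a raw logit to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Inverse of sigmoid: the raw logit that produces score p."""
    return math.log(p / (1.0 - p))


for raw in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"logit {raw:+.1f} -> score {sigmoid(raw):.3f}")

# Raw-logit equivalents of the 0.5 and 0.3 score thresholds
print(f"score 0.5 <-> logit {logit(0.5):.3f}")
print(f"score 0.3 <-> logit {logit(0.3):.3f}")
```

A score of 0.5 corresponds to a logit of exactly 0, so the 0.5 threshold amounts to asking whether the cross-encoder's raw output is non-negative; the 0.3 threshold corresponds to a negative logit.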

## Step 5: Path Determination

Based on the average relevance score of the retrieved documents, choose one of three corrective paths.

**Thresholds:**
- `≥ 0.5` → **Correct**: Documents are highly relevant
- `≤ 0.3` → **Incorrect**: Documents are not relevant
- between `0.3` and `0.5` → **Ambiguous**: Uncertain, so additional sources are needed

In [None]:
def determine_path(scored_docs, threshold_correct=0.5, threshold_incorrect=0.3):
    # scored_docs: list of (document, score) pairs
    if not scored_docs:
        return "incorrect"  # Nothing retrieved, so fall back to web search
    avg_score = sum(score for _, score in scored_docs) / len(scored_docs)

    if avg_score >= threshold_correct:
        return "correct"
    elif avg_score <= threshold_incorrect:
        return "incorrect"
    else:
        return "ambiguous"
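
A short usage sketch may help show how the averaging behaves. The function is repeated so the block runs on its own, and the document names and scores are invented for illustration.

```python
def determine_path(scored_docs, threshold_correct=0.5, threshold_incorrect=0.3):
    """Same averaging rule as above, repeated so this block is self-contained."""
    avg_score = sum(score for _, score in scored_docs) / len(scored_docs)
    if avg_score >= threshold_correct:
        return "correct"
    elif avg_score <= threshold_incorrect:
        return "incorrect"
    return "ambiguous"


examples = {
    "mostly relevant": [("d1", 0.8), ("d2", 0.7), ("d3", 0.6), ("d4", 0.9)],
    "mostly irrelevant": [("d1", 0.1), ("d2", 0.2), ("d3", 0.05), ("d4", 0.15)],
    "mixed": [("d1", 0.6), ("d2", 0.2), ("d3", 0.4), ("d4", 0.4)],
    # One strong hit among weak ones still averages below 0.3
    "single strong hit": [("d1", 0.95), ("d2", 0.05), ("d3", 0.05), ("d4", 0.05)],
}

paths = {name: determine_path(docs) for name, docs in examples.items()}
for name, path in paths.items():
    print(f"{name}: {path}")
```

The last case is worth noting: because the decision uses the mean, one highly relevant chunk surrounded by weak ones can still be routed to the incorrect path. Whether that matters depends on the corpus; using the maximum score instead of the mean would be one possible variation, not something this notebook does.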

## Step 6: Corrective Actions

### 6.1 Correct Path (average score ≥ 0.5)

Documents are relevant → **Decompose, Filter, Recompose**

- Remove documents with individual score < 0.3
- Keep only high-quality, relevant documents
- Use filtered documents for generation

In [None]:
def correct_path(scored_docs, min_score=0.3):
    # Filter out low-relevance documents
    filtered = [doc for doc, score in scored_docs if score >= min_score]
    return filtered, "Retrieved Documents (Filtered)"

### 6.2 Incorrect Path (average score ≤ 0.3)

Documents are not relevant → **Web Search Fallback**

- Discard all retrieved documents
- Search the web for fresh information
- Use web results for generation

**Search Engine:** DuckDuckGo (no API key required)

In [None]:
from duckduckgo_search import DDGS
from langchain_core.documents import Document

def incorrect_path(query, max_results=3):
    results = DDGS().text(query, max_results=max_results)
    web_docs = [
        Document(page_content=r["body"], metadata={"source": r["href"]})
        for r in results
    ]
    return web_docs, "Web Search"

### 6.3 Ambiguous Path (0.3 < average score < 0.5)

Uncertain relevance → **Hybrid Approach**

- Keep filtered documents (some might be useful)
- Also search the web for additional context
- Combine both sources for generation

Combining both sources avoids discarding a potentially useful document while still filling gaps from the web when confidence is low.

In [None]:
def ambiguous_path(query, scored_docs):
    # Get filtered documents
    filtered_docs, _ = correct_path(scored_docs)
    # Get web results
    web_docs, _ = incorrect_path(query, max_results=2)
    # Combine both
    return filtered_docs + web_docs, "Hybrid (Documents + Web)"
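
The three paths can be tied together in a single dispatcher. The following is a minimal sketch under stated assumptions: documents are plain strings, the web search is passed in as a callable so the example runs offline, and `corrective_action` and `fake_web_search` are hypothetical names that do not appear in the original code.

```python
def corrective_action(query, scored_docs, web_search, low=0.3, high=0.5):
    """Choose a path from the average score.

    Returns (documents, knowledge_source, path).
    """
    if scored_docs:
        avg = sum(score for _, score in scored_docs) / len(scored_docs)
    else:
        avg = 0.0  # Nothing retrieved: treat as irrelevant
    # Documents that individually clear the lower threshold
    kept = [doc for doc, score in scored_docs if score >= low]

    if avg >= high:
        return kept, "Retrieved Documents (Filtered)", "correct"
    if avg <= low:
        return web_search(query, 3), "Web Search", "incorrect"
    return kept + web_search(query, 2), "Hybrid (Documents + Web)", "ambiguous"


def fake_web_search(query, max_results):
    """Offline stand-in for a real search call such as DuckDuckGo."""
    return [f"web result {i + 1} for {query!r}" for i in range(max_results)]


scored = [("chunk A", 0.45), ("chunk B", 0.25), ("chunk C", 0.5)]
docs, source, path = corrective_action("what is CRAG?", scored, fake_web_search)
print(path, source, len(docs))
```

In this run the average is about 0.4, so the ambiguous path is taken: chunks A and C survive the filter and two web results are appended, giving four context documents.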

## Step 7: Answer Generation

Generate the final answer with an LLM, using the selected documents as context.

**Model:** `gpt-4o-mini` (via Aval AI API)
- Fast and cost-effective
- Good at following instructions
- Handles context well

**Prompt includes:**
- Context from selected documents
- Knowledge source (so LLM knows where info came from)
- User's question
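
One way to assemble these three pieces into a prompt string is sketched below. It assumes the documents are plain strings (with LangChain `Document` objects you would pass each `page_content`), and the `[1]`, `[2]` numbering of context passages is an added convention for readability, not part of the original prompt. The name `build_prompt` is hypothetical.

```python
def build_prompt(docs, knowledge_source, question):
    """Assemble context, knowledge source, and question into one prompt."""
    # Number each passage so the model can tell them apart
    context = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(docs, start=1)
    )
    return (
        f"Context (from {knowledge_source}):\n"
        f"{context}\n\n"
        f"Question: {question}\n\n"
        "Answer based on the context. If context is insufficient, say so."
    )


prompt_text = build_prompt(
    docs=["CRAG scores retrieved documents.", "Low scores trigger web search."],
    knowledge_source="Web Search",
    question="When does CRAG search the web?",
)
print(prompt_text)
```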

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.7,
    openai_api_key=AVALAI_API_KEY,
    openai_api_base="https://api.avalai.ir/v1"
)

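# knowledge_source, context, and question come from the earlier steps
prompt = f"""
Context (from {knowledge_source}):
{context}

Question: {question}

Answer based on the context. If context is insufficient, say so.
"""

# invoke() returns a message object; .content holds the answer text
answer = llm.invoke(prompt).content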

## Step 8: Streamlit Interface

A web-based chat interface for interacting with PDFs.

**Features:**
- PDF upload and automatic processing
- Real-time chat with document
- Displays which path was taken (Correct/Incorrect/Ambiguous)
- Shows relevance scores and knowledge source

**Components:**
- `st.file_uploader`: Upload PDF files
- `st.chat_input`: User question input
- `st.chat_message`: Display conversation
- Session state: Maintain chat history
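
The components above could be wired together roughly as follows. This is a minimal sketch, not the actual `app.py`: `answer_question` is a hypothetical placeholder for the CRAG pipeline, PDF processing is omitted, and Streamlit is imported inside `main()` so the helper functions can be used without it installed.

```python
def answer_question(question):
    """Hypothetical placeholder for the CRAG pipeline; returns (answer, path)."""
    return f"(placeholder answer to: {question})", "correct"


def add_turn(history, role, content):
    """Append one chat turn to the history list and return it."""
    history.append({"role": role, "content": content})
    return history


def main():
    import streamlit as st  # Local import: only needed when the app runs

    st.title("Chat with your PDF")
    uploaded = st.file_uploader("Upload a PDF", type="pdf")

    # Session state keeps the chat history across reruns
    if "history" not in st.session_state:
        st.session_state.history = []

    # Replay earlier turns
    for turn in st.session_state.history:
        with st.chat_message(turn["role"]):
            st.markdown(turn["content"])

    question = st.chat_input("Ask a question about the document")
    if question and uploaded is not None:
        add_turn(st.session_state.history, "user", question)
        with st.chat_message("user"):
            st.markdown(question)

        answer, path = answer_question(question)
        with st.chat_message("assistant"):
            st.markdown(answer)
            st.caption(f"Path taken: {path}")
        add_turn(st.session_state.history, "assistant", answer)
    elif question:
        st.warning("Upload a PDF first.")


# In app.py, call main() at module level so that
# `streamlit run app.py` renders the page.
```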

## Summary

| Step | Component | Technology | Purpose |
|------|-----------|------------|--------|
| 1 | Chunking | RecursiveCharacterTextSplitter | Split docs into semantic chunks |
| 2 | Embedding | text-embedding-3-small | Convert text to vectors |
| 3 | Storage | ChromaDB | Store and search vectors |
| 4 | Evaluation | Cross-Encoder (MiniLM) | Score query-doc relevance |
| 5 | Path Selection | Threshold-based | Choose corrective action |
| 6 | Correction | Filter / Web / Hybrid | Get best context |
| 7 | Generation | gpt-4o-mini | Generate final answer |
| 8 | Interface | Streamlit | Web-based chat UI |

**Key Benefits:**
- Reduces hallucination from irrelevant context
- Falls back to web when knowledge base fails
- Adaptive strategy based on confidence

**Run the app:**
```bash
streamlit run app.py
```
