# 09 - Self-RAG (Self-Reflective RAG)

**Complexity:** ⭐⭐⭐⭐⭐

**Use Cases:** Exploratory research, quality-critical applications, self-correcting systems

**Key Features:**
- LLM decides autonomously when to retrieve
- Self-critique mechanism
- Iterative refinement
- Citation validation

**Flow:**
```
Query → Need Retrieval? →
  Yes: Retrieve + Generate → Self-Critique → Retry if poor
  No: Direct Generate → Self-Critique
```

## 1. Setup

In [None]:
import sys
sys.path.append('../..')

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from shared.config import OPENAI_VECTOR_STORE_PATH, DEFAULT_MODEL
from shared.utils import load_vector_store, print_section_header, format_docs
from shared.prompts import (
    RETRIEVAL_NEED_PROMPT, SELF_CRITIQUE_PROMPT, 
    CITATION_CHECK_PROMPT, RAG_PROMPT_TEMPLATE
)
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

print_section_header("Setup: Self-RAG")

embeddings = OpenAIEmbeddings()
vectorstore = load_vector_store(OPENAI_VECTOR_STORE_PATH, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
llm = ChatOpenAI(model=DEFAULT_MODEL, temperature=0)

print("✅ Setup complete!")

## 2. Self-RAG Components

In [None]:
print_section_header("Self-RAG Components")

# Retrieval need decider
retrieval_decider = RETRIEVAL_NEED_PROMPT | llm | StrOutputParser()

# Response critic
response_critic = SELF_CRITIQUE_PROMPT | llm | StrOutputParser()

# Citation checker
citation_checker = CITATION_CHECK_PROMPT | llm | StrOutputParser()

print("✓ Self-RAG components initialized:")
print("  - Retrieval need decider")
print("  - Response critic")
print("  - Citation checker")
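
The LLM-based citation checker above could be paired with a cheap deterministic pre-check that runs before any extra LLM call. The sketch below assumes responses cite sources with bracketed numbers such as `[1]`, numbered from 1 in retrieval order. That citation format is an assumption, not something the shared prompts are known to enforce, and `find_invalid_citations` is a hypothetical helper.

```python
import re


def find_invalid_citations(response: str, num_docs: int) -> list[int]:
    """Return cited source numbers that match no retrieved document.

    Assumes citations look like [1], [2], ... (hypothetical format).
    """
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", response)}
    # Valid citations are 1..num_docs inclusive; anything else is flagged
    return sorted(n for n in cited if not 1 <= n <= num_docs)
```

With `k=4` retrieved documents, a response citing `[5]` would be flagged, and the LLM checker would only need to run on responses that pass this filter.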

## 3. Test Components

In [None]:
print_section_header("Testing Self-RAG Components")

# Test retrieval decision
queries = [
    "What is 2+2?",  # NO retrieval
    "What is RAG in LangChain?",  # YES retrieval
]

print("Retrieval Need Decisions:\n")
for q in queries:
    decision = retrieval_decider.invoke({"query": q}).strip()
    print(f"{decision:3} | {q}")
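
A free-text reply is not guaranteed to be a bare `YES` or `NO`, and a substring check can misfire on replies that mention both words. The following is a minimal sketch of a stricter parser, assuming `RETRIEVAL_NEED_PROMPT` asks for a YES or NO answer. `needs_retrieval` and its `default` parameter are hypothetical names.

```python
import re


def needs_retrieval(decision: str, default: bool = True) -> bool:
    """Map a free-text YES/NO reply to a boolean.

    Falls back to `default` when the reply contains neither word or both.
    """
    # Tokenize into whole words so "eyes" does not count as "yes"
    tokens = set(re.findall(r"[a-z]+", decision.lower()))
    has_yes = "yes" in tokens
    has_no = "no" in tokens
    if has_yes == has_no:  # ambiguous or empty reply
        return default
    return has_yes
```

Defaulting to retrieval on an ambiguous reply trades some cost for safety. Set `default=False` if unnecessary retrieval is the larger concern.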

## 4. Self-RAG Pipeline

In [None]:
print_section_header("Self-RAG Pipeline")

def self_rag_pipeline(query: str, max_iterations: int = 2):
    """Self-RAG with iterative refinement."""
    print(f"\n{'='*80}")
    print(f"Query: {query}")
    print('='*80)
    
    iteration = 0
    
    while iteration < max_iterations:
        iteration += 1
        print(f"\n--- Iteration {iteration} ---")
        
        # Decide if retrieval needed
        need_retrieval = retrieval_decider.invoke({"query": query})
        print(f"Retrieval needed: {need_retrieval.strip()}")
        
        # Retrieve or use general knowledge
        if "YES" in need_retrieval.upper():
            docs = retriever.invoke(query)
            context = format_docs(docs)
            print(f"Retrieved {len(docs)} documents")
        else:
            context = "Using general knowledge."
            print("Using general knowledge only")
        
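        # Generate response. Context is passed as a template variable so that
        # literal braces in retrieved text are not parsed as placeholders.
        gen_prompt = ChatPromptTemplate.from_messages([
            ("system", "Context: {context}"),
            ("user", "{query}")
        ])
        response = (gen_prompt | llm | StrOutputParser()).invoke(
            {"context": context, "query": query}
        )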
        print(f"\nGenerated ({len(response)} chars)")
        
        # Self-critique
        critique = response_critic.invoke({
            "query": query,
            "context": context[:1000],
            "response": response
        })
        
        print(f"\n🔍 Critique:\n{critique}")
        
        # Check if retry needed. The critique is lowercased, so the marker
        # must be lowercase too or the comparison can never match.
        if "should_retry: yes" not in critique.lower():
            print("\n✓ Response approved!")
            return response, iteration, critique
        else:
            print("\n⚠️  Retrying with refinement...")
    
    print("\n⚠️  Max iterations reached")
    return response, iteration, critique

print("✓ Self-RAG pipeline ready")
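
When a retry is triggered, the pipeline above re-runs the same prompts with the same query, and at `temperature=0` the second attempt is likely to resemble the first. One possible refinement is to feed the critique back into the next generation. The sketch below illustrates that loop with stub callables in place of the LLM chains. It assumes the critique contains a line of the form `SHOULD_RETRY: yes` or `SHOULD_RETRY: no`, inferred from the check in the pipeline. `refine`, `should_retry`, `fake_generate`, and `fake_critique` are hypothetical names.

```python
import re
from typing import Callable, Optional


def should_retry(critique: str) -> bool:
    """True if the critique contains 'SHOULD_RETRY: yes' (case-insensitive)."""
    match = re.search(r"should_retry\s*:\s*(yes|no)", critique, flags=re.IGNORECASE)
    return bool(match) and match.group(1).lower() == "yes"


def refine(
    query: str,
    generate: Callable[[str, Optional[str]], str],
    critique: Callable[[str, str], str],
    max_iterations: int = 2,
) -> tuple[str, int]:
    """Generate, critique, and regenerate using the prior critique as feedback."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback: Optional[str] = None
    response = ""
    for iteration in range(1, max_iterations + 1):
        response = generate(query, feedback)
        verdict = critique(query, response)
        if not should_retry(verdict):
            return response, iteration
        feedback = verdict  # pass the critique into the next attempt
    return response, max_iterations


# Stub callables standing in for the LLM chains (illustration only)
def fake_generate(query: str, feedback: Optional[str]) -> str:
    return "short answer" if feedback is None else "longer, revised answer"


def fake_critique(query: str, response: str) -> str:
    return "SHOULD_RETRY: yes" if response == "short answer" else "SHOULD_RETRY: no"


print(refine("What is MMR?", fake_generate, fake_critique))
# → ('longer, revised answer', 2)
```

In the real pipeline, `generate` would wrap the generation chain with the feedback added to the prompt, and `critique` would wrap `response_critic`.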

## 5. Test Self-RAG

In [None]:
print_section_header("Self-RAG Test")

# Test 1: Query needing retrieval
query1 = "What is the difference between similarity and MMR retrieval?"
response1, iters1, _ = self_rag_pipeline(query1)

print(f"\n\nFinal Response ({iters1} iteration{'s' if iters1 > 1 else ''}):")
print(response1[:300])

In [None]:
# Test 2: Simple query (no retrieval)
query2 = "What is 5 + 7?"
response2, iters2, _ = self_rag_pipeline(query2)

print(f"\n\nFinal Response ({iters2} iteration{'s' if iters2 > 1 else ''}):")
print(response2)

## Summary

**Advantages:**
✅ Autonomous decision-making  
✅ Self-correction capability  
✅ Only retrieves when needed (efficient)  
✅ Quality assurance built-in  

**Limitations:**
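- Slow: each iteration makes three LLM calls (retrieval decision, generation, critique)
- Expensive: with the default `max_iterations=2`, one query can cost up to six LLM calls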
- Complex to tune
- May over-correct

**When to Use:**
- Quality is paramount
- Research applications
- Self-correcting systems
- Budget allows higher cost

**Production Tips:**
- Limit max iterations
- Cache critique results
- Monitor iteration distribution
- Set quality thresholds
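
Two of the tips above, caching critique results and monitoring the iteration distribution, can be sketched with the standard library alone. This is a minimal in-memory illustration, not a production design: `CritiqueCache`, `record_iterations`, and `iteration_counts` are hypothetical names, and a real deployment would likely want a persistent or shared cache.

```python
import hashlib
from collections import Counter


class CritiqueCache:
    """In-memory cache keyed by a hash of (query, context, response)."""

    def __init__(self, critique_fn):
        self._critique_fn = critique_fn  # callable(query, context, response) -> str
        self._store = {}
        self.hits = 0

    def __call__(self, query: str, context: str, response: str) -> str:
        # Join with a NUL separator so field boundaries stay unambiguous
        raw = "\x00".join((query, context, response))
        key = hashlib.sha256(raw.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = self._critique_fn(query, context, response)
        return self._store[key]


# Histogram of how many iterations each query needed
iteration_counts = Counter()


def record_iterations(iterations: int) -> None:
    """Record the iteration count returned by one pipeline run."""
    iteration_counts[iterations] += 1
```

Calling `record_iterations(iters1)` after each pipeline run builds the histogram. If most queries reach the maximum, the critique prompt or the quality threshold may be too strict.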

**Next:** [10_agentic_rag.ipynb](10_agentic_rag.ipynb) - Autonomous agents with tools