# RAG Detailed Survey

**Paper Discussion**

[arxiv.org](https://arxiv.org/pdf/2312.10997)

**Class Notes**: https://drive.google.com/file/d/1_uRq1ZjtjNH7abY_bsHcS2JcJm83xkpn/view?usp=sharing

**Notes**

## **Retrieval-Augmented Generation for Large Language Models: A Survey**

This is a **comprehensive survey paper** (2023) by Gao et al. It provides an extensive overview of the RAG field rather than introducing a specific new RAG method.

### **Key Contributions**

This survey systematically organizes RAG research into:

**1. Three RAG Paradigms:**

- **Naive RAG**: The original "Retrieve-Read" approach (index → retrieve → generate)
- **Advanced RAG**: Pre-retrieval and post-retrieval optimizations (query rewriting, re-ranking, context compression)
- **Modular RAG**: Flexible architectures with specialized modules (search, memory, routing, task adapters)

**2. Core Components Analysis:**

- **Retrieval**: Indexing strategies, query optimization, embedding models, retrieval sources (text, KG, multi-modal)
- **Generation**: Context curation, LLM fine-tuning, post-retrieval processing
- **Augmentation**: Iterative, recursive, and adaptive retrieval patterns

**3. Comprehensive Evaluation Framework:**

- Covers 26 tasks and nearly 50 datasets
- Quality scores: context relevance, answer faithfulness, answer relevance
- Required abilities: noise robustness, negative rejection, information integration, counterfactual robustness
- Benchmarks and tools: RGB and RECALL (benchmarks); RAGAS, ARES, and TruLens (automated evaluation tools)

### **Key Insights**

**RAG Challenges:**

- Retrieval precision/recall issues
- Generation hallucinations and irrelevance
- Robustness to noisy or contradictory information
- Handling long contexts efficiently

**RAG vs. Fine-tuning:**

- RAG excels at dynamic knowledge updates and interpretability
- Fine-tuning is better suited to style/format replication
- Hybrid approaches combining both show promise

**Future Directions:**

- RAG + Long Context LLMs (200K+ tokens)
- Hybrid RAG-Fine-tuning approaches
- Scaling laws for RAG models
- Production-ready systems (efficiency, security)
- Multi-modal RAG (images, audio, video, code, KG)

### **Practical Value**

This survey is highly valuable for:

- Understanding the **evolution** of RAG from 2020-2023
- Comparing **100+ RAG methods** systematically
- Identifying **best practices** for different scenarios (single-hop QA, multi-hop reasoning, domain-specific tasks)
- Choosing appropriate **evaluation metrics and benchmarks**
- Understanding the **RAG ecosystem** (LangChain, LlamaIndex, etc.)

**Bottom line**: While the original RAG paper (Lewis et al., 2020) introduced the technique, this survey maps the entire RAG landscape, showing how the field has rapidly evolved through Advanced and Modular approaches and where it is heading next.

## **Research Papers**

### **1. Pre-training (Orange/Yellow Branch - Left)**

RAG techniques integrated during the model pre-training phase:

**2020-2023:**

- **REALM** - First to jointly pre-train retriever and language model
- **RETRO** - Retrieval-Enhanced Transformer
- **Atlas** - Few-shot learning with retrieval
- **InstructRetro** - Instruction-tuned RETRO

**2023-2024:**

- **CoG** (Copy is all you need)
- **RAVEN** - Retrieval-augmented encoder-decoder
- **TIGER** - More recent pre-training approach

### **2. Fine-tuning (Green Branch - Center)**

RAG methods applied during model fine-tuning:

**2023 Era:**

- **PROMPTAGATOR** - Prompt-based retrieval
- **RAG** - Original RAG paper (Lewis et al., 2020)
- **P-RAG** - Prompt-based RAG variants

**2024 Era:**

- **Self-RAG** - Self-reflective retrieval
- **FLARE** - Forward-looking active retrieval
- **AAR** - Augmentation-adapted retriever
- **SURGE** - Subgraph retrieval for knowledge-grounded dialogue generation
- **SANTA** - Code and text integration
- **CoN** (Chain-of-Note)
- **UPRISE** - Universal prompt retrieval
- **CT-RAG** - Context-tuned RAG
- **UniMS-RAG** - Unified multi-source RAG
- **BEQ** - Query optimization
- **EAR** - Event argument retrieval
- **Dual-Feedback-ToD** - Task-oriented dialogue
- **MK-ToD** - Multi-knowledge dialogue
- **RAST** - Retrieval-augmented generation
- **Self-Mem** - Self-memory mechanism
- **Retrieve-and-Sample**
- **RAG-Robust** - Robustness improvements
- **LM-Indexer**
- **RA-e2e** - End-to-end retrieval augmentation

### **3. Inference (Dark Blue/Teal Branch - Right)**

RAG techniques applied during inference (most common approach):

**2023 Era:**

- **DSP** (Demonstrate-Search-Predict)
- **RECITE** - Recitation-augmented generation
- **GenRead** - Generate rather than retrieve
- **ICRALM** - In-context retrieval
- **REPLUG** - Retrieve and plug
- **Filter-Reranker** - Post-retrieval filtering

**2024 Era (Major Expansion):**

*Query & Retrieval:*

- **HyDE** - Hypothetical document embeddings
- **CRAG** - Corrective RAG
- **RAPTOR** - Recursive tree-organized retrieval
- **G-Retriever** - Graph retrieval
- **BGM** - Bridge model for retrieval

*Advanced Reasoning:*

- **IRCoT** - Interleaving retrieval with chain-of-thought
- **ITER-RETGEN** - Iterative retrieval-generation
- **SKR** - Self-knowledge guided retrieval
- **ITRG** - Iterative retrieval
- **ToC** (Tree of Clarifications)
- **1-PAGER** - Single page generation
- **FABULA** - Narrative generation
- **KnowledGPT** - Knowledge graph integration

*Context Processing:*

- **Token-Elimination** - Reducing context
- **Recomp** - Context compression
- **PRCA** - Context adapter
- **NoiseRAG** - Noise handling

*Multi-modal & Specialized:*

- **CREA-ICL** - Cross-lingual ICL
- **IAG** - Interactive augmented generation
- **NoMIRACL** - Multilingual retrieval
- **PKG** - Parametric knowledge guidance
- **KALMV** - Knowledge augmentation
- **RoG** - Reasoning on graphs
- **HyKGE** - Hybrid knowledge graph
- **PGRA** - Prompt-guided retrieval

*Domain-Specific:*

- **PaperQA** - Scientific paper QA
- **RADA** - Domain adaptation
- **FILCO** - Filtered context
- **QLM-Doc-ranking** - Document ranking

## **Three RAG Paradigms**

### **1. Naive RAG (Left - Blue)**

**The simplest, original approach:**

**Pipeline:**

1. **User** → **Query** → **Documents**
2. **Indexing**: Documents chunked and vectorized
3. **Retrieval**: Find relevant chunks via similarity search
4. **Prompt + Frozen LLM**: Combine query + retrieved docs
5. **Output**: Generate answer
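
The five steps above can be sketched in a few lines. This is a minimal illustration, not the survey's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the three chunks are made-up sample data, and the final LLM call is left out (the sketch stops at the assembled prompt).

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Indexing: vectorize every chunk once, up front."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Retrieval: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Prompt: combine the query with the retrieved chunks for a frozen LLM."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


# Made-up sample chunks for illustration only.
chunks = [
    "RAG combines retrieval with text generation.",
    "Paris is the capital of France.",
    "Reranking reorders retrieved chunks by relevance.",
]
index = build_index(chunks)
question = "What is the capital of France?"
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)
print(prompt)  # this string would be sent to the LLM in the generation step
```

Nothing in this chain inspects or refines what was retrieved, which is where the limitations listed below come from.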

**Characteristics:**

- Linear, sequential flow
- "Retrieve-Read" pattern
- No optimization steps
- Fixed retrieval (no refinement)

**Limitations:**

- Low retrieval precision
- No context filtering
- Potential hallucinations
- Cannot handle complex queries well

---

### **2. Advanced RAG (Middle - Orange)**

**Adds pre-retrieval and post-retrieval optimization:**

**Enhanced Pipeline:**

**Pre-Retrieval (Yellow Box):**

- **Query Routing**: Direct to appropriate data source
- **Query Rewriting**: Reformulate for better retrieval
- **Query Expansion**: Add related terms

↓

**Retrieval**: Same as Naive RAG

↓

**Post-Retrieval (Orange Box):**

- **Rerank**: Reorder by relevance
- **Summary**: Compress retrieved content
- **Fusion**: Combine information from multiple sources

↓

**Prompt + Frozen LLM** → **Output**

**Key Improvements:**

- Better query formulation
- More relevant retrieval
- Context compression/filtering
- Still follows chain-like structure

**Examples:**

- Query rewriting (HyDE, RRR)
- Re-ranking (Cohere rerank, bge-reranker)
- Context compression (LongLLMLingua)
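
A rough sketch of how a pre-retrieval step and a post-retrieval step wrap the same retriever. Everything here is an assumption made for illustration: the `SYNONYMS` table stands in for an LLM-based query rewriter or expander, term overlap stands in for vector search, and the Jaccard score in `rerank` stands in for a real reranking model.

```python
import re

# Hypothetical synonym table standing in for an LLM-based query expander.
SYNONYMS = {"car": ["automobile", "vehicle"], "price": ["cost"]}

# Made-up sample documents for illustration only.
DOCS = [
    "The automobile cost dropped sharply this year.",
    "Car price trends and what drives the price of a car.",
    "Bananas are rich in potassium.",
    "Vehicle maintenance tips for winter.",
]


def tokens(text: str) -> set[str]:
    """Lowercase word set used by every stage of this toy pipeline."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand_query(query: str) -> set[str]:
    """Pre-retrieval: add related terms to the query."""
    terms = tokens(query)
    for term in list(terms):
        terms.update(SYNONYMS.get(term, []))
    return terms


def retrieve(terms: set[str], docs: list[str], k: int = 3) -> list[str]:
    """First-stage retrieval: keep docs sharing a term, ranked by overlap count."""
    scored = [(len(terms & tokens(d)), d) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    """Post-retrieval: re-score candidates against the ORIGINAL query, keep top_n."""
    q = tokens(query)

    def jaccard(doc: str) -> float:
        d = tokens(doc)
        return len(q & d) / len(q | d)

    return sorted(candidates, key=jaccard, reverse=True)[:top_n]


query = "car price"
baseline = retrieve(tokens(query), DOCS)        # no expansion: one match
candidates = retrieve(expand_query(query), DOCS)  # expansion widens recall
final = rerank(query, candidates)               # reranking trims the context
print(len(baseline), len(candidates), final[0])
```

The design point is the division of labor: expansion trades precision for recall, and the rerank step restores precision while also shrinking the context passed to the LLM.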

---

### **3. Modular RAG (Right - Blue/Orange)**

**Most flexible, component-based architecture:**

**Top Section - Modules:**
Plug-and-play components that can be mixed and matched:

- **Search**: Database/web search
- **Routing**: Route to different pipelines
- **Predict**: Generate hypothetical answers
- **Retrieve**: Core retrieval module
- **Rewrite**: Query reformulation
- **Rerank**: Re-score documents
- **Read**: Extract information
- **Demonstrate**: Provide examples
- **Memory**: Store conversation history
- **Fusion**: Merge multiple sources

**Bottom Section - Patterns:**
Different workflow configurations:

**Naive RAG Pattern:**

`Retrieve → Read`

**Advanced RAG Pattern:**

`Rewrite → Retrieve → Rerank → Read`

**DSP Pattern** (Khattab et al., 2022):

`Demonstrate → Search → Predict`

**ITER-RETGEN Pattern** (Shao et al., 2023):

`Retrieve → Read → Retrieve → Read` (iterative loop)

**Key Features:**

- **Non-linear flow**: Not just sequential
- **Iterative processes**: Multiple retrieval rounds
- **Adaptive retrieval**: Decide when to retrieve
- **Module replacement**: Swap components as needed
- **Task-specific**: Configure for different use cases
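
One way to picture module replacement is to treat each module as a function over a shared state and each pattern as an ordered list of modules. The sketch below assumes that design; the `State` dictionary, the module bodies, and the two-document `CORPUS` are all hypothetical and far simpler than real components.

```python
from typing import Callable

State = dict  # shared state passed from module to module
Module = Callable[[State], State]

# Made-up sample corpus for illustration only.
CORPUS = [
    "Naive RAG follows a retrieve then read chain.",
    "Modular RAG lets you swap components.",
]


def rewrite(state: State) -> State:
    """Rewrite module: normalize the query (stand-in for LLM reformulation)."""
    state["query"] = state["query"].lower().replace("?", "")
    return state


def retrieve(state: State) -> State:
    """Retrieve module: keep documents that share any term with the query."""
    terms = set(state["query"].split())
    state["docs"] = [d for d in CORPUS if terms & set(d.lower().split())]
    return state


def rerank(state: State) -> State:
    """Rerank module: order documents by term overlap with the query."""
    terms = set(state["query"].split())
    state["docs"] = sorted(
        state["docs"],
        key=lambda d: len(terms & set(d.lower().split())),
        reverse=True,
    )
    return state


def read(state: State) -> State:
    """Read module: stub generator that returns the top document."""
    state["answer"] = state["docs"][0] if state["docs"] else "no answer"
    return state


def run(pipeline: list[Module], query: str) -> State:
    """Execute modules in order, recording which ones ran."""
    state: State = {"query": query, "docs": [], "trace": []}
    for module in pipeline:
        state = module(state)
        state["trace"].append(module.__name__)
    return state


short_chain = [retrieve, read]
longer_chain = [rewrite, retrieve, rerank, read]  # same modules, different wiring

result = run(longer_chain, "What is modular RAG?")
print(result["trace"], "->", result["answer"])
```

Swapping a component means replacing one function in the list; a loop or a branch around `run` would give the iterative and adaptive behaviors listed above.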

---

## **Evolution Timeline**

**2020-2021: Naive RAG**

- Original RAG paper (Lewis et al.)
- Simple retrieve-then-read
- Proof of concept

**2022-2023: Advanced RAG**

- Pre-retrieval optimization (query rewriting)
- Post-retrieval refinement (reranking)
- Context compression
- Better indexing strategies

**2023-2024: Modular RAG**

- Component-based architecture
- Iterative retrieval (ITER-RETGEN)
- Adaptive retrieval (Self-RAG, FLARE)
- Multi-hop reasoning (DSP)
- Integration with fine-tuning

## **Practical Examples**

**Naive RAG:**

- Basic chatbot with document retrieval
- Simple FAQ systems

**Advanced RAG:**

- Enterprise knowledge bases with reranking
- Customer support with query expansion
- HyDE for better semantic matching

**Modular RAG:**

- Multi-step research assistants (DSP pattern)
- Complex QA requiring reasoning (ITER-RETGEN)
- Agentic systems that decide when to retrieve (Self-RAG)
- Domain-specific workflows with custom modules

## **Augmentation Patterns: Examples**

### **Iterative RAG:**

**Question:** "What are the health benefits of the Mediterranean diet?"

1. First retrieval: General overview
2. Generate partial answer about heart health
3. Second retrieval: More specific studies
4. Generate expanded answer about diabetes prevention
5. Third retrieval: Latest research
6. Final comprehensive answer
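
The loop above can be sketched as follows. This is a toy under stated assumptions: `KB` holds three made-up passages, `generate` is a stub that concatenates evidence instead of calling an LLM, and feeding the current draft back into the retrieval query is only loosely in the spirit of ITER-RETGEN.

```python
import re
from typing import Optional

# Made-up passages for illustration only.
KB = [
    "The Mediterranean diet is associated with better heart health.",
    "Heart health studies also report lower diabetes risk for this diet.",
    "Diabetes risk research is ongoing in recent trials.",
]


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, seen: list[str]) -> Optional[str]:
    """Return the best not-yet-used passage by term overlap, or None."""
    q = tokens(query)
    candidates = [p for p in KB if p not in seen and q & tokens(p)]
    return max(candidates, key=lambda p: len(q & tokens(p)), default=None)


def generate(question: str, passages: list[str]) -> str:
    """Stub LLM: the 'draft answer' is just the evidence gathered so far."""
    return " ".join(passages)


def iterative_rag(question: str, max_rounds: int = 3) -> tuple[str, list[str]]:
    passages: list[str] = []
    draft = ""
    for _ in range(max_rounds):
        # The current draft enriches the next retrieval query.
        passage = retrieve(f"{question} {draft}", passages)
        if passage is None:
            break  # nothing new to add
        passages.append(passage)
        draft = generate(question, passages)
    return draft, passages


question = "What are the health benefits of the Mediterranean diet?"
answer, used = iterative_rag(question)
print(len(used), "passages used")
```

In this toy data the third passage shares no terms with the question itself; it only becomes reachable once earlier drafts mention diabetes, which is the motivation for retrieving more than once.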

### **Recursive RAG:**

**Question:** "Compare the economic policies of the last three US presidents and their impact on unemployment."

**Decomposition:**

- Sub-Q1: Who were the last three presidents?
- Sub-Q2: What were Obama's economic policies?
    - Sub-Q2a: What was unemployment under Obama?
- Sub-Q3: What were Trump's economic policies?
    - Sub-Q3a: What was unemployment under Trump?
- Sub-Q4: What were Biden's economic policies?
    - Sub-Q4a: What was unemployment under Biden?
- Final: Synthesize comparison
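
A decomposition like this maps naturally onto a recursive function. The sketch below is illustrative only: the `DECOMPOSE` table is a hypothetical stand-in for an LLM that splits questions, it uses placeholder subjects A and B rather than real presidents, and `retrieve` returns a labeled placeholder instead of real evidence.

```python
# Hypothetical decomposition table standing in for an LLM question splitter.
DECOMPOSE = {
    "Compare A and B": ["Summarize A", "Summarize B"],
    "Summarize A": [
        "What were A's economic policies?",
        "What was unemployment under A?",
    ],
    "Summarize B": [
        "What were B's economic policies?",
        "What was unemployment under B?",
    ],
}


def retrieve(question: str) -> str:
    """Stub retriever: returns a placeholder instead of real evidence."""
    return f"[evidence for: {question}]"


def collect_evidence(question: str, depth: int = 0, max_depth: int = 3) -> list[str]:
    """Depth-first walk: split until a question is atomic, then retrieve for it."""
    subs = DECOMPOSE.get(question, [])
    if not subs or depth >= max_depth:
        return [retrieve(question)]  # leaf, or depth budget exhausted
    evidence: list[str] = []
    for sub in subs:
        evidence.extend(collect_evidence(sub, depth + 1, max_depth))
    return evidence


def recursive_rag(question: str) -> str:
    """Synthesis stub: list the question followed by the gathered evidence."""
    return "\n".join([question, *collect_evidence(question)])


evidence = collect_evidence("Compare A and B")
print(recursive_rag("Compare A and B"))
```

The `max_depth` guard matters in practice because an LLM-driven splitter has no guarantee of terminating on its own.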

### **Adaptive RAG:**

**Question:** "What is the capital of France?"

**Flow:**

- Judge: This is basic knowledge → Skip retrieval
- Generate: "Paris" (from parametric memory)
- Response: "Paris"

**Question:** "What was the stock price of NVIDIA yesterday?"

**Flow:**

- Judge: Recent info needed → Retrieve
- Retrieve: Latest NVIDIA stock data
- Generate: Answer with current price
- Response: Factual answer with citation
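
Both flows reduce to a judge that gates the retriever. In the sketch below the judge is a hypothetical keyword heuristic (`TIME_SENSITIVE`) chosen only to make the control flow visible; methods such as Self-RAG and FLARE make this decision with the model itself. The retriever and generator are stubs.

```python
import re
from typing import Optional

# Hypothetical cue words; a real judge would be a trained model or the LLM itself.
TIME_SENSITIVE = {"yesterday", "today", "latest", "current", "price", "now"}


def needs_retrieval(question: str) -> bool:
    """Judge: retrieve only if the question looks time-sensitive."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return bool(words & TIME_SENSITIVE)


def retrieve(question: str) -> str:
    """Stub retriever: returns a placeholder instead of live data."""
    return f"[retrieved passage for: {question}]"


def generate(question: str, context: Optional[str] = None) -> str:
    """Stub generator: reports which knowledge source it would rely on."""
    source = "retrieved context" if context else "parametric memory"
    return f"answer from {source}"


def adaptive_rag(question: str) -> dict:
    if needs_retrieval(question):
        context = retrieve(question)
        return {"retrieved": True, "answer": generate(question, context)}
    return {"retrieved": False, "answer": generate(question)}


simple = adaptive_rag("What is the capital of France?")
recent = adaptive_rag("What was the stock price of NVIDIA yesterday?")
print(simple["retrieved"], recent["retrieved"])
```

Skipping retrieval on the first question saves a search call and keeps the prompt short, which is the cost and latency argument for adaptive retrieval.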

---

## **When to Use Each**

**Use Iterative when:**

- Building comprehensive reports
- Need deep coverage of a topic
- Quality matters more than speed

**Use Recursive when:**

- Question has clear logical structure
- Multi-hop reasoning required
- Can decompose into independent sub-problems

**Use Adaptive when:**

- Mixed query types (some simple, some complex)
- Need optimal efficiency
- Have well-trained models with control mechanisms
- Resource constraints (cost/latency)