```{contents}
```
### Types of RAG (Retrieval-Augmented Generation)



**Retrieval-Augmented Generation (RAG)** is a pattern where an LLM:

1. **Retrieves external knowledge**
2. **Uses that knowledge to generate grounded answers**

RAG variants differ in **how they retrieve, how they manage the retrieved context, and how much reasoning happens around retrieval**.

---

### Naive RAG (Basic RAG)

**Description**

The simplest RAG form:

* Retrieve top-k chunks
* Stuff them into the prompt
* Generate an answer

```
Query → Vector Search → Stuff Context → LLM
```
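The three steps can be sketched without any framework. This is a minimal illustration, not how LangChain implements it: a toy bag-of-words cosine similarity stands in for a real embedding model and vector store, and the sketch stops at building the stuffed prompt instead of calling an LLM. The chunk texts are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase bag of words (stand-in for a real embedding model)
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    # Vector search: rank every chunk by similarity to the query, keep the top-k
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 3) -> str:
    # Stuff the top-k chunks into a single prompt
    context = "\n\n".join(retrieve(query, chunks, k))
    return f"Answer using only the context.\n\nContext:\n{context}\n\nQuestion:\n{query}"


chunks = [
    "Ticket escalation: a ticket moves to tier 2 after 4 hours without a response.",
    "The cafeteria opens at 8 am.",
    "Escalation assigns the ticket to the on-call engineer.",
]
print(build_prompt("How does ticket escalation work?", chunks, k=2))
```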

---

**Demonstration (LangChain – Naive RAG)**

```python
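from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Assumes `vectorstore` and `llm` are already initialized
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

def format_docs(docs):
    # Join the retrieved Documents into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the context."),
    ("human", "Context:\n{context}\n\nQuestion:\n{question}")
])

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

chain.invoke("How does ticket escalation work?")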
```

---

**Pros / Cons**

* ✅ Simple to build
* ❌ Prone to hallucination when the retrieved context is weak
* ❌ No reranking or validation

---

### Stuff RAG

**Description**

All retrieved chunks are **stuffed into a single prompt**.

```
Retrieve → Concatenate → LLM
```

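This is the default `chain_type` of LangChain's `RetrievalQA` and the mechanism behind many early RAG examples; it is essentially the Naive RAG pattern above.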

---

**Demonstration**

```python
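from langchain.chains import RetrievalQA

# "stuff" (the default) concatenates all retrieved chunks into one prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff"
)
qa_chain.invoke({"query": "How does ticket escalation work?"})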
```

---

**When to Use**

* Small datasets
* Few short chunks
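Stuffing only works while every chunk fits in one context window, which is why it suits small inputs. A minimal sketch of the concatenate step with a budget check; the whitespace word count is a crude stand-in for the model's tokenizer, and `max_tokens=3000` is an arbitrary example budget.

```python
def stuff_context(chunks: list[str], max_tokens: int = 3000) -> str:
    """Concatenate all retrieved chunks into one context string."""
    context = "\n\n".join(chunks)
    # Crude token estimate: whitespace-separated words (a real system would use the model's tokenizer)
    n_tokens = len(context.split())
    if n_tokens > max_tokens:
        raise ValueError(
            f"Stuffed context has ~{n_tokens} tokens, over the {max_tokens}-token budget; "
            "consider MapReduce or Refine instead"
        )
    return context


print(stuff_context(["Chunk one.", "Chunk two."]))
```

When the check fails, the usual move is to switch to MapReduce or Refine rather than silently truncating the context.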

---

### MapReduce RAG

**Description**

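Handles more retrieved text than fits in one prompt by splitting the work into separate LLM calls:

* Map: run the LLM on each chunk independently (e.g. extract or summarize what is relevant to the question)
* Reduce: combine the per-chunk outputs into the final answer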

```
Retrieve → Map (LLM per chunk) → Reduce (LLM) → Answer
```
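The control flow is easy to see in plain Python. In this sketch `llm` is any prompt-to-text callable (an assumption; wrap your real model client), and the prompts are illustrative rather than LangChain's built-in map/reduce prompts.

```python
from typing import Callable


def map_reduce_answer(question: str, chunks: list[str], llm: Callable[[str], str]) -> str:
    # Map: one independent LLM call per chunk, so no single call sees all the text
    partials = [
        llm(f"Extract anything relevant to the question.\nQuestion: {question}\nText: {chunk}")
        for chunk in chunks
    ]
    # Reduce: one final call over the (much shorter) per-chunk outputs
    combined = "\n".join(partials)
    return llm(f"Answer the question using these notes.\nQuestion: {question}\nNotes:\n{combined}")


calls = []


def fake_llm(prompt: str) -> str:
    # Stand-in model: records each prompt and returns a numbered placeholder
    calls.append(prompt)
    return f"output #{len(calls)}"


print(map_reduce_answer("What changed in Q3?", ["chunk A", "chunk B", "chunk C"], fake_llm))
```

Because the map calls do not depend on each other, they can run in parallel; only the reduce step has to wait for all of them.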

---

**Demonstration**

```python
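from langchain.chains import RetrievalQA

# "map_reduce": one LLM call per chunk, then a final call to combine the results
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="map_reduce"
)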
```

---

**When to Use**

* Large documents
* Long PDFs
* Reports

---

### Refine RAG

**Description**

Builds the answer **incrementally**:

* First chunk → initial answer
* Next chunks → refine answer

```
Chunk 1 → Answer
Chunk 2 → Refine
Chunk 3 → Refine
```
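A plain-Python sketch of the refine loop, again assuming `llm` is any prompt-to-text callable and using illustrative prompts. The stub model numbers its drafts so the sequential hand-off is visible.

```python
from typing import Callable


def refine_answer(question: str, chunks: list[str], llm: Callable[[str], str]) -> str:
    if not chunks:
        raise ValueError("refine needs at least one chunk")
    # First chunk -> initial answer
    answer = llm(f"Answer the question from this text.\nQuestion: {question}\nText: {chunks[0]}")
    # Each later chunk -> one more sequential call that may revise the running answer
    for chunk in chunks[1:]:
        answer = llm(
            f"Question: {question}\nExisting answer: {answer}\n"
            f"Refine the existing answer with this new text, or keep it unchanged.\nText: {chunk}"
        )
    return answer


calls = []


def fake_llm(prompt: str) -> str:
    # Stand-in model: records each prompt and returns a numbered draft
    calls.append(prompt)
    return f"answer v{len(calls)}"


print(refine_answer("Summarize the report", ["part 1", "part 2", "part 3"], fake_llm))
```

N chunks mean N strictly sequential LLM calls that cannot be parallelized, which is where Refine's extra cost comes from.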

---

**Demonstration**

```python
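from langchain.chains import RetrievalQA

# "refine": answer from the first chunk, then revise it once per remaining chunk
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="refine"
)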
```

---

**When to Use**

* Ordered documents
* High-accuracy summaries

---

### Conversational RAG

**Description**

Adds **chat memory**:

* Rewrites follow-up questions
* Retrieves context-aware results

```
Chat History → Question Condenser → Retrieve → Answer
```
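The question condenser is what makes a follow-up like "And after that?" retrievable. A framework-free sketch, assuming `llm` is any prompt-to-text callable; the prompt wording and the `(role, text)` history format are illustrative choices, not LangChain's exact condense prompt.

```python
from typing import Callable

CONDENSE_TEMPLATE = (
    "Given the conversation and a follow-up question, rephrase the follow-up "
    "as a standalone question.\n\nChat history:\n{history}\n\n"
    "Follow-up question: {question}\nStandalone question:"
)


def condense_question(history: list[tuple[str, str]], question: str,
                      llm: Callable[[str], str]) -> str:
    # No history: the question is already standalone, so skip the extra LLM call
    if not history:
        return question
    lines = "\n".join(f"{role}: {text}" for role, text in history)
    return llm(CONDENSE_TEMPLATE.format(history=lines, question=question))


history = [
    ("user", "How does ticket escalation work?"),
    ("assistant", "Tickets move to tier 2 after 4 hours without a response."),
]
prompts = []


def fake_llm(prompt: str) -> str:
    # Stand-in model: records the prompt and returns a fixed rewrite
    prompts.append(prompt)
    return "What happens to a ticket after it reaches tier 2?"


print(condense_question(history, "And after that?", fake_llm))
```

The standalone question, not the raw follow-up, is what gets sent to the retriever.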

---

**Demonstration**

```python
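from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# Stores the running chat history under the key the chain expects
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

conv_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory
)
conv_chain.invoke({"question": "How does ticket escalation work?"})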
```

---

**When to Use**

* Chatbots
* Multi-turn Q&A

---

### Hybrid RAG

**Description**

Combines:

* **Keyword search (BM25)**
* **Vector search**

```
BM25 + Vector → Merge → Rerank → LLM
```
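LangChain's `EnsembleRetriever` merges its retrievers' result lists with weighted Reciprocal Rank Fusion (RRF). A dependency-free sketch of that merge step, assuming documents are identified by plain strings and using the commonly cited constant `c = 60`; the document names are made up.

```python
from collections import defaultdict


def weighted_rrf(rankings: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs with weighted Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked docs contribute more; c damps the gap between the top ranks
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_hits = ["ERR-4012 runbook", "escalation policy", "VPN guide"]
vector_hits = ["escalation policy", "on-call rota", "ERR-4012 runbook"]
print(weighted_rrf([bm25_hits, vector_hits], weights=[0.4, 0.6]))
```

Fusing ranks rather than raw scores means BM25 scores and cosine similarities never need to be put on a common scale.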

---

**Demonstration**

```python
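from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # requires the `rank_bm25` package

bm25 = BM25Retriever.from_documents(docs, k=5)  # keyword search over your chunked `docs`
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Results are merged by weighted rank fusion: 40% BM25, 60% vector
hybrid_retriever = EnsembleRetriever(retrievers=[bm25, vector_retriever], weights=[0.4, 0.6])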
```

---

**When to Use**

* Enterprise RAG
* Error codes, IDs, logs

---

### Reranked RAG (Production Standard)

**Description**

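Adds **precision** by over-retrieving candidates and re-scoring each (query, chunk) pair with a cross-encoder reranker, which is slower but more accurate than embedding similarity alone.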

```
Retrieve (Top 20) → Rerank → Top 5 → LLM
```
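The rerank step itself is simple: score every (query, candidate) pair, sort, keep the top `n`. In this sketch a toy word-overlap function stands in for the cross-encoder (an assumption for illustration; a real cross-encoder is a model that reads the query and document together).

```python
from typing import Callable


def rerank(query: str, candidates: list[str],
           score: Callable[[str, str], float], top_n: int = 5) -> list[str]:
    # A cross-encoder scores each (query, doc) pair jointly; `score` plays that role here
    scored = [(score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def overlap_score(query: str, doc: str) -> float:
    # Toy stand-in for a cross-encoder: fraction of query words present in the doc
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q) if q else 0.0


candidates = [
    "vpn setup guide",
    "ticket escalation policy for tier 2",
    "how ticket escalation works",
    "holiday calendar",
]
print(rerank("how does ticket escalation work", candidates, overlap_score, top_n=2))
```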

---

**Demonstration**

```python
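from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

# Over-retrieve 20 candidates, then keep the 5 best after reranking
retriever = vectorstore.as_retriever(search_kwargs={"k": 20})

reranker = CrossEncoderReranker(
    model=HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base"),
    top_n=5
)

compression_retriever = ContextualCompressionRetriever(
    base_retriever=retriever,
    base_compressor=reranker
)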
```

---

**When to Use**

* High-accuracy systems
* IT support, legal, finance

---

### MMR-based RAG (Diversity-Aware)

**Description**

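Uses **Maximal Marginal Relevance** to reduce redundancy: each next chunk is chosen to be relevant to the query but dissimilar to the chunks already selected.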

```
Retrieve → MMR → Diverse Context → LLM
```
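MMR selects chunks one at a time, each time maximizing `λ · relevance − (1 − λ) · redundancy`, where redundancy is the similarity to the closest chunk already picked. A dependency-free sketch over precomputed vectors; the candidate list plays the role of the `fetch_k` pool, `lambda_mult` is λ, and the 2-D vectors are toy values.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec: list[float], doc_vecs: list[list[float]], k: int = 5,
        lambda_mult: float = 0.5) -> list[int]:
    """Return indices of up to k documents chosen by Maximal Marginal Relevance."""
    selected: list[int] = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest already-selected document
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.11], [0.8, -0.6]]  # docs 0 and 1 are near-duplicates
print(mmr(query, docs, k=2, lambda_mult=0.5))
```

With `lambda_mult=1.0` it degenerates to plain similarity ranking and keeps the near-duplicate; at 0.5 it swaps the duplicate for a different but still relevant chunk.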

---

**Demonstration**

```python
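retriever = vectorstore.as_retriever(
    search_type="mmr",
    # fetch_k: candidates fetched by similarity; k: diverse chunks kept from them
    # lambda_mult: 1 = pure relevance, 0 = maximum diversity
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.6}
)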
```

---

**When to Use**

* Repetitive documents
* Data with heavy chunk overlap

---

### Agentic RAG

**Description**

LLM **decides how and when to retrieve**:

* Iterative retrieval
* Tool usage
* Multi-step reasoning

```
Reason → Retrieve → Reason → Retrieve → Answer
```
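The Reason → Retrieve loop can be sketched without a framework. Assumptions: `llm` and `search` are plain callables, and the `SEARCH:` / `ANSWER:` reply format is a hypothetical convention for this sketch (production agents usually rely on the model's native tool calling instead). `max_steps` caps the loop so it always terminates.

```python
from typing import Callable


def agentic_answer(question: str, llm: Callable[[str], str],
                   search: Callable[[str], list[str]], max_steps: int = 4) -> str:
    """ReAct-style loop: the model either requests a search or gives a final answer."""
    notes: list[str] = []
    for _ in range(max_steps):
        reply = llm(f"Question: {question}\nNotes so far:\n" + "\n".join(notes)
                    + "\nReply 'SEARCH: <query>' or 'ANSWER: <answer>'.")
        if reply.startswith("ANSWER:"):
            return reply.removeprefix("ANSWER:").strip()
        if reply.startswith("SEARCH:"):
            query = reply.removeprefix("SEARCH:").strip()
            # Retrieve, then feed the results back in for the next reasoning step
            notes.extend(search(query))
    return "No answer within the step budget."


# Scripted stand-ins for a real model and knowledge base (hypothetical content)
script = iter([
    "SEARCH: escalation policy",
    "SEARCH: on-call rota",
    "ANSWER: Tier 2 after 4 hours, then the on-call engineer.",
])
kb = {"escalation policy": ["Tickets go to tier 2 after 4 hours."],
      "on-call rota": ["Tier 2 pages the on-call engineer."]}
print(agentic_answer("Who handles a stuck ticket?", lambda p: next(script), lambda q: kb.get(q, [])))
```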

---

**Demonstration (Conceptual)**

```python
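from langchain.tools.retriever import create_retriever_tool
from langgraph.prebuilt import create_react_agent

# Expose the retriever as a tool (name is illustrative); the agent decides when to call it
retriever_tool = create_retriever_tool(retriever, "search_kb", "Search the support knowledge base.")
agent = create_react_agent(model=llm, tools=[retriever_tool])
agent.invoke({"messages": [("user", "How does ticket escalation work?")]})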
```

---

**When to Use**

* Complex queries
* Multi-hop reasoning

---

### Contextual / Adaptive RAG

**Description**

Retrieval strategy changes dynamically:

* Based on query type
* Based on confidence
* Based on token budget

```
Simple query → Small RAG
Complex query → Hybrid + Rerank
```

---

**Demonstration (Conceptual LCEL)**

```python
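from langchain_core.runnables import RunnableLambda
def route(query):  # query length as a crude proxy for complexity
    return hybrid_chain if len(query) > 50 else simple_chain
adaptive_chain = RunnableLambda(route)  # LCEL invokes the returned chain with the same input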
```
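The one-line router above looks only at query length. A slightly fuller sketch that checks all three signals from the list (query type, confidence, token budget); every threshold, the ID regex, and the strategy names are illustrative assumptions, not tuned values. `top_score` is assumed to be the best similarity score from a cheap first retrieval pass.

```python
import re
from typing import Optional


def choose_strategy(query: str, top_score: Optional[float] = None,
                    token_budget: int = 4000) -> str:
    """Pick a retrieval strategy name from cheap, illustrative heuristics."""
    # Token budget: with little room, stay with a small top-k context
    if token_budget < 1000:
        return "simple"
    # Query type: IDs / error codes (e.g. ERR-4012) favor keyword + vector retrieval
    if re.search(r"\b[A-Z]{2,}-\d+\b", query):
        return "hybrid_rerank"
    # Confidence: a weak best match from the simple retriever triggers escalation
    if top_score is not None and top_score < 0.5:
        return "hybrid_rerank"
    # Complexity: long or multi-part questions get the heavier pipeline
    if len(query) > 50 or query.count("?") > 1:
        return "hybrid_rerank"
    return "simple"


for q in ["Reset my password", "What does ERR-4012 mean?"]:
    print(q, "->", choose_strategy(q))
```

The returned name would then select the chain, e.g. a dict mapping `"simple"` and `"hybrid_rerank"` to `simple_chain` and `hybrid_chain`.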

---

### RAG Type Comparison

| RAG Type       | Accuracy  | Cost      | Complexity |
| -------------- | --------- | --------- | ---------- |
| Naive          | Low       | Low       | Low        |
| Stuff          | Low       | Low       | Low        |
| MapReduce      | Medium    | Medium    | Medium     |
| Refine         | High      | High      | Medium     |
| Conversational | Medium    | Medium    | Medium     |
| Hybrid         | High      | Medium    | High       |
| Reranked       | Very High | High      | High       |
| Agentic        | Very High | Very High | Very High  |

---

### Production-Grade RAG (Recommended Stack)

```
Hybrid Retrieval
   ↓
MMR
   ↓
Reranker
   ↓
Context Compression
   ↓
LLM
   ↓
Source Attribution
```

---

### Interview-Ready Summary

> “RAG systems range from naive stuffing approaches to advanced hybrid, reranked, and agentic architectures. Production-grade RAG typically combines hybrid retrieval, reranking, MMR, and strict context control to maximize accuracy and minimize hallucinations.”

---

### Rule of Thumb

* **Prototype → Naive / Stuff RAG**
* **Large docs → MapReduce / Refine**
* **Chat → Conversational RAG**
* **Enterprise → Hybrid + Rerank**
* **Complex reasoning → Agentic RAG**

