```{contents}
```

## MapReduce Chain

### What a MapReduce Chain Is

A **MapReduce Chain** is a LangChain abstraction designed to **process large inputs by splitting them into chunks**, handling each chunk independently (**Map phase**), and then **combining the partial results into a final answer** (**Reduce phase**).

> It is inspired by the classic **MapReduce** pattern used in distributed systems.

This chain is commonly used for **summarization, analysis, and aggregation** over large documents.

---

### Why MapReduce Chain Exists

LLMs have **context window limits**.
Large documents cannot be processed in a single prompt.

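MapReduce works around this limit. It:

* Scales to inputs much larger than a single context window
* Avoids context overflow, since each call sees only one chunk (or the partial results)
* Enables parallel processing, because chunks are independent of each other
* Improves reliability on long texts by keeping every prompt short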

---

### Conceptual Flow

```
Large Input
   ↓
Split into chunks
   ↓
Map: Process each chunk independently
   ↓
Reduce: Combine partial results
   ↓
Final Output
```
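
The flow above can be sketched in plain Python. This is an illustration only, not LangChain's implementation: `split_into_chunks` splits on word count (real splitters usually work on characters or tokens), and `fake_llm` is a hypothetical stand-in for a model call that keeps the first two words of each input line, so the sketch runs without an API key.

```python
# Pure-Python sketch of split -> map -> reduce.
# `fake_llm` is a hypothetical stand-in for a real model call.


def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def fake_llm(prompt: str) -> str:
    """Hypothetical model: keeps the first two words of each line after the instruction line."""
    body = prompt.split("\n", 1)[1]
    return " ".join(w for line in body.splitlines() if line.strip() for w in line.split()[:2])


def map_reduce(text: str, chunk_size: int = 8) -> str:
    chunks = split_into_chunks(text, chunk_size)               # Split
    partials = [fake_llm(f"Summarize:\n{c}") for c in chunks]  # Map: one call per chunk
    return fake_llm("Combine:\n" + "\n".join(partials))        # Reduce: one final call


text = " ".join(f"w{i}" for i in range(20))
print(map_reduce(text))  # → w0 w1 w8 w9 w16 w17
```

Every chunk contributes to the final output even though no single call ever saw the whole input, which is the property that lets MapReduce scale past the context window.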

---

### Core Phases Explained

#### Map Phase

* Each chunk is sent to the LLM independently
* Produces partial outputs (e.g., summaries, insights)

Example:

```
Chunk 1 → Summary 1
Chunk 2 → Summary 2
Chunk 3 → Summary 3
```
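
Because each chunk is handled on its own, map calls do not have to wait for one another. A minimal sketch using Python's standard `concurrent.futures`; `summarize_chunk` is a hypothetical placeholder that keeps the first sentence instead of calling a model:

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_chunk(chunk: str) -> str:
    """Hypothetical stand-in for one map-phase LLM call: keep the first sentence."""
    return chunk.split(". ")[0].rstrip(".") + "."


chunks = [
    "AI is transforming industries. It learns from data.",
    "NLP lets machines understand language. It powers chatbots.",
    "Computer vision interprets images. It powers facial recognition.",
]

# Chunks are independent, so the map calls can run concurrently.
with ThreadPoolExecutor(max_workers=3) as executor:
    summaries = list(executor.map(summarize_chunk, chunks))

for i, s in enumerate(summaries, start=1):
    print(f"Chunk {i} → {s}")
```

`ThreadPoolExecutor.map` returns results in input order, so Summary *i* still corresponds to Chunk *i* even if calls finish out of order. Threads suit this workload because real LLM calls spend most of their time waiting on the network.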

---

#### Reduce Phase

* All partial outputs are combined
* LLM synthesizes a final result

Example:

```
[Summary 1, Summary 2, Summary 3] → Final Summary
```
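
The reduce step has a context limit of its own: with many chunks, the concatenated partial results may not fit into one prompt either. One way to handle this, sketched here under simplifying assumptions, is to reduce in rounds: combine the partials in small groups, then combine those results, until one final call remains. `fake_combine` is a hypothetical stand-in for a reduce-phase LLM call, and the limit is counted in items rather than tokens for simplicity.

```python
def fake_combine(summaries: list[str]) -> str:
    """Hypothetical stand-in for one reduce-phase LLM call; parentheses show grouping."""
    return f"({' + '.join(summaries)})"


def reduce_partials(partials: list[str], max_per_call: int = 3) -> str:
    """Combine partial results, reducing in rounds when there are too many for one call."""
    if max_per_call < 2:
        raise ValueError("max_per_call must be at least 2")
    while len(partials) > max_per_call:
        # Collapse: merge each group of `max_per_call` partials into one intermediate result
        partials = [
            fake_combine(partials[i:i + max_per_call])
            for i in range(0, len(partials), max_per_call)
        ]
    return fake_combine(partials)  # Final reduce call


print(reduce_partials(["S1", "S2", "S3"]))                   # → (S1 + S2 + S3)
print(reduce_partials([f"S{i}" for i in range(1, 8)]))       # → ((S1 + S2 + S3) + (S4 + S5 + S6) + (S7))
```

With seven partials and at most three per call, the sketch first collapses them into three intermediate results and then runs the final reduce.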

---

### Basic MapReduce Chain Demonstration

#### Step 1: Define Map Prompt



```python
from langchain_core.prompts import PromptTemplate

# Map prompt: applied to each chunk (Document) independently
map_prompt = PromptTemplate.from_template(
    "Summarize the following text:\n{text}"
)
```

---

#### Step 2: Define Reduce Prompt



```python
# Reduce prompt: applied once to the joined partial summaries
reduce_prompt = PromptTemplate.from_template(
    "Combine the following summaries into a concise final summary:\n{text}"
)
```




---

#### Step 3: Create the MapReduce Chain



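```python
from langchain_classic.chains.summarize import load_summarize_chain
from langchain_openai import ChatOpenAI

# Low temperature for more consistent summaries
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Legacy summarization helper: "map_reduce" applies map_prompt to each document,
# then combine_prompt to the joined partial summaries
chain = load_summarize_chain(
    llm=llm,
    chain_type="map_reduce",
    map_prompt=map_prompt,
    combine_prompt=reduce_prompt
)
```
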
---

#### Step 4: Run the Chain

```python
from langchain_core.documents import Document

# Create sample documents
documents = [
    Document(page_content="Artificial Intelligence is transforming industries worldwide. Machine learning algorithms enable computers to learn from data and improve their performance over time without explicit programming."),
    Document(page_content="Natural Language Processing allows machines to understand and generate human language. Applications include chatbots, translation services, and sentiment analysis."),
    Document(page_content="Computer vision enables machines to interpret and understand visual information from the world. It powers technologies like facial recognition, autonomous vehicles, and medical image analysis.")
]
```

```python
result = chain.invoke({"input_documents": documents})
print(result["output_text"])
```

Example output (exact wording varies between runs):

```
Artificial Intelligence is transforming industries through machine learning algorithms that enable computers to autonomously improve their performance. Key areas of AI include Natural Language Processing (NLP), which allows machines to understand and generate human language for applications like chatbots and translation services, and computer vision, which enables the interpretation of visual information for technologies such as facial recognition and autonomous vehicles.
```

Here:

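* `documents` is a list of `Document` objects, passed to the chain under the `input_documents` key
* Each `Document` is treated as one chunk in the map phase; the chain does not split text itself, so long inputs must be split into documents beforehand
* The final summary is returned under the `output_text` key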

---

### Internal Execution (Simplified)

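```python
# Simplified picture of the map_reduce chain; not LangChain's actual source.
# Reuses llm, map_prompt, reduce_prompt, and documents from the steps above.
map_outputs = []
for doc in documents:
    # Map: summarize each document on its own
    map_outputs.append(llm.invoke(map_prompt.format(text=doc.page_content)).content)

# Reduce: join the partial summaries and summarize them once more
final_output = llm.invoke(reduce_prompt.format(text="\n\n".join(map_outputs))).content
```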

---

### Common Use Cases

* Long document summarization
* Meeting transcript summarization
* Log analysis
* Report aggregation
* Multi-document RAG summarization

---

### MapReduce vs Stuff Chain

| Aspect       | MapReduce       | Stuff        |
| ------------ | --------------- | ------------ |
| Scalability  | High            | Low          |
| Context size | Small per chunk | Entire input |
| Parallelism  | Yes             | No           |
| Cost         | Higher          | Lower        |
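
The cost row follows from counting calls. As a rough, illustrative estimate (assuming one reduce call, no extra collapse rounds, and hypothetical document and chunk sizes), the number of LLM calls grows with the number of chunks:

```python
import math


def estimate_calls(input_tokens: int, chunk_tokens: int) -> dict[str, int]:
    """Rough LLM-call counts; assumes one reduce call and no collapse rounds."""
    num_chunks = math.ceil(input_tokens / chunk_tokens)
    return {
        "stuff": 1,                    # whole input in one call (only if it fits)
        "map_reduce": num_chunks + 1,  # one map call per chunk + one reduce call
    }


# Hypothetical sizes: a 100k-token document split into 4k-token chunks
print(estimate_calls(input_tokens=100_000, chunk_tokens=4_000))
# → {'stuff': 1, 'map_reduce': 26}
```

Token cost grows as well, because every map call repeats the prompt instructions and the reduce call reads all partial outputs. The Stuff chain's single call is only an option when the whole input fits in the context window.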

---

### MapReduce vs Refine Chain

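| Aspect     | MapReduce                               | Refine                                         |
| ---------- | --------------------------------------- | ---------------------------------------------- |
| Processing | Parallel                                | Sequential                                     |
| Latency    | Lower (parallel)                        | Higher                                         |
| Accuracy   | Medium (chunks summarized in isolation) | Often higher (running summary carried forward) |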
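
The first two rows come from data dependencies. In this sketch, `fake_llm` is a hypothetical stand-in that records each prompt and returns a numbered tag. The MapReduce calls over chunks do not depend on each other, whereas every Refine call needs the summary produced by the previous call.

```python
calls: list[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical model stand-in: records the prompt and returns a numbered tag."""
    calls.append(prompt)
    return f"<summary {len(calls)}>"


chunks = ["chunk A", "chunk B", "chunk C"]

# MapReduce: the three map calls are independent of each other...
partials = [fake_llm(f"Summarize: {c}") for c in chunks]
# ...and only the single reduce call depends on them.
mr_result = fake_llm("Combine: " + " | ".join(partials))

# Refine: each call needs the summary from the previous call.
summary = fake_llm(f"Summarize: {chunks[0]}")
for c in chunks[1:]:
    summary = fake_llm(f"Existing summary: {summary}\nRefine it with: {c}")

print(mr_result, summary, len(calls))  # → <summary 4> <summary 7> 7
```

MapReduce uses N + 1 calls (three maps and one reduce), and only the reduce has to wait. Refine uses N calls, but each one must wait for its predecessor, so its latency grows with the number of chunks.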

---

### Limitations of MapReduce Chain

* ❌ More LLM calls (higher cost)
* ❌ Loss of global context in map phase
* ❌ Legacy abstraction
* ❌ Less control than LCEL

---

### MapReduce Chain vs LCEL (Modern Approach)

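#### Legacy MapReduce

```python
load_summarize_chain(llm, chain_type="map_reduce")
```

#### LCEL Equivalent (Conceptual)

```python
# Conceptual: map_chain and reduce_chain are prompt | llm pipelines;
# join_summaries is a placeholder step that merges the partial outputs
map_chain.map() | join_summaries | reduce_chain
```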

LCEL allows:

* Custom aggregation
* Parallelism
* Streaming
* Better observability

---

### When to Use MapReduce Chain

* Very large documents
* Summarization tasks
* When context limits are hit
* Legacy LangChain codebases

---

### When NOT to Use It

* Small inputs
* High-precision reasoning
* Agent-based workflows
* New projects (prefer LCEL)

---

### Best Practices

* Use a low temperature (e.g., `temperature=0`) for more consistent output
* Keep map outputs concise
* Limit the number of chunks, since each one adds an LLM call
* Validate reduce output
* Monitor token usage

---

### Interview-Ready Summary

> “A MapReduce Chain in LangChain processes large inputs by splitting them into chunks, summarizing or analyzing each chunk independently, and then combining the results. It enables scalable LLM processing but is a legacy abstraction superseded by LCEL.”

---

### Rule of Thumb

* **Large documents → MapReduce**
* **Small documents → Stuff**
* **Incremental understanding → Refine**
* **New systems → LCEL-based pipelines**
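
The size-based part of this rule of thumb can be turned into a rough helper. The thresholds are assumptions for illustration: `reserve` stands for tokens held back for instructions and the answer, and `incremental` marks tasks where a running, order-aware understanding matters. The LCEL point is orthogonal, since any of these strategies can be built as an LCEL pipeline.

```python
def choose_strategy(
    input_tokens: int,
    context_window: int,
    reserve: int = 2_000,
    incremental: bool = False,
) -> str:
    """Rough heuristic following the rule of thumb; thresholds are assumptions."""
    if input_tokens + reserve <= context_window:
        return "stuff"       # small input: everything fits in one prompt
    if incremental:
        return "refine"      # large input where a running summary matters
    return "map_reduce"      # large input: process chunks independently, then combine


print(choose_strategy(input_tokens=8_000, context_window=128_000))    # → stuff
print(choose_strategy(input_tokens=400_000, context_window=128_000))  # → map_reduce
```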

