```{contents}
```
## Chunk Overlap 


**Chunk overlap** is the practice of **repeating a small portion of text between consecutive chunks** when splitting documents.

> It ensures that **context at chunk boundaries is not lost** during embedding and retrieval.

Chunk overlap is applied by **Text Splitters**, not by retrievers or LLMs.

---

### Why Chunk Overlap Is Necessary

Without overlap:

* Sentences get cut in half
* Important references are split
* Retrieval misses relevant context
* Answers become incomplete or wrong

Overlap preserves **semantic continuity** across chunks.

---

### Conceptual Example

### Original Text

```
LangChain is a framework for building LLM-powered applications.
It supports prompts, chains, agents, and retrieval.
```

---

### Without Overlap

```
Chunk 1: LangChain is a framework for building LLM-powered
Chunk 2: applications. It supports prompts, chains, agents, and retrieval.
```

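Meaning is fragmented: the phrase "LLM-powered applications" is split across the boundary, so neither chunk contains it whole.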

---

### With Overlap (Overlap = 3 words)

```
Chunk 1: LangChain is a framework for building LLM-powered applications
Chunk 2: building LLM-powered applications. It supports prompts, chains, agents, and retrieval
```

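Context is preserved: the repeated words "building LLM-powered applications" appear in both chunks, so the phrase survives intact.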
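
A minimal sketch of the word-level splitting shown above. `chunk_words` is a hypothetical helper, not a LangChain API, and it counts whitespace-separated words, whereas real splitters count characters or tokens. Because the windows here have a fixed width, the sample text produces three chunks rather than two.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of `chunk_size` words, repeating `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = (
    "LangChain is a framework for building LLM-powered applications. "
    "It supports prompts, chains, agents, and retrieval."
)

for i, chunk in enumerate(chunk_words(text, chunk_size=8, overlap=3), start=1):
    print(f"Chunk {i}: {chunk}")
```

The last three words of each chunk are the first three words of the next one.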

---

### Where Chunk Overlap Fits in the Pipeline

```
Document Loader
   ↓
Text Splitter (chunk_size + chunk_overlap)
   ↓
Chunks
   ↓
Embeddings
   ↓
Vector Store
```

Chunk overlap is an **ingestion-time decision**.

---

### Chunk Size vs Chunk Overlap

### Chunk Size

* Maximum length of each chunk

### Chunk Overlap

* Portion of text reused in the next chunk

Example:

```text
chunk_size = 500
chunk_overlap = 50
```

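Each new chunk starts up to **50 units before** the previous chunk ends. The unit is whatever the splitter measures length in: characters by default for `RecursiveCharacterTextSplitter`, tokens for `TokenTextSplitter`.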
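
The arithmetic behind these two settings can be sketched for an idealized fixed-width splitter, where each chunk starts `chunk_size - chunk_overlap` units after the previous one. `chunk_spans` is a hypothetical helper used only for illustration; real splitters such as `RecursiveCharacterTextSplitter` also respect separators, so their boundaries will differ.

```python
def chunk_spans(text_len: int, chunk_size: int, chunk_overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) offsets of fixed-width chunks with overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - chunk_overlap  # distance between consecutive chunk starts
    spans = []
    start = 0
    while start < text_len:
        end = min(start + chunk_size, text_len)
        spans.append((start, end))
        if end == text_len:
            break  # the final chunk reached the end of the text
        start += stride
    return spans


print(chunk_spans(1200, chunk_size=500, chunk_overlap=50))
# [(0, 500), (450, 950), (900, 1200)]
```

Each start offset is 450 units after the previous one, so the last 50 units of a chunk are repeated at the start of the next.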

---

### Demonstration (LangChain)



```python
# RecursiveCharacterTextSplitter is also exported by the langchain_text_splitters package.
from langchain_classic.text_splitter import RecursiveCharacterTextSplitter

# Chunks hold at most 100 characters; up to 20 characters repeat between neighbours.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20
)

long_text = """
Large language models (LLMs) are very large deep learning models that are pre-trained on vast amounts of data.
The underlying transformer is a set of neural networks that consist of an encoder and a decoder with self-attention capabilities.
The encoder and decoder extract meanings from a sequence of text and understand the relationships between words and phrases in it.

Transformer LLMs are capable of unsupervised training, although a more precise explanation is that transformers perform self-learning.
It is through this process that transformers learn to understand basic grammar, languages, and knowledge.

Unlike earlier recurrent neural networks (RNN) that sequentially process inputs, transformers process entire sequences in parallel.
This allows the data scientists to use GPUs for training transformer-based LLMs, significantly reducing the training time.
"""

# split_text returns a list of strings, one per chunk.
chunks = splitter.split_text(long_text)

for i, chunk in enumerate(chunks):
    print(f"Chunk {i}:\n{chunk}\n")
```



Output:

```text
Chunk 0:
Large language models (LLMs) are very large deep learning models that are pre-trained on vast

Chunk 1:
pre-trained on vast amounts of data.

Chunk 2:
The underlying transformer is a set of neural networks that consist of an encoder and a decoder

Chunk 3:
and a decoder with self-attention capabilities.

Chunk 4:
The encoder and decoder extract meanings from a sequence of text and understand the relationships

Chunk 5:
the relationships between words and phrases in it.

Chunk 6:
Transformer LLMs are capable of unsupervised training, although a more precise explanation is that

Chunk 7:
explanation is that transformers perform self-learning.

Chunk 8:
It is through this process that transformers learn to understand basic grammar, languages, and

Chunk 9:
languages, and knowledge.

Chunk 10:
Unlike earlier recurrent neural networks (RNN) that sequentially process inputs, transformers

Chunk 11:
transformers process entire sequences in parallel.

Chunk 12:
This allows the data s
```



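You will observe:

* Where a line is too long for one chunk, the end of Chunk N reappears at the start of Chunk N+1 (for example, `pre-trained on vast` in Chunks 0 and 1).
* The repeated text is at most 20 characters and falls on word boundaries, so it is often shorter than `chunk_overlap`.
* A chunk that begins on a new line (for example, Chunk 2) shares no text with the previous chunk, because the splitter broke at a natural separator.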
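
To check the repeated text rather than eyeball it, the shared part of two neighbouring chunks can be measured directly. This is a small sketch using only the standard library; `shared_overlap` is a hypothetical helper, and the sample strings are the first three chunks copied from the output above.

```python
def shared_overlap(prev: str, nxt: str) -> str:
    """Return the longest suffix of `prev` that is also a prefix of `nxt`."""
    for size in range(min(len(prev), len(nxt)), 0, -1):
        if prev.endswith(nxt[:size]):
            return nxt[:size]
    return ""


sample_chunks = [
    "Large language models (LLMs) are very large deep learning models that are pre-trained on vast",
    "pre-trained on vast amounts of data.",
    "The underlying transformer is a set of neural networks that consist of an encoder and a decoder",
]

for prev, nxt in zip(sample_chunks, sample_chunks[1:]):
    overlap = shared_overlap(prev, nxt)
    print(f"{len(overlap):>2} chars shared: {overlap!r}")
```

The first pair shares 19 characters (`pre-trained on vast`); the second pair shares none, because the second chunk ends at a line break.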

---

### How Overlap Improves Retrieval

### Without Overlap

* Query matches partial concept
* Chunk score too low
* Not retrieved

### With Overlap

* Full concept appears in at least one chunk
* Higher embedding similarity
* Better recall
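
The effect can be illustrated with a toy check, under two loud assumptions: chunks are cut at fixed character offsets, and a literal substring match stands in for embedding similarity. `fixed_chunks` and `chunks_containing` are hypothetical helpers, not LangChain functions.

```python
def fixed_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-width character chunks with the given overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), stride)]


def chunks_containing(phrase: str, chunks: list[str]) -> list[int]:
    """Return the indexes of chunks that contain the whole phrase."""
    return [i for i, chunk in enumerate(chunks) if phrase in chunk]


text = (
    "LangChain is a framework for building LLM-powered applications. "
    "It supports prompts, chains, agents, and retrieval."
)
phrase = "applications"

no_overlap = fixed_chunks(text, chunk_size=60, chunk_overlap=0)
with_overlap = fixed_chunks(text, chunk_size=60, chunk_overlap=12)

print("without overlap:", chunks_containing(phrase, no_overlap))
print("with overlap:   ", chunks_containing(phrase, with_overlap))
```

With these inputs the word is cut at the boundary when the overlap is 0 and appears whole in the second chunk when the overlap is 12. Overlap only guarantees this for spans no longer than the overlap itself.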

---

### Chunk Overlap in RAG Answers

Overlap helps when:

* Definitions span multiple sentences
* Pronouns reference earlier context
* Steps are explained across paragraphs
* Code blocks cross chunk boundaries

---

### Choosing the Right Overlap Size

### General Rule

```
chunk_overlap ≈ 10–20% of chunk_size
```

---

### Recommended Values

| Chunk Size | Overlap |
| ---------- | ------- |
| 300        | 30–50   |
| 500        | 50–80   |
| 800        | 80–120  |
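
The general rule can be written as a small helper. `suggest_overlap` is a hypothetical function, not part of LangChain, and the 10–20% band is the rule of thumb from this section rather than a library default.

```python
def suggest_overlap(chunk_size: int, ratio: float = 0.15) -> int:
    """Return an overlap equal to `ratio` of `chunk_size`, rounded to an integer."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= ratio < 1:
        raise ValueError("ratio must be in the range [0, 1)")
    return round(chunk_size * ratio)


for size in (300, 500, 800):
    low = suggest_overlap(size, 0.10)
    high = suggest_overlap(size, 0.20)
    print(f"chunk_size={size}: overlap between {low} and {high}")
```

The recommended values in the table sit in the lower part of that band, which keeps duplication down for larger chunks.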

---

### Overlap: Character vs Token

#### Character-Based Overlap

* Simpler
* Language-agnostic
* Only approximates token counts, so it is imprecise against model token limits

#### Token-Based Overlap

* Precise for model limits
* More expensive, because the text must be tokenized while splitting
* Used with `TokenTextSplitter`

---

### Too Little Overlap (Problems)

* Broken sentences
* Lost references
* Lower recall
* Incomplete answers

---

### Too Much Overlap (Problems)

* Duplicate embeddings
* Higher storage cost
* Redundant retrieval
* Slower indexing
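
The storage cost can be estimated before indexing. This is a rough sketch that assumes idealized fixed-width chunks; `overlap_overhead` is a hypothetical helper, and real splitters that break on separators will give somewhat different numbers.

```python
import math


def overlap_overhead(text_len: int, chunk_size: int, chunk_overlap: int) -> tuple[int, float]:
    """Return (chunk count, stored length / original length) for fixed-width chunks."""
    if text_len <= 0 or chunk_size <= 0:
        raise ValueError("text_len and chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if text_len <= chunk_size:
        return 1, 1.0  # a single chunk, nothing is duplicated
    stride = chunk_size - chunk_overlap
    n_chunks = 1 + math.ceil((text_len - chunk_size) / stride)
    # Every chunk after the first stores `chunk_overlap` units a second time.
    stored = text_len + (n_chunks - 1) * chunk_overlap
    return n_chunks, stored / text_len


for overlap in (0, 50, 250):
    n_chunks, ratio = overlap_overhead(100_000, chunk_size=500, chunk_overlap=overlap)
    print(f"overlap={overlap}: {n_chunks} chunks, {ratio:.2f}x the original text")
```

For a 100,000-character document with 500-character chunks, a 10% overlap stores about 11% more text, while a 50% overlap roughly doubles both the stored text and the number of embeddings.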

---

### Common Mistakes

#### Overlap = 0

❌ Almost always bad for RAG

#### Overlap > 50%

❌ Wasteful and noisy

#### Changing overlap at query time

❌ Overlap is fixed at ingestion time; changing it means re-splitting and re-embedding the documents
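
The first two mistakes above can be caught with a simple check before ingestion. `overlap_warnings` is a hypothetical helper, and its thresholds are taken from this list rather than from any library.

```python
def overlap_warnings(chunk_size: int, chunk_overlap: int) -> list[str]:
    """Flag overlap settings that match the common mistakes listed above."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    warnings = []
    if chunk_overlap == 0:
        warnings.append("overlap is 0: text at chunk boundaries is not repeated")
    elif chunk_overlap > chunk_size / 2:
        warnings.append("overlap exceeds 50% of chunk_size: chunks are mostly duplicate text")
    return warnings


print(overlap_warnings(500, 0))    # one warning about zero overlap
print(overlap_warnings(500, 300))  # one warning about excessive overlap
print(overlap_warnings(500, 75))   # no warnings
```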

---

### Chunk Overlap vs Sliding Window

| Concept | Chunk Overlap       | Sliding Window        |
| ------- | ------------------- | --------------------- |
| Purpose | Preserve boundaries | Continuous context    |
| Cost    | Low                 | High                  |
| Usage   | RAG ingestion       | Time-series / streams |

---

### Best Practices

* Always use some overlap
* Tune overlap per document type
* Increase overlap for dense text
* Reduce overlap for repetitive text
* Keep overlap stable once indexed

---

### Interview-Ready Summary

> “Chunk overlap is the intentional repetition of text between adjacent chunks to preserve semantic continuity. It improves retrieval recall and answer quality in RAG systems and is configured at ingestion time via text splitters.”

---

### Rule of Thumb

* **No overlap → broken context**
* **10–20% overlap → optimal**
* **More overlap → more cost**
* **Tune once, index once**
