# Level 2 - Week 2 - 02 Chunking and Idempotent Ingestion

**Estimated time:** 60-90 minutes

## Learning Objectives

- Implement chunking with overlap
- Create stable chunk IDs
- Avoid duplicate ingestion


## Overview

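Chunk size and overlap shape retrieval quality. Chunks that are too large tend to dilute the match, and chunks that are too small lose surrounding context. Overlap puts text near a boundary into both neighboring chunks, so short passages that straddle a boundary stay intact in at least one of them.
Stable chunk IDs make ingestion idempotent: re-ingesting the same document writes to the same IDs instead of creating duplicate entries.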

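To see why stable IDs matter, here is a minimal sketch that assumes a hypothetical `InMemoryStore` whose `upsert` overwrites by key. Many vector databases offer an upsert-by-ID operation, but check your store's actual API. Running `ingest` twice on the same document leaves three records, where an append-only store would hold six.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a vector store: at most one record per chunk_id."""
    records: dict[str, str] = field(default_factory=dict)

    def upsert(self, chunk_id: str, text: str) -> None:
        # Insert or overwrite: writing the same chunk_id twice leaves one record.
        self.records[chunk_id] = text


def ingest(store: InMemoryStore, doc_id: str, chunks: list[str]) -> None:
    # doc_id + zero-padded position: the same input always yields the same IDs.
    for index, text in enumerate(chunks):
        store.upsert(f"{doc_id}#{index:05d}", text)


store = InMemoryStore()
chunks = ["alpha", "beta", "gamma"]
ingest(store, "doc-1", chunks)
ingest(store, "doc-1", chunks)  # simulated re-ingestion of the same document
print(len(store.records))  # 3
```
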
## Practice Steps

- Implement a fixed-size chunker.
- Choose a stable chunk_id strategy.


### Sample code

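A fixed-size chunker with overlap. Sizes are measured in characters, and each window starts `chunk_size - overlap` characters after the previous one, so consecutive chunks share `overlap` characters.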


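```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    start: int  # offset of the first character in the source text
    end: int    # exclusive end offset, so text == source[start:end]


def chunk_text(text: str, chunk_size: int = 1200, overlap: int = 200) -> list[Chunk]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    # overlap < chunk_size guarantees that every step moves the window forward.
    if overlap < 0 or overlap >= chunk_size:
        raise ValueError('overlap must be >=0 and < chunk_size')
    chunks = []
    i = 0
    n = len(text)
    while i < n:
        j = min(i + chunk_size, n)  # the last window may be shorter
        chunks.append(Chunk(text=text[i:j], start=i, end=j))
        if j == n:
            break
        i = j - overlap  # step back so the next chunk repeats the last `overlap` characters
    return chunks
```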

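Because each window advances by `chunk_size - overlap` characters, you can predict how many chunks `chunk_text` will emit (and so roughly how many embeddings you will pay for) before running it. The helper `expected_num_chunks` below is a hypothetical sketch that mirrors the loop above using integer arithmetic.

```python
def expected_num_chunks(n: int, chunk_size: int, overlap: int) -> int:
    """Number of chunks chunk_text() produces for a text of length n."""
    if overlap < 0 or overlap >= chunk_size:
        raise ValueError("overlap must be >=0 and < chunk_size")
    if n == 0:
        return 0
    if n <= chunk_size:
        return 1
    stride = chunk_size - overlap  # how far each window advances
    # Window k starts at k * stride; the loop stops at the first window that reaches n,
    # i.e. the smallest k with k * stride + chunk_size >= n (integer ceiling division).
    return (n - chunk_size + stride - 1) // stride + 1


print(expected_num_chunks(10_000, 1200, 200))  # 10
```

With the defaults, a 10,000-character document yields 10 chunks holding about 11,800 characters in total, so overlap adds roughly 18% to what gets embedded and stored.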

### Student fill-in

Add a `chunk_id` strategy: either a hash of the chunk content or `doc_id` plus the chunk index.


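```python
def chunk_id_hash(text: str) -> str:
    # TODO: implement a hash-based chunk_id derived deterministically from the chunk text.
    # Consider: identical text in two different documents would map to the same ID.
    return 'todo'


def chunk_id_index(doc_id: str, index: int) -> str:
    # Zero-padded index keeps IDs sortable, e.g. "report.pdf#00003".
    return f"{doc_id}#{index:05d}"
```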

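Stable IDs handle repeated ingestion of an unchanged document, but when a document is edited or shortened, IDs from the old version can linger in the store. A minimal sketch, assuming you can list the chunk IDs already stored for a `doc_id` (the `reconcile` helper is hypothetical, not part of any library): compare them with the freshly generated IDs and delete the leftovers.

```python
def reconcile(existing_ids: set[str], new_ids: list[str]) -> dict[str, list[str]]:
    """Compare one document's stored chunk IDs with the IDs from a fresh chunking run."""
    new_set = set(new_ids)
    return {
        # IDs the store has never seen: must be embedded and inserted.
        "add": [cid for cid in new_ids if cid not in existing_ids],
        # IDs present in both runs.
        "keep": [cid for cid in new_ids if cid in existing_ids],
        # IDs left over from the old version: delete them to avoid stale chunks.
        "delete": sorted(existing_ids - new_set),
    }


stored = {"doc-1#00000", "doc-1#00001", "doc-1#00002"}
fresh = ["doc-1#00000", "doc-1#00001"]  # the document got shorter
print(reconcile(stored, fresh))
# {'add': [], 'keep': ['doc-1#00000', 'doc-1#00001'], 'delete': ['doc-1#00002']}
```

With index-based IDs, a "keep" ID may now carry different text, so it still needs to be overwritten. With IDs derived from the chunk content, a kept ID implies unchanged text, so re-embedding it can be skipped.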

## Self-check

- Are chunk IDs stable across re-ingestion?
- Is overlap configured and documented?
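
The first question can be partly automated. A minimal sketch, with a hypothetical `id_problems` helper that takes a zero-argument function producing the chunk IDs for one fixed input: it runs the function twice and reports IDs that change between runs or repeat within a run.

```python
import uuid
from collections import Counter
from typing import Callable


def id_problems(make_ids: Callable[[], list[str]]) -> list[str]:
    """Return human-readable problems with the chunk IDs produced by make_ids()."""
    first, second = make_ids(), make_ids()
    problems = []
    # Stability: the same input must yield the same IDs on every run.
    if first != second:
        problems.append("chunk IDs differ between two runs on the same input")
    # Uniqueness: two chunks sharing an ID would overwrite each other on upsert.
    dupes = sorted(cid for cid, count in Counter(first).items() if count > 1)
    if dupes:
        problems.append(f"duplicate chunk IDs: {dupes}")
    return problems


print(id_problems(lambda: [f"doc-1#{k:05d}" for k in range(3)]))  # []
print(id_problems(lambda: [str(uuid.uuid4()) for _ in range(3)]))  # reports unstable IDs
print(id_problems(lambda: ["same", "same"]))  # reports the duplicate
```

Both runs happen in the same process, so this check will not catch IDs built with Python's built-in `hash()`, which is randomized per process for `str` values; a `hashlib` digest stays the same across runs.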
