# Chunk Windowing

Once you've matched the top n chunks most similar to a query, you could just put each chunk into context by itself. This works OK, but can we do better? You could instead put the entire source document for each chunk into context. That beats supplying the chunk in isolation, and it works well when the documents are consistently sized and relatively small. But what if the corpus contains really large documents? You could end up sending a lot of irrelevant context to the LLM. That might be tolerable, but it adds latency and cost. If you're looking for a middle ground, try chunk windowing.

<!-- more -->

Chunk windowing is pretty straightforward. Choose a window size, and don't overthink it: the goal is a middle ground between sending too much context (the whole document) and too little (a single chunk). Four is a good default. Fetch the two chunks before the matched chunk and the one after it, for a total of four chunks. If the match sits near the start or end of the document, shift the window so it stays in bounds. Concatenate the chunks and provide them together in context. It's that simple.

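```python
def expand_slice(chunks, match_index, n: int = 4):
    """Return a window of up to n chunks around chunks[match_index]."""
    half_window = n // 2
    # Start half a window before the match, clamped to the document start.
    start = max(0, match_index - half_window)
    end = min(len(chunks), start + n)
    # If the window reaches the end of the document, slide it back to keep n chunks.
    if end == len(chunks):
        start = max(0, end - n)
    print(f"start: {start}, end: {end}")
    return chunks[start:end]


document_chunks = ["A", "B", "C", "D", "E", "F", "G", "H"]
```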

```python
expand_slice(document_chunks, 1)
```

```
start: 0, end: 4
['A', 'B', 'C', 'D']
```

```python
expand_slice(document_chunks, 4)
```

```
start: 2, end: 6
['C', 'D', 'E', 'F']
```

```python
expand_slice(document_chunks, 7)
```

```
start: 4, end: 8
['E', 'F', 'G', 'H']
```
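
The examples above return the window as a list. The last step, concatenating the window into a single piece of context, might look something like the sketch below. The names `window_bounds` and `build_context` and the `separator` parameter are hypothetical, chosen for illustration, and the blank-line separator is just an assumption about how you want chunks delimited in the prompt.

```python
def window_bounds(num_chunks: int, match_index: int, n: int = 4) -> tuple[int, int]:
    """Return (start, end) slice bounds for a window of up to n chunks."""
    start = max(0, match_index - n // 2)
    end = min(num_chunks, start + n)
    # If the window was clipped at the end, slide it back to keep n chunks.
    start = max(0, end - n)
    return start, end


def build_context(
    chunks: list[str], match_index: int, n: int = 4, separator: str = "\n\n"
) -> str:
    """Join the windowed chunks into one string (separator is an assumption)."""
    start, end = window_bounds(len(chunks), match_index, n)
    return separator.join(chunks[start:end])


document_chunks = ["A", "B", "C", "D", "E", "F", "G", "H"]
print(build_context(document_chunks, 4, separator=" "))  # C D E F
```

Keeping the chunks in document order means the LLM sees the text as it originally read, rather than in similarity-ranked order.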