---
title: "Recursive"
icon: "arrows-spin"
---


<Warning>
This chain will not be released.

With context windows as large as they are today, it adds little and can be replaced by a simple map reduce.

It will likely matter only for a few specialty use cases, and in those cases users can probably optimize the graph on their own.
</Warning>
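For comparison, the single-pass map reduce mentioned in the warning can be sketched in plain Python. This is only an illustration: `map_reduce` and `fake_summarize` are hypothetical names, and `fake_summarize` stands in for a model call by truncating text so the example runs offline.

```python
from typing import Callable


def map_reduce(docs: list[str], summarize: Callable[[str], str]) -> str:
    """Summarize each document once (map), then summarize the joined summaries (reduce)."""
    partial_summaries = [summarize(doc) for doc in docs]  # map step
    return summarize("\n".join(partial_summaries))  # single reduce step


def fake_summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keep the first 20 characters."""
    return text[:20]


final = map_reduce(["alpha " * 10, "beta " * 10, "gamma " * 10], fake_summarize)
print(final)
```

The reduce step sees every partial summary at once, so this only works while the joined summaries fit in the model's context window.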



```mermaid
graph TD
    %% Level 0 - Original Docs
    A1[Doc1] --> B1[Sum1]
    A2[Doc2] --> B2[Sum2]
    A3[Doc3] --> B3[Sum3]
    A4[Doc4] --> B4[Sum4]
    A5[Doc5] --> B5[Sum5]
    A6[Doc6] --> B6[Sum6]
    A7[Doc7] --> B7[Sum7]
    A8[Doc8] --> B8[Sum8]

    %% Level 1 - First Combines
    B1 --> C1[CombSum1]
    B2 --> C1
    B3 --> C2[CombSum2]
    B4 --> C2
    B5 --> C3[CombSum3]
    B6 --> C3
    B7 --> C4[CombSum4]
    B8 --> C4

    %% Level 2 - Mega Combines
    C1 --> D1[MegaSum1]
    C2 --> D1
    C3 --> D2[MegaSum2]
    C4 --> D2

    %% Level 3 - Final Summary
    D1 --> E[FINAL_SUMMARY]
    D2 --> E
```
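The level-by-level combining in the diagram can be sketched in plain Python. This is a minimal sketch, not the chain's implementation: `recursive_reduce`, `fake_combine`, and the `fan_in` parameter are hypothetical names, and `fake_combine` stands in for a model call that merges a group of summaries.

```python
from typing import Callable


def recursive_reduce(
    items: list[str],
    combine: Callable[[list[str]], str],
    fan_in: int = 2,
) -> str:
    """Combine items in groups of `fan_in`, level by level, until one remains."""
    if not items:
        raise ValueError("need at least one item")
    level = items
    while len(level) > 1:
        # Each pass builds the next level of the tree from the current one.
        level = [combine(level[i : i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def fake_combine(group: list[str]) -> str:
    """Hypothetical stand-in for an LLM call that merges several summaries."""
    return "(" + "+".join(group) + ")"


summaries = [f"Sum{i}" for i in range(1, 9)]  # Sum1..Sum8, as in the diagram
result = recursive_reduce(summaries, fake_combine)
print(result)
```

With eight inputs and a fan-in of two, the loop runs three times, matching the three combine levels in the diagram.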


## Example dataset


The example text, *War and Peace*, is sourced from [Project Gutenberg](https://www.gutenberg.org/ebooks/2600) and is in the public domain. Redistribution is permitted, but the following attribution must be preserved:

> This eBook is for the use of anyone anywhere at no cost and with
> almost no restrictions whatsoever. You may copy it, give it away or
> re-use it under the terms of the Project Gutenberg License included
> with this eBook or online at [www.gutenberg.org](https://www.gutenberg.org).
>
> Public domain text provided by Project Gutenberg:
> [https://www.gutenberg.org/ebooks/2600](https://www.gutenberg.org/ebooks/2600)




## 🛠️ Step 1: Download the Text

```python
from pathlib import Path

import requests

# Plain-text edition of Project Gutenberg ebook 2600, matching the attribution above
url = "https://www.gutenberg.org/cache/epub/2600/pg2600.txt"
output_path = Path("war_and_peace_gutenberg.txt")

# Attribution notice appended to the saved file
attribution = (
    "\n\nPublic domain text provided by Project Gutenberg:\n"
    "https://www.gutenberg.org/ebooks/2600\n"
)

# Skip the download if the file is already on disk
if output_path.exists():
    print(f"File '{output_path}' already exists. Skipping download.")
else:
    response = requests.get(url, timeout=30)
    if response.status_code == 200:
        output_path.write_text(response.text + attribution, encoding="utf-8")
        print(f"Downloaded and saved to '{output_path}' with attribution.")
    else:
        print(f"Failed to download. Status code: {response.status_code}")
```

```text
File 'war_and_peace_gutenberg.txt' already exists. Skipping download.
```


## 🧱 Step 2: Split Text into Chunks

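```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Read with the same encoding used when the file was written
text = output_path.read_text(encoding="utf-8")

# Large chunks (100,000 characters) with a 500-character overlap between neighbors
splitter = RecursiveCharacterTextSplitter(
    chunk_size=100_000,
    chunk_overlap=500,
)

texts = splitter.split_text(text)
print(f"Chunks created: {len(texts)}")
```

```text
Chunks created: 27
```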
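To get a feel for how deep the combine tree would be for a given chunk count, the arithmetic can be sketched as follows. This assumes pairwise combining as in the diagram above; the fan-in the chain actually uses is not stated here, and `combine_levels` is a hypothetical helper.

```python
def combine_levels(n_items: int, fan_in: int = 2) -> list[int]:
    """Return the node count at each level, starting with the input summaries."""
    if n_items < 1 or fan_in < 2:
        raise ValueError("need n_items >= 1 and fan_in >= 2")
    sizes = [n_items]
    while sizes[-1] > 1:
        sizes.append(-(-sizes[-1] // fan_in))  # ceiling division
    return sizes


sizes = combine_levels(27)
print(sizes)           # [27, 14, 7, 4, 2, 1]
print(len(sizes) - 1)  # 5 combine levels
```

Under the same assumption, the 8 documents used in the diagram need 3 combine levels.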


## 🧾 Step 3: Convert to Document Format

```python
from langchain_core.documents import Document

# Wrap each text chunk in a Document
documents = [Document(page_content=chunk) for chunk in texts]
```

## 🔄 Step 4: Define Output Schema (Optional)

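```python
from pydantic import BaseModel, Field


class Person(BaseModel):
    """A person mentioned in the text."""

    name: str
    age: str | None = None
    hair_color: str | None = None
    # default_factory gives each instance its own list
    source_doc_ids: list[str] = Field(
        default_factory=list,
        description="The IDs of the documents where the information was found.",
    )


class PeopleRoot(BaseModel):
    """Top-level container for the extracted people."""

    people: list[Person]
```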


## 🤖 Step 5: Build Recursive Summarizer

```python
# create_recursive_document_chain is the unreleased chain described in the warning above
from langchain.chains import create_recursive_document_chain
from langchain.chat_models import init_chat_model

# Choose model ID (adjust to what your setup supports)
model = init_chat_model("claude-opus-4-20250514")

summarizer = create_recursive_document_chain(
    model,
    map_prompt="Produce a summary in bullet points with up to 3 bullets.",
).compile(name="RecursiveSummarizer")
```

## 🚀 Step 6: Run Summarization

```python
# Run on the first 8 documents only
output = summarizer.invoke({"documents": documents[:8]})
print(output)
```

This run failed. The model stopped at its `max_tokens` limit, so the structured output arrived incomplete and did not validate against `PeopleRoot`:

```text
Output parser received a `max_tokens` stop reason. The output is likely incomplete—please increase `max_tokens` or shorten your prompt.
Traceback (most recent call last):
  File "/home/eugene/.cache/uv/archive-v0/H7PJAEZVghiAsX_gNYVSD/lib/python3.12/site-packages/langchain_core/output_parsers/openai_tools.py", line 336, in parse_result
    pydantic_objects.append(name_dict[res["type"]](**res["args"]))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/eugene/.cache/uv/archive-v0/H7PJAEZVghiAsX_gNYVSD/lib/python3.12/site-packages/pydantic/main.py", line 253, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for PeopleRoot
people
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev

ValidationError: 1 validation error for PeopleRoot
people
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/missing
```