# RLM — Recursive Language Model Wrapper

This notebook walks through using the `rlm` package to answer questions over
long contexts that exceed a single LLM's context window.

**Prerequisites:** Copy `.env.example` to `.env` and add your OpenAI API key. The `OpenAI()` client used below reads the key from the `OPENAI_API_KEY` environment variable.

In [None]:
from pathlib import Path

from dotenv import load_dotenv

# Load the variables defined in .env (including the API key) into the process environment.
load_dotenv(Path(".env"))

## 1. Setup

Create an OpenAI client and wrap it with `RLMWrapper`.

In [None]:
from openai import OpenAI

from rlm import RLMConfig, RLMWrapper

wrapper = RLMWrapper(
    OpenAI(),
    root_model="gpt-4.1-mini",
    sub_model="gpt-4.1-mini",
    config=RLMConfig(verbose=True),
)

## 2. Single-string context

Hide a fact deep inside a very long string and ask the model to find it.
The RLM loop will chunk the context and use sub-LLM calls to locate the answer.
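
As a rough illustration of the chunking idea, here is a minimal sketch that splits a long string into overlapping fixed-size chunks and scans each one. It is not the `rlm` package's actual strategy: `chunk_text`, the chunk size, and the overlap are hypothetical, and a plain substring check stands in for the sub-LLM call.

```python
def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


# A scaled-down version of the haystack used in this section.
filler = "The quick brown fox jumps over the lazy dog. "
sample = filler * 50 + "SECRET: The magic number is 42. " + filler * 50

chunks = chunk_text(sample, chunk_size=500, overlap=50)

# Stand-in for the sub-LLM call: a substring search over each chunk.
hits = [i for i, chunk in enumerate(chunks) if "magic number is 42" in chunk]
print(f"{len(chunks)} chunks, fact found in chunk(s): {hits}")
```

The overlap is longer than the phrase being searched for, so the phrase cannot be cut in half at a chunk boundary without also appearing whole in a neighbouring chunk.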

In [None]:
long_text = (
    "The quick brown fox jumps over the lazy dog. " * 5000
    + "SECRET: The magic number is 42. "
    + "The quick brown fox jumps over the lazy dog. " * 5000
)

print(f"Context length: {len(long_text):,} characters")

In [None]:
response = wrapper.generate(
    query="What is the magic number hidden in the text?",
    context=long_text,
    on_event=lambda e: print(f"  [{e.type}] {e.preview[:80]}"),
)

print(f"\nAnswer: {response.answer}")
print(f"Iterations: {response.iterations}")
print(f"Sub-calls: {response.sub_calls}")
print(f"Tokens (in/out): {response.total_input_tokens}/{response.total_output_tokens}")
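
The `on_event` lambda above only prints. If you also want to keep the events for later inspection, a small callable object can do both. The sketch below assumes `on_event` accepts any callable taking one event, as the lambda suggests. `FakeEvent`, `EventLog`, and the event type names are hypothetical stand-ins, and the only event attributes used are `type` and `preview`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEvent:
    """Hypothetical stand-in exposing only the two attributes the lambda reads."""
    type: str
    preview: str


@dataclass
class EventLog:
    """Callable that prints each event and stores it for later inspection."""
    events: list = field(default_factory=list)
    width: int = 80

    def __call__(self, event) -> None:
        self.events.append(event)
        print(f"  [{event.type}] {event.preview[:self.width]}")

    def count_by_type(self) -> dict:
        """Return how many events of each type were seen."""
        counts: dict = {}
        for event in self.events:
            counts[event.type] = counts.get(event.type, 0) + 1
        return counts


log = EventLog()
# Made-up events, only to exercise the logger without calling the API.
for event in [
    FakeEvent("code", "x" * 200),
    FakeEvent("sub_call", "chunk 3"),
    FakeEvent("code", "print(x)"),
]:
    log(event)

print(log.count_by_type())
```

With a real wrapper you would pass `on_event=log` to `generate` and read `log.events` afterwards.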

## 3. Multi-document context

Pass a list of strings as context. Each string is a separate document.
The model can index into `context[i]` to inspect individual documents.
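
To make the indexing idea concrete, the sketch below shows the kind of per-document scan that a list-shaped context allows. It runs locally with no LLM involved. `sample_docs` and `find_documents` are hypothetical names, and what the model actually does with `context[i]` inside the RLM loop may differ.

```python
# A small stand-in corpus; one document carries the fact of interest.
sample_docs = [
    f"Document {i}: " + "Lorem ipsum dolor sit amet. " * 5 for i in range(10)
]
sample_docs[7] = "Document 7: The annual revenue of Acme Corp in 2024 was $4.2 billion."


def find_documents(docs: list[str], keyword: str) -> list[int]:
    """Return the indices of documents whose text contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [i for i, doc in enumerate(docs) if needle in doc.lower()]


matches = find_documents(sample_docs, "acme corp")
for i in matches:
    # Index back into the list to inspect just the matching document.
    print(f"context[{i}] -> {sample_docs[i][:60]}")
```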

In [None]:
documents = [
    f"Document {i}: {'Lorem ipsum dolor sit amet. ' * 200}" for i in range(50)
]
documents[37] = (
    "Document 37: The annual revenue of Acme Corp in 2024 was $4.2 billion. "
    + "This was driven primarily by growth in the cloud services division. " * 100
)

print(f"{len(documents)} documents, total {sum(len(d) for d in documents):,} chars")

In [None]:
response = wrapper.generate(
    query="What was the annual revenue of Acme Corp in 2024?",
    context=documents,
    on_event=lambda e: print(f"  [{e.type}] {e.preview[:80]}"),
)

print(f"\nAnswer: {response.answer}")
print(f"Iterations: {response.iterations}")
print(f"Sub-calls: {response.sub_calls}")

## 4. Cost tracking

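Configure per-token pricing to track the cost of a generation. Prices are given in dollars per token, so a rate quoted per million tokens is divided by 1,000,000.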
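
The arithmetic behind a per-token price is simple enough to check by hand. The sketch below assumes cost is the sum of input and output tokens, each multiplied by its own rate. That formula is an assumption about how the wrapper computes `response.cost`, not something the package confirms. `estimate_cost` is a hypothetical helper and the token counts are made up.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    cost_per_input_token: float,
    cost_per_output_token: float,
) -> float:
    """Assumed cost model: tokens times per-token price, summed over input and output."""
    return input_tokens * cost_per_input_token + output_tokens * cost_per_output_token


# Same per-token rates as in this section: $0.40 and $1.60 per million tokens.
input_rate = 0.40 / 1_000_000
output_rate = 1.60 / 1_000_000

# Made-up token counts for illustration.
cost = estimate_cost(250_000, 4_000, input_rate, output_rate)
print(f"Cost: ${cost:.4f}")
```

Here 250,000 input tokens contribute $0.1000 and 4,000 output tokens contribute $0.0064, for a total of $0.1064.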

In [None]:
priced = RLMWrapper(
    OpenAI(),
    root_model="gpt-4.1-mini",
    config=RLMConfig(
        cost_per_input_token=0.40 / 1_000_000,
        cost_per_output_token=1.60 / 1_000_000,
    ),
)

response = priced.generate(
    query="What is the magic number?",
    context=long_text,
)

print(f"Answer: {response.answer}")
print(f"Cost: ${response.cost:.4f}")

## 5. Inspecting the REPL state

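After generation, `response.repl_variables` maps the name of each variable the model
defined in the REPL environment to a summary of its value. Here `response` is the
result of the most recent `generate` call.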

In [None]:
for name, summary in response.repl_variables.items():
    print(f"{name}: {summary}")