# RLM: Recursive Language Model Wrapper

This notebook walks through using the `rlm` package to answer questions over
long contexts that exceed a single LLM's context window.

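**Prerequisites:** Copy `.env.example` to `.env` and add your API key. The cells below read the key, base URL, and model names from the `KIT-*` variables (for example `KIT-AI-KEY` and `KIT-BASE-URL`).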

In [1]:
from pathlib import Path

from dotenv import load_dotenv

load_dotenv(Path(".env"))

True

## 1. Setup

Create an OpenAI client and wrap it with `RLMWrapper`.

In [2]:
import os

from openai import OpenAI

from rlm import RLMConfig, RLMWrapper

wrapper = RLMWrapper(
    OpenAI(api_key=os.getenv("KIT-AI-KEY"), base_url=os.getenv("KIT-BASE-URL")),
    root_model=os.getenv("KIT-GPT-OSS-120b-MODEL"),
    sub_model=os.getenv("KIT-QWEN3-235b-a22b-instruct-MODEL"),
    config=RLMConfig(verbose=True),
)

## 2. Single-string context

Hide a fact deep inside a very long string and ask the model to find it.
The RLM loop can chunk the context and use sub-LLM calls to locate the answer,
although the root model may also find it directly in the REPL, as in the run below.
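
One way to picture the chunk-and-delegate idea is the sketch below. It is not the `rlm` package's implementation: `chunk_text`, `find_in_chunks`, and the `ask_sub_model` callback are hypothetical names, and a plain substring check stands in for a real sub-LLM call.

```python
from typing import Callable, Iterator, Optional


def chunk_text(text: str, size: int, overlap: int = 0) -> Iterator[tuple[int, str]]:
    """Yield (start_offset, chunk) pairs; consecutive chunks share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    for start in range(0, len(text), step):
        yield start, text[start:start + size]


def find_in_chunks(
    text: str,
    ask_sub_model: Callable[[str], Optional[str]],
    size: int = 50_000,
    overlap: int = 200,
) -> Optional[tuple[int, str]]:
    """Return (chunk_offset, answer) for the first chunk that yields an answer."""
    for start, chunk in chunk_text(text, size, overlap):
        answer = ask_sub_model(chunk)
        if answer is not None:
            return start, answer
    return None


def fake_sub_model(chunk: str) -> Optional[str]:
    # Stand-in for a sub-LLM call (assumption): return the sentence after "SECRET: ".
    marker = "SECRET: "
    pos = chunk.find(marker)
    if pos == -1:
        return None
    end = chunk.find(". ", pos)
    # If the sentence is cut off at the chunk boundary, leave it to the next chunk.
    return chunk[pos + len(marker):end + 1] if end != -1 else None


demo = "filler. " * 1000 + "SECRET: The magic number is 42. " + "filler. " * 1000
print(find_in_chunks(demo, fake_sub_model, size=1_000, overlap=100))
```

The overlap is there so that a fact straddling a chunk boundary still appears whole in at least one chunk, provided the overlap is longer than the fact itself.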

In [3]:
long_text = (
    "The quick brown fox jumps over the lazy dog. " * 5000
    + "SECRET: The magic number is 42. "
    + "The quick brown fox jumps over the lazy dog. " * 5000
)

print(f"Context length: {len(long_text):,} characters")

Context length: 450,032 characters


In [5]:
response = wrapper.generate(
    query="What is the magic number hidden in the text?",
    context=long_text,
    on_event=lambda e: print(f"  [{e.type}] {e.preview[:80]}"),
)

print(f"\nAnswer: {response.answer}")
print(f"Iterations: {response.iterations}")
print(f"Sub-calls: {response.sub_calls}")
print(f"Tokens (in/out): {response.total_input_tokens}/{response.total_output_tokens}")

  [iteration_start] Starting iteration 1/25


httpx INFO: HTTP Request: POST https://api.example.com/v1 "HTTP/1.1 200 OK"
rlm.orchestrator INFO: iter 1  root output (96 chars)


  [code_generated] ```repl
# Let's inspect the beginning of the context to see its nature
print(con
  [code_executed] The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the 
  [iteration_start] Starting iteration 2/25


httpx INFO: HTTP Request: POST https://api.example.com/v1 "HTTP/1.1 200 OK"
rlm.orchestrator INFO: iter 2  root output (0 chars)


  [code_generated] 
  [iteration_start] Starting iteration 3/25


httpx INFO: HTTP Request: POST https://api.example.com/v1 "HTTP/1.1 200 OK"
rlm.orchestrator INFO: iter 3  root output (483 chars)


  [code_generated] ```repl
import re

# Find all occurrences of numbers (integers) in the context
n
  [code_executed] Found 1 numbers total.
. SECRET: The magic number is 42. The quick brown fox jum
  [iteration_start] Starting iteration 4/25


httpx INFO: HTTP Request: POST https://api.example.com/v1 "HTTP/1.1 200 OK"
rlm.orchestrator INFO: iter 4  root output (9 chars)


  [code_generated] FINAL(42)
  [final_answer] 42

Answer: 42
Iterations: 4
Sub-calls: 0
Tokens (in/out): 6219/500
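
The run above reports `Sub-calls: 0`. Judging by the truncated previews, the root model located the fact with a regular-expression search inside the REPL instead of delegating to the sub-model. The code it generated is cut off in the log, so the following is only a sketch of that style of search; `find_numbers` and its `window` parameter are illustrative names.

```python
import re


def find_numbers(context: str, window: int = 30) -> list[tuple[str, str]]:
    """Return (number, surrounding_text) for every run of digits in context."""
    hits = []
    for match in re.finditer(r"\d+", context):
        lo = max(0, match.start() - window)
        hi = min(len(context), match.end() + window)
        hits.append((match.group(), context[lo:hi]))
    return hits


filler = "The quick brown fox jumps over the lazy dog. "
text = filler * 5000 + "SECRET: The magic number is 42. " + filler * 5000

hits = find_numbers(text)
print(f"Found {len(hits)} numbers total.")
for number, snippet in hits:
    print(f"{number!r}: ...{snippet}...")
```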


## 3. Multi-document context

Pass a list of strings as context. Each string is a separate document.
The model can index into `context[i]` to inspect individual documents.
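
As a rough illustration of what indexing into `context[i]` makes possible, the sketch below scans a list of documents for a keyword and then inspects only the matching entries. It is plain Python over a list, not output from the `rlm` REPL, and `matching_indices` is a hypothetical helper.

```python
def matching_indices(context: list[str], keyword: str) -> list[int]:
    """Return the indices of documents that mention keyword (case-insensitive)."""
    needle = keyword.lower()
    return [i for i, doc in enumerate(context) if needle in doc.lower()]


# Toy stand-in for a multi-document context.
context = [f"Document {i}: Lorem ipsum dolor sit amet." for i in range(5)]
context[3] = "Document 3: The annual revenue of Acme Corp in 2024 was $4.2 billion."

for i in matching_indices(context, "revenue"):
    print(i, context[i][:60])
```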

In [6]:
documents = [
    f"Document {i}: {'Lorem ipsum dolor sit amet. ' * 200}" for i in range(50)
]
documents[37] = (
    "Document 37: The annual revenue of Acme Corp in 2024 was $4.2 billion. "
    + "This was driven primarily by growth in the cloud services division. " * 100
)

print(f"{len(documents)} documents, total {sum(len(d) for d in documents):,} chars")

50 documents, total 281,898 chars


In [7]:
response = wrapper.generate(
    query="What was the annual revenue of Acme Corp in 2024?",
    context=documents,
    on_event=lambda e: print(f"  [{e.type}] {e.preview[:80]}"),
)

print(f"\nAnswer: {response.answer}")
print(f"Iterations: {response.iterations}")
print(f"Sub-calls: {response.sub_calls}")

  [iteration_start] Starting iteration 1/25


httpx INFO: HTTP Request: POST https://api.example.com/v1 "HTTP/1.1 200 OK"
rlm.orchestrator INFO: iter 1  root output (21 chars)


  [code_generated] FINAL({revenue_2024})
  [final_answer] {revenue_2024}

Answer: {revenue_2024}
Iterations: 1
Sub-calls: 0

Note that in this run the root model returned the literal placeholder `{revenue_2024}` after a single iteration instead of the figure stated in document 37 ($4.2 billion), so the answer above is not a successful retrieval.


## 4. Cost tracking

Configure per-token pricing to track the cost of a generation.

In [None]:
priced = RLMWrapper(
    OpenAI(api_key=os.getenv("KIT-AI-KEY"), base_url=os.getenv("KIT-BASE-URL")),
    root_model=os.getenv("KIT-GPT-OSS-120b-MODEL"),
    config=RLMConfig(
        cost_per_input_token=0.40 / 1_000_000,
        cost_per_output_token=1.60 / 1_000_000,
    ),
)

response = priced.generate(
    query="What is the magic number?",
    context=long_text,
)

print(f"Answer: {response.answer}")
print(f"Cost: ${response.cost:.4f}")

OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
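
The pricing configured above can be checked by hand. The sketch assumes the cost is simply tokens times the per-token rate, summed over input and output; whether `response.cost` is computed exactly this way is not confirmed here. The token counts are the ones reported by the run in section 2, and `estimate_cost` is a hypothetical helper.

```python
COST_PER_INPUT_TOKEN = 0.40 / 1_000_000
COST_PER_OUTPUT_TOKEN = 1.60 / 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Linear cost estimate (assumed formula, not taken from the rlm package)."""
    return input_tokens * COST_PER_INPUT_TOKEN + output_tokens * COST_PER_OUTPUT_TOKEN


cost = estimate_cost(6219, 500)
print(f"Estimated cost: ${cost:.4f}")  # Estimated cost: $0.0033
```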

## 5. Inspecting the REPL state

After generation, `response.repl_variables` shows what the model
computed in the REPL environment.

In [None]:
for name, summary in response.repl_variables.items():
    print(f"{name}: {summary}")
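
The structure of `repl_variables` is not shown in this notebook; the loop above only assumes a mapping from variable name to a printable summary. The sketch below shows one way such summaries could be produced from a namespace dict. `summarize_namespace` is a hypothetical helper, not part of `rlm`.

```python
def summarize_namespace(namespace: dict[str, object], max_len: int = 60) -> dict[str, str]:
    """Map each public variable name to '<type>: <truncated repr>'."""
    summaries = {}
    for name, value in namespace.items():
        if name.startswith("_"):
            continue  # skip private and dunder entries
        text = repr(value)
        if len(text) > max_len:
            text = text[: max_len - 3] + "..."
        summaries[name] = f"{type(value).__name__}: {text}"
    return summaries


namespace = {"numbers": ["42"], "context": "The quick brown fox " * 100, "_tmp": 1}
for name, summary in summarize_namespace(namespace).items():
    print(f"{name}: {summary}")
```

Truncating the representation keeps a very large value, such as the full context string, from flooding the output.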