A clean-room implementation of Recursive Language Models based on the research paper by Alex Zhang and Omar Khattab (MIT CSAIL, Oct 2025).
RLMs are an inference strategy that enables language models to process unbounded context by treating the input as a programmable variable in a REPL environment rather than as a direct prompt.
- Traditional LLMs degrade as context grows (even within their context window)
- Example: GPT-5 solves <10% of tasks with 75k+ token histories
Instead of:

```python
LLM("Here's 1M tokens... answer this question")  # ❌ Fails
```

Do this:

```python
context = load_1M_tokens()  # Store as a REPL variable

# The LLM programmatically explores it:
# - Peek at structure
# - Grep for patterns
# - Chunk intelligently
# - Recursively query sub-LLMs
# - Build answer incrementally
```

- Unbounded Context: Handle 1M+ token inputs by treating context as data
- Recursive Exploration: Root LM can spawn sub-LLM calls via `llm_query()`
- Programmable: Full Python REPL for complex context manipulation
- Drop-in Replacement: `rlm.completion(context, query)` replaces `llm.completion(prompt)`
- Learnable Trajectories: Exploration strategies are trainable via RL
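A minimal before/after sketch of the drop-in usage (the context and query here are placeholders):

```python
from rlm.rlm_repl import RLM_REPL

huge_context = "..."  # imagine 1M+ tokens loaded here

# Before: stuff the whole thing into one prompt
# answer = llm.completion(huge_context + "\n\nWhat is the magic number?")

# After: pass the context as data and let the RLM explore it
rlm = RLM_REPL(model="gpt-4o", recursive_model="gpt-4o-mini")
answer = rlm.completion(context=huge_context, query="What is the magic number?")
```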
OOLONG Benchmark (128k+ tokens):
- RLM(GPT-4o-mini) outperforms GPT-5 by +33% (2x performance)
- Cheaper than GPT-5 per query
- Even when context fits in window, RLM wins
BrowseComp-Plus (1000 docs = 10M+ tokens):
- RLM(GPT-5): Perfect performance maintained
- Base GPT-5: Significant degradation
- Handles contexts that don't fit in any model's window
```
recursive-language-model/
├── rlm/
│   ├── __init__.py          # Package initialization
│   ├── rlm.py               # Base RLM abstract class
│   ├── rlm_repl.py          # Main RLM implementation
│   ├── repl.py              # REPL environment
│   ├── utils/
│   │   ├── llm.py           # OpenAI client wrapper
│   │   ├── prompts.py       # Prompt templates
│   │   └── utils.py         # Helper functions
│   └── logger/
│       ├── root_logger.py   # Root LM logger
│       └── repl_logger.py   # REPL execution logger (Jupyter-style)
├── main.py                  # Example demonstrations
├── requirements.txt         # Dependencies
├── .env.example             # Environment variable template
└── README.md                # This file
```
```bash
cd recursive-language-model
pip install -r requirements.txt
```

Required:

- `openai>=1.0.0` - OpenAI API client
- `python-dotenv>=1.0.0` - Environment variable management

Optional:

- `rich>=13.0.0` - Beautiful terminal output (highly recommended)

Create a `.env` file:

```bash
cp .env.example .env
```

Edit `.env` and add your OpenAI API key:

```
OPENAI_API_KEY=sk-...
```
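If you prefer to load the key in your own script (whether the package auto-loads `.env` is an implementation detail; `python-dotenv` is listed as a dependency), a minimal pattern is:

```python
import os
from dotenv import load_dotenv  # provided by python-dotenv

load_dotenv()  # copies variables from .env into the process environment
assert os.getenv("OPENAI_API_KEY"), "Set OPENAI_API_KEY in .env or the environment"
```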
```python
from rlm.rlm_repl import RLM_REPL

# Create RLM instance
rlm = RLM_REPL(
    model="gpt-4o",                 # Root LM
    recursive_model="gpt-4o-mini",  # Sub-LM for recursion
    enable_logging=True,            # Show execution logs
    max_iterations=10,              # Max reasoning steps
)

# Use it like a normal LLM call
context = "Your huge context here..."  # Can be 1M+ tokens!
query = "What is the magic number?"

answer = rlm.completion(context=context, query=query)
print(answer)
```

```bash
python main.py
```

Choose from:
1. Needle-in-Haystack - Find a number in 1M lines (intensive)
2. Multi-Document Reasoning - Answer from multiple sources
3. Counting and Aggregation - OOLONG-style task
4. Simple Test - Quick validation (recommended to start)
```
User Query + Huge Context
            ↓
        [RLM_REPL]
            ↓
    Root LM (depth=0)
      - Sees: Query only
      - Has: REPL with `context` variable
      - Can: Write Python code
            ↓
        [REPL Env]
      - Executes code
      - Provides llm_query() for recursion
      - Stores intermediate results
            ↓
    Sub-LM calls (depth=1)
      - Process chunks semantically
      - Return results to REPL
            ↓
    Root LM builds final answer
      - Uses FINAL() or FINAL_VAR()
            ↓
          Answer
```
````python
for iteration in range(max_iterations):
    # 1. Root LM decides next action
    response = root_llm.completion(messages)

    # 2. Extract and execute code blocks
    if "```repl" in response:
        for code in extract_code_blocks(response):
            execute_in_repl(code)
        add_results_to_messages()

    # 3. Check for final answer
    if "FINAL(" in response:
        return extract_answer()
````
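The `extract_code_blocks` helper above is pseudocode; one plausible way to implement that step is a regex over the model's reply (a sketch, not necessarily how this repo parses responses):

````python
import re

# Matches fenced ```repl ... ``` blocks in the root LM's reply
_REPL_BLOCK = re.compile(r"```repl\s*\n(.*?)```", re.DOTALL)

def extract_code_blocks(response: str) -> list[str]:
    """Return the code inside every ```repl fence, in order of appearance."""
    return _REPL_BLOCK.findall(response)
````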
Available in the REPL environment:

```python
# 1. Context variable
context  # Your huge input, loaded as a Python variable

# 2. Recursive LLM query
result = llm_query("Summarize this chunk: " + chunk)

# 3. Final answer
FINAL("The answer is 42")  # Direct answer
FINAL_VAR(my_answer)       # Return a variable
```

The RLM autonomously discovers these patterns:
### 1. Peeking

```python
# Check structure first
print(type(context))
print(len(context))
print(context[:1000])  # Preview
```
### 2. Grepping
```python
import re

matches = re.findall(r'magic number is (\d+)', context)
print(matches)
```
### 3. Partition + Map
```python
# Chunk and query each
chunks = [context[i:i+50000] for i in range(0, len(context), 50000)]
results = []
for chunk in chunks:
    result = llm_query(f"Find X in: {chunk}")
    results.append(result)
```
### 4. Summarization
```python
sections = context.split("###")
summaries = [llm_query(f"Summarize: {s}") for s in sections]
final = llm_query(f"Answer based on: {summaries}")
```
## ⚙️ Configuration Options
```python
RLM_REPL(
    api_key=None,                   # OpenAI API key (or use env var)
    model="gpt-4o",                 # Root LM model
    recursive_model="gpt-4o-mini",  # Sub-LM model (cheaper)
    max_iterations=20,              # Max reasoning steps
    depth=0,                        # Recursion depth (future use)
    enable_logging=True,            # Colorful execution logs
    track_costs=True,               # Track API usage and costs
)
```
```python
rlm = RLM_REPL(track_costs=True)
answer = rlm.completion(context, query)

# Get cost summary
summary = rlm.cost_summary()
print(f"Total cost: ${summary['estimated_cost_usd']}")
print(f"Total tokens: {summary['total_tokens']}")
print(f"API calls: {summary['total_calls']}")
```

```python
from rlm.rlm_repl import RLM_REPL
# Test basic functionality
context = "Alice has 5 apples. Bob has 3 oranges."
query = "How many fruits total?"

rlm = RLM_REPL(
    model="gpt-4o-mini",
    enable_logging=True,
)

result = rlm.completion(context, query)
assert "8" in result
print("โ Test passed!")python main.py
# Choose option 4 for quick test
# Choose option 5 for comprehensive suite (expensive!)- Context: 1M lines of random text
- Task: Find hidden magic number
- Strategy: Binary search, grepping, chunking
- Demonstrates: Handling massive contexts
### 2. Multi-Document Reasoning

- Context: 100 documents
- Task: Multi-hop question across docs
- Strategy: Extract relevant docs, aggregate info
- Demonstrates: Information synthesis
### 3. Counting and Aggregation

- Context: 5000 entries with metadata
- Task: Count entries matching criteria
- Strategy: Filter, map, reduce pattern
- Demonstrates: Structured data processing (see the sketch below)
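A rough idea of the REPL code this demo tends to elicit; the entry format below is purely illustrative, not the demo's actual data:

```python
# Illustrative entry format: "id=42 | category=books | price=12.99"
entries = context.splitlines()

# Filter, then count entries matching the criterion
books = [e for e in entries if "category=books" in e]
print(len(books))

# Optionally sanity-check a sample with a cheap sub-LLM call
print(llm_query("Do all of these entries describe books?\n" + "\n".join(books[:20])))
```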
### 4. Simple Test

- Context: Small text snippet
- Task: Basic arithmetic
- Strategy: Direct computation
- Demonstrates: Basic functionality
```python
# String context
rlm.completion("Plain text...", "Query?")

# Structured data
rlm.completion({"key": "value", "data": [...]}, "Query?")

# List of messages
rlm.completion([
    {"role": "user", "content": "Message 1"},
    {"role": "assistant", "content": "Response 1"},
], "Query?")
```

```python
rlm = RLM_REPL()
# First query
answer1 = rlm.completion(context1, query1)
# Reset state
rlm.reset()
# Fresh query (no contamination)
answer2 = rlm.completion(context2, query2)
```

```python
# Quick tasks
rlm = RLM_REPL(max_iterations=5)
# Complex tasks
rlm = RLM_REPL(max_iterations=30)
```

RLMs now support parallel LLM queries for dramatic speed improvements:

```python
# OLD WAY - Sequential (slow)
results = []
for chunk in chunks:
    result = llm_query(f"Process: {chunk}")
    results.append(result)

# NEW WAY - Parallel (much faster!)
prompts = [f"Process: {chunk}" for chunk in chunks]
results = llm_query_batch(prompts)  # All at once!
```

Available functions:
- `llm_query(prompt)` - Synchronous single query
- `llm_query_batch(prompts)` - Parallel batch queries (recommended!)
- `llm_query_async(prompt)` - Async single query (for `await`)
- `llm_query_batch_async(prompts)` - Async batch queries
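A sketch of how the async variants might be used inside a REPL cell (the chunk size is arbitrary; this assumes the REPL accepts top-level `await`, which the async-execution test later in this README exercises):

```python
# Split the context and summarize all chunks concurrently
chunks = [context[i:i + 50000] for i in range(0, len(context), 50000)]
summaries = await llm_query_batch_async([f"Summarize: {c}" for c in chunks])

# Single async query
first = await llm_query_async(f"Summarize: {chunks[0]}")
print(len(summaries), first[:200])
```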
Sub-RLMs can now spawn their own RLMs for nested recursive reasoning:
```python
# Enable depth > 1
rlm = RLM_REPL(
    model="gpt-4o",
    max_depth=2,  # Allow nested RLM calls!
)

# Now sub-RLMs can recursively call other RLMs
# Useful for hierarchical data processing
```

Depth levels:

- `max_depth=1` (default): Sub-LLMs only, no further recursion
- `max_depth=2`: Sub-RLMs can spawn their own sub-LLMs
- `max_depth=3+`: Deeper nesting for complex hierarchies
Use cases:
- Hierarchical document structures (company → dept → team); see the sketch after this list
- Multi-level summarization
- Recursive problem decomposition
- Tree-structured data processing
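A rough usage sketch for the hierarchical case; the org structure below is invented for illustration:

```python
# Hypothetical hierarchical context: company -> department -> team documents
org_context = {
    "Engineering": {"Platform": ["...team docs..."], "ML": ["...team docs..."]},
    "Sales": {"EMEA": ["...team docs..."], "APAC": ["...team docs..."]},
}

rlm = RLM_REPL(model="gpt-4o", recursive_model="gpt-4o-mini", max_depth=2)
answer = rlm.completion(
    context=org_context,
    query="Which team's documents mention the largest budget increase?",
)
print(answer)
```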
Run the test suite:
```bash
python test_async_depth.py
```

Tests include:
- Batch Execution - Parallel LLM query performance
- Async Execution - Async/await syntax in REPL
- Depth=2 Recursion - Nested RLM calls
- Depth=3 Recursion - Deep nesting
Current limitations:

- No Prefix Caching: Each call is independent (future optimization)
- OpenAI Only: Other providers not yet supported
- Thread-based Async: Uses a thread-pool executor rather than true async LLM calls (future: native async); see the sketch below
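For context, "thread-based async" refers to a pattern roughly like the following sketch (illustrative of the technique, not this repo's actual code):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def blocking_llm_call(prompt: str) -> str:
    # Stand-in for a synchronous OpenAI chat-completion request
    return f"response to: {prompt[:40]}"

async def batch_via_threads(prompts: list[str]) -> list[str]:
    # Fan blocking calls out to a thread pool and await them together,
    # instead of using a natively async HTTP client
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [loop.run_in_executor(pool, blocking_llm_call, p) for p in prompts]
        return list(await asyncio.gather(*futures))
```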
Planned enhancements:

- True Async LLM Calls: Native async OpenAI client
- Prefix Caching: Reuse common context prefixes
- Multi-Provider: Anthropic, local models
- Streaming: Real-time execution feedback
- RL Training: Learn optimal exploration strategies
- Visualization: Interactive trajectory viewer
Recursive Language Models, Alex Zhang and Omar Khattab, MIT CSAIL, October 2025: https://alexzhang13.github.io/blog/2025/rlm/
"RLMs are designed on the principle that fundamentally, LMs should decide how to break down problems to be digestible for an LM."
"If tomorrow, the best frontier LM can handle 10M tokens, then an RLM can handle 100M tokens (maybe at half the cost)."
This is a clean-room implementation for educational purposes. Feel free to:
- Report issues
- Suggest improvements
- Add new examples
- Extend functionality
MIT License - See original research paper for citation.
- Alex Zhang & Omar Khattab - Original RLM research
- MIT CSAIL - Research institution
- OpenAI - API infrastructure
```bibtex
@article{zhang2025rlm,
  title  = "Recursive Language Models",
  author = "Zhang, Alex and Khattab, Omar",
  year   = "2025",
  month  = "October",
  url    = "https://alexzhang13.github.io/blog/2025/rlm/"
}
```

Built with ❤️ to explore the future of long-context AI.