joshideas/rlm
RLM - Recursive Language Model Skill for Claude Code

A Claude Code skill that implements the Recursive Language Model (RLM) paradigm for processing arbitrarily large documents.

Background

This skill is inspired by the paper:

Recursive Language Models. Alex L. Zhang, Tim Kraska, Omar Khattab (MIT CSAIL). arXiv:2512.24601v1 [cs.AI], 31 Dec 2025.

The paper introduces RLMs as a general inference strategy that treats long prompts as part of an external environment rather than feeding them directly into the LLM's context window. This allows LLMs to:

  • Process inputs up to two orders of magnitude beyond their context window limits
  • Dramatically outperform base LLMs on long-context tasks
  • Maintain comparable or lower cost per query

Prompted by

@_joshideas (x.com)

Core Insight

From the paper:

"The key insight is that long prompts should not be fed into the neural network (e.g., Transformer) directly but should instead be treated as part of the environment that the LLM can symbolically interact with."

An RLM exposes the same interface as an LLM (string in, string out) but internally:

  1. Loads the prompt as a variable in a Python REPL environment
  2. Allows the LLM to write code that peeks into and decomposes the prompt
  3. Enables recursive self-calls on programmatic snippets of the variable
  4. Aggregates results into a final response
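The four steps above can be sketched as a single recursive function. This is an illustrative toy, not the paper's implementation: `call_llm` is a stub standing in for a real model call, and the fixed-size character split stands in for the model writing its own decomposition code in the REPL.

```python
def call_llm(prompt: str) -> str:
    # Stub: a real implementation would query a language model here.
    return f"summary({len(prompt)} chars)"

def rlm(prompt: str, chunk_size: int = 4000) -> str:
    # 1. The prompt lives in a variable (the "environment"), not the context.
    if len(prompt) <= chunk_size:
        return call_llm(prompt)
    # 2./3. Decompose the prompt and recurse on each programmatic snippet.
    chunks = [prompt[i:i + chunk_size] for i in range(0, len(prompt), chunk_size)]
    partials = [rlm(chunk, chunk_size) for chunk in chunks]
    # 4. Aggregate the partial results into a final response.
    return call_llm("\n".join(partials))
```

The interface stays string-in, string-out, so an RLM can be dropped in anywhere an LLM call is expected.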

Implementation

This skill adapts the RLM paradigm to Claude Code's architecture:

File Structure

~/.claude/skills/rlm/
├── SKILL.md          # Skill definition with YAML frontmatter
├── rlm_loader.py     # Python utilities for document manipulation
├── .venv/            # Virtual environment with pdfplumber
└── README.md         # This file

Components

1. Document Loader (rlm_loader.py)

Provides programmatic access to large documents without loading them into context:

from rlm_loader import load_document

doc = load_document('/path/to/file.pdf')

doc.get_info()                    # Metadata: pages, lines, chars
doc.get_page(0)                   # Extract specific page
doc.get_lines(10, 50)             # Extract line range
doc.search("keyword")             # Find matches with context
doc.search(r"\d{4}", regex=True)  # Regex search
doc.chunk_by_pages()              # Split into page chunks
doc.chunk_by_chars(4000)          # Split by character count
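As a rough idea of what `chunk_by_chars` might do internally (the actual `rlm_loader.py` may differ), here is a minimal sketch that splits at line boundaries so no line is cut mid-way:

```python
def chunk_by_chars(text: str, max_chars: int = 4000) -> list[str]:
    # Split text into chunks of roughly max_chars, preferring to break
    # at newline boundaries; a single oversized line stays whole.
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        if size + len(line) > max_chars and current:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks
```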

2. Sub-Agent Processing

Uses Claude Code's Task tool to spawn parallel sub-agents for chunk processing:

  • Each sub-agent receives a chunk + specific question
  • Multiple sub-agents run concurrently
  • Results are aggregated by the root agent
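The fan-out pattern above can be modeled with a thread pool, where `sub_agent` is a stub standing in for a Task-tool invocation (the real sub-agent is a separate Claude instance, not a Python function):

```python
from concurrent.futures import ThreadPoolExecutor

def sub_agent(chunk: str, question: str) -> str:
    # Stub for a Task-tool sub-agent: stateless, so the chunk and the
    # question must both be packed into its prompt.
    return f"answer from chunk of {len(chunk)} chars"

def process_in_parallel(chunks: list[str], question: str) -> list[str]:
    # Launch one sub-agent per chunk concurrently; the root agent
    # aggregates the returned partials afterwards.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda c: sub_agent(c, question), chunks))
```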

3. Skill Definition (SKILL.md)

Instructs Claude Code on the RLM workflow:

  • Load and examine document structure
  • Select appropriate strategy (search, scan, parallel chunks, recursive decomposition)
  • Execute using code-based filtering and sub-agents
  • Synthesize final answer

Supported Formats

  • PDF (via pdfplumber)
  • Plain text (.txt)
  • Markdown (.md)
  • Code files (.py, .js, .ts, etc.)
  • JSON, XML, HTML, CSV
  • Log files

Usage

Invoke the skill:

/rlm /path/to/document.pdf "What are the key findings?"

Or reference a large file in conversation; the skill activates automatically for documents that would exceed reasonable context limits.

Strategies (from the paper)

| Strategy | When to Use | Processing Cost |
| --- | --- | --- |
| Targeted Search | Looking for specific information | Constant |
| Section Read | Questions about specific parts | Constant |
| Parallel Chunks | Aggregation, summarization | Linear |
| Recursive Decomposition | Multi-hop reasoning, pairwise analysis | Linear to Quadratic |
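In practice the model chooses a strategy from the document structure and the question; a crude heuristic mirroring the table above might look like this (the thresholds and keyword lists are illustrative assumptions, not part of the skill):

```python
def choose_strategy(query: str, doc_chars: int, context_limit: int = 100_000) -> str:
    # Illustrative heuristic only; the real skill lets the model decide.
    q = query.lower()
    if doc_chars <= context_limit:
        return "section-read"          # fits: read the relevant part directly
    if any(w in q for w in ("find", "where", "which", "lookup")):
        return "targeted-search"       # constant cost: regex/keyword filter
    if any(w in q for w in ("summarize", "summary", "count", "list all")):
        return "parallel-chunks"       # linear cost: map over chunks
    return "recursive-decomposition"   # multi-hop: recurse on sub-problems
```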

Key Principles

From the paper's findings:

  1. Never load entire documents - Use programmatic access to minimize context usage
  2. Filter with code first - Use regex/keyword search before reading content
  3. Parallelize sub-agents - Launch multiple Task calls in single response
  4. Sub-agents are stateless - Provide all needed context in their prompt
  5. Verify before answering - Retrieve more context if uncertain
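Principle 2 ("filter with code first") can be sketched as a grep-with-context helper: only the matching lines plus a little surrounding context ever reach the model, not the whole document. The function name and signature are illustrative, not the skill's API.

```python
import re

def search_with_context(text: str, pattern: str, context: int = 1) -> list[str]:
    # Return only lines matching `pattern`, each with `context` lines
    # either side, instead of the full document.
    lines = text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if re.search(pattern, line):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return [lines[i] for i in sorted(keep)]
```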

Performance (from paper benchmarks)

RLMs demonstrated:

  • 91.33% accuracy on BrowseComp+ (vs 70.47% for summarization baseline)
  • 58.00% F1 on OOLONG-Pairs (vs 0.04% for base GPT-5)
  • Strong performance at 10M+ token scale
  • Comparable or lower cost than base model calls at median

References

  • Zhang, A. L., Kraska, T., Khattab, O. Recursive Language Models. arXiv:2512.24601v1 [cs.AI], 31 Dec 2025.

License

This implementation is provided as-is.
