A Pi extension that brings Recursive Language Models to the Pi coding agent. Load a file, a directory, or raw text into a sandboxed Python REPL and let the agent explore it programmatically — with sub-LLM calls available for the parts that actually need natural-language reasoning.
Based on the RLM whitepaper (Zhang, Kraska, Khattab — 2025).
When you point a normal LLM at a 10 MB log file, it either drowns in tokens or summarizes badly. RLM inverts that approach: instead of stuffing everything into the context window, the agent gets a Python REPL with the data preloaded as a `context` variable. It writes code to explore, parse, regex, count, and aggregate, and escalates to a sub-LLM (`llm_query(...)`) only for tasks that genuinely require reading comprehension of unstructured prose.
The result: deterministic operations stay deterministic, and the LLM hammer is reserved for the nails it's actually good at.
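To make that split concrete, here is a minimal sketch of the kind of code the agent might write inside the REPL. The `context` variable and `llm_query` come from the description above; the sample log lines and the stub body of `llm_query` are assumptions, included only so the snippet runs on its own.

```python
import re
from collections import Counter

# Assumption: a tiny stand-in for the preloaded data. In a real run the
# extension sets `context` to the full file or directory contents.
context = """\
2025-01-01T14:00:01 ERROR db timeout after 30s
2025-01-01T14:00:02 INFO request served
2025-01-01T14:00:05 ERROR db timeout after 30s
2025-01-01T14:00:09 WARN user wrote: the export button does nothing when I click it
"""


def llm_query(prompt: str) -> str:
    """Stub standing in for the sub-LLM call that the sandbox provides."""
    return "(sub-LLM reply would appear here)"


# Deterministic work stays in code: count log levels with a regex.
levels = Counter(
    re.findall(r"^\S+ (ERROR|WARN|INFO)\b", context, flags=re.MULTILINE)
)
print(levels)

# Only the free-form prose is handed to the sub-LLM.
complaints = [
    line.split("user wrote: ", 1)[1]
    for line in context.splitlines()
    if "user wrote: " in line
]
summary = llm_query("Summarize this user complaint in five words:\n" + complaints[0])
print(summary)
```

The counting step gives the same answer on every run, while the only text sent to the sub-LLM is the one sentence that needs interpretation.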
This is a Pi extension. Install with:
pi install git:github.com/ivanvza/pi-reepl

Requires:
- A working Pi installation (`@mariozechner/pi-coding-agent` ≥ 0.70)
- `python3` on `$PATH`
There are two entry points: the `rlm_query` tool and the `/rlm` slash command.
The agent calls `rlm_query` directly. Parameters:
| Param | Type | Notes |
|---|---|---|
| `query` | string | What you want to know or do. |
| `path` | string | File or directory. Supports `@`, `~/`, relative, absolute. |
| `data` | string | Raw text. Mutually exclusive with `path`. |
| `maxIterations` | int | REPL turns the root LLM gets before being forced to submit. Default 15. |
| `model` | string | Override the model used for both the root and sub-LLM. Format: `provider/id` (e.g. `anthropic/claude-haiku-4-5-20251001`), or bare `id` if unambiguous in your `~/.pi/agent/models.json`. Defaults to whatever model the calling agent is currently on. |
For path, files are loaded as-is; directories are walked and concatenated as <file name="...">...</file> blocks (10 MB cap, 1 MB per-file cap, skips node_modules and .git).
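The directory handling described above can be sketched roughly as follows. This is not the extension's actual loader (which lives in the TypeScript sources); the function name `load_directory` is hypothetical, and the behavior at the caps is an assumption: here oversized files are skipped and the walk stops once the total cap would be exceeded, whereas the real implementation may truncate instead.

```python
import os
import tempfile

MAX_TOTAL_BYTES = 10 * 1024 * 1024  # 10 MB overall cap, from the text
MAX_FILE_BYTES = 1 * 1024 * 1024    # 1 MB per-file cap, from the text
SKIP_DIRS = {"node_modules", ".git"}


def load_directory(root: str) -> str:
    """Concatenate files under `root` into <file name="...">...</file> blocks."""
    blocks = []
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            size = os.path.getsize(full)
            if size > MAX_FILE_BYTES:
                continue  # assumption: oversized files are skipped
            if total + size > MAX_TOTAL_BYTES:
                return "\n".join(blocks)  # assumption: stop at the total cap
            with open(full, "r", encoding="utf-8", errors="replace") as fh:
                text = fh.read()
            rel = os.path.relpath(full, root)
            blocks.append(f'<file name="{rel}">\n{text}\n</file>')
            total += size
    return "\n".join(blocks)


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "node_modules"))
    with open(os.path.join(tmp, "node_modules", "dep.js"), "w") as fh:
        fh.write("ignored")
    with open(os.path.join(tmp, "a.txt"), "w") as fh:
        fh.write("hello")
    demo = load_directory(tmp)
    print(demo)
```

Wrapping each file in a named block lets the agent's Python code split `context` back into per-file chunks with a simple regex.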
The `/rlm` slash command is a thin forwarder. It has two forms, each with an optional `--model` flag:
/rlm [--model <provider/id>] <path> <query>
/rlm [--model <provider/id>] --data <inline text> -- <query>
It parses the args, then drops a user message asking the agent to call rlm_query. From there it's standard tool flow — Pi renders the call, streams progress inline, and Ctrl+O expands to the full code/output trajectory.
The active model is shown in the live status line as [model: provider/id] so you can verify the override actually took effect.
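As an illustration of the two `/rlm` forms, the argument parsing could look something like the sketch below. The real parser is part of the TypeScript extension and may treat quoting and whitespace differently; `parse_rlm_args` and `RlmArgs` are hypothetical names introduced here.

```python
from typing import Optional, TypedDict


class RlmArgs(TypedDict):
    model: Optional[str]
    path: Optional[str]
    data: Optional[str]
    query: str


def parse_rlm_args(raw: str) -> RlmArgs:
    """Parse the text that follows `/rlm` (simplified: no quote handling)."""
    rest = raw.strip()
    model = None
    if rest.startswith("--model "):
        parts = rest.split(maxsplit=2)
        if len(parts) < 3:
            raise ValueError("--model needs a value followed by more arguments")
        model, rest = parts[1], parts[2]

    if rest.startswith("--data "):
        # Inline form: everything before the standalone `--` is the data.
        data, sep, query = rest[len("--data "):].partition(" -- ")
        if not sep:
            raise ValueError("expected ' -- ' between inline data and query")
        path, data = None, data.strip()
    else:
        # Path form: first token is the path, the remainder is the query.
        path, _, query = rest.partition(" ")
        data = None

    query = query.strip()
    if not query:
        raise ValueError("missing query")
    return {"model": model, "path": path, "data": data, "query": query}


a = parse_rlm_args(
    "--model anthropic/claude-haiku-4-5-20251001 ~/logs/nginx-access.log Find the top 10 IPs"
)
b = parse_rlm_args("--data panic at 12:00 -- find the first panic")
print(a)
print(b)
```

The result maps directly onto the `rlm_query` parameters, which is all a forwarder needs before handing control back to the normal tool flow.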
You don't have to use /rlm at all — just ask Pi naturally and it will pick up rlm_query when the data is too large to read directly. A grab-bag of things you can throw at it:
Pi session forensics
Use rlm_query on @~/.pi/agent/sessions/ — which tools fail most often,
broken down by tool name and error message?
Web server logs
/rlm ~/logs/nginx-access.log Find the top 10 IPs by 5xx response count
between 14:00 and 15:00 today.
JSONL event stream
Crunch @events.jsonl with rlm_query: per-day count of `event=signup` and
the median time-to-first-action for each cohort.
Codebase audit
Run rlm_query on @src/ and tell me every TODO/FIXME comment grouped
by file, with line numbers.
CSV aggregation
/rlm ~/Downloads/orders.csv group by region and give me total revenue
plus the top 3 SKUs per region.
Doc-sized markdown
Use rlm_query on @CHANGELOG.md to list every breaking change since v2.0,
in chronological order.
Mixed-format directory
/rlm @./samples/ I want a frequency map of every regex pattern found across
all files, grouped by file extension.
Inline blob
/rlm --data "very long log paste…" -- find the timestamp of the first
panic, and how many warnings preceded it.
Pin a specific model for the run
Useful when you want a cheap/fast model for a deterministic task, or a coder-tuned model for code-heavy crunching, without flipping the outer agent.
Natural language — Pi picks rlm_query and forwards the model name:
Use RLM with our coder model (qwen3-coder-next:cloud) on
@/Users/bandito/.pi/agent/sessions and count lines in files.
Or via the slash command directly:
/rlm --model qwen3-coder-next:cloud @/Users/bandito/.pi/agent/sessions count lines in files
Either way, the live status line will show [model: ollama_cloud/qwen3-coder-next:cloud], confirming that the override took effect.
The agent decides between writing Python and falling back to llm_query (sub-LLM) on a per-step basis. Counting, parsing, regex, grouping → code. "What does this paragraph mean" → llm_query. The system prompt nudges it hard toward code for anything deterministic.
┌────────────────┐      Python code       ┌──────────────────┐
│    Root LLM    │ ─────────────────────▶ │   Python REPL    │
│  (Pi session)  │                        │  (sandbox.ts +   │
│  no tools; it  │ ◀───────────────────── │  repl_server.py) │
│   must write   │    stdout / errors     │                  │
│  ```python```  │                        │ context = "..."  │
└────────────────┘                        └──────────────────┘
        ▲                                          │
        │                                          │ llm_query(...)
        │                                          ▼
        │                                 ┌──────────────────┐
        └─────────────────────────────────│     Sub-LLM      │
                  forwards reply          │   (Pi session)   │
                                          └──────────────────┘
- The root LLM is created with no built-in tools (`tools: []`), so it can only reply with text. The system prompt forces it to put executable Python in fenced `python` code blocks.
- Each block is run inside a long-lived Python subprocess. Variables persist across iterations.
- `llm_query(prompt)` inside the REPL pauses the subprocess, the host calls a separate Pi session, and the reply is returned to the script.
- `SUBMIT(answer)` ends the loop and returns the result to the tool caller.
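The loop described in the list above can be approximated in a few lines. This is a simplified, in-process sketch: the real extension runs the code in a separate Python subprocess and drives it from TypeScript. `run_loop`, `Submitted`, and the scripted `fake_root` are hypothetical names, and returning `None` when the iteration budget runs out is an assumption about what "forced to submit" might look like.

```python
import contextlib
import io
import re
from typing import Callable, Optional

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


class Submitted(Exception):
    """Raised by SUBMIT to unwind out of the executing block."""

    def __init__(self, answer):
        super().__init__("submitted")
        self.answer = answer


def run_loop(root_llm: Callable[[str], str], context: str,
             sub_llm: Callable[[str], str],
             max_iterations: int = 15) -> Optional[str]:
    def submit(answer):
        raise Submitted(answer)

    # One namespace for the whole run, so variables persist across iterations.
    namespace = {"context": context, "llm_query": sub_llm, "SUBMIT": submit}
    feedback = "Data is loaded in `context`."
    for _ in range(max_iterations):
        reply = root_llm(feedback)
        outputs = []
        for code in CODE_BLOCK.findall(reply):
            buffer = io.StringIO()
            try:
                with contextlib.redirect_stdout(buffer):
                    exec(code, namespace)
            except Submitted as done:
                return done.answer
            except Exception as exc:  # errors go back to the root LLM as text
                buffer.write(f"{type(exc).__name__}: {exc}\n")
            outputs.append(buffer.getvalue())
        feedback = "".join(outputs) or "(no output)"
    return None  # assumption: budget exhausted without SUBMIT


# A scripted stand-in for the root LLM: two turns, the second one submits.
scripted = iter([
    f"Let me count lines.\n{FENCE}python\nn = len(context.splitlines())\nprint(n)\n{FENCE}",
    f"{FENCE}python\nSUBMIT(f'{{n}} lines')\n{FENCE}",
])
seen = []


def fake_root(feedback: str) -> str:
    seen.append(feedback)
    return next(scripted)


answer = run_loop(fake_root, "a\nb\nc", sub_llm=lambda prompt: "stub reply")
print(answer)
```

Note how `n`, defined in the first turn, is still available in the second: that is the "variables persist across iterations" property from the list.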
index.ts              # Extension entry: registers rlm_query tool + /rlm command
src/
  rlm-loop.ts         # The iteration loop (root LLM ↔ sandbox ↔ sub-LLM)
  sandbox.ts          # Manages the Python subprocess + JSON protocol
  prompts.ts          # System prompt for the root LLM
  types.ts            # RlmStep, RlmResult
  utils.ts            # Misc helpers
repl/
  repl_server.py      # The sandbox process: reads JSON commands on stdin
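The layout mentions that `repl_server.py` reads JSON commands on stdin. The actual wire format is not documented here, so the sketch below invents one purely for illustration: one JSON object per line with a `code` field, answered by one JSON object per line with `stdout` and `error` fields. These field names and the `serve` function are assumptions, and the pause-and-resume handling for `llm_query` is left out.

```python
import contextlib
import io
import json
from typing import TextIO


def serve(commands: TextIO, replies: TextIO) -> None:
    """Read one JSON command per line and write one JSON reply per line."""
    namespace = {}  # lives as long as the process, so variables persist
    for line in commands:
        line = line.strip()
        if not line:
            continue
        command = json.loads(line)
        captured = io.StringIO()
        error = None
        try:
            with contextlib.redirect_stdout(captured):
                exec(command["code"], namespace)
        except Exception as exc:
            error = f"{type(exc).__name__}: {exc}"
        reply = {"stdout": captured.getvalue(), "error": error}
        replies.write(json.dumps(reply) + "\n")
        replies.flush()


# Demonstration with in-memory streams instead of real stdin/stdout.
incoming = io.StringIO(
    json.dumps({"code": "x = 21"}) + "\n"
    + json.dumps({"code": "print(x * 2)"}) + "\n"
    + json.dumps({"code": "1 / 0"}) + "\n"
)
outgoing = io.StringIO()
serve(incoming, outgoing)
results = [json.loads(row) for row in outgoing.getvalue().splitlines()]
print(results)
```

In a real subprocess the same function would presumably be called with `sys.stdin` and `sys.stdout`; line-delimited JSON keeps the host side simple because each reply can be read with a single line read.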
MIT — see LICENSE.