leefowlercu/codex-rlm

codex-rlm

Codex-based Recursive Language Model harness skill with recursive sub-agents, tmux observability, and filesystem-backed symbolic orchestration.

What Is RLM

RLM (Recursive Language Model) is a pattern for solving tasks that are too large or complex for a single prompt/context window.

Instead of forcing one agent to hold the full problem at once, a root agent:

  1. Decomposes the task into smaller sub-tasks.
  2. Delegates those sub-tasks to child agents.
  3. Aggregates child outputs into a final result.

This enables deeper reasoning over large inputs while keeping each sub-agent focused and bounded.
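The decompose/delegate/aggregate loop can be sketched in plain bash. This is an illustrative sketch, not harness code: `wc -l` stands in for a real sub-agent call (in this harness that would be an `rlm-query` invocation), and everything else is ordinary shell:

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"

# 1. Decompose: split the input into bounded chunks.
seq 1 100 > "$work/input.txt"
split -l 25 "$work/input.txt" "$work/chunk."

# 2. Delegate: run one "sub-agent" per chunk in the background.
#    (`wc -l` is a placeholder for a real sub-agent invocation.)
for chunk in "$work"/chunk.*; do
  wc -l < "$chunk" > "$chunk.out" &
done
wait

# 3. Aggregate: combine child outputs into a final result.
total="$(awk '{s += $1} END {print s}' "$work"/chunk.*.out)"
echo "$total"   # prints 100

rm -rf "$work"
```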

How This Harness Implements RLM

This harness implements RLM with a simple, inspectable runtime:

  • bash as the orchestration/program layer (rlm-query, rlm-batch)
  • Filesystem as the data/symbol layer (.rlm/<task>/... artifacts)
  • codex exec as the sub-agent runtime
  • One tmux session per sub-agent for live observability
  • Recursive depth propagation via environment variables (RLM_DEPTH, RLM_MAX_DEPTH)
  • Global slot-based concurrency control across the recursion tree

Each sub-agent writes explicit artifacts (prompt, events, stderr, exit code, done markers), so runs are auditable and debuggable.
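The artifact-per-agent pattern looks roughly like this. Filenames and directory layout here are illustrative, not the harness's exact scheme, and `fake_agent` is a stand-in for `codex exec`:

```shell
# Illustrative per-sub-agent artifact directory (names are examples,
# not necessarily the harness's actual filenames).
agent_dir=".rlm/demo/agents/0"
mkdir -p "$agent_dir"

echo "Summarize chunk 1 of the input." > "$agent_dir/prompt.md"

fake_agent() { echo '{"event":"done"}'; }   # stand-in for `codex exec`
fake_agent > "$agent_dir/events.jsonl" 2> "$agent_dir/stderr.log"
echo "$?" > "$agent_dir/exit-code"
touch "$agent_dir/done"

# The run is now auditable from the filesystem alone:
find .rlm/demo -type f | sort
```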

What It Is Good For

Use this harness when a task benefits from decomposition and aggregation:

  • Large codebase analysis and change planning
  • Long-document synthesis and structured summarization
  • Map-reduce style extraction/classification over many files
  • Multi-part migration or refactor tasks with parallelizable chunks
  • Work requiring strong run observability (tmux + filesystem traces)

Skill layout

codex-rlm/
  SKILL.md
  prompts/
    rlm-agent.md
  scripts/
    rlm-query
    rlm-batch
    rlm-monitor
    benchmark.sh
  agents/
    openai.yaml

Requirements

To run this skill reliably, you need:

  • Codex CLI installed and available on PATH (default binary: codex)
  • Valid Codex/OpenAI authentication for codex exec
  • tmux installed and available on PATH (one session per sub-agent)
  • POSIX shell environment with bash
  • Standard Unix utilities used by scripts: sed, head, wc, tr, mkdir, rm, touch, sleep, kill, find, sort
  • Read/write access to your working directory (creates .rlm/<task>/... artifacts)
  • Read/write access to CODEX_HOME (defaults to ~/.codex) for nested Codex session/runtime state
  • Network access for model/API calls made by codex exec
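A quick preflight check for the PATH requirements above (a sketch; extend the command list to match your environment):

```shell
# Verify the documented binaries are reachable before starting a run.
missing=0
for cmd in codex tmux bash sed head wc tr mkdir rm touch sleep find sort; do
  if ! command -v "$cmd" > /dev/null 2>&1; then
    echo "missing: $cmd" >&2
    missing=1
  fi
done

if [ "$missing" -eq 0 ]; then
  echo "preflight OK"
else
  echo "preflight incomplete" >&2
fi
```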

Notes:

  • Default runtime model is gpt-5.3-codex (override with --model or RLM_MODEL).
  • Default execution profile is workspace-write + approval_policy=never.
  • RLM_SKIP_GIT_REPO_CHECK defaults to 1; if you set it to 0, runs must start inside a Git repository.

Installation

Install from this repository (the canonical source) by symlinking the skill:

mkdir -p ~/.codex/skills
ln -sfn /path/to/codex-rlm/codex-rlm ~/.codex/skills/codex-rlm

Alternatively, install by copying:

rm -rf ~/.codex/skills/codex-rlm
cp -R /path/to/codex-rlm/codex-rlm ~/.codex/skills/codex-rlm

Quickstart

SKILL_DIR="${CODEX_RLM_SKILL_DIR:-$HOME/.codex/skills/codex-rlm}"
export PATH="${SKILL_DIR}/scripts:$PATH"

TASK="demo"
mkdir -p .rlm/$TASK
cat > .rlm/$TASK/task.md << 'PROMPT'
Summarize the most important risks in /path/to/large-file.txt.
PROMPT

rlm-query .rlm/$TASK/task.md .rlm/$TASK/result.out \
  --task "$TASK" \
  --max-depth 2 \
  --model gpt-5.3-codex &
RLM_PID=$!

# Optional monitoring
rlm-monitor --task "$TASK" --pid "$RLM_PID"

wait "$RLM_PID"
cat .rlm/$TASK/result.out

Benchmarking

Use scripts/benchmark.sh to run repeated A/B comparisons between:

  • rlm-query (recursive harness mode)
  • codex exec (direct single-agent baseline)

Default benchmark inputs:

  • Task prompt: .rlm/<task-name>/task.md
  • Dataset path inside prompt: testdata/<huge-text-file>.txt
  • Output directory: bench/rlm-vs-direct-<timestamp>/

Run with defaults (5 runs):

./scripts/benchmark.sh

Run with custom settings:

./scripts/benchmark.sh \
  --runs 3 \
  --task-prompt .rlm/<task-name>/task.md \
  --model gpt-5.3-codex \
  --max-depth 2 \
  --max-parallel 10 \
  --timeout 1200 \
  --sandbox danger-full-access \
  --approval-policy never \
  --out-dir bench/<task-name>-ab

Artifacts produced per benchmark run:

  • results.csv: per-run metrics for both modes (latency, tokens, exit status)
  • summary.txt: aggregate medians/means and success rates
  • run-*/rlm/* and run-*/direct/*: raw logs and outputs for drill-down
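For ad-hoc analysis beyond summary.txt, results.csv can be sliced with awk. The column layout below (mode,run,latency_s) is an assumption for illustration; check the actual header of your results.csv before adapting it:

```shell
# Build a tiny sample with the assumed columns (swap in your real file).
mkdir -p bench/demo
cat > bench/demo/results.csv << 'CSV'
mode,run,latency_s
rlm,1,42.0
rlm,2,40.0
rlm,3,44.0
direct,1,61.0
direct,2,58.0
direct,3,65.0
CSV

# Median of a numeric stream on stdin (handles odd and even counts).
median() {
  sort -n | awk '{a[NR] = $1}
    END { if (NR % 2) print a[(NR + 1) / 2]; else print (a[NR/2] + a[NR/2 + 1]) / 2 }'
}

rlm_med="$(awk -F, '$1 == "rlm" {print $3}' bench/demo/results.csv | median)"
direct_med="$(awk -F, '$1 == "direct" {print $3}' bench/demo/results.csv | median)"
echo "rlm median:    $rlm_med"      # 42.0
echo "direct median: $direct_med"   # 61.0
```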

Defaults

  • RLM_MODEL=gpt-5.3-codex
  • RLM_SANDBOX_MODE=workspace-write
  • RLM_APPROVAL_POLICY=never
  • RLM_MAX_DEPTH=3
  • RLM_MAX_PARALLEL=15
  • RLM_TIMEOUT=1200
  • RLM_SKIP_GIT_REPO_CHECK=1
  • RLM_WORKSPACE_NETWORK_ACCESS=1 (workspace-write only)
  • RLM_WORKSPACE_ALLOW_CODEX_HOME=1 (workspace-write only)
  • RLM_CODEX_HOME_DIR=$CODEX_HOME or ~/.codex (workspace-write writable root target)

Override via env or command flags on rlm-query.
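For example, a shallower and less parallel run can be configured per invocation via the environment (values here are illustrative; the flag equivalents shown in the Benchmarking section work the same way):

```shell
# Environment overrides for one run.
export RLM_MODEL="gpt-5.3-codex"
export RLM_MAX_DEPTH=2      # shallower recursion tree
export RLM_MAX_PARALLEL=8   # fewer concurrent sub-agent slots
export RLM_TIMEOUT=600      # per-sub-agent timeout, in seconds
```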
