leefowlercu/codex-rlm

codex-rlm

Codex-based Recursive Language Model harness skill with recursive sub-agents, tmux observability, and filesystem-backed symbolic orchestration.

What Is RLM

RLM (Recursive Language Model) is a pattern for solving tasks that are too large or complex for a single prompt/context window.

Instead of forcing one agent to hold the full problem at once, a root agent:

  1. Decomposes the task into smaller sub-tasks.
  2. Delegates those sub-tasks to child agents.
  3. Aggregates child outputs into a final result.

This enables deeper reasoning over large inputs while keeping each sub-agent focused and bounded.
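The decompose/delegate/aggregate loop can be sketched in plain bash. This is an illustrative sketch, not harness code: `wc -l` stands in for a real sub-agent call (in this harness that would be an `rlm-query` invocation), and everything else is ordinary shell:

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"

# 1. Decompose: split the input into bounded chunks.
seq 1 100 > "$work/input.txt"
split -l 25 "$work/input.txt" "$work/chunk."

# 2. Delegate: run one "sub-agent" per chunk in the background.
#    (`wc -l` is a placeholder for a real sub-agent invocation.)
for chunk in "$work"/chunk.*; do
  wc -l < "$chunk" > "$chunk.out" &
done
wait

# 3. Aggregate: combine child outputs into a final result.
total="$(awk '{s += $1} END {print s}' "$work"/chunk.*.out)"
echo "$total"   # prints 100

rm -rf "$work"
```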

How This Harness Implements RLM

This harness implements RLM with a simple, inspectable runtime:

  • bash as the orchestration/program layer (rlm-query, rlm-batch)
  • Filesystem as the data/symbol layer (.rlm/<task>/... artifacts)
  • codex exec as the sub-agent runtime
  • One tmux session per sub-agent for live observability
  • Recursive depth propagation via environment variables (RLM_DEPTH, RLM_MAX_DEPTH)
  • Global slot-based concurrency control across the recursion tree

Each sub-agent writes explicit artifacts (prompt, events, stderr, exit code, done markers), so runs are auditable and debuggable.
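The artifact-per-agent pattern looks roughly like this. Filenames and directory layout here are illustrative, not the harness's exact scheme, and `fake_agent` is a stand-in for `codex exec`:

```shell
# Illustrative per-sub-agent artifact directory (names are examples,
# not necessarily the harness's actual filenames).
agent_dir=".rlm/demo/agents/0"
mkdir -p "$agent_dir"

echo "Summarize chunk 1 of the input." > "$agent_dir/prompt.md"

fake_agent() { echo '{"event":"done"}'; }   # stand-in for `codex exec`
fake_agent > "$agent_dir/events.jsonl" 2> "$agent_dir/stderr.log"
echo "$?" > "$agent_dir/exit-code"
touch "$agent_dir/done"

# The run is now auditable from the filesystem alone:
find .rlm/demo -type f | sort
```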

What It Is Good For

Use this harness when a task benefits from decomposition and aggregation:

  • Large codebase analysis and change planning
  • Long-document synthesis and structured summarization
  • Map-reduce style extraction/classification over many files
  • Multi-part migration or refactor tasks with parallelizable chunks
  • Work requiring strong run observability (tmux + filesystem traces)

Skill layout

codex-rlm/
  SKILL.md
  prompts/
    rlm-agent.md
  scripts/
    rlm-query
    rlm-batch
    rlm-monitor
    benchmark.sh
  agents/
    openai.yaml

Requirements

To run this skill reliably, you need:

  • Codex CLI installed and available on PATH (default binary: codex)
  • Valid Codex/OpenAI authentication for codex exec
  • tmux installed and available on PATH (one session per sub-agent)
  • POSIX shell environment with bash
  • Standard Unix utilities used by scripts: sed, head, wc, tr, mkdir, rm, touch, sleep, kill, find, sort
  • Read/write access to your working directory (creates .rlm/<task>/... artifacts)
  • Read/write access to CODEX_HOME (defaults to ~/.codex) for nested Codex session/runtime state
  • Network access for model/API calls made by codex exec
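A quick preflight check for the PATH requirements above (a sketch; extend the command list to match your environment):

```shell
# Verify the documented binaries are reachable before starting a run.
missing=0
for cmd in codex tmux bash sed head wc tr mkdir rm touch sleep find sort; do
  if ! command -v "$cmd" > /dev/null 2>&1; then
    echo "missing: $cmd" >&2
    missing=1
  fi
done

if [ "$missing" -eq 0 ]; then
  echo "preflight OK"
else
  echo "preflight incomplete" >&2
fi
```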

Notes:

  • Default runtime model is gpt-5.3-codex (override with --model or RLM_MODEL).
  • Default execution profile is workspace-write + approval_policy=never.
  • RLM_SKIP_GIT_REPO_CHECK defaults to 1; if you set it to 0, runs must start inside a Git repository.

Installation

Install from this repository (the canonical source) by symlinking the skill:

mkdir -p ~/.codex/skills
ln -sfn /path/to/codex-rlm/codex-rlm ~/.codex/skills/codex-rlm

Alternatively, install by copying:

rm -rf ~/.codex/skills/codex-rlm
cp -R /path/to/codex-rlm/codex-rlm ~/.codex/skills/codex-rlm

Quickstart

SKILL_DIR="${CODEX_RLM_SKILL_DIR:-$HOME/.codex/skills/codex-rlm}"
export PATH="${SKILL_DIR}/scripts:$PATH"

TASK="demo"
mkdir -p .rlm/$TASK
cat > .rlm/$TASK/task.md << 'PROMPT'
Summarize the most important risks in /path/to/large-file.txt.
PROMPT

rlm-query .rlm/$TASK/task.md .rlm/$TASK/result.out \
  --task "$TASK" \
  --max-depth 2 \
  --model gpt-5.3-codex &
RLM_PID=$!

# Optional monitoring
rlm-monitor --task "$TASK" --pid "$RLM_PID"

wait "$RLM_PID"
cat .rlm/$TASK/result.out

Benchmarking

Use scripts/benchmark.sh to run repeated A/B comparisons between:

  • rlm-query (recursive harness mode)
  • codex exec (direct single-agent baseline)

Default benchmark inputs:

  • Task prompt: .rlm/<task-name>/task.md
  • Dataset path inside prompt: testdata/<huge-text-file>.txt
  • Output directory: bench/rlm-vs-direct-<timestamp>/

Run with defaults (5 runs):

./scripts/benchmark.sh

Run with custom settings:

./scripts/benchmark.sh \
  --runs 3 \
  --task-prompt .rlm/<task-name>/task.md \
  --model gpt-5.3-codex \
  --max-depth 2 \
  --max-parallel 10 \
  --timeout 1200 \
  --sandbox danger-full-access \
  --approval-policy never \
  --out-dir bench/<task-name>-ab

Artifacts produced per benchmark run:

  • results.csv: per-run metrics for both modes (latency, tokens, exit status)
  • summary.txt: aggregate medians/means and success rates
  • run-*/rlm/* and run-*/direct/*: raw logs and outputs for drill-down
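For ad-hoc analysis beyond summary.txt, results.csv can be sliced with awk. The column layout below (mode,run,latency_s) is an assumption for illustration; check the actual header of your results.csv before adapting it:

```shell
# Build a tiny sample with the assumed columns (swap in your real file).
mkdir -p bench/demo
cat > bench/demo/results.csv << 'CSV'
mode,run,latency_s
rlm,1,42.0
rlm,2,40.0
rlm,3,44.0
direct,1,61.0
direct,2,58.0
direct,3,65.0
CSV

# Median of a numeric stream on stdin (handles odd and even counts).
median() {
  sort -n | awk '{a[NR] = $1}
    END { if (NR % 2) print a[(NR + 1) / 2]; else print (a[NR/2] + a[NR/2 + 1]) / 2 }'
}

rlm_med="$(awk -F, '$1 == "rlm" {print $3}' bench/demo/results.csv | median)"
direct_med="$(awk -F, '$1 == "direct" {print $3}' bench/demo/results.csv | median)"
echo "rlm median:    $rlm_med"      # 42.0
echo "direct median: $direct_med"   # 61.0
```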

Defaults

  • RLM_MODEL=gpt-5.3-codex
  • RLM_SANDBOX_MODE=workspace-write
  • RLM_APPROVAL_POLICY=never
  • RLM_MAX_DEPTH=3
  • RLM_MAX_PARALLEL=15
  • RLM_TIMEOUT=1200
  • RLM_SKIP_GIT_REPO_CHECK=1
  • RLM_WORKSPACE_NETWORK_ACCESS=1 (workspace-write only)
  • RLM_WORKSPACE_ALLOW_CODEX_HOME=1 (workspace-write only)
  • RLM_CODEX_HOME_DIR=$CODEX_HOME or ~/.codex (workspace-write writable root target)

Override via env or command flags on rlm-query.
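For example, a shallower and less parallel run can be configured per invocation via the environment (values here are illustrative; the flag equivalents shown in the Benchmarking section work the same way):

```shell
# Environment overrides for one run.
export RLM_MODEL="gpt-5.3-codex"
export RLM_MAX_DEPTH=2      # shallower recursion tree
export RLM_MAX_PARALLEL=8   # fewer concurrent sub-agent slots
export RLM_TIMEOUT=600      # per-sub-agent timeout, in seconds
```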
