my-RDagent

Agent-first research and development loop: propose, code, execute, evaluate, repeat.

What This Repo Ships

Skills under skills/ — high-level orchestration (rd-agent) and stage-specific skills (rd-propose, rd-code, rd-execute, rd-evaluate)
CLI tool catalog via rdagent-tool — direct inspection and primitive operations for when a skill boundary is insufficient
Python package rd_agent — contracts, orchestration services, ports, and algorithms backing the skill and CLI surfaces
Enterprise infrastructure — PostgreSQL state store, Prometheus metrics, structured logging, HTTP API, resource-gated concurrency
Full-chain execution tracing — crash-safe JSONL trace from agent reasoning through graph engine decisions
Eval suite — acceptance tests (AC-1/AC-2), property-based tests, Kaggle LLM E2E harness
Tests — regression suites that lock the public surface and contracts

The core public surface is transport-free: skills first, CLI tools second, with no server abstraction. The HTTP API listed above is an optional extra, not part of that core surface.

Repository Setup

uv sync --extra test

One-command setup with skill installation and verification:

bash scripts/setup_env.sh                           # Claude, local, quick
bash scripts/setup_env.sh --all --scope-all --full-verify  # everything

Agent Skill Setup

The installer copies canonical skills/ packages into Claude/Codex runtime roots and creates a managed runtime bundle for CLI tool execution.

# Local install (repo-scoped)
uv run python scripts/install_agent_skills.py --runtime claude --scope local
uv run python scripts/install_agent_skills.py --runtime codex --scope local

# Global install (home-scoped)
uv run python scripts/install_agent_skills.py --runtime claude --scope global
uv run python scripts/install_agent_skills.py --runtime codex --scope global

Files are copied — the install is self-contained and independent of the source repo.

The installer writes:

Skills into .codex/skills/ or .claude/skills/ (local) or ~/.codex/skills/ / ~/.claude/skills/ (global)
A managed standalone runtime bundle at .codex/rd-agent/, .claude/rd-agent/, ~/.codex/rd-agent/, or ~/.claude/rd-agent/

Direct CLI catalog commands should be called from that installed runtime bundle root, not from an unrelated caller repo.

Start -> Inspect -> Continue

The operator playbook for using the pipeline.

Start

Use rd-agent first. It routes plain-language intent through persisted state:

If a paused run exists, it recommends the matching continuation skill
If preflight blockers exist, it surfaces the blocker and a repair action
If starting fresh, it recommends multi-branch exploration by default
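
The three routing rules above can be read as a small decision function. The sketch below is illustrative only and is not the repo's implementation: `PersistedState`, `recommend_next_step`, and the returned strings are hypothetical names, and checking the paused run before the blockers simply follows the order of the list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersistedState:
    """Hypothetical snapshot of the state rd-agent might consult."""
    paused_stage_skill: Optional[str] = None            # e.g. "rd-code" for a paused run
    blockers: List[str] = field(default_factory=list)   # preflight blockers, if any


def recommend_next_step(state: PersistedState) -> str:
    # Rule 1: a paused run routes to its matching continuation skill.
    if state.paused_stage_skill is not None:
        return f"continue:{state.paused_stage_skill}"
    # Rule 2: a preflight blocker is surfaced so a repair action can be taken.
    if state.blockers:
        return f"repair:{state.blockers[0]}"
    # Rule 3: a fresh start defaults to multi-branch exploration.
    return "start:multi-branch"
```

For example, `recommend_next_step(PersistedState(paused_stage_skill="rd-code"))` yields the `rd-code` continuation in this sketch.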

Round execution modes: host_parallel, host_sequential, local_sequential, blocked, unknown. rd-agent records the best verified round mode before branch dispatch instead of asking the operator to pre-pick a host path. Use local_sequential as the rollback-safe path when host-assisted execution is unavailable or unverified.
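
A minimal sketch of "record the best verified mode, fall back to local_sequential" follows. The mode names come from the paragraph above; the preference order (host_parallel, then host_sequential, then local_sequential) and the `best_verified_mode` helper are assumptions made for illustration.

```python
from enum import Enum
from typing import Iterable


class RoundMode(str, Enum):
    HOST_PARALLEL = "host_parallel"
    HOST_SEQUENTIAL = "host_sequential"
    LOCAL_SEQUENTIAL = "local_sequential"
    BLOCKED = "blocked"
    UNKNOWN = "unknown"


# Assumed preference order, most capable first.
_PREFERENCE = (
    RoundMode.HOST_PARALLEL,
    RoundMode.HOST_SEQUENTIAL,
    RoundMode.LOCAL_SEQUENTIAL,
)


def best_verified_mode(verified: Iterable[RoundMode]) -> RoundMode:
    """Return the most capable verified mode, else the rollback-safe path."""
    verified_set = set(verified)
    for mode in _PREFERENCE:
        if mode in verified_set:
            return mode
    # Nothing host-assisted was verified: use the rollback-safe path.
    return RoundMode.LOCAL_SEQUENTIAL
```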

For the public start contract, see skills/rd-agent/SKILL.md.

Inspect

Inspect current state before continuing. Use the skill contract first; drop to rd-tool-catalog only when you need a specific CLI tool.

If a round is degraded or blocked, inspect persisted round truth first, apply one recovery action, then continue through rd-agent once the truth is repaired. Simulated evidence verifies V3 contract behavior; it does not prove that a real host runtime behaved the same way. Treat signal loss as an inspect-first state: re-check host results and persist repaired truth before continuing.

To browse the catalog, run the CLI from the installed runtime bundle root:

cd ~/.codex/rd-agent
uv run rdagent-tool list
uv run rdagent-tool describe rd_run_start

Continue

Route to the stage skill matching the paused run:

| Stage | Skill | Entrypoint |
| --- | --- | --- |
| Framing | `rd-propose` | `rd_agent.entry.rd_propose.rd_propose` |
| Build | `rd-code` | `rd_agent.entry.rd_code.rd_code` |
| Verify | `rd-execute` | `rd_agent.entry.rd_execute.rd_execute` |
| Synthesize | `rd-evaluate` | `rd_agent.entry.rd_evaluate.rd_evaluate` |
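
The stage table can also be held as a lookup. The sketch below copies the skill names and dotted entrypoints from the table; `STAGE_ROUTES` and `route` are hypothetical helpers, and it assumes the last dotted component is the callable and everything before it is the module.

```python
from typing import Dict, Tuple

# Stage -> (skill, dotted entrypoint), copied from the table above.
STAGE_ROUTES: Dict[str, Tuple[str, str]] = {
    "Framing": ("rd-propose", "rd_agent.entry.rd_propose.rd_propose"),
    "Build": ("rd-code", "rd_agent.entry.rd_code.rd_code"),
    "Verify": ("rd-execute", "rd_agent.entry.rd_execute.rd_execute"),
    "Synthesize": ("rd-evaluate", "rd_agent.entry.rd_evaluate.rd_evaluate"),
}


def route(stage: str) -> Tuple[str, str, str]:
    """Return (skill, module path, callable name) for a paused stage."""
    skill, entrypoint = STAGE_ROUTES[stage]
    # Split "pkg.module.func" into "pkg.module" and "func".
    module_path, _, callable_name = entrypoint.rpartition(".")
    return skill, module_path, callable_name
```

The module path and callable name could then be handed to `importlib.import_module` and `getattr` in an environment where `rd_agent` is installed.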

Each skill package at skills/<name>/SKILL.md has the exact continuation contract and field-level details.

Default Orchestration

Skill: skills/rd-agent/SKILL.md
Entrypoint: rd_agent.entry.rd_agent.rd_agent
Purpose: start or continue the loop across single-branch and multi-branch execution

Two multi-branch contracts:

branch_hypotheses — label-only multi-branch exploration (legacy)
hypothesis_specs — structured exploration with DAG topology, parent selection, dynamic pruning, cross-branch sharing, holdout finalization, and standardized ranking. Holdout finalization is enabled when you provide holdout_evaluation_port; default split / evaluation / embedding helpers are available via rd_agent.ports.defaults
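
For readers unfamiliar with PUCT-style parent selection (PUCT is listed among the algorithms in the layout), here is a sketch of the textbook formula. It is not necessarily the variant implemented in rd_agent: `BranchStats`, `puct_score`, `select_parent`, and the `c_puct` default are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BranchStats:
    branch_id: str
    mean_value: float  # average observed score of the branch (Q)
    prior: float       # prior probability assigned to the branch (P)
    visits: int        # how many times the branch was expanded (N)


def puct_score(branch: BranchStats, total_visits: int, c_puct: float = 1.0) -> float:
    # Textbook PUCT: Q + c * P * sqrt(sum of N) / (1 + N)
    exploration = c_puct * branch.prior * math.sqrt(total_visits) / (1 + branch.visits)
    return branch.mean_value + exploration


def select_parent(branches: Sequence[BranchStats], c_puct: float = 1.0) -> BranchStats:
    """Pick the branch with the highest PUCT score as the next parent."""
    total_visits = sum(b.visits for b in branches)
    return max(branches, key=lambda b: puct_score(b, total_visits, c_puct))
```

With `c_puct=0` the selection is purely greedy on `mean_value`; larger values favor branches that have been visited less often.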

Optional embedding adapters can be injected through rd_agent(..., embedding_port=...). For example, an Ollama-backed adapter can live outside the core defaults:

from rd_agent.adapters import OllamaEmbeddingPort
from rd_agent.entry.rd_agent import rd_agent

result = rd_agent(
    ...,
    hypothesis_specs=specs,
    embedding_port=OllamaEmbeddingPort(model="embeddinggemma"),
)

Pull the embedding model in Ollama before use, for example:

ollama pull embeddinggemma

When finalization completes, the response is finalization-first: the holdout winner is the selected branch.

CLI Tool Catalog

Skill: skills/rd-tool-catalog/SKILL.md
Module: rd_agent.entry.tool_catalog
CLI: rdagent-tool

uv run rdagent-tool list                       # list all tools
uv run rdagent-tool describe rd_run_start      # inspect one tool
uv run rdagent-tool describe rd_explore_round

Tool categories: orchestration, inspection, primitives. Primitive subcategories: branch_lifecycle, branch_knowledge, branch_selection, memory.

Routing Model

rd-agent — default entry unless already inside a known stage
Stage skills — rd-propose / rd-code / rd-execute / rd-evaluate when working inside one owned stage
rd-tool-catalog — selective downshift when a skill boundary is insufficient
Narrow by category → primitive subcategory → specific tool
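
The narrowing step can be pictured as a two-level filter. In the sketch below the category and subcategory names come from this README, but the tool names, the `ToolEntry` type, and the `narrow` helper are hypothetical; the real catalog is what `rdagent-tool list` reports.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ToolEntry:
    name: str
    category: str
    subcategory: Optional[str] = None


# Hypothetical entries, for illustration only.
CATALOG = [
    ToolEntry("example_start_tool", "orchestration"),
    ToolEntry("example_inspect_tool", "inspection"),
    ToolEntry("example_fork_tool", "primitives", "branch_lifecycle"),
    ToolEntry("example_recall_tool", "primitives", "memory"),
]


def narrow(
    catalog: Sequence[ToolEntry],
    category: str,
    subcategory: Optional[str] = None,
) -> List[str]:
    """Filter by category first, then optionally by primitive subcategory."""
    return [
        tool.name
        for tool in catalog
        if tool.category == category
        and (subcategory is None or tool.subcategory == subcategory)
    ]
```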

Verification

Quick gate:

make test-quick

Full gate:

make test
make lint
uv run lint-imports

Enterprise Infrastructure

Optional production features — install with extras:

pip install "rd-agent[postgres]"       # PostgreSQL state store
pip install "rd-agent[observability]"  # structlog + Prometheus
pip install "rd-agent[api]"            # FastAPI HTTP server
pip install "rd-agent[enterprise]"     # all of the above

Observability

Structured logging: structlog with contextvars propagation (run_id, branch_id, stage_key)
Prometheus metrics: 8 collectors (runs, rounds, branches, dispatch, stages, state ops, memory ops, trace events)
Full-chain tracing: crash-safe JSONL at .state/runs/{run_id}/trace.jsonl
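
To show what contextvars propagation means for logging, here is a stdlib-only sketch. It deliberately uses `logging` and `contextvars` rather than structlog, so it is an analogy and not the package's actual setup; the variable names, `ContextFilter`, and `build_logger` are assumptions.

```python
import contextvars
import io
import logging

# One context variable per field named above.
run_id_var = contextvars.ContextVar("run_id", default="-")
branch_id_var = contextvars.ContextVar("branch_id", default="-")
stage_key_var = contextvars.ContextVar("stage_key", default="-")


class ContextFilter(logging.Filter):
    """Copy the current context values onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.run_id = run_id_var.get()
        record.branch_id = branch_id_var.get()
        record.stage_key = stage_key_var.get()
        return True


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("rd_agent_logging_sketch")
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(ContextFilter())
    handler.setFormatter(logging.Formatter(
        "run_id=%(run_id)s branch_id=%(branch_id)s "
        "stage_key=%(stage_key)s msg=%(message)s"
    ))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


# Usage: set the context once, then log without passing ids around.
buffer = io.StringIO()
log = build_logger(buffer)
run_id_var.set("run-1")
branch_id_var.set("br-1")
stage_key_var.set("build")
log.info("stage started")
```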

Execution Tracing

Trace events cover three layers:

| Layer | Events | Source |
| --- | --- | --- |
| Graph Engine | `round_start/end`, `branch_spawn/merge/prune`, `convergence_eval` | MultiBranchService |
| Stage Execution | `stage_start/complete/decision`, `error` | SkillLoopService |
| Agent Reasoning | `agent_response`, `dispatch_result` | Dispatch adapters, ReceiptCollectionService |

Query traces:

cat .state/runs/*/trace.jsonl | jq .kind | sort | uniq -c
cat .state/runs/{run_id}/trace.jsonl | jq 'select(.branch_id=="br-1")'
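
The same two queries can be run from Python when jq is not available. This sketch relies only on the `kind` and `branch_id` fields used in the jq commands above; the helper names are hypothetical, and skipping an unparseable line (for example a final line truncated by a crash) is an assumption about how a reader might want to treat a partially written trace.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def iter_events(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per well-formed JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumed policy: ignore a truncated or corrupt line
        if isinstance(event, dict):
            yield event


def count_kinds(lines: Iterable[str]) -> Counter:
    # Equivalent in spirit to: jq .kind | sort | uniq -c
    return Counter(event.get("kind") for event in iter_events(lines))


def events_for_branch(lines: Iterable[str], branch_id: str) -> List[Dict]:
    # Equivalent in spirit to: jq 'select(.branch_id=="br-1")'
    return [e for e in iter_events(lines) if e.get("branch_id") == branch_id]
```

Both helpers accept any iterable of lines, so an open file handle on a `trace.jsonl` file can be passed directly.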

Configure: configure_tracing(enabled=True, root=".state"). All entry points configure tracing automatically.

Agent-level hooks: the AgentTraceHook protocol for Python-side capture, and hooks/post_tool_trace.sh for Claude Code (CC) PostToolUse capture.

See dev_doc/TRACING_ARCHITECTURE.md for full design.

Crash Recovery

Wave dispatch uses checkpoint-and-resume:

WaveDispatchCheckpoint persisted after each wave
DispatchRecoveryAssessment determines resume/restart/skip action
Resource gate (SlotBasedResourceGate) bounds concurrent branch execution
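
The checkpoint-and-resume idea can be sketched with an atomically replaced JSON file. This is a generic illustration and not the behavior of `WaveDispatchCheckpoint` or `DispatchRecoveryAssessment`: the file layout, the `save_checkpoint` and `assess_recovery` helpers, and the rules for choosing resume, restart, or skip are all assumptions.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Tuple


def save_checkpoint(path: Path, completed_waves: int, total_waves: int) -> None:
    """Write the checkpoint to a temp file, then atomically swap it in."""
    payload = {"completed_waves": completed_waves, "total_waves": total_waves}
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace is atomic, so a crash never leaves a half-written checkpoint.
    os.replace(tmp_name, path)


def assess_recovery(path: Path) -> Tuple[str, int]:
    """Return (action, next wave index) using assumed decision rules."""
    if not path.exists():
        return ("restart", 0)
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        done, total = data["completed_waves"], data["total_waves"]
    except (json.JSONDecodeError, KeyError):
        return ("restart", 0)   # unreadable checkpoint: start over
    if done >= total:
        return ("skip", total)  # every wave already finished
    return ("resume", done)     # continue with the first unfinished wave
```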

See dev_doc/RESOURCE_CONCURRENCY_ARCHITECTURE.md for details.
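
As a rough picture of slot-based gating, the sketch below bounds concurrent branch work with a semaphore. `SlotGate` and `run_branches` are hypothetical stand-ins and do not reflect the actual `SlotBasedResourceGate` interface.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager
from typing import Callable, List, Sequence


class SlotGate:
    """Allow at most `slots` branches to run at the same time."""

    def __init__(self, slots: int) -> None:
        self._semaphore = threading.BoundedSemaphore(slots)
        self._lock = threading.Lock()
        self.active = 0  # branches currently holding a slot
        self.peak = 0    # highest concurrency observed

    @contextmanager
    def slot(self):
        self._semaphore.acquire()
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        try:
            yield
        finally:
            with self._lock:
                self.active -= 1
            self._semaphore.release()


def run_branches(
    gate: SlotGate,
    branch_ids: Sequence[str],
    work: Callable[[str], str],
) -> List[str]:
    def guarded(branch_id: str) -> str:
        with gate.slot():  # blocks until a slot is free
            return work(branch_id)

    # More worker threads than slots, so the gate is the real limit.
    with ThreadPoolExecutor(max_workers=max(1, len(branch_ids))) as pool:
        return list(pool.map(guarded, branch_ids))
```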

HTTP API

uvicorn rd_agent.api.app:create_app --factory

Routes: POST /api/v1/runs, GET /api/v1/runs/{run_id}, GET /health
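
A stdlib client for these routes might look like the sketch below. It assumes the server is reachable at uvicorn's default address, 127.0.0.1:8000. The request body schema for POST /api/v1/runs is not documented here, so the payload is left entirely to the caller; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    return urllib.request.Request(f"{base_url}/health", method="GET")


def get_run_request(run_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # Percent-encode the id so it is safe as a single path segment.
    safe_id = urllib.parse.quote(run_id, safe="")
    return urllib.request.Request(f"{base_url}/api/v1/runs/{safe_id}", method="GET")


def create_run_request(
    payload: Dict[str, Any],
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    # The payload schema is an unknown here; pass whatever the API expects.
    return urllib.request.Request(
        f"{base_url}/api/v1/runs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Each helper only builds a request; pass the result to `urllib.request.urlopen` once the server is running.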

Eval Suite

Acceptance tests under tests/eval/:

| Suite | Purpose |
| --- | --- |
| AC-1 (`test_ac1_*`) | Single-round, multi-round, finalization correctness |
| AC-2 (`test_ac2_*`) | PUCT, pruning, holdout, decay, convergence, fault injection |
| Kaggle E2E | LLM agent on Titanic competition with trajectory scoring |

python -m pytest tests/eval/ -v              # all eval tests
python -m tests.eval.kaggle.run_llm_e2e      # LLM E2E (requires Claude Code)

See dev_doc/EVAL_SPECIFICATION.md for design.

Layout

my-RDagent/
  pyproject.toml
  .importlinter
  Makefile
  scripts/
    setup_env.sh
    install_agent_skills.py
    bump_version.py
  hooks/
    post_tool_trace.sh              # CC PostToolUse trace hook
  skills/
    _shared/references/             # cross-skill shared context
    rd-agent/                       # orchestration skill
      SKILL.md
      workflows/
      references/
    rd-propose/                     # framing stage
    rd-code/                        # build stage
    rd-execute/                     # verify stage
    rd-evaluate/                    # synthesize stage
    rd-tool-catalog/                # CLI tool inspection
  rd_agent/
    adapters/                       # concrete implementations (filesystem, postgres)
    algorithms/                     # pure math: decay, PUCT, pruning, holdout, merge
    api/                            # FastAPI HTTP routes and middleware
    compat/legacy/                  # legacy translation seam (isolated)
    contracts/                      # pydantic data contracts (18 models)
    devtools/                       # skill installer
    entry/                          # public entrypoints and CLI
    observability/                  # logging, metrics, tracing
    orchestration/                  # service layer (40+ modules)
    ports/                          # abstract ports (9 interfaces)
    tools/                          # CLI tool implementations
  tests/
    eval/                           # AC-1/AC-2 acceptance + Kaggle E2E
  dev_doc/                          # architecture documents
  docs/CODEMAPS/                    # token-lean architecture maps
