
hapesc/my-RDagent


my-RDagent

Agent-first research and development loop: propose, code, execute, evaluate, repeat.

What This Repo Ships

  • Skills under skills/ — high-level orchestration (rd-agent) and stage-specific skills (rd-propose, rd-code, rd-execute, rd-evaluate)
  • CLI tool catalog via rdagent-tool — direct inspection and primitive operations for when a skill boundary is insufficient
  • Python package rd_agent — contracts, orchestration services, ports, and algorithms backing the skill and CLI surfaces
  • Enterprise infrastructure — PostgreSQL state store, Prometheus metrics, structured logging, HTTP API, resource-gated concurrency
  • Full-chain execution tracing — crash-safe JSONL trace from agent reasoning through graph engine decisions
  • Eval suite — acceptance tests (AC-1/AC-2), property-based tests, Kaggle LLM E2E harness
  • Tests — regression suites that lock the public surface and contracts

The public surface is transport-free: skills first, CLI tools second, no server abstraction.

Repository Setup

uv sync --extra test

One-command setup with skill installation and verification:

bash scripts/setup_env.sh                           # Claude, local, quick
bash scripts/setup_env.sh --all --scope-all --full-verify  # everything

Agent Skill Setup

The installer copies canonical skills/ packages into Claude/Codex runtime roots and creates a managed runtime bundle for CLI tool execution.

# Local install (repo-scoped)
uv run python scripts/install_agent_skills.py --runtime claude --scope local
uv run python scripts/install_agent_skills.py --runtime codex --scope local

# Global install (home-scoped)
uv run python scripts/install_agent_skills.py --runtime claude --scope global
uv run python scripts/install_agent_skills.py --runtime codex --scope global

Files are copied — the install is self-contained and independent of the source repo.

The installer writes:

  • Skills into .codex/skills/ or .claude/skills/ (local) or ~/.codex/skills/ / ~/.claude/skills/ (global)
  • A managed standalone runtime bundle at .codex/rd-agent/, .claude/rd-agent/, ~/.codex/rd-agent/, or ~/.claude/rd-agent/
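The install targets above follow a simple runtime-by-scope pattern. As a minimal sketch, assuming only the paths listed here, the hypothetical helper install_paths below computes both directories. It is not part of scripts/install_agent_skills.py, which remains the source of truth.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical helper that mirrors the documented install layout;
# it is NOT part of scripts/install_agent_skills.py.
RUNTIME_DIRS = {"claude": ".claude", "codex": ".codex"}


def install_paths(
    runtime: str, scope: str, repo_root: Path, home: Optional[Path] = None
) -> Tuple[Path, Path]:
    """Return (skills_dir, runtime_bundle_dir) for a runtime/scope pair."""
    if runtime not in RUNTIME_DIRS:
        raise ValueError(f"unknown runtime: {runtime!r}")
    if scope == "local":
        base = repo_root  # repo-scoped install
    elif scope == "global":
        base = home if home is not None else Path.home()  # home-scoped install
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    root = base / RUNTIME_DIRS[runtime]
    return root / "skills", root / "rd-agent"
```

The second path is the bundle root that direct CLI catalog commands should run from.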

Direct CLI catalog commands should be called from that installed runtime bundle root, not from an unrelated caller repo.

Start -> Inspect -> Continue

The operator playbook for using the pipeline.

Start

Use rd-agent first. It routes plain-language intent through persisted state:

  • If a paused run exists, it recommends the matching continuation skill
  • If preflight blockers exist, it surfaces the blocker and a repair action
  • If starting fresh, it recommends multi-branch exploration by default

Round execution modes: host_parallel, host_sequential, local_sequential, blocked, unknown. rd-agent records the best verified round mode before branch dispatch instead of asking the operator to pre-pick a host path. Use local_sequential as the rollback-safe path when host-assisted execution is unavailable or unverified.
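The mode selection described above can be pictured as a small preference function. This is a sketch under assumptions: pick_round_mode, the host_parallel-over-host_sequential ordering, and the preflight_blocked flag are illustrative, and the unknown mode is not modeled. rd-agent's actual rules are in skills/rd-agent/SKILL.md.

```python
from typing import AbstractSet

# Hypothetical preference order for host-assisted modes, best first.
HOST_MODES = ("host_parallel", "host_sequential")


def pick_round_mode(verified_modes: AbstractSet[str], preflight_blocked: bool = False) -> str:
    """Return the round mode to record before branch dispatch."""
    if preflight_blocked:
        return "blocked"  # surface the blocker instead of dispatching
    for mode in HOST_MODES:
        if mode in verified_modes:
            return mode  # best host-assisted mode that was actually verified
    # Host-assisted execution is unavailable or unverified: use the rollback-safe path.
    return "local_sequential"
```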

For the public start contract, see skills/rd-agent/SKILL.md.

Inspect

Inspect current state before continuing. If a round is degraded or blocked, inspect persisted round truth first, apply one recovery action, then continue through rd-agent once truth is repaired. Simulated evidence verifies V3 contract behavior; it does not prove that a real host runtime behaved the same way. Treat signal loss as an inspect-first state: re-check host results and persist the repaired truth before continuing.

Use the skill contract first; drop to rd-tool-catalog only when you need a specific CLI tool, run from an installed runtime bundle root (here, the global Codex bundle):

cd ~/.codex/rd-agent
uv run rdagent-tool list
uv run rdagent-tool describe rd_run_start

Continue

Route to the stage skill matching the paused run:

| Stage | Skill | Entrypoint |
|---|---|---|
| Framing | rd-propose | rd_agent.entry.rd_propose.rd_propose |
| Build | rd-code | rd_agent.entry.rd_code.rd_code |
| Verify | rd-execute | rd_agent.entry.rd_execute.rd_execute |
| Synthesize | rd-evaluate | rd_agent.entry.rd_evaluate.rd_evaluate |
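Each entrypoint is a dotted module.function path. A sketch of resolving one with importlib, assuming the paths in the table are importable callables; STAGE_ENTRYPOINTS, split_entrypoint, and load_entrypoint are hypothetical names, and the stage keys are simply the lowercased table labels.

```python
import importlib
from typing import Callable, Tuple

# Stage table from above; the helpers below are hypothetical.
STAGE_ENTRYPOINTS = {
    "framing": "rd_agent.entry.rd_propose.rd_propose",
    "build": "rd_agent.entry.rd_code.rd_code",
    "verify": "rd_agent.entry.rd_execute.rd_execute",
    "synthesize": "rd_agent.entry.rd_evaluate.rd_evaluate",
}


def split_entrypoint(dotted: str) -> Tuple[str, str]:
    """Split 'pkg.module.func' into ('pkg.module', 'func')."""
    module, _, attr = dotted.rpartition(".")
    if not module or not attr:
        raise ValueError(f"not a module.attribute path: {dotted!r}")
    return module, attr


def load_entrypoint(dotted: str) -> Callable:
    """Import the module and return the named callable."""
    module, attr = split_entrypoint(dotted)
    return getattr(importlib.import_module(module), attr)


# Usage (requires rd_agent to be installed):
# rd_execute = load_entrypoint(STAGE_ENTRYPOINTS["verify"])
```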

Each skill package at skills/<name>/SKILL.md has the exact continuation contract and field-level details.

Default Orchestration

  • Skill: skills/rd-agent/SKILL.md
  • Entrypoint: rd_agent.entry.rd_agent.rd_agent
  • Purpose: start or continue the loop across single-branch and multi-branch execution

Two multi-branch contracts:

  • branch_hypotheses — label-only multi-branch exploration (legacy)
  • hypothesis_specs — structured exploration with DAG topology, parent selection, dynamic pruning, cross-branch sharing, holdout finalization, and standardized ranking. Holdout finalization is enabled when you provide holdout_evaluation_port; default split / evaluation / embedding helpers are available via rd_agent.ports.defaults
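Parent selection over hypothesis_specs is one place PUCT (listed under rd_agent/algorithms) can apply. The sketch below uses the textbook PUCT rule popularized by AlphaZero, score = Q + c_puct · P · √N_total / (1 + N), where Q is a branch's mean score, P its prior, N its visit count, and N_total the sum of visits over all candidates. Whether rd_agent uses exactly this form, these field names, or this c_puct default is an assumption; Candidate and select_parent are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    branch_id: str
    value_sum: float  # sum of observed scores for this branch
    visits: int       # how many times the branch has been expanded
    prior: float      # prior probability, e.g. from the proposer

    @property
    def q(self) -> float:
        """Mean observed score; 0.0 for an unvisited branch."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(c: Candidate, total_visits: int, c_puct: float = 1.0) -> float:
    exploration = c_puct * c.prior * math.sqrt(total_visits) / (1 + c.visits)
    return c.q + exploration


def select_parent(candidates: Sequence[Candidate], c_puct: float = 1.0) -> Candidate:
    """Pick the candidate with the highest PUCT score."""
    if not candidates:
        raise ValueError("no candidates to select from")
    total = sum(c.visits for c in candidates)
    return max(candidates, key=lambda c: puct_score(c, total, c_puct))
```

Lowering c_puct shifts the choice away from under-explored branches with high priors and toward the branch with the best observed mean, which is the exploration/exploitation trade-off the rule encodes.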

Optional embedding adapters can be injected through rd_agent(..., embedding_port=...). For example, an Ollama-backed adapter can live outside the core defaults:

from rd_agent.adapters import OllamaEmbeddingPort
from rd_agent.entry.rd_agent import rd_agent

result = rd_agent(
    ...,
    hypothesis_specs=specs,
    embedding_port=OllamaEmbeddingPort(model="embeddinggemma"),
)

Pull the embedding model in Ollama before use, for example:

ollama pull embeddinggemma

When holdout finalization completes, the response leads with the finalization result: the holdout winner becomes the selected branch.
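Finalization-first selection can be sketched as: score every finalist on the holdout split and let that score alone pick the winner. This is a minimal sketch; finalize_on_holdout is hypothetical, evaluate stands in for whatever holdout_evaluation_port actually exposes (see rd_agent.ports), higher scores are assumed better, and ties are broken by branch id only to keep the example deterministic.

```python
from typing import Callable, Dict, Mapping, Tuple, TypeVar

A = TypeVar("A")  # whatever artifact a finalist branch produced


def finalize_on_holdout(
    candidates: Mapping[str, A], evaluate: Callable[[A], float]
) -> Tuple[str, Dict[str, float]]:
    """Score each finalist on the holdout split and return (winner, scores).

    `evaluate` stands in for a holdout evaluation port; higher is better.
    """
    if not candidates:
        raise ValueError("no finalists to evaluate")
    scores = {branch_id: evaluate(artifact) for branch_id, artifact in candidates.items()}
    # Iterate in sorted order so ties resolve to the smallest branch id.
    winner = max(sorted(scores), key=lambda branch_id: scores[branch_id])
    return winner, scores
```

The returned winner is what a finalization-first response would report as the selected branch, regardless of which branch led before finalization.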

CLI Tool Catalog

  • Skill: skills/rd-tool-catalog/SKILL.md
  • Module: rd_agent.entry.tool_catalog
  • CLI: rdagent-tool

uv run rdagent-tool list                       # list all tools
uv run rdagent-tool describe rd_run_start      # inspect one tool
uv run rdagent-tool describe rd_explore_round

Tool categories: orchestration, inspection, primitives. Primitive subcategories: branch_lifecycle, branch_knowledge, branch_selection, memory.

Routing Model

  1. rd-agent — default entry unless already inside a known stage
  2. Stage skills: rd-propose / rd-code / rd-execute / rd-evaluate, when working inside one owned stage
  3. rd-tool-catalog — selective downshift when a skill boundary is insufficient
  4. Within rd-tool-catalog, narrow by category → primitive subcategory → specific tool
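The routing order reads as a small decision function. A sketch under assumptions: route, its two inputs, and the stage keys are hypothetical; in the real flow the current stage comes from persisted run state (see Start above) rather than from an argument.

```python
from typing import Optional

# Stage names from the Continue table; route() is a hypothetical helper.
STAGE_SKILLS = {
    "framing": "rd-propose",
    "build": "rd-code",
    "verify": "rd-execute",
    "synthesize": "rd-evaluate",
}


def route(current_stage: Optional[str] = None, skill_boundary_sufficient: bool = True) -> str:
    """Pick the surface to use, following the routing order above."""
    if not skill_boundary_sufficient:
        return "rd-tool-catalog"  # selective downshift to CLI tools
    if current_stage in STAGE_SKILLS:
        return STAGE_SKILLS[current_stage]  # already inside a known stage
    return "rd-agent"  # default entry
```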

Verification

Quick gate:

make test-quick

Full gate:

make test
make lint
uv run lint-imports

Enterprise Infrastructure

Optional production features, installed as extras. Quote the argument so shells such as zsh do not treat the brackets as a glob pattern:

pip install "rd-agent[postgres]"       # PostgreSQL state store
pip install "rd-agent[observability]"  # structlog + Prometheus
pip install "rd-agent[api]"            # FastAPI HTTP server
pip install "rd-agent[enterprise]"     # all of the above

Observability

  • Structured logging: structlog with contextvars propagation (run_id, branch_id, stage_key)
  • Prometheus metrics: 8 collectors (runs, rounds, branches, dispatch, stages, state ops, memory ops, trace events)
  • Full-chain tracing: crash-safe JSONL at .state/runs/{run_id}/trace.jsonl
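Context propagation means run_id, branch_id, and stage_key are bound once and then appear on every log line without being passed explicitly. structlog offers this through structlog.contextvars.bind_contextvars and the merge_contextvars processor; the sketch below shows the same idea with only the standard library's contextvars and logging modules. Its names (RUN_ID, ContextFilter, run_stage) are hypothetical and not part of rd_agent.observability.

```python
import contextvars
import logging

# One context variable per field listed above (stdlib stand-in for structlog contextvars).
RUN_ID = contextvars.ContextVar("run_id", default=None)
BRANCH_ID = contextvars.ContextVar("branch_id", default=None)
STAGE_KEY = contextvars.ContextVar("stage_key", default=None)


class ContextFilter(logging.Filter):
    """Copy the current context values onto every record that passes through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.run_id = RUN_ID.get()
        record.branch_id = BRANCH_ID.get()
        record.stage_key = STAGE_KEY.get()
        return True


def configure_logger(name: str = "rd_agent_demo") -> logging.Logger:
    handler = logging.StreamHandler()
    handler.addFilter(ContextFilter())
    handler.setFormatter(
        logging.Formatter("run=%(run_id)s branch=%(branch_id)s stage=%(stage_key)s %(message)s")
    )
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def run_stage(logger: logging.Logger, run_id: str, branch_id: str, stage_key: str) -> None:
    tokens = [
        (RUN_ID, RUN_ID.set(run_id)),
        (BRANCH_ID, BRANCH_ID.set(branch_id)),
        (STAGE_KEY, STAGE_KEY.set(stage_key)),
    ]
    try:
        logger.info("stage started")  # no ids passed explicitly; the filter adds them
    finally:
        for var, token in tokens:
            var.reset(token)  # restore the previous context
```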

Execution Tracing

Trace events cover three layers:

| Layer | Events | Source |
|---|---|---|
| Graph Engine | round_start/end, branch_spawn/merge/prune, convergence_eval | MultiBranchService |
| Stage Execution | stage_start/complete/decision, error | SkillLoopService |
| Agent Reasoning | agent_response, dispatch_result | Dispatch adapters, ReceiptCollectionService |

Query traces:

cat .state/runs/*/trace.jsonl | jq .kind | sort | uniq -c
cat .state/runs/{run_id}/trace.jsonl | jq 'select(.branch_id=="br-1")'

Configure tracing with configure_tracing(enabled=True, root=".state"); all entry points do this automatically.
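The trace file is append-only JSON Lines. A minimal sketch of one way to make such a writer crash-tolerant: write each event as a single line, flush, and fsync, and have readers skip any line that fails to parse. append_trace_event, parse_kinds, and count_kinds are hypothetical; the kind and branch_id field names come from the jq queries above, and rd_agent's real writer may differ. count_kinds is the Python counterpart of the first jq query.

```python
import json
import os
import tempfile
from collections import Counter
from pathlib import Path
from typing import Iterable


def append_trace_event(root: Path, run_id: str, event: dict) -> Path:
    """Append one event as a single JSON line and push it to stable storage."""
    path = root / "runs" / run_id / "trace.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    line = json.dumps(event, separators=(",", ":"), sort_keys=True) + "\n"
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())  # do not return until the line is on disk
    return path


def parse_kinds(lines: Iterable[str]) -> Counter:
    """Count events by their "kind" field, skipping blank or torn lines."""
    counts: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # e.g. a partial line left by an interrupted write
        if isinstance(event, dict):
            counts[event.get("kind")] += 1
    return counts


def count_kinds(path: Path) -> Counter:
    with open(path, encoding="utf-8") as fh:
        return parse_kinds(fh)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        trace = append_trace_event(Path(tmp), "run-demo", {"kind": "round_start", "branch_id": "br-1"})
        append_trace_event(Path(tmp), "run-demo", {"kind": "round_end", "branch_id": "br-1"})
        print(count_kinds(trace))
```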

Agent-level hooks: the AgentTraceHook protocol for Python-side capture, and hooks/post_tool_trace.sh for Claude Code PostToolUse capture.

See dev_doc/TRACING_ARCHITECTURE.md for full design.

Crash Recovery

Wave dispatch uses checkpoint-and-resume:

  • WaveDispatchCheckpoint persisted after each wave
  • DispatchRecoveryAssessment determines resume/restart/skip action
  • Resource gate (SlotBasedResourceGate) bounds concurrent branch execution
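Checkpoint-and-resume can be sketched as: record which waves finished after each one completes, write that record atomically, and skip finished waves on restart. WaveCheckpoint, run_waves, and the JSON checkpoint file are hypothetical stand-ins, not rd_agent's WaveDispatchCheckpoint format.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable, List, Sequence


@dataclass
class WaveCheckpoint:
    """Hypothetical stand-in for WaveDispatchCheckpoint: indices of finished waves."""

    completed_waves: List[int] = field(default_factory=list)


def save_checkpoint(path: Path, ckpt: WaveCheckpoint) -> None:
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(asdict(ckpt)), encoding="utf-8")
    os.replace(tmp, path)  # atomic rename on POSIX: readers see the old or new file, never half


def load_checkpoint(path: Path) -> WaveCheckpoint:
    if not path.exists():
        return WaveCheckpoint()  # fresh start
    return WaveCheckpoint(**json.loads(path.read_text(encoding="utf-8")))


def run_waves(
    waves: Sequence[Sequence[str]], dispatch: Callable[[str], None], ckpt_path: Path
) -> List[int]:
    """Dispatch each wave of branch ids, checkpointing after every completed wave."""
    ckpt = load_checkpoint(ckpt_path)
    ran = []
    for index, wave in enumerate(waves):
        if index in ckpt.completed_waves:
            continue  # resume: this wave finished before the interruption
        for branch_id in wave:
            dispatch(branch_id)
        ckpt.completed_waves.append(index)
        save_checkpoint(ckpt_path, ckpt)
        ran.append(index)
    return ran


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "wave_checkpoint.json"
        waves = [["br-1", "br-2"], ["br-3"]]

        def crash_on_br3(branch_id: str) -> None:
            if branch_id == "br-3":
                raise RuntimeError("simulated crash")

        try:
            run_waves(waves, crash_on_br3, path)
        except RuntimeError:
            pass
        print(run_waves(waves, lambda branch_id: None, path))  # prints [1]
```

A wave interrupted partway is dispatched again in full on resume, so per-branch dispatch should be idempotent. Choosing between resume, restart, and skip, which DispatchRecoveryAssessment handles, is not modeled here.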

See dev_doc/RESOURCE_CONCURRENCY_ARCHITECTURE.md for details.
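A slot-based gate bounds how many branches run at once regardless of how many workers are available. A minimal sketch using threading.BoundedSemaphore; SlotGate and run_branches are hypothetical and only illustrate the idea behind SlotBasedResourceGate, whose actual interface is described in dev_doc/RESOURCE_CONCURRENCY_ARCHITECTURE.md.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


class SlotGate:
    """Hypothetical stand-in for SlotBasedResourceGate: at most `slots` holders at once."""

    def __init__(self, slots: int) -> None:
        if slots < 1:
            raise ValueError("slots must be >= 1")
        self._slots = threading.BoundedSemaphore(slots)
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0  # highest concurrency observed, for inspection

    def __enter__(self) -> "SlotGate":
        self._slots.acquire()  # blocks until a slot is free
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc_info) -> None:
        with self._lock:
            self.active -= 1
        self._slots.release()


def run_branches(branch_ids: Sequence[str], slots: int, workers: int = 8) -> Tuple[List[str], int]:
    """Run branches on a wide thread pool while the gate caps real concurrency."""
    gate = SlotGate(slots)

    def run_one(branch_id: str) -> str:
        with gate:
            time.sleep(0.01)  # placeholder for actual branch work
            return f"{branch_id}:done"

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_one, branch_ids))
    return results, gate.peak
```

The pool deliberately has more workers than slots: the pool decides which branches are ready, and the gate decides how many may run.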

HTTP API

uvicorn rd_agent.api.app:create_app --factory

Routes: POST /api/v1/runs, GET /api/v1/runs/{run_id}, GET /health
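A standard-library client sketch for the three routes. Assumptions: the server runs on uvicorn's default 127.0.0.1:8000, responses are JSON, and the POST body fields are placeholders; the real request and response schemas live with the routes under rd_agent/api/. build_request, run_path, and send are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

# uvicorn serves on 127.0.0.1:8000 unless --host/--port say otherwise.
BASE_URL = "http://127.0.0.1:8000"


def build_request(
    path: str, payload: Optional[dict] = None, base_url: str = BASE_URL
) -> urllib.request.Request:
    """GET when there is no payload, POST a JSON body otherwise."""
    if payload is None:
        return urllib.request.Request(base_url + path, method="GET")
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_path(run_id: str) -> str:
    """Path for GET /api/v1/runs/{run_id}, with the id URL-escaped."""
    return "/api/v1/runs/" + urllib.parse.quote(run_id, safe="")


def send(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and decode a JSON response body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage against a running server; the payload field is hypothetical:
# send(build_request("/health"))
# created = send(build_request("/api/v1/runs", {"goal": "..."}))
```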

Eval Suite

Acceptance tests under tests/eval/:

| Suite | Purpose |
|---|---|
| AC-1 (test_ac1_*) | Single-round, multi-round, finalization correctness |
| AC-2 (test_ac2_*) | PUCT, pruning, holdout, decay, convergence, fault injection |
| Kaggle E2E | LLM agent on Titanic competition with trajectory scoring |

python -m pytest tests/eval/ -v              # all eval tests
python -m tests.eval.kaggle.run_llm_e2e      # LLM E2E (requires Claude Code)

See dev_doc/EVAL_SPECIFICATION.md for design.

Layout

my-RDagent/
  pyproject.toml
  .importlinter
  Makefile
  scripts/
    setup_env.sh
    install_agent_skills.py
    bump_version.py
  hooks/
    post_tool_trace.sh              # Claude Code PostToolUse trace hook
  skills/
    _shared/references/             # cross-skill shared context
    rd-agent/                       # orchestration skill
      SKILL.md
      workflows/
      references/
    rd-propose/                     # framing stage
    rd-code/                        # build stage
    rd-execute/                     # verify stage
    rd-evaluate/                    # synthesize stage
    rd-tool-catalog/                # CLI tool inspection
  rd_agent/
    adapters/                       # concrete implementations (filesystem, postgres)
    algorithms/                     # pure math: decay, PUCT, pruning, holdout, merge
    api/                            # FastAPI HTTP routes and middleware
    compat/legacy/                  # legacy translation seam (isolated)
    contracts/                      # pydantic data contracts (18 models)
    devtools/                       # skill installer
    entry/                          # public entrypoints and CLI
    observability/                  # logging, metrics, tracing
    orchestration/                  # service layer (40+ modules)
    ports/                          # abstract ports (9 interfaces)
    tools/                          # CLI tool implementations
  tests/
    eval/                           # AC-1/AC-2 acceptance + Kaggle E2E
  dev_doc/                          # architecture documents
  docs/CODEMAPS/                    # token-lean architecture maps
