[1/4] RFC 005: Agentic Harness Integration #387

Merged
Darktex merged 3 commits into main from feature/issue-385-agentic-harnesses on Feb 18, 2026

Conversation

@Darktex Darktex commented Feb 17, 2026

Summary

  • Adds RFC 005 defining how OpenEnv integrates with external agentic harnesses (OpenClaw, Claude Code, Gemini CLI, Goose, etc.)
  • Updates RFC README to include RFC 004 and 005

Key Design Decisions

  1. Wrapping pattern: OpenEnv container provides filesystem, harness runs inside as a subprocess
  2. MCP tool injection: Environment tools plugged into harness config before session start
  3. Production mode: OpenEnv proxies to harness, handles session management only
  4. Simulation mode: Training loop controls episode boundaries; step() triggers entire harness ReAct loop
  5. OpenClaw as first harness: Concrete adapter to validate the abstraction
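
Design decision 2 (MCP tool injection) can be illustrated with a minimal sketch. It assumes the harness configuration is a plain dictionary with an `mcp_servers` mapping; that key, the `inject_mcp_tools` function, and the `openenv_` prefix are hypothetical names chosen for illustration and are not taken from the RFC or from any harness's real configuration format.

```python
from copy import deepcopy


def inject_mcp_tools(harness_config: dict, env_servers: dict, prefix: str = "openenv_") -> dict:
    """Return a copy of harness_config with the environment's MCP servers added.

    On a name collision the environment's server is registered under a
    prefixed name, so the harness's own entry is left untouched.
    """
    merged = deepcopy(harness_config)
    servers = merged.setdefault("mcp_servers", {})  # hypothetical config key
    for name, spec in env_servers.items():
        key = name
        while key in servers:  # keep prefixing until the name is free
            key = prefix + key
        servers[key] = deepcopy(spec)
    return merged


# Hypothetical harness config and environment servers, for illustration only.
harness_config = {"model": "example-model", "mcp_servers": {"files": {"command": "harness-fs"}}}
env_servers = {
    "files": {"command": "env-fs-server"},
    "task_tools": {"command": "env-task-server"},
}

# Injection happens once, before the harness session starts.
merged = inject_mcp_tools(harness_config, env_servers)
print(sorted(merged["mcp_servers"]))  # ['files', 'openenv_files', 'task_tools']
```

Returning a copy rather than mutating the input keeps the original harness configuration reusable across episodes; prefixing on collision is only one possible conflict policy.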

New Abstractions

  • HarnessConfig - Pydantic model for harness configuration
  • HarnessAdapter - ABC for harness-specific lifecycle management
  • HarnessEnvironment - MCPEnvironment subclass wrapping a harness
  • HarnessTransport - Enum for communication transport (stdio, HTTP, MCP)
  • Tool conflict resolution utilities
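
As a rough sketch of how these abstractions might fit together: the class names below come from the list above, but every field and method (`command`, `start`, `send_message`, `stop`) is an assumption made for illustration, and a dataclass stands in for the Pydantic model so the example has no third-party dependencies.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum


class HarnessTransport(Enum):
    STDIO = "stdio"
    HTTP = "http"
    MCP = "mcp"


@dataclass
class HarnessConfig:
    # Dataclass stand-in for the Pydantic model; all fields are hypothetical.
    name: str
    command: list[str]
    transport: HarnessTransport = HarnessTransport.STDIO
    env: dict[str, str] = field(default_factory=dict)


class HarnessAdapter(ABC):
    """Harness-specific lifecycle management (method names are assumptions)."""

    def __init__(self, config: HarnessConfig) -> None:
        self.config = config

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def send_message(self, message: str) -> str: ...

    @abstractmethod
    def stop(self) -> None: ...


class EchoAdapter(HarnessAdapter):
    """Toy adapter that echoes messages instead of launching a subprocess."""

    def __init__(self, config: HarnessConfig) -> None:
        super().__init__(config)
        self.running = False

    def start(self) -> None:
        self.running = True

    def send_message(self, message: str) -> str:
        if not self.running:
            raise RuntimeError("harness not started")
        return f"[{self.config.name}] {message}"

    def stop(self) -> None:
        self.running = False


adapter = EchoAdapter(HarnessConfig(name="toy", command=["toy-harness"]))
adapter.start()
reply = adapter.send_message("hello")
adapter.stop()
print(reply)  # [toy] hello
```

A concrete adapter such as one for OpenClaw would replace the echo logic with real process and transport handling; that part is deliberately left out here.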

Open Questions (for review)

  1. Trajectory format standardization
  2. Intermediate observability (streaming harness steps)
  3. Multi-step episodes vs single-step sessions
  4. Harness-level reward shaping
  5. Resource limits (max LLM calls, tokens, tool invocations)

Test plan

  • RFC review and approval by maintainers
  • Validate design against OpenClaw's actual configuration format
  • Verify backward compatibility claim (no changes to existing types)
  • Subsequent PRs will add implementation with TDD

Closes #385

@meta-cla (bot) added the CLA Signed label Feb 17, 2026

greptile-apps Bot commented Feb 17, 2026

Greptile Summary

This PR introduces RFC 005 (Agentic Harness Integration) and implements foundational components from RFC 004 (Rubrics) to support LLM-as-a-judge evaluation.

RFC 005 - Agentic Harness Integration:

  • Proposes a wrapping pattern where OpenEnv containers provide filesystem/sandbox while harnesses (OpenClaw, Claude Code, etc.) run inside
  • Defines HarnessConfig, HarnessAdapter, and HarnessEnvironment abstractions for integrating external agentic harnesses
  • Supports two modes: production (proxy to harness) and simulation (training loop controls episodes)
  • Key design: single step() call runs entire harness ReAct session (multiple LLM calls)
  • MCP tool injection mechanism to plug environment tools into harness config

RFC 004 Implementation (LLMClient + LLMJudge):

  • New LLMClient ABC with OpenAIClient for OpenAI-compatible endpoints (vLLM, TGI, Ollama, etc.)
  • LLMJudge rubric that uses LLM endpoints to evaluate actions/observations
  • Async-first design with forward() method
  • Chess environment migrated to use ChessWinLossRubric (trajectory-based rubric with exponential discounting)
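
One common reading of "trajectory-based rubric with exponential discounting" is that a terminal outcome (win or loss) is credited back to earlier steps with a factor that shrinks geometrically with distance from the end. The sketch below shows that idea only; the function name, signature, and default `gamma` are hypothetical, and the actual `ChessWinLossRubric` may compute rewards differently.

```python
def discounted_step_rewards(num_steps: int, final_reward: float, gamma: float = 0.99) -> list[float]:
    """Spread a terminal reward over a trajectory.

    Step t (0-based) receives final_reward * gamma ** (num_steps - 1 - t),
    so the last step gets the full reward and earlier steps get less.
    """
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return [final_reward * gamma ** (num_steps - 1 - t) for t in range(num_steps)]


# A three-move game that ends in a win (+1.0), with an aggressive discount.
rewards = discounted_step_rewards(3, 1.0, gamma=0.5)
print(rewards)  # [0.25, 0.5, 1.0]
```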

Test Coverage:

  • Comprehensive tests for LLMClient (192 lines)
  • Thorough tests for LLMJudge rubric (300 lines)
  • Chess rubric migration tests (220 lines)

Backward Compatibility:
RFC 005 is entirely opt-in with no breaking changes. RFC 004 implementation validates the rubric system design.

Confidence Score: 5/5

  • This PR is safe to merge - it's an RFC with supporting implementation that maintains backward compatibility
  • RFC 005 is documentation-only with well-thought-out design patterns. The RFC 004 implementation (LLMClient + LLMJudge) is clean, well-tested (700+ lines of tests), follows existing patterns, and includes proper migration of chess environment. No breaking changes, security issues, or alignment violations detected.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| rfcs/005-agentic-harnesses.md | New RFC proposing harness integration pattern - well-structured with clear motivation and design |
| src/openenv/core/llm_client.py | New LLM client abstraction with OpenAI-compatible implementation - clean, well-documented |
| src/openenv/core/rubrics/llm_judge.py | LLM-as-a-judge rubric implementation with async support and robust score parsing |
| envs/chess_env/server/chess_environment.py | Migrated to use ChessWinLossRubric (RFC 004 implementation) |
| envs/chess_env/server/rubrics.py | New chess-specific rubric extending ExponentialDiscountingTrajectoryRubric |

Flowchart

flowchart TB
    subgraph pr["PR #387: RFC 005 + RFC 004 Implementation"]
        rfc005["RFC 005
        Agentic Harness Integration"]
        
        llm["LLMClient Abstraction
        OpenAIClient Implementation"]
        
        judge["LLMJudge Rubric
        (RFC 004)"]
        
        chess["Chess Environment
        Rubric Migration"]
    end
    
    subgraph design["Design Decisions"]
        wrapping["Wrapping Pattern:
        Container provides FS,
        harness runs inside"]
        
        injection["MCP Tool Injection:
        Env tools → harness config
        before session start"]
        
        modes["Two Modes:
        Production: proxy to harness
        Simulation: step() = full session"]
    end
    
    subgraph impl["RFC 004 Implementation"]
        llmclient["LLMClient ABC +
        OpenAIClient"]
        
        llmjudge["LLMJudge extends Rubric
        async forward()"]
        
        chessmig["Chess env migrated to
        ChessWinLossRubric"]
    end
    
    rfc005 --> wrapping
    rfc005 --> injection
    rfc005 --> modes
    
    rfc005 -.->|"enables"| judge
    
    llm --> llmclient
    judge --> llmjudge
    chess --> chessmig
    
    llmjudge -.->|"uses"| llmclient
    
    classDef rfcBox fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef designBox fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    classDef implBox fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
    
    class rfc005,llm,judge,chess rfcBox
    class wrapping,injection,modes designBox
    class llmclient,llmjudge,chessmig implBox

Last reviewed commit: 79a784c

Defines how OpenEnv integrates with external agentic harnesses
(OpenClaw, Claude Code, Gemini CLI, etc.) that own their own
ReAct control loop, tools, and filesystem management.

Key design decisions:
- Wrapping pattern: OpenEnv container provides filesystem, harness runs inside
- MCP tool injection: Env tools plugged into harness before session start
- Multi-turn episodes: Each step() is one conversational turn, harness
  maintains context across turns within an episode
- Streaming: send_message_streaming() yields HarnessEvents as they happen
- Standard trajectory format: HarnessEvent schema for uniform observability
- Production mode: Proxy to harness with streaming, OpenEnv handles sessions
- OpenClaw as first concrete integration

Closes #385
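
The multi-turn and streaming decisions in this commit message can be sketched as follows. `send_message_streaming()` and `HarnessEvent` are named in the message, but the event fields, the event kinds, the `ToyHarnessSession` class, and the `step` helper are all assumptions made for illustration, not the RFC's schema.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class HarnessEvent:
    kind: str     # hypothetical kinds: "llm_call", "tool_call", "final_response"
    payload: str


class ToyHarnessSession:
    """Stand-in for a harness that keeps context across turns in one episode."""

    def __init__(self) -> None:
        self.history: list[str] = []

    def send_message_streaming(self, message: str) -> Iterator[HarnessEvent]:
        # The harness owns its ReAct loop; the caller only observes events.
        self.history.append(message)
        yield HarnessEvent("llm_call", f"thinking about: {message}")
        yield HarnessEvent("tool_call", "list_files")
        yield HarnessEvent("final_response", f"done (turn {len(self.history)})")


def step(session: ToyHarnessSession, message: str) -> tuple[str, list[HarnessEvent]]:
    """One step() is one conversational turn: drain the stream, keep the trajectory."""
    events = list(session.send_message_streaming(message))
    return events[-1].payload, events


session = ToyHarnessSession()
first_reply, first_events = step(session, "fix the bug")
second_reply, second_events = step(session, "add a test")
print(first_reply, "|", second_reply)  # done (turn 1) | done (turn 2)
```

The second turn sees state left by the first, which is the sense in which the harness maintains context across turns within an episode.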

Add four new sections addressing review feedback:

- Harness Security Boundary: network isolation (harness cannot
  reach orchestration API), MCP tool scoping (no reward tools),
  reward boundary (rubric runs after turn, not during)
- Trajectory Semantics: comparison table of traditional vs harness
  step/episode/trajectory definitions, with implications for rubric
  authors, training loops, and monitoring
- Temporal Semantics: how harness turns relate to RFC 001's "Time
  Problem" — wall-clock execution, synchronous from training loop,
  delays transparent to harness
- Architectural note on /harness endpoint: explicitly scoped as
  harness-specific specialization, not a general agent-hosting
  pattern; future generalization requires its own RFC
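
The MCP tool scoping rule from the security boundary section (the harness never sees reward tools) could be enforced with a filter like the one below. This is a sketch under stated assumptions: the `ToolSpec` type, the `scope` tags, and the tool names are hypothetical and do not come from the RFC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    scope: str  # hypothetical tags: "agent" or "orchestration"


def tools_visible_to_harness(tools: list[ToolSpec]) -> list[str]:
    """Expose only agent-scoped tools; reward and rubric tooling stays outside."""
    return [tool.name for tool in tools if tool.scope == "agent"]


tools = [
    ToolSpec("read_file", "agent"),
    ToolSpec("run_tests", "agent"),
    ToolSpec("compute_reward", "orchestration"),
]
visible = tools_visible_to_harness(tools)
print(visible)  # ['read_file', 'run_tests']
```

An allow-list on an explicit scope tag fails closed: a tool with a missing or unknown tag is hidden from the harness rather than exposed by accident.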
@Darktex changed the title from "RFC 005: Agentic Harness Integration" to "[1/4] RFC 005: Agentic Harness Integration" Feb 17, 2026
@Darktex merged commit 5680f64 into main Feb 18, 2026
4 checks passed
@greptile-apps (bot) mentioned this pull request Feb 23, 2026 (12 tasks)

Labels

CLA Signed (this label is managed by the Meta Open Source bot)

Development

Successfully merging this pull request may close these issues.

Add support for agentic harnesses