[1/4] RFC 005: Agentic Harness Integration by Darktex · Pull Request #387 · huggingface/OpenEnv

Darktex · 2026-02-17T02:16:05Z

Summary

Adds RFC 005 defining how OpenEnv integrates with external agentic harnesses (OpenClaw, Claude Code, Gemini CLI, Goose, etc.)
Updates RFC README to include RFC 004 and 005

Key Design Decisions

Wrapping pattern: OpenEnv container provides filesystem, harness runs inside as a subprocess
MCP tool injection: Environment tools plugged into harness config before session start
Production mode: OpenEnv proxies to harness, handles session management only
Simulation mode: Training loop controls episode boundaries; step() triggers entire harness ReAct loop
OpenClaw as first harness: Concrete adapter to validate the abstraction
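
The wrapping pattern, tool injection, and simulation-mode `step()` described above can be sketched roughly as follows. This is a minimal illustration, not the RFC's implementation: `FAKE_HARNESS`, `inject_tools`, the `mcp_servers` config key, and the `harness_config.json` filename are all hypothetical stand-ins, and a real harness would run a full ReAct loop rather than echo its input.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for a real harness CLI (hypothetical): it reads the injected tool
# config plus a prompt from stdin and prints one JSON result.
FAKE_HARNESS = r"""
import json, sys
with open(sys.argv[1]) as fh:
    config = json.load(fh)
prompt = sys.stdin.read()
print(json.dumps({"reply": prompt.upper(), "tools_seen": sorted(config["mcp_servers"])}))
"""


def inject_tools(workdir: Path, env_tools: dict) -> Path:
    """Write environment tools into the harness config before the session starts."""
    config_path = workdir / "harness_config.json"  # hypothetical filename
    config_path.write_text(json.dumps({"mcp_servers": env_tools}))
    return config_path


def step(workdir: Path, config_path: Path, message: str) -> dict:
    """One step(): run the harness subprocess to completion and return its output."""
    proc = subprocess.run(
        [sys.executable, "-c", FAKE_HARNESS, str(config_path)],
        input=message,
        capture_output=True,
        text=True,
        cwd=workdir,  # stands in for the filesystem the container provides
        check=True,
        timeout=30,
    )
    return json.loads(proc.stdout)


def run_demo() -> dict:
    """Inject one hypothetical tool server, then run a single step."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        tools = {"env_tools": {"command": "hypothetical-mcp-server"}}
        config_path = inject_tools(workdir, tools)
        return step(workdir, config_path, "fix the failing test")
```

The point of the sketch is the ordering: tools are written into the harness config first, and only then does `step()` hand control to the subprocess until it exits.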

New Abstractions

HarnessConfig - Pydantic model for harness configuration
HarnessAdapter - ABC for harness-specific lifecycle management
HarnessEnvironment - MCPEnvironment subclass wrapping a harness
HarnessTransport - Enum for communication transport (stdio, HTTP, MCP)
Tool conflict resolution utilities
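
As a rough picture of how these abstractions might fit together, here is a sketch under stated assumptions. The class names come from the list above, but every field, method name, and the prefix-based renaming rule is hypothetical. A dataclass stands in for the Pydantic model to keep the example dependency-free, and `HarnessEnvironment` is omitted because it depends on `MCPEnvironment`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum


class HarnessTransport(Enum):
    """Communication transport between OpenEnv and the harness."""
    STDIO = "stdio"
    HTTP = "http"
    MCP = "mcp"


@dataclass
class HarnessConfig:
    # The PR describes a Pydantic model; a dataclass is used here as a stand-in.
    name: str
    command: list[str]
    transport: HarnessTransport = HarnessTransport.STDIO
    mcp_tools: list[str] = field(default_factory=list)


class HarnessAdapter(ABC):
    """Harness-specific lifecycle management (method names are assumptions)."""

    def __init__(self, config: HarnessConfig) -> None:
        self.config = config

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def send_message(self, message: str) -> str: ...

    @abstractmethod
    def stop(self) -> None: ...


class EchoAdapter(HarnessAdapter):
    """Trivial adapter used only to exercise the interface."""

    def __init__(self, config: HarnessConfig) -> None:
        super().__init__(config)
        self.running = False

    def start(self) -> None:
        self.running = True

    def send_message(self, message: str) -> str:
        if not self.running:
            raise RuntimeError("harness not started")
        return f"[{self.config.name}] {message}"

    def stop(self) -> None:
        self.running = False


def resolve_tool_conflicts(harness_tools, env_tools, prefix="env_"):
    """Map each env tool to a name that does not collide with harness built-ins."""
    taken = set(harness_tools)
    resolved = {}
    for name in env_tools:
        exposed = name
        while exposed in taken:  # keep prefixing until the name is free
            exposed = prefix + exposed
        taken.add(exposed)
        resolved[name] = exposed
    return resolved
```

Renaming on collision is only one possible policy; rejecting the conflicting tool or letting the environment tool shadow the built-in would be equally plausible choices.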

Open Questions (for review)

Trajectory format standardization
Intermediate observability (streaming harness steps)
Multi-step episodes vs single-step sessions
Harness-level reward shaping
Resource limits (max LLM calls, tokens, tool invocations)

Test plan

RFC review and approval by maintainers
Validate design against OpenClaw's actual configuration format
Verify backward compatibility claim (no changes to existing types)
Subsequent PRs will add implementation with TDD

Closes #385

greptile-apps · 2026-02-17T02:21:02Z

Greptile Summary

This PR introduces RFC 005 (Agentic Harness Integration) and implements foundational components from RFC 004 (Rubrics) to support LLM-as-a-judge evaluation.

RFC 005 - Agentic Harness Integration:

Proposes a wrapping pattern where OpenEnv containers provide filesystem/sandbox while harnesses (OpenClaw, Claude Code, etc.) run inside
Defines HarnessConfig, HarnessAdapter, and HarnessEnvironment abstractions for integrating external agentic harnesses
Supports two modes: production (proxy to harness) and simulation (training loop controls episodes)
Key design: single step() call runs entire harness ReAct session (multiple LLM calls)
MCP tool injection mechanism to plug environment tools into harness config

RFC 004 Implementation (LLMClient + LLMJudge):

New LLMClient ABC with OpenAIClient for OpenAI-compatible endpoints (vLLM, TGI, Ollama, etc.)
LLMJudge rubric that uses LLM endpoints to evaluate actions/observations
Async-first design with forward() method
Chess environment migrated to use ChessWinLossRubric (trajectory-based rubric with exponential discounting)
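
To make the LLM-as-a-judge idea concrete, here is a small sketch of an async judge built on a client abstraction. It does not reproduce the repository's signatures: the `complete()` method, the `CannedClient` test double, the prompt template fields, and the clamp-to-[0, 1] parsing rule are all assumptions made for illustration.

```python
import asyncio
import re
from abc import ABC, abstractmethod


class LLMClient(ABC):
    """Minimal client interface (the method name is an assumption)."""

    @abstractmethod
    async def complete(self, prompt: str) -> str:
        """Return the model's text completion for a prompt."""


class CannedClient(LLMClient):
    """Test double that returns a fixed reply instead of calling an endpoint."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    async def complete(self, prompt: str) -> str:
        return self.reply


class LLMJudge:
    """Scores an action/observation pair by asking an LLM for a number."""

    def __init__(self, client: LLMClient, template: str, default: float = 0.0) -> None:
        self.client = client
        self.template = template
        self.default = default

    async def forward(self, action: str, observation: str) -> float:
        prompt = self.template.format(action=action, observation=observation)
        reply = await self.client.complete(prompt)
        return self._parse_score(reply)

    def _parse_score(self, reply: str) -> float:
        # Take the first number in the reply; fall back to the default if none.
        match = re.search(r"-?\d+(?:\.\d+)?", reply)
        if match is None:
            return self.default
        return min(1.0, max(0.0, float(match.group())))  # clamp to [0, 1]


TEMPLATE = "Rate from 0 to 1.\nAction: {action}\nObservation: {observation}"


def judge_once(reply: str) -> float:
    """Run a single judgement synchronously against a canned reply."""
    judge = LLMJudge(CannedClient(reply), TEMPLATE)
    return asyncio.run(judge.forward("e2e4", "board updated"))
```

Swapping `CannedClient` for a client that talks to an OpenAI-compatible endpoint would leave `LLMJudge` unchanged, which is the benefit of keeping the client behind an ABC.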

Test Coverage:

Comprehensive tests for LLMClient (192 lines)
Thorough tests for LLMJudge rubric (300 lines)
Chess rubric migration tests (220 lines)

Backward Compatibility:
RFC 005 is entirely opt-in with no breaking changes. RFC 004 implementation validates the rubric system design.

Confidence Score: 5/5

This PR is safe to merge - it's an RFC with supporting implementation that maintains backward compatibility
RFC 005 is documentation-only with well-thought-out design patterns. The RFC 004 implementation (LLMClient + LLMJudge) is clean, well-tested (700+ lines of tests), follows existing patterns, and includes proper migration of chess environment. No breaking changes, security issues, or alignment violations detected.
No files require special attention

Important Files Changed

Filename	Overview
rfcs/005-agentic-harnesses.md	New RFC proposing harness integration pattern - well-structured with clear motivation and design
src/openenv/core/llm_client.py	New LLM client abstraction with OpenAI-compatible implementation - clean, well-documented
src/openenv/core/rubrics/llm_judge.py	LLM-as-a-judge rubric implementation with async support and robust score parsing
envs/chess_env/server/chess_environment.py	Migrated to use ChessWinLossRubric (RFC 004 implementation)
envs/chess_env/server/rubrics.py	New chess-specific rubric extending ExponentialDiscountingTrajectoryRubric
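
For readers unfamiliar with trajectory rubrics using exponential discounting, the following sketch shows one common form: a terminal outcome is propagated backwards so that earlier steps receive geometrically smaller credit. The function name, the default `gamma`, and the exact discounting formula are assumptions; the repository's `ExponentialDiscountingTrajectoryRubric` may differ.

```python
def discounted_step_rewards(final_outcome: float, num_steps: int, gamma: float = 0.99) -> list[float]:
    """Assign each step t the credit final_outcome * gamma ** (num_steps - 1 - t).

    The last step receives the full outcome; each earlier step is scaled
    down by one more factor of gamma.
    """
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return [final_outcome * gamma ** (num_steps - 1 - t) for t in range(num_steps)]
```

With `final_outcome=1.0` for a win and `-1.0` for a loss, the sign of every per-step reward matches the game result while moves closer to the end of the game carry more weight.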

Flowchart

flowchart TB
    subgraph pr["PR #387: RFC 005 + RFC 004 Implementation"]
        rfc005["RFC 005
        Agentic Harness Integration"]
        
        llm["LLMClient Abstraction
        OpenAIClient Implementation"]
        
        judge["LLMJudge Rubric
        (RFC 004)"]
        
        chess["Chess Environment
        Rubric Migration"]
    end
    
    subgraph design["Design Decisions"]
        wrapping["Wrapping Pattern:
        Container provides FS,
        harness runs inside"]
        
        injection["MCP Tool Injection:
        Env tools → harness config
        before session start"]
        
        modes["Two Modes:
        Production: proxy to harness
        Simulation: step() = full session"]
    end
    
    subgraph impl["RFC 004 Implementation"]
        llmclient["LLMClient ABC +
        OpenAIClient"]
        
        llmjudge["LLMJudge extends Rubric
        async forward()"]
        
        chessmig["Chess env migrated to
        ChessWinLossRubric"]
    end
    
    rfc005 --> wrapping
    rfc005 --> injection
    rfc005 --> modes
    
    rfc005 -.->|"enables"| judge
    
    llm --> llmclient
    judge --> llmjudge
    chess --> chessmig
    
    llmjudge -.->|"uses"| llmclient
    
    classDef rfcBox fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef designBox fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    classDef implBox fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
    
    class rfc005,llm,judge,chess rfcBox
    class wrapping,injection,modes designBox
    class llmclient,llmjudge,chessmig implBox

Last reviewed commit: 79a784c

Defines how OpenEnv integrates with external agentic harnesses (OpenClaw, Claude Code, Gemini CLI, etc.) that own their own ReAct control loop, tools, and filesystem management.

Key design decisions:
- Wrapping pattern: OpenEnv container provides filesystem, harness runs inside
- MCP tool injection: Env tools plugged into harness before session start
- Multi-turn episodes: Each step() is one conversational turn, harness maintains context across turns within an episode
- Streaming: send_message_streaming() yields HarnessEvents as they happen
- Standard trajectory format: HarnessEvent schema for uniform observability
- Production mode: Proxy to harness with streaming, OpenEnv handles sessions
- OpenClaw as first concrete integration

Closes #385
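
The multi-turn and streaming behaviour in this commit message could look roughly like the sketch below. Only the names `HarnessEvent` and `send_message_streaming()` come from the commit message; the event fields, event type strings, `FakeHarnessSession`, and the `step()` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class HarnessEvent:
    # Field names are illustrative; the RFC defines the actual schema.
    type: str  # e.g. "llm_call", "tool_call", "turn_complete"
    payload: dict = field(default_factory=dict)


class FakeHarnessSession:
    """Keeps conversation context across turns within one episode."""

    def __init__(self) -> None:
        self.history: list[str] = []

    def send_message_streaming(self, message: str) -> Iterator[HarnessEvent]:
        """Yield events as they happen during a single conversational turn."""
        self.history.append(message)
        yield HarnessEvent("llm_call", {"turn": len(self.history)})
        yield HarnessEvent("tool_call", {"tool": "hypothetical_tool"})
        yield HarnessEvent("turn_complete", {"context_size": len(self.history)})


def step(session: FakeHarnessSession, message: str) -> list[HarnessEvent]:
    """One step() is one turn: drain the stream and return its events."""
    return list(session.send_message_streaming(message))
```

Because the session object outlives each `step()` call, a second turn sees the first turn's message in `history`, which is the sense in which the harness maintains context within an episode.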

Add four new sections addressing review feedback:
- Harness Security Boundary: network isolation (harness cannot reach orchestration API), MCP tool scoping (no reward tools), reward boundary (rubric runs after turn, not during)
- Trajectory Semantics: comparison table of traditional vs harness step/episode/trajectory definitions, with implications for rubric authors, training loops, and monitoring
- Temporal Semantics: how harness turns relate to RFC 001's "Time Problem": wall-clock execution, synchronous from training loop, delays transparent to harness
- Architectural note on /harness endpoint: explicitly scoped as harness-specific specialization, not a general agent-hosting pattern; future generalization requires its own RFC
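
The tool-scoping and reward-boundary points can be illustrated with a short sketch. Everything here is an assumption made for the example: the prefix-based naming convention for reward tools, the `scope_tools` and `run_turn` helpers, and the callable signatures are not taken from the RFC.

```python
from typing import Callable

# Hypothetical convention: tools whose names start with these prefixes touch reward.
REWARD_TOOL_PREFIXES = ("reward", "rubric")


def scope_tools(all_tools: dict[str, Callable]) -> dict[str, Callable]:
    """Return only the tools the harness may see (no reward-related tools)."""
    return {
        name: fn
        for name, fn in all_tools.items()
        if not name.startswith(REWARD_TOOL_PREFIXES)
    }


def run_turn(harness_turn: Callable, rubric: Callable, tools: dict, message: str):
    """Run one turn with scoped tools, then score it after the turn has ended."""
    visible = scope_tools(tools)
    output = harness_turn(message, visible)  # harness never receives reward tools
    reward = rubric(message, output)  # rubric runs after the turn, not during it
    return output, reward
```

Keeping the rubric call outside `harness_turn` means the harness has no code path through which it could observe or influence its own score during a turn.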

meta-cla Bot added the CLA Signed label Feb 17, 2026

Darktex force-pushed the feature/issue-385-agentic-harnesses branch from 79a784c to 70ee465 February 17, 2026 08:13

Darktex mentioned this pull request Feb 17, 2026

Add agentic harness integration: types, HarnessEnvironment, OpenClaw adapter (RFC 005) #389

Open

3 tasks

Darktex changed the title ~~RFC 005: Agentic Harness Integration~~ [1/4] RFC 005: Agentic Harness Integration Feb 17, 2026

Merge branch 'main' into feature/issue-385-agentic-harnesses

05046d7

Darktex merged commit 5680f64 into main Feb 18, 2026
4 checks passed

greptile-apps Bot mentioned this pull request Feb 23, 2026

Add new RFCs to the README #403

Merged

12 tasks

Darktex merged 3 commits into main from feature/issue-385-agentic-harnesses
