[Enhancement] - Add new environment - Support Unity ML-Agents-Envs#285
Conversation
|
The pushed env on HF is available here: |
Darktex
left a comment
There was a problem hiding this comment.
Summary
This PR adds a new Unity ML-Agents environment wrapper to OpenEnv, providing access to Unity's reinforcement learning environments (PushBlock, 3DBall, GridWorld, etc.) through the standardized OpenEnv HTTP/WebSocket interface.
Overall Assessment: Well-structured, comprehensive implementation with excellent documentation. A few minor issues to address.
Highlights
- Excellent Documentation: 611-line README with installation options, usage modes, API reference, and troubleshooting
- Proper OpenEnv Pattern: Uses factory pattern correctly with
create_app(UnityMLAgentsEnvironment, ...) - Comprehensive Tests: 19 tests covering core functionality
- Docker Support: Proper Dockerfile with headless mode considerations
Issues
1. Duplicated Fix from PR #286 (MINOR)
This PR includes the same tomli compatibility fix as PR #286. Coordinate to avoid duplication.
2. CI Workflow Changes (IMPORTANT)
The PR modifies .github/workflows/docker-build.yml:
- Changes path filters from
src/**toenvs/** - Renames
my-envtoconnect4_env - Adds custom context for Unity build
Please verify these changes don't break existing builds.
3. HuggingFace Spaces Link
The docs reference https://huggingface.co/spaces/Crashbandicoote2/unity_env - verify this link after deployment.
RFC Alignment
| Requirement | Status |
|---|---|
Uses create_app() with class factory |
✓ |
Implements reset() → Observation |
✓ |
Implements step(action) → Observation |
✓ |
Has state property |
✓ |
| Action/Observation extend base types | ✓ |
| Dockerized deployment | ✓ |
Verdict
APPROVE with minor comments - High-quality environment contribution. Issues are minor and can be addressed during merge.
Reviewed by Claude
Darktex
left a comment
There was a problem hiding this comment.
Note: This is an automated review by Claude Code (alignment-reviewer agent), not a human review. The account posting this is shared with the human maintainer.
Alignment Review Report
PR Summary
This PR adds a comprehensive Unity ML-Agents environment wrapper to OpenEnv, providing access to 17+ Unity RL environments (PushBlock, 3DBall, GridWorld, etc.) through the standardized OpenEnv interface. The implementation includes:
- Full environment wrapper with server/client architecture
- Docker deployment support
- Comprehensive test suite (19 tests)
- Documentation and example scripts
- Multi-mode deployment (direct, server, Docker)
Files changed: 18 files, +2858/-9 lines
Automated Checks
Code Quality
- Lint: Not run (ruff not available in review environment)
- Debug code: CLEAN - No debug statements found in production code
- Print statements found are only in docstrings, examples, and documentation (acceptable)
- Tests: 19 comprehensive unit tests provided
- Documentation: Extensive README with usage examples
Tier 1: Fixes Required
Critical Issues
1. Client-Server Separation Violation
Location: examples/unity_simple.py:359
from envs.unity_env.server.unity_environment import UnityMLAgentsEnvironmentIssue: The example script imports directly from the server directory, violating the client-server separation invariant (INVARIANTS.md #2).
Why this matters: Client code should never import from server/ directory. This creates tight coupling and breaks the architectural boundary between client and server.
Fix:
- For the "direct mode" use case in the example, the pattern should be to use
EnvClient.from_direct()or similar factory method that doesn't expose server internals - OR document this as an advanced/development-only pattern and move it to a separate dev example
- OR use the environment via the client interface even in direct mode
Severity: MUST FIX before merge
2. README Documentation Contains Server Import
Location: envs/unity_env/README.md:89, 362, 455
from envs.unity_env.server.unity_environment import UnityMLAgentsEnvironmentIssue: Documentation examples show importing from server directory.
Fix: Update documentation to use proper client interface patterns or clearly mark these as internal/development examples.
Severity: MUST FIX before merge
Minor Issues
3. Python Version Compatibility Fix
Location: src/openenv/cli/_validation.py:15-19
Change: Added fallback to tomli for Python <3.11 (tomllib not available)
Assessment: ✅ GOOD - This is a proper compatibility fix for older Python versions. The PR description mentions Python 3.10.12 is required for ML-Agents.
Tier 2: Alignment Discussion Points
1. Long Initialization Times & Timeout Configuration
Observation: The Unity environment can take 30-60+ seconds to initialize (downloading ~500MB binaries on first run). The code uses custom timeout configurations:
message_timeout_s: float = 180.0(3 minutes) in clientping_timeout=120(2 minutes) in WebSocket connection- Test fixture waits up to 60 seconds for server health
Principle at stake: User experience & production readiness (PRINCIPLES.md: "Production-readiness from day one")
The concern: While the implementation handles this correctly with appropriate timeouts, the 30-60s initialization time could be surprising in production. The caching strategy (persistent ~/.mlagents-cache) helps, but:
- First deployment on new infrastructure will be slow
- Docker cold starts will download binaries inside container
- Could impact autoscaling scenarios
Questions for maintainer:
- Should there be a pre-built Docker image with cached binaries?
- Should the README have a more prominent warning about first-run time?
- Is there a way to pre-download binaries during Docker build?
Suggested reviewer: @Darktex
2. Direct Mode Pattern
Observation: The implementation provides three modes:
- Direct mode:
UnityMLAgentsEnvironmentinstantiated directly (server code) - Server mode: Client connects to running server via WebSocket
- Docker mode: Client auto-starts Docker container
The "direct mode" is actively promoted in the README as "recommended for local development" and uses direct server imports.
Principle at stake: Client-server separation (INVARIANTS.md #2)
The concern: Is "direct mode" an acceptable pattern in OpenEnv? Other environments (echo_env, snake_env) don't prominently feature this pattern. This creates two ways to use environments:
- Via client interface (clean separation)
- Via direct server imports (breaks separation)
Trade-off: Direct mode is convenient for development and avoids server overhead, but it:
- Violates the architectural boundary
- Creates confusion about which pattern to use
- May encourage anti-patterns in user code
Questions for maintainer:
- Is direct mode acceptable as a development-only pattern?
- Should it be clearly marked as "advanced/internal" use?
- Should we have a
EnvClient.from_direct()factory that maintains the abstraction?
Suggested reviewer: @Darktex
3. Apple Silicon / Docker Compatibility
Observation: The README documents that Docker mode does NOT work on Apple Silicon (M1/M2/M3/M4) due to Unity's Mono runtime crashing under x86_64 emulation. Direct mode is recommended instead.
Principle at stake: Container isolation & reproducibility (PRINCIPLES.md: "Container isolation for reproducibility")
The concern: One of OpenEnv's core principles is container isolation. Having a major platform (Apple Silicon, increasingly common in development) where Docker doesn't work undermines this principle.
Assessment: This appears to be an upstream Unity limitation (not fixable in OpenEnv), and the PR handles it well:
- Clear documentation of the limitation
- Alternative modes provided (direct, server)
- Platform-specific guidance in README
Recommendation: ✅ ACCEPTABLE - This is a well-documented limitation with workarounds. Consider adding a warning in the Docker build process that detects Apple Silicon.
Suggested reviewer: @Darktex
4. Single Worker Limitation
Observation: Unity environments are not thread-safe. The code enforces workers=1 and documents this in multiple places.
Principle at stake: Production readiness & scalability
The concern: Single worker limits scalability. In production with high traffic, this could be a bottleneck.
Assessment: ✅ ACCEPTABLE - This is an upstream Unity limitation, not an OpenEnv issue. The implementation handles it correctly:
- Clearly documented
- Enforced in code comments
- WebSocket session support mitigates this (each connection can have its own env instance)
Note: The environment sets SUPPORTS_CONCURRENT_SESSIONS = False which is correct.
5. Reward Computation
Observation: Rewards come from Unity environment itself (line 323, 328 in unity_environment.py):
reward = float(terminal_steps[terminal_steps.agent_id[0]].reward)
reward = float(decision_steps[decision_steps.agent_id[0]].reward)Principle at stake: Rewards inside environment (PRINCIPLES.md, RFC 002)
Assessment: ✅ CORRECT - Rewards are computed by the Unity environment and passed through. No external reward computation. This follows the "rewards inside environment" principle.
6. Episode Termination / Reset Control
Observation: The environment correctly implements:
reset()for orchestration (returns new episode)step()returnsdone=Truewhen episode ends- No MCP tools expose reset/step to agents
Principle at stake: "Agents cannot reset" (INVARIANTS.md #1, PRINCIPLES.md)
Assessment: ✅ CORRECT - No violations found. The Gym-like API is only exposed via WebSocket for orchestration, not to agents via MCP.
7. Git Dependency for Packages
Observation: pyproject.toml installs packages from git:
"openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git"
"mlagents-envs @ git+https://github.com/Unity-Technologies/ml-agents.git#subdirectory=ml-agents-envs"Principle at stake: Stability & reproducibility
The concern: Git dependencies can break if:
- Upstream repo changes/deletes branches
- Network issues during install
- Commit hashes not pinned (subject to upstream changes)
Questions for maintainer:
- Should these be pinned to specific commit hashes?
- Is there a plan to use PyPI versions when available?
- This pattern exists in other envs - is it standard for OpenEnv?
Suggested reviewer: @Darktex
Tier 3: Positive Observations
Things Done Well ✅
-
Excellent Documentation: The README is comprehensive with:
- Multiple deployment modes explained
- Troubleshooting section
- Platform-specific guidance (Apple Silicon)
- Clear examples for each mode
-
Comprehensive Testing: 19 unit tests covering:
- Health endpoints
- Reset/step functionality
- Environment switching
- Action spaces (discrete & continuous)
- State tracking
-
Type Safety: Proper use of Pydantic models and generics:
UnityAction,UnityObservation,UnityStateall extend base types- Type annotations throughout
-
Error Handling: Good error messages and validation:
- Environment ID validation
- Timeout handling for slow initialization
- Graceful cleanup in
__del__andclose()
-
Async Support: Implements
reset_async()andstep_async()to avoid blocking event loop during slow Unity initialization -
Caching Strategy: Persistent cache for Unity binaries avoids re-downloading
Summary
Tier 1 Issues: 2 critical, 0 minor
Critical items to fix before merge:
- Remove server imports from
examples/unity_simple.py(client-server separation violation) - Update README to use proper client interface patterns
Tier 2 Issues: 7 alignment discussion points
Items for human review:
- Long initialization times & production implications
- "Direct mode" pattern and architectural boundaries
- Apple Silicon Docker compatibility (well-documented limitation)
- Single worker limitation (upstream constraint)
- Reward computation ✅ (correct)
- Reset/episode control ✅ (correct)
- Git dependencies for packages
Overall Assessment
This is a high-quality contribution with excellent documentation and comprehensive testing. The core implementation follows OpenEnv patterns correctly:
- ✅ Proper client-server separation (except for examples)
- ✅ Rewards stay inside environment
- ✅ No agent access to reset/simulation controls
- ✅ Type safety with Pydantic
- ✅ WebSocket for communication
Blocking issues: Fix the 2 Tier 1 items (server imports in examples/README).
Recommended next steps:
- Fix Tier 1 issues (should be quick - just update import patterns)
- Maintainer discussion on Tier 2 alignment points (especially "direct mode" pattern)
- Consider pre-building Docker image with cached binaries for faster cold starts
Recommendation: 🟡 APPROVE after Tier 1 fixes (with discussion of Tier 2 points)
Automated review by Claude Code | Learn more about OpenEnv's agentic workflow
Dismissing automated approval due to bug in review bot. The original review either had blank content or approved despite finding blocking issues. Please disregard this approval.
@Darktex Thanks! Addressed the Tier 1 issues in new commits |
aae2b6a to
b5e05a8
Compare
Darktex
left a comment
There was a problem hiding this comment.
Let's land this one first
Add Unity ML-Agents Environment
Summary
This PR adds a new Unity ML-Agents environment wrapper to OpenEnv, providing access to Unity's reinforcement learning environments (PushBlock, 3DBall, GridWorld, etc.) through the standardized OpenEnv HTTP/WebSocket interface.
Features
New Files
Supported Environments
Usage Examples
Known Limitations
workers=1Test Plan
pytest tests/envs/test_unity_environment.py -v)Dependencies
mlagents-envs(installed from Unity ML-Agents git repository)openenv-core[core](installed from git for latest features)fastapi,uvicorn,pydantic,numpy,pillow