
[Enhancement] - Add new environment - Support Unity ML-Agents-Envs #285

Merged
Darktex merged 20 commits into huggingface:main from AlirezaShamsoshoara:ali/feature/unity_env on Jan 15, 2026

Conversation

@AlirezaShamsoshoara
Contributor

Add Unity ML-Agents Environment

[Demo GIFs: unity_pushblock, unity_3dball]

Summary

This PR adds a new Unity ML-Agents environment wrapper to OpenEnv, providing access to Unity's reinforcement learning environments (PushBlock, 3DBall, GridWorld, etc.) through the standardized OpenEnv HTTP/WebSocket interface.

Features

  • Full Unity ML-Agents Integration: Wraps all environments from the ML-Agents default registry (17+ environments)
  • Multiple Deployment Modes:
    • Direct Mode: Run Unity environments directly in-process (recommended for local development)
    • Server Mode: Client-server architecture via HTTP/WebSocket
    • Docker Mode: Containerized deployment for production/cloud environments
  • Action Space Support: Both discrete (PushBlock, GridWorld) and continuous (3DBall) action spaces
  • Dynamic Environment Switching: Switch between environments at runtime without restarting
  • Headless Mode: Run without graphics for faster training
  • HuggingFace Spaces Ready: Configured for deployment on HuggingFace Spaces

New Files

envs/unity_env/
├── README.md              # Comprehensive documentation
├── pyproject.toml         # Package configuration
├── client.py              # EnvClient implementation
├── models.py              # Action, Observation, State models
├── assets/                # Demo GIFs
│   ├── unity_pushblock.gif
│   └── unity_3dball.gif
└── server/
    ├── Dockerfile         # Docker configuration
    ├── app.py             # FastAPI server
    └── unity_environment.py  # Core environment wrapper

examples/
└── unity_simple.py        # Example usage script

tests/envs/
└── test_unity_environment.py  # Comprehensive test suite (19 tests)

Supported Environments

| Environment | Action Type | Description |
|---|---|---|
| PushBlock | Discrete (7) | Push a block to a goal position |
| 3DBall | Continuous (2) | Balance a ball on a platform |
| 3DBallHard | Continuous (2) | Harder version of 3DBall |
| GridWorld | Discrete (5) | Navigate a grid to find goals |
| Basic | Discrete (3) | Simple left/right movement |
| + 12 more | Various | All ML-Agents registry environments |
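
The action sizes in the table suggest a simple pre-flight check before an action is sent. The sketch below is illustrative only: `ACTION_SPECS`, `ToyAction`, and `validate_action` are hypothetical names that are not part of this PR, and it assumes that "Discrete (n)" means a single branch with n choices and "Continuous (n)" means a vector of n floats.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical spec table copied from the PR description: (kind, size).
ACTION_SPECS = {
    "PushBlock": ("discrete", 7),
    "3DBall": ("continuous", 2),
    "3DBallHard": ("continuous", 2),
    "GridWorld": ("discrete", 5),
    "Basic": ("discrete", 3),
}


@dataclass
class ToyAction:
    """Stand-in for UnityAction; the real model may differ."""

    discrete_actions: List[int] = field(default_factory=list)
    continuous_actions: List[float] = field(default_factory=list)


def validate_action(env_id: str, action: ToyAction) -> None:
    """Raise ValueError if the action does not match the assumed spec."""
    kind, size = ACTION_SPECS[env_id]
    if kind == "discrete":
        # Assumption: one branch, with a choice index in [0, size).
        if len(action.discrete_actions) != 1:
            raise ValueError(f"{env_id} expects exactly one discrete action")
        choice = action.discrete_actions[0]
        if not 0 <= choice < size:
            raise ValueError(f"{env_id} expects a choice in [0, {size})")
    else:
        # Assumption: a fixed-length vector of floats.
        if len(action.continuous_actions) != size:
            raise ValueError(f"{env_id} expects {size} continuous values")
```

With these assumptions, `ToyAction(discrete_actions=[1])` passes for PushBlock and `ToyAction(continuous_actions=[0.5, -0.3])` passes for 3DBall, matching the usage examples in the PR description.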

Usage Examples

# Direct mode (simplest)
from envs.unity_env.client import UnityEnv
from envs.unity_env.models import UnityAction

env = UnityEnv.from_direct(no_graphics=True)
result = env.reset(env_id="PushBlock")
action = UnityAction(discrete_actions=[1])  # Move forward
result = env.step(action)
env.close()

# Server mode
with UnityEnv(base_url="http://localhost:8000") as env:
    result = env.reset(env_id="3DBall")
    action = UnityAction(continuous_actions=[0.5, -0.3])
    result = env.step(action)

Known Limitations

  • Apple Silicon + Docker: Docker mode does not work on M1/M2/M3/M4 Macs due to Unity's Mono runtime crashing under x86_64 emulation. Use direct mode or server mode instead (documented in README).
  • First Run: Downloads ~500MB of Unity binaries on first use (cached for subsequent runs)
  • Single Worker: Unity environments are not thread-safe; use workers=1

Test Plan

  • All 19 unit tests pass (pytest tests/envs/test_unity_environment.py -v)
  • Direct mode tested locally on macOS
  • Server mode tested locally
  • Docker mode tested on x86_64 Linux (GitHub Actions / cloud VM)
  • HuggingFace Spaces deployment tested
  • Documentation reviewed

Dependencies

  • mlagents-envs (installed from Unity ML-Agents git repository)
  • openenv-core[core] (installed from git for latest features)
  • fastapi, uvicorn, pydantic, numpy, pillow

@meta-cla (bot) added the CLA Signed label on Jan 12, 2026
@AlirezaShamsoshoara
Contributor Author

The pushed env on HF is available here:
https://huggingface.co/spaces/Crashbandicoote2/unity_env

Contributor

@Darktex Darktex left a comment


Summary

This PR adds a new Unity ML-Agents environment wrapper to OpenEnv, providing access to Unity's reinforcement learning environments (PushBlock, 3DBall, GridWorld, etc.) through the standardized OpenEnv HTTP/WebSocket interface.

Overall Assessment: Well-structured, comprehensive implementation with excellent documentation. A few minor issues to address.

Highlights

  • Excellent Documentation: 611-line README with installation options, usage modes, API reference, and troubleshooting
  • Proper OpenEnv Pattern: Uses factory pattern correctly with create_app(UnityMLAgentsEnvironment, ...)
  • Comprehensive Tests: 19 tests covering core functionality
  • Docker Support: Proper Dockerfile with headless mode considerations

Issues

1. Duplicated Fix from PR #286 (MINOR)

This PR includes the same tomli compatibility fix as PR #286. Coordinate to avoid duplication.

2. CI Workflow Changes (IMPORTANT)

The PR modifies .github/workflows/docker-build.yml:

  • Changes path filters from src/** to envs/**
  • Renames my-env to connect4_env
  • Adds custom context for Unity build

Please verify these changes don't break existing builds.

3. HuggingFace Spaces Link

The docs reference https://huggingface.co/spaces/Crashbandicoote2/unity_env - verify this link after deployment.

RFC Alignment

Requirement Status
Uses create_app() with class factory
Implements reset() → Observation
Implements step(action) → Observation
Has state property
Action/Observation extend base types
Dockerized deployment

Verdict

APPROVE with minor comments - High-quality environment contribution. Issues are minor and can be addressed during merge.


Reviewed by Claude

Darktex

This comment was marked as outdated.

Contributor

@Darktex Darktex left a comment


Note: This is an automated review by Claude Code (alignment-reviewer agent), not a human review. The account posting this is shared with the human maintainer.


Alignment Review Report

PR Summary

This PR adds a comprehensive Unity ML-Agents environment wrapper to OpenEnv, providing access to 17+ Unity RL environments (PushBlock, 3DBall, GridWorld, etc.) through the standardized OpenEnv interface. The implementation includes:

  • Full environment wrapper with server/client architecture
  • Docker deployment support
  • Comprehensive test suite (19 tests)
  • Documentation and example scripts
  • Multi-mode deployment (direct, server, Docker)

Files changed: 18 files, +2858/-9 lines


Automated Checks

Code Quality

  • Lint: Not run (ruff not available in review environment)
  • Debug code: CLEAN - No debug statements found in production code
    • Print statements found are only in docstrings, examples, and documentation (acceptable)
  • Tests: 19 comprehensive unit tests provided
  • Documentation: Extensive README with usage examples

Tier 1: Fixes Required

Critical Issues

1. Client-Server Separation Violation

Location: examples/unity_simple.py:359

from envs.unity_env.server.unity_environment import UnityMLAgentsEnvironment

Issue: The example script imports directly from the server directory, violating the client-server separation invariant (INVARIANTS.md #2).

Why this matters: Client code should never import from server/ directory. This creates tight coupling and breaks the architectural boundary between client and server.

Fix:

  • For the "direct mode" use case in the example, the pattern should be to use EnvClient.from_direct() or similar factory method that doesn't expose server internals
  • OR document this as an advanced/development-only pattern and move it to a separate dev example
  • OR use the environment via the client interface even in direct mode

Severity: MUST FIX before merge
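
One way to read the first fix option is a client-side factory that builds the in-process environment itself, so user code only ever touches the client. The sketch below is a minimal illustration under that assumption: `ToyEnvironment` and `ToyClient` are hypothetical stand-ins, not the classes from this PR, and the real `from_direct()` may be implemented differently.

```python
from typing import Any, Callable, Dict


class ToyEnvironment:
    """Stand-in for a server-side environment class (hypothetical)."""

    def __init__(self, no_graphics: bool = True) -> None:
        self.no_graphics = no_graphics
        self.steps = 0

    def reset(self) -> Dict[str, Any]:
        self.steps = 0
        return {"done": False, "steps": self.steps}

    def step(self, action: Dict[str, Any]) -> Dict[str, Any]:
        self.steps += 1
        # The toy episode ends after three steps.
        return {"done": self.steps >= 3, "steps": self.steps}

    def close(self) -> None:
        pass


class ToyClient:
    """Client-side facade: callers use this and never import server modules."""

    def __init__(self, env: Any) -> None:
        self._env = env

    @classmethod
    def from_direct(
        cls, env_factory: Callable[..., Any] = ToyEnvironment, **kwargs: Any
    ) -> "ToyClient":
        # In a real package the server class could be imported lazily inside
        # this method, keeping the server import out of user code.
        return cls(env_factory(**kwargs))

    def reset(self) -> Dict[str, Any]:
        return self._env.reset()

    def step(self, action: Dict[str, Any]) -> Dict[str, Any]:
        return self._env.step(action)

    def close(self) -> None:
        self._env.close()
```

The design choice here is that the example script would call `ToyClient.from_direct(no_graphics=True)` and nothing else, which keeps the architectural boundary in one place.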


2. README Documentation Contains Server Import

Location: envs/unity_env/README.md:89, 362, 455

from envs.unity_env.server.unity_environment import UnityMLAgentsEnvironment

Issue: Documentation examples show importing from server directory.

Fix: Update documentation to use proper client interface patterns or clearly mark these as internal/development examples.

Severity: MUST FIX before merge


Minor Issues

3. Python Version Compatibility Fix

Location: src/openenv/cli/_validation.py:15-19

Change: Added fallback to tomli for Python <3.11 (tomllib not available)

Assessment: ✅ GOOD - This is a proper compatibility fix for older Python versions. The PR description mentions Python 3.10.12 is required for ML-Agents.
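
For readers unfamiliar with the pattern, a fallback of this kind usually looks like the sketch below. It is not the code from `_validation.py`: `parse_toml` is a hypothetical helper, and the extra `None` branch is an assumption added so the snippet imports cleanly even when `tomli` is not installed.

```python
import sys

if sys.version_info >= (3, 11):
    # tomllib joined the standard library in Python 3.11.
    import tomllib
else:
    try:
        # tomli is the third-party backport with the same API.
        import tomli as tomllib
    except ModuleNotFoundError:
        tomllib = None


def parse_toml(text: str) -> dict:
    """Parse a TOML string, failing clearly if no parser is available."""
    if tomllib is None:
        raise RuntimeError("No TOML parser found; install 'tomli' on Python < 3.11")
    return tomllib.loads(text)
```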


Tier 2: Alignment Discussion Points

1. Long Initialization Times & Timeout Configuration

Observation: The Unity environment can take 30-60+ seconds to initialize (downloading ~500MB binaries on first run). The code uses custom timeout configurations:

  • message_timeout_s: float = 180.0 (3 minutes) in client
  • ping_timeout=120 (2 minutes) in WebSocket connection
  • Test fixture waits up to 60 seconds for server health

Principle at stake: User experience & production readiness (PRINCIPLES.md: "Production-readiness from day one")

The concern: While the implementation handles this correctly with appropriate timeouts, the 30-60s initialization time could be surprising in production. The caching strategy (persistent ~/.mlagents-cache) helps, but:

  • First deployment on new infrastructure will be slow
  • Docker cold starts will download binaries inside container
  • Could impact autoscaling scenarios

Questions for maintainer:

  1. Should there be a pre-built Docker image with cached binaries?
  2. Should the README have a more prominent warning about first-run time?
  3. Is there a way to pre-download binaries during Docker build?

Suggested reviewer: @Darktex


2. Direct Mode Pattern

Observation: The implementation provides three modes:

  1. Direct mode: UnityMLAgentsEnvironment instantiated directly (server code)
  2. Server mode: Client connects to running server via WebSocket
  3. Docker mode: Client auto-starts Docker container

The "direct mode" is actively promoted in the README as "recommended for local development" and uses direct server imports.

Principle at stake: Client-server separation (INVARIANTS.md #2)

The concern: Is "direct mode" an acceptable pattern in OpenEnv? Other environments (echo_env, snake_env) don't prominently feature this pattern. This creates two ways to use environments:

  • Via client interface (clean separation)
  • Via direct server imports (breaks separation)

Trade-off: Direct mode is convenient for development and avoids server overhead, but it:

  • Violates the architectural boundary
  • Creates confusion about which pattern to use
  • May encourage anti-patterns in user code

Questions for maintainer:

  1. Is direct mode acceptable as a development-only pattern?
  2. Should it be clearly marked as "advanced/internal" use?
  3. Should we have an EnvClient.from_direct() factory that maintains the abstraction?

Suggested reviewer: @Darktex


3. Apple Silicon / Docker Compatibility

Observation: The README documents that Docker mode does NOT work on Apple Silicon (M1/M2/M3/M4) due to Unity's Mono runtime crashing under x86_64 emulation. Direct mode is recommended instead.

Principle at stake: Container isolation & reproducibility (PRINCIPLES.md: "Container isolation for reproducibility")

The concern: One of OpenEnv's core principles is container isolation. Having a major platform (Apple Silicon, increasingly common in development) where Docker doesn't work undermines this principle.

Assessment: This appears to be an upstream Unity limitation (not fixable in OpenEnv), and the PR handles it well:

  • Clear documentation of the limitation
  • Alternative modes provided (direct, server)
  • Platform-specific guidance in README

Recommendation: ✅ ACCEPTABLE - This is a well-documented limitation with workarounds. Consider adding a warning in the Docker build process that detects Apple Silicon.

Suggested reviewer: @Darktex


4. Single Worker Limitation

Observation: Unity environments are not thread-safe. The code enforces workers=1 and documents this in multiple places.

Principle at stake: Production readiness & scalability

The concern: Single worker limits scalability. In production with high traffic, this could be a bottleneck.

Assessment: ✅ ACCEPTABLE - This is an upstream Unity limitation, not an OpenEnv issue. The implementation handles it correctly:

  • Clearly documented
  • Enforced in code comments
  • WebSocket session support mitigates this (each connection can have its own env instance)

Note: The environment sets SUPPORTS_CONCURRENT_SESSIONS = False which is correct.


5. Reward Computation

Observation: Rewards come from the Unity environment itself (lines 323 and 328 in unity_environment.py):

reward = float(terminal_steps[terminal_steps.agent_id[0]].reward)
reward = float(decision_steps[decision_steps.agent_id[0]].reward)

Principle at stake: Rewards inside environment (PRINCIPLES.md, RFC 002)

Assessment: ✅ CORRECT - Rewards are computed by the Unity environment and passed through. No external reward computation. This follows the "rewards inside environment" principle.
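
The two quoted lines read as a pass-through of whichever step batch contains the agent. The sketch below shows one plausible shape for that logic using toy stand-ins: `ToyStep`, `ToySteps`, and `extract_reward` are hypothetical, and the rule that a terminal step takes precedence is an assumption, not something confirmed by the PR.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ToyStep:
    reward: float


class ToySteps:
    """Toy stand-in exposing agent_id, len(), and lookup by agent id."""

    def __init__(self, rewards: Dict[int, float]) -> None:
        self._steps = {agent: ToyStep(r) for agent, r in rewards.items()}
        self.agent_id = list(self._steps)

    def __len__(self) -> int:
        return len(self._steps)

    def __getitem__(self, agent: int) -> ToyStep:
        return self._steps[agent]


def extract_reward(decision_steps: ToySteps, terminal_steps: ToySteps) -> Tuple[float, bool]:
    """Return (reward, done) for the first agent, passing the reward through unchanged."""
    if len(terminal_steps) > 0:
        # Assumption: a terminal step means the episode has ended.
        agent = terminal_steps.agent_id[0]
        return float(terminal_steps[agent].reward), True
    if len(decision_steps) > 0:
        agent = decision_steps.agent_id[0]
        return float(decision_steps[agent].reward), False
    # No agent reported this step.
    return 0.0, False
```

Nothing in this sketch computes or reshapes the reward, which is the property the assessment above is checking for.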


6. Episode Termination / Reset Control

Observation: The environment correctly implements:

  • reset() for orchestration (returns new episode)
  • step() returns done=True when episode ends
  • No MCP tools expose reset/step to agents

Principle at stake: "Agents cannot reset" (INVARIANTS.md #1, PRINCIPLES.md)

Assessment: ✅ CORRECT - No violations found. The Gym-like API is only exposed via WebSocket for orchestration, not to agents via MCP.


7. Git Dependency for Packages

Observation: pyproject.toml installs packages from git:

"openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git"
"mlagents-envs @ git+https://github.com/Unity-Technologies/ml-agents.git#subdirectory=ml-agents-envs"

Principle at stake: Stability & reproducibility

The concern: Git dependencies can break if:

  • Upstream repo changes/deletes branches
  • Network issues during install
  • Commit hashes not pinned (subject to upstream changes)
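
The pinning concern can be checked mechanically. The sketch below is a hypothetical helper, not part of OpenEnv: `is_pinned_to_commit` assumes pip's `git+<url>@<rev>` syntax for VCS requirements and treats only a full 40-character hex revision as pinned. The all-zero hash in the usage note is a placeholder, not a real commit.

```python
import re

# A full commit SHA directly after "@", ending the URL or followed by a fragment.
_SHA_RE = re.compile(r"@([0-9a-f]{40})(?:#|$)")


def is_pinned_to_commit(requirement: str) -> bool:
    """Return True if a 'name @ git+url' requirement pins a full commit SHA."""
    _, sep, url = requirement.partition(" @ ")
    if not sep or not url.startswith("git+"):
        # Not a direct git reference, so there is no commit to pin.
        return False
    return _SHA_RE.search(url) is not None
```

Both dependency strings quoted above return False under this check, while a variant such as `...ml-agents.git@` followed by forty zeros and `#subdirectory=ml-agents-envs` would return True.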

Questions for maintainer:

  1. Should these be pinned to specific commit hashes?
  2. Is there a plan to use PyPI versions when available?
  3. This pattern exists in other envs - is it standard for OpenEnv?

Suggested reviewer: @Darktex


Tier 3: Positive Observations

Things Done Well ✅

  1. Excellent Documentation: The README is comprehensive with:

    • Multiple deployment modes explained
    • Troubleshooting section
    • Platform-specific guidance (Apple Silicon)
    • Clear examples for each mode
  2. Comprehensive Testing: 19 unit tests covering:

    • Health endpoints
    • Reset/step functionality
    • Environment switching
    • Action spaces (discrete & continuous)
    • State tracking
  3. Type Safety: Proper use of Pydantic models and generics:

    • UnityAction, UnityObservation, UnityState all extend base types
    • Type annotations throughout
  4. Error Handling: Good error messages and validation:

    • Environment ID validation
    • Timeout handling for slow initialization
    • Graceful cleanup in __del__ and close()
  5. Async Support: Implements reset_async() and step_async() to avoid blocking event loop during slow Unity initialization

  6. Caching Strategy: Persistent cache for Unity binaries avoids re-downloading


Summary

Tier 1 Issues: 2 critical, 0 minor

Critical items to fix before merge:

  1. Remove server imports from examples/unity_simple.py (client-server separation violation)
  2. Update README to use proper client interface patterns

Tier 2 Issues: 7 alignment discussion points

Items for human review:

  1. Long initialization times & production implications
  2. "Direct mode" pattern and architectural boundaries
  3. Apple Silicon Docker compatibility (well-documented limitation)
  4. Single worker limitation (upstream constraint)
  5. Reward computation ✅ (correct)
  6. Reset/episode control ✅ (correct)
  7. Git dependencies for packages

Overall Assessment

This is a high-quality contribution with excellent documentation and comprehensive testing. The core implementation follows OpenEnv patterns correctly:

  • ✅ Proper client-server separation (except for examples)
  • ✅ Rewards stay inside environment
  • ✅ No agent access to reset/simulation controls
  • ✅ Type safety with Pydantic
  • ✅ WebSocket for communication

Blocking issues: Fix the 2 Tier 1 items (server imports in examples/README).

Recommended next steps:

  1. Fix Tier 1 issues (should be quick - just update import patterns)
  2. Maintainer discussion on Tier 2 alignment points (especially "direct mode" pattern)
  3. Consider pre-building Docker image with cached binaries for faster cold starts

Recommendation: 🟡 APPROVE after Tier 1 fixes (with discussion of Tier 2 points)


Automated review by Claude Code | Learn more about OpenEnv's agentic workflow

Contributor

@Darktex Darktex left a comment


See Claude's review

@Darktex Darktex dismissed their stale review January 13, 2026 05:51

Dismissing automated approval due to bug in review bot. The original review either had blank content or approved despite finding blocking issues. Please disregard this approval.

@AlirezaShamsoshoara
Contributor Author

See Claude's review

@Darktex Thanks! Addressed the Tier 1 issues in new commits

Contributor

@Darktex Darktex left a comment


Let's land this one first

@Darktex Darktex merged commit 385c8d8 into huggingface:main Jan 15, 2026
4 checks passed

Labels

CLA Signed, New Environment


2 participants