[Enhancement] - Add new environment - Support Unity ML-Agents-Envs by AlirezaShamsoshoara · Pull Request #285 · huggingface/OpenEnv

AlirezaShamsoshoara · 2026-01-12T05:37:56Z

Add Unity ML-Agents Environment

Summary

This PR adds a new Unity ML-Agents environment wrapper to OpenEnv, providing access to Unity's reinforcement learning environments (PushBlock, 3DBall, GridWorld, etc.) through the standardized OpenEnv HTTP/WebSocket interface.

Features

Full Unity ML-Agents Integration: Wraps all environments from the ML-Agents default registry (17+ environments)
Multiple Deployment Modes:
- Direct Mode: Run Unity environments directly in-process (recommended for local development)
- Server Mode: Client-server architecture via HTTP/WebSocket
- Docker Mode: Containerized deployment for production/cloud environments
Action Space Support: Both discrete (PushBlock, GridWorld) and continuous (3DBall) action spaces
Dynamic Environment Switching: Switch between environments at runtime without restarting
Headless Mode: Run without graphics for faster training
HuggingFace Spaces Ready: Configured for deployment on HuggingFace Spaces

New Files

envs/unity_env/
├── README.md              # Comprehensive documentation
├── pyproject.toml         # Package configuration
├── client.py              # EnvClient implementation
├── models.py              # Action, Observation, State models
├── assets/                # Demo GIFs
│   ├── unity_pushblock.gif
│   └── unity_3dball.gif
└── server/
    ├── Dockerfile         # Docker configuration
    ├── app.py             # FastAPI server
    └── unity_environment.py  # Core environment wrapper

examples/
└── unity_simple.py        # Example usage script

tests/envs/
└── test_unity_environment.py  # Comprehensive test suite (19 tests)

Supported Environments

Environment	Action Type	Description
PushBlock	Discrete (7)	Push a block to a goal position
3DBall	Continuous (2)	Balance a ball on a platform
3DBallHard	Continuous (2)	Harder version of 3DBall
GridWorld	Discrete (5)	Navigate a grid to find goals
Basic	Discrete (3)	Simple left/right movement
+ 12 more	Various	All ML-Agents registry environments

Usage Examples

# Direct mode (simplest)
from envs.unity_env.client import UnityEnv
from envs.unity_env.models import UnityAction

env = UnityEnv.from_direct(no_graphics=True)
result = env.reset(env_id="PushBlock")
action = UnityAction(discrete_actions=[1])  # Move forward
result = env.step(action)
env.close()

# Server mode
with UnityEnv(base_url="http://localhost:8000") as env:
    result = env.reset(env_id="3DBall")
    action = UnityAction(continuous_actions=[0.5, -0.3])
    result = env.step(action)

Known Limitations

Apple Silicon + Docker: Docker mode does not work on M1/M2/M3/M4 Macs due to Unity's Mono runtime crashing under x86_64 emulation. Use direct mode or server mode instead (documented in README).
First Run: Downloads ~500MB of Unity binaries on first use (cached for subsequent runs)
Single Worker: Unity environments are not thread-safe; use workers=1

Test Plan

All 19 unit tests pass (pytest tests/envs/test_unity_environment.py -v)
Direct mode tested locally on macOS
Server mode tested locally
Docker mode tested on x86_64 Linux (GitHub Actions / cloud VM)
HuggingFace Spaces deployment tested
Documentation reviewed

Dependencies

mlagents-envs (installed from Unity ML-Agents git repository)
openenv-core[core] (installed from git for latest features)
fastapi, uvicorn, pydantic, numpy, pillow

AlirezaShamsoshoara · 2026-01-12T05:41:53Z

The pushed env on HF is available here:
https://huggingface.co/spaces/Crashbandicoote2/unity_env

Darktex

Summary

This PR adds a new Unity ML-Agents environment wrapper to OpenEnv, providing access to Unity's reinforcement learning environments (PushBlock, 3DBall, GridWorld, etc.) through the standardized OpenEnv HTTP/WebSocket interface.

Overall Assessment: Well-structured, comprehensive implementation with excellent documentation. A few minor issues to address.

Highlights

Excellent Documentation: 611-line README with installation options, usage modes, API reference, and troubleshooting
Proper OpenEnv Pattern: Uses factory pattern correctly with create_app(UnityMLAgentsEnvironment, ...)
Comprehensive Tests: 19 tests covering core functionality
Docker Support: Proper Dockerfile with headless mode considerations

Issues

1. Duplicated Fix from PR #286 (MINOR)

This PR includes the same tomli compatibility fix as PR #286. Coordinate to avoid duplication.

2. CI Workflow Changes (IMPORTANT)

The PR modifies .github/workflows/docker-build.yml:

Changes path filters from src/** to envs/**
Renames my-env to connect4_env
Adds custom context for Unity build

Please verify these changes don't break existing builds.

3. HuggingFace Spaces Link

The docs reference https://huggingface.co/spaces/Crashbandicoote2/unity_env - verify this link after deployment.

RFC Alignment

Requirement	Status
Uses `create_app()` with class factory	✓
Implements `reset()` → `Observation`	✓
Implements `step(action)` → `Observation`	✓
Has `state` property	✓
Action/Observation extend base types	✓
Dockerized deployment	✓

Verdict

APPROVE with minor comments - High-quality environment contribution. Issues are minor and can be addressed during merge.

Reviewed by Claude

Darktex

Note: This is an automated review by Claude Code (alignment-reviewer agent), not a human review. The account posting this is shared with the human maintainer.

Alignment Review Report

PR Summary

This PR adds a comprehensive Unity ML-Agents environment wrapper to OpenEnv, providing access to 17+ Unity RL environments (PushBlock, 3DBall, GridWorld, etc.) through the standardized OpenEnv interface. The implementation includes:

Full environment wrapper with server/client architecture
Docker deployment support
Comprehensive test suite (19 tests)
Documentation and example scripts
Multi-mode deployment (direct, server, Docker)

Files changed: 18 files, +2858/-9 lines

Automated Checks

Code Quality

Lint: Not run (ruff not available in review environment)
Debug code: CLEAN - No debug statements found in production code
- Print statements found are only in docstrings, examples, and documentation (acceptable)
Tests: 19 comprehensive unit tests provided
Documentation: Extensive README with usage examples

Tier 1: Fixes Required

Critical Issues

1. Client-Server Separation Violation

Location: examples/unity_simple.py:359

from envs.unity_env.server.unity_environment import UnityMLAgentsEnvironment

Issue: The example script imports directly from the server directory, violating the client-server separation invariant (INVARIANTS.md #2).

Why this matters: Client code should never import from server/ directory. This creates tight coupling and breaks the architectural boundary between client and server.

Fix:

For the "direct mode" use case in the example, the pattern should be to use EnvClient.from_direct() or similar factory method that doesn't expose server internals
OR document this as an advanced/development-only pattern and move it to a separate dev example
OR use the environment via the client interface even in direct mode
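The first option can be sketched as a client-side factory that receives the environment through a callable, so user scripts never import from server/ themselves. This is a minimal illustration only: `DirectClient`, `Environment`, and `CounterEnv` are hypothetical names invented for the sketch and are not the PR's `UnityEnv` or OpenEnv's `EnvClient` API.

```python
from typing import Any, Callable, Protocol


class Environment(Protocol):
    """Minimal interface the client needs from an in-process environment."""

    def reset(self) -> Any: ...
    def step(self, action: Any) -> Any: ...
    def close(self) -> None: ...


class DirectClient:
    """Hypothetical client that drives an environment in the same process."""

    def __init__(self, env: Environment) -> None:
        self._env = env

    @classmethod
    def from_direct(cls, make_env: Callable[[], Environment]) -> "DirectClient":
        # In a real package make_env would live next to the client and do the
        # server import lazily, so user scripts never import from server/.
        return cls(make_env())

    def reset(self) -> Any:
        return self._env.reset()

    def step(self, action: Any) -> Any:
        return self._env.step(action)

    def close(self) -> None:
        self._env.close()


class CounterEnv:
    """Stand-in environment used only to make the sketch runnable."""

    def __init__(self) -> None:
        self.steps = 0

    def reset(self) -> int:
        self.steps = 0
        return self.steps

    def step(self, action: Any) -> int:
        self.steps += 1
        return self.steps

    def close(self) -> None:
        pass


client = DirectClient.from_direct(CounterEnv)
client.reset()
print(client.step("noop"))  # prints 1
client.close()
```

The point of the design is that the only code aware of the server-side class is the factory callable, which keeps the import boundary in one place.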

Severity: MUST FIX before merge

2. README Documentation Contains Server Import

Location: envs/unity_env/README.md:89, 362, 455

from envs.unity_env.server.unity_environment import UnityMLAgentsEnvironment

Issue: Documentation examples show importing from server directory.

Fix: Update documentation to use proper client interface patterns or clearly mark these as internal/development examples.

Severity: MUST FIX before merge

Minor Issues

3. Python Version Compatibility Fix

Location: src/openenv/cli/_validation.py:15-19

Change: Added fallback to tomli for Python <3.11 (tomllib not available)

Assessment: ✅ GOOD - This is a proper compatibility fix for older Python versions. The PR description mentions Python 3.10.12 is required for ML-Agents.
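A fallback of this kind is commonly written as a guarded import. The following is a minimal sketch, assuming the third-party tomli package is installed on Python versions older than 3.11; the `load_project_name` helper and the sample TOML text are hypothetical and are not taken from `_validation.py`.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API


def load_project_name(pyproject_text: str) -> str:
    """Return [project].name from the text of a pyproject.toml file."""
    data = tomllib.loads(pyproject_text)
    return data["project"]["name"]


# Illustrative input only, not the PR's actual pyproject.toml.
example = '[project]\nname = "unity_env"\n'
print(load_project_name(example))
```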

Tier 2: Alignment Discussion Points

1. Long Initialization Times & Timeout Configuration

Observation: The Unity environment can take 30-60+ seconds to initialize (downloading ~500MB binaries on first run). The code uses custom timeout configurations:

message_timeout_s: float = 180.0 (3 minutes) in client
ping_timeout=120 (2 minutes) in WebSocket connection
Test fixture waits up to 60 seconds for server health
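The "wait up to 60 seconds for server health" behaviour can be illustrated with a generic polling helper. This is a sketch under stated assumptions, not the PR's test fixture: `wait_until_healthy` and `fake_check` are hypothetical names, and a real probe would replace `fake_check` with an HTTP request to the server's health endpoint.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if check():
                return True
        except OSError:
            # Connection refused and similar errors mean "not up yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in for a real health probe: reports healthy on the third call.
calls = {"n": 0}


def fake_check() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until_healthy(fake_check, timeout_s=5.0, interval_s=0.01))
```

Using `time.monotonic()` for the deadline keeps the timeout unaffected by wall-clock adjustments during a long start-up.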

Principle at stake: User experience & production readiness (PRINCIPLES.md: "Production-readiness from day one")

The concern: While the implementation handles this correctly with appropriate timeouts, the 30-60s initialization time could be surprising in production. The caching strategy (persistent ~/.mlagents-cache) helps, but:

First deployment on new infrastructure will be slow
Docker cold starts will download binaries inside container
Could impact autoscaling scenarios

Questions for maintainer:

Should there be a pre-built Docker image with cached binaries?
Should the README have a more prominent warning about first-run time?
Is there a way to pre-download binaries during Docker build?

Suggested reviewer: @Darktex

2. Direct Mode Pattern

Observation: The implementation provides three modes:

Direct mode: UnityMLAgentsEnvironment instantiated directly (server code)
Server mode: Client connects to running server via WebSocket
Docker mode: Client auto-starts Docker container

The "direct mode" is actively promoted in the README as "recommended for local development" and uses direct server imports.

Principle at stake: Client-server separation (INVARIANTS.md #2)

The concern: Is "direct mode" an acceptable pattern in OpenEnv? Other environments (echo_env, snake_env) don't prominently feature this pattern. This creates two ways to use environments:

Via client interface (clean separation)
Via direct server imports (breaks separation)

Trade-off: Direct mode is convenient for development and avoids server overhead, but it:

Violates the architectural boundary
Creates confusion about which pattern to use
May encourage anti-patterns in user code

Questions for maintainer:

Is direct mode acceptable as a development-only pattern?
Should it be clearly marked as "advanced/internal" use?
Should we have an EnvClient.from_direct() factory that maintains the abstraction?

Suggested reviewer: @Darktex

3. Apple Silicon / Docker Compatibility

Observation: The README documents that Docker mode does NOT work on Apple Silicon (M1/M2/M3/M4) due to Unity's Mono runtime crashing under x86_64 emulation. Direct mode is recommended instead.

Principle at stake: Container isolation & reproducibility (PRINCIPLES.md: "Container isolation for reproducibility")

The concern: One of OpenEnv's core principles is container isolation. Having a major platform (Apple Silicon, increasingly common in development) where Docker doesn't work undermines this principle.

Assessment: This appears to be an upstream Unity limitation (not fixable in OpenEnv), and the PR handles it well:

Clear documentation of the limitation
Alternative modes provided (direct, server)
Platform-specific guidance in README

Recommendation: ✅ ACCEPTABLE - This is a well-documented limitation with workarounds. Consider adding a warning in the Docker build process that detects Apple Silicon.

Suggested reviewer: @Darktex

4. Single Worker Limitation

Observation: Unity environments are not thread-safe. The code enforces workers=1 and documents this in multiple places.

Principle at stake: Production readiness & scalability

The concern: Single worker limits scalability. In production with high traffic, this could be a bottleneck.

Assessment: ✅ ACCEPTABLE - This is an upstream Unity limitation, not an OpenEnv issue. The implementation handles it correctly:

Clearly documented
Enforced in code comments
WebSocket session support mitigates this (each connection can have its own env instance)

Note: The environment sets SUPPORTS_CONCURRENT_SESSIONS = False which is correct.
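For illustration of what "not thread-safe" implies inside a single process, the sketch below serializes calls with a lock. It is not what the PR does (the PR relies on workers=1 and SUPPORTS_CONCURRENT_SESSIONS = False), and a lock only protects threads within one process, so it does not replace the single-worker setting. `CountingEnv`, `SerializedEnv`, and `run_workers` are hypothetical names.

```python
import threading
from typing import Any


class CountingEnv:
    """Stand-in for an environment that must not be stepped concurrently."""

    def __init__(self) -> None:
        self.steps = 0

    def step(self, action: Any) -> int:
        current = self.steps
        self.steps = current + 1  # read-modify-write, unsafe without a lock
        return self.steps


class SerializedEnv:
    """Wrapper that lets only one thread call step() at a time."""

    def __init__(self, env: CountingEnv) -> None:
        self._env = env
        self._lock = threading.Lock()

    def step(self, action: Any) -> int:
        with self._lock:
            return self._env.step(action)


def run_workers(env: SerializedEnv, n_threads: int, n_steps: int) -> None:
    """Step the wrapped environment from several threads and wait for them."""

    def work() -> None:
        for _ in range(n_steps):
            env.step(None)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


inner = CountingEnv()
run_workers(SerializedEnv(inner), n_threads=4, n_steps=250)
print(inner.steps)  # 1000: every step ran under the lock
```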

5. Reward Computation

Observation: Rewards come from the Unity environment itself (lines 323 and 328 in unity_environment.py):

reward = float(terminal_steps[terminal_steps.agent_id[0]].reward)
reward = float(decision_steps[decision_steps.agent_id[0]].reward)

Principle at stake: Rewards inside environment (PRINCIPLES.md, RFC 002)

Assessment: ✅ CORRECT - Rewards are computed by the Unity environment and passed through. No external reward computation. This follows the "rewards inside environment" principle.

6. Episode Termination / Reset Control

Observation: The environment correctly implements:

reset() for orchestration (returns new episode)
step() returns done=True when episode ends
No MCP tools expose reset/step to agents

Principle at stake: "Agents cannot reset" (INVARIANTS.md #1, PRINCIPLES.md)

Assessment: ✅ CORRECT - No violations found. The Gym-like API is only exposed via WebSocket for orchestration, not to agents via MCP.

7. Git Dependency for Packages

Observation: pyproject.toml installs packages from git:

"openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git"
"mlagents-envs @ git+https://github.com/Unity-Technologies/ml-agents.git#subdirectory=ml-agents-envs"

Principle at stake: Stability & reproducibility

The concern: Git dependencies can break if:

Upstream repo changes/deletes branches
Network issues during install
Commit hashes not pinned (subject to upstream changes)
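Whether a git requirement is pinned can be checked mechanically. The sketch below is an assumption-laden illustration: `is_pinned_to_commit` is a hypothetical helper, it only recognizes a full 40-character hexadecimal hash after `@`, and the hash in the pinned example is a placeholder of zeros, not a real commit.

```python
import re

# A full git commit hash is 40 hexadecimal characters, placed after "@"
# and followed either by a "#fragment" or by the end of the string.
_PINNED = re.compile(r"@[0-9a-f]{40}(?:#|$)")


def is_pinned_to_commit(requirement: str) -> bool:
    """Return True if a git requirement names a full commit hash."""
    return bool(_PINNED.search(requirement))


unpinned = (
    "mlagents-envs @ git+https://github.com/Unity-Technologies/"
    "ml-agents.git#subdirectory=ml-agents-envs"
)
# Placeholder hash for illustration only.
pinned = (
    "mlagents-envs @ git+https://github.com/Unity-Technologies/"
    "ml-agents.git@" + "0" * 40 + "#subdirectory=ml-agents-envs"
)

print(is_pinned_to_commit(unpinned), is_pinned_to_commit(pinned))
```

A branch or tag reference such as `@main` is deliberately treated as unpinned here, since both can move to a different commit upstream.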

Questions for maintainer:

Should these be pinned to specific commit hashes?
Is there a plan to use PyPI versions when available?
This pattern exists in other envs - is it standard for OpenEnv?

Suggested reviewer: @Darktex

Tier 3: Positive Observations

Things Done Well ✅

Excellent Documentation: The README is comprehensive with:
- Multiple deployment modes explained
- Troubleshooting section
- Platform-specific guidance (Apple Silicon)
- Clear examples for each mode
Comprehensive Testing: 19 unit tests covering:
- Health endpoints
- Reset/step functionality
- Environment switching
- Action spaces (discrete & continuous)
- State tracking
Type Safety: Proper use of Pydantic models and generics:
- UnityAction, UnityObservation, UnityState all extend base types
- Type annotations throughout
Error Handling: Good error messages and validation:
- Environment ID validation
- Timeout handling for slow initialization
- Graceful cleanup in __del__ and close()
Async Support: Implements reset_async() and step_async() to avoid blocking event loop during slow Unity initialization
Caching Strategy: Persistent cache for Unity binaries avoids re-downloading
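The async point can be illustrated with a common pattern: running the blocking call in a worker thread so the event loop keeps serving other tasks. This is a sketch only; the PR's actual reset_async() implementation is not shown in this thread, and `SlowEnv` is a hypothetical stand-in whose sleep represents slow start-up work.

```python
import asyncio
import time


class SlowEnv:
    """Stand-in for an environment whose reset() blocks for a while."""

    def reset(self) -> str:
        time.sleep(0.05)  # placeholder for slow start-up work
        return "ready"

    async def reset_async(self) -> str:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.reset)


async def main() -> list:
    env = SlowEnv()
    ticks = []

    async def heartbeat() -> None:
        # Keeps running while reset() is blocked in its worker thread.
        for i in range(3):
            ticks.append(i)
            await asyncio.sleep(0.01)

    result, _ = await asyncio.gather(env.reset_async(), heartbeat())
    return [result, ticks]


print(asyncio.run(main()))
```

`asyncio.to_thread` requires Python 3.9 or newer; on older versions `loop.run_in_executor` serves the same purpose.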

Summary

Tier 1 Issues: 2 critical, 0 minor

Critical items to fix before merge:

Remove server imports from examples/unity_simple.py (client-server separation violation)
Update README to use proper client interface patterns

Tier 2 Issues: 7 alignment discussion points

Items for human review:

Long initialization times & production implications
"Direct mode" pattern and architectural boundaries
Apple Silicon Docker compatibility (well-documented limitation)
Single worker limitation (upstream constraint)
Reward computation ✅ (correct)
Reset/episode control ✅ (correct)
Git dependencies for packages

Overall Assessment

This is a high-quality contribution with excellent documentation and comprehensive testing. The core implementation follows OpenEnv patterns correctly:

✅ Proper client-server separation (except for examples)
✅ Rewards stay inside environment
✅ No agent access to reset/simulation controls
✅ Type safety with Pydantic
✅ WebSocket for communication

Blocking issues: Fix the 2 Tier 1 items (server imports in examples/README).

Recommended next steps:

Fix Tier 1 issues (should be quick - just update import patterns)
Maintainer discussion on Tier 2 alignment points (especially "direct mode" pattern)
Consider pre-building Docker image with cached binaries for faster cold starts

Recommendation: 🟡 APPROVE after Tier 1 fixes (with discussion of Tier 2 points)

Automated review by Claude Code | Learn more about OpenEnv's agentic workflow

Darktex

See Claude's review

Dismissing automated approval due to bug in review bot. The original review either had blank content or approved despite finding blocking issues. Please disregard this approval.

AlirezaShamsoshoara · 2026-01-15T00:04:03Z

See Claude's review

@Darktex Thanks! Addressed the Tier 1 issues in new commits

Darktex

Let's land this one first

meta-cla Bot added the CLA Signed label Jan 12, 2026

AlirezaShamsoshoara added the New Environment label Jan 12, 2026

Darktex mentioned this pull request Jan 12, 2026

Fix tomli compatibility issue for older Python version #286

Merged

Darktex reviewed Jan 12, 2026

This comment was marked as outdated.

Darktex reviewed Jan 13, 2026

Darktex suggested changes Jan 13, 2026

AlirezaShamsoshoara added 20 commits January 15, 2026 10:35

add the updates to the docker build for HF and github

5f068bb

add the unity env to the env doc md

e5b5088

add the unity example to the example dir

58a6330

add gif contents for the README

e942c0f

add gitignore for unity env

a9e7c4b

add README file for unity env

0fd1d40

add init for unity env

3ff4271

add the client for unity env

d45b1ff

add the models for the unity env

7615b74

add the unity env yaml file

f1a813f

add the unity_env pyproject

d95f8ea

add the server init

66d22ae

add the docker file

5914f5c

add unity app and environment

4cae1c8

add the unity env unit tests

5a2d783

ruff format on unit tests for unity

e5c5426

fix the naming and path of the example for the README file

470eb20

fix the naming issue in unity_simple docstring on how to run examples

fbe9d87

addressed the Tier 1 review feedback

2bc7bd2

fix the lint issue on main

b5e05a8

AlirezaShamsoshoara force-pushed the ali/feature/unity_env branch from aae2b6a to b5e05a8 Compare January 15, 2026 18:37

Darktex approved these changes Jan 15, 2026

Darktex merged commit 385c8d8 into huggingface:main Jan 15, 2026
4 checks passed
