[FEATURE] WebSocket-based Concurrency Architecture #239

rycerzes · 2025-12-07T20:28:21Z

Add WebSocket support with concurrent session management

Adds WebSocket endpoints for persistent environment sessions with configurable concurrency limits #194

High-level Diff

These are the changes on the server side:

- env = MyEnvironment()
  app = create_app(
-     env,
+     MyEnvironment,              # Pass class, not instance
      MyAction,
      MyObservation,
+     max_concurrent_envs=4,      # Allow 4 concurrent WebSocket sessions
  )

On the client side, it requires a change of URL:

from envs.echo_env import EchoEnv, EchoAction

+ client = EchoEnv(base_url="ws://localhost:8000/ws")
- client = EchoEnv(base_url="http://localhost:8000")

result = client.reset()
result = client.step(EchoAction(message="Hello!"))

# or async with
+ result = await client.reset()
+ result = await client.step(EchoAction(message="Hello!"))

This leads to high concurrency with limited resources:

Changes

- WebSocket endpoint at /ws with message protocol for reset/step/state/close
- Factory pattern support: pass environment class instead of instance to create per-session environments
- ConcurrencyConfig for setting max concurrent sessions, timeout, and capacity behavior
- CONCURRENCY_SAFE flag on environments (defaults to False) with startup validation
- Session capacity tracking and error handling
- New client: WebSocketEnvClient for persistent connections
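To make the reset/step/state/close protocol concrete, here is a minimal sketch of what a client might send over one persistent connection. Only the four `type` values come from this PR; the `build_message` helper, the `data` key, and the payload shape are assumptions for illustration, not the actual schema.

```python
import json
from typing import Any, Optional

# Message types named in this PR's protocol.
MESSAGE_TYPES = {"reset", "step", "state", "close"}


def build_message(msg_type: str, data: Optional[dict[str, Any]] = None) -> str:
    """Serialize a client message; the "data" key is a hypothetical payload field."""
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"Unknown message type: {msg_type}")
    message: dict[str, Any] = {"type": msg_type}
    if data is not None:
        message["data"] = data
    return json.dumps(message)


# One episode over a single connection would send these frames in order.
episode = [
    build_message("reset"),
    build_message("step", {"message": "Hello!"}),
    build_message("state"),
    build_message("close"),
]
```

Because every frame carries its own `type`, the server can keep one environment per connection and route each frame without any per-request session lookup.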

API

New types:

- ConcurrencyConfig(max_concurrent_envs, session_timeout_seconds, reject_on_capacity)
- SessionInfo and ServerCapacityStatus for session metadata
- WebSocket message types: WSResetMessage, WSStepMessage, WSStateMessage, WSCloseMessage
- Response types: WSObservationResponse, WSStateResponse, WSErrorResponse

Usage:

# Factory mode for concurrent sessions
app = create_app(
    env=MyEnvironment,  # Pass class, not instance
    max_concurrent_envs=4
)

Defaults to max_concurrent_envs=1 for backward compatibility. Environments must set CONCURRENCY_SAFE=True to allow higher concurrency.
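The startup validation described here could look roughly like the sketch below. It is not the code in this PR: `ConcurrencyConfigSketch` is a dataclass stand-in that only borrows the field names of `ConcurrencyConfig`, the defaults for the timeout and `reject_on_capacity` are guesses, and `validate_concurrency` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConcurrencyConfigSketch:
    """Stand-in for ConcurrencyConfig; field names follow the PR description."""

    max_concurrent_envs: int = 1  # default stated in the PR
    session_timeout_seconds: Optional[float] = None  # assumed default
    reject_on_capacity: bool = True  # assumed default


class UnsafeEnv:
    CONCURRENCY_SAFE = False  # the documented default for environments


class SafeEnv:
    CONCURRENCY_SAFE = True  # the environment opts in to concurrent sessions


def validate_concurrency(env_cls: type, config: ConcurrencyConfigSketch) -> None:
    """Hypothetical startup check: refuse more than one session unless the env opts in."""
    if config.max_concurrent_envs < 1:
        raise ValueError("max_concurrent_envs must be >= 1")
    is_safe = getattr(env_cls, "CONCURRENCY_SAFE", False)
    if config.max_concurrent_envs > 1 and not is_safe:
        raise ValueError(
            f"{env_cls.__name__} is not marked CONCURRENCY_SAFE; "
            f"cannot run {config.max_concurrent_envs} concurrent sessions"
        )


validate_concurrency(UnsafeEnv, ConcurrencyConfigSketch())  # one session: allowed
validate_concurrency(SafeEnv, ConcurrencyConfigSketch(max_concurrent_envs=4))  # opted in
```

Failing at startup rather than on the second connection means a misconfigured server never accepts traffic it cannot serve safely.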

TODO

- Session timeout enforcement (tracked but not implemented)
- openenv init needs the WebSocket code integrated into the template
- Resource monitoring (memory/CPU per session)
- Connection queueing when reject_on_capacity=False
- Mark safe environments as CONCURRENCY_SAFE=True
- Update envs to support concurrency

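As one possible direction for the queueing item, the sketch below shows how reject-versus-queue behaviour could be built on `asyncio.Semaphore`. `SessionSlots` and `CapacityError` are hypothetical names, and nothing here reflects how the PR tracks capacity internally.

```python
import asyncio


class CapacityError(RuntimeError):
    """Raised when the server is full and configured to reject new sessions."""


class SessionSlots:
    """Hypothetical capacity gate: reject immediately or wait for a free slot."""

    def __init__(self, max_concurrent_envs: int, reject_on_capacity: bool = True) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent_envs)
        self._reject = reject_on_capacity

    async def acquire(self) -> None:
        # locked() is True when no slot can be taken without waiting.
        if self._reject and self._semaphore.locked():
            raise CapacityError("server at capacity")
        # In queue mode this waits until another session releases its slot.
        await self._semaphore.acquire()

    def release(self) -> None:
        self._semaphore.release()


async def demo() -> list[str]:
    """With one slot and reject mode, a second session is refused until release."""
    events: list[str] = []
    slots = SessionSlots(max_concurrent_envs=1, reject_on_capacity=True)
    await slots.acquire()
    try:
        await slots.acquire()
    except CapacityError:
        events.append("rejected")
    slots.release()
    await slots.acquire()
    events.append("accepted")
    return events
```

With `reject_on_capacity=False` the same `acquire()` call simply suspends the connecting coroutine, so queueing needs no separate data structure in this design.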
…erver capabilities
- Introduced WebSocketEnvClient for persistent sessions with multi-step interactions.
- Updated HTTPEnvServer to support WebSocket connections and manage multiple concurrent environments.
- Added WebSocket message types and responses for better communication.
- Enhanced Environment interface with concurrency safety attributes.

rycerzes · 2025-12-07T20:35:13Z

@burtenshaw draft PR for the ws and concurrency. I have merged the #238 into this as well.

Few notes, before #232 gets merged:

~~openenv init generates boilerplate template according to the old structure~~
openenv init needs the WebSocket code integrated into the template:
- Add WebSocket client example/template
- Update server templates to show WebSocket endpoint usage
- Include documentation on CONCURRENCY_SAFE flag and concurrent sessions
VectorEnv abstraction for batched operations inspired by Gymnasium

burtenshaw · 2025-12-08T12:07:36Z

Amazing work @rycerzes . Thanks

openenv init generates boilerplate template according to the old structure.

I'll integrate this in a new PR for you to merge here.

VectorEnv abstraction for batched operations inspired by Gymnasium

I think we can leave this for a subsequent PR.

Also, this env might be useful to you. It's basically just a benchmarking env that lets you test concurrency asynchronously like this.

burtenshaw · 2025-12-08T12:52:48Z

@rycerzes could you help me to understand this please:

openenv init generates boilerplate template according to the old structure.

What do you mean by old structure? afaik #232 openenv init generates a template with a corresponding structure to the branch. i.e. from:

from openenv.core.env_server.interfaces import Environment
from openenv.core.env_server.types import State

rycerzes · 2025-12-08T13:22:34Z

@burtenshaw

Thanks for the clarification! You're absolutely right - I need to correct my earlier comment.

What do you mean by old structure? afaik #232 openenv init generates a template with a corresponding structure to the branch. i.e. from:
from openenv.core.env_server.interfaces import Environment
from openenv.core.env_server.types import State

I must have run openenv init from the main branch when I was testing, which would explain the confusion. The openenv init command on both the impl/concurrency branch and in #232 does generate the correct new structure with openenv.core imports.

I just verified this by running uv run openenv init test_env -o tests/ on the current branch, and it correctly generates all files with the new import structure. I have updated my above comment accordingly 👍

Also, this env might be useful to you. It's basically just a benchmarking env that lets you test concurrency asynchronously like this.

Thanks! That benchmark env would be perfect for testing the concurrency implementation. I'll take a look at it.
Apologies for the confusion on point 1!

Wauplin

Thanks for working on this very important piece @rycerzes! I've left quite a few comments on how I would do things, but some parts are left to the maintainers' decision 🤗 Especially:

should we allow "instantiate a server by passing an env instead of an env factory" to keep backward compatibility? => I would say "no" since project is still in early phase
should we maintain both a "HTTP-based interface" and a "websocket-based interface"? => same, I would say "no" as it means doubling the amount of work (2 paths in the http server and 2 very similar clients to maintain with the same interface but different internal logic). Better to keep only 1 interface that is more robust for the future. End users should not be impacted by this decision (except for the breaking change to adapt).

Apart from that, I usually advise simplifying logic by not adding too many optional features at first. More options usually mean more internal logic and more maintenance burden in the long run. So if something is not explicitly required, let's keep it for later.

Note that I haven't run the code myself. Will give it a try soon!

src/openenv/core/env_server/types.py

Wauplin · 2025-12-08T14:07:05Z

src/openenv/core/env_server/types.py

+    model_config = ConfigDict(
+        extra="forbid",
+        validate_assignment=True,
+    )


can be factorized in the base config I mentioned above (same for other BaseModels)

Wauplin · 2025-12-08T14:15:07Z

src/openenv/core/env_server/types.py

+class ServerCapacityStatus(BaseModel):
+    """Status of server capacity for concurrent sessions."""
+
+    model_config = ConfigDict(
+        extra="forbid",
+        validate_assignment=True,
+    )
+
+    active_sessions: int = Field(
+        ge=0,
+        description="Number of currently active sessions",
+    )
+    max_sessions: int = Field(
+        ge=1,
+        description="Maximum number of allowed sessions",
+    )
+    available_slots: int = Field(
+        ge=0,
+        description="Number of available session slots",
+    )
+    is_at_capacity: bool = Field(
+        description="Whether the server has reached maximum capacity",
+    )
+
+    @classmethod
+    def from_counts(cls, active: int, max_sessions: int) -> "ServerCapacityStatus":
+        """Create status from active and max session counts."""
+        available = max(0, max_sessions - active)
+        return cls(
+            active_sessions=active,
+            max_sessions=max_sessions,
+            available_slots=available,
+            is_at_capacity=active >= max_sessions,


I feel this class could be simplified with something like this:

from pydantic import BaseModel, Field, model_validator


class ServerCapacityStatus(BaseModel):
    """Status of server capacity for concurrent sessions."""

    active_sessions: int = Field(ge=0, description="Number of currently active sessions")
    max_sessions: int = Field(ge=1, description="Maximum number of allowed sessions")

    @model_validator(mode="after")
    def check_capacity_bounds(self) -> "ServerCapacityStatus":
        if self.active_sessions > self.max_sessions:
            raise ValueError(
                f"active_sessions ({self.active_sessions}) cannot exceed "
                f"max_sessions ({self.max_sessions})"
            )
        return self

    @property
    def available_slots(self) -> int:
        """Number of available session slots"""
        return self.max_sessions - self.active_sessions

    @property
    def is_at_capacity(self) -> bool:  # Not sure this property is really necessary
        """Whether the server has reached maximum capacity"""
        return self.available_slots == 0

This way available_slots and is_at_capacity are inferred properties, not stored values. And we always validate that active and max sessions are coherent.
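The benefit of derived properties can be checked without pydantic. The sketch below is a dependency-free analogue built on a standard-library dataclass; `CapacityStatusSketch` is a hypothetical name and is not the class proposed for `types.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityStatusSketch:
    """Stores only the two counts; everything else is computed on access."""

    active_sessions: int
    max_sessions: int

    def __post_init__(self) -> None:
        # Same bounds as the Field constraints plus the cross-field check.
        if self.active_sessions < 0:
            raise ValueError("active_sessions must be >= 0")
        if self.max_sessions < 1:
            raise ValueError("max_sessions must be >= 1")
        if self.active_sessions > self.max_sessions:
            raise ValueError("active_sessions cannot exceed max_sessions")

    @property
    def available_slots(self) -> int:
        return self.max_sessions - self.active_sessions

    @property
    def is_at_capacity(self) -> bool:
        return self.available_slots == 0


status = CapacityStatusSketch(active_sessions=3, max_sessions=4)
full = CapacityStatusSketch(active_sessions=4, max_sessions=4)
```

Since `available_slots` and `is_at_capacity` are never stored, a caller cannot construct an object whose fields disagree with each other, which the four-field version would permit.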

Wauplin · 2025-12-08T16:01:50Z

src/openenv/core/env_server/http_server.py

+        # Register concurrency config endpoint
+        @app.get(
+            "/concurrency",
+            response_model=ConcurrencyConfig,
+            tags=["Environment Info"],
+            summary="Get concurrency configuration",
+            description="""
+Get the current concurrency configuration for this server.
+
+Returns information about:
+- **max_concurrent_envs**: Maximum number of concurrent WebSocket sessions
+- **session_timeout_seconds**: Timeout for inactive sessions (None if no timeout)
+- **reject_on_capacity**: Whether to reject or queue connections at capacity
+            """,
+        )
+        async def get_concurrency_config() -> ConcurrencyConfig:
+            """Return concurrency configuration."""
+            return self._concurrency_config
+


Why not, but I'm not sure it's necessary?

Wauplin · 2025-12-08T16:06:34Z

src/openenv/core/env_server/http_server.py

+                    msg_type = message_dict.get("type", "")
+
+                    try:
+                        if msg_type == "reset":


I feel logic could be simplified like this:

try:
    match msg_type:
        case "reset":
            ...  # todo: implement
            response = WSObservationResponse(...)
        case "step":
            ...  # todo: implement
            response = WSObservationResponse(...)
        case "state":
            ...  # todo: implement
            response = WSStateResponse(...)
        case "close":
            ...  # todo: implement
        case _:
            response = WSErrorResponse(
                data={"message": f"Unknown message type: {msg_type}", "code": "UNKNOWN_TYPE"}
            )
    await websocket.send_text(response.model_dump_json())
except ValidationError as e:
    error_resp = WSErrorResponse(
        data={"message": "Invalid message", "code": "VALIDATION_ERROR", "errors": e.errors()}
    )
    await websocket.send_text(error_resp.model_dump_json())
except Exception as e:
    error_resp = WSErrorResponse(
        data={"message": str(e), "code": "EXECUTION_ERROR"}
    )
    await websocket.send_text(error_resp.model_dump_json())

This way you have clear logic based on msg_type value + the validation errors are all caught in the same place

Wauplin · 2025-12-08T16:08:34Z

src/openenv/core/env_server/http_server.py


 def create_app(
-    env: Environment,
+    env: Union[Environment, Callable[[], Environment], Type[Environment]],


Suggested change

-    env: Union[Environment, Callable[[], Environment], Type[Environment]],
+    env: Callable[[], Environment],

should be enough if we break backward compat'? (at least for now since we don't accept inputs for environment resets yet)
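A short sketch of why the narrower annotation may be sufficient: a class with a no-argument constructor is itself a `Callable[[], Environment]`, and `functools.partial` covers constructors that need arguments. The `Environment` stub, `CounterEnv`, and `make_sessions` are stand-ins invented for this example, not the real base class or server code.

```python
from functools import partial
from typing import Callable


class Environment:
    """Minimal stand-in for the real Environment base class."""


class CounterEnv(Environment):
    def __init__(self, start: int = 0) -> None:
        self.count = start


def make_sessions(env_factory: Callable[[], Environment], n: int) -> list[Environment]:
    """Hypothetical helper: build one fresh environment per session."""
    return [env_factory() for _ in range(n)]


# Passing the class works because calling it with no arguments yields an instance.
sessions = make_sessions(CounterEnv, 2)

# Constructor arguments can be bound ahead of time with partial.
configured = make_sessions(partial(CounterEnv, start=10), 2)
```

Each call to the factory returns a distinct object, so state cannot leak between sessions the way it would with a single shared instance.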

Wauplin · 2025-12-08T16:10:24Z

src/openenv/core/ws_env_client.py

+try:
+    import websockets
+    from websockets.sync.client import connect as ws_connect
+except ImportError:
+    websockets = None  # type: ignore
+    ws_connect = None  # type: ignore


Since websockets is made a required dependency in pyproject.toml I think we should consider it as always available (simplifies a bit the logic)

Wauplin · 2025-12-08T16:11:51Z

src/openenv/core/ws_env_client.py

+        ws_url = base_url.rstrip("/")
+        if ws_url.startswith("http://"):
+            ws_url = "ws://" + ws_url[7:]
+        elif ws_url.startswith("https://"):
+            ws_url = "wss://" + ws_url[8:]
+        elif not ws_url.startswith("ws://") and not ws_url.startswith("wss://"):
+            ws_url = "ws://" + ws_url


(nit) could be a unit-tested helper (can be hard to track all specificities when updating this type of logic in the future)
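A minimal sketch of such a helper with the same branching as the diff above, plus a test that pins each case. The name `to_ws_url` is hypothetical.

```python
def to_ws_url(base_url: str) -> str:
    """Convert an HTTP(S) or bare base URL to its WebSocket equivalent."""
    url = base_url.rstrip("/")
    if url.startswith("http://"):
        return "ws://" + url[len("http://"):]
    if url.startswith("https://"):
        return "wss://" + url[len("https://"):]
    if url.startswith(("ws://", "wss://")):
        return url  # already a WebSocket URL
    return "ws://" + url  # no scheme given: assume plain ws


def test_to_ws_url() -> None:
    assert to_ws_url("http://localhost:8000") == "ws://localhost:8000"
    assert to_ws_url("https://example.com/") == "wss://example.com"
    assert to_ws_url("ws://localhost:8000/ws") == "ws://localhost:8000/ws"
    assert to_ws_url("wss://example.com/ws/") == "wss://example.com/ws"
    assert to_ws_url("localhost:8000") == "ws://localhost:8000"
```

Slicing with `len("http://")` instead of the literals 7 and 8 keeps the offsets correct if a prefix is ever edited.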

burtenshaw · 2025-12-08T19:42:59Z

@pankit-eng @zkwentz Can you validate these two backward compatibility points from @Wauplin on this PR? In short, should we go all in on websockets or maintain an HTTP implementation?

should we allow "instantiate a server by passing an env instead of an env factory" to keep backward compatibility? => I would say "no" since project is still in early phase

Server side app will look like this:

# Factory mode for concurrent sessions
app = create_app(
    env=MyEnvironment,  # Pass class, not instance
    max_concurrent_envs=4
)

should we maintain both a "HTTP-based interface" and a "websocket-based interface"? => same, I would say "no" as it means doubling the amount of work (2 paths in the http server and 2 very similar clients to maintain with the same interface but different internal logic). Better to keep only 1 interface that is more robust for the future. End users should not be impacted by this decision (except for the breaking change to adapt).

iiuc, the client code will only look like this:

from envs.echo_env import EchoEnv, EchoAction

client = EchoEnv(base_url="ws://localhost:8000/ws")

result = await client.reset()
result = await client.step(EchoAction(...))

burtenshaw · 2025-12-10T14:11:10Z

@rycerzes I tested out this branch and it worked well. I updated the PR description myself with a high-level before and after snippet and some benchmarking info.

rycerzes · 2025-12-10T16:18:14Z

Thanks @Wauplin for the detailed review! Really appreciate all the feedback - the suggestions on simplifying the message types with discriminators, refactoring the capacity status, and cleaning up the validation logic make a lot of sense. I'll work through these and have them resolved by end of Friday.

@burtenshaw Thanks for testing the branch and updating the PR description with the benchmarking info!

rycerzes and others added 5 commits December 4, 2025 23:01

impl concurrency management and session handling

e0a063d

add async to http server

95563b0

Merge remote-tracking branch 'origin/async-http' into impl/concurrency

1584902

concurrency config

3601357

meta-cla bot added the CLA Signed label Dec 7, 2025

rycerzes changed the base branch from main to release December 7, 2025 20:28

burtenshaw mentioned this pull request Dec 8, 2025

[RELEASE] 0.2.0 #232

Draft

burtenshaw changed the title ~~feat: WebSocket-based Concurrency Architecture~~ [FEATURE] WebSocket-based Concurrency Architecture Dec 8, 2025

chore: add websockets to pyproject.toml

600acb4

Wauplin reviewed Dec 8, 2025


burtenshaw mentioned this pull request Dec 10, 2025

[ENHANCEMENT] use websockets in cli template #243

Open

burtenshaw mentioned this pull request Dec 10, 2025

add HuggingFace-style AutoEnv and AutoAction classes #222

Open

