openenvrolloutprocessor by shreymodi1 · Pull Request #336 · eval-protocol/python-sdk

shreymodi1 · 2025-11-17T20:58:42Z

Note

Adds a generic OpenEnv rollout processor with vLLM/TRL integration, a VLLMPolicy adapter, an execution metadata bag, and new integration tests plus optional OpenEnv deps.

OpenEnv Integration:
- OpenEnvRolloutProcessor (eval_protocol/pytest/openenv_rollout_processor.py): generic processor for OpenEnv HTTPEnvClient; runs rollout loops, builds prompts, calls policy (default LiteLLMPolicy or injected), tracks token usage, collects per-step rewards, and stores prompt_ids/completion_ids in execution_metadata.extra.
- create_openenv_vllm_rollout_func (eval_protocol/pytest/integrations/openenv_trl_vllm.py): bridges TRL with OpenEnv using VLLMPolicy; supports task rotation; returns GRPO-style prompt_ids/completion_ids/eval_score.
LLM Policy:
- VLLMPolicy (eval_protocol/mcp/execution/vllm_policy.py): converts chat messages to a prompt, calls TRL vLLM (server or colocated), decodes output, and returns OpenAI-compatible responses including raw token IDs.
Models:
- Extend ExecutionMetadata with extra: Dict[str, Any] for integration-specific data (e.g., step rewards, token IDs).
Tests:
- Add integration tests for BrowserGym (basic and eval), Echo (Hub), and TextArena (Docker) under tests/pytest/* (skipped on CI).
Config:
- Add openenv optional dependency group in pyproject.toml for OpenEnv packages.

Written by Cursor Bugbot for commit 707f7cd. This will update automatically on new commits.
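As a rough illustration of the `extra` metadata bag mentioned in the summary above, the sketch below uses a plain dataclass. The class name `ExecutionMetadataSketch`, the helper `record_rollout`, and the `step_rewards` key are hypothetical; the real `ExecutionMetadata` model in the SDK may be defined differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ExecutionMetadataSketch:
    """Hypothetical stand-in for ExecutionMetadata; only `extra` is modelled."""

    # default_factory gives every instance its own dict instead of a shared one.
    extra: Dict[str, Any] = field(default_factory=dict)


def record_rollout(
    metadata: ExecutionMetadataSketch,
    prompt_ids: List[int],
    completion_ids: List[int],
    step_rewards: List[float],
) -> None:
    """Store integration-specific data in the free-form bag (key names assumed)."""
    metadata.extra["prompt_ids"] = prompt_ids
    metadata.extra["completion_ids"] = completion_ids
    metadata.extra["step_rewards"] = step_rewards


first = ExecutionMetadataSketch()
second = ExecutionMetadataSketch()
record_rollout(first, prompt_ids=[1, 2], completion_ids=[3], step_rewards=[0.0, 1.0])
```

Because the bag is untyped, consumers would need to agree on key names out of band; that is the trade-off of a generic `Dict[str, Any]` field.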

eval_protocol/pytest/openenv_rollout_processor.py

jspisak · 2025-11-18T15:52:04Z

Love seeing this PR!

eval_protocol/pytest/openenv_rollout_processor.py

dphuang2 · 2025-11-18T19:45:35Z

eval_protocol/pytest/openenv_rollout_processor.py

+            """Process a single row with OpenEnv rollout."""
+            start_time = time.perf_counter()
+
+            print(f"\n[OpenEnvRolloutProcessor] Starting rollout for row...", flush=True)


It's best practice to use a logger, not print statements. Log level is an important part of debugging that you should call out yourself.
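A minimal sketch of what the suggestion could look like with the standard `logging` module. The function names `start_rollout` and `finish_rollout` and the `row_id` argument are hypothetical stand-ins, not code from this PR, and the choice of DEBUG versus INFO is an assumption.

```python
import logging

# Module-level logger; naming it after the module is the usual convention.
logger = logging.getLogger(__name__)


def start_rollout(row_id: str) -> None:
    """Hypothetical stand-in for the start of a per-row rollout."""
    # Per-row chatter goes to DEBUG so it stays silent unless explicitly enabled.
    logger.debug("[OpenEnvRolloutProcessor] Starting rollout for row %s", row_id)


def finish_rollout(row_id: str, elapsed_s: float) -> None:
    """Hypothetical stand-in for the end of a per-row rollout."""
    # Lazy %-style arguments are only formatted if the record is actually emitted.
    logger.info("[OpenEnvRolloutProcessor] Finished row %s in %.2fs", row_id, elapsed_s)
```

Callers can then control verbosity through logging configuration, for example `logging.basicConfig(level=logging.DEBUG)`, instead of editing the processor.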

eval_protocol/pytest/openenv_rollout_processor.py

dphuang2

Shouldn't there be dependency changes for openenv in pyproject.toml?

eval_protocol/pytest/integrations/openenv_trl_vllm.py

eval_protocol/mcp/execution/vllm_policy.py

eval_protocol/pytest/openenv_rollout_processor.py

dphuang2 · 2025-11-19T19:30:52Z

tests/pytest/test_openenv_browsergym_basic.py

+pytestmark = pytest.mark.skipif(os.getenv("CI") == "true", reason="Skip OpenEnv integration tests on CI")
+
+
+@pytest.mark.integration


Why integration? How long do these tests take to run? If they take a while, then we should run them as part of e2e-smoke-test.yml.

dphuang2 · 2025-11-19T19:32:41Z

tests/pytest/test_openenv_browsergym_eval.py

+def _extract_goal_url_title(observation: Any) -> tuple[str, str, str]:
+    goal = getattr(observation, "goal", "") or ""
+    url = getattr(observation, "url", "") or ""
+    title = ""
+    metadata = getattr(observation, "metadata", {}) or {}
+    obs_dict = metadata.get("browsergym_obs", {}) or {}
+    if not goal:
+        goal = obs_dict.get("goal") or ""
+    if not url:
+        url = obs_dict.get("url") or ""
+    titles = obs_dict.get("open_pages_titles") or ()
+    active_idx = _as_scalar(obs_dict.get("active_page_index"))
+    try:
+        active_idx = int(active_idx)
+    except Exception:
+        active_idx = 0
+    if isinstance(titles, (list, tuple)) and 0 <= active_idx < len(titles):
+        title = titles[active_idx] or ""
+    return goal, url, title


Did you write all this glue code? If you copy-pasted it, can we import directly from openenv instead?

dphuang2 · 2025-11-19T19:33:06Z

tests/pytest/test_openenv_echo_hub.py

+def prompt_builder(observation: Any, step: int, history: List[str]) -> str:
+    """
+    Echo env is very simple; we just send a short instruction.
+    """
+    return "Please repeat back the next message exactly."


Same here. It's not a ton of code, but if you copy-pasted it, can we import the implementation directly?

dphuang2 · 2025-11-19T19:35:14Z

eval_protocol/pytest/openenv_rollout_processor.py

+        """Process evaluation rows and return async tasks."""
+
+        semaphore = config.semaphore
+        max_steps = config.steps or 8


steps already has a default (30). Why did you add 8 here?
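To make the concern concrete, here is a small sketch under the assumption that the config carries `steps` with a default of 30, as the comment states. `RolloutConfig` below is a hypothetical stand-in, not the SDK's real config class.

```python
from dataclasses import dataclass


@dataclass
class RolloutConfig:
    """Hypothetical stand-in for the real config; only `steps` is modelled."""

    steps: int = 30  # default value taken from the review comment


def max_steps_with_fallback(config: RolloutConfig) -> int:
    # Pattern from the diff: `or` replaces any falsy value (None, 0) with 8.
    return config.steps or 8


def max_steps_from_config(config: RolloutConfig) -> int:
    # Alternative: trust the config default and pass the value through unchanged.
    return config.steps


default_cfg = RolloutConfig()
zero_cfg = RolloutConfig(steps=0)
```

With the default config both helpers return 30, so the fallback of 8 never applies. It only changes behaviour for falsy values such as an explicit 0, which `or` silently turns into 8.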

cursor · 2025-11-20T06:17:17Z

eval_protocol/pytest/integrations/openenv_trl_vllm.py

+            "completion_ids": episode_completion_ids,  # List[List[int]] - tokens per episode
+            "logprobs": episode_logprobs,  # List[List[float]] - logprobs per episode
+            "eval_score": eval_scores,
+        }


Bug: Missing rewards in rollout function return value

The rollout_func computes total_rewards at line 439 (summing step rewards per episode for GRPO training) but doesn't include it in the return dictionary. The function returns prompt_ids, completion_ids, logprobs, and eval_score, but GRPO training requires rewards to update the policy. The computed total_rewards variable is never used, causing the training loop to lack the reward signal needed for reinforcement learning.
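One way the fix described above might look, as a sketch only. The helper `build_rollout_result` and the `total_reward` key are hypothetical; which key, if any, the trainer actually reads depends on the TRL version and the reward functions in use, and that is not confirmed by this thread.

```python
from typing import Any, Dict, List


def build_rollout_result(
    prompt_ids: List[List[int]],
    completion_ids: List[List[int]],
    logprobs: List[List[float]],
    eval_scores: List[float],
    step_rewards: List[List[float]],
) -> Dict[str, Any]:
    """Assemble the per-episode return dict, including the summed rewards."""
    # One scalar per episode: the sum of that episode's step rewards.
    total_rewards = [float(sum(rewards)) for rewards in step_rewards]
    return {
        "prompt_ids": prompt_ids,
        "completion_ids": completion_ids,
        "logprobs": logprobs,
        "eval_score": eval_scores,
        # Hypothetical key name; the point is that the computed value is returned.
        "total_reward": total_rewards,
    }


result = build_rollout_result(
    prompt_ids=[[1, 2], [1, 2]],
    completion_ids=[[3, 4], [5]],
    logprobs=[[-0.1, -0.2], [-0.3]],
    eval_scores=[1.0, 0.0],
    step_rewards=[[0.0, 1.0], [0.25, 0.25]],
)
```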

cursor · 2025-11-20T06:17:18Z

eval_protocol/pytest/integrations/openenv_trl_vllm.py

+                top_p=kwargs.get("top_p"),
+                top_k=kwargs.get("top_k"),
+                **kwargs,
+            )


Bug: Duplicate keyword arguments in VLLMPolicy instantiation

The vllm_policy_factory function passes top_p and top_k both explicitly (extracted from kwargs at lines 233-234) and as part of **kwargs at line 235. If kwargs contains top_p or top_k keys, Python will raise a TypeError for receiving multiple values for the same keyword argument when instantiating VLLMPolicy. The explicit parameters should be removed from kwargs before unpacking, or the explicit extraction should be removed.
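The failure mode and one possible fix can be reproduced in isolation. `FakePolicy` is a hypothetical stand-in for `VLLMPolicy` (whose real signature is not shown in this thread), and both factory functions are illustrative rather than the PR's code.

```python
from typing import Any, Optional


class FakePolicy:
    """Hypothetical stand-in for VLLMPolicy: two named options plus extras."""

    def __init__(
        self,
        top_p: Optional[float] = None,
        top_k: Optional[int] = None,
        **extra: Any,
    ) -> None:
        self.top_p = top_p
        self.top_k = top_k
        self.extra = extra


def policy_factory_buggy(**kwargs: Any) -> FakePolicy:
    # top_p/top_k are passed explicitly and again through **kwargs, so the call
    # raises TypeError whenever either key is present in kwargs.
    return FakePolicy(top_p=kwargs.get("top_p"), top_k=kwargs.get("top_k"), **kwargs)


def policy_factory_fixed(**kwargs: Any) -> FakePolicy:
    # Copy first so the pops below cannot affect any other holder of the dict.
    remaining = dict(kwargs)
    top_p = remaining.pop("top_p", None)
    top_k = remaining.pop("top_k", None)
    return FakePolicy(top_p=top_p, top_k=top_k, **remaining)
```

Dropping the explicit `top_p=`/`top_k=` arguments and relying on `**kwargs` alone would also avoid the error; popping just keeps the two options visible at the call site.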

Shrey Modi added 2 commits November 17, 2025 20:58

openenvrolloutprocessor

0092494

openenvrolloutprocessor

ed93cb0

cursor bot reviewed Nov 17, 2025

trl integration

7e71e03