Conversation
|
Love seeing this PR! |
| """Process a single row with OpenEnv rollout.""" | ||
| start_time = time.perf_counter() | ||
|
|
||
| print(f"\n[OpenEnvRolloutProcessor] Starting rollout for row...", flush=True) |
There was a problem hiding this comment.
its best practice to use logger, not print statements. Log level is an important part of debugging that you should callout yourself.
dphuang2
left a comment
There was a problem hiding this comment.
shouldn't there be dependency changes for openenv in pyproject.toml?
| pytestmark = pytest.mark.skipif(os.getenv("CI") == "true", reason="Skip OpenEnv integration tests on CI") | ||
|
|
||
|
|
||
| @pytest.mark.integration |
There was a problem hiding this comment.
why integration? How long do these tests take to run? If they take a while then we should run this as part of e2e-smoke-test.yml
| def _extract_goal_url_title(observation: Any) -> tuple[str, str, str]: | ||
| goal = getattr(observation, "goal", "") or "" | ||
| url = getattr(observation, "url", "") or "" | ||
| title = "" | ||
| metadata = getattr(observation, "metadata", {}) or {} | ||
| obs_dict = metadata.get("browsergym_obs", {}) or {} | ||
| if not goal: | ||
| goal = obs_dict.get("goal") or "" | ||
| if not url: | ||
| url = obs_dict.get("url") or "" | ||
| titles = obs_dict.get("open_pages_titles") or () | ||
| active_idx = _as_scalar(obs_dict.get("active_page_index")) | ||
| try: | ||
| active_idx = int(active_idx) | ||
| except Exception: | ||
| active_idx = 0 | ||
| if isinstance(titles, (list, tuple)) and 0 <= active_idx < len(titles): | ||
| title = titles[active_idx] or "" | ||
| return goal, url, title |
There was a problem hiding this comment.
did you write all this glue code? If you copy-pasted it, can we import directly from openenv instead?
| def prompt_builder(observation: Any, step: int, history: List[str]) -> str: | ||
| """ | ||
| Echo env is very simple; we just send a short instruction. | ||
| """ | ||
| return "Please repeat back the next message exactly." |
There was a problem hiding this comment.
same here, its not a ton of code. But if you copy-pasted it—can we import the implementation directly?
| """Process evaluation rows and return async tasks.""" | ||
|
|
||
| semaphore = config.semaphore | ||
| max_steps = config.steps or 8 |
There was a problem hiding this comment.
steps already has a default (30) why did you add 8 here?
| "completion_ids": episode_completion_ids, # List[List[int]] - tokens per episode | ||
| "logprobs": episode_logprobs, # List[List[float]] - logprobs per episode | ||
| "eval_score": eval_scores, | ||
| } |
There was a problem hiding this comment.
Bug: Missing rewards in rollout function return value
The rollout_func computes total_rewards at line 439 (summing step rewards per episode for GRPO training) but doesn't include it in the return dictionary. The function returns prompt_ids, completion_ids, logprobs, and eval_score, but GRPO training requires rewards to update the policy. The computed total_rewards variable is never used, causing the training loop to lack the reward signal needed for reinforcement learning.
| top_p=kwargs.get("top_p"), | ||
| top_k=kwargs.get("top_k"), | ||
| **kwargs, | ||
| ) |
There was a problem hiding this comment.
Bug: Duplicate keyword arguments in VLLMPolicy instantiation
The vllm_policy_factory function passes top_p and top_k both explicitly (extracted from kwargs at lines 233-234) and as part of **kwargs at line 235. If kwargs contains top_p or top_k keys, Python will raise a TypeError for receiving multiple values for the same keyword argument when instantiating VLLMPolicy. The explicit parameters should be removed from kwargs before unpacking, or the explicit extraction should be removed.
name: Pull Request
about: Propose changes to the codebase
title: "Brief description of changes"
labels: ''
assignees: ''
Description
Please include a summary of the change and which issue is fixed or feature is implemented. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Implements # (issue)
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
Test Configuration:
Checklist:
black .,isort .,flake8 .)Screenshots (if applicable)
If applicable, add screenshots to help showcase your changes.
Additional context
Add any other context about the PR here.
Note
Adds a generic OpenEnv rollout processor with vLLM/TRL integration, a VLLMPolicy adapter, an execution metadata bag, and new integration tests plus optional OpenEnv deps.
OpenEnvRolloutProcessor(eval_protocol/pytest/openenv_rollout_processor.py): generic processor for OpenEnv HTTPEnvClient; runs rollout loops, builds prompts, calls policy (defaultLiteLLMPolicyor injected), tracks token usage, collects per-step rewards, and storesprompt_ids/completion_idsinexecution_metadata.extra.create_openenv_vllm_rollout_func(eval_protocol/pytest/integrations/openenv_trl_vllm.py): bridges TRL with OpenEnv usingVLLMPolicy; supports task rotation; returns GRPO-styleprompt_ids/completion_ids/eval_score.VLLMPolicy(eval_protocol/mcp/execution/vllm_policy.py): converts chat messages to a prompt, calls TRL vLLM (server or colocated), decodes output, and returns OpenAI-compatible responses including raw token IDs.ExecutionMetadatawithextra: Dict[str, Any]for integration-specific data (e.g., step rewards, token IDs).tests/pytest/*(skipped on CI).openenvoptional dependency group inpyproject.tomlfor OpenEnv packages.Written by Cursor Bugbot for commit 707f7cd. This will update automatically on new commits. Configure here.