Initial skeleton #1

Merged
Darktex merged 6 commits into main from skeleton on Oct 6, 2025

@Darktex Darktex commented Oct 3, 2025

No description provided.

@meta-cla meta-cla Bot added the CLA Signed label (managed by the Meta Open Source bot) Oct 3, 2025
Comment thread src/__pycache__/types.cpython-310.pyc Outdated

Do we need to check in the compiled CPython (`.pyc`) files under `__pycache__`?

Comment thread src/types.py


@dataclass
class ExecutionResult:

A few points to consider:

  1. stdout and stderr can be long streams and may cause the env container to OOM if we store them in memory. Let's discuss how the policy would use this information. It is better to minimize the context shared between the inside and the outside of the container.
  2. One paradigm we are seeing in SWE agent training is that exit_code and failure reason are generally a good starting point for an execution result. Let's discuss whether this paradigm can be applied here too.
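
One way to picture both points together, as a minimal sketch and not the `ExecutionResult` defined in this PR: keep `exit_code` and a failure reason as the primary signal, and retain only a bounded tail of each stream. The field names `failure_reason`, `stdout_tail`, `stderr_tail`, and `truncated`, the helper `read_tail`, and the 4096-character cap are all assumptions made for illustration.

```python
import io
from dataclasses import dataclass
from typing import Optional, TextIO, Tuple

MAX_TAIL_CHARS = 4096  # hypothetical cap, not a value from the PR


def read_tail(stream: TextIO, limit: int = MAX_TAIL_CHARS,
              chunk_size: int = 1024) -> Tuple[str, bool]:
    """Read a text stream in chunks, keeping only its last `limit` characters.

    The buffer never holds more than `limit + chunk_size` characters, so a
    very long stream does not grow memory without bound.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    tail, truncated = "", False
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        tail += chunk
        if len(tail) > limit:
            tail = tail[-limit:]
            truncated = True
    return tail, truncated


@dataclass
class ExecutionResult:
    """Hypothetical shape: exit status first, bounded stream tails second."""
    exit_code: int
    failure_reason: Optional[str] = None
    stdout_tail: str = ""
    stderr_tail: str = ""
    truncated: bool = False

    @classmethod
    def from_streams(cls, exit_code: int, stdout: TextIO, stderr: TextIO,
                     failure_reason: Optional[str] = None,
                     limit: int = MAX_TAIL_CHARS) -> "ExecutionResult":
        out, out_cut = read_tail(stdout, limit)
        err, err_cut = read_tail(stderr, limit)
        return cls(exit_code, failure_reason, out, err, out_cut or err_cut)


# Usage: a 10,000-character stdout is reduced to its last 4096 characters.
result = ExecutionResult.from_streams(
    1, io.StringIO("x" * 10_000), io.StringIO("boom"), failure_reason="timeout"
)
```

Whether the policy should see any stream content at all is still the open question raised above; the cap only limits how much could cross the container boundary.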

@Darktex Darktex closed this Oct 6, 2025
@Darktex Darktex reopened this Oct 6, 2025
@Darktex Darktex merged commit 1b6e3ff into main Oct 6, 2025
1 check passed
jspisak pushed a commit that referenced this pull request Oct 22, 2025
Updating list of supporters with LastMile AI
pankit-eng pushed a commit that referenced this pull request Nov 3, 2025
FIX: Handle double-nested observation in client parser
rycerzes referenced this pull request in rycerzes/OpenEnv Nov 19, 2025
rycerzes referenced this pull request in rycerzes/OpenEnv Nov 19, 2025
Updating list of supporters with LastMile AI
rycerzes referenced this pull request in rycerzes/OpenEnv Nov 19, 2025
FIX: Handle double-nested observation in client parser
burtenshaw pushed a commit that referenced this pull request Dec 8, 2025
burtenshaw pushed a commit that referenced this pull request Jan 13, 2026
* Upload current REPL state

* use official prompt

* unify REPLEnv api

* Update default model in server side

* Updated example using IP

* Updated with prompt

* inject final answer

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
akashkathole7 added a commit to akashkathole7/OpenEnv that referenced this pull request Apr 23, 2026
…x rank

Two config fixes surfaced by Daniel Han's "LoRA Without Regret" guidance
at the Scaler workshop 2026-04-22:

1. LORA_TARGETS was attention-only (q/k/v/o). Adding MLP projections
   (gate_proj, up_proj, down_proj) covers the MLP block. Per Daniel, MLP
   adapters materially close the gap with full fine-tuning at near-zero
   VRAM cost and were flagged as the #1 silent underperformance in
   attention-only LoRA setups.

2. lora_alpha was LORA_RANK (naive PEFT default = alpha equals rank).
   New LORA_ALPHA = LORA_RANK * 2 follows the 2x-rank convention that
   Thinking Machines documented as the regime where LoRA closes the gap
   with full fine-tuning on small-to-medium models.

Both scripts share constants via train_grpo_real.py -> train_sft_warmstart.py
import, so the SFT checkpoint slots cleanly into the GRPO phase without
re-init.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
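
The two config changes described in the commit message above could look roughly like the following sketch. The constant names `LORA_TARGETS`, `LORA_RANK`, and `LORA_ALPHA` and the MLP projection names come from the message; the rank value of 16 and the `q_proj`/`k_proj`/`v_proj`/`o_proj` spellings for "q/k/v/o" are assumptions (they match common Llama-style module names but are not stated in the commit).

```python
# Hypothetical rank: the commit message does not state the actual value.
LORA_RANK = 16

# 2x-rank convention from the commit message (previously alpha == rank).
LORA_ALPHA = LORA_RANK * 2

# Attention projections (assumed module names for "q/k/v/o").
ATTENTION_PROJECTIONS = ["q_proj", "k_proj", "v_proj", "o_proj"]

# MLP projections named in the commit message.
MLP_PROJECTIONS = ["gate_proj", "up_proj", "down_proj"]

# Before the change this list was attention-only.
LORA_TARGETS = ATTENTION_PROJECTIONS + MLP_PROJECTIONS

# Keyword arguments in the shape accepted by PEFT's LoraConfig
# (r, lora_alpha, target_modules); PEFT itself is not imported here.
lora_kwargs = {
    "r": LORA_RANK,
    "lora_alpha": LORA_ALPHA,
    "target_modules": LORA_TARGETS,
}
```

Defining the constants in one module and importing them from the other script, as the message describes, keeps the SFT and GRPO phases on the same adapter shape.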
akashkathole7 added a commit to akashkathole7/OpenEnv that referenced this pull request Apr 24, 2026
Grepped src/openenv/core/rubrics/ and confirmed the Rubric base class +
container set (WeightedSum, Sequential, Gate, RubricList, LLMJudge)
already exist per RFC 004. Updated the README section to show exactly
which container our rewards.py functional composition maps to, one row
per component in a new mapping table.

Does NOT refactor rewards.py (invariant #1 per ONSITE_BRIEFING.md).
The narrative is: functional composition honors the composable-rubrics
philosophy in component independence + per-component audit trail + CI
contract over multi-component defense-in-depth, even though the
class-inheritance refactor is deferred to avoid regressing the 6
red-team ceiling tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
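
To illustrate what "functional composition" with independent components and a per-component audit trail might mean, here is a minimal sketch in plain Python. It does not use the `Rubric` classes from `src/openenv/core/rubrics/` and is not the contents of `rewards.py`; the helpers `weighted_sum` and `gate` only loosely mirror the `WeightedSum` and `Gate` container names, and the two example components are hypothetical.

```python
from typing import Callable, Dict, Tuple

RewardFn = Callable[[str], float]
Scored = Tuple[float, Dict[str, float]]  # (total, per-component audit trail)


def weighted_sum(components: Dict[str, Tuple[RewardFn, float]]) -> Callable[[str], Scored]:
    """Combine independent reward functions; record each raw score by name."""
    def score(completion: str) -> Scored:
        audit = {name: fn(completion) for name, (fn, _) in components.items()}
        total = sum(audit[name] * weight for name, (_, weight) in components.items())
        return total, audit
    return score


def gate(check: Callable[[str], bool], inner: Callable[[str], Scored]) -> Callable[[str], Scored]:
    """Return zero reward unless `check` passes, then defer to `inner`."""
    def score(completion: str) -> Scored:
        if not check(completion):
            return 0.0, {"gate": 0.0}
        total, audit = inner(completion)
        return total, {"gate": 1.0, **audit}
    return score


# Hypothetical components, each usable and testable on its own.
def has_answer(completion: str) -> float:
    return 1.0 if "answer:" in completion else 0.0


def is_concise(completion: str) -> float:
    return 1.0 if len(completion) <= 40 else 0.0


reward = gate(
    lambda c: bool(c.strip()),
    weighted_sum({"has_answer": (has_answer, 0.75), "is_concise": (is_concise, 0.25)}),
)

total, audit = reward("answer: 42")
```

Each component stays a standalone function, and the returned audit dictionary shows which component contributed what, which is the property the commit message attributes to the functional style.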