Skip to content

Fix: robust response parsing, JSON patching, Weave init side-effects, E2 packaging, reward & tempfile fixes#52

Merged
intertwine merged 4 commits intomainfrom
codex/analyze-and-fix-open-issues-in-repo
Jan 23, 2026
Merged

Fix: robust response parsing, JSON patching, Weave init side-effects, E2 packaging, reward & tempfile fixes#52
intertwine merged 4 commits intomainfrom
codex/analyze-and-fix-open-issues-in-repo

Conversation

@intertwine
Copy link
Copy Markdown
Owner

Motivation

  • Harden environment code so evals and training don't crash on common edge cases (None content, non-dict completions, JSON Pointer array indices, tool failures).
  • Prevent import-time side effects from auto-initializing tracing backends so downstream users can control Weave themselves.
  • Fix correctness and packaging issues that cause reward inflation or broken installs for E2 so benchmarks and CI are reliable.

Description

  • Make response normalization robust in sv_shared.utils.get_response_text() to return "" for None and handle non-dict list tails, and update sv_shared/utils_test.py with corresponding tests.
  • Rework JSON Patch application in environments/sv-env-config-verification/patching.py to support JSON Pointer array indices, root-level ops, append (-), inserts, replaces and removals against lists and dicts.
  • Ensure temporary files created during patch verification are always cleaned up by wrapping tool scans in a try/finally in environments/sv-env-config-verification/sv_env_config_verification.py.
  • Fix severity-weighted scoring in environments/sv-env-config-verification/reward.py so true-positive weighting uses the oracle severity (preventing inflated scores).
  • Remove invalid parallelize_scoring kwarg from the Red Team Attack vf.Rubric construction so load_environment() no longer raises a TypeError.
  • Stop import-time Weave initialization: make sv_shared.initialize_weave_if_enabled() a lazy wrapper and enhance sv_shared/weave_init.py to detect existing external Weave client before calling weave.init(); update environments/tests/test_weave_init.py to mock get_client.
  • Preserve all HuggingFace dataset fields when coercing E1 ClassLabel answers to strings by constructing a Features dict from the existing features in sv_shared/dataset_loader.py.
  • Expand environments/sv-env-config-verification/pyproject.toml Hatch include patterns so built wheels include code, adapters, policies, data and docs.

Testing

  • No full test suite was executed locally as part of this change; CI is expected to run pytest and the repo's existing workflows.
  • Unit tests updated/added to cover the new behavior include sv_shared/utils_test.py (new edge cases for get_response_text) and environments/tests/test_weave_init.py (mocks weave.get_client to validate lazy init behavior).
  • Changes are small and focused; please run make check / uv run pytest -q in CI or locally to validate the full test matrix after merging.

Codex Task

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ddd07f286

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread environments/sv-env-config-verification/patching.py
intertwine and others added 2 commits January 23, 2026 09:34
Tests accessing env.rubric.reward_funcs fail in CI with verifiers
0.1.9.post3 which wraps rubrics in RubricGroup. Update tests to be
compatible with both API versions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…dict

When a JSON Patch path traverses a key that exists but is null,
raise PatchError instead of silently coercing to {}. This prevents
misapplying patches (e.g., /items/0 on {"items": null} would have
become {"items": {"0": ...}}) and ensures try_apply_patch correctly
falls back to original content, avoiding skewed post-patch rewards.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant