Fix: robust response parsing, JSON patching, Weave init side effects, E2 packaging, reward & tempfile fixes #52
Merged
intertwine merged 4 commits into main on Jan 23, 2026
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ddd07f286
Tests that access env.rubric.reward_funcs fail in CI with verifiers 0.1.9.post3, which wraps rubrics in a RubricGroup. Update the tests to be compatible with both API versions.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
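Because newer `verifiers` releases wrap rubrics in a `RubricGroup`, one way to keep tests working against both layouts is to flatten reward functions through a small helper instead of reading `env.rubric.reward_funcs` directly. This is a hypothetical sketch: `collect_reward_funcs` and the stand-in classes are illustrative, and it assumes a group exposes its member rubrics as `rubrics`, an attribute name this PR does not confirm.

```python
from typing import Any, Callable, List


def collect_reward_funcs(rubric: Any) -> List[Callable[..., Any]]:
    """Flatten reward functions from a single rubric or a group of rubrics.

    Assumption: a group wrapper lists its members in ``rubrics`` and a plain
    rubric keeps its functions in ``reward_funcs``; both names are illustrative.
    """
    members = getattr(rubric, "rubrics", None)
    if members is not None:
        # Newer layout: env.rubric is a group, so recurse into each member.
        funcs: List[Callable[..., Any]] = []
        for member in members:
            funcs.extend(collect_reward_funcs(member))
        return funcs
    # Older layout: env.rubric is a single rubric exposing reward_funcs directly.
    return list(getattr(rubric, "reward_funcs", []))


# Hypothetical stand-ins for the two layouts, used only for demonstration.
class PlainRubric:
    def __init__(self, reward_funcs: List[Callable[..., Any]]) -> None:
        self.reward_funcs = reward_funcs


class GroupedRubric:
    def __init__(self, rubrics: List[Any]) -> None:
        self.rubrics = rubrics
```

Checking for the group first matters if the wrapper also inherits an (empty) `reward_funcs` attribute from a base rubric class.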
…dict
When a JSON Patch path traverses a key that exists but is null,
raise PatchError instead of silently coercing to {}. This prevents
misapplying patches (e.g., /items/0 on {"items": null} would have
become {"items": {"0": ...}}) and ensures try_apply_patch correctly
falls back to the original content, avoiding skewed post-patch rewards.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
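The null-handling fix above can be illustrated with a small JSON Patch `add` applier written against RFC 6901 (JSON Pointer) and RFC 6902 (JSON Patch). This is a minimal sketch, not the repo's `patching.py`: `PatchError` and `try_apply_patch` are named in the commit, but `apply_add`, `try_apply_add`, and the helpers here are hypothetical, and only the `add` operation is shown. A null intermediate raises `PatchError` instead of being coerced to `{}`, list indices and `-` (append) are handled, and the fallback returns the original document unchanged.

```python
import copy
import re
from typing import Any, List


class PatchError(Exception):
    """Raised when a patch operation cannot be applied (illustrative stand-in)."""


_INDEX = re.compile(r"0|[1-9][0-9]*")  # RFC 6901 array index: no leading zeros


def _tokens(pointer: str) -> List[str]:
    """Split a JSON Pointer into unescaped reference tokens ("" is the root)."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise PatchError(f"invalid JSON Pointer: {pointer!r}")
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def _index(token: str, upper: int) -> int:
    """Parse a list index, rejecting malformed tokens or values above upper."""
    if not _INDEX.fullmatch(token) or int(token) > upper:
        raise PatchError(f"bad list index {token!r}")
    return int(token)


def _child(node: Any, token: str) -> Any:
    """Step one level down; JSON null and scalars are never coerced to {}."""
    if isinstance(node, dict):
        if token not in node:
            raise PatchError(f"missing key {token!r}")
        return node[token]
    if isinstance(node, list):
        return node[_index(token, len(node) - 1)]
    raise PatchError(f"cannot traverse into {type(node).__name__} at {token!r}")


def apply_add(doc: Any, pointer: str, value: Any) -> Any:
    """Apply one RFC 6902 "add" to a deep copy of doc and return the copy."""
    tokens = _tokens(pointer)
    if not tokens:
        return copy.deepcopy(value)  # root-level add replaces the whole document
    result = copy.deepcopy(doc)
    parent = result
    for token in tokens[:-1]:
        parent = _child(parent, token)
    last = tokens[-1]
    if isinstance(parent, dict):
        parent[last] = value
    elif isinstance(parent, list):
        if last == "-":
            parent.append(value)  # "-" refers to the slot past the last element
        else:
            parent.insert(_index(last, len(parent)), value)
    else:
        # e.g. {"items": null} with path /items/0: fail instead of inventing {}.
        raise PatchError(f"cannot add into {type(parent).__name__}")
    return result


def try_apply_add(doc: Any, pointer: str, value: Any) -> Any:
    """Return the patched copy, or the original doc if the patch does not apply."""
    try:
        return apply_add(doc, pointer, value)
    except PatchError:
        return doc
```

Working on a deep copy means a failed multi-step patch can never leave the original half-modified, which is what makes the fallback safe to score.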
This was referenced Jan 23, 2026
Closed
Motivation
Description
- Update `sv_shared.utils.get_response_text()` to return `""` for `None` and handle non-dict list tails, and update `sv_shared/utils_test.py` with corresponding tests.
- Extend `environments/sv-env-config-verification/patching.py` to support JSON Pointer array indices, root-level ops, append (`-`), inserts, replaces, and removals against lists and dicts.
- Clean up temporary files with `try`/`finally` in `environments/sv-env-config-verification/sv_env_config_verification.py`.
- Fix `environments/sv-env-config-verification/reward.py` so true-positive weighting uses the oracle severity (preventing inflated scores).
- Remove the `parallelize_scoring` kwarg from the Red Team Attack `vf.Rubric` construction so `load_environment()` no longer raises a `TypeError`.
- Make `sv_shared.initialize_weave_if_enabled()` a lazy wrapper and enhance `sv_shared/weave_init.py` to detect an existing external Weave client before calling `weave.init()`; update `environments/tests/test_weave_init.py` to mock `get_client`.
- Convert `ClassLabel` answers to strings by constructing a `Features` dict from the existing features in `sv_shared/dataset_loader.py`.
- Fix the `environments/sv-env-config-verification/pyproject.toml` Hatch include patterns so built wheels include code, adapters, policies, data, and docs.
Testing
- Tests run with `pytest` and the repo's existing workflows.
- Updated tests: `sv_shared/utils_test.py` (new edge cases for `get_response_text`) and `environments/tests/test_weave_init.py` (mocks `weave.get_client` to validate lazy init behavior).
- Run `make check` / `uv run pytest -q` in CI or locally to validate the full test matrix after merging.
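The Description's `get_response_text()` item (return `""` for `None`, tolerate non-dict list tails) can be sketched as follows. This assumes a completion is either a string or a list of chat-message dicts with a `content` key; the real function's accepted shapes are not shown in this PR, and `extract_response_text` is a hypothetical name.

```python
from typing import Any


def extract_response_text(completion: Any) -> str:
    """Hypothetical sketch: pull the final text out of a model completion."""
    if completion is None:
        return ""  # missing completion: empty string, not "None" or an exception
    if isinstance(completion, str):
        return completion
    if isinstance(completion, list):
        if not completion:
            return ""
        tail = completion[-1]
        if isinstance(tail, dict):
            content = tail.get("content")
            return content if isinstance(content, str) else ""
        # Non-dict tail (e.g. a bare string): never call .get() on it.
        if tail is None:
            return ""
        return tail if isinstance(tail, str) else str(tail)
    return str(completion)
```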
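The `try`/`finally` change in `sv_env_config_verification.py` follows a standard cleanup pattern: whatever happens while a temporary file is in use, the `finally` clause removes it. A minimal standard-library sketch; `run_on_temp_config` and the `check` callback (standing in for whatever reads the file) are hypothetical.

```python
import os
import tempfile
from typing import Callable


def run_on_temp_config(text: str, check: Callable[[str], str], suffix: str = ".yaml") -> str:
    """Write text to a temp file, run check(path), and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        return check(path)
    finally:
        # Runs after a normal return and when check() raises.
        if os.path.exists(path):
            os.remove(path)
```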
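One plausible reading of the `reward.py` fix is that a true positive should be worth the severity the oracle assigns, not the severity the model claims, so a model cannot raise its score by over-reporting severity. The sketch below assumes findings are `(id, severity)` pairs and uses made-up `SEVERITY_WEIGHT` values; neither detail comes from the PR.

```python
from typing import Dict, List, Tuple

# Hypothetical weights; the PR does not state reward.py's actual values.
SEVERITY_WEIGHT: Dict[str, float] = {"low": 0.5, "medium": 1.0, "high": 1.5}

Finding = Tuple[str, str]  # (finding_id, severity)


def weighted_tp_score(predicted: List[Finding], oracle: List[Finding]) -> float:
    """Severity-weighted recall where weights come only from the oracle."""
    oracle_severity = dict(oracle)
    total = sum(SEVERITY_WEIGHT[sev] for sev in oracle_severity.values())
    if total == 0:
        return 0.0
    # The predicted severity is ignored: each match earns its oracle weight once.
    matched = {fid for fid, _claimed in predicted if fid in oracle_severity}
    earned = sum(SEVERITY_WEIGHT[oracle_severity[fid]] for fid in matched)
    return earned / total
```

With an oracle of `[("a", "low"), ("b", "high")]`, a prediction `[("a", "high")]` scores 0.5 / 2.0 = 0.25; weighting by the claimed severity would have given 0.75.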
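The Weave item says initialization should check for an existing external client before calling `weave.init()`. A minimal sketch of that decision only (not the lazy-import part): `get_client` and `init` are injected so it runs without Weave installed and can be tested with fakes, mirroring how the PR's tests mock `weave.get_client`. The function name and the `project` parameter are hypothetical.

```python
from typing import Any, Callable, Optional


def init_weave_if_enabled(
    project: Optional[str],
    get_client: Callable[[], Optional[Any]],
    init: Callable[[str], Any],
) -> Optional[Any]:
    """Initialize tracing only when enabled, reusing a client created elsewhere."""
    if not project:
        # Tracing disabled: no side effects at all.
        return None
    existing = get_client()
    if existing is not None:
        # An external caller already initialized Weave; do not re-init.
        return existing
    return init(project)
```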