Fix: robust response parsing, JSON patching, Weave init side-effects, E2 packaging, reward & tempfile fixes by intertwine · Pull Request #52 · intertwine/security-verifiers

intertwine · 2026-01-23T04:34:04Z

Motivation

Harden environment code so evals and training don't crash on common edge cases (None content, non-dict completions, JSON Pointer array indices, tool failures).
Prevent import-time side effects from auto-initializing tracing backends so downstream users can control Weave themselves.
Fix correctness and packaging issues that cause reward inflation or broken installs for E2 so benchmarks and CI are reliable.

Description

Make response normalization robust in sv_shared.utils.get_response_text() to return "" for None and handle non-dict list tails, and update sv_shared/utils_test.py with corresponding tests.
Rework JSON Patch application in environments/sv-env-config-verification/patching.py to support JSON Pointer array indices, root-level ops, append (-), inserts, replaces and removals against lists and dicts.
Ensure temporary files created during patch verification are always cleaned up by wrapping tool scans in a try/finally in environments/sv-env-config-verification/sv_env_config_verification.py.
Fix severity-weighted scoring in environments/sv-env-config-verification/reward.py so true-positive weighting uses the oracle severity (preventing inflated scores).
Remove invalid parallelize_scoring kwarg from the Red Team Attack vf.Rubric construction so load_environment() no longer raises a TypeError.
Stop import-time Weave initialization: make sv_shared.initialize_weave_if_enabled() a lazy wrapper and enhance sv_shared/weave_init.py to detect existing external Weave client before calling weave.init(); update environments/tests/test_weave_init.py to mock get_client.
Preserve all HuggingFace dataset fields when coercing E1 ClassLabel answers to strings by constructing a Features dict from the existing features in sv_shared/dataset_loader.py.
Expand environments/sv-env-config-verification/pyproject.toml Hatch include patterns so built wheels include code, adapters, policies, data and docs.

Testing

No full test suite was executed locally as part of this change; CI is expected to run pytest and the repo's existing workflows.
Unit tests updated/added to cover the new behavior include sv_shared/utils_test.py (new edge cases for get_response_text) and environments/tests/test_weave_init.py (mocks weave.get_client to validate lazy init behavior).
Changes are small and focused; please run make check / uv run pytest -q in CI or locally to validate the full test matrix after merging.

Codex Task

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ddd07f286

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Tests accessing env.rubric.reward_funcs fail in CI with verifiers 0.1.9.post3 which wraps rubrics in RubricGroup. Update tests to be compatible with both API versions. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…dict When a JSON Patch path traverses a key that exists but is null, raise PatchError instead of silently coercing to {}. This prevents misapplying patches (e.g., /items/0 on {"items": null} would have become {"items": {"0": ...}}) and ensures try_apply_patch correctly falls back to original content, avoiding skewed post-patch rewards. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix open issues in shared utils and env tooling

6ddd07f

intertwine added the codex label Jan 23, 2026 — with ChatGPT Codex Connector

Format sv_shared init after lint

9d97d00

chatgpt-codex-connector Bot reviewed Jan 23, 2026

View reviewed changes

Comment thread environments/sv-env-config-verification/patching.py

intertwine and others added 2 commits January 23, 2026 09:34

intertwine merged commit 61e9157 into main Jan 23, 2026
4 checks passed

intertwine deleted the codex/analyze-and-fix-open-issues-in-repo branch January 24, 2026 15:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: robust response parsing, JSON patching, Weave init side-effects, E2 packaging, reward & tempfile fixes#52

Fix: robust response parsing, JSON patching, Weave init side-effects, E2 packaging, reward & tempfile fixes#52
intertwine merged 4 commits intomainfrom
codex/analyze-and-fix-open-issues-in-repo

intertwine commented Jan 23, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

intertwine commented Jan 23, 2026

Motivation

Description

Testing

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant