[codex] Harden public eval runtime #25

Closed
scoootscooob wants to merge 13 commits into main from codex/stabilized-eval-suite
@scoootscooob
Collaborator

Summary

  • harden the ClawBench evaluator path with verifier-owned snapshots, protected asset hashing, trace archiving, and no-output runtime failure handling
  • update OpenClaw container/runtime setup for Codex/OpenRouter lanes, plugin runtime dependencies, health probe skipping, and browser lane isolation
  • align README, Space README, and public task docs with the evaluator snapshot model while ignoring local 100-suite artifacts and personal runtime state
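As a rough illustration of the "protected asset hashing" idea in the first bullet, the sketch below records a SHA-256 digest for every file in a snapshot directory before a run and compares it afterwards. The function names (`hash_file`, `snapshot_hashes`, `changed_assets`) and the directory layout are hypothetical and are not taken from the ClawBench code in this PR; the actual evaluator may work differently.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_hashes(root: Path) -> dict[str, str]:
    """Map each file under root (by relative POSIX path) to its digest.

    Hypothetical helper: stands in for whatever the verifier records
    when it takes ownership of a snapshot.
    """
    return {
        path.relative_to(root).as_posix(): hash_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_assets(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List paths that were added, removed, or modified between snapshots."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

Under these assumptions, a verifier would call `snapshot_hashes` on the protected directory before handing control to the agent, call it again after the run, and treat any non-empty result from `changed_assets` as evidence of tampering.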

Validation

  • .venv/bin/python -m pytest -q -> 270 passed, 1 skipped
  • git diff --check

Notes

The PR intentionally excludes local 100-suite task/result artifacts, Crabbox outputs, private tooling, payload tarballs, and the untracked partnership governance draft.

@scoootscooob scoootscooob marked this pull request as ready for review May 17, 2026 07:51
@scoootscooob scoootscooob requested a review from a team as a code owner May 17, 2026 07:51
