Skip to content

fix(agents): make loop breaker session-global#79267

Open
TurboTheTurtle wants to merge 3 commits intoopenclaw:mainfrom
TurboTheTurtle:codex/global-loop-breaker
Open

fix(agents): make loop breaker session-global#79267
TurboTheTurtle wants to merge 3 commits intoopenclaw:mainfrom
TurboTheTurtle:codex/global-loop-breaker

Conversation

@TurboTheTurtle
Copy link
Copy Markdown
Contributor

@TurboTheTurtle TurboTheTurtle commented May 8, 2026

Closes #79252.

Summary

  • Make the global circuit breaker count stable no-progress outcomes across repeated tool-call patterns in the session, instead of only the current tool signature.
  • Keep changing-output calls and unrelated one-off tools from contributing to the global breaker.
  • Add detector-level and before-tool-call wrapper coverage proving a different tool is blocked once the session-global threshold is reached.
  • Add an Unreleased changelog entry for the agent safety fix.

Real behavior proof

  • Behavior or issue addressed: a session that repeatedly produces stable no-progress tool outcomes should trip the global breaker and block a different next tool, instead of continuing the loop because the next tool has a different name/argument signature.
  • Real environment tested: local macOS 26.4.1 (25E253) arm64 OpenClaw checkout, Node v25.8.2, pnpm 10.33.2, branch codex/global-loop-breaker at commit 9e76446e98d08d6ccb8963329bca5a09ce8db06e.
  • Exact steps or command run after this patch:
    • pnpm install --frozen-lockfile --offline
    • pnpm exec vitest run src/agents/tool-loop-detection.test.ts --reporter=dot
    • pnpm test:e2e src/agents/pi-tools.before-tool-call.e2e.test.ts --reporter=dot
    • pnpm exec oxfmt --check --threads=1 CHANGELOG.md src/agents/tool-loop-detection.ts src/agents/tool-loop-detection.test.ts src/agents/pi-tools.before-tool-call.e2e.test.ts
    • git diff --check
    • pnpm check:changelog-attributions
    • pnpm tsgo:core
    • pnpm tsgo:core:test
    • pnpm exec tsx -e '<production before-tool-call smoke: run 30 stable read tool outcomes in one session, then invoke memory_search through the wrapped tool path>'
  • Evidence after fix: console output after fix from the production before-tool-call smoke:
threshold=30
readExecutions=30
searchExecutions=0
blockedDeniedReason=tool-loop
blockedStatus=blocked
blockedContent=CRITICAL: Session tool calls have repeated identical no-progress outcomes 30 times across tools. Session execution blocked by global circuit breaker to prevent runaway loops.

Detector tests also passed with Test Files 1 passed (1) and Tests 45 passed (45). Before-tool-call e2e tests passed with Test Files 1 passed (1) and Tests 38 passed (38). Formatting, whitespace, changelog attribution, core typecheck, and core test typecheck all passed.

  • Observed result after fix: after 30 stable no-progress read outcomes in the same session, the next memory_search call was blocked by global_circuit_breaker before its execute function ran (searchExecutions=0), matching the intended cross-tool session-global behavior.
  • What was not tested: a live model-driven Gateway conversation with external services; the real-behavior smoke used production OpenClaw before-tool-call wrapping and diagnostic session state with local in-process tools.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 8, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 8, 2026

Codex review: needs maintainer review before merge.

Summary
The PR changes agent tool-loop detection to aggregate stable no-progress outcomes across a session/run, adds detector and before-tool-call regression tests, and adds a changelog entry.

Reproducibility: yes. Source inspection on current main gives a high-confidence path: record stable no-progress outcomes for one tool signature, then call a different tool and the existing global breaker reads only the different tool's current-signature streak.

Real behavior proof
Sufficient (terminal): The PR body includes terminal output from an after-fix production before-tool-call smoke showing the different next tool blocked before execution.

Next step before merge
No repair lane is needed because the PR already contains a focused implementation, changelog entry, supplied real behavior proof, and relevant green exact-head checks.

Security
Cleared: The diff is limited to agent loop-detection logic, tests, and changelog text, with no dependency, workflow, secret, permission, or package-resolution changes.

Review details

Best possible solution:

Land the session/run-scoped aggregate breaker with the regression tests and changelog after normal maintainer review.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection on current main gives a high-confidence path: record stable no-progress outcomes for one tool signature, then call a different tool and the existing global breaker reads only the different tool's current-signature streak.

Is this the best way to solve the issue?

Yes. The PR fixes the detector where the contract is implemented, keeps run scoping and same-pattern detectors intact, and covers both the pure detector and the production before-tool-call wrapper path.

What I checked:

  • Current main breaker is current-signature scoped: detectToolCallLoop computes noProgressStreak from getNoProgressStreak(history, toolName, currentHash) and compares that per-signature count to globalCircuitBreakerThreshold, so a different next tool does not inherit the prior stable no-progress streak. (src/agents/tool-loop-detection.ts:512, 7c31a9aafc1c)
  • Docs define global behavior: The public loop-detection docs describe globalCircuitBreakerThreshold as the global no-progress breaker threshold across all detectors, making the reported behavior a contract bug rather than a new feature request. Public docs: docs/tools/loop-detection.md. (docs/tools/loop-detection.md:82, 7c31a9aafc1c)
  • Current main coverage only proves same-pattern blocking: The existing main test covers repeated no-progress calls for one tool pattern, but does not cover a different next tool after the session-level threshold is reached. (src/agents/tool-loop-detection.test.ts:518, 7c31a9aafc1c)
  • PR implements aggregate session counting: The PR adds getGlobalNoProgressStreak and switches the global breaker branch to compare the aggregate session/run no-progress count, while preserving the existing same-pattern count for poll warnings and critical polling loops. (src/agents/tool-loop-detection.ts:388, 11667aafe5e9)
  • PR adds focused regression coverage: The detector test proves a different tool is blocked after the session-global threshold, and the before-tool-call e2e test proves the wrapped memory_search execute function is not called after repeated stable read outcomes. (src/agents/pi-tools.before-tool-call.e2e.test.ts:273, 11667aafe5e9)
  • Real behavior proof and exact-head checks: The PR body includes terminal output from a production before-tool-call smoke showing readExecutions=30, searchExecutions=0, blockedDeniedReason=tool-loop, and exact-head GitHub check runs include a successful Real behavior proof check plus successful lint/type/test/security shards. (11667aafe5e9)

Likely related people:

  • steipete: Peter Steinberger introduced configurable tool-loop detection and recently maintained exec outcome hashing and loop behavior in the affected detector files. (role: feature-history owner and recent maintainer; confidence: high; commits: 076df941a326, e9bce3f81c3a, 9b2f10dcf8d5; files: src/agents/tool-loop-detection.ts, src/agents/tool-loop-detection.test.ts, docs/tools/loop-detection.md)
  • vincentkoc: Vincent Koc recently worked on diagnostic event and before-tool-call surfaces that this fix exercises through the wrapped tool path. (role: adjacent diagnostics and hook-path maintainer; confidence: medium; commits: cead2ea4b106, bcdacfa1b3df, 8bff73cfb07f; files: src/agents/pi-tools.before-tool-call.ts, src/agents/pi-tools.before-tool-call.e2e.test.ts, src/infra/diagnostic-events.ts)
  • akramcodez: Earlier stuck-loop detection and backoff infrastructure history touched the current loop-detection and before-tool-call files. (role: earlier loop/backoff infrastructure contributor; confidence: medium; commits: e5eb5b3e43d7; files: src/agents/tool-loop-detection.ts, src/agents/tool-loop-detection.test.ts, src/agents/pi-tools.before-tool-call.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 7c31a9aafc1c.

@TurboTheTurtle TurboTheTurtle changed the title [codex] fix(agents): make loop breaker session-global fix(agents): make loop breaker session-global May 8, 2026
@openclaw-barnacle openclaw-barnacle Bot added triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 8, 2026
@TurboTheTurtle
Copy link
Copy Markdown
Contributor Author

TurboTheTurtle commented May 8, 2026

Addressed the remaining review item in 9e76446e by adding the missing changelog entry.

I also updated the PR body with structured real behavior proof from a production before-tool-call smoke: after 30 stable no-progress read outcomes in one session, the next memory_search call was blocked by the global circuit breaker before its execute function ran (searchExecutions=0). Detector tests, before-tool-call e2e, formatting, diff check, changelog attribution, and tsgo core/core:test all passed. The Real behavior proof check is now passing.

Remaining red check: check-additional-runtime-topology-architecture is failing in pnpm check:architecture on the existing madge cycle through src/agents/agent-scope.ts -> src/agents/model-ref-shared.ts -> ... -> src/plugins/types.ts. That cycle is outside this loop-breaker diff.

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 8, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
@TurboTheTurtle TurboTheTurtle force-pushed the codex/global-loop-breaker branch from 9e76446 to 8fb3478 Compare May 8, 2026 08:06
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
Copy link
Copy Markdown
Contributor Author

Rebased this branch onto current upstream/main (29689c62d0) to pick up the upstream import-cycle cleanup that was making check-additional-runtime-topology-architecture fail.

Local verification after the rebase:

  • pnpm check:architecture

That check now passes locally with 0 runtime value cycle(s) and 0 Madge cycles.

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
@TurboTheTurtle TurboTheTurtle force-pushed the codex/global-loop-breaker branch from 8fb3478 to c67ae76 Compare May 8, 2026 14:06
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
Copy link
Copy Markdown
Contributor Author

Rebased this branch onto current upstream/main (2008873be6) to clear the stale old-base CI state; new head is c67ae76e44.

Post-rebase validation:

  • pnpm vitest run src/agents/tool-loop-detection.test.ts src/agents/pi-tools.before-tool-call.e2e.test.ts --reporter=dot — 45 tests passing
  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md src/agents/pi-tools.before-tool-call.e2e.test.ts src/agents/tool-loop-detection.test.ts src/agents/tool-loop-detection.ts
  • git diff --check upstream/main

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
@TurboTheTurtle TurboTheTurtle force-pushed the codex/global-loop-breaker branch from c67ae76 to aa69920 Compare May 8, 2026 14:16
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
Copy link
Copy Markdown
Contributor Author

Follow-up: main advanced again with an upstream UTC timestamp-label test fix, so I rebased this branch once more onto upstream/main (f6476140d2). I also pushed an empty rerun commit after the OpenGrep changed-path scan failed while fetching OpenGrep versions from GitHub and GitHub denied a direct failed-job rerun. New head is 11667aafe5.

Validation after the final rebase: git diff --check upstream/main. CI has restarted on the new head.

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 8, 2026
@martingarramon
Copy link
Copy Markdown
Contributor

The session-global scope change is clean. getGlobalNoProgressStreak accumulates stable no-progress counts across different argsHash patterns and fires when the session total crosses globalCircuitBreakerThreshold. The break on a result-change for any prior-seen pattern is intentionally conservative — a single tool showing progress stops the backward count for all tools. Both edge cases are covered in the new tests (blocks a different tool after the session-global threshold + does not count unrelated one-off tools). LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

globalCircuitBreakerThreshold is per-tool-type, not session-global — allows cross-tool loop evasion

2 participants