Skip to content

fix(plugins): add default timeout for before_compaction/after_compaction hooks#84153

Merged
jalehman merged 2 commits into
openclaw:mainfrom
100yenadmin:fix/compaction-hook-default-timeout
May 19, 2026
Merged

fix(plugins): add default timeout for before_compaction/after_compaction hooks#84153
jalehman merged 2 commits into
openclaw:mainfrom
100yenadmin:fix/compaction-hook-default-timeout

Conversation

@100yenadmin
Copy link
Copy Markdown
Contributor

@100yenadmin 100yenadmin commented May 19, 2026

Bug

DEFAULT_VOID_HOOK_TIMEOUT_MS_BY_HOOK in src/plugins/hooks.ts listed only
agent_end. The before_compaction and after_compaction plugin hooks are
void hooks (runBeforeCompaction / runAfterCompactionrunVoidHook(...)),
and runVoidHook applies a timeout only when getVoidHookTimeoutMs returns
a value. With no table entry and no plugin-supplied hook.timeoutMs, both
hooks ran fully unbounded.

In the codex agent harness these hooks fire on the serialized notification
queue — extensions/codex/src/app-server/event-projector.ts handleItemStarted
awaits runAgentHarnessBeforeCompactionHook (and the matching
after_compaction call site) for a contextCompaction item. A slow or hung
compaction hook therefore freezes processing of every later codex notification —
including turn/completed — so the whole agent turn hangs.

Fix

Add before_compaction and after_compaction entries to
DEFAULT_VOID_HOOK_TIMEOUT_MS_BY_HOOK, mirroring the defensive-default pattern
already applied to agent_end. The budget is 30s, matching agent_end
(the closest like-for-like precedent — a void lifecycle hook) rather than the
tighter 15s modifying-hook defaults, because compaction hooks can legitimately
do real work such as a memory flush. The runner is fail-open for void hooks,
so a timed-out handler is logged and compaction proceeds without it.

A plugin can still override the default per-registration via hook.timeoutMs.

Related

Related to #84077 — the same compaction-stall investigation. That issue covers
the host's missing safety timeout on plugin-owned compact(); this is a
separate, independent mechanism (unbounded void hooks freezing the codex
notification queue), hence "Related" rather than "Fixes".

Test plan

  • pnpm tsgo:core and pnpm tsgo:core:test — typecheck passes
  • oxlint on changed files — 0 warnings / 0 errors
  • New src/plugins/hooks.compaction-timeout.test.ts:
    • a never-settling before_compaction handler is bounded by the default timeout (no override) and logged, rather than hanging
    • same for after_compaction
    • a fast before_compaction handler completes without a false-positive timeout
  • vitest (plugins project): new tests + hooks.correlation.test.ts + wired-hooks-compaction.test.ts — 15/15 pass

Real behavior proof

Behavior addressed: hung plugin before_compaction and after_compaction handlers no longer stall compaction forever. The hook runner now applies the default fail-open timeout when no per-hook override is configured.

Real environment tested: local OpenClaw checkout /Volumes/LEXAR/repos/openclaw-fix-compaction-hook-timeout, branch fix/compaction-hook-default-timeout, Node via node --import tsx, actual src/plugins/hooks.ts plugin hook runner.

Exact steps or command run after the patch: ran a live script that registers a plugin whose before_compaction and after_compaction hooks never settle, then invokes runHookTimeout for both hooks without per-hook timeout overrides.

Evidence after fix:

{
  "branch": "fix/compaction-hook-default-timeout",
  "behavior": "hung before_compaction and after_compaction hooks return via default fail-open timeout",
  "elapsedMs": 30033,
  "eventCount": 2,
  "events": [
    "[hooks] before_compaction handler from live-proof-plugin failed: timed out after 30000ms",
    "[hooks] after_compaction handler from live-proof-plugin failed: timed out after 30000ms"
  ]
}

Observed result after fix: both hung hook handlers returned through the default timeout path in about 30 seconds each, emitted fail-open timeout events, and did not block the caller indefinitely.

What was not tested: a real external plugin process wedged in production; the proof uses the in-process hook runner with intentionally never-settling handlers.

@openclaw-barnacle openclaw-barnacle Bot added size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 19, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 19, 2026

Codex review: passed.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
This PR adds 30-second default fail-open timeouts for before_compaction and after_compaction plugin hooks, focused regression tests, and a changelog entry.

Reproducibility: yes. from source inspection: current main has no defaults for these two void hooks, and awaited runVoidHook calls can wait forever on a never-settling handler. I did not run a live hang because this review is read-only.

PR rating
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Summary: Strong live-output proof and a focused patch make this mergeable once maintainers accept the documented compatibility/session-state risk.

Rank-up moves:

  • none
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

PR egg
✨ Hatched: 🥚 common Tiny Merge Sprite

        /\     /\            
      _/  \___/  \_          
     /  ( o   o )  \         
    |      \_/      |        
    |   /\  ===  /\ |        
     \_/  \_____/  \_/       
        _/|_| |_|\_          
       /__| | | |__\         
          ' ' ' '            
         /_/     \_\         
       .-----------.         
      '-------------'        

Rarity: 🥚 common.
Trait: keeps receipts.
Image traits: location flaky test forest; accessory lint brush; palette rose quartz and slate; mood watchful; pose sitting proudly on a smooth stone; shell starlit enamel shell; lighting soft studio lighting; background miniature CI buoys.
How to hatch it: once this PR reaches status: 👀 ready for maintainer look or status: 🚀 automerge armed, the PR author or a maintainer can comment @clawsweeper hatch to turn this ASCII egg into its generated creature image.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Tiny Merge Sprite in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchable usually means sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

Real behavior proof
Sufficient (live_output): The PR body includes after-fix live output from the actual hook runner showing never-settling before/after compaction handlers returning through the 30000 ms fail-open timeout path.

Risk before merge
Why this matters: - Existing plugins that intentionally take longer than 30 seconds without declaring hook.timeoutMs will now be logged as timed out and skipped while compaction/session progression continues.

  • The timeout race does not cancel the underlying plugin promise, so timed-out hook work can keep running in the background and may observe later session state.
  • The supplied proof uses the in-process hook runner rather than a wedged external plugin process, so exact-head CI and maintainer acceptance of the timeout tradeoff should remain the merge gate.

Maintainer options:

  1. Accept bounded fail-open lifecycle hooks (recommended)
    Merge as-is if maintainers accept trading unbounded hangs for a 30-second default because plugins can still request a larger per-hook timeoutMs.
  2. Tune the default budget before merge
    Change the default timeout or add clearer docs first if maintainers believe compaction hook work commonly needs more than 30 seconds before session progression.
  3. Pause for a hook contract decision
    Pause or close this PR only if maintainers want a broader cancellation/abort contract instead of a fail-open default timeout.

Next step before merge
No ClawSweeper repair is needed; this is already a focused automerge-opted PR and should remain gated on exact-head CI plus maintainer acceptance of the timeout behavior.

Security
Cleared: The diff only changes plugin hook timeout defaults, focused tests, and changelog text; I found no concrete security or supply-chain regression.

Review details

Best possible solution:

Land the central hook-runner default if maintainers accept 30 seconds as the lifecycle hook budget, while preserving per-hook timeoutMs overrides and keeping the separate plugin-owned compact() timeout work in its own issue.

Do we have a high-confidence way to reproduce the issue?

Yes, from source inspection: current main has no defaults for these two void hooks, and awaited runVoidHook calls can wait forever on a never-settling handler. I did not run a live hang because this review is read-only.

Is this the best way to solve the issue?

Yes, the central default table is the narrowest maintainable place to bound all callers while preserving the existing per-registration override. The only open solution-fit decision is whether maintainers accept 30 seconds as the fail-open default budget.

Label justifications:

  • P1: The PR addresses an unbounded awaited plugin hook path that can hang a Codex agent turn and block later notifications.
  • merge-risk: 🚨 compatibility: Plugins that relied on the previous unbounded default can now time out after 30 seconds unless they opt into a longer registration timeout.
  • merge-risk: 🚨 session-state: Timed-out hook promises are not cancelled, so hook work can continue after compaction and session progression have moved on.

What I checked:

  • Current main has no default timeout for these void hooks: At current main, DEFAULT_VOID_HOOK_TIMEOUT_MS_BY_HOOK contains only agent_end; getVoidHookTimeoutMs returns hook.timeoutMs or that table value, and runVoidHook awaits the handler promise directly when no timeout resolves. (src/plugins/hooks.ts:190, 3bc728eaa993)
  • Codex compaction notifications await both hook helpers: The Codex event projector awaits runAgentHarnessBeforeCompactionHook for contextCompaction item start and awaits runAgentHarnessAfterCompactionHook for item completion, so an unbounded hook can block serialized notification processing. (extensions/codex/src/app-server/event-projector.ts:477, 3bc728eaa993)
  • Harness helpers pass the hooks through the global hook runner: The harness helpers check for before_compaction/after_compaction and then await hookRunner.runBeforeCompaction and hookRunner.runAfterCompaction, confirming the PR changes the central runner path used by Codex. (src/agents/harness/prompt-compaction-hook-helpers.ts:87, 3bc728eaa993)
  • Patch adds the two default timeout entries: The PR implementation adds before_compaction: 30_000 and after_compaction: 30_000 to the existing void-hook default table, preserving per-registration hook.timeoutMs override behavior in the existing runner. (src/plugins/hooks.ts:190, 5ad27bd629f6)
  • Patch adds focused timeout regression tests: The new test file covers never-settling before_compaction and after_compaction handlers resolving through the 30-second default timeout path, plus a fast handler that does not false-positive timeout. (src/plugins/hooks.compaction-timeout.test.ts:28, e85c9d6bb05f)
  • Latest shipped release still has the unbounded behavior: The latest release tag v2026.5.18 points at 50a2481, and that release has the same agent_end-only default table plus direct await fallback for unbudgeted void hooks. (src/plugins/hooks.ts:190, 50a2481652b6)

Likely related people:

  • solstead: Commit ab71fdf8 added the compaction/reset plugin hooks and evolved their blocking/session-file semantics, making this the most direct feature-history trail for the affected hook names. (role: introduced compaction hook API; confidence: high; commits: ab71fdf821b2; files: src/plugins/hooks.ts, src/plugins/hook-types.ts, docs/plugins/hooks.md)
  • radek-paclt: Commit ebfeb7a6 added the plugin lifecycle hook runner foundation that the default timeout table and runVoidHook behavior build on. (role: introduced hook runner infrastructure; confidence: medium; commits: ebfeb7a6bf53; files: src/plugins/hooks.ts)
  • steipete: History shows Peter created/moved the Codex app-server event projector path and release commit 50a2481 carried the current hook/harness implementation into v2026.5.18. (role: recent Codex harness and release-area contributor; confidence: medium; commits: d5698038d71c, 50a2481652b6; files: extensions/codex/src/app-server/event-projector.ts, src/agents/harness/prompt-compaction-hook-helpers.ts, src/plugins/hooks.ts)
  • vincentkoc: Commit 7f5a5a34 split the plugin hook contract types that define before_compaction and after_compaction, so he is a useful adjacent reviewer for hook API shape. (role: recent hook type refactor owner; confidence: medium; commits: 7f5a5a34dbf8; files: src/plugins/hook-types.ts, src/plugins/types.ts, src/plugins/hooks.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 3bc728eaa993.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 19, 2026
@100yenadmin 100yenadmin marked this pull request as ready for review May 19, 2026 14:08
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 19, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 19, 2026
@Takhoffman
Copy link
Copy Markdown
Contributor

@clawsweeper automerge

@clawsweeper clawsweeper Bot added the clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge label May 19, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 19, 2026

🦞🧹
ClawSweeper automerge is enabled.

  • Head: e85c9d6bb05f
  • Label: clawsweeper:automerge
  • Action: exact-head review queued (workflow sweep.yml, event repository_dispatch).
  • Flow: review this head, repair/rebase only if needed, then re-review the exact repaired head before merge.

Draft PRs stay fix-only until GitHub marks them ready for review. Pause with /clawsweeper stop.

Automerge progress:

  • 2026-05-19 17:43:38 UTC review queued 8297a290b88d (queued)
  • 2026-05-19 20:45:19 UTC review queued e85c9d6bb05f (after repair)
  • 2026-05-19 20:52:19 UTC review passed e85c9d6bb05f (structured ClawSweeper verdict: pass (sha=e85c9d6bb05f4e95b8a51c583e3a13a40a7e3...)
  • 2026-05-19 21:20:57 UTC review queued e85c9d6bb05f (queued)

Copy link
Copy Markdown
Contributor

@jalehman jalehman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed locally with the maintainer review workflow. No actionable findings from me; approving.

@clawsweeper clawsweeper Bot force-pushed the fix/compaction-hook-default-timeout branch from 8297a29 to e85c9d6 Compare May 19, 2026 20:45
@clawsweeper clawsweeper Bot added status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. and removed status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 19, 2026
@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation agents Agent runtime and tooling channel: twitch Channel integration: twitch extensions: codex size: XL and removed size: S labels May 19, 2026
@jalehman jalehman force-pushed the fix/compaction-hook-default-timeout branch from 76cf75d to 1a046cc Compare May 19, 2026 22:09
@openclaw-barnacle openclaw-barnacle Bot added size: S and removed docs Improvements or additions to documentation agents Agent runtime and tooling channel: twitch Channel integration: twitch extensions: codex size: XL labels May 19, 2026
Eva (agent) and others added 2 commits May 19, 2026 15:10
…efault timeout

DEFAULT_VOID_HOOK_TIMEOUT_MS_BY_HOOK only listed agent_end, so the
before_compaction and after_compaction void hooks ran fully unbounded
when a plugin supplied no hook.timeoutMs. In the codex agent harness
these hooks fire on the serialized notification queue, so a slow or hung
handler froze processing of every later codex notification — including
turn/completed — hanging the whole agent turn.

Add defensive default timeout entries for both hooks, mirroring the
existing agent_end pattern. The budget matches agent_end's 30s rather
than the tighter modifying-hook defaults because compaction hooks can
legitimately do real work (e.g. a memory flush). The runner is fail-open
for void hooks, so a timed-out handler is logged and compaction proceeds.
@jalehman jalehman force-pushed the fix/compaction-hook-default-timeout branch from 1a046cc to 41fa5fe Compare May 19, 2026 22:13
@jalehman jalehman merged commit 5c9a8f3 into openclaw:main May 19, 2026
24 checks passed
@jalehman
Copy link
Copy Markdown
Contributor

Merged via squash.

Thanks @100yenadmin!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: S status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants