fix(agents): bound plugin system context #87341

Closed
MonkeyLeeT wants to merge 1 commit into openclaw:main from MonkeyLeeT:codex/87045-plugin-context-boundary

Conversation

@MonkeyLeeT MonkeyLeeT commented May 27, 2026

Summary

Fixes #87045 by wrapping plugin-emitted system context with a stable boundary and attribution note before it is joined into model-visible system/developer instructions.

The wrapper is applied only at plugin hook merge points for before_prompt_build / legacy before_agent_start system context. The lower-level system prompt composer remains unchanged, so non-plugin runtime context is not mislabeled as plugin content.

Root Cause

Plugin hook fields such as appendSystemContext were joined next to the base system prompt with only blank-line separation. When the base prompt ended with Codex workspace-file Markdown and the plugin emitted another ## heading, the model could treat the plugin block as another section of the preceding workspace file.
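
A minimal sketch of the failure mode described above, assuming the pre-fix merge was a plain blank-line join. The helper name joinSystemContext is hypothetical and is not an OpenClaw function; only the two input strings are taken from the smoke test in this PR.

```typescript
// Hypothetical illustration of an unbounded merge: parts are joined with blank lines only.
function joinSystemContext(basePrompt: string, appendSystemContext?: string): string {
  return [basePrompt, appendSystemContext]
    .filter((part): part is string => typeof part === "string" && part.trim().length > 0)
    .join("\n\n");
}

// Base prompt that ends with workspace-file Markdown.
const basePrompt = "## TOOLS.md\n\n## Workspace Heading\n\nWorkspace guidance.";
// Plugin output that starts with its own "##" heading.
const pluginContext = "## My Custom Rules\n\nFoo bar baz.";

const merged = joinSystemContext(basePrompt, pluginContext);
console.log(merged);
```

In the merged text, "## My Custom Rules" sits at the same heading level as "## Workspace Heading", with nothing to mark where the workspace file ends and the plugin content begins.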

Behavior Change

Plugin-provided prependSystemContext and appendSystemContext now render as a separate # OpenClaw Plugin System Context block bounded by ---, with a note that the content came from OpenClaw plugins and is not part of a workspace file or project document.
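
The wrapping step might look roughly like the sketch below. The heading and attribution note are copied from the terminal output in this PR; the function name wrapPluginSystemContext, the constant names, and the exact spacing are assumptions for illustration, not the code in the patch.

```typescript
// Hypothetical wrapper. Only the heading and note text come from the PR's smoke output.
const PLUGIN_CONTEXT_HEADING = "# OpenClaw Plugin System Context";
const PLUGIN_CONTEXT_NOTE =
  "The following instructions were supplied by OpenClaw plugins. " +
  "They are not part of any workspace file or project document.";

function wrapPluginSystemContext(context: string | undefined): string | undefined {
  const trimmed = context?.trim();
  if (!trimmed) {
    // Nothing to wrap, so no empty boundary block is emitted.
    return undefined;
  }
  // Opening rule and heading, attribution note, plugin content, closing rule.
  return ["---\n" + PLUGIN_CONTEXT_HEADING, PLUGIN_CONTEXT_NOTE, trimmed, "---"].join("\n\n");
}

const wrapped = wrapPluginSystemContext("## My Custom Rules\n\nFoo bar baz.");
console.log(wrapped);
```

Returning undefined for empty input keeps the wrapper from adding a boundary block when no plugin supplied any context.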

Real behavior proof

Behavior addressed: Plugin hook system context no longer appears as an unbounded continuation of workspace-file Markdown across harness, embedded runner, CLI prompt preparation, and Codex app-server prompt preparation paths.

Real environment tested: Local OpenClaw checkout on macOS, branch codex/87045-plugin-context-boundary, current head 0b42c568795839cb13f8494e6ca356f635161028, rebased onto origin/main before rerunning verification on May 27, 2026.

Exact steps or command run after this patch:

git rev-parse HEAD

node --import tsx --input-type=module <<'EOF'
import fs from "node:fs";
import os from "node:os";
import path from "node:path";
import { CURRENT_SESSION_VERSION } from "openclaw/plugin-sdk/agent-sessions";
import {
  getGlobalHookRunner,
  initializeGlobalHookRunner,
  resetGlobalHookRunner,
} from "./src/plugins/hook-runner-global.ts";
import { prepareCliRunContext } from "./src/agents/cli-runner/prepare.ts";

const workspaceDir = fs.mkdtempSync(path.join(os.tmpdir(), "openclaw-plugin-context-proof-"));
process.env.OPENCLAW_STATE_DIR = workspaceDir;
const sessionFile = path.join(workspaceDir, "agents", "main", "sessions", "proof-session.jsonl");
fs.mkdirSync(path.dirname(sessionFile), { recursive: true });
fs.writeFileSync(
  sessionFile,
  `${JSON.stringify({
    type: "session",
    version: CURRENT_SESSION_VERSION,
    id: "proof-session",
    timestamp: new Date(0).toISOString(),
    cwd: workspaceDir,
  })}\n`,
  "utf8",
);

initializeGlobalHookRunner({
  plugins: [{ id: "proof-plugin", status: "loaded" }],
  hooks: [],
  typedHooks: [
    {
      pluginId: "proof-plugin",
      hookName: "before_prompt_build",
      source: "local-proof-plugin",
      priority: 0,
      handler: async () => ({ appendSystemContext: "## My Custom Rules\n\nFoo bar baz." }),
    },
  ],
});

const context = await prepareCliRunContext({
  sessionId: "proof-session",
  sessionKey: "agent:main:proof",
  agentId: "main",
  trigger: "user",
  sessionFile,
  workspaceDir,
  prompt: "where is Foo bar baz documented?",
  provider: "proof-cli",
  model: "proof-model",
  timeoutMs: 1000,
  runId: "proof-run",
  config: {
    agents: {
      defaults: {
        systemPromptOverride: "## TOOLS.md\n\n## Workspace Heading\n\nWorkspace guidance.",
        cliBackends: {
          "proof-cli": {
            command: "proof-cli",
            args: ["--print"],
            systemPromptArg: "--system-prompt",
            systemPromptWhen: "first",
            sessionMode: "existing",
            output: "text",
            input: "arg",
          },
        },
      },
    },
  },
});

console.log(`global hook runner has before_prompt_build: ${getGlobalHookRunner()?.hasHooks("before_prompt_build") === true}`);
console.log("compiled system prompt:");
console.log(context.systemPrompt.replaceAll(workspaceDir, "$WORKSPACE"));
resetGlobalHookRunner();
fs.rmSync(workspaceDir, { recursive: true, force: true });
EOF

node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts src/agents/harness/prompt-compaction-hook-helpers.test.ts src/agents/embedded-agent-runner/run/attempt.test.ts src/agents/cli-runner/prepare.test.ts
git diff --check origin/main...HEAD

Evidence after fix: Terminal output from the real local plugin smoke on current head showed the compiled system prompt with a plugin boundary and attribution note before the plugin-supplied rules. Current head was verified as 0b42c568795839cb13f8494e6ca356f635161028; focused regression proof passed 7 test files / 439 tests, including the Codex app-server sibling coverage requested by review, and git diff --check origin/main...HEAD completed cleanly. The copied terminal output is included below.

Observed result after fix: The real local plugin hook returned appendSystemContext: "## My Custom Rules\n\nFoo bar baz."; the compiled system prompt places the workspace-style ## TOOLS.md content first, then inserts ---, # OpenClaw Plugin System Context, the plugin attribution note, the plugin rules, and a closing --- before the model identity text. The Codex app-server sibling test now also asserts wrapped pre- and post-system plugin context around the custom Codex system prompt.

What was not tested: Live model attribution smoke. Focused oxlint was attempted, but the lint wrapper is currently blocked before linting by unrelated plugin-sdk boundary DTS errors on current main under src/media/* and src/gateway/managed-image-attachments.ts.

Before evidence (optional but encouraged): Not captured; the proof here is after-fix terminal output plus focused regression coverage.

Current head verified:

0b42c568795839cb13f8494e6ca356f635161028

Terminal output copied from the smoke:

global hook runner has before_prompt_build: true
compiled system prompt:
## TOOLS.md

## Workspace Heading

Workspace guidance.

---
# OpenClaw Plugin System Context

The following instructions were supplied by OpenClaw plugins. They are not part of any workspace file or project document.

## My Custom Rules

Foo bar baz.

---

Current model identity: proof-cli/proof-model. If asked what model you are, answer with this value for the current run.

Focused regression proof also passed:

Test Files  7 passed (7)
Tests       439 passed (439)

git diff --check origin/main...HEAD completed cleanly.

Verification

  • Current head verified: 0b42c568795839cb13f8494e6ca356f635161028
  • Local plugin smoke through initializeGlobalHookRunner + prepareCliRunContext, with redacted compiled context.systemPrompt shown above
  • node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts src/agents/harness/prompt-compaction-hook-helpers.test.ts src/agents/embedded-agent-runner/run/attempt.test.ts src/agents/cli-runner/prepare.test.ts
  • git diff --check origin/main...HEAD

Fixes #87045

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S proof: supplied External PR includes structured after-fix real behavior proof. labels May 27, 2026

clawsweeper Bot commented May 27, 2026

Codex review: needs maintainer review before merge. Reviewed May 27, 2026, 3:38 PM ET / 19:38 UTC.

Summary
The PR wraps plugin-provided prepend/append system context in an attributed Markdown boundary at harness and embedded prompt-build merge points, with CLI, embedded-runner, harness, and Codex app-server coverage.

PR surface: Source +31, Tests +131. Total +162 across 7 files.

Reproducibility: yes. Current main source shows plugin hook system context is concatenated with blank-line separators, and the PR body includes a real local plugin smoke showing the compiled prompt after the fix.

Review metrics: 1 noteworthy metric.

  • Plugin hook context fields: 2 existing fields now wrapped. prependSystemContext and appendSystemContext are public plugin hook outputs, so maintainers should notice the model-visible format change before merge.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
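
As a rough illustration of the "weaker of the two" rule, the sketch below orders the ranks as listed in the rank legend of this review and picks the lower one. The ordering is inferred from that legend, and the names RANKS and overallRank are hypothetical; this is not ClawSweeper's implementation.

```typescript
// Ranks from weakest to strongest, inferred from the legend in this review (assumption).
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;
type Rank = (typeof RANKS)[number];

// Overall readiness follows whichever of the two inputs ranks lower.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

const overall = overallRank("diamond lobster", "platinum hermit");
console.log(overall);
```

With proof at diamond lobster and patch quality at platinum hermit, the sketch yields platinum hermit, which matches the overall rating reported above.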

Rank-up moves:

  • none

Risk before merge

  • This intentionally changes model-visible prompt text for every plugin that uses prependSystemContext or appendSystemContext, so existing plugins may see changed attribution, token use, or instruction hierarchy after upgrade.
  • No live model attribution smoke was supplied; confidence rests on compiled prompt output and focused regression tests.

Maintainer options:

  1. Land with prompt-format acceptance (recommended)
    If maintainers accept the extra attributed Markdown wrapper for plugin system context, this PR is the narrow path that preserves the existing hook API while bounding model attribution.
  2. Tune wrapper copy before merge
    If the heading or attribution note should be treated as a durable product contract, adjust that static copy now while keeping the same call-site coverage.
  3. Pause for plugin contract direction
    If core should not alter plugin-emitted system context automatically, pause this PR and decide whether the responsibility belongs in plugin docs or a new opt-in hook contract.

Next step before merge
Manual maintainer review should decide whether to accept the plugin-facing prompt-format change; there is no narrow ClawSweeper repair left after the sibling test update.

Security
Cleared: The patch changes prompt assembly and tests only; it does not touch dependencies, CI, credentials, permissions, downloads, or package execution surfaces.

Review details

Best possible solution:

Land the narrow hook-merge wrapper once maintainers accept the plugin prompt-format compatibility tradeoff and normal checks pass; keep the lower-level composer plugin-agnostic.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main source shows plugin hook system context is concatenated with blank-line separators, and the PR body includes a real local plugin smoke showing the compiled prompt after the fix.

Is this the best way to solve the issue?

Yes, with maintainer compatibility approval. Wrapping only the plugin hook merge outputs is narrower than changing the lower-level system prompt composer or requiring each plugin author to add their own boundary.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 90f30075aa72.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal output from a real local plugin hook through prepareCliRunContext plus focused regression test output for the touched sibling paths.
  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes after-fix terminal output from a real local plugin hook through prepareCliRunContext plus focused regression test output for the touched sibling paths.
  • remove rating: 🦐 gold shrimp: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.
  • remove status: ⏳ waiting on author: Current PR status label is status: 👀 ready for maintainer look.

Label justifications:

  • P2: This is a normal-priority bug fix for plugin system-context attribution with limited blast radius but real agent prompt impact.
  • merge-risk: 🚨 compatibility: The PR changes the rendered prompt format for existing plugins that already return prependSystemContext or appendSystemContext.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes after-fix terminal output from a real local plugin hook through prepareCliRunContext plus focused regression test output for the touched sibling paths.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal output from a real local plugin hook through prepareCliRunContext plus focused regression test output for the touched sibling paths.
Evidence reviewed

PR surface:

Source +31, Tests +131. Total +162 across 7 files.

Area       Files  Added  Removed  Net
Source     3      39     8        +31
Tests      4      140    9        +131
Docs       0      0      0        0
Config     0      0      0        0
Generated  0      0      0        0
Other      0      0      0        0
Total      7      179    17       +162

What I checked:

Likely related people:

  • vincentkoc: Current-main blame points the raw prepend/append system-context merge lines in the harness and embedded prompt helper back to commit 5c20ff9. (role: introduced behavior; confidence: high; commits: 5c20ff93e07d; files: src/agents/harness/prompt-compaction-hook-helpers.ts, src/agents/embedded-agent-runner/run/attempt.prompt-helpers.ts)
  • steipete: Commit bb46b79 refactored the agent runtime and carried the legacy before_agent_start system-context merge path now touched by this PR. (role: recent area contributor; confidence: high; commits: bb46b79d3c14; files: src/agents/harness/prompt-compaction-hook-helpers.ts, src/agents/embedded-agent-runner/run/attempt.prompt-helpers.ts)
  • Alex Knight: Recent history on the Codex app-server path touched by the sibling test includes commit 42e9504, making this a useful routing signal for Codex harness behavior. (role: adjacent Codex app-server contributor; confidence: medium; commits: 42e9504114f3; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/run-attempt.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. labels May 27, 2026

clawsweeper Bot commented May 27, 2026

ClawSweeper PR egg

✨ Hatched: 🥚 common Clockwork Shellbean

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: watches the merge queue.
Image traits: location review cove; accessory tiny test log scroll; palette amber, ink, and glacier blue; mood sparkly; pose nestled inside a glowing shell; shell woven fiber shell; lighting bright celebratory glints; background small review tokens.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@MonkeyLeeT MonkeyLeeT force-pushed the codex/87045-plugin-context-boundary branch from db0c6ec to 552367e Compare May 27, 2026 16:09
@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. and removed proof: supplied External PR includes structured after-fix real behavior proof. labels May 27, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 27, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 27, 2026
@MonkeyLeeT MonkeyLeeT marked this pull request as ready for review May 27, 2026 17:04
@MonkeyLeeT MonkeyLeeT force-pushed the codex/87045-plugin-context-boundary branch from 552367e to 65c9d6e Compare May 27, 2026 17:39
@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. labels May 27, 2026
@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 27, 2026
@MonkeyLeeT MonkeyLeeT commented May 27, 2026

@clawsweeper re-review


clawsweeper Bot commented May 27, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. labels May 27, 2026
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 27, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. proof: sufficient ClawSweeper judged the real behavior proof convincing. labels May 27, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 27, 2026
@MonkeyLeeT MonkeyLeeT force-pushed the codex/87045-plugin-context-boundary branch from 65c9d6e to 999fde4 Compare May 27, 2026 18:59
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 27, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 27, 2026
@MonkeyLeeT MonkeyLeeT force-pushed the codex/87045-plugin-context-boundary branch from 999fde4 to 0b42c56 Compare May 27, 2026 19:28
@MonkeyLeeT MonkeyLeeT commented May 27, 2026

Addressed the Codex app-server sibling test request and updated the PR body verification for current head 0b42c568795839cb13f8494e6ca356f635161028.

@clawsweeper re-review

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 27, 2026

clawsweeper Bot commented May 27, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 27, 2026
@MonkeyLeeT MonkeyLeeT commented May 27, 2026

Closing this PR as superseded by the equivalent fix already on main in f4329fe0d636fdfbaddd299157c34416de683141.

That commit adds the plugin hook system-context boundary, applies it in the harness and embedded/CLI prompt hook merge paths, and updates the Codex app-server sibling expectation. Keeping this PR open would only conflict with the landed canonical implementation.

@MonkeyLeeT MonkeyLeeT closed this May 27, 2026

Labels

  • agents (Agent runtime and tooling)
  • extensions: codex
  • merge-risk: 🚨 compatibility (🚨 May break existing users, config, migrations, defaults, or upgrade paths.)
  • P2 (Normal backlog priority with limited blast radius.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.)
  • size: S
  • status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

appendSystemContext / prependSystemContext concatenation has no boundary marker, causing model to attribute injected rules to the last workspace file

1 participant