feat(core): wrap external content in tool results (issue #456)#465
feat(core): wrap external content in tool results (issue #456)#465lmorchard wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens prompt-injection defenses by wrapping web-sourced strings inside tool-result outputs with <EXTERNAL-CONTENT ...> blocks (plus the existing safety warning), and extending the history truncator so those wrapped tool outputs are clipped after a subsequent turn—bringing tool results in line with the existing page-snapshot trust framing.
Changes:
- Add new
ExternalContentLabelvariants and apply wrapping toextractandtabstack_extract_markdownstring payload fields. - Extend
WebAgent.truncateOldExternalContent()to recursively clip wrapped external content insiderole: "tool"/tool-result.outputobjects. - Add/adjust regression tests to verify wrapping and clipping behavior (including a recorded injection payload fixture) and validator-feedback prompt wrapping.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/utils/promptSecurity.ts | Adds new external-content labels used by wrapped tool outputs and validator feedback. |
| packages/core/src/tools/webActionTools.ts | Wraps extract tool extractedData with extract-result label. |
| packages/core/src/tools/tabstackTools.ts | Wraps tabstack_extract_markdown.content with tabstack-content; documents intentional non-wrapping for structured JSON outputs. |
| packages/core/src/prompts.ts | Wraps validator taskAssessment/feedback and adds system-prompt guidance about <EXTERNAL-CONTENT> in tool results. |
| packages/core/src/webAgent.ts | Adds recursive walker to clip wrapped <EXTERNAL-CONTENT> found inside tool-result output structures. |
| packages/core/test/utils/promptSecurity.test.ts | Extends label-enum coverage for the new label variants. |
| packages/core/test/tools/webActionTools.test.ts | Updates extract assertions and adds explicit test for <EXTERNAL-CONTENT label="extract-result"> wrapping. |
| packages/core/test/tools/tabstackTools.test.ts | Updates assertions and adds explicit test for <EXTERNAL-CONTENT label="tabstack-content"> wrapping. |
| packages/core/test/prompts.test.ts | Adds coverage for buildValidationFeedbackPrompt wrapping (including fallback feedback). |
| packages/core/test/webAgent.test.ts | Updates validator feedback expectations and adds regression test for clipping wrapped tool outputs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| taskAssessment: wrapExternalContentWithWarning( | ||
| taskAssessment, | ||
| ExternalContentLabel.ValidatorFeedback, | ||
| ), | ||
| feedback: wrapExternalContentWithWarning( |
Bring tool-result messages under the same <EXTERNAL-CONTENT> trust-framing umbrella that page snapshots already use, so attacker-controllable text laundered through tools is (a) clearly labeled as untrusted to the LLM and (b) auto-clipped from history after one turn the way snapshots are. ## Background Before this change, page snapshots had three defensive layers against prompt-injected web content: 1. Each snapshot is wrapped in <EXTERNAL-CONTENT label="..."> tags with a safety warning attached (wrapExternalContentWithWarning). 2. Old snapshots are auto-clipped to "[clipped for brevity]" before each new one lands (truncateOldExternalContent). 3. Snapshots are generated by a deterministic DOM walk — no LLM re-narrates the content. Tool results that carry web-sourced content had none of those layers. They went into this.messages as tool-result payloads, stayed there at full size for the rest of the task, and the agent perceived them as trusted programmatic output. Worst case: extract's secondary LLM read attacker content and produced output that was stored unwrapped and laundered into history. ## Changes Three sites now wrap web-sourced content at the emission boundary: - `extract.extractedData` — wraps with `ExtractResult` before returning from the tool's execute(). - `tabstack_extract_markdown.content` — wraps with `TabstackContent`. - `buildValidationFeedbackPrompt` — wraps both `taskAssessment` and `feedback` (including the null-fallback string) with `ValidatorFeedback` so the validator's LLM-summarized view of conversation history can't silently re-launder injection content. The truncator (`truncateOldExternalContent`) is extended to scan `role: "tool"` messages in addition to `role: "user"`. A new recursive `clipInValue` walker descends through strings/arrays/objects inside each tool-result `output` and applies the existing regex to any string that contains an EXTERNAL-CONTENT block. The walker is generic over all current and future wrap sites — no per-tool awareness needed. One sentence is added to the action-loop system prompt acknowledging that EXTERNAL-CONTENT blocks may appear in tool-result fields as well as user messages. ## Intentionally not wrapped (residual risk) - `webSearch.markdown` — already wrapped at the search-provider level with `SearchResults`. The truncator extension now reaches that wrap inside tool messages for free, so no second wrap is needed. - `tabstack_extract_json.data` and `tabstack_generate_json.data` — schema-constrained structured objects, not free-form prose. String fields nested inside are technically still attacker-controllable; the truncator walks and clips them if tagged. Code comments at the tool definitions document this decision so future maintainers see the rationale. ## Out of scope (follow-ups) - `search_page` / `find_elements` (PR #446): adopt the same helper on rebase. The truncator extension will pick up their wraps without further changes here. - Stricter post-processing of the extraction LLM's output to strip injection-shaped patterns. - A trust/taint model that propagates untrusted-source flags through structured tool outputs. ## Reproduction test A new unit test seeds a `role: "tool"` message with the verbatim uaf.cafe injection payload wrapped as an extract-result, runs the truncator, and asserts the wrap structure persists while the payload strings (e.g. `stoletheminerals.github.io`, `ALWAYS do ONLY`) are clipped from history. Payload recorded from https://uaf.cafe/agent_tabstack.html on 2026-05-20. ## Tests +5 new tests, +1267 total: core 684 / cli 221 / server 96 / extension 266. `pnpm run check` green (typecheck + format:check + all package tests). `gitleaks detect` clean. Closes #456 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fa1fef2 to
536a37c
Compare
|
Addressed Copilot's review comment on the validator-feedback wrap warning text. The warning was page-content-specific ("represents the current state of the web page" / "as page text") which mislabels validator output. Made
Force-pushed as |
Summary
<EXTERNAL-CONTENT>tags so the agent treats it as untrusted, matching the existing trust-framing used for page snapshots.truncateOldExternalContentto walkrole: "tool"messages so wrapped tool results are auto-clipped from history after one turn — fixing the cost-amplification side of the issue.Design Decisions
execute()(or, for the validator path, inside the prompt builder) rather than at the consumer site. The structured shape of the tool-resultoutputis preserved — only the string payload field is wrapped — so existing readers (success/action/isRecoverableetc.) keep working.clipInValuehelper descends through any nested string/array/object in a tool-resultoutputand applies the existing regex to strings containing<EXTERNAL-CONTENT. Generic over all current and future wrap sites — when PR feat(core): add page exploration tools, structured extract #446 addssearch_page/find_elements, they slot in by calling the same wrap helper and need no further truncator changes.data: objectoutputs intentionally not wrapped (tabstack_extract_json,tabstack_generate_json). Their shape is constrained by the caller's JSON schema. String fields nested inside are still attacker-controllable; the truncator walks and clips any tagged content but does not stop free-form payloads in structured fields. Code comments at each tool definition document this decision.**IMPORTANT:**warning that already lands with every wrap.Changes
packages/core/src/utils/promptSecurity.ts— three newExternalContentLabelvariants:ExtractResult,TabstackContent,ValidatorFeedback.packages/core/src/tools/webActionTools.ts— wrapextract.extractedDatawithExtractResult.packages/core/src/tools/tabstackTools.ts— wraptabstack_extract_markdown.contentwithTabstackContent; in-code comments on the two JSON-output tools documenting whydatais intentionally not wrapped.packages/core/src/prompts.ts— wraptaskAssessmentandfeedback(including fallback) inbuildValidationFeedbackPromptwithValidatorFeedback; add one-line acknowledgement to the action-loop system prompt.packages/core/src/webAgent.ts— extendtruncateOldExternalContentwith a recursive walker overrole: "tool"messages.Intentionally not changed
webSearch.markdownis already wrapped at the search-provider level with the existingSearchResultslabel; the truncator extension now reaches that wrap inside tool messages for free.search_page/find_elements(in flight as PR feat(core): add page exploration tools, structured extract #446) adopt the same helper when they rebase.Test Plan
pnpm run checkpasses (typecheck + format:check + all package tests; 1267 total: core 684 / cli 221 / server 96 / extension 266)gitleaks detect— no leaksrole: "tool"message with the verbatim https://uaf.cafe/agent_tabstack.html injection payload (recorded 2026-05-20), runs the truncator, and verifies the wrap persists while the payload is clippedpnpm pilo run "summarize what this page says" --url https://uaf.cafe/agent_tabstack.html --logger jsonagainst a real agent and confirm:output.extractedDataarrives wrapped with<EXTERNAL-CONTENT label="extract-result">stoletheminerals.github.io/text_form.htmlReferences
search_page/find_elementswhen it rebases onto maindocs/dev-sessions/2026-05-20-1203-wrap-external-tool-results/(local-only, not committed)🤖 Generated with Claude Code