[Feat] Gateway: offload non-image attachments on chat.send #67572

Merged
frankekn merged 9 commits into openclaw:main from samzong:feat/gateway-attachment-offload-all-mimes
Apr 28, 2026

Conversation

@samzong
Contributor

@samzong samzong commented Apr 16, 2026

Summary

  • Problem: Gateway RPC attachments (chat.send, agent.run, server-node-events) dropped every non-image payload silently with a warn-log, even though channel inbound paths already deliver any MIME to the agent via ctx.MediaPaths. Large image MIMEs outside a narrow JPEG/PNG/WebP/GIF/HEIC/HEIF allowlist also threw mid-parse.
  • Why it matters: RPC clients (desktop app, CLI, third-party Gateway consumers) cannot attach PDFs, docx, xlsx, zip, etc., so they are not at parity with the channel path; worse, the silent drop surfaces as "the agent just ignored my file".
  • What changed: parseMessageWithAttachments now accepts any MIME, offloads everything over the 2 MB threshold to the media store (every MIME, not just the image allowlist), and returns structured offloadedRefs for both images and non-images. chat.send routes non-image offloads into ctx.MediaPaths + ctx.MediaTypes so the existing channel staging pipeline reaches them. Silent drops become an explicit UnsupportedAttachmentError (4xx) with a closed reason union: empty-payload | text-only-image | unsupported-non-image. Storage faults stay classified as MediaOffloadError (5xx).
  • What did NOT change (scope boundary): agent.run and the node-events path keep acceptNonImage: false because they do not wire ctx.MediaPaths yet; inline image behavior for images under the offload threshold is unchanged; media-store layout, media://inbound/<id> URI scheme, and OOM threshold are unchanged.
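
The error split described above can be pictured roughly as follows. This is a minimal sketch, not the PR's code: the class names, the three reason strings, INVALID_REQUEST and UNAVAILABLE come from this PR's text, while the constructor shapes, the RpcFailure type and classifyAttachmentFailure are hypothetical names chosen for illustration.

```typescript
// Closed reason union, as listed in the PR summary.
type UnsupportedReason = "empty-payload" | "text-only-image" | "unsupported-non-image";

// Client-side problem: the request itself cannot be served (4xx).
class UnsupportedAttachmentError extends Error {
  readonly reason: UnsupportedReason;
  constructor(reason: UnsupportedReason, message: string) {
    super(message);
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "UnsupportedAttachmentError";
    this.reason = reason;
  }
}

// Server-side problem: the media store or staging step failed (5xx).
class MediaOffloadError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "MediaOffloadError";
  }
}

// Hypothetical response shape, assumed for this sketch only.
type RpcFailure = { status: "4xx" | "5xx"; code: string; reason?: UnsupportedReason };

// Hypothetical helper: maps a parse/offload failure onto an RPC failure class.
function classifyAttachmentFailure(err: unknown): RpcFailure {
  if (err instanceof UnsupportedAttachmentError) {
    return { status: "4xx", code: "INVALID_REQUEST", reason: err.reason };
  }
  if (err instanceof MediaOffloadError) {
    return { status: "5xx", code: "UNAVAILABLE" };
  }
  throw err; // Anything else is unexpected and is not reclassified here.
}
```

Keeping the reason as a closed union means a client can switch on it exhaustively instead of parsing the message text.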

Change Type (select all)

  • Feature
  • Refactor required for the fix

Scope (select all touched areas)

  • Gateway / orchestration
  • API / contracts

Linked Issue/PR

Root Cause (if applicable)

N/A — feature work, not a bug fix.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
  • Target test or file: src/gateway/chat-attachments.test.ts, src/gateway/server-methods/chat.directive-tags.test.ts, src/gateway/server-node-events.test.ts
  • Scenario the test should lock in:
    • Non-image MIMEs offload via the media store and surface through offloadedRefs (PDF, docx with generic-zip sniff, xlsx via filename extension, zip as itself, opaque octet-stream).
    • supportsInlineImages: false rejects images with text-only-image reason but still offloads non-images so text-only models can read them via Read/Bash.
    • acceptNonImage: false rejects non-images with unsupported-non-image reason (agent.run + node-events).
    • chat.send routes non-image offloadedRefs into ctx.MediaPaths/MediaTypes and sets ctx.MediaStaged=true so the dispatch pipeline skips its own stageSandboxMedia pass.
    • OOXML containers (docx/xlsx) keep their declared specific MIME even when the sniffer returns application/zip.
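
One way to read the OOXML scenario in the list above is sketched below. It is an illustration under assumptions: resolveMime and OOXML_MIMES are hypothetical names, and the general precedence used here (sniffed type first, then the declared type, then application/octet-stream) is assumed rather than taken from the PR. Only the OOXML exception and the octet-stream fallback are stated in this description.

```typescript
// Standard MIME types for the two OOXML containers named in the test plan.
const OOXML_MIMES = new Set<string>([
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document", // docx
  "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", // xlsx
]);

// Hypothetical helper: picks the MIME to record for an offloaded attachment.
function resolveMime(sniffed: string | undefined, declared: string | undefined): string {
  // docx/xlsx are zip archives, so a byte sniffer reports application/zip.
  // In that case the caller-declared, more specific OOXML type is kept.
  if (sniffed === "application/zip" && declared !== undefined && OOXML_MIMES.has(declared)) {
    return declared;
  }
  // Assumed precedence for every other case; falls back to an opaque type.
  return sniffed ?? declared ?? "application/octet-stream";
}
```

A plain zip declared as application/zip still resolves to application/zip, which matches the "zip as itself" case in the test plan.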

User-visible / Behavior Changes

  • chat.send now accepts non-image attachments (PDF, docx, zip, etc.). They are saved to ~/.openclaw/media/inbound/ and staged into the agent workspace so the agent can Read them.
  • agent.run and node-events RPC still refuse non-image attachments, but with an explicit 4xx UnsupportedAttachmentError(reason: "unsupported-non-image") instead of silently dropping them.
  • Clients that send an image attachment to a text-only model now receive an explicit 4xx text-only-image refusal instead of a silent drop.

Diagram (if applicable)

Before (chat.send):
[client attaches report.pdf]
  -> parseMessageWithAttachments
     -> log.warn "non-image, dropping" and return empty images
  -> agent receives message with no attachment
  -> user: "why did it ignore my file?"

After (chat.send):
[client attaches report.pdf]
  -> parseMessageWithAttachments
     -> saveMediaBuffer -> offloadedRefs[{path, mimeType: "application/pdf"}]
  -> prestageNonImageOffloads (synchronous; surfaces 5xx before respond())
     -> stageSandboxMedia into <workspace>/media/inbound/report.pdf
  -> ctx.MediaPaths = ["media/inbound/report.pdf"]; ctx.MediaStaged = true
  -> dispatch pipeline honors MediaStaged, skips re-stage
  -> agent prompt gets buildInboundMediaNote line; agent Reads the file
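
The routing step in the "After" flow could look roughly like this. It is a sketch under assumptions: the path and mimeType fields and the MediaPaths, MediaTypes and MediaStaged names appear in this PR, but routeNonImageOffloads, the MediaContextFields type and the stagedPathFor callback are hypothetical stand-ins for whatever chat.send actually does.

```typescript
interface OffloadedRef {
  path: string; // Location in the media store.
  mimeType: string;
}

// Subset of the message context touched by this step (assumed shape).
interface MediaContextFields {
  MediaPaths?: string[];
  MediaTypes?: string[];
  MediaStaged?: boolean;
}

// Hypothetical helper: image refs are left to the existing image handling,
// non-image refs are exposed to the agent through the media context fields.
function routeNonImageOffloads(
  refs: OffloadedRef[],
  stagedPathFor: (storePath: string) => string,
): MediaContextFields {
  const nonImages = refs.filter((ref) => !ref.mimeType.startsWith("image/"));
  if (nonImages.length === 0) {
    return {}; // Nothing to stage, so the flag stays unset.
  }
  return {
    MediaPaths: nonImages.map((ref) => stagedPathFor(ref.path)),
    MediaTypes: nonImages.map((ref) => ref.mimeType),
    MediaStaged: true, // Tells the dispatch pipeline not to stage again.
  };
}
```

Returning an empty object for image-only batches keeps MediaStaged unset, so the downstream pipeline behaves as before for those messages.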

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No — agent reads files via existing Read/Bash tool surface; no new execution path.
  • Data access scope changed? Yes, scoped: chat.send now persists arbitrary user-supplied MIMEs into ~/.openclaw/media/inbound/<uuid>, then into the agent sandbox workspace. Mitigation: (1) the existing maxBytes cap (5 MB) is unchanged; (2) saveMediaBuffer assigns a UUID filename, so the original filename is treated as untrusted input and does not determine the stored name; (3) assertSavedMedia verifies the returned ID contains no path separators or null bytes before use; (4) stageSandboxMedia validates the source path through isAllowedSourcePath and assertSandboxPath before copying.
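
The ID check named in mitigation (3) amounts to something like the following. This is a sketch only: assertSafeMediaId is a hypothetical name, and the real assertSavedMedia may check more than this.

```typescript
// Hypothetical helper: rejects IDs that could escape the media directory
// once they are joined into a filesystem path.
function assertSafeMediaId(id: string): string {
  const hasSeparator = id.includes("/") || id.includes("\\");
  const hasNullByte = id.includes("\0");
  if (hasSeparator || hasNullByte) {
    throw new Error(`unsafe media id: ${JSON.stringify(id)}`);
  }
  return id;
}
```

Checking the ID before it is joined into a path means a value such as "../x" is refused rather than normalized.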

Repro + Verification

Environment

  • OS: macOS 15.4 (darwin arm64)
  • Runtime/container: Node 22, local openclaw gateway run loopback on 18789
  • Model/provider: any vision-capable model (catalog input: ["text","image"]); any text-only model
  • Integration/channel: Gateway WS RPC (chat.send, agent.run)
  • Relevant config (redacted): default

Steps

  1. pnpm check (format+lint+tsgo+boundary scripts).
  2. pnpm test src/gateway/chat-attachments.test.ts src/gateway/server-methods/chat.directive-tags.test.ts src/gateway/server-node-events.test.ts.
  3. Manual: send a PDF via chat.send to a vision model and confirm the agent sees [media attached: media/inbound/report.pdf] in the prompt body.
  4. Manual: send the same PDF via agent.run and confirm the client receives a 4xx INVALID_REQUEST carrying text-only / unsupported-non-image in the message.

Expected

  • PDF / docx / xlsx / zip reach the agent via chat.send workspace staging.
  • agent.run rejects non-image attachments with a structured 4xx rather than silently dropping them.

Actual

  • Matches expected across scoped tests.

Evidence

  • Failing test/log before + passing after

Three specs updated to lock the new contract:

  • chat-attachments.test.ts: offload on PDF/docx/xlsx/zip/octet-stream, OOXML sniff recovery, UnsupportedAttachmentError on empty payload / text-only image / non-image when acceptNonImage=false, and no saveMediaBuffer call on refusal.
  • chat.directive-tags.test.ts: a text-only model rejects images with a 4xx; an agent-scoped text-only session behaves the same; chat.send routes non-image refs into ctx.MediaPaths/MediaTypes and sets MediaStaged=true with no media:// leak in Body/BodyForAgent.
  • server-node-events.test.ts: renamed supportsImages -> supportsInlineImages; an UnsupportedAttachmentError from parse causes log-and-return (no agentCommand dispatch).

Human Verification (required)

  • Verified scenarios: pnpm check and the three affected specs local-pass with the pre-commit hook. Parse-side offload + error classification exercised directly via specs.
  • Edge cases checked: OOXML sniffed as application/zip still ends up with the caller-declared specific MIME; sniff absent + provided MIME absent falls back to application/octet-stream; empty payload throws empty-payload before any disk write; partial-stage in prestageNonImageOffloads triggers full media-store cleanup of every ref (image + non-image) before the 5xx.
  • What you did not verify: end-to-end chat.send from a real desktop client against a real vision model was not run in this branch; sandbox-disabled runtime path (ensureSandboxWorkspaceForSession returns null and we hand absolute paths through) was not exercised live.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Partial.
    • Additive on chat.send: previously-rejected non-image attachments now work.
    • Behavior change: silent drops on text-only-model image attachments and on non-image agent.run attachments become explicit 4xx errors. Clients that ignored the old warning and expected a silent success will now see an INVALID_REQUEST. This is intentional: silent drops were the reported bug.
  • Config/env changes? No
  • Migration needed? No — server-side only. Clients that want to attach non-image files to agent.run should migrate to chat.send until agent.run wires ctx.MediaPaths.

Risks and Mitigations

  • Risk: prestageNonImageOffloads uses !path.isAbsolute(p) to infer that stageSandboxMedia actually rewrote every path. If the stager's output format changes in the future, this check would misclassify a success as a 5xx.
    • Mitigation: contained to one helper; structured return from the stager is a follow-up refactor and documented in the code comment above the guard.
  • Risk: MediaStaged is a flag-based handshake between chat.ts and get-reply.ts; future callers that forget to set it would double-stage.
    • Mitigation: only one new caller introduced in this PR; a later change should make stageSandboxMedia idempotent (skip when paths are already workspace-relative) and drop the flag.
  • Risk: agent.run now returns 4xx instead of a silent no-op for non-image attachments.
    • Mitigation: clear reason code (unsupported-non-image) in the error; user-visible changelog entry flags the new refusal; follow-up ticket tracks wiring ctx.MediaPaths for agent.run parity.
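
The first risk above relies on a path-shape heuristic, which can be sketched as follows. The function names are hypothetical, and the sketch assumes the stager rewrites host-absolute paths to workspace-relative ones, so any path that is still absolute afterwards was not staged.

```typescript
import * as path from "node:path";

// Hypothetical helper: returns the paths the stager did not rewrite.
function findUnstagedPaths(pathsAfterStaging: string[]): string[] {
  return pathsAfterStaging.filter((p) => path.isAbsolute(p));
}

// Hypothetical guard: fails the request when staging was only partial.
function assertFullyStaged(pathsAfterStaging: string[]): void {
  const unstaged = findUnstagedPaths(pathsAfterStaging);
  if (unstaged.length > 0) {
    // In the PR this condition is reported as a 5xx MediaOffloadError.
    throw new Error(`staging incomplete for ${unstaged.length} attachment(s)`);
  }
}
```

The heuristic is only as reliable as the stager's output format, which is why the risk entry calls for a structured return value.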

@openclaw-barnacle openclaw-barnacle Bot added app: web-ui App: web-ui gateway Gateway runtime size: L labels Apr 16, 2026
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7339ea2338

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gateway/server-methods/chat.ts
Comment thread src/gateway/server-methods/chat.ts Outdated
@greptile-apps
Contributor

greptile-apps Bot commented Apr 16, 2026

Greptile Summary

This PR adds non-image attachment support to chat.send by extending parseMessageWithAttachments to accept any MIME type and offloading all attachments above the 2 MB threshold via saveMediaBuffer. A new synchronous prestageNonImageOffloads helper in chat.ts stages non-image files into the agent sandbox before respond() is called, converting infrastructure errors to MediaOffloadError (5xx) so clients can retry. Two previous P1 issues — infrastructure-error misclassification and orphaned media-store files on staging failure — are directly addressed by the new prestageNonImageOffloads try/catch and confirmed by the new tests at lines 1370 and 1429 in chat.directive-tags.test.ts.

Confidence Score: 5/5

Safe to merge; previous P1 findings are addressed and well-tested, no new P0/P1 issues found.

All previously flagged P1 issues (infrastructure-error cleanup, 5xx misclassification) are resolved in prestageNonImageOffloads and covered by new integration tests. MIME resolution, empty-payload handling, OOXML sniff recovery, partial-staging detection, and sandbox-oversize rejection all have direct test coverage. No new P0 or P1 issues identified.

No files require special attention.

Reviews (3): Last reviewed commit: "fix(gateway): reconcile attachment offlo..." | Re-trigger Greptile

Comment thread src/gateway/chat-attachments.ts Outdated
Comment thread src/gateway/server-methods/chat.ts Outdated
@samzong samzong force-pushed the feat/gateway-attachment-offload-all-mimes branch 2 times, most recently from 12a0dd8 to 7cc0aa4 Compare April 16, 2026 08:26
@samzong
Contributor Author

samzong commented Apr 16, 2026

@codex review

@samzong
Contributor Author

samzong commented Apr 16, 2026

@greptileai review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cc0aa4c0f

Comment thread src/gateway/server-methods/chat.ts Outdated
@samzong
Contributor Author

samzong commented Apr 16, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ff4ac04462

Comment thread src/gateway/chat-attachments.ts
Comment thread src/gateway/chat-attachments.ts Outdated
@samzong
Contributor Author

samzong commented Apr 16, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d72c489e45

Comment thread src/gateway/server-methods/chat.ts Outdated
@samzong samzong force-pushed the feat/gateway-attachment-offload-all-mimes branch from d72c489 to 7274d91 Compare April 16, 2026 14:54
@samzong
Contributor Author

samzong commented Apr 16, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7274d9145f

Comment thread src/gateway/server-methods/chat.ts
@samzong
Contributor Author

samzong commented Apr 16, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Chef's kiss.

@clawsweeper
Contributor

clawsweeper Bot commented Apr 27, 2026

Codex automated review: keeping this open.

Keep open. This is an active contributor implementation candidate paired with open #48123, and current main only partially overlaps it: non-image chat.send attachments are saved as media refs and transcript media, but ordinary dispatch still does not populate the ctx.MediaPaths/MediaTypes workspace-staging contract this PR is trying to add. The latest PR discussion also has an unresolved P1 review on MIME precedence that should be addressed before merge.

Best possible solution:

Keep this PR open as the implementation candidate for #48123. The best maintainer path is to review or revise it against current main, reconcile the shipped media://inbound behavior with the proposed ctx.MediaPaths/MediaTypes workspace-staging contract, and address the latest P1 MIME-precedence review before deciding whether to merge, split, or supersede the paired issue and PR together.

What I checked:

  • paired_open_work: Provided GitHub context shows this PR is authored by samzong and uses closing syntax for the open issue "Gateway silently drops non-image attachments in chat.send" (#48123), also authored by samzong. Cleanup rules keep paired contributor issue/PR work open until the pair is resolved or maintainers explicitly split it. (048dffda6cd2)
  • current_main_partial_overlap: Current main accepts non-image attachments by saving them as media://inbound refs, appending a marker to the message, recording offloadedRefs, and adding an offloaded imageOrder slot. This overlaps the PR but is not the same workspace-staging contract. (src/gateway/chat-attachments.ts:368, 0835f9409aac)
  • ordinary_chat_send_lacks_media_context: Ordinary chat.send builds MsgContext with body/session/origin fields and only spreads pluginBoundMediaFields for the plugin-bound special case; it does not set MediaPath, MediaPaths, MediaTypes, MediaWorkspaceDir, or MediaStaged for ordinary non-image attachments. (src/gateway/server-methods/chat.ts:1926, 0835f9409aac)
  • downstream_staging_requires_media_context: The reply pipeline stages sandbox media only when hasInboundMedia(ctx) is true, and hasInboundMedia checks MediaPath, MediaUrl, MediaPaths, MediaUrls, MediaTypes, and sticker fields. A Body-only media:// marker does not trigger this path. (src/auto-reply/reply/inbound-media.ts:17, 0835f9409aac)
  • current_tests_lock_partial_behavior: The current regression test expects no dispatch images, imageOrder ['offloaded'], a Body media://inbound marker, and transcript MediaPaths/MediaTypes. It does not assert ctx.MediaPaths/MediaTypes on the live dispatch context for ordinary chat.send. (src/gateway/server-methods/chat.directive-tags.test.ts:1922, 0835f9409aac)
  • security_review_pass: The provided PR file list is limited to gateway, auto-reply, media-understanding source and tests, with no workflow, lockfile, package script, dependency source, release, or publishing metadata changes. The security-sensitive surface is the intended data-access change: persisting arbitrary non-image user payloads and staging them into the agent workspace. (048dffda6cd2)

Remaining risk / open question:

  • The PR intentionally broadens data access by persisting arbitrary non-image user payloads and staging them into the agent workspace; even with existing media-store and sandbox guards, this needs maintainer security review.
  • Current main already shipped overlapping WebChat/media-ref behavior, so a rebase must reconcile the shipped media://inbound path with this PR's ctx.MediaPaths workspace-staging path.
  • The latest PR review identifies a possible MIME-spoofing path that can preserve silent-drop behavior for mislabeled non-image payloads.

Codex Review notes: model gpt-5.5, reasoning high; reviewed against 0835f9409aac.

@samzong samzong force-pushed the feat/gateway-attachment-offload-all-mimes branch from f948e41 to 048dffd Compare April 27, 2026 17:29
@samzong
Contributor Author

samzong commented Apr 28, 2026

@clawsweeper review
@codex review
@greptileai review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 048dffda6c

Comment thread src/gateway/chat-attachments.ts
@frankekn
Member

@codex review

@frankekn frankekn force-pushed the feat/gateway-attachment-offload-all-mimes branch from 86a0451 to 0dd77f0 Compare April 28, 2026 03:36
@frankekn
Member

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Hooray!

@frankekn frankekn force-pushed the feat/gateway-attachment-offload-all-mimes branch 3 times, most recently from ad30bd2 to cb8d7e0 Compare April 28, 2026 04:52
samzong and others added 9 commits April 28, 2026 12:52
Signed-off-by: samzong <samzong.lu@gmail.com>
Signed-off-by: samzong <samzong.lu@gmail.com>
…text

Root cause: ctx.MediaPaths was overloaded with two incompatible meanings —
sandbox-relative for the agent runtime, host-absolute for host-side
media-understanding. The previous "absolutize in chat.send + set
MediaStaged=true" path made media-understanding work but shipped an
unreadable host path to the agent inside the sandbox.

- Keep ctx.MediaPaths sandbox-relative after prestage; carry a separate
  ctx.MediaWorkspaceDir so host-side media-understanding can still resolve
  the staged files via localPathRoots / attachment cache.
- stageSandboxMedia returns an authoritative {source -> relpath} map so
  prestageNonImageOffloads detects partial staging failures (files admitted
  by the 20MB RPC cap but rejected by the 5MB staging cap) and surfaces
  them as 5xx MediaOffloadError UNAVAILABLE.
- Reject images above MAX_IMAGE_BYTES at parse time: the agent-side
  hydration path drops them silently otherwise, producing a successful
  response with a missing image.
- Scope imageOrder to image offloads only and split persistChatSendImages
  offloaded refs by mime so non-image files append to the transcript tail
  instead of consuming image slots in mixed batches.

Signed-off-by: samzong <samzong.lu@gmail.com>
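
The commit message above replaces path-shape guessing with an authoritative source-to-relpath map. A minimal sketch of how a caller might use such a map to detect partial staging follows. findMissingStaged is a hypothetical name, and the use of a Map is an assumption about the return shape.

```typescript
// Hypothetical helper: every source that has no entry in the stager's
// result map is treated as not staged.
function findMissingStaged(sources: string[], staged: Map<string, string>): string[] {
  return sources.filter((source) => !staged.has(source));
}

// Example: two files were admitted by the RPC layer, but the stager only
// returned a mapping for one of them (for instance because the other
// exceeded the staging cap).
const requested = ["/store/small.pdf", "/store/large.zip"];
const stagedMap = new Map<string, string>([["/store/small.pdf", "media/inbound/small.pdf"]]);
const missing = findMissingStaged(requested, stagedMap);
// A non-empty result is what the commit describes surfacing as a 5xx error.
```

Because the check compares against the requested sources rather than inspecting path shapes, it keeps working if the stager's output format changes.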
@frankekn frankekn force-pushed the feat/gateway-attachment-offload-all-mimes branch from cb8d7e0 to ecbd27f Compare April 28, 2026 04:54
@frankekn frankekn merged commit 25ef9c0 into openclaw:main Apr 28, 2026
10 checks passed
@frankekn
Member

Merged via squash.

Thanks @samzong!

Labels

app: web-ui App: web-ui gateway Gateway runtime size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Gateway silently drops non-image attachments in chat.send

2 participants