[Feat] Gateway: offload non-image attachments on chat.send#67572
[Feat] Gateway: offload non-image attachments on chat.send#67572frankekn merged 9 commits intoopenclaw:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7339ea2338
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Greptile SummaryThis PR adds non-image attachment support to Confidence Score: 5/5Safe to merge; previous P1 findings are addressed and well-tested, no new P0/P1 issues found. All previously flagged P1 issues (infrastructure-error cleanup, 5xx misclassification) are resolved in No files require special attention. Reviews (3): Last reviewed commit: "fix(gateway): reconcile attachment offlo..." | Re-trigger Greptile |
12a0dd8 to
7cc0aa4
Compare
|
@codex review |
|
@greptileai review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7cc0aa4c0f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ff4ac04462
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d72c489e45
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
d72c489 to
7274d91
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7274d9145f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
|
Codex Review: Didn't find any major issues. Chef's kiss. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
Codex automated review: keeping this open. Keep open. This is an active contributor implementation candidate paired with open #48123, and current main only partially overlaps it: non-image chat.send attachments are saved as media refs and transcript media, but ordinary dispatch still does not populate the ctx.MediaPaths/MediaTypes workspace-staging contract this PR is trying to add. The latest PR discussion also has an unresolved P1 review on MIME precedence that should be addressed before merge. Best possible solution: Keep this PR open as the implementation candidate for #48123. The best maintainer path is to review or revise it against current main, reconcile the shipped media://inbound behavior with the proposed ctx.MediaPaths/MediaTypes workspace-staging contract, and address the latest P1 MIME-precedence review before deciding whether to merge, split, or supersede the paired issue and PR together. What I checked:
Remaining risk / open question:
Codex Review notes: model gpt-5.5, reasoning high; reviewed against 0835f9409aac. |
f948e41 to
048dffd
Compare
|
@clawsweeper review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 048dffda6c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
86a0451 to
0dd77f0
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. Hooray! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
ad30bd2 to
cb8d7e0
Compare
Signed-off-by: samzong <samzong.lu@gmail.com>
Signed-off-by: samzong <samzong.lu@gmail.com>
…text
Root cause: ctx.MediaPaths was overloaded with two incompatible meanings —
sandbox-relative for the agent runtime, host-absolute for host-side
media-understanding. The previous "absolutize in chat.send + set
MediaStaged=true" path made media-understanding work but shipped an
unreadable host path to the agent inside the sandbox.
- Keep ctx.MediaPaths sandbox-relative after prestage; carry a separate
ctx.MediaWorkspaceDir so host-side media-understanding can still resolve
the staged files via localPathRoots / attachment cache.
- stageSandboxMedia returns an authoritative {source -> relpath} map so
prestageNonImageOffloads detects partial staging failures (files admitted
by the 20MB RPC cap but rejected by the 5MB staging cap) and surfaces
them as 5xx MediaOffloadError UNAVAILABLE.
- Reject images above MAX_IMAGE_BYTES at parse time: the agent-side
hydration path drops them silently otherwise, producing a successful
response with a missing image.
- Scope imageOrder to image offloads only and split persistChatSendImages
offloaded refs by mime so non-image files append to the transcript tail
instead of consuming image slots in mixed batches.
Signed-off-by: samzong <samzong.lu@gmail.com>
cb8d7e0 to
ecbd27f
Compare
|
Merged via squash.
Thanks @samzong! |
Summary
chat.send,agent.run,server-node-events) dropped every non-image payload silently with a warn-log, even though channel inbound paths already deliver any MIME to the agent viactx.MediaPaths. Large image MIMEs outside a narrow JPEG/PNG/WebP/GIF/HEIC/HEIF allowlist also threw mid-parse.parseMessageWithAttachmentsnow accepts any MIME, offloads everything over the 2 MB threshold to the media store (every MIME, not just the image allowlist), and returns structuredoffloadedRefsfor both images and non-images.chat.sendroutes non-image offloads intoctx.MediaPaths+ctx.MediaTypesso the existing channel staging pipeline reaches them. Silent drops become explicitUnsupportedAttachmentError(4xx) with closed reason unionempty-payload | text-only-image | unsupported-non-image; storage faults stay classified asMediaOffloadError(5xx).agent.runand the node-events path keepacceptNonImage: falsebecause they do not wirectx.MediaPathsyet; inline image behavior for images under the offload threshold is unchanged; media-store layout,media://inbound/<id>URI scheme, and OOM threshold are unchanged.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause (if applicable)
N/A — feature work, not a bug fix.
Regression Test Plan (if applicable)
src/gateway/chat-attachments.test.ts,src/gateway/server-methods/chat.directive-tags.test.ts,src/gateway/server-node-events.test.tsoffloadedRefs(PDF, docx with generic-zip sniff, xlsx via filename extension, zip as itself, opaque octet-stream).supportsInlineImages: falserejects images withtext-only-imagereason but still offloads non-images so text-only models can read them via Read/Bash.acceptNonImage: falserejects non-images withunsupported-non-imagereason (agent.run + node-events).chat.sendroutes non-imageoffloadedRefsintoctx.MediaPaths/MediaTypesand setsctx.MediaStaged=trueso the dispatch pipeline skips its ownstageSandboxMediapass.application/zip.User-visible / Behavior Changes
chat.sendnow accepts non-image attachments (PDF, docx, zip, etc.). They are saved to~/.openclaw/media/inbound/and staged into the agent workspace so the agent can Read them.agent.runand node-events RPC still refuse non-image attachments, but with an explicit 4xxUnsupportedAttachmentError(reason: "unsupported-non-image")instead of silently dropping them.text-only-imagerefusal when the client sends an image attachment, instead of silent drop.Diagram (if applicable)
Security Impact (required)
chat.sendnow persists arbitrary user-supplied MIMEs into~/.openclaw/media/inbound/<uuid>, then into the agent sandbox workspace. Mitigation: (1) existingmaxBytescap (5 MB) is unchanged; (2)saveMediaBufferassigns a UUID filename so the original filename is untrusted input; (3)assertSavedMediaverifies the returned ID contains no path separators or null bytes before use; (4)stageSandboxMediavalidates the source path throughisAllowedSourcePathandassertSandboxPathbefore copying.Repro + Verification
Environment
openclaw gateway runloopback on 18789input: ["text","image"]); any text-only modelchat.send,agent.run)Steps
pnpm check(format+lint+tsgo+boundary scripts).pnpm test src/gateway/chat-attachments.test.ts src/gateway/server-methods/chat.directive-tags.test.ts src/gateway/server-node-events.test.ts.chat.sendto a vision model and confirm the agent sees[media attached: media/inbound/report.pdf]in the prompt body.agent.runand confirm the client receives a 4xxINVALID_REQUESTcarryingtext-only/unsupported-non-imagein the message.Expected
chat.sendworkspace staging.agent.runrejects non-image attachments with a structured 4xx rather than silently dropping them.Actual
Evidence
Three specs updated to lock the new contract:
chat-attachments.test.ts— offload on PDF/docx/xlsx/zip/octet-stream, OOXML sniff recovery,UnsupportedAttachmentErroron empty payload / text-only image / non-image whenacceptNonImage=false, and nosaveMediaBuffercall on refusal.chat.directive-tags.test.ts— text-only model rejects images with a 4xx, agent-scoped text-only session same behavior,chat.sendroutes non-image refs intoctx.MediaPaths/MediaTypesand setsMediaStaged=truewith nomedia://leak inBody/BodyForAgent.server-node-events.test.ts— renamedsupportsImages->supportsInlineImages;UnsupportedAttachmentErrorfrom parse causes log-and-return (noagentCommanddispatch).Human Verification (required)
pnpm checkand the three affected specs local-pass with the pre-commit hook. Parse-side offload + error classification exercised directly via specs.application/zipstill ends up with the caller-declared specific MIME; sniff absent + provided MIME absent falls back toapplication/octet-stream; empty payload throwsempty-payloadbefore any disk write; partial-stage inprestageNonImageOffloadstriggers full media-store cleanup of every ref (image + non-image) before the 5xx.chat.sendfrom a real desktop client against a real vision model was not run in this branch; sandbox-disabled runtime path (ensureSandboxWorkspaceForSessionreturns null and we hand absolute paths through) was not exercised live.Review Conversations
Compatibility / Migration
chat.send: previously-rejected non-image attachments now work.agent.runattachments become explicit 4xx errors. Clients that ignored the old warning and expected a silent success will now see aINVALID_REQUEST. This is intentional — silent drops were the reported bug.agent.runshould migrate tochat.senduntilagent.runwiresctx.MediaPaths.Risks and Mitigations
prestageNonImageOffloadsuses!path.isAbsolute(p)to infer thatstageSandboxMediaactually rewrote every path. If the stager's output format changes in the future, this check would misclassify a success as a 5xx.MediaStagedis a flag-based handshake betweenchat.tsandget-reply.ts; future callers that forget to set it would double-stage.stageSandboxMediaidempotent (skip when paths are already workspace-relative) and drop the flag.agent.runnow returns 4xx instead of a silent no-op for non-image attachments.unsupported-non-image) in the error; user-visible changelog entry flags the new refusal; follow-up ticket tracks wiringctx.MediaPathsforagent.runparity.