feat(gateway): feishu image and text file attachment support #731

Merged
thepagent merged 3 commits into openabdev:main from wangyuyan-agent:feat/gateway-feishu-media
May 5, 2026

@wangyuyan-agent (Contributor) commented May 4, 2026

Summary

Adds image and text file attachment support for the Feishu gateway adapter. Images are downloaded, resized, compressed, and forwarded to the AI agent as ContentBlock::Image. Text files are downloaded and forwarded as ContentBlock::Text. This brings Feishu to feature parity with Discord's attachment handling.

Feishu user sends image
  │
  ├─ Gateway receives event (msg_type: image/post)
  │   parse_message_event extracts image_key → MediaRef
  │
  ├─ Gateway downloads image
  │   GET /im/v1/messages/{id}/resources/{key}?type=image
  │   → resize (max 1200px) → JPEG compress → base64
  │
  ├─ GatewayEvent.content.attachments = [{type:"image", data:"base64..."}]
  │   → sent over WebSocket to OAB core
  │
  └─ Core decodes attachment → ContentBlock::Image → extra_blocks
      → forwarded to AI agent (vision-capable models see the image)
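
The download step in the flow above can be sketched as a small URL builder. This is a minimal illustration, not the adapter's actual code: `ResourceKind`, `resource_url`, and the `base` parameter are hypothetical names, and `type=file` for non-image resources is an assumption (the description only shows `type=image`).

```rust
/// Kind of resource being fetched; maps to the `type` query parameter.
/// (Hypothetical helper type, not taken from the PR.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResourceKind {
    Image,
    File, // assumption: files use `type=file`
}

impl ResourceKind {
    fn as_query_value(self) -> &'static str {
        match self {
            ResourceKind::Image => "image",
            ResourceKind::File => "file",
        }
    }
}

/// Builds the message-resource URL shown in the flow diagram.
/// `base` is the API root, passed in by the caller rather than hard-coded.
pub fn resource_url(base: &str, message_id: &str, key: &str, kind: ResourceKind) -> String {
    format!(
        "{}/im/v1/messages/{}/resources/{}?type={}",
        base.trim_end_matches('/'),
        message_id,
        key,
        kind.as_query_value()
    )
}
```

The request itself would additionally carry the gateway's tenant_access_token, which is why the download lives on the gateway side.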

Changes

| File | Layer | Description |
| --- | --- | --- |
| gateway/src/schema.rs | Gateway | Content gains attachments: Vec<Attachment>. New Attachment struct with type/filename/mime_type/data/size. Backward compatible via #[serde(default)]. |
| gateway/Cargo.toml | Gateway | Add image crate for resize/compress |
| gateway/src/adapters/feishu.rs | Gateway | resize_and_compress(): 1200px max, JPEG quality 75. download_feishu_image(): resources API + compress + base64. download_feishu_file(): text files only (512KB cap). parse_message_event() returns (GatewayEvent, Vec<MediaRef>), accepts text/image/file/post types. Callers (WS + webhook) do async download after parse. Empty text + empty attachments → event not sent. |
| gateway/src/main.rs | Gateway | Updated test fixture for new Content.attachments field |
| src/gateway.rs ⚠️ OAB core | Core | Deserialize attachments from GatewayEvent. Convert image → ContentBlock::Image, text_file → base64 decode → ContentBlock::Text wrapped in code fence. Pass as extra_blocks to handle_message(). |
| docs/feishu.md | Docs | New "Image & File Attachments" section |

⚠️ Core change note: ~20 lines in src/gateway.rs — adds GwAttachment struct + attachment→ContentBlock conversion loop before handle_message(). No changes to handle_message itself.
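
For reviewers who want a picture of the core-side loop, here is a rough std-only sketch of the attachment → ContentBlock conversion. The type shapes are stand-ins (the real `GwAttachment` and `ContentBlock` definitions are not shown in this description), the tiny base64 decoder replaces whatever crate the core actually uses, and three behaviors are assumptions: keeping image data base64-encoded, prefixing the fence with the filename, and skipping undecodable files.

```rust
/// Hypothetical stand-ins for the core types named in the PR.
#[derive(Debug, PartialEq)]
pub enum ContentBlock {
    Image { mime_type: String, data: String }, // assumption: data stays base64
    Text(String),
}

pub struct GwAttachment {
    pub kind: String, // "image" or "text_file"
    pub filename: String,
    pub mime_type: String,
    pub data: String, // base64 payload from the gateway
}

/// Minimal standard-alphabet base64 decoder; returns None on invalid input.
fn b64_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let (mut buf, mut bits) = (0u32, 0u32);
    for c in input.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break, // padding marks the end of data
            _ => return None,
        };
        buf = (buf << 6) | v as u32;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1; // keep only the leftover low bits
        }
    }
    Some(out)
}

/// Converts gateway attachments into extra content blocks.
pub fn to_content_blocks(attachments: &[GwAttachment]) -> Vec<ContentBlock> {
    let fence = "`".repeat(3);
    let mut blocks = Vec::new();
    for a in attachments {
        match a.kind.as_str() {
            "image" => blocks.push(ContentBlock::Image {
                mime_type: a.mime_type.clone(),
                data: a.data.clone(),
            }),
            "text_file" => {
                // Assumption: undecodable or non-UTF-8 files are skipped.
                let Some(bytes) = b64_decode(&a.data) else { continue };
                let Ok(text) = String::from_utf8(bytes) else { continue };
                let name = &a.filename;
                blocks.push(ContentBlock::Text(format!(
                    "{name}\n{fence}\n{text}\n{fence}"
                )));
            }
            _ => {} // unknown attachment types are ignored
        }
    }
    blocks
}
```

The resulting vector is what would be passed as extra_blocks, leaving handle_message itself untouched.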

Design decisions

  1. Gateway-side download — Feishu attachments require tenant_access_token (gateway has it, core doesn't). Gateway downloads, compresses, and base64 encodes. Core just decodes. Same principle as Discord/Slack (whoever holds the auth token does the download).

  2. Compress before transmit: resize_and_compress (1200px, JPEG 75) reduces typical images from 2-5MB to 200-400KB. Base64 overhead (~33%) is negligible at this size. No WebSocket pressure.

  3. post type support — Feishu sends @mention + pasted image as msg_type: "post" (rich text). Parser extracts text nodes as prompt and img nodes as image attachments. This is the only way to send @mention + image in a group chat.

  4. Text files only for file type — Only known text extensions (.txt, .py, .rs, .md, .json, etc.) are downloaded, capped at 512KB. Binary files (.pdf, .zip) are silently ignored to avoid sending garbage to the model.

  5. Graceful degradation — If image download fails, text portion is still forwarded. If both text and attachments are empty (e.g. unsupported file type), event is not sent.

  6. Schema backward compatible: attachments uses #[serde(default)]. Old gateway (no attachments) works with new core. New gateway works with old core (attachments ignored).
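
To make the "1200px max" in decision 2 concrete, the target dimensions can be computed as below. This is a sketch only: `fit_within` is a hypothetical helper, and it assumes the limit applies to the longer side with aspect ratio preserved. The actual resize_and_compress may delegate this to the image crate.

```rust
/// Returns the dimensions after scaling so the longer side is at most
/// `max_side`, preserving aspect ratio. Images already within bounds are
/// returned unchanged (no upscaling).
pub fn fit_within(width: u32, height: u32, max_side: u32) -> (u32, u32) {
    let longest = width.max(height);
    if longest == 0 || longest <= max_side {
        return (width, height);
    }
    // Rounded integer scaling; u64 avoids overflow in the product.
    let scale = |side: u32| -> u32 {
        let scaled = (side as u64 * max_side as u64 + longest as u64 / 2) / longest as u64;
        (scaled as u32).max(1) // never collapse a side to zero
    };
    (scale(width), scale(height))
}
```

For example, a 4000×3000 photo maps to 1200×900, while an 800×600 screenshot is left alone.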

Known limitations

  • Group chat: image upload cannot include @mention. Feishu's image upload UI does not allow simultaneous @mention. Workaround: @mention first, then paste (Ctrl+V) the image — Feishu sends this as a post message containing both.
  • Binary files ignored. PDF, ZIP, DOCX etc. are silently dropped. Future work could add PDF text extraction.
  • No outbound image support. Bot cannot send images back to Feishu yet (text/post only).
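
The "binary files ignored" behavior comes down to an extension whitelist plus a size cap. A minimal sketch, assuming a case-insensitive match on the final extension; the list here is only the subset named in this description, and `is_text_file` / `should_download` are hypothetical names:

```rust
use std::path::Path;

/// 512KB cap from the PR description.
const MAX_TEXT_FILE_BYTES: u64 = 512 * 1024;

/// Illustrative subset; the PR's full whitelist is not shown here.
const TEXT_EXTENSIONS: &[&str] = &["txt", "py", "rs", "md", "json"];

/// True when the filename ends in a known text extension.
pub fn is_text_file(filename: &str) -> bool {
    Path::new(filename)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| TEXT_EXTENSIONS.iter().any(|t| t.eq_ignore_ascii_case(e)))
        .unwrap_or(false)
}

/// True when the file is both a known text type and within the size cap.
pub fn should_download(filename: &str, size: u64) -> bool {
    is_text_file(filename) && size <= MAX_TEXT_FILE_BYTES
}
```

Adding PDF text extraction later would mean a new branch next to this check rather than widening the whitelist.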

Testing

| Scenario | Result |
| --- | --- |
| Private chat: send image → agent describes image | PASS |
| Private chat: send .txt file → agent reads content | PASS |
| Private chat: send .pdf → silently ignored | PASS |
| Private chat: text + image separately → both work | PASS |
| Group: @bot + paste image (post format) → agent sees image | PASS |
| Group: upload image (no @mention possible) → known limitation | PASS (documented) |
| Private chat: image again → stable | PASS |
| cargo test gateway: 96 passed | PASS |
| cargo test core: 197 passed | PASS |

End-to-end tested on Feishu with vision-capable model.

Breaking Changes

None. attachments field is additive with #[serde(default)]. Existing text-only messages are unaffected.

Prior Art

| | OpenClaw | Hermes Agent | OAB Discord | OAB Gateway (this PR) |
| --- | --- | --- | --- | --- |
| Inbound image | Outbound only (skills) | ✅ Gateway-level download | ✅ download + resize + base64 | ✅ download + resize + base64 |
| Inbound text file | Not documented | ✅ Gateway-level download | ✅ download + inline (5 files, 1MB cap) | ✅ download + inline (512KB cap) |
| Image compression | N/A | Not documented | resize 1200px, JPEG 75 | resize 1200px, JPEG 75 (same) |
| Download API | /im/v1/messages/{id}/resources/{key} | /im/v1/messages/{id}/resources/{key} | Direct URL + Bearer token | /im/v1/messages/{id}/resources/{key} |
| Mixed @mention + image | Not documented | Not documented | N/A (Discord allows both) | ✅ Handled via post msg_type parsing |
| Binary files (.pdf, .zip) | Not documented | Not documented | Skipped | Skipped |

Discord Discussion URL

https://discord.com/channels/1491295327620169908/1500160821567684660

- Gateway downloads images via /im/v1/messages/{id}/resources/{key}?type=image
- resize_and_compress: max 1200px, JPEG quality 75, GIF pass-through
- Text files: whitelist extensions, 512KB cap, base64 encoded
- parse_message_event supports text/image/file/post message types
- post type: extracts text + img nodes (for @mention + paste image)
- GatewayEvent.content.attachments: backward compatible via serde(default)
- Core: decode attachments to ContentBlock::Image / ContentBlock::Text
- Empty text + empty attachments events are not forwarded
- Updated docs/feishu.md with Image & File Attachments section

@wangyuyan-agent requested a review from thepagent as a code owner on May 4, 2026 14:39
@github-actions (bot) added the pending-screening (PR awaiting automated screening) and closing-soon (PR missing Discord Discussion URL; will auto-close in 3 days) labels and removed the closing-soon label on May 4, 2026
@shaun-agent (Contributor)

OpenAB PR Screening

This is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Click 👍 if you find this useful. Human review will be done within 24 hours. We appreciate your support and contribution 🙏

Screening report

Intent

PR #731 adds Feishu inbound attachment handling so Feishu users can send images and supported text files to OpenAB agents. The operator-visible problem is that Feishu currently lags Discord attachment behavior: image/file messages either cannot reach the model as usable content or require separate manual workarounds.

The PR also handles Feishu post messages, which matters because group-chat image + mention workflows arrive as rich text rather than plain image messages.

Feat

Feature: Feishu gateway support for inbound image and text file attachments.

Behavioral changes:

  • Feishu image resources are downloaded by the gateway, resized, JPEG-compressed, base64 encoded, and sent as gateway attachments.
  • Supported text files are downloaded with a 512KB cap and forwarded as text content blocks.
  • Feishu post messages are parsed for both text nodes and image nodes.
  • Core deserializes gateway attachments and converts them into agent ContentBlocks.
  • Unsupported binary files are ignored.
  • Attachment failures degrade gracefully so text can still be processed.
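
The last two behaviors (ignore what cannot be used, keep the text if a download fails) can be sketched as one post-download step. The names below are hypothetical, errors are simplified to strings, and treating whitespace-only text as empty is an assumption:

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Attachment {
    pub filename: String,
    pub data: String, // base64 payload
}

/// Keeps successful downloads, logs and drops failures, and decides whether
/// the event should be forwarded at all. Returns None when nothing usable
/// remains (no text and no attachments).
pub fn finalize(
    text: &str,
    downloads: Vec<Result<Attachment, String>>,
) -> Option<(String, Vec<Attachment>)> {
    let attachments: Vec<Attachment> = downloads
        .into_iter()
        .filter_map(|result| match result {
            Ok(attachment) => Some(attachment),
            Err(err) => {
                // A failed download must not block the text portion.
                eprintln!("attachment download failed: {err}");
                None
            }
        })
        .collect();

    if text.trim().is_empty() && attachments.is_empty() {
        return None; // nothing to send to core
    }
    Some((text.to_string(), attachments))
}
```

With this shape, a message whose only attachment is an unsupported binary and whose text is empty produces no event, matching the behavior described above.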

Who It Serves

Primary beneficiaries:

  • Feishu end users who want to send screenshots, photos, pasted images, or text files to agents.
  • Deployers operating OpenAB in Feishu-heavy environments.
  • Maintainers seeking feature parity between Feishu and Discord gateway behavior.

Rewritten Prompt

Implement inbound attachment support for the Feishu gateway.

Add an additive attachments field to gateway message content with backward-compatible serde defaults. In the Feishu adapter, parse text, image, file, and post message events, extracting image resource keys and supported text file references. Download attachments using Feishu APIs from the gateway, since the gateway owns the Feishu tenant token.

For images, resize to a maximum dimension of 1200px, JPEG-compress at quality 75, base64 encode, and forward as image attachments. For text files, only accept known text extensions, enforce a 512KB limit, base64 encode, and forward as text-file attachments. If attachment download fails, preserve any usable text content. If a message has neither text nor valid attachments, do not send an event.

In core gateway handling, deserialize attachments and convert them into model content blocks without changing the handle_message API. Add or update tests covering plain text, image, text file, unsupported binary file, Feishu post with text + image, and backward compatibility for events without attachments. Update Feishu docs with supported behavior and limitations.

Merge Pitch

This is worth advancing because it closes a real Feishu usability gap and brings the gateway closer to Discord parity. Screenshots and pasted images are common agent inputs, especially in chatops and support workflows.

Risk profile is moderate. The main risk is not the schema addition, which is backward-compatible, but gateway responsibility expanding into media download, image processing, size control, and error handling. Reviewers will likely focus on resource limits, dependency impact from the image crate, Feishu API correctness, and whether the core attachment conversion is generic enough for future gateways.

Best-Practice Comparison

Relevant OpenClaw principles:

  • Explicit delivery routing is relevant: the gateway owns Feishu auth and should be responsible for downloading Feishu-protected resources before sending normalized content onward.
  • Isolated executions are partly relevant: media processing should be bounded and failure-tolerant so one bad attachment does not break the message path.
  • Run logs and retry/backoff are only partially relevant: attachment download failures should be observable, but this PR does not need durable job scheduling.
  • Gateway-owned scheduling and durable job persistence are not directly relevant because this is synchronous inbound message enrichment, not a scheduled job system.

Relevant Hermes Agent principles:

  • Gateway daemon ownership is relevant: Feishu-specific media retrieval belongs in the gateway layer.
  • Atomic persisted state and file locking are not relevant unless attachments are later persisted to disk.
  • Fresh session per scheduled run is not relevant.
  • Self-contained prompts are indirectly relevant: text files should be included in a clear, bounded content block so the model receives usable context without relying on side channels.

Overall, the PR follows the strongest relevant principle from both systems: platform-specific gateways should normalize platform-specific inputs before handing them to core agent execution.

Implementation Options

Option 1: Conservative gateway-only image support
Support only Feishu images, not files, and only for simple image or post messages. Keep the schema additive and core conversion minimal. This ships the highest-value user workflow with less parsing and fewer file safety concerns.

Option 2: Balanced attachment parity
Proceed with the current design: image support, text-file support, post parsing, bounded compression, size caps, graceful degradation, and core conversion into content blocks. This matches Discord behavior closely while keeping binary files out of scope.

Option 3: Generic cross-gateway attachment pipeline
Introduce a shared attachment normalization layer used by Discord, Slack, Feishu, and future gateways. Define common attachment structs, size policies, MIME validation, compression utilities, observability, and test fixtures across adapters.

Option 4: Durable media ingestion service
Move downloads and processing into a queued gateway-side ingestion path with retry/backoff, logs, persisted metadata, and async delivery once processing completes. This is more aligned with durable job principles but likely too heavy for this PR.

Comparison Table

| Option | Speed to ship | Complexity | Reliability | Maintainability | User impact | Fit for OpenAB right now |
| --- | --- | --- | --- | --- | --- | --- |
| Conservative image-only | High | Low | Medium | Medium | Medium | Good if reviewers want reduced scope |
| Balanced attachment parity | Medium | Medium | Good | Good | High | Best fit |
| Generic cross-gateway pipeline | Low | High | Good | High long-term | High | Better as follow-up |
| Durable media ingestion service | Low | Very high | Very high | Medium | Medium | Too large for this PR |

Recommendation

Advance the balanced attachment parity path, with careful review around limits, error logging, MIME/extension checks, and dependency impact.

The current PR appears scoped well for merge discussion because it solves a concrete Feishu gap without requiring a broader gateway redesign. Any generic cross-gateway attachment abstraction should be split into follow-up work after Feishu behavior is proven and reviewed against Discord’s existing implementation.

@chaodu-agent

This comment has been minimized.

Add pre-download size check via Content-Length header in both
download_feishu_image and download_feishu_file to avoid buffering
oversized responses before rejection. Post-download fallback check
retained for cases where Content-Length is absent or misreported.

@wangyuyan-agent (Contributor, Author)

Added Content-Length early gate to both download_feishu_image and download_feishu_file per @chaodu-agent's suggestion — rejects oversized downloads before buffering the full body. Post-download size check retained as fallback for cases where Content-Length is absent or misreported.

No behavior change from the user's perspective. cargo check passes.
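
The two-stage check reads roughly like this. It is a sketch with hypothetical names (`SizeCheck`, `check_content_length`, `check_body`); the real functions work on HTTP responses, which are reduced here to an optional declared length and a byte slice:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum SizeCheck {
    Proceed,
    TooLarge { declared: u64, limit: u64 },
}

/// Early gate: reject when the server declares a body larger than `limit`.
/// A missing header cannot be judged here, so the download proceeds and the
/// fallback check below catches oversized bodies.
pub fn check_content_length(content_length: Option<u64>, limit: u64) -> SizeCheck {
    match content_length {
        Some(declared) if declared > limit => SizeCheck::TooLarge { declared, limit },
        _ => SizeCheck::Proceed,
    }
}

/// Fallback: verify the bytes actually received, in case Content-Length was
/// absent or misreported.
pub fn check_body(body: &[u8], limit: u64) -> Result<(), String> {
    let len = body.len() as u64;
    if len > limit {
        Err(format!("body is {len} bytes, limit is {limit}"))
    } else {
        Ok(())
    }
}
```

Only the first check saves memory; the second one is what actually enforces the cap.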

@chaodu-agent

This comment has been minimized.

- GIF filename: use .gif extension when format is GIF (was always .jpg)
- WS path: align token error handling with webhook (if-let-Ok pattern)
- Post parser: explicit 'at' tag arm with comment (mentions via envelope)
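
For the GIF filename fix, the idea is simply to pick the extension from the detected format. A sketch under stated assumptions: detection here uses the GIF signature bytes, whereas the PR may rely on the image crate's format detection, and naming the file after the image key is hypothetical.

```rust
/// GIF files start with the ASCII signature "GIF87a" or "GIF89a".
pub fn is_gif(bytes: &[u8]) -> bool {
    bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a")
}

/// Picks `.gif` for pass-through GIFs and `.jpg` for everything that went
/// through JPEG compression.
pub fn attachment_filename(image_key: &str, bytes: &[u8]) -> String {
    let ext = if is_gif(bytes) { "gif" } else { "jpg" };
    format!("{image_key}.{ext}")
}
```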

@chaodu-agent (Collaborator)

LGTM ✅ — Well-structured feature addition bringing Feishu to parity with Discord attachment handling. All NITs addressed in bee3fa8. Ready for merge.

Four-Question Framework Review

1. What problem does this solve?

Feishu users could only send text messages to the bot. Images, text files, and rich-text posts (with pasted images) were silently dropped (parse_message_event returned None for non-text msg_type). This PR brings Feishu to feature parity with Discord's attachment handling.

2. How does it solve it?

Architecture: Deferred download via MediaRef enum.

parse_message_event() → (GatewayEvent, Vec<MediaRef>)
                                          ↓
                         async download (token required)
                                          ↓
                         resize/compress → base64 → Attachment
                                          ↓
                         GatewayEvent.content.attachments[]
                                          ↓
                         Core: decode → ContentBlock::Image/Text

Key implementation choices:

  • Gateway-side download — Feishu API requires tenant_access_token (gateway has it, core doesn't)
  • Compress before transmit — 1200px max, JPEG 75 reduces 2-5MB → 200-400KB, no WebSocket pressure
  • post type parsing — Extracts text nodes + img nodes from rich text (the only way to @mention + image in groups)
  • Text file whitelist — Only known text extensions downloaded, 512KB cap, binary files silently skipped
  • Content-Length early gate — Rejects oversized responses before buffering
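
As a quick check on the "no WebSocket pressure" claim, the base64 size of a compressed image follows from the standard 3-bytes-to-4-characters encoding. `base64_len` is a hypothetical helper, not something in the PR:

```rust
/// Length of padded base64 output for `n` input bytes: 4 * ceil(n / 3).
pub fn base64_len(n: u64) -> u64 {
    (n + 2) / 3 * 4
}
```

A 300 KiB JPEG therefore travels as 400 KiB of base64 text, which is where the ~33% overhead figure comes from.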

3. What was considered?

  • PR description includes thorough prior art comparison (OpenClaw, Hermes Agent, OAB Discord)
  • Known limitations documented: group chat image-only limitation, binary files ignored, no outbound image
  • Schema backward compatibility via #[serde(default)] — old gateway works with new core and vice versa
  • Graceful degradation: download failures don't block text delivery

4. Is this the best approach?

Yes. The design mirrors the existing Discord attachment pattern exactly. The MediaRef abstraction cleanly separates parsing (sync) from downloading (async, requires token).
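
To illustrate the sync/async split, here is a stripped-down sketch of the parse half. The variant fields, the flattened `RawMessage` input, and the function name `parse_message` are all assumptions for illustration; the real parse_message_event works on the Feishu event JSON and returns a GatewayEvent rather than a bare string. Post messages are left out for brevity.

```rust
/// What to download later; carries no bytes and needs no token.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MediaRef {
    Image { message_id: String, image_key: String },
    File { message_id: String, file_key: String, filename: String },
}

/// Hypothetical flattened view of an incoming message.
#[derive(Debug, Clone, Default)]
pub struct RawMessage {
    pub message_id: String,
    pub msg_type: String,
    pub text: Option<String>,
    pub image_key: Option<String>,
    pub file_key: Option<String>,
    pub file_name: Option<String>,
}

/// Synchronous parse: no network access, so it can be unit-tested directly.
/// Returns the prompt text plus the media still to be fetched.
pub fn parse_message(raw: &RawMessage) -> Option<(String, Vec<MediaRef>)> {
    let message_id = raw.message_id.clone();
    match raw.msg_type.as_str() {
        "text" => Some((raw.text.clone()?, Vec::new())),
        "image" => Some((
            String::new(),
            vec![MediaRef::Image { message_id, image_key: raw.image_key.clone()? }],
        )),
        "file" => Some((
            String::new(),
            vec![MediaRef::File {
                message_id,
                file_key: raw.file_key.clone()?,
                filename: raw.file_name.clone()?,
            }],
        )),
        _ => None, // assumption: unsupported types are still dropped
    }
}
```

The caller (WS or webhook path) then resolves each MediaRef asynchronously once it holds a token, and attaches the results to the event.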

🟢 INFO — Things done well
  1. Clean MediaRef abstraction — Separates "what to download" from "how to download", making parse_message_event testable without network calls
  2. Backward-compatible schema: #[serde(default, skip_serializing_if = "Vec::is_empty")] means zero breaking changes
  3. Content-Length early gate — Avoids buffering 10MB+ responses before rejecting
  4. Comprehensive test updates — All existing tests updated for new return type (96 gateway + 197 core pass)
  5. Thorough PR description — Prior art table, architecture diagram, testing matrix, known limitations
🟡 NITs — All resolved in bee3fa8
  1. GIF filename extension mismatch → Fixed: uses .gif when format is GIF, .jpg otherwise
  2. WS path token error handling inconsistency → Fixed: aligned with webhook using if let Ok(token) pattern
  3. Post parser at tag implicit handling → Fixed: explicit Some("at") => {} arm with comment
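
For context on NIT 3, the post parser's node handling looks roughly like the sketch below. It assumes a post body is a list of paragraphs, each a list of tagged nodes; `PostNode` and `extract_post` are hypothetical, and the real parser reads these nodes from JSON rather than from a struct.

```rust
/// Hypothetical in-memory form of one rich-text node.
#[derive(Debug, Clone, Default)]
pub struct PostNode {
    pub tag: Option<String>,
    pub text: Option<String>,
    pub image_key: Option<String>,
}

/// Collects prompt text and image keys from a post body.
pub fn extract_post(paragraphs: &[Vec<PostNode>]) -> (String, Vec<String>) {
    let mut lines = Vec::new();
    let mut image_keys = Vec::new();
    for paragraph in paragraphs {
        let mut line = String::new();
        for node in paragraph {
            match node.tag.as_deref() {
                Some("text") => {
                    if let Some(text) = &node.text {
                        line.push_str(text);
                    }
                }
                Some("img") => {
                    if let Some(key) = &node.image_key {
                        image_keys.push(key.clone());
                    }
                }
                // Mentions arrive via the message envelope, so the inline
                // tag is deliberately a no-op rather than falling through.
                Some("at") => {}
                _ => {}
            }
        }
        let trimmed = line.trim();
        if !trimmed.is_empty() {
            lines.push(trimmed.to_string());
        }
    }
    (lines.join("\n"), image_keys)
}
```

The explicit `at` arm changes no behavior; it documents that mentions are intentionally skipped here.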

@thepagent thepagent merged commit cebba71 into openabdev:main May 5, 2026
11 checks passed