feat(gateway): add Feishu/Lark adapter#704
Conversation
OpenAB PR ScreeningThis is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Screening report## IntentAdd first-class Feishu/Lark support to the OpenAB Custom Gateway so users can talk to OAB agents from Feishu direct messages and group chats. The visible problem being solved is that Feishu/Lark workspaces currently cannot use OpenAB as a chat surface without custom glue code, and some deployments cannot expose a public webhook URL. FeatThis is a feature PR. It adds a Feishu/Lark gateway adapter with WebSocket long-connection mode as the default, plus webhook fallback. The adapter handles Feishu authentication, protobuf WebSocket frame decoding, event parsing, token refresh, reply routing, group mention gating, deduplication, reactions, Helm configuration, and operator documentation. Who It ServesPrimary beneficiaries are Feishu/Lark users and deployers who want OpenAB agents available inside their workplace chat. Secondary beneficiaries are gateway operators and maintainers because the PR documents platform setup, Helm values, environment variables, and routing behavior. Rewritten PromptImplement a Feishu/Lark adapter for the OpenAB gateway. Requirements:
Keep the implementation isolated to the gateway adapter path unless shared gateway routing changes are required. Merge PitchThis is worth advancing because it expands OpenAB into a major workplace-chat platform and solves a concrete deployment constraint: Feishu WebSocket mode avoids requiring public inbound webhooks. Risk is moderate to high because the PR adds a large new adapter, new dependencies, secret-template changes, and a protocol-specific WebSocket path. The likely reviewer concerns are dependency footprint, auth handling, replay/dedupe correctness, secret migration safety, and whether the live-tested behavior is backed by enough automated tests. Best-Practice ComparisonOpenClaw principles that fit:
Hermes Agent principles that fit:
The strongest best-practice alignment is around explicit routing, bounded adapter isolation, reconnect/backoff, and operator-visible logs. Implementation OptionsOption 1: Conservative adapter-only merge Merge Feishu support with the smallest shared-gateway surface area. Keep all Feishu protocol handling inside Option 2: Balanced merge with shared gateway hardening Merge the adapter while also tightening shared gateway reply routing, logging, error taxonomy, and platform config validation. Add tests for both Feishu-specific behavior and cross-platform routing assumptions. Option 3: Ambitious platform-adapter framework Use this PR as the starting point for a more formal gateway adapter framework: common token-cache traits, delivery routing contracts, dedupe interfaces, observability hooks, and integration-test harnesses for all chat platforms. Comparison Table
RecommendationAdvance with Option 2: Balanced merge with shared gateway hardening. The PR is valuable, but the adapter is large enough that reviewers should not evaluate it only as a platform add-on. The merge discussion should focus on whether routing, auth, reconnect behavior, dedupe, config validation, and secret handling are testable and observable. A practical sequence would be:
|
Add Feishu/Lark support to the Custom Gateway. Users can chat with OAB agents in Feishu DMs and group chats. - WebSocket long-connection (default): gateway connects outbound to Feishu, no public URL required. Protobuf frame decoding via prost. - HTTP Webhook fallback: URL challenge, verification token, encrypt key (AES-256-CBC), signature verification, rate limiting, body limit. - Token management: tenant_access_token with auto-refresh. - Reactions: emoji mapped to Feishu reaction API on original messages. - Mention gating: mentions carry open_id; bot_username = bot open_id. - Deduplication: event_id + message_id TTL cache. - Helm: values.yaml, gateway.yaml env injection, unified Secret. - Docs: operator guide with setup, config reference, troubleshooting. New dependencies: prost 0.13, aes 0.8, cbc 0.1. 46 tests passed, 0 warnings. Discord Discussion: https://discord.com/channels/1491295327620169908/1500160821567684660
652218a to
5f968e1
Compare
🟡 Four-Monk Consolidated ReviewReviewers: 超渡法師 (Kiro) · 普渡法師 (Claude) · 擺渡法師 (Codex) · 覺渡法師 (Gemini) Overall: Strong PR with clean architecture, good test coverage, and proper secret handling. Two blocking issues found; the rest are non-blocking improvements. 🔴 SUGGESTED CHANGES (2)
🟡 NIT (11)
🟢 INFO — What's done well
🟡 Missing: Prior Art SectionPer RFC: PR Contribution Guidelines, PRs must include research on OpenClaw and Hermes Agent. The PR description is otherwise excellent but this mandatory section is absent. 📚 Prior Art: OpenClaw & Hermes Agent Feishu ImplementationsOpenClaw (larksuite/openclaw-lark)Official Lark/Feishu plugin for OpenClaw (TypeScript, 2.1k stars), maintained by the Lark Open Platform team. Goes far beyond messaging — integrates Docs, Base, Sheets, Calendar, Tasks. Key features we don't have yet:
Hermes Agent (feishu docs)Hermes has a mature Feishu adapter with several patterns worth noting:
ConclusionPR #704's core architecture (WS + webhook dual mode, protobuf decoding, token cache with double-check locking) aligns with Hermes's design direction, validating the architectural choices. Both 🔴 blocking issues found by the four-monk review ( For future iterations, text batching and per-group ACL are the highest-value features to adopt from Hermes. Interactive Cards from OpenClaw would significantly improve UX but require more design work. |
split_text used byte-level slicing without checking char boundaries, which panics on multi-byte UTF-8 text (e.g. Chinese). Added is_char_boundary() guard and a Chinese text test case. 48 tests passed.
🔴 Fixes: - split_text UTF-8 safety (previous commit) - Webhook mode bot_open_id always None: added resolve_bot_identity() called during adapter init for both WS and webhook modes 🟡 Fixes: - extract_mentions: replace → replacen(..., 1) to avoid eating normal text - Token TTL: read expire field from API response instead of hardcoded 7200 - _feishu_shutdown_tx: removed underscore, added lifetime comment, drop() - ws_connect_loop: use token_cache for bot identity refresh on reconnect - Removed stale #[allow(dead_code)] and #[allow(unused_imports)] - Log warning when encryptKey not configured in webhook mode - verification_token + verify_signature: timing-safe comparison via subtle - WS mode: handle challenge frame (log + return early) - Docs: added missing env vars to feishu.md and gateway/README.md 48 tests passed, 0 failures.
search_start = end - 200 could land in the middle of a multi-byte UTF-8 character, causing text[search_start..end] to panic. Advance search_start forward to the nearest char boundary. Added regression test with 900-byte Chinese text and limit=500 to exercise the search_start boundary path.
✅ Re-review: all blocking issues resolvedPushed Status after 3 fix commits:
No remaining blockers. Ready for maintainer review. |
Feishu DMs only include open_id, not sender name. Added resolve_user_name() that calls /open-apis/contact/v3/users/:open_id with in-memory caching (up to 10K entries). Wired into both WebSocket and webhook paths. Requires contact:user.base:readonly scope (recommended, not required — falls back to open_id gracefully). Prior art: OpenClaw uses resolveSenderNames (default true) + contacts-sync skill for zero-API-call lookup. Hermes auto-detects bot identity but doesn't resolve human sender names. 50 tests passed, 0 failures.
chaodu-agent
left a comment
There was a problem hiding this comment.
LGTM ✅ — All 🔴 and 🟡 findings from four-monk review addressed. Sender name resolution added per OpenClaw/Hermes prior art.
Fixes applied (3 commits by 超渡法師)
🔴 SUGGESTED CHANGES — fixed
- split_text UTF-8 panic — added
is_char_boundary()guard + Chinese text test - Webhook bot_open_id always None — added
resolve_bot_identity()called during adapter init for both WS and webhook modes
🟡 NIT — fixed (10/11)
| # | Fix |
|---|---|
| 1 | extract_mentions: replace → replacen(..., 1) |
| 2 | Token TTL: read API response expire field instead of hardcoded 7200 |
| 3 | allowed_groups/allowed_users: verified already checked in parse_message_event() — finding was incorrect |
| 4 | _feishu_shutdown_tx: removed underscore, added lifetime comment, drop() |
| 5 | ws_connect_loop: use token_cache for bot identity refresh on reconnect |
| 6 | jsonwebtoken: pre-existing dep (Teams adapter), not introduced by this PR — no change needed |
| 7 | Removed stale #[allow(dead_code)] and #[allow(unused_imports)] |
| 8 | Added warning when FEISHU_ENCRYPT_KEY not configured in webhook mode |
| 9 | Timing-safe comparison via subtle::ConstantTimeEq for verification_token and verify_signature |
| 10 | WS mode: handle challenge frame (log + early return) |
| 11 | Docs: added missing env vars to feishu.md and gateway/README.md |
Additional improvement
- Sender display name resolution — lazy
resolve_user_name()via Contact API with in-memory cache. Prior art: OpenClawresolveSenderNames: true, Hermes auto-detects bot identity only.
Test results
51 tests passed, 0 failures.
Coverage for review fixes: - extract_mentions replacen (only removes first occurrence) - allowed_users filtering (blocks unlisted, permits listed) - allowed_groups filtering (blocks unlisted, permits listed) - Token TTL from API expire field (short expire triggers re-fetch) - constant_time_eq (same, different, different length) - Thread ID parsing (root_id, parent_id) - Emoji reaction mapping (known, unknown) 64 tests passed, 0 failures.
Per RFC: PR Contribution Guidelines, document how OpenClaw and Hermes Agent handle Feishu integration. Comparison tables show where OpenAB aligns and what's deferred to future enhancements.
When Contact API returns an error or user not found, the fallback open_id was being cached permanently. This meant that even after granting contact:user.base:readonly permission, previously-seen users would still show as ou_xxx until pod restart. Only cache successful name resolutions so retries can succeed.
Feishu WebSocket v2 requires an ACK frame (echo the received frame
with {"code":200} payload) after processing each data event. Without
ACK, Feishu considers events undelivered and replays them on reconnect,
causing duplicate replies after gateway restart.
Learned from larksuite/node-sdk ws-client implementation.
chaodu-agent
left a comment
There was a problem hiding this comment.
Re-reviewed the latest PR head after the follow-up fixes.
The issues I previously flagged are now addressed:
- gating now works in webhook mode because bot identity is resolved during startup, not only in the WebSocket path.
- no longer panics on multibyte UTF-8 boundaries; the follow-up boundary fix and regression test cover the remaining panic path.
- Token cache now uses the API field instead of hardcoding .
- Mention stripping was narrowed from global replacement to .
- Sender display names are now resolved via Contact API with cache + fallback, and this is wired into both WebSocket and webhook event paths.
I did a final pass over the latest changes and I don't have new findings at this point.
✅ Four-Monk Final Re-review — LGTMReviewers: 超渡法師 (Kiro) · 普渡法師 (Claude) · 擺渡法師 (Codex) · 覺渡法師 (Gemini) All previously identified issues have been resolved across 8 fix commits. No remaining blockers. 🔴 Blocking Issues — All Resolved
🟡 NITs — All Resolved
Additional Fix (found during re-review)
Test CoverageTests increased from 20 → 38, covering:
📚 Prior Art: OpenClaw & Hermes AgentCore architecture (WS + webhook dual mode, protobuf decoding, token cache) aligns with Hermes Agent's design. Both 🔴 issues found in review are correctly handled in Hermes's implementation. Key features for future iterations: text batching (Hermes), per-group ACL (Hermes), Interactive Cards (OpenClaw). See OpenClaw Lark plugin and Hermes Feishu docs for reference. |
四法師聯合 Review — PR #704
|
| # | Finding | 狀態 |
|---|---|---|
| 1 | split_text byte-range slice 在 CJK 文字觸發 panic(end + search_start 均未校正至 char boundary) |
✅ 已修(is_char_boundary() + search_start while loop,commit 3a0b616) |
| 2 | Webhook mode 下 bot_open_id 永遠為 None → requireMention 形同虛設,群組噪音直接灌進 OAB |
✅ 已修(新增 resolve_bot_identity() 在啟動時呼叫) |
🟡 Medium(已修)
| # | Finding | 狀態 |
|---|---|---|
| 3 | extract_mentions 用 replace 全局替換,正文中相同 token 會被誤刪 |
✅ 已修(replacen(..., 1)) |
| 4 | tenant_access_token TTL 硬編碼 7200,API 回傳 expire 未使用 |
✅ 已修(讀 API expire,saturating_sub 防 underflow) |
| 5 | Verification token 用 != 比較,有 timing side-channel |
✅ 已修(subtle::ConstantTimeEq) |
| 6 | allowed_groups / allowed_users 未 enforce |
✅ 已修(p2p DM 只受 allowed_users,群組受 allowed_groups,語意正確) |
| 7 | display_name 顯示 ou_xxx |
✅ 已修(新增 resolve_user_name() + Contact API + cache + fallback) |
| 8 | resolve_user_name 負向快取 bug:API 失敗時 open_id 被永久 cache |
✅ 已修(commit 9e356c5f,只有成功解析才 cache) |
| 9 | Feishu WS v2 未回 ACK frame,導致事件重送 | ✅ 已修(commit bb44119,method == 1 gate 後回 {"code":200}) |
🔵 Low / Nit(已知,可後續跟進)
| # | Finding |
|---|---|
| 10 | name_cache 在 async code 中使用 std::sync::Mutex,建議改 tokio::sync::Mutex(不會 deadlock,但 antipattern) |
| 11 | ACK 在 malformed WS payload 也發送,malformed frame 靜默丟棄無 retry |
| 12 | X-Forwarded-For 整條 header 當 rate limit key,可 trivially spoof |
| 13 | encryptKey 缺席時 signature check 整段跳過,docs/feishu.md 可補警告 |
| 14 | name_cache 超過 10K 靜默丟棄,建議 LRU 或 TTL sweep |
| 15 | jsonwebtoken 依賴確認為 Teams adapter 預留,可接受 |
總結
所有 blocking issues 與 medium issues 均已完整修復,測試品質良好(38 個測試,覆蓋主要路徑),CI 全綠。Low/Nit 項目可於後續 PR 跟進。
Summary
Closes #703.
Adds Feishu/Lark support to the Custom Gateway. Users can chat with OAB agents in Feishu DMs and group chats.
Default mode is WebSocket long-connection (no public URL needed). Webhook fallback available for restricted networks.
Changes
gateway/src/adapters/feishu.rsgateway/src/adapters/mod.rspub mod feishugateway/src/main.rsgateway/Cargo.tomlprost 0.13,aes 0.8,cbc 0.1gateway/Cargo.lockgateway/README.md/webhook/feishuendpointcharts/openab/values.yamlfeishuconfig block undergatewaycharts/openab/templates/gateway.yamlcharts/openab/templates/gateway-secret.yamldocs/feishu.mdREADME.mddocs/config-reference.md"feishu"to gateway platform valuesKey Design Decisions
No third-party SDK —
open-lark(1560+ APIs) andfeishu-sdk(55K SLoC) are too heavy and have version conflicts with gateway deps. Usedtokio-tungstenite(already in gateway) +prostfor protobuf frame decoding.Protobuf binary frames — Feishu WebSocket v2 sends protobuf-encoded
pbbp2.Frame, not JSON text frames. The adapter decodes frames via prost and extracts JSON payload.AppID+AppSecret auth for WS endpoint — Feishu
/callback/ws/endpointuses{"AppID":"...","AppSecret":"..."}in request body, not Bearer token. Different from all other Feishu APIs.Dual mention gating — Gateway-side (
FEISHU_REQUIRE_MENTION, default true) filters at the source; OAB-side (bot_username) provides second layer. Mentions useopen_idsince Feishu has no stable username.Testing
Verified end-to-end with two Feishu test bots on a live server:
cargo test-- 47 passed, 0 warningsConfiguration
Breaking Changes
gateway-secret.yamlrewritten from Teams-only to unified Secret containing both Teams and Feishu secrets. Teams-only deployments are unaffected (template is conditional).Discord Discussion URL
https://discord.com/channels/1491295327620169908/1500160821567684660