Skip to content

fix(feishu): route /btw through out-of-band lanes#64324

Merged
ngutman merged 3 commits intomainfrom
fix/feishu-btw-out-of-band
Apr 10, 2026
Merged

fix(feishu): route /btw through out-of-band lanes#64324
ngutman merged 3 commits intomainfrom
fix/feishu-btw-out-of-band

Conversation

@ngutman
Copy link
Copy Markdown
Member

@ngutman ngutman commented Apr 10, 2026

Summary

  • Problem: Feishu serializes all same-chat inbound work through a single per-chat queue, so /btw waits behind an active normal reply instead of behaving like an out-of-band side question.
  • Why it matters: the BTW UX contract is that side questions should return immediately without waiting for the busy main session to finish.
  • What changed: replaced Feishu's chat-only serializer with a sequential-key serializer and routed /btw plus abort-style requests onto dedicated out-of-band keys.
  • What did NOT change (scope boundary): core BTW execution, session routing, and normal Feishu same-chat FIFO behavior all stay intact.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause (if applicable)

  • Root cause: Feishu keyed all inbound work for a chat onto one serializer lane in extensions/feishu/src/monitor.account.ts, so /btw never reached the existing fast inline-command path until the prior message finished.
  • Missing detection / guardrail: Feishu had no explicit out-of-band sequencing policy for /btw or /stop, unlike channels that already special-case those commands.
  • Contributing context (if known): the existing Feishu FIFO was intentionally added to prevent dropped same-chat messages, so the fix needs to preserve normal ordering while carving out only the truly out-of-band command lanes.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/feishu/src/sequential-key.test.ts, extensions/feishu/src/sequential-queue.test.ts
  • Scenario the test should lock in: normal Feishu traffic still serializes on one key, while /btw and /stop resolve to separate keys that can run concurrently with the busy main chat lane.
  • Why this is the smallest reliable guardrail: the regression lives in the Feishu ingress scheduler, so key classification plus per-key queue behavior is the narrowest layer that proves the intended bypass without re-testing core BTW logic.
  • Existing test that already covers this (if any): src/gateway/server.chat.gateway-server-chat.test.ts already covers core /btw side-result behavior outside Feishu.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • In Feishu, /btw ... no longer waits behind a busy normal message in the same chat.
  • In Feishu, /stop also bypasses the normal same-chat lane instead of waiting behind ordinary traffic.
  • Normal Feishu same-chat message ordering remains FIFO.

Diagram (if applicable)

Before:
normal message -> Feishu per-chat lane
/btw arrives   -> same Feishu per-chat lane -> waits for normal reply -> delayed side answer

After:
normal message -> Feishu base chat lane
/btw arrives   -> Feishu btw lane -> runs immediately -> side answer

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: local Node/pnpm repo workflow
  • Model/provider: N/A
  • Integration/channel (if any): Feishu
  • Relevant config (redacted): default local test config

Steps

  1. Send a normal Feishu message that starts a long-running reply.
  2. Send /btw <question> in the same Feishu chat while the first reply is still running.
  3. Observe whether the side question shares the busy normal chat lane or runs on its own out-of-band lane.

Expected

  • /btw runs immediately without waiting behind the normal chat lane.

Actual

  • Before this fix, Feishu queued /btw behind the active normal chat task.
  • After this fix, Feishu resolves /btw to a dedicated sequential key and no longer blocks it behind ordinary traffic.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: pnpm test extensions/feishu/src/sequential-key.test.ts extensions/feishu/src/sequential-queue.test.ts; pnpm build
  • Edge cases checked: /btw missing-message-id fallback key; /stop stays out-of-band; ordinary slash commands like /status remain on the normal chat lane.
  • What you did not verify: live Feishu end-to-end behavior against a real account.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: widening the bypass too far could accidentally let stateful slash commands skip intended same-chat ordering.
    • Mitigation: only /btw and abort-style requests get dedicated out-of-band keys; normal messages and ordinary control commands keep the base chat key.

@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot bot commented Apr 10, 2026

🔒 Aisle Security Analysis

We found 1 potential security issue(s) in this PR:

# Severity Title
1 🟠 High Feishu /btw and /stop bypass per-chat FIFO and debounce, enabling parallel expensive work (DoS/cost amplification)
1. 🟠 Feishu /btw and /stop bypass per-chat FIFO and debounce, enabling parallel expensive work (DoS/cost amplification)
Property Value
Severity High
CWE CWE-400
Location extensions/feishu/src/monitor.account.ts:381-416

Description

The Feishu message dispatcher now routes /btw and /stop onto separate sequential keys, allowing them to run concurrently with the normal per-chat lane.

This introduces a cost/DoS amplification vector:

  • Per-chat concurrency increases from a single FIFO lane to multiple lanes (base, :btw, :control).
  • Control commands bypass the inbound debouncer (shouldDebounce returns false for hasControlCommand(text, cfg)), so an attacker can spam /btw and create repeated expensive runs.
  • The queue implementation (createSequentialQueue) only serializes within a key, so alternating normal messages and /btw can keep two expensive handleFeishuMessage() executions in-flight for the same chat.

Vulnerable behavior (new routing):

// monitor.account.ts
const enqueue = createSequentialQueue();
...
const sequentialKey = getFeishuSequentialKey({ accountId, event, ... });
...
await enqueue(sequentialKey, task);
// sequential-key.ts
if (isAbortRequestText(text)) return `${baseKey}:control`;
if (isBtwRequestText(text)) return `${baseKey}:btw`;
return baseKey;

If handleFeishuMessage() triggers LLM/tool calls, an attacker can drive higher parallel usage and exhaust rate limits/quotas/CPU by rapid alternation (normal + /btw) or spamming control commands.

Recommendation

Add explicit throttling/backpressure so /btw//stop cannot increase concurrent expensive work per chat (or globally).

Options (pick one or combine):

  1. Keep a single per-chat semaphore across lanes (still allow priority, but not parallelism):
// Pseudocode
const perChatLimiter = new Map<string, Semaphore>(/* max=1 */);
await perChatLimiter.get(chatId).runExclusive(task);
  1. Bound control lane concurrency (e.g., allow :control to preempt but not overlap):
  • Cancel/abort the in-flight normal run instead of starting a second run.
  1. Rate-limit control commands per sender/chat (token bucket):
if (isBtwRequestText(text) && !btwLimiter.allow(chatId)) {
  return; // or reply with "Too many /btw requests"
}
  1. Apply the existing inbound debouncer to /btw (or at least to /btw duplicates) so spamming does not create multiple runs.

Analyzed PR: #64324 at commit 4fbde69

Last updated on: 2026-04-10T14:51:46Z

@openclaw-barnacle openclaw-barnacle bot added channel: feishu Channel integration: feishu size: S maintainer Maintainer-authored PR labels Apr 10, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 10, 2026

Greptile Summary

This PR fixes a Feishu-specific scheduling bug where /btw and /stop commands were queued behind active normal replies because all same-chat events shared a single serialization lane. The fix extracts a generic createSequentialQueue and adds getFeishuSequentialKey to route abort-style commands onto a dedicated :control lane and each /btw onto its own per-message-ID lane, while keeping normal same-chat traffic FIFO-serialized as before.

Confidence Score: 5/5

Safe to merge — change is well-scoped, logically equivalent for normal traffic, and directly tested.

The createSequentialQueue extraction is byte-for-byte identical in semantics to the removed createChatQueue. The key classifier correctly covers all three lanes (normal, control, btw) with a per-message-id btw key that prevents any same-key collision between concurrent side questions. Both new modules have focused unit tests that prove the FIFO invariant and the concurrency bypass. No P0 or P1 findings.

No files require special attention.

Reviews (1): Last reviewed commit: "fix(feishu): route /btw through out-of-b..." | Re-trigger Greptile

@ngutman
Copy link
Copy Markdown
Member Author

ngutman commented Apr 10, 2026

Thanks — good catch.

I agreed with the core concern: the original /btw key shape allowed unbounded per-message out-of-band lanes, which removed the normal same-chat backpressure and could fan out concurrent BTW work.

I pushed a follow-up fix in 7886b99 that bounds BTW concurrency per chat instead of per message:

  • normal traffic stays on feishu:<accountId>:<chatId>
  • /stop stays on feishu:<accountId>:<chatId>:control
  • /btw now uses a stable feishu:<accountId>:<chatId>:btw lane

That preserves the original bug fix for #60843 (BTW no longer waits behind the busy normal chat lane) while restoring bounded lane cardinality and backpressure.

Also added coverage to lock in that multiple BTW messages in the same chat resolve to the same stable lane.

Verified again with:

  • pnpm test extensions/feishu/src/sequential-key.test.ts extensions/feishu/src/sequential-queue.test.ts
  • pnpm build

@aisle-research-bot please re-review the latest commit.

@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot bot commented Apr 10, 2026

🔒 Aisle Security Analysis

We found 1 potential security issue(s) in this PR:

# Severity Title
1 🟡 Medium Queue key collision/DoS via constant fallback sequential key when chat_id is missing
1. 🟡 Queue key collision/DoS via constant fallback sequential key when chat_id is missing
Property Value
Severity Medium
CWE CWE-400
Location extensions/feishu/src/sequential-key.ts:11-12

Description

getFeishuSequentialKey derives the per-chat sequential queue lane from event.message.chat_id, but falls back to a constant string when the field is missing/blank.

  • Input: event is derived from an inbound webhook/WebSocket payload.
  • Processing: if chat_id is undefined, empty, or whitespace, it becomes "unknown".
  • Sink/impact: createSequentialQueue() serializes all tasks sharing the same key, so all such events end up in the shared lane feishu:${accountId}:unknown (and :unknown:control / :unknown:btw).

This creates a reliability/security risk:

  • Denial of service / head-of-line blocking: a stream of malformed/edge-case events without chat_id will block unrelated events that also hit the fallback lane.
  • Cross-chat interference: unrelated conversations that hit the fallback will be ordered and throttled together, potentially impacting control paths (/stop) and out-of-band (/btw) commands.

Vulnerable code:

const chatId = event.message.chat_id?.trim() || "unknown";
const baseKey = `feishu:${accountId}:${chatId}`;

Recommendation

Avoid a constant shared fallback key for missing/blank chat_id.

Preferred fixes (choose one):

  1. Reject/skip events that lack a valid chat_id (fail fast):
const chatId = event.message.chat_id?.trim();
if (!chatId) {
  throw new Error("Missing chat_id in Feishu message event");
}
const baseKey = `feishu:${accountId}:${chatId}`;
  1. If you must process them, fall back to a high-cardinality unique identifier to avoid cross-event collisions (e.g., message_id, or as a last resort a hash of the raw payload):
const chatId = event.message.chat_id?.trim();
const fallback = event.message.message_id?.trim();
const keyId = chatId ?? (fallback ? `nochat:${fallback}` : `nochat:${crypto.randomUUID()}`);
const baseKey = `feishu:${accountId}:${keyId}`;

Also consider logging/metrics for missing chat_id occurrences to detect upstream issues.


Analyzed PR: #64324 at commit 7886b99

Last updated on: 2026-04-10T14:14:33Z

@ngutman ngutman self-assigned this Apr 10, 2026
@ngutman ngutman force-pushed the fix/feishu-btw-out-of-band branch from 7886b99 to 4fbde69 Compare April 10, 2026 14:48
@ngutman ngutman merged commit 4b4ec4d into main Apr 10, 2026
10 checks passed
@ngutman ngutman deleted the fix/feishu-btw-out-of-band branch April 10, 2026 14:48
@ngutman
Copy link
Copy Markdown
Member Author

ngutman commented Apr 10, 2026

Landed via temp rebase onto main.

  • Gate run before landing:
    • pnpm lint
    • pnpm build
    • pnpm test was run and hit unrelated failures already present on latest main during landing, including src/commands/daemon-install-helpers.test.ts, src/gateway/server.sessions.gateway-server-sessions-a.test.ts, src/gateway/server.config-patch.test.ts, src/agents/skills*.test.ts, and extensions/acpx/src/runtime.test.ts
  • Land commit: 4fbde69
  • Merge commit: 4b4ec4d

Thanks @ngutman!

steipete pushed a commit to 100yenadmin/openclaw-1 that referenced this pull request Apr 10, 2026
* fix(feishu): route /btw through out-of-band lanes

* fix(feishu): bound btw out-of-band lanes

* fix: route feishu btw out-of-band (openclaw#64324) (thanks @ngutman)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: feishu Channel integration: feishu maintainer Maintainer-authored PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] BTW side questions silently queue instead of returning immediately (Feishu regression)

1 participant