v0.13.0 — SlackGateway harness + adapter decomposition + FIFO inflight queue

@hanfour hanfour released this 20 May 06:26
· 93 commits to main since this release

Why

The v0.7 → v0.12 gateway picked up presence broadcast, per-channel audience, monthly audit logs, msg_too_long hardening, anthropic-api as default, token.usage rollups — all landed on a 1708-line SlackAdapter monolith with zero automated coverage. Every change in that series had to be verified by hand via live-Slack dogfood because instantiating the adapter required real Slack tokens and a live socket connection. Two problems compounded:

  1. No safety net for a refactor. The adapter mixed Slack transport + session + LLM + mra-ask + escalation + atoms + reactions + presence + slash commands. Splitting it into focused modules — the only way to keep adding features without the file becoming write-only — was hostage to live verification per change.
  2. Two latent multi-user bugs were invisible. Rapid follow-up messages from the same user were silently dropped behind a misleading ":hourglass: 你上一則訊息還在處理,請稍候" ("your previous message is still being processed, please wait") notice that implied queue semantics but never queued; and parallel @-mentions in the same channel raced on a shared chat-session.json, so the second writer's turn was overwritten on disk even when both Slack-side replies landed.

v0.13 closes both: a constructor-injected fake transport + 27-test integration harness lands first, then three tranches incrementally extract focused modules under that safety net (a fourth is deferred), and the two multi-user bugs get fixed with semantics that match what the UX always implied.

Added

SlackGateway integration harness (#52)

packages/cli/test/harness/slack-fakes.ts provides FakeWebClient, FakeSocketModeClient, FakeLlmProvider, FakeMra, and buildHarness(). SlackAdapter constructor now accepts optional web / socket / llm / mraDoctor / runMraAsk overrides — production path is byte-equivalent (defaults to real wiring when none supplied). 27 new integration tests cover the orchestration surface that pre-v0.13 had no automated coverage: DM happy-path, mra-ask escalate, channel @-mention free-chat, channel-with-active-case, /pmk admin slash command, reaction-based atom approval, presence broadcast / restart matrix, msg_too_long retry, and envelope dedup across all three event paths.
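The constructor-injection pattern described above can be sketched roughly as follows. This is a minimal illustration, not the real SlackAdapter: MiniAdapter, WebLike, LlmLike, RecordingWeb, and the method names are all hypothetical stand-ins, and only the idea of optional overrides that fall back to production wiring comes from these notes.

```typescript
// Hypothetical, simplified shapes; the real adapter and fakes are richer.
interface WebLike {
  postMessage(channel: string, text: string): Promise<void>;
}

interface LlmLike {
  complete(prompt: string): Promise<string>;
}

interface AdapterDeps {
  web?: WebLike;
  llm?: LlmLike;
}

// Stand-ins for the production wiring (real Slack client, real LLM provider).
const realWeb: WebLike = {
  async postMessage() {
    throw new Error("would call Slack here");
  },
};
const realLlm: LlmLike = {
  async complete() {
    throw new Error("would call the LLM here");
  },
};

class MiniAdapter {
  private readonly web: WebLike;
  private readonly llm: LlmLike;

  // Every override is optional; omitting all of them yields the production path.
  constructor(deps: AdapterDeps = {}) {
    this.web = deps.web ?? realWeb;
    this.llm = deps.llm ?? realLlm;
  }

  async handleDm(channel: string, text: string): Promise<void> {
    const reply = await this.llm.complete(text);
    await this.web.postMessage(channel, reply);
  }
}

// A recording fake in the spirit of FakeWebClient: no network, just a list
// of calls that a test can assert on.
class RecordingWeb implements WebLike {
  readonly posts: Array<{ channel: string; text: string }> = [];

  async postMessage(channel: string, text: string): Promise<void> {
    this.posts.push({ channel, text });
  }
}
```

A test would construct `new MiniAdapter({ web: new RecordingWeb(), llm: someFakeLlm })`, drive a handler, and inspect `posts`, with no Slack token or socket involved.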

Multi-user channel concurrency (#53)

The inFlight lock key changes from channelId to ${channelId}:${userId}. Different users in the same channel can now ask the bot questions in parallel without waiting for each other's 60–90 s mra-ask round. A single user's rapid double-tap stays serialised (the original intent of the lock). Busy-notice text changes from ":hourglass: 已有訊息在處理中" ("a message is already being processed") to ":hourglass: 你上一則訊息還在處理" ("your previous message is still being processed") to match the now per-user semantics.
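To make the effect of the key change concrete, here is a small sketch. The helper and class names (channelLockKey, InFlightLocks) are hypothetical; only the key shape `${channelId}:${userId}` is taken from the text above.

```typescript
// Hypothetical helper: builds the per-user-per-channel key described above.
function channelLockKey(channelId: string, userId: string): string {
  return `${channelId}:${userId}`;
}

// A minimal non-blocking lock table: a key is either held or free.
class InFlightLocks {
  private readonly held = new Set<string>();

  // Returns true if the caller now holds the key, false if it was busy.
  tryAcquire(key: string): boolean {
    if (this.held.has(key)) return false;
    this.held.add(key);
    return true;
  }

  release(key: string): void {
    this.held.delete(key);
  }
}

const locks = new InFlightLocks();
const aliceFirst = locks.tryAcquire(channelLockKey("C1", "U_ALICE"));
// A different user in the same channel is not blocked by Alice's round.
const bob = locks.tryAcquire(channelLockKey("C1", "U_BOB"));
// Alice's own double-tap still hits the busy path.
const aliceAgain = locks.tryAcquire(channelLockKey("C1", "U_ALICE"));
```

With a bare channelId key, the second call would have been refused as well, which is the cross-user blocking this change removes.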

Append-only channel message log (#53)

packages/cli/src/gateway/channel-log.ts replaces the read-modify-write chat-session.json with per-channel JSONL: ~/.pmk/gateway/slack/channels/<channelId>/messages.jsonl (and per-thread under threads/<ts>/messages.jsonl). appendChannelTurns writes each batch with a single fs.appendFileSync call, relying on POSIX append semantics, so the concurrent writers admitted by the new per-user-per-channel lock all land instead of last-write-wins dropping turns. One-time migration from legacy chat-session.json on first read.
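The append-only approach might look roughly like the sketch below. The TurnEntry shape and the function names appendTurns and readTurns are assumptions for illustration; the actual schema and API of channel-log.ts are not shown in these notes. Only the JSONL layout, the single fs.appendFileSync call per batch, malformed-line skipping, and the sinceMs cutoff come from the text.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical entry shape; the real schema may differ.
interface TurnEntry {
  ts: number;
  userId: string;
  role: "user" | "assistant";
  text: string;
}

// Append a batch of turns as JSON Lines with one append call, instead of
// reading, modifying, and rewriting a whole session file.
function appendTurns(logFile: string, entries: TurnEntry[]): void {
  if (entries.length === 0) return;
  fs.mkdirSync(path.dirname(logFile), { recursive: true });
  const payload = entries.map((e) => JSON.stringify(e)).join("\n") + "\n";
  fs.appendFileSync(logFile, payload, "utf8");
}

// Read turns back in file order, skipping malformed lines and anything
// older than the sinceMs cutoff.
function readTurns(logFile: string, sinceMs = 0): TurnEntry[] {
  if (!fs.existsSync(logFile)) return [];
  const out: TurnEntry[] = [];
  for (const line of fs.readFileSync(logFile, "utf8").split("\n")) {
    if (line.trim() === "") continue;
    try {
      const entry = JSON.parse(line) as TurnEntry;
      if (entry.ts >= sinceMs) out.push(entry);
    } catch {
      // A corrupt line should not make the rest of the log unreadable.
    }
  }
  return out;
}
```

Because writers only ever append, two parallel turns end up as two lines in the file rather than one overwriting the other.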

FIFO inflight queue (#55)

packages/cli/src/gateway/slack/inflight-queue.ts replaces the pre-v0.13 "drop second message + misleading busy notice" model. Rapid follow-up messages behind an in-flight LLM round now actually queue FIFO (default depth 3) and drain in submission order. The caller supplies the key: userId for a DM, ${channelId}:${userId} for a channel @-mention. At-cap submissions get an explicit ":no_entry: 你已有多則訊息排隊中(上限 3 則),請等回覆後再發" ("you already have several messages queued (limit 3); please wait for a reply before sending more") rejection notice instead of a silent drop. Queued submissions get ":hourglass: 你上一則還在處理,這則已排入隊伍(會依序處理)" ("your previous message is still being processed; this one has been queued and will be handled in order"). Slack envelope handlers now return fast (fire-and-forget) so the 3-second ack window stays comfortable.
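A per-key FIFO queue with a depth cap can be sketched as below. This is an illustration under assumptions, not the contents of inflight-queue.ts: the class shape, the EnqueueResult values, and the choice that the running item does not count against the cap are all guesses, picked to match the "4th submission while 3 are already queued" behaviour described in the operator note.

```typescript
type Work = () => Promise<void>;
type EnqueueResult = "started" | "queued" | "rejected";

class InflightQueue {
  private readonly pending = new Map<string, Work[]>();
  private readonly running = new Set<string>();
  private readonly maxDepth: number;

  constructor(maxDepth = 3) {
    this.maxDepth = maxDepth;
  }

  // Returns immediately so an envelope handler can ack without awaiting the work.
  enqueue(key: string, work: Work): EnqueueResult {
    if (!this.running.has(key)) {
      this.running.add(key);
      void this.drain(key, work);
      return "started";
    }
    const queue = this.pending.get(key) ?? [];
    if (queue.length >= this.maxDepth) return "rejected";
    queue.push(work);
    this.pending.set(key, queue);
    return "queued";
  }

  // Runs the first item, then everything queued behind it, in submission order.
  private async drain(key: string, first: Work): Promise<void> {
    let next: Work | undefined = first;
    try {
      while (next) {
        try {
          await next();
        } catch {
          // One failing item must not block the items queued behind it.
        }
        next = this.pending.get(key)?.shift();
      }
    } finally {
      // Always free the key, even if something unexpected throws.
      this.pending.delete(key);
      this.running.delete(key);
    }
  }
}
```

The caller would map "queued" to the hourglass notice and "rejected" to the no_entry notice; "started" needs no notice at all.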

Graceful shutdown drain (#55 review)

SlackAdapter.stop({ drainTimeoutMs }) is reordered to socket.disconnect → queue.waitForAll (bounded) → presence.offline. The SIGINT/SIGTERM handler passes 25_000 ms (K8s SIGKILL kicks in at 30 s); a stuck LLM round logs "drain timed out after 25000ms; abandoning remaining in-flight work" and shutdown continues, so SIGKILL isn't what reaps the process. Without the drain, queued turns died with process.exit(0), so users who saw the "排入隊伍" ("queued") notice never got a reply.

Also defensive: drain() cleanup wrapped in try/finally and onLog calls wrapped in their own try/catch so a broken logger can't lock a queue key permanently.
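The shutdown ordering and bounded drain described above can be sketched as follows. The StopDeps interface and its member names are hypothetical stand-ins for the adapter's socket, queue, and presence components; the three-step order, the timeout, the log text, and the guarded logger call come from these notes.

```typescript
// Hypothetical dependency bundle standing in for the adapter's internals.
interface StopDeps {
  disconnectSocket(): Promise<void>;
  waitForAll(): Promise<void>;
  broadcastOffline(): Promise<void>;
  onLog(message: string): void;
}

type DrainOutcome = "drained" | "timed_out";

// A broken logger must not abort shutdown.
function safeLog(deps: StopDeps, message: string): void {
  try {
    deps.onLog(message);
  } catch {
    // Ignore logger failures on purpose.
  }
}

async function stop(deps: StopDeps, drainTimeoutMs: number): Promise<DrainOutcome> {
  // 1. Stop receiving new envelopes first.
  await deps.disconnectSocket();

  // 2. Wait for in-flight and queued work, but only up to the budget.
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<DrainOutcome>((resolve) => {
    timer = setTimeout(() => resolve("timed_out"), drainTimeoutMs);
  });
  const drained = deps.waitForAll().then((): DrainOutcome => "drained");

  let outcome: DrainOutcome;
  try {
    outcome = await Promise.race([drained, timeout]);
  } finally {
    clearTimeout(timer);
  }
  if (outcome === "timed_out") {
    safeLog(deps, `drain timed out after ${drainTimeoutMs}ms; abandoning remaining in-flight work`);
  }

  // 3. Only now tell users the bot is offline.
  await deps.broadcastOffline();
  return outcome;
}
```

A signal handler would call something like `stop(deps, 25_000)` and exit afterwards, which leaves about five seconds of headroom before a 30-second SIGKILL.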

Changed

SlackAdapter decomposed across three tranches under the new harness:

| Tranche | New modules | Adapter lines |
|---|---|---|
| Baseline (v0.12.0) | | 1708 |
| 1 (#52) | slack/presence.ts (146) + slack/envelope-dedup.ts (45) + slack/concurrency.ts (36) | 1597 (-111) |
| 2 (#54) | slack/escalation.ts (308) | 1411 (-229) |
| 3 (#55) | slack/free-chat-turn.ts (509) + slack/inflight-queue.ts (141) | 1010 (-401) |
| Net | | -698 (-41%) |

Tranche 4 (dispatcher cleanup, target <800 lines) is deferred. Behaviour byte-equivalent across all three tranches; the harness's tests pass without modification at every step.

Tests

@pmk/cli 312 → 362 (+50). Major additions:

  • Integration harness (27 tests, Phase 1–3) — DM happy-path (3), mra-ask escalate round (3), channel @-mention free-chat + case path (2), /pmk admin slash (3), reaction-based atom approval (4), presence broadcast restart matrix (5), msg_too_long retry hardening (3), envelope dedup across DM / @-mention / slash (4).
  • Multi-user concurrency (#53, 2 tests) — different users in same channel proceed in parallel; same user double-tap blocked with per-user notice.
  • Append-only channel log (#53, 7 tests) — append-on-empty, FIFO ordering, malformed-line skip, sinceMs cutoff, legacy migration round-trip, entriesToMessages shape, multi-channel isolation.
  • FIFO inflight queue (#55, 10 unit tests) — enqueue contract, default maxDepth=3, FIFO order, key independence, error isolation, waitForAll correctness, onLog-throwing-defence (broken logger can't lock a key out forever).
  • Inflight queue adapter integration (#55, rewritten suite) — different users still parallel without queue notice, same-user double-tap queues with correct UX text, 4th submission rejected with cap notice.
  • Graceful shutdown drain (#55 review, 2 tests) — stop() waits for in-flight work before broadcasting offline; stop({ drainTimeoutMs }) honours timeout when work is genuinely stuck and logs the abandon.

Total across the workspace: 362 → 412 pass, 0 fail.

Live-Slack verification (2026-05-20)

Verified end-to-end on a real slack-webhook workspace driven via chrome-devtools. All 5 critical scenarios green:

  1. Rapid-fire DM queue: 5 rapid messages from one user → 1 runs, queue notices fire for in-flight follow-ups: ":hourglass: 你上一則還在處理,這則已排入隊伍(會依序處理)。" ("your previous message is still being processed; this one has been queued and will be handled in order").
  2. Queue cap rejection: 6 mra-ask-bound questions in rapid burst → 3 queued + 6 cap-reject notices fire with: ":no_entry: 你已有多則訊息排隊中(上限 3 則),請等回覆後再發。" ("you already have several messages queued (limit 3); please wait for a reply before sending more").
  3. Different users in same channel run parallel: hanfour and H4 @PMK in #新頻道 overlapping in time → both processed, zero queue notice in channel (per-user-per-channel queue key working).
  4. Graceful shutdown drain: sent mra-ask, SIGTERM at 3 s while mra was still running → gateway log: received SIGTERM, shutting down… / slack socket disconnected / stop: drain timed out after 25000ms; abandoning remaining in-flight work. Drain budget bounded as designed; SIGKILL never reaped the process.
  5. Channel-log persistence: both users' parallel @-mention turns persist in ~/.pmk/gateway/slack/channels/C0AVD1XD946/messages.jsonl with userId attribution. Legacy chat-session.json migrated to .aside-*.bak.

Operator note

User-visible Slack text changed

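Rapid follow-up DMs and @-mentions now get one of three outcomes, where pre-v0.13 silently dropped the message behind a single busy notice:

  • 1st follow-up while bot is replying: ":hourglass: 你上一則還在處理,這則已排入隊伍(會依序處理)" ("your previous message is still being processed; this one has been queued and will be handled in order"). The message will be processed, not dropped.
  • 4th submission while 3 are already queued: ":no_entry: 你已有多則訊息排隊中(上限 3 則),請等回覆後再發" ("you already have several messages queued (limit 3); please wait for a reply before sending more"). This one IS dropped, but explicitly.
  • Channel @-mention from a different user while bot is busy with someone else: no notice at all, both run in parallel.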

No manual migration on disk

The new channels/<channelId>/messages.jsonl layout migrates from legacy chat-session.json automatically on first read. Operators can rm the legacy .aside-*.bak files manually after observing the migration in events.log; the reader is idempotent, so re-running the upgrade is safe.

Shutdown takes longer

Pre-v0.13 graceful shutdown was sub-second; v0.13 now waits up to 25 s for queued / running LLM rounds to complete before exiting. The trade-off: users who saw the "排入隊伍" ("queued") notice actually get their reply, instead of a silent drop on kill. K8s terminationGracePeriodSeconds should be ≥30 (the default).

What's next (v0.13.1 / v0.14)

  • Tranche 4: SlackAdapter dispatcher cleanup → under 800 lines (coding-style max).
  • Per-key work-level LLM timeout (currently a stuck LLM can hold a user out of the queue until the host restarts).
  • Case-file load-modify-write retry for parallel @-mentions in case-open channels (the one remaining race the new lock semantics surfaces).