
Tandem v0.4.41


@github-actions released this 26 Apr 14:04
· 581 commits to main since this release
b708fa5

See the assets below to download the installer for your platform.

v0.4.41 (Unreleased)

This unreleased build adds chat-native automation drafts for Discord, Telegram, Slack, and direct channel conversations, hardens workflow-planning handoff for explicit planner flows, and lays the foundation for default approval gates and rich channel UX (see docs/internal/approval-gates-and-channel-ux/PLAN.md).

Approval-gate foundation

  • Unified approval data model: New tandem_types::approvals module defines ApprovalRequest, ApprovalDecision, ApprovalSourceKind, ApprovalTenantRef, ApprovalActorRef, ApprovalDecisionInput, and ApprovalListFilter. Every Tandem subsystem (control panel, channel adapters, future surfaces) consumes one shape regardless of which subsystem owns the underlying pending state. The aggregator only lists pending requests; decisions are still routed through subsystem-specific handlers (e.g. POST /automations/v2/runs/{run_id}/gate_decide), which remain authoritative. A unified decide endpoint is intentionally deferred until at least two source subsystems are wired.
  • Cross-subsystem approvals aggregator: New GET /approvals/pending endpoint returns a unified list of pending approval requests, drawn today from automation_v2 mission runs whose checkpoint.awaiting_gate is set. Coder and workflow sources will plug in once their pause/resume paths are wired. Filterable by org_id, workspace_id, source, and limit. Tested against real state via three integration tests covering surface, empty, and source-filter behavior.
  • Why this matters: the control panel, the Slack/Discord/Telegram channel adapters, and future approval surfaces will all read from the same endpoint. The shape is stable, so surface implementations slot on top without re-shaping the data.
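
A minimal Rust sketch of a unified request shape and of how a filter like ApprovalListFilter could narrow the aggregated list. The field set, the ApprovalSourceKind variants, the oldest-first ordering, and the list_pending helper are assumptions for illustration, not the actual tandem_types definitions.

```rust
// Hypothetical, simplified mirror of the approval shapes; the real
// tandem_types::approvals definitions likely carry more fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalSourceKind {
    AutomationV2,
    Coder,
    Workflow,
}

#[derive(Debug, Clone)]
pub struct ApprovalRequest {
    pub request_id: String,
    pub source: ApprovalSourceKind,
    pub org_id: String,
    pub workspace_id: String,
    pub run_id: String,
    pub node_id: String,
    pub requested_at_ms: u64,
}

/// Every field is optional; `None` means "do not filter on this".
#[derive(Debug, Default)]
pub struct ApprovalListFilter {
    pub org_id: Option<String>,
    pub workspace_id: Option<String>,
    pub source: Option<ApprovalSourceKind>,
    pub limit: Option<usize>,
}

impl ApprovalListFilter {
    fn matches(&self, req: &ApprovalRequest) -> bool {
        self.org_id.as_deref().map_or(true, |o| o == req.org_id)
            && self.workspace_id.as_deref().map_or(true, |w| w == req.workspace_id)
            && self.source.map_or(true, |s| s == req.source)
    }
}

/// Hypothetical aggregator step: filter requests gathered from all wired
/// subsystems, order them oldest first (an assumption), then apply the limit.
pub fn list_pending(all: Vec<ApprovalRequest>, filter: &ApprovalListFilter) -> Vec<ApprovalRequest> {
    let mut out: Vec<ApprovalRequest> = all.into_iter().filter(|r| filter.matches(r)).collect();
    out.sort_by_key(|r| r.requested_at_ms);
    if let Some(limit) = filter.limit {
        out.truncate(limit);
    }
    out
}
```

Because every surface consumes the same list, a Slack notifier and the control-panel inbox would both call the same filter with different org or workspace scopes.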

Slack interactive approvals (W2 vertical slice)

  • Webhook signing module at tandem-channels/src/signing.rs with three platform verifiers: Slack HMAC-SHA256 with 5-minute replay protection, Telegram per-webhook secret-token constant-time compare, and a Discord Ed25519 stub returning SecretNotConfigured until ed25519-dalek lands in W4. 22 unit tests cover valid signatures, missing/malformed headers, replay window violations, body tampering, secret mismatches, and edge cases.
  • InteractiveCard trait extension: Channel::send_card(InteractiveCard) added to the channel trait as a separate method (not an optional field on SendMessage) so the type system tells callers which adapters have wired rich rendering. Default impl returns InteractiveCardError::NotImplemented. The InteractiveCard shape (title, body markdown, fields, primary/secondary/destructive buttons, optional reason prompt, optional thread key, opaque correlation) is the single source of truth across Slack/Discord/Telegram.
  • Slack Block Kit renderer at tandem-channels/src/slack_blocks.rs: pure functions converting InteractiveCard to Block Kit JSON, fully golden-testable without a Slack workspace. Layout: header → context → body section → fields grid → divider → actions block (chunked at Slack's 5-button cap). Helpers for the post-decision in-place edit (chat.update) and the rework reason modal (views.open). 18 tests assert exact JSON shapes including button styling, confirm dialogs, and correlation round-trips.
  • POST /channels/slack/interactions endpoint: receives Slack button clicks. Verifies HMAC on every request via the new signing module; rejects forged or stale requests with 401 Unauthorized. Acks within Slack's 3-second window. Bounded LRU dedup ring on (action_ts, action_id) absorbs Slack's retry-on-missed-ack. Parses the URL-encoded payload field, extracts the primary action, dispatches approve / cancel directly to automations_v2_run_gate_decide. Rework decisions parse correctly; the modal round-trip lands in W4. New SlackConfigFile.signing_secret field carries the app signing secret.
  • Race UX in gate-decide 409: when two surfaces try to decide the same gate concurrently (Slack click + control-panel click), the loser's 409 response now includes winningDecision { node_id, decision, reason, decided_at_ms } plus currentStatus. Slack/Discord/Telegram deferred replies can render "already decided by @alice" instead of a raw error.
  • Why this matters: this is the discovery slice from docs/internal/approval-gates-and-channel-ux/PLAN.md. The risky parts (auth, idempotency, race UX, type-safe rendering) are now done and tested. The remaining Slack work (in-place edit dispatch, threaded run-status replies, App Home pinned approvals) needs a real Slack workspace to verify end-to-end and is staged for a follow-up real-Slack session. Discord and Telegram adapters in W4 inherit the same signing, trait shape, and interaction patterns.
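
The replay and shared-secret checks can be sketched with the standard library alone. This assumes Slack's documented scheme (signature base string v0:{timestamp}:{body}, hex HMAC-SHA256 sent as v0=<hex>, five-minute window); the HMAC computation itself is omitted because it needs crates such as hmac and sha2, and all function names here are hypothetical rather than the signing.rs API.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Slack asks receivers to reject requests whose timestamp is more than
/// five minutes away from the current time.
const SLACK_REPLAY_WINDOW_SECS: u64 = 5 * 60;

#[derive(Debug, PartialEq, Eq)]
pub enum VerifyError {
    MissingHeader,
    MalformedTimestamp,
    Stale,
    Mismatch,
}

/// Parses X-Slack-Request-Timestamp and enforces the replay window in both
/// directions (too old or too far in the future).
pub fn check_replay_window(timestamp_header: Option<&str>, now: u64) -> Result<u64, VerifyError> {
    let raw = timestamp_header.ok_or(VerifyError::MissingHeader)?;
    let ts: u64 = raw.trim().parse().map_err(|_| VerifyError::MalformedTimestamp)?;
    if now.abs_diff(ts) > SLACK_REPLAY_WINDOW_SECS {
        return Err(VerifyError::Stale);
    }
    Ok(ts)
}

/// The string Slack signs; its HMAC-SHA256 (hex, prefixed "v0=") arrives in
/// X-Slack-Signature. Computing the HMAC is out of scope for this sketch.
pub fn slack_base_string(ts: u64, body: &str) -> String {
    format!("v0:{}:{}", ts, body)
}

/// Compares equal-length inputs without exiting early on the first
/// mismatching byte. A length difference still returns immediately.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Telegram echoes the configured secret in X-Telegram-Bot-Api-Secret-Token.
pub fn verify_telegram_secret(header: Option<&str>, configured: &str) -> Result<(), VerifyError> {
    let got = header.ok_or(VerifyError::MissingHeader)?;
    if constant_time_eq(got.as_bytes(), configured.as_bytes()) {
        Ok(())
    } else {
        Err(VerifyError::Mismatch)
    }
}

/// Current Unix time in seconds (0 if the clock is before the epoch).
pub fn now_secs() -> u64 {
    SystemTime::now().duration_since(UNIX_EPOCH).map(|d| d.as_secs()).unwrap_or(0)
}
```

In production a vetted helper (for example the subtle crate's ConstantTimeEq) is the safer choice, since the optimizer gives no timing guarantees for a hand-rolled loop.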

Default-on approval gates and Approvals Inbox (W3)

The agent-owned-workflows pitch becomes runnable end-to-end on automation_v2 missions. Generate intent → compiler auto-wraps high-stakes steps → run pauses on the gate → operator sees and decides in a single inbox.

  • Tool approval classifier at tandem-tools/src/approval_classifier.rs: pure, table-driven, exhaustively unit-tested (19 tests). Maps every Tandem built-in plus the curated MCP catalog (CRM, payments, outbound communications, public posts, calendar, trackers, Notion, GitHub mutating verbs, coder merge/publish) to one of RequiresApproval | NoApproval | UserConfigurable. Suffix heuristics (.send, .publish, .create, .update, .delete, .merge, .pay, .charge, .refund) catch unknown vendors. Default-deny for ambiguous cases.
  • Default-on compiler gate injection at tandem-plan-compiler/src/mission_runtime.rs: walks every projected workstream node, classifies its tool allowlist, and attaches a default HumanApprovalGate when the node touches an external mutation. Idempotent (preserves explicit blueprint gates), scope-override-aware (honors a future metadata.approval.skip_approval from the ScopeInspector toggle), and skips Approval/Review stages whose gates are blueprint-owned. 7 golden tests cover CRM-write injection, outbound-email injection, pure-read no-injection, wildcard injection, unknown-tool fail-closed, scope override skip, and explicit-blueprint-gate preservation.
  • Planner-prompt teaching: the planner agent is now told the runtime owns gate placement. New "Approval gates:" section in workflow_plan_common_sections() says: don't add gates yourself; describe the workflow as if approvals are present; batch related external actions to minimize approval count; declare stage_kind=Approval only when the gate IS the point of the step. The compiler enforces what the prompt promises.
  • Approvals Inbox at #/approvals in the control panel. Polls GET /approvals/pending every 5s, renders each pending request as a card (workflow name, source, action preview, identifiers, requested-at), with Approve / Rework / Cancel buttons in colors that match each decision's stake. Rework opens an inline reason form. Race-aware: a 409 from gate-decide (W2.6) surfaces as "already decided by another operator" instead of a raw error. Wired into navigation as a top-level route with the shield-check icon.
  • Deferred to a focused TS session: the per-step override toggle inside ScopeInspector.tsx (a 2884-line file). The compiler-side hook (metadata.approval.skip_approval) is already wired and tested, so the toggle plugs in without further server work and does not block W4 (Discord + Telegram) or W5 (notification fan-out + slash commands).
  • Net for the demo plan: every gap that the demo's Tier 0 acceptance test depends on now has shipping code: gates are wired into the demo workflow's compile path, approvals surface as both a control-panel inbox and (via W2's Slack interactions) a chat-native flow, and the run-time governance promise is verifiable on synthetic-data runs.
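
The classifier-plus-injection rule described above might look roughly like this. The table entries, the read-only suffix list, and the NodeSketch type are hypothetical; the mutating suffixes, the fail-closed default for unknown tools, wildcard handling, and the blueprint-gate and skip_approval exemptions come from the notes.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalClass {
    RequiresApproval,
    NoApproval,
    UserConfigurable,
}

// Illustrative entries only; the real classifier table is much larger.
const TABLE: &[(&str, ApprovalClass)] = &[
    ("read", ApprovalClass::NoApproval),
    ("grep", ApprovalClass::NoApproval),
    ("websearch", ApprovalClass::NoApproval),
    ("write", ApprovalClass::UserConfigurable),
];

// Mutating-verb suffixes listed in the release notes.
const MUTATING_SUFFIXES: &[&str] = &[
    ".send", ".publish", ".create", ".update", ".delete", ".merge", ".pay", ".charge", ".refund",
];

// Hypothetical read-only suffixes so unknown vendors' reads can pass.
const READ_ONLY_SUFFIXES: &[&str] = &[".get", ".list", ".search"];

pub fn classify(tool: &str) -> ApprovalClass {
    let t = tool.to_ascii_lowercase();
    if let Some(&(_, class)) = TABLE.iter().find(|entry| entry.0 == t) {
        return class;
    }
    if MUTATING_SUFFIXES.iter().any(|s| t.ends_with(*s)) {
        return ApprovalClass::RequiresApproval;
    }
    if READ_ONLY_SUFFIXES.iter().any(|s| t.ends_with(*s)) {
        return ApprovalClass::NoApproval;
    }
    // Unknown or ambiguous tools fail closed.
    ApprovalClass::RequiresApproval
}

pub struct NodeSketch<'a> {
    pub tools: &'a [&'a str],
    pub has_blueprint_gate: bool,
    pub skip_approval: bool, // stands in for metadata.approval.skip_approval
}

pub fn should_inject_default_gate(node: &NodeSketch) -> bool {
    if node.has_blueprint_gate || node.skip_approval {
        return false; // keep explicit gates as-is and honor the override
    }
    node.tools
        .iter()
        .any(|t| *t == "*" || classify(t) == ApprovalClass::RequiresApproval)
}
```

Running the injection a second time over an already-gated node changes nothing, which is what makes the compiler pass idempotent.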

Discord and Telegram interactive approvals (W4)

Brings Discord and Telegram up to parity with W2's Slack interactive cards. Operators can now approve, reject, or rework a workflow gate from any of the three channels — or the control-panel inbox — and the runtime sees one decision regardless of surface.

  • Discord Ed25519 signature verification is now real (the W2 stub is replaced). The verify_discord_signature function in tandem-channels/src/signing.rs decodes the application public key, reconstructs the signed payload as {timestamp}{body}, and verifies via ed25519-dalek. Discord disables endpoints that fail validation even once, so this is mandatory plumbing — 9 new tests cover valid signatures, missing/malformed headers, wrong-key forgery, wrong-body forgery, wrong-timestamp forgery, and invalid public-key hex.
  • Discord rich-UX renderer at tandem-channels/src/discord_blocks.rs: pure functions converting InteractiveCard to one embed plus chunked action-row JSON, the post-decision in-place edit (with color transitions: amber pending → emerald approved → indigo reworked → red cancelled), modal data for the rework reason flow, and deferred/inline interaction response wrappers. parse_custom_id round-trips the tdm:{action}:{run_id}:{node_id} correlation. allowed_mentions: parse=[] so approval cards never @-ping. 19 golden tests assert exact JSON shapes and button-style mappings.
  • POST /channels/discord/interactions endpoint receives every Discord interaction (PING, button click, modal submit, slash command). Verifies Ed25519 on every request, answers PINGs with a PONG, dispatches button clicks to the existing gate-decide handler, and opens a modal on Rework. Bounded LRU dedup on interaction_id absorbs Discord retries. Race UX maps non-200 gate-decide failures to an UPDATE_MESSAGE response, so Discord still receives a valid interaction response and the user sees the conflict. New DiscordConfigFile.public_key field.
  • Telegram inline-keyboard renderer at tandem-channels/src/telegram_keyboards.rs: pure functions converting InteractiveCard to MarkdownV2 text + inline keyboard JSON, the post-decision editMessageText/editMessageReplyMarkup payloads, and force_reply (Telegram's modal substitute) for the rework reason flow. Emoji prefixes (//none) signal button intent since Telegram has no button-style enum. build_callback_data respects Telegram's 64-byte cap with a truncation marker that the dispatcher resolves via cache (W5 wiring). Full MarkdownV2 escaping. 16 golden tests cover keyboard layout, callback_data round-trip, truncation, MarkdownV2 escaping, force-reply payload, and label truncation.
  • POST /channels/telegram/interactions endpoint receives Telegram callback_query updates. Verifies x-telegram-bot-api-secret-token on every request via the W2 signing module. Bounded dedup on update_id. Dispatches button clicks to the existing gate-decide handler. Truncated callback_data fails closed pending the W5 short-lived cache. New TelegramConfigFile.webhook_secret_token field.
  • What's deferred to W5 (the supporting pieces that need real workspaces or shared dispatcher state):
    • The chat.update, PATCH /channels/.../messages/{id}, and editMessageText dispatch against real workspaces (the builders are tested; calling the API end-to-end requires plumbing the original message ID from the post-send response back through gate-decide).
    • Telegram force-reply capture (the payload builder is shipped; the dispatcher state machine that intercepts the user's NEXT message and routes it as the rework reason shares plumbing with channel_automation_drafts).
    • Truncated-callback-data resolution via short-lived cache.
    • Threading per workflow run (Discord threads, Telegram supergroup topics).
  • Net: 47 new tests across W4. All four surfaces (control panel + Slack + Discord + Telegram) accept signed interactions, deduplicate retries, and dispatch through the same authoritative automations_v2_run_gate_decide handler. The runtime sees one decision per gate regardless of surface.
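
A sketch of the tdm:{action}:{run_id}:{node_id} correlation round-trip and of the 1 to 64 byte limit Telegram applies to callback_data. build_custom_id, Correlation, and fits_telegram_callback are hypothetical names, and the sketch assumes run_id never contains a colon (node_id may, because splitn keeps the remainder intact).

```rust
/// Namespace prefix for Tandem button correlation IDs.
const PREFIX: &str = "tdm";
/// Telegram's limit on callback_data, measured in bytes.
const TELEGRAM_CALLBACK_MAX_BYTES: usize = 64;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Correlation {
    pub action: String,
    pub run_id: String,
    pub node_id: String,
}

pub fn build_custom_id(c: &Correlation) -> String {
    format!("{}:{}:{}:{}", PREFIX, c.action, c.run_id, c.node_id)
}

/// Returns None for foreign or malformed IDs so the dispatcher can reject them.
pub fn parse_custom_id(raw: &str) -> Option<Correlation> {
    let mut parts = raw.splitn(4, ':');
    if parts.next()? != PREFIX {
        return None;
    }
    let (action, run_id, node_id) = (parts.next()?, parts.next()?, parts.next()?);
    if action.is_empty() || run_id.is_empty() || node_id.is_empty() {
        return None;
    }
    Some(Correlation {
        action: action.to_string(),
        run_id: run_id.to_string(),
        node_id: node_id.to_string(),
    })
}

/// Byte length, not char count: multi-byte IDs hit the cap sooner.
pub fn fits_telegram_callback(data: &str) -> bool {
    !data.is_empty() && data.len() <= TELEGRAM_CALLBACK_MAX_BYTES
}
```

When an ID does not fit, the notes describe truncating with a marker and failing closed until the short-lived cache can resolve it.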

Notification fan-out, slash commands, authority resolver (W5)

The runtime side of the agent-owned-workflows pitch is now production-shaped: notifications never get lost, channel users have grown-up commands for managing pending approvals, and every channel-side decision resolves to a stable engine principal for audit.

  • Approval notification fan-out task at tandem-server/src/app/approval_outbound.rs is a polling outbox. The Plan agent flagged the engine event-bus pattern as wrong for approvals: a tokio::sync::broadcast receiver that falls behind gets tokio::sync::broadcast::error::RecvError::Lagged(_) and the skipped events are lost, and a missed approval means a stuck run. The fan-out instead polls /approvals/pending (an idempotent read of durable state) on a configurable interval (default 5s) and dispatches new requests through a Vec<Arc<dyn ApprovalNotifier>>. An in-memory DedupRing (FIFO, capacity 8192) prevents re-dispatch; prune_to evicts decided requests. NotifierError::Transient/Permanent semantics let surface implementations decide their own retry/suppression strategy without blocking the polling loop. run_polling_loop exposes a cooperative Arc<AtomicBool> cancel flag for deterministic shutdown. 9 unit tests cover the full state machine.
  • Slash commands /pending and /rework in the channel dispatcher. /pending lists outstanding workflow approval gates as a numbered chat-friendly summary (workflow name, run_id, action_kind). /rework <run_id> <feedback> sends a paused gate back for rework with the user's feedback, surfacing the W2.6 race conflict body as "already decided by another operator" instead of a raw error. Both registered in BUILTIN_CHANNEL_COMMANDS so registry-driven /help shows them in operator/trusted-team channels and excludes them from PublicDemo. 5 unit tests on parsing, missing-feedback rejection, registry presence, and PublicDemo disable.
  • Channel authority chain resolver at tandem-server/src/app/state/principals/channel_identity.rs. resolve_channel_user(config, kind, user_id) returns Resolved(RequestPrincipal) / ChannelNotConfigured(kind) / Denied { kind, user_id }. The principal carries actor_id = "channel:{kind}:{user_id}" so distinct channel surfaces never alias the same user ID into the same actor — Slack's U12345 and Discord's U12345 resolve to different principals. Empty allowed_users is deny-all by design (channels must opt users in explicitly). Callers must treat denials as hard rejects — never silently approve as anonymous, because the audit trail would then carry no actor for an external mutation. 12 unit tests cover wildcard, deny-by-default, case insensitivity, Telegram @-prefix matching, missing config/user, and per-kind actor-id distinguishing.
  • Concurrent-race regression test in http/tests/approvals_aggregator.rs. Fires two parallel POST /automations/v2/runs/{run_id}/gate requests against the same pending gate via tokio::spawn. Asserts exactly one 200 + one 409 with winningDecision populated, and that gate_history.len() == 1 post-race. Mandatory before any rollout per the W5 plan — the previous W2.6 test simulated post-race state but did not exercise the per-run mutation lock under real contention.
  • What's deferred to a real-workspace session (the supporting pieces that need live Slack/Discord/Telegram to verify):
    • Actual API-call dispatch for the W2/W4 in-place edits (chat.update, Discord PATCH, editMessageText) and threading.
    • Telegram force-reply capture state machine (the payload builder is in W4; the dispatcher state-machine that intercepts the user's NEXT message shares plumbing with channel_automation_drafts).
    • Truncated callback-data resolution via short-lived cache.
    • Slack App Home pinned approvals (needs OAuth flow setup).
    • Workflow execution pause/resume (W1.4) — still wants the design conversation about routing workflows through the automation_v2 executor vs parallel pause/resume in the workflow layer.
    • ScopeInspector per-step override toggle (W3.3) — focused TS task; the server-side hook (metadata.approval.skip_approval) is wired and tested.
  • Net: every claim about the agent-owned-workflows pitch ("agents own workflows; humans approve boundaries; runtime governs") is now backed by tested code. Across W1–W5: 158 new tests across 7 crates and 2 control-panel surfaces. The remaining work is platform integration, not new architecture.
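
The authority-chain rules for resolve_channel_user can be approximated as follows. ChannelsConfig, the allowed_users map, and the normalization details are assumptions; the three outcomes, the channel:{kind}:{user_id} actor format, deny-all for an empty list, wildcard support, case-insensitive matching, and Telegram @-prefix handling follow the description above.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ChannelKind {
    Slack,
    Discord,
    Telegram,
}

impl ChannelKind {
    fn as_str(self) -> &'static str {
        match self {
            ChannelKind::Slack => "slack",
            ChannelKind::Discord => "discord",
            ChannelKind::Telegram => "telegram",
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct RequestPrincipal {
    pub actor_id: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Resolution {
    Resolved(RequestPrincipal),
    ChannelNotConfigured(ChannelKind),
    Denied { kind: ChannelKind, user_id: String },
}

/// Hypothetical config shape: allowed user IDs per configured channel.
pub struct ChannelsConfig {
    pub allowed_users: HashMap<ChannelKind, Vec<String>>,
}

fn normalize(kind: ChannelKind, id: &str) -> String {
    let id = id.trim();
    // Telegram usernames may be written with or without a leading '@'.
    let id = if kind == ChannelKind::Telegram { id.trim_start_matches('@') } else { id };
    id.to_lowercase()
}

pub fn resolve_channel_user(config: &ChannelsConfig, kind: ChannelKind, user_id: &str) -> Resolution {
    let Some(allowed) = config.allowed_users.get(&kind) else {
        return Resolution::ChannelNotConfigured(kind);
    };
    let wanted = normalize(kind, user_id);
    // An empty allowlist matches nobody (deny-all); "*" admits any non-empty ID.
    let permitted = !wanted.is_empty()
        && allowed.iter().any(|a| a == "*" || normalize(kind, a) == wanted);
    if permitted {
        // Namespacing by channel kind keeps Slack U12345 and Discord U12345 distinct.
        Resolution::Resolved(RequestPrincipal {
            actor_id: format!("channel:{}:{}", kind.as_str(), user_id.trim()),
        })
    } else {
        Resolution::Denied { kind, user_id: user_id.to_string() }
    }
}
```

Callers would treat both ChannelNotConfigured and Denied as hard rejects before any gate decision is dispatched.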

Chat-native automation drafts

  • Same-channel draft flow: Automation creation requests now create bounded channel drafts that ask one follow-up question at a time in the same chat context instead of forcing users into the experimental workflow planner.
  • Durable draft API: New channel-draft endpoints start or continue drafts, accept answers, return previews, confirm creation, cancel abandoned drafts, and expose pending drafts for diagnostics.
  • Consistent reply capture: When Tandem asks a draft question, the next eligible non-command message from the same sender in the same room, thread, DM, or session answers that draft; other users and other scopes are ignored.
  • Explicit confirmation before activation: Completed drafts return a plain-text preview and only create an active automation after the user replies with confirmation.
  • Channel-bounded automation metadata: Confirmed automations carry source platform, channel, sender, scope, output target, and channel-derived tool/security bounds so chat-created automations stay within the channel's configured permissions.
  • Control-panel guidance: Channel settings now describe next-reply capture, cancellation, confirmation, same-channel output defaults, and pending draft behavior without requiring a separate workflow-editor handoff.
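
Next-reply capture comes down to matching on sender and scope while skipping commands. A minimal sketch, assuming hypothetical ConversationScope, PendingDraft, and IncomingMessage types, that slash-prefixed messages count as commands, and a placeholder set of confirmation words (the actual accepted replies are not listed in these notes).

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ConversationScope {
    pub platform: String,
    pub channel_id: String,
    /// Thread, DM, or session key; None for the top level of a room.
    pub thread_id: Option<String>,
}

#[derive(Debug)]
pub struct PendingDraft {
    pub draft_id: String,
    pub sender_id: String,
    pub scope: ConversationScope,
    pub awaiting_answer: bool,
}

pub struct IncomingMessage<'a> {
    pub sender_id: &'a str,
    pub scope: &'a ConversationScope,
    pub text: &'a str,
}

/// Returns the draft this message answers, if any. Other senders, other
/// scopes, and commands never answer a draft.
pub fn draft_for_reply<'d>(drafts: &'d [PendingDraft], msg: &IncomingMessage) -> Option<&'d PendingDraft> {
    if msg.text.trim_start().starts_with('/') {
        return None;
    }
    drafts
        .iter()
        .find(|d| d.awaiting_answer && d.sender_id == msg.sender_id && &d.scope == msg.scope)
}

/// Hypothetical confirmation check used before activating an automation.
pub fn is_confirmation(text: &str) -> bool {
    matches!(text.trim().to_ascii_lowercase().as_str(), "yes" | "y" | "confirm")
}
```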

Engine authentication

  • Token auth by default: tandem-engine serve now loads an explicit token, reads TANDEM_API_TOKEN_FILE, or creates a shared engine API token automatically when no token exists.
  • Shared local credential: Desktop, TUI, control-panel, and direct CLI flows can rely on the same keychain-first/file-fallback credential instead of leaving direct engine starts tokenless.
  • Unsafe local opt-out: Advanced trusted-local testing can still disable token auth with --unsafe-no-api-token or TANDEM_UNSAFE_NO_API_TOKEN=1; this mode is not intended for 0.0.0.0, reverse-proxied, hosted, tunneled, or shared deployments.
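
The startup decision can be modeled as a pure function over flags and environment. This is a sketch only: TokenPlan and plan_engine_token are hypothetical, and letting the unsafe opt-out win over an explicit token is an assumed precedence; the environment variable names are the ones listed above.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq, Eq)]
pub enum TokenPlan {
    /// Token auth disabled via the unsafe trusted-local opt-out.
    Disabled,
    /// A token passed explicitly at startup.
    Explicit(String),
    /// Read the token from the file named by TANDEM_API_TOKEN_FILE.
    FromFile(PathBuf),
    /// No token anywhere: create the shared engine token.
    GenerateShared,
}

/// `env` is injected so the decision can be tested without touching the
/// process environment; pass `&|k| std::env::var(k).ok()` in real use.
pub fn plan_engine_token(
    unsafe_flag: bool,
    env: &dyn Fn(&str) -> Option<String>,
    explicit: Option<&str>,
) -> TokenPlan {
    let unsafe_env = env("TANDEM_UNSAFE_NO_API_TOKEN").as_deref() == Some("1");
    if unsafe_flag || unsafe_env {
        return TokenPlan::Disabled;
    }
    if let Some(t) = explicit.map(str::trim).filter(|t| !t.is_empty()) {
        return TokenPlan::Explicit(t.to_string());
    }
    if let Some(p) = env("TANDEM_API_TOKEN_FILE").filter(|p| !p.trim().is_empty()) {
        return TokenPlan::FromFile(PathBuf::from(p));
    }
    TokenPlan::GenerateShared
}
```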

Workflow planning visibility

  • Explicit planning mode: Planner sessions now persist workflow_planning state plus draft ID, source channel/platform, requesting actor, allowed and blocked tools, known and missing requirements, and validation state.
  • Clarification-first drafting: Missing workflow details now trigger focused follow-up questions about triggers, inputs, outputs, publish behavior, required tools, approval, and memory/files instead of a fake-ready draft.
  • Connector-heavy workflow prompts: Scheduled workflow prompts that mention MCPs or destinations such as Notion now route to workflow planning instead of being mistaken for integration setup.
  • No planner thread hijacking: Linked workflow planner sessions no longer capture ordinary informational chat like "what is ..." or "what do I do?", and planner-model setup pauses now explain the admin action instead of asking for an impossible answer.
  • Structured audit trail: Workflow planning now emits events for start, draft create/update, missing requirements, blocked capabilities, approval requests, validation, docs-MCP usage, and review readiness.
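
Clarification-first drafting can be pictured as asking about one missing requirement at a time until nothing is missing and validation passes. The Requirement ordering, question wording, and WorkflowPlanningState fields below are illustrative assumptions, not the persisted planner schema.

```rust
/// Declaration order doubles as question priority (derived Ord).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Requirement {
    Trigger,
    Inputs,
    Outputs,
    PublishBehavior,
    RequiredTools,
    Approval,
    MemoryFiles,
}

impl Requirement {
    fn question(self) -> &'static str {
        match self {
            Requirement::Trigger => "What should start this workflow (schedule, event, or manual run)?",
            Requirement::Inputs => "What inputs should it read?",
            Requirement::Outputs => "What should it produce?",
            Requirement::PublishBehavior => "Where should results be published?",
            Requirement::RequiredTools => "Which tools or MCPs does it need?",
            Requirement::Approval => "Which steps need human approval?",
            Requirement::MemoryFiles => "Should it use memory or files?",
        }
    }
}

#[derive(Debug)]
pub struct WorkflowPlanningState {
    pub draft_id: String,
    pub missing: Vec<Requirement>,
    pub validated: bool,
}

impl WorkflowPlanningState {
    /// One focused follow-up question, highest-priority gap first.
    pub fn next_question(&self) -> Option<&'static str> {
        self.missing.iter().min().map(|r| r.question())
    }

    /// A draft is only review-ready with no gaps and a passing validation.
    pub fn review_ready(&self) -> bool {
        self.missing.is_empty() && self.validated
    }
}
```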

Governance and handoff

  • Explicit blocked capabilities: Required tools or MCPs that are not allowed stay blocked, are surfaced in the draft, and route through the existing approval queue instead of being silently widened.
  • Ordinary automation creation stays chat-native: automation_create intents now use the channel draft flow, while workflow planner handling stays behind the existing experimental gate for explicit planner requests.
  • Workflow planner replies stay in chat: Channel workflow-planning responses now surface planner questions, draft summaries, validation state, and blocked capabilities in the chat thread, with the control-panel link kept for review/apply.
  • Draft-first external channels: Telegram, Discord, and Slack still return a compact preview plus the control-panel review link, and they do not directly activate workflows in V1.1.
  • Control-panel review details: The planner review surface now shows the original request, source platform/channel, actor, draft status, validation status, required capabilities, blocked capabilities, approval requirements, and the generated preview.
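
The "never silently widened" rule amounts to partitioning required tools against the channel allowlist and routing whatever is left over to the approval queue. A sketch assuming exact-name matching plus a "*" wildcard; CapabilityCheck and check_capabilities are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct CapabilityCheck {
    pub allowed: Vec<String>,
    pub blocked: Vec<String>,
}

impl CapabilityCheck {
    /// Blocked capabilities go to the approval queue; nothing is auto-granted.
    pub fn needs_approval(&self) -> bool {
        !self.blocked.is_empty()
    }
}

/// Splits required tools into permitted and blocked without modifying the
/// channel allowlist. "*" stands for default wildcard tool access.
pub fn check_capabilities(required: &[&str], allowlist: &[&str]) -> CapabilityCheck {
    let mut check = CapabilityCheck { allowed: Vec::new(), blocked: Vec::new() };
    for tool in required {
        if allowlist.iter().any(|a| *a == "*" || a == tool) {
            check.allowed.push((*tool).to_string());
        } else {
            check.blocked.push((*tool).to_string());
        }
    }
    check
}
```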

Demo readiness

  • Internal runbook: Added docs/internal/CHAT_WORKFLOW_PLANNER_DEMO.md with setup, happy-path, missing-details, blocked-capability, and troubleshooting steps.
  • KB-first channel grounding: Hosted and external knowledgebase MCPs can now be marked as grounding-required, and channel sessions that enable those KB MCP tools must inspect KB evidence before returning factual answers instead of relying on model memory.
  • Strict KB answer mode for channel bots: Channels can now enable strict_kb_grounding, which rewrites the final reply from retrieved KB excerpts only, fails closed with "I do not see that in the connected knowledgebase." when the KB has no supported answer, and adds short source footers when KB search results expose document paths.
  • Full-document strict KB evidence: Strict KB mode now follows search hits with full get_document retrieval when KB source identifiers are available, so answers are based on complete source documents instead of truncated search snippets.
  • Safe source receipts: Channel replies now show display-safe source labels such as Company Overview, Sponsor FAQ, Staff Roles And Contacts, and Discord Community Rules instead of local filesystem paths, storage keys, or internal document IDs.
  • Fail-closed snippet handling: If a likely KB document is found but full content cannot be fetched, strict mode now refuses to answer from partial snippets rather than filling gaps with general model knowledge.
  • Policy-only external action answers: Strict KB mode now answers external action requests from policy evidence only, so Discord moderation questions cannot fall back to generic UI/admin instructions unless those steps are actually in the KB.
  • Provider stream error repair: Strict KB turns now repair provider stream decode failures after KB evidence retrieval, retry final synthesis once without streaming, and otherwise return a channel-safe retry message instead of raw ENGINE_ERROR details.
  • Evidence-only final answers: Strict KB channel replies now render final answers from retrieved evidence sentences instead of model-authored helpful prose, preventing invented payout processes, staff-directory advice, escalation channels, and external-platform how-to steps from appearing in the demo bot.
  • Wildcard channel grounding: Strict KB channels with default wildcard tool access now still bind enabled knowledgebase MCPs into the required KB search policy, preventing Telegram/Discord channel bots from bypassing grounding when operators have not explicitly listed each KB MCP tool.
  • MCP-context KB routing: Channel messages with an explicitly selected MCP context now treat factual questions as strict KB turns even when the global channel strict-KB toggle is off, so Telegram DM knowledge bots do not silently fall back to generic model chat.
  • KB admin fail-closed check: The control panel now treats the KB upload/browse surface as available only after /api/knowledgebase/config verifies that the KB admin backend is reachable, preventing /collections and /documents 502s from firing when the admin service is down or misconfigured.
  • Nested KB document deletes: The control-panel KB proxy now forwards nested document slugs with a single encoded slash, so admin deletes work for documents stored below folders such as automation/....
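
Strict KB mode's fail-closed behavior can be sketched as: refuse with the fixed sentence when there is no evidence or when a hit's full document is unavailable; otherwise answer only from matching evidence sentences plus a source footer. KbHit, the keyword-based sentence selection, and the footer format are simplifying assumptions; the real pipeline retrieves through KB search and get_document rather than keyword matching.

```rust
/// Fixed fail-closed reply quoted in the notes above.
pub const KB_REFUSAL: &str = "I do not see that in the connected knowledgebase.";

/// Hypothetical search hit: a display-safe title plus the full document text,
/// or None when the full-document fetch failed.
pub struct KbHit {
    pub title: String,
    pub full_text: Option<String>,
}

/// Builds a reply only from evidence sentences; keyword matching stands in
/// for the real evidence-selection step.
pub fn strict_kb_reply(query_terms: &[&str], hits: &[KbHit]) -> String {
    if hits.is_empty() || hits.iter().any(|h| h.full_text.is_none()) {
        // No evidence, or only partial snippets available: refuse rather than guess.
        return KB_REFUSAL.to_string();
    }
    let terms: Vec<String> = query_terms.iter().map(|t| t.to_lowercase()).collect();
    let mut sentences = Vec::new();
    let mut sources: Vec<String> = Vec::new();
    for hit in hits {
        let doc = hit.full_text.as_deref().unwrap_or_default();
        for s in doc.split_terminator('.').map(str::trim).filter(|s| !s.is_empty()) {
            let lower = s.to_lowercase();
            if terms.iter().any(|t| lower.contains(t.as_str())) {
                sentences.push(format!("{}.", s));
                if !sources.contains(&hit.title) {
                    sources.push(hit.title.clone());
                }
            }
        }
    }
    if sentences.is_empty() {
        return KB_REFUSAL.to_string();
    }
    format!("{}\n\nSources: {}", sentences.join(" "), sources.join(", "))
}
```

Nothing model-authored reaches the reply in this shape: it is either the fixed refusal or verbatim evidence sentences with display-safe titles.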

Full Changelog: v0.4.39...v0.4.41