Release Tandem v0.5.5 · frumu-ai/tandem

See the assets below to download the installer for your platform.

v0.5.5 (2026-05-13)

This release lays down the Execution Profiles foundation — a runtime governance toggle (Strict / Guided / YOLO) that will let users keep working while validators and contracts continue to harden, without abandoning Tandem's runtime ownership of state, receipts, replay, spend tracking, and approvals. The motivation is operational: full governance still has a high run-fail rate as bugs are ironed out, and a meaningful share of those failures are over-strict (false-positive validation, missing-but-non-essential sections, recoverable artifact issues) rather than real defects. Execution Profiles are the structured bridge that lets affected runs continue with the relaxation captured in receipts, so the data we collect can drive validator classes back to Strict-by-default once they mature.

The v0.5.5 cut is backend telemetry-only. Strict, Guided, and YOLO runs all produce identical run outcomes today; the only difference is in receipts. This is intentional. The status-downgrade behavior change (where Guided actually warns instead of blocking, and YOLO actually continues as experimental) is gated on the next slice, which can calibrate against the validator-class telemetry collected here. No existing automation changes behavior in this release.

What ships now:

- Type foundation (automation_v2::execution_profile): ExecutionProfile enum (strict/guided/yolo), ValidatorClass taxonomy with is_relaxable_in(profile) and a conservative is_critical() allowlist for never-relaxable classes (auth, secret access, destructive-action approval, budget caps, kill switch, deterministic verifier failures). decide_profile_validation is the single chokepoint; augment_output_with_profile_relaxation is the executor-facing helper; classify_unmet_requirement maps existing validator strings to the taxonomy.
- Run record and API: AutomationExecutionPolicy.profile is now optional and persisted. Every AutomationV2RunRecord carries typed effective_execution_profile and requested_execution_profile. POST /automations/v2/{id}/run_now accepts an optional execution_profile override (Strict, Guided, or YOLO) that applies for the single run only without mutating the saved automation. resolve_effective_execution_profile enforces a deterministic precedence: run override → workflow policy → Strict.
- Lifecycle and event observability: record_automation_lifecycle_event_with_metadata automatically merges the run's effective_execution_profile into every AutomationLifecycleRecord so existing audit, replay, and Bug Monitor surfaces see the profile without per-call-site changes. The automation_v2.run.failed engine event now includes both effective_execution_profile and requested_execution_profile, so Bug Monitor and downstream observers can attribute failures to the active profile.
- Executor chokepoint (telemetry-only): The executor invokes augment_output_with_profile_relaxation at the single run-acceptance moment. When every unmet_requirement on a node output is relaxable under the active profile, it writes relaxed_validator_classes (structured), effective_outcome, original_validator_outcome, execution_profile, and experimental: true (YOLO) into the artifact_validation block. Strict runs are unchanged. Critical classes (destructive-action approval, budget cap, etc.) always block; if any classification is unknown, the augmentation conservatively skips so behavior stays Strict-equivalent.
- Tests: 24 unit tests covering serde round-trip, default-to-Strict, critical-class blocking, soft-class relaxation per profile, tenant-denylist enforcement, classifier mapping, augmentation purity, and lifecycle metadata merge semantics.
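
The precedence rule and the all-or-nothing relaxation check described above can be sketched roughly as follows. This is a minimal illustration, not Tandem's actual code: the function names echo the notes, but the signatures, the two soft validator classes, and the per-profile relaxation table are assumptions made for the example.

```rust
// Hypothetical sketch. Only the profile names, the critical classes, and the
// precedence order come from the release notes; everything else is assumed.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecutionProfile {
    Strict,
    Guided,
    Yolo,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ValidatorClass {
    // Illustrative soft classes (assumed names).
    MissingOptionalSection,
    RecoverableArtifactIssue,
    // Critical classes named in the notes.
    BudgetCap,
    DestructiveActionApproval,
}

impl ValidatorClass {
    /// Never-relaxable classes.
    fn is_critical(self) -> bool {
        matches!(
            self,
            ValidatorClass::BudgetCap | ValidatorClass::DestructiveActionApproval
        )
    }

    /// Assumed table: Strict relaxes nothing, Guided relaxes only missing
    /// optional sections, YOLO relaxes every non-critical class.
    fn is_relaxable_in(self, profile: ExecutionProfile) -> bool {
        if self.is_critical() {
            return false;
        }
        match (self, profile) {
            (_, ExecutionProfile::Strict) => false,
            (ValidatorClass::MissingOptionalSection, ExecutionProfile::Guided) => true,
            (_, ExecutionProfile::Guided) => false,
            (_, ExecutionProfile::Yolo) => true,
        }
    }
}

/// Precedence: run override, then workflow policy, then Strict.
fn resolve_effective_execution_profile(
    run_override: Option<ExecutionProfile>,
    workflow_policy: Option<ExecutionProfile>,
) -> ExecutionProfile {
    run_override
        .or(workflow_policy)
        .unwrap_or(ExecutionProfile::Strict)
}

/// Returns the classes to record as relaxed, or `None` when the outcome must
/// stay Strict-equivalent. A `None` entry in `unmet` models an unmet
/// requirement whose classification is unknown.
fn relaxed_classes(
    unmet: &[Option<ValidatorClass>],
    profile: ExecutionProfile,
) -> Option<Vec<ValidatorClass>> {
    if unmet.is_empty() {
        return None; // nothing failed, so there is nothing to relax
    }
    let mut relaxed = Vec::new();
    for class in unmet {
        match class {
            Some(c) if c.is_relaxable_in(profile) => relaxed.push(*c),
            _ => return None, // unknown or non-relaxable: skip augmentation
        }
    }
    Some(relaxed)
}

fn main() {
    let profile = resolve_effective_execution_profile(None, Some(ExecutionProfile::Guided));
    let unmet = [Some(ValidatorClass::MissingOptionalSection)];
    println!("{:?}: {:?}", profile, relaxed_classes(&unmet, profile));
}
```

Returning `None` for any unknown or critical class keeps the helper conservative: one unclassified requirement is enough to leave the run's receipts untouched.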

What is intentionally deferred to follow-up slices and tracked in docs/internal/execution-profiles/KANBAN.md:

- Phase 4b: status-downgrade behavior change so Guided actually warns and YOLO actually continues as experimental, gated on telemetry calibration.
- Phase 5: wiring the existing effective_repair_budget multiplier (1.0 / 1.5 / 2.0 by profile) into the repair-decision call sites.
- Phase 6: control-panel UI (profile selector, run pill, experimental badge).
- Phase 7: Tauri desktop UI (matching control panel).
- Experimental-input propagation rule for downstream nodes.
- Tenant-level relaxation denylist and default-profile administration.

This patch keeps automation-owned runtime sessions out of the user Chat session list without hiding their audit trail from the rest of Tandem.

Sessions now carry explicit source metadata. New interactive sessions default to sourceKind: chat, Automation V2/Bug Monitor worker sessions are classified as automation_v2, and session listing supports filtering by source. The TypeScript client and wire model expose the same fields so control-panel views can ask for the session class they actually need.

The Chat sidebar and Dashboard recent-session list now request only source=chat, so Bug Monitor submissions such as Automation automation-v2-bug-monitor-triage-failure-draft-... / inspect_failure_report no longer appear as conversations. Legacy automation records with the existing title format are classified at the storage/wire boundary, preserving backward compatibility for already-written sessions.
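
As a rough illustration of source filtering with legacy classification at the boundary, the sketch below filters a session list by source kind. The struct, the function names, and in particular the legacy title prefix are assumptions for the example; the notes only say that legacy records with the existing title format are classified at the storage/wire boundary.

```rust
// Hypothetical sketch of source-filtered session listing.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionSourceKind {
    Chat,
    AutomationV2,
}

struct SessionRecord {
    title: String,
    /// `None` models a legacy record written before source metadata existed.
    source_kind: Option<SessionSourceKind>,
}

// Assumption: legacy automation sessions are recognizable by this title prefix.
const LEGACY_AUTOMATION_TITLE_PREFIX: &str = "Automation automation-v2-";

/// Classifies a record, falling back to the title for legacy records.
fn effective_source_kind(session: &SessionRecord) -> SessionSourceKind {
    match session.source_kind {
        Some(kind) => kind,
        None if session.title.starts_with(LEGACY_AUTOMATION_TITLE_PREFIX) => {
            SessionSourceKind::AutomationV2
        }
        None => SessionSourceKind::Chat,
    }
}

/// Lists sessions, optionally restricted to one source kind.
fn list_sessions<'a>(
    all: &'a [SessionRecord],
    source: Option<SessionSourceKind>,
) -> Vec<&'a SessionRecord> {
    all.iter()
        .filter(|s| source.map_or(true, |kind| effective_source_kind(s) == kind))
        .collect()
}

fn main() {
    let sessions = vec![
        SessionRecord {
            title: "Plan the launch".to_string(),
            source_kind: Some(SessionSourceKind::Chat),
        },
        SessionRecord {
            title: "Automation automation-v2-example / inspect_failure_report".to_string(),
            source_kind: None,
        },
    ];
    let chat_only = list_sessions(&sessions, Some(SessionSourceKind::Chat));
    println!("chat sessions: {}", chat_only.len());
}
```

Passing no filter still returns every session, which matches the goal of keeping the automation audit trail visible to other parts of the product.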

The Tauri desktop Automation Calendar no longer crashes the app while loading. FullCalendar is now isolated into its own lazy bundle and imported only after the WebKit stylesheet host is ready, preventing the Cannot read properties of null (reading 'cssRules') startup failure seen when opening the calendar view.

Bug Monitor GitHub issue creation now uses a persisted pending idempotency claim before calling GitHub. Completion finalization, stale-provider recovery, deadline recovery, and status-sweep recovery can all wake up around the same draft, but only the first caller that claims the create-issue digest is allowed to create the GitHub issue. Concurrent callers now see publish_in_progress or reuse the posted record instead of producing duplicate issues with the same fingerprint and triage run.
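
The claim-before-create pattern can be sketched as below. This is an in-memory stand-in under stated assumptions: the real claim is persisted, and the type names, the `issue_ref` field, and the method names are hypothetical. Only the `publish_in_progress` outcome and the "first claimant wins" rule come from the notes.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical sketch: an in-memory map stands in for the persisted claim store.

#[derive(Debug, Clone)]
enum ClaimState {
    Pending,
    Posted { issue_ref: String },
}

#[derive(Debug, PartialEq, Eq)]
enum ClaimOutcome {
    /// This caller owns the create-issue call.
    Claimed,
    /// Another caller already holds the pending claim.
    PublishInProgress,
    /// The issue exists; reuse the posted record.
    AlreadyPosted { issue_ref: String },
}

#[derive(Default)]
struct ClaimStore {
    claims: Mutex<HashMap<String, ClaimState>>,
}

impl ClaimStore {
    /// Atomically claims the digest, or reports who got there first.
    fn try_claim(&self, digest: &str) -> ClaimOutcome {
        let mut claims = self.claims.lock().unwrap();
        let existing = claims.get(digest).cloned();
        match existing {
            Some(ClaimState::Pending) => ClaimOutcome::PublishInProgress,
            Some(ClaimState::Posted { issue_ref }) => ClaimOutcome::AlreadyPosted { issue_ref },
            None => {
                claims.insert(digest.to_string(), ClaimState::Pending);
                ClaimOutcome::Claimed
            }
        }
    }

    /// Records the created issue so later callers reuse it.
    fn mark_posted(&self, digest: &str, issue_ref: &str) {
        let mut claims = self.claims.lock().unwrap();
        claims.insert(
            digest.to_string(),
            ClaimState::Posted { issue_ref: issue_ref.to_string() },
        );
    }
}

fn main() {
    let store = ClaimStore::default();
    println!("{:?}", store.try_claim("digest-a")); // first caller
    println!("{:?}", store.try_claim("digest-a")); // concurrent recovery path
    store.mark_posted("digest-a", "issue-1");
    println!("{:?}", store.try_claim("digest-a")); // late caller
}
```

Because the check and the insert happen under one lock, two recovery paths that wake at the same moment cannot both observe an empty slot.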

Bug Monitor proposal quality gates also recognize the structured handoff shapes that triage nodes actually return, including wrapped objects such as { "bug_monitor_inspection": ... } and array responses containing the artifact followed by a compact status object. Placeholder task specs still fail the gate, but valid completed inspection, research, validation, and fix-proposal artifacts no longer get treated as missing and replaced with broad fallback evidence.

Bug Monitor triage status detection now treats nested status: blocked fields inside structured Bug Monitor handoffs as evidence/limitation data, not as the node's own runtime status. This prevents propose_fix_and_verification from recursively blocking the debugger when it has produced a useful partial fix proposal with acceptance criteria and bounded next steps.

Automation V2 long-running nodes now get to own their timeout path. The stale-run reaper honors the run-registry heartbeat that active node execution already emits every few seconds, so a first task with a 600-second budget is not globally paused as stale_no_provider_activity at the exact timeout boundary before the node can fail or repair normally.
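
One way to picture the heartbeat-aware staleness check is the sketch below. The struct, its field names, and the idea of comparing the freshest signal against a single threshold are assumptions for illustration; the notes only state that the reaper now honors the run-registry heartbeat.

```rust
use std::time::Duration;

// Hypothetical sketch of a staleness check that honors a registry heartbeat.

struct RunActivity {
    /// Time since the provider last produced any activity.
    since_provider_activity: Duration,
    /// Time since the run registry last saw a heartbeat, if any was recorded.
    since_registry_heartbeat: Option<Duration>,
}

/// A run is stale only when its freshest signal is older than the threshold.
fn is_stale(activity: &RunActivity, stale_after: Duration) -> bool {
    let freshest = match activity.since_registry_heartbeat {
        Some(heartbeat_age) => heartbeat_age.min(activity.since_provider_activity),
        None => activity.since_provider_activity,
    };
    freshest >= stale_after
}

fn main() {
    // A node at its 600-second boundary that is still emitting heartbeats.
    let at_boundary = RunActivity {
        since_provider_activity: Duration::from_secs(600),
        since_registry_heartbeat: Some(Duration::from_secs(3)),
    };
    println!("stale: {}", is_stale(&at_boundary, Duration::from_secs(600)));
}
```

With the heartbeat counted, the node's own timeout handling gets to fire first, and the reaper only steps in once both signals have gone quiet.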

Automation V2 research validation now preserves source URLs from successful websearch and webfetch tool results. If a generated JSON artifact is too sparse and omits raw links, the validator can still see the current web evidence that was actually gathered instead of blocking the node as citations_missing. The prompt and repair guidance also now explicitly tell research agents to include raw URLs in citations or web_sources_reviewed fields.

Connector-backed source research now has to use the selected connector, not merely discover it. A node that is told to use Reddit MCP and resolves to reddit-gmail can no longer complete after only an mcp_list call plus a JSON write; it must call a concrete source tool such as mcp.reddit_gmail.reddit_search_across_subreddits or mcp.reddit_gmail.reddit_retrieve_reddit_post, preserving real returned evidence or an actual connector/tool limitation.

The prompt and tool surface now reinforce that rule before validation has to catch it. Connector source prompts list concrete mcp.* tools and state that mcp_list, glob, grep, edit, and apply_patch are not source evidence, while non-code connector source nodes no longer offer edit/patch/bash tools that can distract agents from calling the connector.

Connector-backed delivery nodes now keep their destination MCP tools focused all the way through artifact creation. Notion save/report nodes with explicit mcp.notion.* tool allowlists no longer inherit generic workspace read/glob or mutation tools from upstream input refs, but they still retain the required write tool for the run artifact receipt. The engine loop also narrows prewrite MCP gating to the specific concrete connector tools that have not yet run, steering a Notion publisher from notion_fetch to notion_create_pages instead of letting it loop on already-completed discovery or local inspection.

Required-tool provider calls now fail closed inside Tandem instead of being rejected by the provider when routing filters remove every tool. Write-required connector nodes keep the artifact write tool even when their session allowlist is connector-only, and if a later filter still produces an empty tool set Tandem downgrades the provider request away from tool_choice: required rather than sending an invalid no-tools request.
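
The fail-closed behavior might look roughly like this. The types, the function name, the `write` tool name, and the choice of `Auto` as the downgrade target are assumptions; the notes say only that the request is downgraded away from tool_choice: required when no tools survive filtering.

```rust
// Hypothetical sketch of tool filtering with a fail-closed tool_choice downgrade.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolChoice {
    Auto,
    Required,
}

#[derive(Debug, PartialEq, Eq)]
struct ProviderRequest {
    tools: Vec<String>,
    tool_choice: ToolChoice,
}

/// Filters offered tools by the session allowlist, always keeping the required
/// artifact write tool, and never emits `Required` with an empty tool set.
fn build_request(
    offered: &[&str],
    session_allowlist: &[&str],
    required_write_tool: Option<&str>,
    requested_choice: ToolChoice,
) -> ProviderRequest {
    let tools: Vec<String> = offered
        .iter()
        .copied()
        .filter(|tool| session_allowlist.contains(tool) || required_write_tool == Some(*tool))
        .map(String::from)
        .collect();
    // Assumption: the downgrade target is `Auto`.
    let tool_choice = if tools.is_empty() {
        ToolChoice::Auto
    } else {
        requested_choice
    };
    ProviderRequest { tools, tool_choice }
}

fn main() {
    let request = build_request(
        &["write", "mcp.notion.notion_create_pages", "bash"],
        &["mcp.notion.notion_create_pages"],
        Some("write"),
        ToolChoice::Required,
    );
    println!("{:?}", request);
}
```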

Transient provider stream decode failures are now treated as recoverable provider infrastructure failures. Stream errors such as error decoding response body, unexpected EOF, and incomplete streamed responses are retried inside the current provider iteration with partial streamed text/tool-call state cleared before retry. The retry budget is bounded by TANDEM_PROVIDER_STREAM_DECODE_RETRY_ATTEMPTS, and each retry emits a provider.call.iteration.retry event for debugging.
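
A bounded retry loop of this kind could be sketched as follows. The error type, the function names, and the default budget of 2 are assumptions; the environment variable name and the event name are taken from the notes. Each attempt starts from a fresh buffer, which models clearing partial streamed state before the retry.

```rust
use std::env;

// Hypothetical sketch of bounded retry for transient stream decode failures.

#[derive(Debug, PartialEq)]
enum StreamError {
    /// Transient decode failure such as an unexpected EOF.
    Decode(String),
    /// Anything else is not retried here.
    Other(String),
}

/// Reads the retry budget from the environment. The fallback of 2 is an
/// assumption; the notes do not state the default.
fn retry_budget() -> u32 {
    env::var("TANDEM_PROVIDER_STREAM_DECODE_RETRY_ATTEMPTS")
        .ok()
        .and_then(|value| value.parse().ok())
        .unwrap_or(2)
}

/// Runs one provider iteration, retrying decode failures up to `max_retries`.
fn run_iteration<F>(mut attempt_stream: F, max_retries: u32) -> Result<String, StreamError>
where
    F: FnMut(&mut String) -> Result<(), StreamError>,
{
    let mut retries = 0;
    loop {
        // Fresh buffer per attempt: partial text from a failed stream is dropped.
        let mut streamed = String::new();
        match attempt_stream(&mut streamed) {
            Ok(()) => return Ok(streamed),
            Err(StreamError::Decode(reason)) if retries < max_retries => {
                retries += 1;
                eprintln!(
                    "provider.call.iteration.retry attempt={} reason={}",
                    retries, reason
                );
            }
            Err(other) => return Err(other),
        }
    }
}

fn main() {
    let mut calls = 0;
    let result = run_iteration(
        |buffer: &mut String| {
            calls += 1;
            buffer.push_str("hello");
            if calls == 1 {
                Err(StreamError::Decode("unexpected EOF".to_string()))
            } else {
                Ok(())
            }
        },
        retry_budget(),
    );
    println!("{:?}", result);
}
```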

Automation V2 governance now gives repair attempts a calmer, more actionable handoff. Attempt verdicts include a calm_teammate_v1 review with a progress score, what the agent completed correctly, what is still needed, why the missing work matters, and the next concrete moves. Repair prompts show that review before the raw expected/observed contract JSON, so retries can keep good evidence and fix the smallest missing piece rather than restarting from a vague validation failure.

Bug Monitor failure reports now preserve both the final failure and the useful prior attempt evidence. Automation V2 failure events carry recent attempt verdict chains and attempt review chains into Bug Monitor submissions, making issue details show earlier contract misses such as missing workspace files, missing concrete connector calls, citation gaps, or required next actions even when the final observed failure is a provider stream/runtime error.

Stale provider/session recovery now retries by default instead of stopping at a pause. When the stale reaper cancels a dead session, the in-progress node is marked needs_repair and the stale-reaped run is automatically requeued while attempt budget remains. The existing auto-resume cap keeps truly wedged providers from looping forever, and operators can disable this behavior with TANDEM_DISABLE_STALE_AUTO_RESUME.
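
The requeue decision described above reduces to three conditions, sketched below. All names are hypothetical, and the disable flag is passed in as a boolean because the notes do not say how TANDEM_DISABLE_STALE_AUTO_RESUME is parsed.

```rust
// Hypothetical sketch of the decision taken after the stale reaper cancels a session.

#[derive(Debug, PartialEq, Eq)]
enum StaleReapAction {
    /// Mark the node needs_repair and requeue the run.
    Requeue,
    /// Leave the run paused for an operator.
    Pause,
}

struct StaleReapContext {
    attempts_used: u32,
    attempt_budget: u32,
    auto_resumes_so_far: u32,
    auto_resume_cap: u32,
    /// Models the operator opt-out (TANDEM_DISABLE_STALE_AUTO_RESUME).
    auto_resume_disabled: bool,
}

fn decide_after_stale_reap(ctx: &StaleReapContext) -> StaleReapAction {
    let budget_remains = ctx.attempts_used < ctx.attempt_budget;
    let under_cap = ctx.auto_resumes_so_far < ctx.auto_resume_cap;
    if !ctx.auto_resume_disabled && budget_remains && under_cap {
        StaleReapAction::Requeue
    } else {
        StaleReapAction::Pause
    }
}

fn main() {
    let ctx = StaleReapContext {
        attempts_used: 1,
        attempt_budget: 3,
        auto_resumes_so_far: 0,
        auto_resume_cap: 2,
        auto_resume_disabled: false,
    };
    println!("{:?}", decide_after_stale_reap(&ctx));
}
```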

The control panel also avoids presenting active workflow sessions as stalled. A running Automation V2 run with active sessions stays visually running, and background-tab polling gaps are shown as a softer “waiting on active session” detail. The backend stale reaper remains the authority for real stale_no_provider_activity pauses.

The control-panel Chat view now waits for the completed assistant message to materialize in the exact active session before clearing the live thinking/streaming state. This closes the blank-response gap where an answer was saved on the server and appeared after refresh, but the live UI had already removed Thinking... without rendering the final assistant message.

Hosted Files now distinguishes workspace-root configuration from workspace-files API availability. The Files page only enables workspace browsing when capabilities explicitly advertise the API route, so managed-file deployments no longer spam /api/workspace/files/list?dir= 404s.

Chat also preflights active-run cleanup before sending a new prompt. If a stale session run is still registered, the UI cancels and waits for idle before posting prompt_async, with the 409 conflict payload still used as a fallback if a race appears between the preflight and send.

The Coder board now matches ACA's updated GitHub Project intake rules for launchable work. Todo and TODOS lanes are recognized as runnable in the control panel, and planned GitHub tasks are moved into the detected launch lane rather than assuming the project has a Ready status. This fixes projects where the coding agent should accept cards from TODOS but the board UI left them looking unlaunchable or published new tasks into the wrong lane.

Workflow tasks now have first-class per-node tool access. Automation V2 nodes can carry their own tool_policy and mcp_policy, and the runtime treats those policies as a hard session scope rather than a hint layered on top of broader workflow access. This is especially important for approval-gated Gmail draft workflows: the compose and draft-create steps can be scoped away from send tools, while the post-approval step can be scoped to the concrete send-draft MCP tool that should run only after approval.

The control panel exposes this in both Workflow Studio and the existing automation edit dialog. Each node has a default-collapsed Task tool access panel with clear inherit/custom markers, MCP server/tool selectors, and a send-capable marker so operators can quickly spot which task is allowed to send. Saving a workflow preserves node-level built-in tool allowlists/denylists plus exact MCP server/tool choices.

The runtime also understands node MCP policy when computing concrete MCP allowlists and connector discovery behavior. Explicit node policies, including empty custom policies, are treated as intentional constraints. A regression test covers the Gmail approval case by allowing mcp.reddit_gmail.gmail_send_draft on the post-approval node while filtering out gmail_create_email_draft and gmail_send_email.
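
The difference between an inherited policy and an explicit (possibly empty) custom policy can be sketched like this. The enum and function are hypothetical, and the example assumes all three Gmail tools share the mcp.reddit_gmail. prefix, which the notes show only for gmail_send_draft.

```rust
// Hypothetical sketch of node-level MCP policy as a hard scope.

#[derive(Debug, Clone)]
enum NodeMcpPolicy {
    /// No node policy: use whatever the workflow grants.
    Inherit,
    /// Explicit exact-tool allowlist. An empty list means no MCP tools at all.
    Custom(Vec<String>),
}

/// Computes the concrete MCP tools a node session may see.
fn concrete_mcp_allowlist(workflow_tools: &[&str], policy: &NodeMcpPolicy) -> Vec<String> {
    match policy {
        NodeMcpPolicy::Inherit => workflow_tools.iter().copied().map(String::from).collect(),
        NodeMcpPolicy::Custom(allowed) => workflow_tools
            .iter()
            .copied()
            .filter(|tool| allowed.iter().any(|a| a == tool))
            .map(String::from)
            .collect(),
    }
}

fn main() {
    let workflow_tools = [
        "mcp.reddit_gmail.gmail_create_email_draft",
        "mcp.reddit_gmail.gmail_send_draft",
        "mcp.reddit_gmail.gmail_send_email",
    ];
    // Post-approval node: only the send-draft tool is in scope.
    let post_approval =
        NodeMcpPolicy::Custom(vec!["mcp.reddit_gmail.gmail_send_draft".to_string()]);
    println!("{:?}", concrete_mcp_allowlist(&workflow_tools, &post_approval));
}
```

Modeling "inherit" and "custom but empty" as distinct variants is what lets an empty custom policy act as an intentional constraint instead of silently falling back to workflow-wide access.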

Channel-level MCP server toggles are now enforced as a hard runtime boundary. If an MCP server is disabled for a channel or conversation scope, agents do not receive tools from that connection, even when stale exact-tool preferences or a route-level allowlist still mention those tools. Exact MCP tool selections only apply while their owning server is enabled; selecting exact tools now narrows access rather than layering on top of a server wildcard. Channel defaults also avoid a broad * tool allowlist so MCP access must be explicitly granted.
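
A compact way to express the channel boundary is the sketch below. The settings struct and method names are hypothetical, and treating "enabled with no exact selection" as whole-server access is an assumption. The server toggle is checked first, so stale exact-tool preferences for a disabled server never grant access.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical sketch of channel-level MCP enforcement.

#[derive(Default)]
struct ChannelMcpSettings {
    enabled_servers: HashSet<String>,
    /// Exact tool selections keyed by owning server.
    exact_tools: HashMap<String, HashSet<String>>,
}

impl ChannelMcpSettings {
    fn enable_server(&mut self, server: &str) {
        self.enabled_servers.insert(server.to_string());
    }

    fn select_exact_tool(&mut self, server: &str, tool: &str) {
        self.exact_tools
            .entry(server.to_string())
            .or_default()
            .insert(tool.to_string());
    }

    /// A tool is visible only if its server is enabled, and, when exact tools
    /// are selected for that server, only if it is one of them.
    fn tool_allowed(&self, server: &str, tool: &str) -> bool {
        if !self.enabled_servers.contains(server) {
            return false; // hard boundary: disabled server grants nothing
        }
        match self.exact_tools.get(server) {
            Some(exact) if !exact.is_empty() => exact.contains(tool),
            _ => true, // assumption: no exact selection means the whole server
        }
    }
}

fn main() {
    let mut settings = ChannelMcpSettings::default();
    settings.enable_server("gmail");
    settings.select_exact_tool("gmail", "gmail_send_draft");
    // Stale preference for a server that is not enabled.
    settings.select_exact_tool("notion", "notion_fetch");
    println!("gmail send_draft: {}", settings.tool_allowed("gmail", "gmail_send_draft"));
    println!("notion fetch: {}", settings.tool_allowed("notion", "notion_fetch"));
}
```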

The channel settings UI now mirrors that model. Disabling an MCP server clears exact-tool selections for that server on save, exact-tool pickers are visibly inactive until the server is enabled, and the summary counts only active exact MCP tools. Telegram, Discord, and Slack settings also expose a Strict KB grounding toggle so operators can intentionally opt a channel into factual-question KB grounding without confusing that behavior with MCP tool access.

Full Changelog: v0.5.4...v0.5.5
