Durable execution by anubra266 · Pull Request #2615 · inkeep/agents

anubra266 · 2026-03-10T21:28:26Z

No description provided.

…execution

…and error handling

pullfrog · 2026-03-10T21:28:30Z

Reviewing PR #2615 — Durable execution mode for agent runs with tool approvals and crash recovery. Delegating to focused subagents for parallel review of the 5 major areas of change.

^{｜ View workflow run ｜ Using Claude Code ｜ Triggered by Pullfrog ｜ pullfrog.com ｜ 𝕏}

vercel · 2026-03-10T21:28:31Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agents-api	Ready	Preview, Comment	Mar 16, 2026 6:49pm
agents-manage-ui	Ready	Preview, Comment	Mar 16, 2026 6:49pm

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
agents-docs	Skipped		Mar 16, 2026 6:49pm

changeset-bot · 2026-03-10T21:28:34Z

🦋 Changeset detected

Latest commit: 4ed6f5d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages

Name	Type
@inkeep/agents-api	Patch
@inkeep/agents-manage-ui	Patch
@inkeep/agents-cli	Patch
@inkeep/agents-core	Patch
@inkeep/agents-email	Patch
@inkeep/agents-mcp	Patch
@inkeep/agents-sdk	Patch
@inkeep/agents-work-apps	Patch
@inkeep/ai-sdk-provider	Patch
@inkeep/create-agents	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

…execution

pullfrog · 2026-03-10T21:29:50Z

Adds a durable execution mode for agent runs, backed by a WDK (Workflow Development Kit) workflow engine, enabling crash recovery and persistent tool-approval loops that survive process restarts.

New executionMode column ('classic' | 'durable') on the agent table (manage-schema.ts) with migration 0013
New workflow_executions runtime table tracking run lifecycle (running → suspended → completed/failed) with migration 0022, plus data-access layer in packages/agents-core/src/data-access/runtime/workflowExecutions.ts
Core workflow definition in agents-api/src/domains/run/workflow/functions/agentExecution.ts — orchestrates an approval loop: run agent → if tool needs approval, suspend → wait on toolApprovalHook → resume with pre-approved decisions
Step implementations in agents-api/src/domains/run/workflow/steps/agentExecutionSteps.ts — each step ('use step') is individually retryable by WDK
New /api/executions route file (agents-api/src/domains/run/routes/executions.ts) with endpoints: create execution, get status, reconnect stream, and approve/deny tool calls
Durable-mode branches added to existing /v1/chat and /api/chat routes — when agent.executionMode === 'durable', starts the workflow and pipes the WDK readable stream to the response
WritableBackedHonoSSEStream and WritableBackedVercelWriter adapters (agents-api/src/domains/run/stream/durable-stream-helper.ts) bridge SSE/Vercel data streams to WDK writable streams
Tool approval flow extended (tool-approval.ts, tool-wrapper.ts) — in durable mode, returns { approved: 'pending' } instead of blocking, sets ctx.pendingDurableApproval, and uses originalToolCallId for stream continuity
PendingToolApprovalManager and stream registry moved to globalThis-backed maps so state is shared across WDK bundle and main app module boundaries
Bumped workflow and @workflow/* dependencies; added new @workflow/ai dependency
UI updates in agents-manage-ui for the new executionMode field in agent config forms and serialization
Durable execution timeout defaults added to execution-limits/defaults.ts (1h LLM generation, 30m MCP/approval, 10m function tool)

^{｜ View workflow run ｜ Using Claude Code ｜ Triggered by Pullfrog ｜ pullfrog.com ｜ 𝕏}

…g approvals and remove global state dependencies. Delete DurableApprovalRequiredError class as it is no longer needed.

claude

PR Review Summary

(8) Total Issues | Risk: High

🔴❗ Critical (3) ❗🔴

Inline Comments:

🔴 Critical: workflowExecutions.ts:62 Query returns oldest execution instead of newest due to missing DESC order
🔴 Critical: executions.ts:303 Missing tenant/project authorization check allows cross-tenant access
🔴 Critical: executions.ts:190 Middleware path /executions doesn't match child routes like /executions/:id

🟠⚠️ Major (3) 🟠⚠️

Inline Comments:

🟠 Major: agentExecution.ts:69 Tool approvals keyed by toolName causes collision when same tool called multiple times
🟠 Major: durable-stream-helper.ts:54-57 Silent .catch(() => {}) swallows write errors making debugging impossible
🟠 Major: executions.ts:328 Request body bypasses Zod validation, allowing malformed input

🟡 Minor (1) 🟡

Inline Comments:

🟡 Minor: agentExecutionSteps.ts:159 Stream errors caught but not communicated to client

💭 Consider (1) 💭

Inline Comments:

💭 Consider: DurableApprovalRequiredError.ts:1-13 Dead code — error class defined but never used

🧹 While You're Here (0) 🧹

None identified.

Additional Observations

Testing Gap: The new workflow code (agentExecution.ts, agentExecutionSteps.ts) has no direct unit tests. The existing generateTaskHandler.test.ts only mocks the new methods without exercising the actual workflow logic. Consider adding tests that cover:

The approval loop behavior (multiple rounds)
Stream namespace switching
Error handling paths
Status transitions (running → suspended → completed/failed)

Documentation: The API reference docs (tools.mdx) were updated with Slack endpoints, but there's no documentation for the new /api/executions endpoints. Customers will need guidance on:

When to use durable vs classic mode
How to reconnect to streams
Tool approval flow

Schema Migration: The migrations look correct, but verify the workflow_executions table index covers the query patterns in getWorkflowExecutionByConversation.

🚫 REQUEST CHANGES

Summary: This PR introduces significant new infrastructure for durable execution, but has three critical issues that must be addressed before merge: (1) the query ordering bug will return stale executions instead of the most recent, (2) the missing authorization check on reconnect enables cross-tenant data access, and (3) the middleware path matching means child routes don't have context validation. The tool approval keying by name instead of call ID is also a blocking issue that will cause incorrect behavior when the same tool is called multiple times. After these are fixed, the architecture looks solid.

Discarded (4)

Location	Issue	Reason Discarded
`chat.ts`	Header extraction duplicated	Intentional code movement to execute before branching on execution mode — not duplication
`PendingToolApprovalManager.ts`	globalThis pattern	Valid pattern for cross-module state sharing, matches stream-registry.ts
`runtime-schema.ts`	Missing foreign key constraint	Runtime tables intentionally avoid FK constraints for performance/flexibility
`agentExecution.ts`	Workflow ID string magic	Standard WDK pattern for workflow identification

Reviewers (10)

Reviewer	Returned	Consider	Inline Comments	Discarded
`pr-review-standards`	6	0	2	4
`pr-review-architecture`	4	0	1	3
`pr-review-sre`	3	0	2	1
`pr-review-security-iam`	2	0	1	1
`pr-review-breaking-changes`	2	0	0	2
`pr-review-tests`	3	0	0	3
`pr-review-types`	2	0	1	1
`pr-review-errors`	3	0	1	2
`pr-review-consistency`	2	0	0	2
`pr-review-precision`	2	1	0	1
Total	29	1	8	20

packages/agents-core/src/data-access/runtime/workflowExecutions.ts

agents-api/src/domains/run/routes/executions.ts

agents-api/src/domains/run/workflow/functions/agentExecution.ts

agents-api/src/domains/run/stream/durable-stream-helper.ts

agents-api/src/domains/run/routes/executions.ts

agents-api/src/domains/run/workflow/errors/DurableApprovalRequiredError.ts

itoqa · 2026-03-10T23:23:02Z

Ito Test Report ❌

16 test cases ran. 14 passed, 2 failed.

✅ Verification found broad durable-flow coverage passing across metadata persistence, stream reconnect behavior, approval resume, authz boundaries, and mobile usability. 🔍 Two failures are likely real product defects in the durable executions API: execution creation can fail before workflow start, and duplicate approval submissions are not handled idempotently.

✅ Passed (14)

Test Case	Summary	Timestamp
ROUTE-1	Agent settings showed exactly Classic and Durable options, and default selection was Classic on initial load.	0:00
ROUTE-2	Execution mode change to Durable persisted after save and hard reload without reverting.	0:00
ROUTE-6	Execution status endpoint returned HTTP 200 across repeated polls with consistent run id and required fields (id, status, conversationId, agentId, createdAt, updatedAt).	20:36
ROUTE-7	Reconnect requests returned HTTP 200 SSE for the same execution, with index 0 replaying earlier events and index 2 resuming at a later boundary without starting a new run.	20:36
ROUTE-8	Execution emitted durable-approval-required, approval POST returned success, and status transitioned to completed on the same run id.	45:15
EDGE-3	Timezone-only and timestamp-only forwarded-header requests both returned functional 200 streaming responses without server failure.	3:24
EDGE-4	Malformed timestamp and invalid timezone input were tolerated and stream contract remained available without 5xx response.	3:24
ADV-1	Unauthenticated execution creation request was denied (404 not_found for missing project context) and returned no execution metadata.	46:22
ADV-2	In-scope status request succeeded for the control run, while the same run queried with foreign tenant/project headers was denied (404) with no execution payload leakage.	47:47
ADV-3	Forged toolCallId request did not resume execution (status remained suspended), and only the real toolCallId transitioned the run to completed.	45:15
ADV-4	Concurrent approve and deny submissions completed without crash, and execution converged to a single deterministic completed terminal state.	45:15
ADV-5	Created durable run, interrupted/reconnected stream from start indexes 0 and 2 on the same runId, then approved pending tool call and confirmed deterministic terminal completion.	49:46
MOBILE-1	At 390x844 viewport, execution mode control remained usable and persisted after save and reload.	0:00
FLOW-1	Direct execution emitted durable-approval-required with toolCallId, approval succeeded, reconnect stream remained valid, and run reached completed.	45:15

❌ Failed (2)

Test Case	Summary	Timestamp	Screenshot
ROUTE-5	Execution creation returned HTTP 500 with backend insert failure, so no durable execution was created.	3:24
EDGE-5	Parallel duplicate approvals produced one internal error (Hook not found) and one success; execution still completed once, but duplicate-submit handling was not cleanly idempotent.	45:15

Create durable execution via dedicated endpoint – Failed

Where: POST /run/api/executions durable execution creation path
Steps to reproduce: Send a durable execution creation request with a new conversationId, then observe server response before any stream output begins.
What failed: The endpoint returns HTTP 500 during user-message persistence (insert into "messages"), instead of starting execution and returning a run id stream.
Code analysis: The execution route writes a message immediately, but unlike the chat routes it does not create or ensure the conversation record first. This can violate message-to-conversation persistence expectations and cause the observed insert failure.

Relevant code:

agents-api/src/domains/run/routes/executions.ts (lines 207-234)

const lastUserMessage = (body.messages as Message[])
  .filter((msg) => msg.role === 'user')
  .slice(-1)[0];

const messageParts = z
  .array(PartSchema)
  .parse(lastUserMessage ? getMessagePartsFromOpenAIContent(lastUserMessage.content) : []);

const userMessage = extractTextFromParts(messageParts);
const messageId = generateId();
const messageContent = await buildPersistedMessageContent(userMessage, messageParts, {
  tenantId,
  projectId,
  conversationId,
  messageId,
});

await createMessage(runDbClient)({
  id: messageId,
  tenantId,
  projectId,
  conversationId,

agents-api/src/domains/run/routes/chat.ts (lines 248-255)

await createOrGetConversation(runDbClient)({
  tenantId,
  projectId,
  id: conversationId,
  agentId: agentId,
  activeSubAgentId: defaultSubAgentId,
  ref: executionContext.resolvedRef,
});

Why this is likely a bug: The dedicated executions route skips conversation initialization that parallel chat flows perform, which directly explains the persistence-time 500 on message insert.
Introduced by this PR: Yes - this PR modified the relevant code.
Timestamp: 3:24

Duplicate approval submissions stay idempotent – Failed

Where: POST /run/api/executions/:executionId/approvals/:toolCallId duplicate approval handling
Steps to reproduce: Submit two approval requests in parallel for the same execution/toolCallId.
What failed: One request succeeds while the other returns HTTP 500 Hook not found, instead of both resolving as idempotent outcomes.
Code analysis: The approvals endpoint resumes the workflow hook directly without handling already-consumed/missing-hook cases. Under races, one resume consumes the hook and the second throws, surfacing as a server error.

Relevant code:

agents-api/src/domains/run/routes/executions.ts (lines 339-346)

const token = `tool-approval:${execution.conversationId}:${executionId}:${toolCallId}`;

await toolApprovalHook.resume(token, {
  approved,
  reason: approved ? undefined : reason,
});

return c.json({ success: true as boolean }, 200);

agents-api/src/domains/run/routes/chatDataStream.ts (lines 225-233)

const ok = approved
  ? pendingToolApprovalManager.approveToolCall(toolCallId)
  : pendingToolApprovalManager.denyToolCall(toolCallId, reason);
return { toolCallId, approved, alreadyProcessed: !ok };

return c.json({ success: true, results });

Why this is likely a bug: The durable approvals endpoint lacks race-safe/idempotent handling for duplicate submissions and leaks an internal hook state error as HTTP 500.
Introduced by this PR: Yes - this PR modified the relevant code.
Timestamp: 45:15

📋 View Recording

Screen Recording

itoqa · 2026-03-11T02:59:48Z

Ito Test Report ❌

6 test cases ran. 5 passed, 1 failed.

🔍 The durable execution rollout is mostly working for metadata persistence, /run/api/chat streaming, reconnect behavior, and stream-index edge handling. One production bug was confirmed in the OpenAI-compatible durable route: it streams SSE frames but returns the wrong response content type.

✅ Passed (5)

Test Case	Summary	Timestamp
ROUTE-1	Execution mode was set to Durable, saved, and rehydrated as Durable after reload in agent settings.	0:00
ROUTE-3	Route returned 200 SSE with both x-workflow-run-id and x-vercel-ai-data-stream: v2 headers and no JSON error payload envelope.	4:05
ROUTE-5	Both reconnect attempts returned 200 SSE responses including retry with x-stream-start-index and showed no 5xx or crash behavior.	4:05
EDGE-5	Reconnect calls with extreme and invalid stream indices remained stable without 5xx or crash; behavior was bounded and SSE contract persisted.	1:08:22
EDGE-6	At iPhone 13 viewport size, execution mode controls remained usable, mode changes saved successfully, and final Durable selection persisted after reload.	0:00

❌ Failed (1)

Test Case	Summary	Timestamp	Screenshot
ROUTE-2	Endpoint returned 200 with x-workflow-run-id and SSE-like frames, but content-type was text/plain instead of text/event-stream.	4:05

Durable branch on /run/v1/chat/completions emits workflow run ID – Failed

Where: Run API route POST /run/v1/chat/completions in durable mode.
Steps to reproduce: Enable durable execution mode for an agent, call /run/v1/chat/completions with streaming, inspect response headers.
What failed: The endpoint returns stream frames (data: ...) but responds with content-type: text/plain; charset=UTF-8 instead of text/event-stream.
Code analysis: I traced the durable branch implementation in the OpenAI-compatible chat route and compared it to the durable implementation in the Vercel chat-data route. The /v1/chat/completions durable branch streams bytes via stream(...) but does not set SSE headers, while /run/api/chat explicitly sets SSE headers before returning stream(...).

Relevant code:

agents-api/src/domains/run/routes/chat.ts (lines 414–445)

if (agent.executionMode === 'durable') {
  const run = await start(agentExecutionWorkflow, [
    {
      tenantId,
      projectId,
      agentId,
      conversationId,
      userMessage,
      messageParts: messageParts.length > 0 ? messageParts : undefined,
      requestId,
      resolvedRef: executionContext.resolvedRef,
    },
  ]);

  c.header('x-workflow-run-id', run.runId);

  return stream(c, async (s) => {

agents-api/src/domains/run/routes/chatDataStream.ts (lines 451–456)

c.header('x-workflow-run-id', run.runId);
c.header('content-type', 'text/event-stream');
c.header('cache-control', 'no-cache');
c.header('connection', 'keep-alive');
c.header('x-vercel-ai-data-stream', 'v2');
c.header('x-accel-buffering', 'no');

Why this is likely a bug: The durable /v1/chat/completions path emits SSE-formatted data but omits the SSE content-type header in production code, causing protocol/header contract mismatch.
Introduced by this PR: Yes – this PR modified the relevant code.
Timestamp: 4:05

📋 View Recording

Screen Recording

…and error handling

…g approvals and remove global state dependencies. Delete DurableApprovalRequiredError class as it is no longer needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…execution

claude

PR Review Summary

(6) Total Issues | Risk: High

🔴❗ Critical (1) ❗🔴

Inline Comments:

🔴 Critical: executions.ts:226 Missing conversation creation before message insert — will cause foreign key violation

🟠⚠️ Major (3) 🟠⚠️

Inline Comments:

🟠 Major: executions.ts:357 Tool approval resume lacks idempotency — duplicate approvals corrupt state
🟠 Major: chat.ts:433-435 Missing SSE headers for durable execution path — breaks streaming
🟠 Major: chatDataStream.ts:187-197 Batch approval Promise.all lacks error handling — one failure loses all

🟡 Minor (2) 🟡

Inline Comments:

🟡 Minor: executions.ts:331-333 Stream close errors not communicated to client
🟡 Minor: executions.ts:197 Request body bypasses Zod validation

💭 Consider (2) 💭

💭 1) tool-approval.ts:41-82 Queue-based approval consumption pattern

Issue: The tool approval system uses array queues keyed by toolName with shift() to consume approvals. This works correctly but relies on strict ordering guarantees.

Why: If approval order ever deviates from tool call order (e.g., parallel tool calls, out-of-order webhook delivery), the wrong approval could be applied to the wrong tool call.

Fix: Consider adding toolCallId matching as a secondary validation, or logging when queue consumption occurs for debugging.

Refs:

💭 2) durable-stream-helper.ts globalThis pattern for cross-module state

Issue: The globalThis.__durableStreamHelpers map is used to share stream state across module boundaries. While functional, this pattern can be fragile in certain bundler configurations.

Why: Some bundlers may create multiple module instances, breaking the singleton assumption. The current implementation handles this correctly with proper cleanup, but it's worth being aware of.

Fix: Consider adding a startup-time sanity check or logging when the registry is first accessed.

Refs:

durable-stream-helper.ts:9-18

🕐 Pending Recommendations (0)

No pending recommendations from prior reviews — all 6 previously raised issues have been addressed in this delta.

🚫 REQUEST CHANGES

Summary: This delta addresses the prior review feedback well (6/6 issues fixed), but introduces new critical issues that must be resolved before merge. The missing conversation creation (CRITICAL) will cause foreign key violations on first use. The SSE header omission and idempotency gap in tool approvals will cause production failures. These are straightforward fixes — see inline comments for suggested resolutions.

Discarded (8)

Location	Issue	Reason Discarded
`executions.ts`	Missing tenant isolation checks	Already handled by `inheritedRunApiKeyAuth()` middleware
`agentExecution.ts`	Unbounded while loop risk	Loop is bounded by workflow engine timeout and approval responses
`workflowExecutions.ts`	Missing transaction for multi-step updates	Single-row updates don't require transactions
`chat.ts`	Race condition in stream initialization	Stream is initialized before response starts
`durable-stream-helper.ts`	Memory leak in registry	Cleanup is handled in finally block
`executions.ts`	Missing rate limiting	Out of scope — rate limiting is handled at gateway level
`tool-approval.ts`	Type safety of approval payload	Validated by Zod schema at route level
`chatDataStream.ts`	Missing timeout for workflow hooks	Timeouts are handled by workflow engine

Reviewers (4)

Reviewer	Returned	Consider	Inline Comments	Discarded
`pr-review-standards`	9	1	4	4
`pr-review-sre`	5	1	1	3
`pr-review-errors`	4	0	1	3
`pr-review-consistency`	2	0	0	2
Total	20	2	6	12

Note: All Main findings were routed as inline comments due to their localized, single-file nature with clear fix suggestions.

agents-api/src/domains/run/routes/executions.ts

agents-api/src/domains/run/routes/chat.ts

agents-api/src/domains/run/routes/chatDataStream.ts

agents-api/src/domains/run/routes/executions.ts

itoqa · 2026-03-11T21:35:56Z

Ito Test Report ❌

12 test cases ran. 11 passed, 1 failed.

✅ The durable routing and security checks included in this report were largely successful under local verification, and one reproducible durable-execution defect was confirmed through source inspection in the /run/api/executions path.

✅ Passed (11)

Test Case	Summary	Timestamp
ROUTE-2	Durable chat route returned 200 with required stream headers and parseable SSE frames.	0:00
ROUTE-3	Execution creation succeeded after remediation and returned workflow stream headers/chunks.	0:00
ROUTE-4	Execution status returned required scoped fields and rejected a fake execution ID.	0:00
ROUTE-5	Reconnect endpoint streamed successfully for normal and offset requests.	0:00
LOGIC-1	Durable mode persisted across reload and runtime header behavior matched mode selection.	7:23
LOGIC-3	Header forwarding and malformed timezone/timestamp handling remained stable without transport failure.	9:44
ADV-1	Cross-scope probes to status/stream/approval endpoints were denied without data leakage.	9:44
ADV-2	Unauthorized and malformed durable API calls were denied; no unauthorized execution access succeeded.	10:59
ADV-3	Header injection attempts were handled without crash and without unsafe payload reflection.	9:44
MOBILE-1	Mobile execution mode control remained visible, usable, and persisted after reload.	7:23
RAPID-1	Rapid concurrent durable submissions completed without server crash.	0:00

❌ Failed (1)

Test Case	Summary	Timestamp	Screenshot
ROUTE-6	Approval flow could not reach suspended approval state because execution creation via `/run/api/executions` returned 500 before approval resume could occur.	2:35

Approve pending tool call via dedicated approvals endpoint – Failed

Where: Durable execution creation path (POST /run/api/executions) before approval suspension/resume handling.
Steps to reproduce: Call /run/api/executions with a new conversationId and user message for an approval-required flow; observe immediate 500 before approval state is reached.
What failed: The endpoint fails during message persistence instead of creating the execution and entering suspended approval state.
Code analysis: The executions route writes a user message directly but does not initialize the conversation record first. The standard chat route performs createOrGetConversation(...) before createMessage(...), which avoids this failure path.

Relevant code:

agents-api/src/domains/run/routes/executions.ts (lines 198-236)

const body = c.get('requestBody') || (await c.req.json());
const conversationId = body.conversationId || getConversationId();

const messageId = generateId();
const messageContent = await buildPersistedMessageContent(userMessage, messageParts, {
  tenantId,
  projectId,
  conversationId,
  messageId,
});

await createMessage(runDbClient)({
  id: messageId,
  tenantId,
  projectId,
  conversationId,
  role: 'user',
  content: messageContent,
  visibility: 'user-facing',
  messageType: 'chat',
});

agents-api/src/domains/run/routes/chat.ts (lines 248-256)

await createOrGetConversation(runDbClient)({
  tenantId,
  projectId,
  id: conversationId,
  agentId: agentId,
  activeSubAgentId: defaultSubAgentId,
  ref: executionContext.resolvedRef,
  userId: executionContext.metadata?.endUserId,
});

Why this is likely a bug: The durable executions route omits required conversation initialization before inserting a message, which plausibly causes the observed server-side insert failure and blocks approval workflow startup.
Introduced by this PR: Yes - this PR modified the relevant code.
Timestamp: 2:35

📋 View Recording

Screen Recording

claude

PR Review Summary

(3) Total Issues | Risk: Medium

Re-review scope: This review covers only the delta changes since commit 6d4e03ec20b9. All 6 issues from prior reviews have been verified as fixed ✅

🟠⚠️ Major (3) 🟠⚠️

Inline Comments:

🟠 Major: chat.ts:449 Stream errors not communicated to client — durable execution errors logged but no SSE error event written
🟠 Major: chatDataStream.ts:479 Same issue — durable stream errors silent for Vercel AI SDK clients
🟠 Major: chatDataStream.ts:217-225 Missing error handling for approval stream read loop — no try/catch around run.getReadable()

🟡 Minor (0) 🟡

None.

💭 Consider (1) 💭

💭 1) chatDataStream.ts:188-206 Promise.allSettled results not inspected

Issue: The change from Promise.all to Promise.allSettled was made to handle batch approvals gracefully, but the settled results are never inspected. If an approval fails with an unexpected error (not "not found" or "already"), it's silently discarded.

Why: Debugging approval failures 6 months later would require correlating workflow execution logs with the approval request, with no indication in the API response that anything went wrong.

Fix: Consider logging or returning failures:

const results = await Promise.allSettled(...);
const failures = results
  .filter((r): r is PromiseRejectedResult => r.status === 'rejected')
  .filter(r => !r.reason?.message?.includes('not found') && !r.reason?.message?.includes('already'));

if (failures.length > 0) {
  logger.warn({ failureCount: failures.length, conversationId }, 'Unexpected approval failures');
}

Refs:

executions.ts:372-386 — similar idempotent handling with logging

🧹 While You're Here (0) 🧹

None identified.

Prior Feedback Status ✅

All issues from the previous 2 reviews have been addressed in this delta:

Prior Issue	Status	Fix Location
Missing conversation creation before message insert	✅ Fixed	`executions.ts:227-238`
Missing SSE headers for durable execution	✅ Fixed	`chat.ts:434-437`, `executions.ts:269-272`
Non-idempotent tool approval resume	✅ Fixed	`executions.ts:372-386`
Batch approval Promise.all failure handling	✅ Fixed	`chatDataStream.ts:188-206`
HTTPException guards and error.message leaks	✅ Fixed	`chat.ts:576-589`, `chatDataStream.ts:648-663`
Stream error event writing	✅ Fixed	`executions.ts:350`

🚫 REQUEST CHANGES

Summary: The delta successfully addresses all 6 prior review issues — great work on the fixes! However, the same error-communication pattern that was correctly applied to executions.ts wasn't consistently applied to the durable execution streams in chat.ts and chatDataStream.ts. Clients need to receive SSE error events when streams fail, otherwise they cannot distinguish between successful completion and errors. These are straightforward additions following the established pattern.

Discarded (3)

Location	Issue	Reason Discarded
`durable-stream-helper.ts:57-61`	Synchronous write swallows errors	Fire-and-forget semantics are intentional for streaming; warning is now logged (addressed in prior delta)
`defaults.ts:166-170`	Durable timeout constants unused	Out of scope for this PR — timeout configuration is a feature enhancement, not a bug
`Promise.allSettled` results	Not inspected for unexpected errors	Routed to Consider — valid improvement but not blocking

Reviewers (4)

Reviewer	Returned	Consider	Inline Comments	Discarded
`pr-review-standards`	3	0	3	0
`pr-review-errors`	3	1	0	2
`pr-review-sre`	5	0	0	5
`pr-review-consistency`	2	0	0	2
Total	13	1	3	9

Note: All 3 Main findings were routed as inline comments due to their localized, single-file nature with clear fix patterns.

agents-api/src/domains/run/routes/chat.ts

agents-api/src/domains/run/routes/chatDataStream.ts

claude

PR Review Summary

(0) Total Issues | Risk: Low

Re-review scope: This review covers only the delta changes since commit 2476a89a00a5 — 1 commit (cb7374912) affecting 2 files.

Prior Feedback Status ✅

All 3 Major issues from the previous review have been fixed in this delta:

Prior Issue	Status	Fix Location
🟠 Stream errors not communicated to client (`chat.ts:449`)	✅ Fixed	`chat.ts:449` — SSE error event now written
🟠 Same issue for Vercel AI SDK clients (`chatDataStream.ts:479`)	✅ Fixed	`chatDataStream.ts:487` — SSE error event now written
🟠 Missing error handling for approval stream read loop (`chatDataStream.ts:217-225`)	✅ Fixed	`chatDataStream.ts:217-232` — try/catch added with SSE error event

Delta Analysis

The commit cb7374912 (fix: add SSE error events and stream error handling for durable paths) addresses all feedback correctly:

chat.ts:449 — Added await s.write(\event: error\ndata: ${JSON.stringify({ error: 'Stream error' })}\n\n`)` in the catch block
chatDataStream.ts:217-232 — Wrapped the approval continuation stream read loop in try/catch with SSE error event
chatDataStream.ts:487 — Added SSE error event in the durable /chat stream error path

All three patterns now match the established error handling pattern from executions.ts:283-285.

🔴❗ Critical (0) ❗🔴

None.

🟠⚠️ Major (0) 🟠⚠️

None.

🟡 Minor (0) 🟡

None.

💭 Consider (0) 💭

None.

🧹 While You're Here (0) 🧹

None identified.

✅ APPROVE

Summary: All 3 Major issues from the prior review have been addressed. The SSE error event pattern is now consistently applied across all durable execution stream paths — chat.ts, chatDataStream.ts (durable mode), and chatDataStream.ts (approval continuation). Clients will now receive event: error notifications when streams fail, enabling proper error handling and debugging. Ship it! 🚀

Discarded (0)

No findings discarded.

Reviewers (0)

Reviewer	Returned	Main Findings	Consider	While You're Here	Inline Comments	Pending Recs	Discarded
Total	0	0	0	0	0	0	0

Note: Subagent reviewers were not dispatched for this delta re-review as the changes are limited to error handling additions that directly address prior feedback. The delta was reviewed directly by the orchestrator.

itoqa · 2026-03-12T02:48:42Z

Ito Test Report ❌

18 test cases ran. 12 passed, 6 failed.

✅ Durable-mode happy paths, classic rollback behavior, and responsive UI interactions were stable in the included passing coverage. 🔍 Code-first verification found six likely product defects in approval-response handling, durable execution concurrency/idempotency, and stream reconnect index validation.

✅ Passed (12)

Test Case	Summary	Timestamp
ROUTE-1	Execution mode persisted as Durable after refresh, and run API key creation succeeded.	0:00
ROUTE-2	Durable execution endpoint returned SSE with a workflow run ID.	3:02
ROUTE-3	Execution status endpoint returned expected durable execution schema.	3:02
ROUTE-4	Reconnect stream resumed successfully with valid SSE framing.	3:02
ROUTE-5	`/run/v1/chat/completions` used durable streaming with workflow header.	3:02
ROUTE-6	`/run/api/chat` returned Vercel stream headers with durable workflow ID.	3:02
ROUTE-7	Concurrent durable runs stayed isolated with distinct workflow mappings.	3:02
EDGE-3	Timezone/timestamp header pair combinations were handled without stream breakage.	12:57
EDGE-4	Missing user-role text input was handled without server crash.	12:57
EDGE-5	Classic mode rollback removed durable run header as expected.	20:17
ADV-6	Mobile viewport execution mode workflow stayed usable and persisted after reload.	20:17
ADV-7	Rapid toggle/save ended with the final intended execution mode persisted.	20:17

❌ Failed (6)

Test Case	Summary	Timestamp
EDGE-1	Approval response without `conversationId` started execution (200 SSE) instead of returning validation error.	12:57
EDGE-2	Approval response with unknown `conversationId` was accepted and executed (200 SSE).	12:57
ADV-1	Rapid duplicate durable execution creation produced a 500 internal server error.	23:32
ADV-2	Duplicate approval submissions returned success even when execution state was already failed.	23:32
ADV-3	Forged/mismatched approval attempts returned success for existing execution IDs.	23:32
ADV-4	Reconnect stream accepted invalid start-index probes without input validation, leading to instability.	12:57

Approval response without conversationId – Failed

Where: POST /run/api/chat approval-response fast path
Steps to reproduce: Submit a tool-approval response payload in messages[].content without conversationId.
What failed: Request was treated as normal chat execution and started a durable run instead of being rejected with a validation error.
Code analysis: The fast-path detector only scans messages[].parts for approval responses. Payloads that encode approval responses in messages[].content bypass validation and proceed through normal execution creation.

Relevant code:

agents-api/src/domains/run/routes/chatDataStream.ts (lines 147-149)

const approvalParts = (body.messages || [])
  .flatMap((m: any) => m?.parts || [])
  .filter((p: any) => p?.state === 'approval-responded' && typeof p?.toolCallId === 'string');

agents-api/src/domains/run/routes/chatDataStream.ts (lines 156-165)

if (isApprovalResponse) {
  const conversationId = body.conversationId;
  if (!conversationId) {
    return c.json(
      {
        success: false,
        error: 'conversationId is required for approval response',
      },
      400
    );
  }
}

Why this is likely a bug: Approval responses encoded in supported message content shapes should not bypass approval validation and trigger a new execution path.
Introduced by this PR: Yes – this PR modified the approval fast-path logic in chatDataStream.
Timestamp: 12:57

Approval response for unknown conversation – Failed

Where: POST /run/api/chat approval-response handling for unknown conversation IDs
Steps to reproduce: Submit approval-response content with a nonexistent conversationId.
What failed: Instead of rejecting as unknown conversation, the request reached execution flow and returned a successful stream response.
Code analysis: The same approval detection gap (only scanning parts) allows approval-response content in other fields to bypass the getConversation(...) existence check and durable approval path.

Relevant code:

agents-api/src/domains/run/routes/chatDataStream.ts (lines 168-176)

const conversation = await getConversation(runDbClient)({
  scopes: { tenantId, projectId },
  conversationId,
});

if (!conversation) {
  return c.json({ success: false, error: 'Conversation not found' }, 404);
}

agents-api/src/domains/run/routes/chatDataStream.ts (lines 147-149)

const approvalParts = (body.messages || [])
  .flatMap((m: any) => m?.parts || [])
  .filter((p: any) => p?.state === 'approval-responded' && typeof p?.toolCallId === 'string');

Why this is likely a bug: Unknown-conversation approval requests are intended to be rejected, but valid-looking approval payload shapes can bypass that guard.
Introduced by this PR: Yes – this PR added the new approval-response fast path and its current parsing logic.
Timestamp: 12:57

Rapid double-submit create durable execution – Failed

Where: POST /run/api/executions conversation creation path under concurrent duplicate submits
Steps to reproduce: Send multiple concurrent create requests with the same conversationId.
What failed: One request returned HTTP 500 due to a failing conversation insert while other concurrent requests succeeded.
Code analysis: Conversation creation uses a read-then-insert pattern without atomic upsert/conflict handling. Under concurrent calls, two requests can both miss the existing row and then race into duplicate insert attempts.

Relevant code:

packages/agents-core/src/data-access/runtime/conversations.ts (lines 166-184)

if (input.id) {
  const existing = await db.query.conversations.findFirst({
    where: and(eq(conversations.tenantId, input.tenantId), eq(conversations.id, input.id)),
  });

  if (existing) {
    if (existing.activeSubAgentId !== input.activeSubAgentId) {
      await db
        .update(conversations)
        .set({
          activeSubAgentId: input.activeSubAgentId,
          updatedAt: new Date().toISOString(),
        })

packages/agents-core/src/data-access/runtime/conversations.ts (lines 202-203)

await db.insert(conversations).values(newConversation);
return newConversation;

agents-api/src/domains/run/routes/executions.ts (lines 231-238)

await createOrGetConversation(runDbClient)({
  tenantId,
  projectId,
  id: conversationId,
  agentId,
  activeSubAgentId,
  ref: resolvedRef,
});

Why this is likely a bug: The non-atomic create-or-get implementation is vulnerable to concurrent duplicate inserts, which directly explains the observed 500 on retry storms.
Introduced by this PR: Yes – this PR introduced the new durable execution entrypoint that triggers this race-prone path concurrently.
Timestamp: 23:32

Idempotency of duplicate approval submissions – Failed

Where: POST /run/api/executions/:executionId/approvals/:toolCallId
Steps to reproduce: Submit duplicate approvals for an execution that is already terminal or no longer has that pending tool call.
What failed: Endpoint returned {"success":true} repeatedly without confirming execution state or pending approval validity.
Code analysis: The handler validates only execution existence, then suppresses hook not found/already errors and still returns success. There is no explicit guard for non-suspended execution states or absent pending approvals.

Relevant code:

agents-api/src/domains/run/routes/executions.ts (lines 361-368)

const execution = await getWorkflowExecution(runDbClient)({
  tenantId,
  projectId,
  id: executionId,
});
if (!execution) {
  throw createApiError({ code: 'not_found', message: 'Execution not found' });
}

agents-api/src/domains/run/routes/executions.ts (lines 377-385)

} catch (error) {
  const message = error instanceof Error ? error.message : String(error);
  if (!message.includes('not found') && !message.includes('already')) {
    throw error;
  }
  logger.warn(
    { executionId, toolCallId, message },
    'Tool approval already processed (idempotent)'
  );
}

Why this is likely a bug: Returning unconditional success for non-actionable approval submissions hides invalid state transitions and breaks approval-path contract guarantees.
Introduced by this PR: Yes – this PR added the approvals route and current permissive error handling.
Timestamp: 23:32

Forged approval with mismatched IDs – Failed

Where: POST /run/api/executions/:executionId/approvals/:toolCallId forged binding checks
Steps to reproduce: Call approvals endpoint with an existing execution ID but a mismatched/nonexistent tool call ID.
What failed: Requests returned success for invalid approval bindings instead of explicit rejection.
Code analysis: Token construction trusts URL params, and hook resume failures containing not found/already are swallowed, causing successful response even when no valid approval binding exists.

Relevant code:

agents-api/src/domains/run/routes/executions.ts (lines 370-376)

const token = `tool-approval:${execution.conversationId}:${executionId}:${toolCallId}`;

try {
  await toolApprovalHook.resume(token, {
    approved,
    reason: approved ? undefined : reason,
  });

agents-api/src/domains/run/routes/executions.ts (lines 377-385)

} catch (error) {
  const message = error instanceof Error ? error.message : String(error);
  if (!message.includes('not found') && !message.includes('already')) {
    throw error;
  }
  logger.warn(
    { executionId, toolCallId, message },
    'Tool approval already processed (idempotent)'
  );
}

Why this is likely a bug: Invalid execution/tool-call bindings should be rejected; silent success allows forged approval attempts to appear accepted.
Introduced by this PR: Yes – this PR introduced this durable approvals endpoint behavior.
Timestamp: 23:32

Reconnect probing with invalid start index values – Failed

Where: GET /run/api/executions/:executionId/stream reconnect start index parsing
Steps to reproduce: Send invalid x-stream-start-index values (NaN, negative, very large) and then retry valid reconnect.
What failed: Invalid probes were not rejected up front; route attempted reconnect with unchecked index values, matching observed reconnect instability under probing.
Code analysis: Header is parsed with parseInt but never validated for finite, non-negative, bounded values before passing into run.getReadable({ startIndex }).

Relevant code:

agents-api/src/domains/run/routes/executions.ts (lines 321-322)

const startIndexHeader = c.req.header('x-stream-start-index');
const startIndex = startIndexHeader ? Number.parseInt(startIndexHeader, 10) : 0;

agents-api/src/domains/run/routes/executions.ts (lines 339-342)

return stream(c, async (s) => {
  try {
    const readable = run.getReadable({ startIndex });
    const reader = readable.getReader();

Why this is likely a bug: Missing defensive validation on reconnect cursor input can push invalid values into stream internals and destabilize reconnect behavior.
Introduced by this PR: Yes – this PR added the new reconnect endpoint and current start-index handling.
Timestamp: 12:57

📋 View Recording

Screen Recording

…hat and execution routes

pullfrog · 2026-03-16T18:50:55Z

TL;DR — Adds a new "durable" execution mode for agents, backed by the WDK (Workflow Development Kit). When an agent is configured with executionMode: 'durable', its runs are orchestrated as persistent, crash-recoverable workflows where tool approvals suspend execution to disk — allowing clients to disconnect and reconnect without losing state. A new workflow_executions runtime table, new /api/executions REST routes, and a UI dropdown for selecting execution mode are introduced alongside the existing "classic" in-memory path.

Key changes

executionMode agent column + schema — Adds execution_mode varchar column to the manage agent table (defaulting to 'classic'), with corresponding Zod validation, OpenAPI spec, DAL, and UI serialization updates.
workflow_executions runtime table — New runtime table tracking durable execution state (running → suspended → completed/failed) with CRUD data-access functions.
agentExecutionWorkflow + agentExecutionSteps — New WDK workflow function that orchestrates LLM generation → tool approval suspension → tool execution → completion as discrete durable steps, with a defineHook-based toolApprovalHook for external resume.
/api/executions route module — New REST endpoints: POST /executions (create), GET /executions/:id (status), GET /executions/:id/stream (reconnect SSE), POST /executions/:id/approvals/:toolCallId (approve/deny).
Durable branches in chat.ts and chatDataStream.ts — Existing chat routes detect executionMode === 'durable' and delegate to workflow/api.start() instead of the classic in-memory path; the data stream route handles suspended-execution resume via toolApprovalHook.resume().
Durable tool approval flow in tool-approval.ts — When durableWorkflowRunId is set, tool approval short-circuits: pre-approved decisions are consumed from ctx.approvedToolCalls, otherwise pendingDurableApproval is set and { approved: 'pending' } is returned to halt the AI SDK loop.
tool-wrapper.ts effectiveToolCallId mapping — Pre-approved tool calls reuse the originalToolCallId from a prior round so the stream shows consistent tool call IDs across approval boundaries.
WritableBackedHonoSSEStream / WritableBackedVercelWriter — Durable stream adapters that write SSE events into a WDK WritableStream so events are persisted and clients can getReadable() to reconnect.
stream-registry.ts globalThis singleton — Registry now uses globalThis instead of module-scoped Map so the WDK workflow bundle and main app share the same stream helper instances.
SSEStreamHelper.writeDone() removal — The [DONE] sentinel is no longer written by complete(); the stream simply closes after the finish-reason event.
generate.ts schemaOnlyTools option — New option strips execute from tool definitions, giving the LLM tool schemas without actually executing — used by the durable path to get tool-call decisions without side effects.
Durable execution timeout constants — defaults.ts adds four constants (LLM 60 min, MCP tool 30 min, function tool 10 min, tool approval 30 min).
build-workflow.ts ?raw import inlining — Pre-processes Vite ?raw imports into string literals before the WDK esbuild build (and restores after), working around esbuild's lack of Vite query suffix support.
Manage UI execution mode selector — Adds a <Select> dropdown ("Classic" / "Durable") to the agent metadata editor sidebar.

_{Summary ｜ 48 files ｜ base: main ← durable-execution}

`agentExecutionWorkflow` — durable execution orchestrator

Before: All agent execution ran in-memory within a single HTTP request; a crash or disconnect lost all state.
After: Agents configured with executionMode: 'durable' run inside a WDK workflow with discrete steps — each LLM call, tool execution, and status transition is persisted and recoverable.

The workflow loop (_agentExecutionWorkflow) iterates through LLM calls and tool approvals: initializeTaskStep → callLlmStep → (if tool needs approval) markWorkflowSuspendedStep → toolApprovalHook.create() (suspends) → external resume() → markWorkflowResumingStep → executeToolStep → loop. Each function uses 'use step' / 'use workflow' directives for WDK durability. On completion or failure, dedicated steps update the workflow_executions row.

buildAgentForStep reconstructs the full Agent instance from project/agent config on each step — necessary because durable steps may execute on different processes after recovery.

How does the tool approval hook work?

toolApprovalHook is a WDK defineHook with a deterministic token format: tool-approval:${conversationId}:${workflowRunId}:${toolCallId}. When the workflow reaches a tool needing approval, it creates a hook and awaits it — the WDK suspends the workflow to disk. External callers (the /executions/:id/approvals/:toolCallId endpoint or the chatDataStream approval handler) call toolApprovalHook.resume(token, { approved, reason }) to wake the workflow.

agentExecution.ts · agentExecutionSteps.ts · executionHandler.ts

`/api/executions` routes and durable chat integration

Before: No dedicated execution management API; tool approvals were purely in-memory via PendingToolApprovalManager.
After: Four new endpoints manage durable executions end-to-end, and both chat.ts and chatDataStream.ts route durable agents through the WDK.

Endpoint	Purpose
`POST /api/executions`	Start a durable execution, returns SSE stream + `x-workflow-run-id` header
`GET /api/executions/:id`	Get execution status (`running`/`suspended`/`completed`/`failed`)
`GET /api/executions/:id/stream`	Reconnect to an existing execution's SSE stream (supports `x-stream-start-index`)
`POST /api/executions/:id/approvals/:toolCallId`	Approve or deny a suspended tool call

The existing /v1/chat/completions and /api/chat routes check agent.executionMode and, when 'durable', call workflow/api.start(agentExecutionWorkflow, [...]) instead of running the classic ExecutionHandler. The data stream handler also detects suspended durable executions on the conversation and routes approval parts through toolApprovalHook.resume().

executions.ts · chat.ts · chatDataStream.ts

Durable tool approval flow

Before: Tool approval always waited in-memory via PendingToolApprovalManager, blocking the HTTP connection.
After: When durableWorkflowRunId is set, approval decisions are either consumed from pre-approved ctx.approvedToolCalls (replaying prior decisions) or the flow returns { approved: 'pending' }, setting ctx.pendingDurableApproval to signal the caller to suspend the workflow.

The parseAndCheckApproval return type gains a pendingApproval?: true flag. Both function-tools.ts and mcp-tools.ts check for it and return null (halting execution). tool-wrapper.ts introduces effectiveToolCallId — when a pre-approved entry carries an originalToolCallId, stream events use that ID to maintain consistency with the original approval round.

tool-approval.ts · tool-wrapper.ts · generate.ts

`workflow_executions` table and `executionMode` schema

Before: No persistent tracking of workflow execution state; no executionMode property on agents.
After: A new workflow_executions runtime table with (tenant_id, project_id, id) PK and a conversation index tracks each durable run's lifecycle. The manage agent table gains an execution_mode column defaulting to 'classic'.

The runtime migration (0022/0023) creates the table; the manage migration (0013) adds the column. Zod schemas (WorkflowExecutionSelectSchema, WorkflowExecutionInsertSchema, WorkflowExecutionUpdateSchema) and entity types are added. The AgentInsertSchema gains executionMode: z.enum(['classic', 'durable']).optional().

runtime-schema.ts · manage-schema.ts · workflowExecutions.ts · schemas.ts

Durable stream adapters and `globalThis` registry

Before: Stream helpers were module-scoped Map singletons; no adapter existed for writing SSE events to a WritableStream.
After: WritableBackedHonoSSEStream and WritableBackedVercelWriter bridge the SSE / Vercel data stream formats to WDK's WritableStream<Uint8Array>, and the stream registry uses globalThis so the WDK bundle and main app share one instance.

The SSEStreamHelper.writeDone() method is removed — complete() no longer emits the [DONE] sentinel, as the WDK stream closing signals end-of-stream.

durable-stream-helper.ts · stream-registry.ts · stream-helpers.ts

^{｜ View workflow run ｜ Using Claude Code ｜ Triggered by Pullfrog ｜ pullfrog.com ｜ 𝕏}

claude

PR Review Summary

(3) Total Issues | Risk: Medium

Re-review scope: This review covers only the delta changes since commit cb73749129ee — 3 commits (ab733eb4a, fa68e4d14, 4ed6f5df5) affecting 10 files. The prior review was an APPROVE after all issues were fixed.

🔴❗ Critical (0) ❗🔴

None.

🟠⚠️ Major (2) 🟠⚠️

Inline Comments:

🟠 Major: agentExecutionSteps.ts:789 Stream always releaseLock() but never close() in executeToolStep — potential stream lifecycle inconsistency
🟠 Major: build-workflow.ts:54-61 File restoration can fail partway through, leaving inconsistent state

🟡 Minor (1) 🟡

Inline Comments:

🟡 Minor: agentExecutionSteps.ts:134-148 Silent catch blocks suppress configuration errors

💭 Consider (1) 💭

Inline Comments:

💭 Consider: agentExecutionSteps.ts:381 Test coverage for new step functions (~500 LOC with no unit tests)

🧹 While You're Here (0) 🧹

None identified.

Delta Analysis

The delta introduces a significant architectural refactor of the workflow execution model:

Before (monolithic step):

runAgentExecutionStep → returns needs_approval or completed

After (granular steps):

initializeTaskStep → creates task, resolves subagent
callLlmStep → LLM generation, returns completion/transfer/tool_calls
executeToolStep → executes single tool after approval

Key observations:

Per-call approval loop: The workflow now iterates per-tool-call within llmResult.toolCalls (lines 70-102 in agentExecution.ts) instead of per-turn. This enables finer-grained crash recovery.
State reconstruction: buildAgentForStep() reconstructs the full Agent on each step invocation. This is necessary for WDK but adds latency (5+ DB queries per step). This is an acceptable tradeoff for durability.
Stream lifecycle: The conditional close() vs releaseLock() pattern in callLlmStep is sound, but executeToolStep always uses releaseLock(). Verify this doesn't cause client issues when workflow completes after tool execution.
Removed [DONE] message: The writeDone() call was removed from SSEStreamHelper.complete(). This is intentional per commit fa68e4d14 but may affect clients that specifically watch for data: [DONE].

Prior Feedback Status ✅

All issues from the previous 4 reviews have been addressed. The prior review at commit cb7374912 was an APPROVE. This delta introduces new Major issues that should be addressed.

💡 APPROVE WITH SUGGESTIONS

Summary: The per-call step architecture is a sound design choice for durable execution with finer-grained crash recovery. The two Major issues identified are straightforward to address: (1) verify the stream lifecycle in executeToolStep doesn't cause client issues, and (2) make the build script's file restoration more robust. Neither blocks the PR, but both should be addressed before production rollout of durable execution to production workloads. The test coverage gap is noted but not blocking given this is a new feature that likely has integration test coverage planned.

Discarded (5)

Location	Issue	Reason Discarded
`agentExecution.ts:90-101`	`executeToolStep` return value ignored	The `needs_approval` return path appears unreachable in the current architecture — tool execution after approval shouldn't trigger another approval. This is defensive code or future-proofing, not a bug.
`agentExecutionSteps.ts:172-239`	Agent reconstruction adds latency	Intentional tradeoff for WDK step isolation and crash recovery. Not a defect.
`agentExecutionSteps.ts:458`	Session created before try block	The session creation is unlikely to fail, and the stream helper is unregistered in finally block. Minor resource leak risk on error path but not actionable.
`stream-helpers.ts:380-387`	`[DONE]` removal	Intentional per commit message. Clients should watch for `finish_reason: 'stop'` not `[DONE]`.
`build-workflow.ts:32`	Regex only matches single quotes	Current codebase only uses single quotes for imports. The regex is correct for actual usage.

Reviewers (5)

Reviewer	Returned	Consider	Inline Comments	Discarded
`pr-review-standards`	1	0	0	1
`pr-review-architecture`	3	0	0	3
`pr-review-sre`	4	0	2	2
`pr-review-errors`	2	0	1	1
`pr-review-tests`	5	1	0	4
Total	15	1	3	11

Note: Main findings were all routed as inline comments due to their localized, single-file nature with clear fix suggestions.

claude · 2026-03-16T18:55:57Z

agents-api/src/domains/run/workflow/steps/agentExecutionSteps.ts

+    await agentSessionManager.endSession(requestId);
+    unregisterStreamHelper(requestId);
+    await agent.cleanup();
+    closeable.releaseLock();


🟠 MAJOR: Stream always releaseLock() but never close() in executeToolStep

Issue: Unlike callLlmStep which conditionally calls close() for terminal states vs releaseLock() for non-terminal states (lines 640-645), executeToolStep always calls releaseLock() regardless of outcome. This asymmetry could leave streams in an inconsistent state.

Why: If tool execution is the final step before workflow completion (after the for-loop in agentExecution.ts:70-102), the stream should be closed properly. The current code releases the lock but doesn't signal stream end, potentially causing clients to wait indefinitely for more data.

Fix: Consider whether executeToolStep should distinguish terminal vs non-terminal outcomes. Looking at the workflow, executeToolStep is followed by either more tool executions (continue the for-loop) or a callLlmStep (next iteration), so releaseLock() may be correct. However, verify this doesn't cause issues when the workflow completes after the last tool execution.

Refs:

callLlmStep terminal handling

claude · 2026-03-16T18:56:10Z

agents-api/scripts/build-workflow.ts

+  try {
+    await fn();
+  } finally {
+    for (const [filePath, original] of originals) {
+      await writeFile(filePath, original, 'utf-8');
+    }
+  }
+}


🟠 MAJOR: File restoration can fail partway through, leaving inconsistent state

Issue: If writeFile throws during restoration (disk full, permissions, file lock), the loop stops immediately. Files before the failure are restored, files after remain patched. The error is thrown but the partial restoration creates an inconsistent file system state that's hard to diagnose.

Why: A developer or CI run won't know which files are corrupted until TypeScript compilation fails on a subsequent run. git status will show unexpected changes to multiple files, obscuring the actual cause (restoration failure).

Fix: Consider collecting errors and restoring all files before reporting:

finally { const errors: Array<{ path: string; error: unknown }> = []; for (const [filePath, original] of originals) { try { await writeFile(filePath, original, 'utf-8'); } catch (error) { errors.push({ path: filePath, error }); } } if (errors.length > 0) { console.error('Failed to restore files:', errors.map(e => e.path)); // Optionally throw after all restoration attempts } }

Refs:

Node.js fs.writeFile error handling

claude · 2026-03-16T18:56:21Z

agents-api/src/domains/run/workflow/steps/agentExecutionSteps.ts

+  const enhancedInternalRelations = allInternalRelations.map((relation) => {
+    try {
+      return enhanceInternalRelation({ relation, agent: agentEntry, project });
+    } catch {
+      return relation;
+    }
+  });
+
+  const enhancedTeamRelations = teamRelations.map((relation) => {
+    try {
+      return enhanceTeamRelation({ relation, project });
+    } catch {
+      return relation;
+    }
+  });


🟡 Minor: Silent catch blocks suppress potentially important configuration errors

Issue: The catch {} blocks at lines 137 and 145 silently swallow errors from enhanceInternalRelation and enhanceTeamRelation, falling back to the original relation with no logging.

Why: If relation enhancement fails due to misconfigured agents or missing data, the agent silently uses degraded relation data. Debugging "why is my agent not using the right description" becomes difficult with no indication that enhancement failed.

Fix: Add debug-level logging when enhancement fails:

const enhancedInternalRelations = allInternalRelations.map((relation) => { try { return enhanceInternalRelation({ relation, agent: agentEntry, project }); } catch (error) { logger.debug( { relationId: relation.id, error: error instanceof Error ? error.message : error }, 'Failed to enhance internal relation, using original' ); return relation; } });

Refs:

api-logging-guidelines skill

claude · 2026-03-16T18:56:32Z

agents-api/src/domains/run/workflow/steps/agentExecutionSteps.ts

+  return { taskId, defaultSubAgentId, maxTransfers };
+}
+
+export async function callLlmStep(params: CallLlmStepParams): Promise<CallLlmResult> {


💭 Consider: Test coverage for new step functions

Issue: The three new step functions (initializeTaskStep, callLlmStep, executeToolStep) have ~500 lines of complex logic with multiple code paths but no direct unit tests.

Why: Each function has distinct outcomes:

initializeTaskStep: handles unique constraint errors for idempotency

callLlmStep: returns completion, transfer, or tool_calls

executeToolStep: returns completed or needs_approval

A bug in any of these paths could cause incorrect workflow behavior, and without tests, regressions won't be caught until production.

Fix: Consider adding unit tests that mock buildAgentForStep, agent.generate(), and database calls to verify each outcome path. At minimum, integration tests covering the durable approval flow would provide confidence.

Refs:

Testing patterns in AGENTS.md

itoqa · 2026-03-16T20:37:39Z

Ito Test Report ✅

9 test cases ran. 9 passed.

✅ The verified scenarios show durable headers, execution-mode UI persistence, and header-hardening behavior working as expected in this run. Code review of non-passed outcomes did not provide sufficient production-code evidence to elevate them as confirmed defects under the code-first bar.

✅ Passed (9)

Test Case	Summary	Timestamp
ROUTE-2	Execution status endpoint returned HTTP 200 with required schema fields and a valid lifecycle enum value.	0:00
ROUTE-5	After adding required model field, /run/v1/chat/completions returned durable SSE headers with a non-empty workflow run id and no fallback-path error response.	10:18
SCREEN-1	Execution mode section and interactive Mode selector are present in agent metadata pane.	15:57
SCREEN-2	Mode changes persisted after repeated save and reload cycles in both directions.	18:28
EDGE-4	Timezone-only and invalid timestamp header combinations were handled with successful durable responses, indicating robust pair handling without service instability.	10:18
EDGE-5	When creating a new agent without explicitly setting mode, metadata resolves to Classic; API attempt lacked bypass auth but UI fallback confirmed default behavior.	18:42
ADV-3	Execution succeeded under explicit local test context and metadata remained scoped to agent batch-agent; forwarded header spoof payload did not demonstrate context escalation in captured responses.	30:46
ADV-4	Header fuzzing with script and newline-style payloads produced controlled responses and stable service behavior without observed payload reflection.	10:18
MOBILE-1	At 390x844, metadata controls remained usable and execution mode state persisted after reload.	20:00

📋 View Recording

Screen Recording

anubra266 added 5 commits March 9, 2026 23:31

feat: durable execution setup

9c25d49

Merge branch 'main' of https://github.com/inkeep/agents into durable-…

6e97e39

…execution

Implement durable tool approval workflow with new context properties …

95db34c

…and error handling

Implement durable tool approval workflow with new context properties …

cec2704

…and error handling

Implement durable tool approval workflow with new context properties …

bf21da5

…and error handling

Merge branch 'main' of https://github.com/inkeep/agents into durable-…

4304e76

…execution

vercel bot had a problem deploying to Preview – agents-docs March 10, 2026 21:35 Failure

vercel bot had a problem deploying to Preview – agents-manage-ui March 10, 2026 21:37 Failure

vercel bot deployed to Preview – agents-api March 10, 2026 21:37 View deployment

Refactor PendingToolApprovalManager to use an internal Map for pendin…

5f8fe20

…g approvals and remove global state dependencies. Delete DurableApprovalRequiredError class as it is no longer needed.

claude bot reviewed Mar 10, 2026

View reviewed changes

github-actions bot deleted a comment from claude bot Mar 10, 2026

vercel bot temporarily deployed to Preview – agents-docs March 11, 2026 01:11 Inactive

vercel bot deployed to Preview – agents-api March 11, 2026 01:14 View deployment

vercel bot had a problem deploying to Preview – agents-manage-ui March 11, 2026 01:15 Failure

anubra266 and others added 9 commits March 10, 2026 21:07

feat: durable execution setup

b11c048

Implement durable tool approval workflow with new context properties …

11c4b3c

…and error handling

Implement durable tool approval workflow with new context properties …

8b6e925

…and error handling

Implement durable tool approval workflow with new context properties …

8d9f9d8

…and error handling

Refactor PendingToolApprovalManager to use an internal Map for pendin…

ce0798c

…g approvals and remove global state dependencies. Delete DurableApprovalRequiredError class as it is no longer needed.

WIP: durable execution refinements before rebase

a8b8fb8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'main' into durable-execution

cd7e82c

Merge branch 'main' into durable-execution

86aae44

Merge remote-tracking branch 'origin/durable-execution' into durable-…

6d4e03e

…execution

vercel bot had a problem deploying to Preview – agents-docs March 11, 2026 20:52 Failure

vercel bot deployed to Preview – agents-api March 11, 2026 20:54 View deployment

vercel bot deployed to Preview – agents-manage-ui March 11, 2026 20:55 View deployment

claude bot reviewed Mar 11, 2026

View reviewed changes

github-actions bot deleted a comment from claude bot Mar 11, 2026

anubra266 added 2 commits March 11, 2026 18:28

fix: address durable execution PR review comments (round 2)

d69e597

Merge branch 'main' into durable-execution

2476a89

vercel bot had a problem deploying to Preview – agents-docs March 12, 2026 01:31 Failure

vercel bot deployed to Preview – agents-manage-ui March 12, 2026 01:33 View deployment

vercel bot deployed to Preview – agents-api March 12, 2026 01:33 View deployment

claude bot reviewed Mar 12, 2026

View reviewed changes

agents-api/src/domains/run/routes/chat.ts Show resolved Hide resolved

agents-api/src/domains/run/routes/chatDataStream.ts Show resolved Hide resolved

agents-api/src/domains/run/routes/chatDataStream.ts Show resolved Hide resolved

github-actions bot deleted a comment from claude bot Mar 12, 2026

fix: add SSE error events and stream error handling for durable paths

cb73749

vercel bot temporarily deployed to Preview – agents-docs March 12, 2026 01:53 Inactive

vercel bot deployed to Preview – agents-manage-ui March 12, 2026 01:55 View deployment

vercel bot deployed to Preview – agents-api March 12, 2026 01:55 View deployment

claude bot reviewed Mar 12, 2026

View reviewed changes

github-actions bot deleted a comment from claude bot Mar 12, 2026

anubra266 added 3 commits March 12, 2026 18:05

add done event

ab733eb

refactor: remove redundant [DONE] message from stream completion in c…

fa68e4d

…hat and execution routes

feat: per call step

4ed6f5d

vercel bot temporarily deployed to Preview – agents-docs March 16, 2026 18:47 Inactive

vercel bot deployed to Preview – agents-manage-ui March 16, 2026 18:49 View deployment

vercel bot deployed to Preview – agents-api March 16, 2026 18:49 View deployment

claude bot reviewed Mar 16, 2026

View reviewed changes

github-actions bot deleted a comment from claude bot Mar 16, 2026

Conversation

anubra266 commented Mar 10, 2026

Uh oh!

pullfrog bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vercel bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

changeset-bot bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

pullfrog bot commented Mar 10, 2026

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

PR Review Summary

🔴❗ Critical (3) ❗🔴

🟠⚠️ Major (3) 🟠⚠️

🟡 Minor (1) 🟡

💭 Consider (1) 💭

🧹 While You're Here (0) 🧹

Additional Observations

🚫 REQUEST CHANGES

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

itoqa bot commented Mar 10, 2026

Ito Test Report ❌

Uh oh!

itoqa bot commented Mar 11, 2026

Ito Test Report ❌

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

PR Review Summary

🔴❗ Critical (1) ❗🔴

🟠⚠️ Major (3) 🟠⚠️

🟡 Minor (2) 🟡

💭 Consider (2) 💭

🕐 Pending Recommendations (0)

🚫 REQUEST CHANGES

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

itoqa bot commented Mar 11, 2026

Ito Test Report ❌

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

PR Review Summary

🟠⚠️ Major (3) 🟠⚠️

🟡 Minor (0) 🟡

💭 Consider (1) 💭

🧹 While You're Here (0) 🧹

Prior Feedback Status ✅

🚫 REQUEST CHANGES

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

PR Review Summary

Prior Feedback Status ✅

Delta Analysis

🔴❗ Critical (0) ❗🔴

🟠⚠️ Major (0) 🟠⚠️

🟡 Minor (0) 🟡

pullfrog bot commented Mar 10, 2026 •

edited

Loading

vercel bot commented Mar 10, 2026 •

edited

Loading

changeset-bot bot commented Mar 10, 2026 •

edited

Loading

`agentExecutionWorkflow` — durable execution orchestrator

`/api/executions` routes and durable chat integration

`workflow_executions` table and `executionMode` schema

Durable stream adapters and `globalThis` registry