Skip to content

Durable execution#2615

Open
anubra266 wants to merge 22 commits intomainfrom
durable-execution
Open

Durable execution#2615
anubra266 wants to merge 22 commits intomainfrom
durable-execution

Conversation

@anubra266
Copy link
Contributor

No description provided.

@pullfrog
Copy link
Contributor

pullfrog bot commented Mar 10, 2026

Reviewing PR #2615 — Durable execution mode for agent runs with tool approvals and crash recovery. Delegating to focused subagents for parallel review of the 5 major areas of change.

Pullfrog  | View workflow run | Using Claude Code | Triggered by Pullfrogpullfrog.com𝕏

@vercel
Copy link

vercel bot commented Mar 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Mar 16, 2026 6:49pm
agents-manage-ui Ready Ready Preview, Comment Mar 16, 2026 6:49pm
1 Skipped Deployment
Project Deployment Actions Updated (UTC)
agents-docs Skipped Skipped Mar 16, 2026 6:49pm

Request Review

@changeset-bot
Copy link

changeset-bot bot commented Mar 10, 2026

🦋 Changeset detected

Latest commit: 4ed6f5d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name Type
@inkeep/agents-api Patch
@inkeep/agents-manage-ui Patch
@inkeep/agents-cli Patch
@inkeep/agents-core Patch
@inkeep/agents-email Patch
@inkeep/agents-mcp Patch
@inkeep/agents-sdk Patch
@inkeep/agents-work-apps Patch
@inkeep/ai-sdk-provider Patch
@inkeep/create-agents Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@pullfrog
Copy link
Contributor

pullfrog bot commented Mar 10, 2026

Adds a durable execution mode for agent runs, backed by a WDK (Workflow Development Kit) workflow engine, enabling crash recovery and persistent tool-approval loops that survive process restarts.

  • New executionMode column ('classic' | 'durable') on the agent table (manage-schema.ts) with migration 0013
  • New workflow_executions runtime table tracking run lifecycle (running → suspended → completed/failed) with migration 0022, plus data-access layer in packages/agents-core/src/data-access/runtime/workflowExecutions.ts
  • Core workflow definition in agents-api/src/domains/run/workflow/functions/agentExecution.ts — orchestrates an approval loop: run agent → if tool needs approval, suspend → wait on toolApprovalHook → resume with pre-approved decisions
  • Step implementations in agents-api/src/domains/run/workflow/steps/agentExecutionSteps.ts — each step ('use step') is individually retryable by WDK
  • New /api/executions route file (agents-api/src/domains/run/routes/executions.ts) with endpoints: create execution, get status, reconnect stream, and approve/deny tool calls
  • Durable-mode branches added to existing /v1/chat and /api/chat routes — when agent.executionMode === 'durable', starts the workflow and pipes the WDK readable stream to the response
  • WritableBackedHonoSSEStream and WritableBackedVercelWriter adapters (agents-api/src/domains/run/stream/durable-stream-helper.ts) bridge SSE/Vercel data streams to WDK writable streams
  • Tool approval flow extended (tool-approval.ts, tool-wrapper.ts) — in durable mode, returns { approved: 'pending' } instead of blocking, sets ctx.pendingDurableApproval, and uses originalToolCallId for stream continuity
  • PendingToolApprovalManager and stream registry moved to globalThis-backed maps so state is shared across WDK bundle and main app module boundaries
  • Bumped workflow and @workflow/* dependencies; added new @workflow/ai dependency
  • UI updates in agents-manage-ui for the new executionMode field in agent config forms and serialization
  • Durable execution timeout defaults added to execution-limits/defaults.ts (1h LLM generation, 30m MCP/approval, 10m function tool)

Pullfrog  | View workflow run | Using Claude Code | Triggered by Pullfrogpullfrog.com𝕏

…g approvals and remove global state dependencies. Delete DurableApprovalRequiredError class as it is no longer needed.
Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(8) Total Issues | Risk: High

🔴❗ Critical (3) ❗🔴

Inline Comments:

  • 🔴 Critical: workflowExecutions.ts:62 Query returns oldest execution instead of newest due to missing DESC order
  • 🔴 Critical: executions.ts:303 Missing tenant/project authorization check allows cross-tenant access
  • 🔴 Critical: executions.ts:190 Middleware path /executions doesn't match child routes like /executions/:id

🟠⚠️ Major (3) 🟠⚠️

Inline Comments:

  • 🟠 Major: agentExecution.ts:69 Tool approvals keyed by toolName causes collision when same tool called multiple times
  • 🟠 Major: durable-stream-helper.ts:54-57 Silent .catch(() => {}) swallows write errors making debugging impossible
  • 🟠 Major: executions.ts:328 Request body bypasses Zod validation, allowing malformed input

🟡 Minor (1) 🟡

Inline Comments:

  • 🟡 Minor: agentExecutionSteps.ts:159 Stream errors caught but not communicated to client

💭 Consider (1) 💭

Inline Comments:

  • 💭 Consider: DurableApprovalRequiredError.ts:1-13 Dead code — error class defined but never used

🧹 While You're Here (0) 🧹

None identified.


Additional Observations

Testing Gap: The new workflow code (agentExecution.ts, agentExecutionSteps.ts) has no direct unit tests. The existing generateTaskHandler.test.ts only mocks the new methods without exercising the actual workflow logic. Consider adding tests that cover:

  • The approval loop behavior (multiple rounds)
  • Stream namespace switching
  • Error handling paths
  • Status transitions (running → suspended → completed/failed)

Documentation: The API reference docs (tools.mdx) were updated with Slack endpoints, but there's no documentation for the new /api/executions endpoints. Customers will need guidance on:

  • When to use durable vs classic mode
  • How to reconnect to streams
  • Tool approval flow

Schema Migration: The migrations look correct, but verify the workflow_executions table index covers the query patterns in getWorkflowExecutionByConversation.


🚫 REQUEST CHANGES

Summary: This PR introduces significant new infrastructure for durable execution, but has three critical issues that must be addressed before merge: (1) the query ordering bug will return stale executions instead of the most recent, (2) the missing authorization check on reconnect enables cross-tenant data access, and (3) the middleware path matching means child routes don't have context validation. The tool approval keying by name instead of call ID is also a blocking issue that will cause incorrect behavior when the same tool is called multiple times. After these are fixed, the architecture looks solid.

Discarded (4)
Location Issue Reason Discarded
chat.ts Header extraction duplicated Intentional code movement to execute before branching on execution mode — not duplication
PendingToolApprovalManager.ts globalThis pattern Valid pattern for cross-module state sharing, matches stream-registry.ts
runtime-schema.ts Missing foreign key constraint Runtime tables intentionally avoid FK constraints for performance/flexibility
agentExecution.ts Workflow ID string magic Standard WDK pattern for workflow identification
Reviewers (10)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 6 0 0 0 2 0 4
pr-review-architecture 4 0 0 0 1 0 3
pr-review-sre 3 0 0 0 2 0 1
pr-review-security-iam 2 0 0 0 1 0 1
pr-review-breaking-changes 2 0 0 0 0 0 2
pr-review-tests 3 0 0 0 0 0 3
pr-review-types 2 0 0 0 1 0 1
pr-review-errors 3 0 0 0 1 0 2
pr-review-consistency 2 0 0 0 0 0 2
pr-review-precision 2 0 1 0 0 0 1
Total 29 0 1 0 8 0 20

@github-actions github-actions bot deleted a comment from claude bot Mar 10, 2026
@itoqa
Copy link

itoqa bot commented Mar 10, 2026

Ito Test Report ❌

16 test cases ran. 14 passed, 2 failed.

✅ Verification found broad durable-flow coverage passing across metadata persistence, stream reconnect behavior, approval resume, authz boundaries, and mobile usability. 🔍 Two failures are likely real product defects in the durable executions API: execution creation can fail before workflow start, and duplicate approval submissions are not handled idempotently.

✅ Passed (14)
Test Case Summary Timestamp Screenshot
ROUTE-1 Agent settings showed exactly Classic and Durable options, and default selection was Classic on initial load. 0:00 ROUTE-1_0-00.png
ROUTE-2 Execution mode change to Durable persisted after save and hard reload without reverting. 0:00 ROUTE-2_0-00.png
ROUTE-6 Execution status endpoint returned HTTP 200 across repeated polls with consistent run id and required fields (id, status, conversationId, agentId, createdAt, updatedAt). 20:36 ROUTE-6_20-36.png
ROUTE-7 Reconnect requests returned HTTP 200 SSE for the same execution, with index 0 replaying earlier events and index 2 resuming at a later boundary without starting a new run. 20:36 ROUTE-7_20-36.png
ROUTE-8 Execution emitted durable-approval-required, approval POST returned success, and status transitioned to completed on the same run id. 45:15 ROUTE-8_45-15.png
EDGE-3 Timezone-only and timestamp-only forwarded-header requests both returned functional 200 streaming responses without server failure. 3:24 EDGE-3_3-24.png
EDGE-4 Malformed timestamp and invalid timezone input were tolerated and stream contract remained available without 5xx response. 3:24 EDGE-4_3-24.png
ADV-1 Unauthenticated execution creation request was denied (404 not_found for missing project context) and returned no execution metadata. 46:22 ADV-1_46-22.png
ADV-2 In-scope status request succeeded for the control run, while the same run queried with foreign tenant/project headers was denied (404) with no execution payload leakage. 47:47 ADV-2_47-47.png
ADV-3 Forged toolCallId request did not resume execution (status remained suspended), and only the real toolCallId transitioned the run to completed. 45:15 ADV-3_45-15.png
ADV-4 Concurrent approve and deny submissions completed without crash, and execution converged to a single deterministic completed terminal state. 45:15 ADV-4_45-15.png
ADV-5 Created durable run, interrupted/reconnected stream from start indexes 0 and 2 on the same runId, then approved pending tool call and confirmed deterministic terminal completion. 49:46 ADV-5_49-46.png
MOBILE-1 At 390x844 viewport, execution mode control remained usable and persisted after save and reload. 0:00 MOBILE-1_0-00.png
FLOW-1 Direct execution emitted durable-approval-required with toolCallId, approval succeeded, reconnect stream remained valid, and run reached completed. 45:15 FLOW-1_45-15.png
❌ Failed (2)
Test Case Summary Timestamp Screenshot
ROUTE-5 Execution creation returned HTTP 500 with backend insert failure, so no durable execution was created. 3:24 ROUTE-5_3-24.png
EDGE-5 Parallel duplicate approvals produced one internal error (Hook not found) and one success; execution still completed once, but duplicate-submit handling was not cleanly idempotent. 45:15 EDGE-5_45-15.png
Create durable execution via dedicated endpoint – Failed
  • Where: POST /run/api/executions durable execution creation path

  • Steps to reproduce: Send a durable execution creation request with a new conversationId, then observe server response before any stream output begins.

  • What failed: The endpoint returns HTTP 500 during user-message persistence (insert into "messages"), instead of starting execution and returning a run id stream.

  • Code analysis: The execution route writes a message immediately, but unlike the chat routes it does not create or ensure the conversation record first. This can violate message-to-conversation persistence expectations and cause the observed insert failure.

  • Relevant code:

    agents-api/src/domains/run/routes/executions.ts (lines 207-234)

    const lastUserMessage = (body.messages as Message[])
      .filter((msg) => msg.role === 'user')
      .slice(-1)[0];
    
    const messageParts = z
      .array(PartSchema)
      .parse(lastUserMessage ? getMessagePartsFromOpenAIContent(lastUserMessage.content) : []);
    
    const userMessage = extractTextFromParts(messageParts);
    const messageId = generateId();
    const messageContent = await buildPersistedMessageContent(userMessage, messageParts, {
      tenantId,
      projectId,
      conversationId,
      messageId,
    });
    
    await createMessage(runDbClient)({
      id: messageId,
      tenantId,
      projectId,
      conversationId,

    agents-api/src/domains/run/routes/chat.ts (lines 248-255)

    await createOrGetConversation(runDbClient)({
      tenantId,
      projectId,
      id: conversationId,
      agentId: agentId,
      activeSubAgentId: defaultSubAgentId,
      ref: executionContext.resolvedRef,
    });
  • Why this is likely a bug: The dedicated executions route skips conversation initialization that parallel chat flows perform, which directly explains the persistence-time 500 on message insert.

  • Introduced by this PR: Yes - this PR modified the relevant code.

  • Timestamp: 3:24

Duplicate approval submissions stay idempotent – Failed
  • Where: POST /run/api/executions/:executionId/approvals/:toolCallId duplicate approval handling

  • Steps to reproduce: Submit two approval requests in parallel for the same execution/toolCallId.

  • What failed: One request succeeds while the other returns HTTP 500 Hook not found, instead of both resolving as idempotent outcomes.

  • Code analysis: The approvals endpoint resumes the workflow hook directly without handling already-consumed/missing-hook cases. Under races, one resume consumes the hook and the second throws, surfacing as a server error.

  • Relevant code:

    agents-api/src/domains/run/routes/executions.ts (lines 339-346)

    const token = `tool-approval:${execution.conversationId}:${executionId}:${toolCallId}`;
    
    await toolApprovalHook.resume(token, {
      approved,
      reason: approved ? undefined : reason,
    });
    
    return c.json({ success: true as boolean }, 200);

    agents-api/src/domains/run/routes/chatDataStream.ts (lines 225-233)

    const ok = approved
      ? pendingToolApprovalManager.approveToolCall(toolCallId)
      : pendingToolApprovalManager.denyToolCall(toolCallId, reason);
    return { toolCallId, approved, alreadyProcessed: !ok };
    
    return c.json({ success: true, results });
  • Why this is likely a bug: The durable approvals endpoint lacks race-safe/idempotent handling for duplicate submissions and leaks an internal hook state error as HTTP 500.

  • Introduced by this PR: Yes - this PR modified the relevant code.

  • Timestamp: 45:15

📋 View Recording

Screen Recording

@itoqa
Copy link

itoqa bot commented Mar 11, 2026

Ito Test Report ❌

6 test cases ran. 5 passed, 1 failed.

🔍 The durable execution rollout is mostly working for metadata persistence, /run/api/chat streaming, reconnect behavior, and stream-index edge handling. One production bug was confirmed in the OpenAI-compatible durable route: it streams SSE frames but returns the wrong response content type.

✅ Passed (5)
Test Case Summary Timestamp Screenshot
ROUTE-1 Execution mode was set to Durable, saved, and rehydrated as Durable after reload in agent settings. 0:00 ROUTE-1_0-00.png
ROUTE-3 Route returned 200 SSE with both x-workflow-run-id and x-vercel-ai-data-stream: v2 headers and no JSON error payload envelope. 4:05 ROUTE-3_4-05.png
ROUTE-5 Both reconnect attempts returned 200 SSE responses including retry with x-stream-start-index and showed no 5xx or crash behavior. 4:05 ROUTE-5_4-05.png
EDGE-5 Reconnect calls with extreme and invalid stream indices remained stable without 5xx or crash; behavior was bounded and SSE contract persisted. 1:08:22 EDGE-5_1-08-22.png
EDGE-6 At iPhone 13 viewport size, execution mode controls remained usable, mode changes saved successfully, and final Durable selection persisted after reload. 0:00 EDGE-6_0-00.png
❌ Failed (1)
Test Case Summary Timestamp Screenshot
ROUTE-2 Endpoint returned 200 with x-workflow-run-id and SSE-like frames, but content-type was text/plain instead of text/event-stream. 4:05 ROUTE-2_4-05.png
Durable branch on /run/v1/chat/completions emits workflow run ID – Failed
  • Where: Run API route POST /run/v1/chat/completions in durable mode.

  • Steps to reproduce: Enable durable execution mode for an agent, call /run/v1/chat/completions with streaming, inspect response headers.

  • What failed: The endpoint returns stream frames (data: ...) but responds with content-type: text/plain; charset=UTF-8 instead of text/event-stream.

  • Code analysis: I traced the durable branch implementation in the OpenAI-compatible chat route and compared it to the durable implementation in the Vercel chat-data route. The /v1/chat/completions durable branch streams bytes via stream(...) but does not set SSE headers, while /run/api/chat explicitly sets SSE headers before returning stream(...).

  • Relevant code:

    agents-api/src/domains/run/routes/chat.ts (lines 414–445)

    if (agent.executionMode === 'durable') {
      const run = await start(agentExecutionWorkflow, [
        {
          tenantId,
          projectId,
          agentId,
          conversationId,
          userMessage,
          messageParts: messageParts.length > 0 ? messageParts : undefined,
          requestId,
          resolvedRef: executionContext.resolvedRef,
        },
      ]);
    
      c.header('x-workflow-run-id', run.runId);
    
      return stream(c, async (s) => {

    agents-api/src/domains/run/routes/chatDataStream.ts (lines 451–456)

    c.header('x-workflow-run-id', run.runId);
    c.header('content-type', 'text/event-stream');
    c.header('cache-control', 'no-cache');
    c.header('connection', 'keep-alive');
    c.header('x-vercel-ai-data-stream', 'v2');
    c.header('x-accel-buffering', 'no');
  • Why this is likely a bug: The durable /v1/chat/completions path emits SSE-formatted data but omits the SSE content-type header in production code, causing protocol/header contract mismatch.

  • Introduced by this PR: Yes – this PR modified the relevant code.

  • Timestamp: 4:05

📋 View Recording

Screen Recording

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(6) Total Issues | Risk: High

🔴❗ Critical (1) ❗🔴

Inline Comments:

  • 🔴 Critical: executions.ts:226 Missing conversation creation before message insert — will cause foreign key violation

🟠⚠️ Major (3) 🟠⚠️

Inline Comments:

  • 🟠 Major: executions.ts:357 Tool approval resume lacks idempotency — duplicate approvals corrupt state
  • 🟠 Major: chat.ts:433-435 Missing SSE headers for durable execution path — breaks streaming
  • 🟠 Major: chatDataStream.ts:187-197 Batch approval Promise.all lacks error handling — one failure loses all

🟡 Minor (2) 🟡

Inline Comments:

  • 🟡 Minor: executions.ts:331-333 Stream close errors not communicated to client
  • 🟡 Minor: executions.ts:197 Request body bypasses Zod validation

💭 Consider (2) 💭

💭 1) tool-approval.ts:41-82 Queue-based approval consumption pattern

Issue: The tool approval system uses array queues keyed by toolName with shift() to consume approvals. This works correctly but relies on strict ordering guarantees.

Why: If approval order ever deviates from tool call order (e.g., parallel tool calls, out-of-order webhook delivery), the wrong approval could be applied to the wrong tool call.

Fix: Consider adding toolCallId matching as a secondary validation, or logging when queue consumption occurs for debugging.

Refs:

💭 2) durable-stream-helper.ts globalThis pattern for cross-module state

Issue: The globalThis.__durableStreamHelpers map is used to share stream state across module boundaries. While functional, this pattern can be fragile in certain bundler configurations.

Why: Some bundlers may create multiple module instances, breaking the singleton assumption. The current implementation handles this correctly with proper cleanup, but it's worth being aware of.

Fix: Consider adding a startup-time sanity check or logging when the registry is first accessed.

Refs:

🕐 Pending Recommendations (0)

No pending recommendations from prior reviews — all 6 previously raised issues have been addressed in this delta.


🚫 REQUEST CHANGES

Summary: This delta addresses the prior review feedback well (6/6 issues fixed), but introduces new critical issues that must be resolved before merge. The missing conversation creation (CRITICAL) will cause foreign key violations on first use. The SSE header omission and idempotency gap in tool approvals will cause production failures. These are straightforward fixes — see inline comments for suggested resolutions.


Discarded (8)
Location Issue Reason Discarded
executions.ts Missing tenant isolation checks Already handled by inheritedRunApiKeyAuth() middleware
agentExecution.ts Unbounded while loop risk Loop is bounded by workflow engine timeout and approval responses
workflowExecutions.ts Missing transaction for multi-step updates Single-row updates don't require transactions
chat.ts Race condition in stream initialization Stream is initialized before response starts
durable-stream-helper.ts Memory leak in registry Cleanup is handled in finally block
executions.ts Missing rate limiting Out of scope — rate limiting is handled at gateway level
tool-approval.ts Type safety of approval payload Validated by Zod schema at route level
chatDataStream.ts Missing timeout for workflow hooks Timeouts are handled by workflow engine
Reviewers (4)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 9 0 1 0 4 0 4
pr-review-sre 5 0 1 0 1 0 3
pr-review-errors 4 0 0 0 1 0 3
pr-review-consistency 2 0 0 0 0 0 2
Total 20 0 2 0 6 0 12

Note: All Main findings were routed as inline comments due to their localized, single-file nature with clear fix suggestions.

@github-actions github-actions bot deleted a comment from claude bot Mar 11, 2026
@itoqa
Copy link

itoqa bot commented Mar 11, 2026

Ito Test Report ❌

12 test cases ran. 11 passed, 1 failed.

✅ The durable routing and security checks included in this report were largely successful under local verification, and one reproducible durable-execution defect was confirmed through source inspection in the /run/api/executions path.

✅ Passed (11)
Test Case Summary Timestamp Screenshot
ROUTE-2 Durable chat route returned 200 with required stream headers and parseable SSE frames. 0:00 ROUTE-2_0-00.png
ROUTE-3 Execution creation succeeded after remediation and returned workflow stream headers/chunks. 0:00 ROUTE-3_0-00.png
ROUTE-4 Execution status returned required scoped fields and rejected a fake execution ID. 0:00 ROUTE-4_0-00.png
ROUTE-5 Reconnect endpoint streamed successfully for normal and offset requests. 0:00 ROUTE-5_0-00.png
LOGIC-1 Durable mode persisted across reload and runtime header behavior matched mode selection. 7:23 LOGIC-1_7-23.png
LOGIC-3 Header forwarding and malformed timezone/timestamp handling remained stable without transport failure. 9:44 LOGIC-3_9-44.png
ADV-1 Cross-scope probes to status/stream/approval endpoints were denied without data leakage. 9:44 ADV-1_9-44.png
ADV-2 Unauthorized and malformed durable API calls were denied; no unauthorized execution access succeeded. 10:59 ADV-2_10-59.png
ADV-3 Header injection attempts were handled without crash and without unsafe payload reflection. 9:44 ADV-3_9-44.png
MOBILE-1 Mobile execution mode control remained visible, usable, and persisted after reload. 7:23 MOBILE-1_7-23.png
RAPID-1 Rapid concurrent durable submissions completed without server crash. 0:00 RAPID-1_0-00.png
❌ Failed (1)
Test Case Summary Timestamp Screenshot
ROUTE-6 Approval flow could not reach suspended approval state because execution creation via /run/api/executions returned 500 before approval resume could occur. 2:35 ROUTE-6_2-35.png
Approve pending tool call via dedicated approvals endpoint – Failed
  • Where: Durable execution creation path (POST /run/api/executions) before approval suspension/resume handling.

  • Steps to reproduce: Call /run/api/executions with a new conversationId and user message for an approval-required flow; observe immediate 500 before approval state is reached.

  • What failed: The endpoint fails during message persistence instead of creating the execution and entering suspended approval state.

  • Code analysis: The executions route writes a user message directly but does not initialize the conversation record first. The standard chat route performs createOrGetConversation(...) before createMessage(...), which avoids this failure path.

  • Relevant code:

    agents-api/src/domains/run/routes/executions.ts (lines 198-236)

    const body = c.get('requestBody') || (await c.req.json());
    const conversationId = body.conversationId || getConversationId();
    
    const messageId = generateId();
    const messageContent = await buildPersistedMessageContent(userMessage, messageParts, {
      tenantId,
      projectId,
      conversationId,
      messageId,
    });
    
    await createMessage(runDbClient)({
      id: messageId,
      tenantId,
      projectId,
      conversationId,
      role: 'user',
      content: messageContent,
      visibility: 'user-facing',
      messageType: 'chat',
    });

    agents-api/src/domains/run/routes/chat.ts (lines 248-256)

    await createOrGetConversation(runDbClient)({
      tenantId,
      projectId,
      id: conversationId,
      agentId: agentId,
      activeSubAgentId: defaultSubAgentId,
      ref: executionContext.resolvedRef,
      userId: executionContext.metadata?.endUserId,
    });
  • Why this is likely a bug: The durable executions route omits required conversation initialization before inserting a message, which plausibly causes the observed server-side insert failure and blocks approval workflow startup.

  • Introduced by this PR: Yes - this PR modified the relevant code.

  • Timestamp: 2:35

📋 View Recording

Screen Recording

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(3) Total Issues | Risk: Medium

Re-review scope: This review covers only the delta changes since commit 6d4e03ec20b9. All 6 issues from prior reviews have been verified as fixed ✅

🟠⚠️ Major (3) 🟠⚠️

Inline Comments:

  • 🟠 Major: chat.ts:449 Stream errors not communicated to client — durable execution errors logged but no SSE error event written
  • 🟠 Major: chatDataStream.ts:479 Same issue — durable stream errors silent for Vercel AI SDK clients
  • 🟠 Major: chatDataStream.ts:217-225 Missing error handling for approval stream read loop — no try/catch around run.getReadable()

🟡 Minor (0) 🟡

None.

💭 Consider (1) 💭

💭 1) chatDataStream.ts:188-206 Promise.allSettled results not inspected

Issue: The change from Promise.all to Promise.allSettled was made to handle batch approvals gracefully, but the settled results are never inspected. If an approval fails with an unexpected error (not "not found" or "already"), it's silently discarded.

Why: Debugging approval failures 6 months later would require correlating workflow execution logs with the approval request, with no indication in the API response that anything went wrong.

Fix: Consider logging or returning failures:

const results = await Promise.allSettled(...);
const failures = results
  .filter((r): r is PromiseRejectedResult => r.status === 'rejected')
  .filter(r => !r.reason?.message?.includes('not found') && !r.reason?.message?.includes('already'));

if (failures.length > 0) {
  logger.warn({ failureCount: failures.length, conversationId }, 'Unexpected approval failures');
}

Refs:

🧹 While You're Here (0) 🧹

None identified.


Prior Feedback Status ✅

All issues from the previous 2 reviews have been addressed in this delta:

Prior Issue Status Fix Location
Missing conversation creation before message insert ✅ Fixed executions.ts:227-238
Missing SSE headers for durable execution ✅ Fixed chat.ts:434-437, executions.ts:269-272
Non-idempotent tool approval resume ✅ Fixed executions.ts:372-386
Batch approval Promise.all failure handling ✅ Fixed chatDataStream.ts:188-206
HTTPException guards and error.message leaks ✅ Fixed chat.ts:576-589, chatDataStream.ts:648-663
Stream error event writing ✅ Fixed executions.ts:350

🚫 REQUEST CHANGES

Summary: The delta successfully addresses all 6 prior review issues — great work on the fixes! However, the same error-communication pattern that was correctly applied to executions.ts wasn't consistently applied to the durable execution streams in chat.ts and chatDataStream.ts. Clients need to receive SSE error events when streams fail, otherwise they cannot distinguish between successful completion and errors. These are straightforward additions following the established pattern.


Discarded (3)
Location Issue Reason Discarded
durable-stream-helper.ts:57-61 Synchronous write swallows errors Fire-and-forget semantics are intentional for streaming; warning is now logged (addressed in prior delta)
defaults.ts:166-170 Durable timeout constants unused Out of scope for this PR — timeout configuration is a feature enhancement, not a bug
Promise.allSettled results Not inspected for unexpected errors Routed to Consider — valid improvement but not blocking
Reviewers (4)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 3 0 0 0 3 0 0
pr-review-errors 3 0 1 0 0 0 2
pr-review-sre 5 0 0 0 0 0 5
pr-review-consistency 2 0 0 0 0 0 2
Total 13 0 1 0 3 0 9

Note: All 3 Main findings were routed as inline comments due to their localized, single-file nature with clear fix patterns.

@github-actions github-actions bot deleted a comment from claude bot Mar 12, 2026
Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

Re-review scope: This review covers only the delta changes since commit 2476a89a00a5 — 1 commit (cb7374912) affecting 2 files.

Prior Feedback Status ✅

All 3 Major issues from the previous review have been fixed in this delta:

Prior Issue Status Fix Location
🟠 Stream errors not communicated to client (chat.ts:449) ✅ Fixed chat.ts:449 — SSE error event now written
🟠 Same issue for Vercel AI SDK clients (chatDataStream.ts:479) ✅ Fixed chatDataStream.ts:487 — SSE error event now written
🟠 Missing error handling for approval stream read loop (chatDataStream.ts:217-225) ✅ Fixed chatDataStream.ts:217-232 — try/catch added with SSE error event

Delta Analysis

The commit cb7374912 (fix: add SSE error events and stream error handling for durable paths) addresses all feedback correctly:

  1. chat.ts:449 — Added await s.write(\event: error\ndata: ${JSON.stringify({ error: 'Stream error' })}\n\n`)` in the catch block
  2. chatDataStream.ts:217-232 — Wrapped the approval continuation stream read loop in try/catch with SSE error event
  3. chatDataStream.ts:487 — Added SSE error event in the durable /chat stream error path

All three patterns now match the established error handling pattern from executions.ts:283-285.

🔴❗ Critical (0) ❗🔴

None.

🟠⚠️ Major (0) 🟠⚠️

None.

🟡 Minor (0) 🟡

None.

💭 Consider (0) 💭

None.

🧹 While You're Here (0) 🧹

None identified.


✅ APPROVE

Summary: All 3 Major issues from the prior review have been addressed. The SSE error event pattern is now consistently applied across all durable execution stream paths — chat.ts, chatDataStream.ts (durable mode), and chatDataStream.ts (approval continuation). Clients will now receive event: error notifications when streams fail, enabling proper error handling and debugging. Ship it! 🚀


Discarded (0)

No findings discarded.

Reviewers (0)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
Total 0 0 0 0 0 0 0

Note: Subagent reviewers were not dispatched for this delta re-review as the changes are limited to error handling additions that directly address prior feedback. The delta was reviewed directly by the orchestrator.

@github-actions github-actions bot deleted a comment from claude bot Mar 12, 2026
@itoqa
Copy link

itoqa bot commented Mar 12, 2026

Ito Test Report ❌

18 test cases ran. 12 passed, 6 failed.

✅ Durable-mode happy paths, classic rollback behavior, and responsive UI interactions were stable in the included passing coverage. 🔍 Code-first verification found six likely product defects in approval-response handling, durable execution concurrency/idempotency, and stream reconnect index validation.

✅ Passed (12)
Test Case Summary Timestamp Screenshot
ROUTE-1 Execution mode persisted as Durable after refresh, and run API key creation succeeded. 0:00 ROUTE-1_0-00.png
ROUTE-2 Durable execution endpoint returned SSE with a workflow run ID. 3:02 ROUTE-2_3-02.png
ROUTE-3 Execution status endpoint returned expected durable execution schema. 3:02 ROUTE-3_3-02.png
ROUTE-4 Reconnect stream resumed successfully with valid SSE framing. 3:02 ROUTE-4_3-02.png
ROUTE-5 /run/v1/chat/completions used durable streaming with workflow header. 3:02 ROUTE-5_3-02.png
ROUTE-6 /run/api/chat returned Vercel stream headers with durable workflow ID. 3:02 ROUTE-6_3-02.png
ROUTE-7 Concurrent durable runs stayed isolated with distinct workflow mappings. 3:02 ROUTE-7_3-02.png
EDGE-3 Timezone/timestamp header pair combinations were handled without stream breakage. 12:57 EDGE-3_12-57.png
EDGE-4 Missing user-role text input was handled without server crash. 12:57 EDGE-4_12-57.png
EDGE-5 Classic mode rollback removed durable run header as expected. 20:17 EDGE-5_20-17.png
ADV-6 Mobile viewport execution mode workflow stayed usable and persisted after reload. 20:17 ADV-6_20-17.png
ADV-7 Rapid toggle/save ended with the final intended execution mode persisted. 20:17 ADV-7_20-17.png
❌ Failed (6)
Test Case Summary Timestamp Screenshot
EDGE-1 Approval response without conversationId started execution (200 SSE) instead of returning validation error. 12:57 EDGE-1_12-57.png
EDGE-2 Approval response with unknown conversationId was accepted and executed (200 SSE). 12:57 EDGE-2_12-57.png
ADV-1 Rapid duplicate durable execution creation produced a 500 internal server error. 23:32 ADV-1_23-32.png
ADV-2 Duplicate approval submissions returned success even when execution state was already failed. 23:32 ADV-2_23-32.png
ADV-3 Forged/mismatched approval attempts returned success for existing execution IDs. 23:32 ADV-3_23-32.png
ADV-4 Reconnect stream accepted invalid start-index probes without input validation, leading to instability. 12:57 ADV-4_12-57.png
Approval response without conversationId – Failed
  • Where: POST /run/api/chat approval-response fast path

  • Steps to reproduce: Submit a tool-approval response payload in messages[].content without conversationId.

  • What failed: Request was treated as normal chat execution and started a durable run instead of being rejected with a validation error.

  • Code analysis: The fast-path detector only scans messages[].parts for approval responses. Payloads that encode approval responses in messages[].content bypass validation and proceed through normal execution creation.

  • Relevant code:

    agents-api/src/domains/run/routes/chatDataStream.ts (lines 147-149)

    const approvalParts = (body.messages || [])
      .flatMap((m: any) => m?.parts || [])
      .filter((p: any) => p?.state === 'approval-responded' && typeof p?.toolCallId === 'string');

    agents-api/src/domains/run/routes/chatDataStream.ts (lines 156-165)

    if (isApprovalResponse) {
      const conversationId = body.conversationId;
      if (!conversationId) {
        return c.json(
          {
            success: false,
            error: 'conversationId is required for approval response',
          },
          400
        );
      }
    }
  • Why this is likely a bug: Approval responses encoded in supported message content shapes should not bypass approval validation and trigger a new execution path.

  • Introduced by this PR: Yes – this PR modified the approval fast-path logic in chatDataStream.

  • Timestamp: 12:57

Approval response for unknown conversation – Failed
  • Where: POST /run/api/chat approval-response handling for unknown conversation IDs

  • Steps to reproduce: Submit approval-response content with a nonexistent conversationId.

  • What failed: Instead of rejecting as unknown conversation, the request reached execution flow and returned a successful stream response.

  • Code analysis: The same approval detection gap (only scanning parts) allows approval-response content in other fields to bypass the getConversation(...) existence check and durable approval path.

  • Relevant code:

    agents-api/src/domains/run/routes/chatDataStream.ts (lines 168-176)

    const conversation = await getConversation(runDbClient)({
      scopes: { tenantId, projectId },
      conversationId,
    });
    
    if (!conversation) {
      return c.json({ success: false, error: 'Conversation not found' }, 404);
    }

    agents-api/src/domains/run/routes/chatDataStream.ts (lines 147-149)

    const approvalParts = (body.messages || [])
      .flatMap((m: any) => m?.parts || [])
      .filter((p: any) => p?.state === 'approval-responded' && typeof p?.toolCallId === 'string');
  • Why this is likely a bug: Unknown-conversation approval requests are intended to be rejected, but valid-looking approval payload shapes can bypass that guard.

  • Introduced by this PR: Yes – this PR added the new approval-response fast path and its current parsing logic.

  • Timestamp: 12:57

Rapid double-submit create durable execution – Failed
  • Where: POST /run/api/executions conversation creation path under concurrent duplicate submits

  • Steps to reproduce: Send multiple concurrent create requests with the same conversationId.

  • What failed: One request returned HTTP 500 due to a failing conversation insert while other concurrent requests succeeded.

  • Code analysis: Conversation creation uses a read-then-insert pattern without atomic upsert/conflict handling. Under concurrent calls, two requests can both miss the existing row and then race into duplicate insert attempts.

  • Relevant code:

    packages/agents-core/src/data-access/runtime/conversations.ts (lines 166-184)

    if (input.id) {
      const existing = await db.query.conversations.findFirst({
        where: and(eq(conversations.tenantId, input.tenantId), eq(conversations.id, input.id)),
      });
    
      if (existing) {
        if (existing.activeSubAgentId !== input.activeSubAgentId) {
          await db
            .update(conversations)
            .set({
              activeSubAgentId: input.activeSubAgentId,
              updatedAt: new Date().toISOString(),
            })

    packages/agents-core/src/data-access/runtime/conversations.ts (lines 202-203)

    await db.insert(conversations).values(newConversation);
    return newConversation;

    agents-api/src/domains/run/routes/executions.ts (lines 231-238)

    await createOrGetConversation(runDbClient)({
      tenantId,
      projectId,
      id: conversationId,
      agentId,
      activeSubAgentId,
      ref: resolvedRef,
    });
  • Why this is likely a bug: The non-atomic create-or-get implementation is vulnerable to concurrent duplicate inserts, which directly explains the observed 500 on retry storms.

  • Introduced by this PR: Yes – this PR introduced the new durable execution entrypoint that triggers this race-prone path concurrently.

  • Timestamp: 23:32

Idempotency of duplicate approval submissions – Failed
  • Where: POST /run/api/executions/:executionId/approvals/:toolCallId

  • Steps to reproduce: Submit duplicate approvals for an execution that is already terminal or no longer has that pending tool call.

  • What failed: Endpoint returned {"success":true} repeatedly without confirming execution state or pending approval validity.

  • Code analysis: The handler validates only execution existence, then suppresses hook not found/already errors and still returns success. There is no explicit guard for non-suspended execution states or absent pending approvals.

  • Relevant code:

    agents-api/src/domains/run/routes/executions.ts (lines 361-368)

    const execution = await getWorkflowExecution(runDbClient)({
      tenantId,
      projectId,
      id: executionId,
    });
    if (!execution) {
      throw createApiError({ code: 'not_found', message: 'Execution not found' });
    }

    agents-api/src/domains/run/routes/executions.ts (lines 377-385)

    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      if (!message.includes('not found') && !message.includes('already')) {
        throw error;
      }
      logger.warn(
        { executionId, toolCallId, message },
        'Tool approval already processed (idempotent)'
      );
    }
  • Why this is likely a bug: Returning unconditional success for non-actionable approval submissions hides invalid state transitions and breaks approval-path contract guarantees.

  • Introduced by this PR: Yes – this PR added the approvals route and current permissive error handling.

  • Timestamp: 23:32

Forged approval with mismatched IDs – Failed
  • Where: POST /run/api/executions/:executionId/approvals/:toolCallId forged binding checks

  • Steps to reproduce: Call approvals endpoint with an existing execution ID but a mismatched/nonexistent tool call ID.

  • What failed: Requests returned success for invalid approval bindings instead of explicit rejection.

  • Code analysis: Token construction trusts URL params, and hook resume failures containing not found/already are swallowed, causing successful response even when no valid approval binding exists.

  • Relevant code:

    agents-api/src/domains/run/routes/executions.ts (lines 370-376)

    const token = `tool-approval:${execution.conversationId}:${executionId}:${toolCallId}`;
    
    try {
      await toolApprovalHook.resume(token, {
        approved,
        reason: approved ? undefined : reason,
      });

    agents-api/src/domains/run/routes/executions.ts (lines 377-385)

    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      if (!message.includes('not found') && !message.includes('already')) {
        throw error;
      }
      logger.warn(
        { executionId, toolCallId, message },
        'Tool approval already processed (idempotent)'
      );
    }
  • Why this is likely a bug: Invalid execution/tool-call bindings should be rejected; silent success allows forged approval attempts to appear accepted.

  • Introduced by this PR: Yes – this PR introduced this durable approvals endpoint behavior.

  • Timestamp: 23:32

Reconnect probing with invalid start index values – Failed
  • Where: GET /run/api/executions/:executionId/stream reconnect start index parsing

  • Steps to reproduce: Send invalid x-stream-start-index values (NaN, negative, very large) and then retry valid reconnect.

  • What failed: Invalid probes were not rejected up front; route attempted reconnect with unchecked index values, matching observed reconnect instability under probing.

  • Code analysis: Header is parsed with parseInt but never validated for finite, non-negative, bounded values before passing into run.getReadable({ startIndex }).

  • Relevant code:

    agents-api/src/domains/run/routes/executions.ts (lines 321-322)

    const startIndexHeader = c.req.header('x-stream-start-index');
    const startIndex = startIndexHeader ? Number.parseInt(startIndexHeader, 10) : 0;

    agents-api/src/domains/run/routes/executions.ts (lines 339-342)

    return stream(c, async (s) => {
      try {
        const readable = run.getReadable({ startIndex });
        const reader = readable.getReader();
  • Why this is likely a bug: Missing defensive validation on reconnect cursor input can push invalid values into stream internals and destabilize reconnect behavior.

  • Introduced by this PR: Yes – this PR added the new reconnect endpoint and current start-index handling.

  • Timestamp: 12:57

📋 View Recording

Screen Recording

@pullfrog
Copy link
Contributor

pullfrog bot commented Mar 16, 2026

TL;DR — Adds a new "durable" execution mode for agents, backed by the WDK (Workflow Development Kit). When an agent is configured with executionMode: 'durable', its runs are orchestrated as persistent, crash-recoverable workflows where tool approvals suspend execution to disk — allowing clients to disconnect and reconnect without losing state. A new workflow_executions runtime table, new /api/executions REST routes, and a UI dropdown for selecting execution mode are introduced alongside the existing "classic" in-memory path.

Key changes

  • executionMode agent column + schema — Adds execution_mode varchar column to the manage agent table (defaulting to 'classic'), with corresponding Zod validation, OpenAPI spec, DAL, and UI serialization updates.
  • workflow_executions runtime table — New runtime table tracking durable execution state (runningsuspendedcompleted/failed) with CRUD data-access functions.
  • agentExecutionWorkflow + agentExecutionSteps — New WDK workflow function that orchestrates LLM generation → tool approval suspension → tool execution → completion as discrete durable steps, with a defineHook-based toolApprovalHook for external resume.
  • /api/executions route module — New REST endpoints: POST /executions (create), GET /executions/:id (status), GET /executions/:id/stream (reconnect SSE), POST /executions/:id/approvals/:toolCallId (approve/deny).
  • Durable branches in chat.ts and chatDataStream.ts — Existing chat routes detect executionMode === 'durable' and delegate to workflow/api.start() instead of the classic in-memory path; the data stream route handles suspended-execution resume via toolApprovalHook.resume().
  • Durable tool approval flow in tool-approval.ts — When durableWorkflowRunId is set, tool approval short-circuits: pre-approved decisions are consumed from ctx.approvedToolCalls, otherwise pendingDurableApproval is set and { approved: 'pending' } is returned to halt the AI SDK loop.
  • tool-wrapper.ts effectiveToolCallId mapping — Pre-approved tool calls reuse the originalToolCallId from a prior round so the stream shows consistent tool call IDs across approval boundaries.
  • WritableBackedHonoSSEStream / WritableBackedVercelWriter — Durable stream adapters that write SSE events into a WDK WritableStream so events are persisted and clients can getReadable() to reconnect.
  • stream-registry.ts globalThis singleton — Registry now uses globalThis instead of module-scoped Map so the WDK workflow bundle and main app share the same stream helper instances.
  • SSEStreamHelper.writeDone() removal — The [DONE] sentinel is no longer written by complete(); the stream simply closes after the finish-reason event.
  • generate.ts schemaOnlyTools option — New option strips execute from tool definitions, giving the LLM tool schemas without actually executing — used by the durable path to get tool-call decisions without side effects.
  • Durable execution timeout constantsdefaults.ts adds four constants (LLM 60 min, MCP tool 30 min, function tool 10 min, tool approval 30 min).
  • build-workflow.ts ?raw import inlining — Pre-processes Vite ?raw imports into string literals before the WDK esbuild build (and restores after), working around esbuild's lack of Vite query suffix support.
  • Manage UI execution mode selector — Adds a <Select> dropdown ("Classic" / "Durable") to the agent metadata editor sidebar.

Summary | 48 files | base: maindurable-execution


agentExecutionWorkflow — durable execution orchestrator

Before: All agent execution ran in-memory within a single HTTP request; a crash or disconnect lost all state.
After: Agents configured with executionMode: 'durable' run inside a WDK workflow with discrete steps — each LLM call, tool execution, and status transition is persisted and recoverable.

The workflow loop (_agentExecutionWorkflow) iterates through LLM calls and tool approvals: initializeTaskStepcallLlmStep → (if tool needs approval) markWorkflowSuspendedSteptoolApprovalHook.create() (suspends) → external resume()markWorkflowResumingStepexecuteToolStep → loop. Each function uses 'use step' / 'use workflow' directives for WDK durability. On completion or failure, dedicated steps update the workflow_executions row.

buildAgentForStep reconstructs the full Agent instance from project/agent config on each step — necessary because durable steps may execute on different processes after recovery.

How does the tool approval hook work?

toolApprovalHook is a WDK defineHook with a deterministic token format: tool-approval:${conversationId}:${workflowRunId}:${toolCallId}. When the workflow reaches a tool needing approval, it creates a hook and awaits it — the WDK suspends the workflow to disk. External callers (the /executions/:id/approvals/:toolCallId endpoint or the chatDataStream approval handler) call toolApprovalHook.resume(token, { approved, reason }) to wake the workflow.

agentExecution.ts · agentExecutionSteps.ts · executionHandler.ts


/api/executions routes and durable chat integration

Before: No dedicated execution management API; tool approvals were purely in-memory via PendingToolApprovalManager.
After: Four new endpoints manage durable executions end-to-end, and both chat.ts and chatDataStream.ts route durable agents through the WDK.

Endpoint Purpose
POST /api/executions Start a durable execution, returns SSE stream + x-workflow-run-id header
GET /api/executions/:id Get execution status (running/suspended/completed/failed)
GET /api/executions/:id/stream Reconnect to an existing execution's SSE stream (supports x-stream-start-index)
POST /api/executions/:id/approvals/:toolCallId Approve or deny a suspended tool call

The existing /v1/chat/completions and /api/chat routes check agent.executionMode and, when 'durable', call workflow/api.start(agentExecutionWorkflow, [...]) instead of running the classic ExecutionHandler. The data stream handler also detects suspended durable executions on the conversation and routes approval parts through toolApprovalHook.resume().

executions.ts · chat.ts · chatDataStream.ts


Durable tool approval flow

Before: Tool approval always waited in-memory via PendingToolApprovalManager, blocking the HTTP connection.
After: When durableWorkflowRunId is set, approval decisions are either consumed from pre-approved ctx.approvedToolCalls (replaying prior decisions) or the flow returns { approved: 'pending' }, setting ctx.pendingDurableApproval to signal the caller to suspend the workflow.

The parseAndCheckApproval return type gains a pendingApproval?: true flag. Both function-tools.ts and mcp-tools.ts check for it and return null (halting execution). tool-wrapper.ts introduces effectiveToolCallId — when a pre-approved entry carries an originalToolCallId, stream events use that ID to maintain consistency with the original approval round.

tool-approval.ts · tool-wrapper.ts · generate.ts


workflow_executions table and executionMode schema

Before: No persistent tracking of workflow execution state; no executionMode property on agents.
After: A new workflow_executions runtime table with (tenant_id, project_id, id) PK and a conversation index tracks each durable run's lifecycle. The manage agent table gains an execution_mode column defaulting to 'classic'.

The runtime migration (0022/0023) creates the table; the manage migration (0013) adds the column. Zod schemas (WorkflowExecutionSelectSchema, WorkflowExecutionInsertSchema, WorkflowExecutionUpdateSchema) and entity types are added. The AgentInsertSchema gains executionMode: z.enum(['classic', 'durable']).optional().

runtime-schema.ts · manage-schema.ts · workflowExecutions.ts · schemas.ts


Durable stream adapters and globalThis registry

Before: Stream helpers were module-scoped Map singletons; no adapter existed for writing SSE events to a WritableStream.
After: WritableBackedHonoSSEStream and WritableBackedVercelWriter bridge the SSE / Vercel data stream formats to WDK's WritableStream<Uint8Array>, and the stream registry uses globalThis so the WDK bundle and main app share one instance.

The SSEStreamHelper.writeDone() method is removed — complete() no longer emits the [DONE] sentinel, as the WDK stream closing signals end-of-stream.

durable-stream-helper.ts · stream-registry.ts · stream-helpers.ts

Pullfrog  | View workflow run | Using Claude Code | Triggered by Pullfrogpullfrog.com𝕏

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(3) Total Issues | Risk: Medium

Re-review scope: This review covers only the delta changes since commit cb73749129ee — 3 commits (ab733eb4a, fa68e4d14, 4ed6f5df5) affecting 10 files. The prior review was an APPROVE after all issues were fixed.

🔴❗ Critical (0) ❗🔴

None.

🟠⚠️ Major (2) 🟠⚠️

Inline Comments:

  • 🟠 Major: agentExecutionSteps.ts:789 Stream always releaseLock() but never close() in executeToolStep — potential stream lifecycle inconsistency
  • 🟠 Major: build-workflow.ts:54-61 File restoration can fail partway through, leaving inconsistent state

🟡 Minor (1) 🟡

Inline Comments:

  • 🟡 Minor: agentExecutionSteps.ts:134-148 Silent catch blocks suppress configuration errors

💭 Consider (1) 💭

Inline Comments:

  • 💭 Consider: agentExecutionSteps.ts:381 Test coverage for new step functions (~500 LOC with no unit tests)

🧹 While You're Here (0) 🧹

None identified.


Delta Analysis

The delta introduces a significant architectural refactor of the workflow execution model:

Before (monolithic step):

runAgentExecutionStep → returns needs_approval or completed

After (granular steps):

initializeTaskStep → creates task, resolves subagent
callLlmStep → LLM generation, returns completion/transfer/tool_calls
executeToolStep → executes single tool after approval

Key observations:

  1. Per-call approval loop: The workflow now iterates per-tool-call within llmResult.toolCalls (lines 70-102 in agentExecution.ts) instead of per-turn. This enables finer-grained crash recovery.

  2. State reconstruction: buildAgentForStep() reconstructs the full Agent on each step invocation. This is necessary for WDK but adds latency (5+ DB queries per step). This is an acceptable tradeoff for durability.

  3. Stream lifecycle: The conditional close() vs releaseLock() pattern in callLlmStep is sound, but executeToolStep always uses releaseLock(). Verify this doesn't cause client issues when workflow completes after tool execution.

  4. Removed [DONE] message: The writeDone() call was removed from SSEStreamHelper.complete(). This is intentional per commit fa68e4d14 but may affect clients that specifically watch for data: [DONE].

Prior Feedback Status ✅

All issues from the previous 4 reviews have been addressed. The prior review at commit cb7374912 was an APPROVE. This delta introduces new Major issues that should be addressed.


💡 APPROVE WITH SUGGESTIONS

Summary: The per-call step architecture is a sound design choice for durable execution with finer-grained crash recovery. The two Major issues identified are straightforward to address: (1) verify the stream lifecycle in executeToolStep doesn't cause client issues, and (2) make the build script's file restoration more robust. Neither blocks the PR, but both should be addressed before production rollout of durable execution to production workloads. The test coverage gap is noted but not blocking given this is a new feature that likely has integration test coverage planned.


Discarded (5)
Location Issue Reason Discarded
agentExecution.ts:90-101 executeToolStep return value ignored The needs_approval return path appears unreachable in the current architecture — tool execution after approval shouldn't trigger another approval. This is defensive code or future-proofing, not a bug.
agentExecutionSteps.ts:172-239 Agent reconstruction adds latency Intentional tradeoff for WDK step isolation and crash recovery. Not a defect.
agentExecutionSteps.ts:458 Session created before try block The session creation is unlikely to fail, and the stream helper is unregistered in finally block. Minor resource leak risk on error path but not actionable.
stream-helpers.ts:380-387 [DONE] removal Intentional per commit message. Clients should watch for finish_reason: 'stop' not [DONE].
build-workflow.ts:32 Regex only matches single quotes Current codebase only uses single quotes for imports. The regex is correct for actual usage.
Reviewers (5)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 1 0 0 0 0 0 1
pr-review-architecture 3 0 0 0 0 0 3
pr-review-sre 4 0 0 0 2 0 2
pr-review-errors 2 0 0 0 1 0 1
pr-review-tests 5 0 1 0 0 0 4
Total 15 0 1 0 3 0 11

Note: Main findings were all routed as inline comments due to their localized, single-file nature with clear fix suggestions.

await agentSessionManager.endSession(requestId);
unregisterStreamHelper(requestId);
await agent.cleanup();
closeable.releaseLock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Stream always releaseLock() but never close() in executeToolStep

Issue: Unlike callLlmStep which conditionally calls close() for terminal states vs releaseLock() for non-terminal states (lines 640-645), executeToolStep always calls releaseLock() regardless of outcome. This asymmetry could leave streams in an inconsistent state.

Why: If tool execution is the final step before workflow completion (after the for-loop in agentExecution.ts:70-102), the stream should be closed properly. The current code releases the lock but doesn't signal stream end, potentially causing clients to wait indefinitely for more data.

Fix: Consider whether executeToolStep should distinguish terminal vs non-terminal outcomes. Looking at the workflow, executeToolStep is followed by either more tool executions (continue the for-loop) or a callLlmStep (next iteration), so releaseLock() may be correct. However, verify this doesn't cause issues when the workflow completes after the last tool execution.

Refs:

Comment on lines +54 to +61
try {
await fn();
} finally {
for (const [filePath, original] of originals) {
await writeFile(filePath, original, 'utf-8');
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: File restoration can fail partway through, leaving inconsistent state

Issue: If writeFile throws during restoration (disk full, permissions, file lock), the loop stops immediately. Files before the failure are restored, files after remain patched. The error is thrown but the partial restoration creates an inconsistent file system state that's hard to diagnose.

Why: A developer or CI run won't know which files are corrupted until TypeScript compilation fails on a subsequent run. git status will show unexpected changes to multiple files, obscuring the actual cause (restoration failure).

Fix: Consider collecting errors and restoring all files before reporting:

finally {
  const errors: Array<{ path: string; error: unknown }> = [];
  for (const [filePath, original] of originals) {
    try {
      await writeFile(filePath, original, 'utf-8');
    } catch (error) {
      errors.push({ path: filePath, error });
    }
  }
  if (errors.length > 0) {
    console.error('Failed to restore files:', errors.map(e => e.path));
    // Optionally throw after all restoration attempts
  }
}

Refs:

Comment on lines +134 to +148
const enhancedInternalRelations = allInternalRelations.map((relation) => {
try {
return enhanceInternalRelation({ relation, agent: agentEntry, project });
} catch {
return relation;
}
});

const enhancedTeamRelations = teamRelations.map((relation) => {
try {
return enhanceTeamRelation({ relation, project });
} catch {
return relation;
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Silent catch blocks suppress potentially important configuration errors

Issue: The catch {} blocks at lines 137 and 145 silently swallow errors from enhanceInternalRelation and enhanceTeamRelation, falling back to the original relation with no logging.

Why: If relation enhancement fails due to misconfigured agents or missing data, the agent silently uses degraded relation data. Debugging "why is my agent not using the right description" becomes difficult with no indication that enhancement failed.

Fix: Add debug-level logging when enhancement fails:

const enhancedInternalRelations = allInternalRelations.map((relation) => {
  try {
    return enhanceInternalRelation({ relation, agent: agentEntry, project });
  } catch (error) {
    logger.debug(
      { relationId: relation.id, error: error instanceof Error ? error.message : error },
      'Failed to enhance internal relation, using original'
    );
    return relation;
  }
});

Refs:

return { taskId, defaultSubAgentId, maxTransfers };
}

export async function callLlmStep(params: CallLlmStepParams): Promise<CallLlmResult> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 Consider: Test coverage for new step functions

Issue: The three new step functions (initializeTaskStep, callLlmStep, executeToolStep) have ~500 lines of complex logic with multiple code paths but no direct unit tests.

Why: Each function has distinct outcomes:

  • initializeTaskStep: handles unique constraint errors for idempotency
  • callLlmStep: returns completion, transfer, or tool_calls
  • executeToolStep: returns completed or needs_approval

A bug in any of these paths could cause incorrect workflow behavior, and without tests, regressions won't be caught until production.

Fix: Consider adding unit tests that mock buildAgentForStep, agent.generate(), and database calls to verify each outcome path. At minimum, integration tests covering the durable approval flow would provide confidence.

Refs:

@github-actions github-actions bot deleted a comment from claude bot Mar 16, 2026
@itoqa
Copy link

itoqa bot commented Mar 16, 2026

Ito Test Report ✅

9 test cases ran. 9 passed.

✅ The verified scenarios show durable headers, execution-mode UI persistence, and header-hardening behavior working as expected in this run. Code review of non-passed outcomes did not provide sufficient production-code evidence to elevate them as confirmed defects under the code-first bar.

✅ Passed (9)
Test Case Summary Timestamp Screenshot
ROUTE-2 Execution status endpoint returned HTTP 200 with required schema fields and a valid lifecycle enum value. 0:00 ROUTE-2_0-00.png
ROUTE-5 After adding required model field, /run/v1/chat/completions returned durable SSE headers with a non-empty workflow run id and no fallback-path error response. 10:18 ROUTE-5_10-18.png
SCREEN-1 Execution mode section and interactive Mode selector are present in agent metadata pane. 15:57 SCREEN-1_15-57.png
SCREEN-2 Mode changes persisted after repeated save and reload cycles in both directions. 18:28 SCREEN-2_18-28.png
EDGE-4 Timezone-only and invalid timestamp header combinations were handled with successful durable responses, indicating robust pair handling without service instability. 10:18 EDGE-4_10-18.png
EDGE-5 When creating a new agent without explicitly setting mode, metadata resolves to Classic; API attempt lacked bypass auth but UI fallback confirmed default behavior. 18:42 EDGE-5_18-42.png
ADV-3 Execution succeeded under explicit local test context and metadata remained scoped to agent batch-agent; forwarded header spoof payload did not demonstrate context escalation in captured responses. 30:46 ADV-3_30-46.png
ADV-4 Header fuzzing with script and newline-style payloads produced controlled responses and stable service behavior without observed payload reflection. 10:18 ADV-4_10-18.png
MOBILE-1 At 390x844, metadata controls remained usable and execution mode state persisted after reload. 20:00 MOBILE-1_20-00.png
📋 View Recording

Screen Recording

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant