Skip to content

fix(hitl): fix stream endpoint, pause persistence, and resume page#3995

Merged
icecrasher321 merged 8 commits intostagingfrom
fix/hitl-stream
Apr 6, 2026
Merged

fix(hitl): fix stream endpoint, pause persistence, and resume page#3995
icecrasher321 merged 8 commits intostagingfrom
fix/hitl-stream

Conversation

@waleedlatif1
Copy link
Copy Markdown
Collaborator

@waleedlatif1 waleedlatif1 commented Apr 6, 2026

Summary

  • Fix HITL stream endpoint to correctly handle paused workflow execution
  • Fix HITL pause persistence so paused state survives server restarts
  • Fix /stream endpoint to block API key usage (only session auth allowed)
  • Resume page cleanup for better UX

HITL fixes

  • fix(hitl): fix hitl stream — corrected streaming logic for human-in-the-loop blocks
  • fix(hitl): fix hitl pause persistence — paused state now persists correctly
  • fix(stream): block api key usage on /stream endpoint — stream endpoint now requires session auth
  • chore(resume): resume page cleanup

Type of Change

  • Bug fix

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link
Copy Markdown

vercel bot commented Apr 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Apr 6, 2026 11:04pm

Request Review

@cursor
Copy link
Copy Markdown

cursor bot commented Apr 6, 2026

PR Summary

Medium Risk
Touches workflow execution lifecycle (pause persistence, SSE eventing, resume execution behavior) and changes auth requirements on the execution stream endpoint, so regressions could impact HITL reliability and observability.

Overview
Fixes HITL execution lifecycle handling so paused runs are persisted consistently and stream consumers can distinguish paused from completed.

Execution callers are refactored to share a new handlePostExecutionPauseState helper, resume executions now write buffered SSE events (including a new execution:paused terminal event) and can return synchronous results for API-key callers, and /executions/{id}/stream is tightened to session-only auth (no API keys). The resume portal UI is cleaned up with better value previews, refresh UX, and input editing tweaks; docs are updated to the new resume endpoint and stream polling flow.

Reviewed by Cursor Bugbot for commit e0147eb. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 6, 2026

Greptile Summary

This PR fixes three HITL subsystem bugs: (1) refactors duplicated pause-persistence logic into a shared handlePostExecutionPauseState utility used across all 6 execution paths, (2) adds SSE event-writer callbacks to the resume execution path so clients can stream progress via the reconnect endpoint, and (3) restricts the /stream endpoint to session-only auth. The refactoring is clean and the stream auth fix is correct. One logic issue exists in the new resume SSE path: when a resumed workflow hits a second HITL block, result.status === 'paused' falls through to the else branch, emitting an execution:completed event and setting execution meta to 'complete', which causes the stream endpoint to close prematurely and mislead clients about the execution state.

  • pause-persistence.ts — new utility centralising pause/resume queue handling across 6 callers; clean extraction with no issues
  • human-in-the-loop-manager.ts — adds streaming callbacks to resume execution; the 'paused' re-pause branch is not handled, causing an incorrect execution:completed event
  • stream/route.ts — session-only auth enforced correctly; API key callers now receive 401
  • resume-page-client.tsxrenderStructuredValuePreview added; uses any type and inline styles against project conventions

Confidence Score: 4/5

Safe to merge after addressing the P1 re-pause logic gap — the stream endpoint closes early and misleads clients when a resumed workflow hits a second HITL block

Score of 4 reflects one P1 finding: when a resumed execution returns status 'paused', the new SSE path writes execution:completed and marks meta as 'complete', causing the stream to close prematurely. This is a real production scenario for any workflow with chained HITL blocks. The two P2 findings (any type, inline styles) are project-convention violations with no runtime impact. All other changes — the pause-persistence extraction, stream auth fix, and background job refactors — are correct and well-structured.

Focus on apps/sim/lib/workflows/executor/human-in-the-loop-manager.ts around the finalMetaStatus/else branch (lines 748-810) for the re-pause handling gap

Important Files Changed

Filename Overview
apps/sim/lib/workflows/executor/pause-persistence.ts New shared utility cleanly extracting post-execution pause handling; well-documented, correct branching, no issues
apps/sim/lib/workflows/executor/human-in-the-loop-manager.ts SSE callbacks and event writer added to resume execution; re-pause path emits incorrect 'completed' event and sets terminal meta status
apps/sim/app/api/workflows/[id]/executions/[executionId]/stream/route.ts Session-only auth enforcement is correct; polling loop and terminal status detection are sound
apps/sim/app/resume/[workflowId]/[executionId]/resume-page-client.tsx UI improvements are solid; renderStructuredValuePreview uses 'any' type and inline styles against project conventions
apps/sim/app/api/resume/[workflowId]/[executionId]/[contextId]/route.ts Pre-emptive setExecutionMeta before fire-and-forget startResumeExecution ensures stream availability; correct approach
apps/sim/background/schedule-execution.ts Mechanical refactor delegating to handlePostExecutionPauseState; logically equivalent to removed inline code
apps/sim/background/webhook-execution.ts Mechanical refactor delegating to handlePostExecutionPauseState; logically equivalent to removed inline code
apps/sim/background/workflow-execution.ts Mechanical refactor delegating to handlePostExecutionPauseState; logically equivalent to removed inline code
apps/sim/lib/workflows/executor/execute-workflow.ts Refactored to use shared pause persistence helper; duplicate pause-handling blocks correctly removed
apps/sim/lib/workflows/executor/queued-workflow-execution.ts Refactored to use shared pause persistence helper; error handling now consistent with all other callers
apps/docs/content/docs/en/blocks/human-in-the-loop.mdx Documentation updated to reflect new resume endpoint path, request shape, and SSE polling instructions; accurate

Sequence Diagram

sequenceDiagram
    participant C as Client
    participant SR as stream/route.ts
    participant RR as resume/route.ts
    participant HM as human-in-the-loop-manager
    participant EW as ExecutionEventWriter
    participant EC as executeWorkflowCore

    C->>SR: GET /stream?from=N (session auth only)
    SR->>SR: getSession() → authorizeWorkflow
    SR->>EW: readExecutionEvents(executionId, N)
    SR-->>C: SSE events stream (polling loop)

    C->>RR: POST /resume/:workflowId/:executionId/:contextId
    RR->>RR: getSession() → authorize
    RR->>EW: setExecutionMeta(resumeExecId, active)
    RR->>HM: startResumeExecution() [fire-and-forget]
    RR-->>C: 200 { executionId: resumeExecId }

    HM->>EW: setExecutionMeta(active)
    HM->>EW: write execution:started
    HM->>EC: executeWorkflowCore(resumeSnapshot, callbacks)
    EC->>HM: onBlockStart → write block:started
    EC->>HM: onStream → write stream:chunk / stream:done
    EC->>HM: onBlockComplete → write block:completed
    EC-->>HM: result

    alt result.status = completed/cancelled
        HM->>EW: write execution:completed or execution:cancelled
        HM->>EW: setExecutionMeta(complete/cancelled)
        SR->>SR: isTerminalStatus → send [DONE]
        SR-->>C: data: [DONE]
    else result.status = paused (P1 BUG)
        HM->>EW: write execution:completed (WRONG)
        HM->>EW: setExecutionMeta(complete) (WRONG)
        SR->>SR: isTerminalStatus('complete') = true
        SR-->>C: data: [DONE] — client misled, workflow still paused
    end
Loading

Comments Outside Diff (2)

  1. apps/sim/lib/workflows/executor/human-in-the-loop-manager.ts, line 748-772 (link)

    P1 Re-paused execution incorrectly emits execution:completed and marks stream as terminal

    The finalMetaStatus type is 'complete' | 'error' | 'cancelled' — there is no 'paused' slot. When a resumed execution hits a second HITL block and returns result.status === 'paused', it falls through to the else branch (lines 757–772), which writes an execution:completed SSE event and leaves finalMetaStatus = 'complete'. The finally block then calls setExecutionMeta(resumeExecutionId, { status: 'complete' }).

    The stream endpoint's isTerminalStatus check — status === 'complete' || status === 'error' || status === 'cancelled' — sees 'complete' and closes the SSE connection. The client receives data: [DONE] thinking the execution is finished, while the workflow is actually paused waiting for another human action. Any downstream subscriber relying on the stream to know when to call the resume endpoint will miss the second pause entirely.

    Add an explicit guard before the else branch:

    } else if (result.status === 'paused') {
      writeBufferedEvent({
        type: 'execution:paused',
        timestamp: new Date().toISOString(),
        executionId: resumeExecutionId,
        workflowId,
        data: { pausePoints: result.pausePoints ?? [] },
      } as ExecutionEvent)
      // finalMetaStatus stays 'complete' — this sub-execution is finished;
      // the parent workflow pause is persisted separately via handlePostExecutionPauseState
    } else {

    This prevents the misleading execution:completed event from being written when the workflow has only re-paused.

  2. apps/sim/app/resume/[workflowId]/[executionId]/resume-page-client.tsx, line 205-237 (link)

    P2 Inline style={{...}} objects should be replaced with Tailwind classes

    The project's styling rule is "No inline styles — Use Tailwind classes." The new renderStructuredValuePreview function contains two JSX elements with inline style objects covering ~12 CSS properties (fontSize, color, display, maxWidth, borderRadius, border, background, padding, whiteSpace, wordBreak, fontFamily, lineHeight). These should be Tailwind utility classes:

    // null / undefined span
    <span className="text-[12px] text-muted-foreground"></span>
    
    // object code viewer wrapper
    <div className="min-w-[220px]">
      <Code.Viewer code={JSON.stringify(value, null, 2)} language="json" wrapText className="max-h-[220px]" />
    </div>
    
    // string / primitive wrapper
    <div className="inline-flex max-w-full rounded-[6px] border border-border bg-[var(--surface-5)] px-2 py-1 whitespace-pre-wrap break-words font-mono text-[12px] leading-4 text-[var(--text-primary)]">
      {stringValue}
    </div>

    CSS variable tokens that lack a Tailwind alias can use bracket notation (bg-[var(--surface-5)]). Static layout and typography properties must be in Tailwind classes.

    Context Used: Tailwind CSS and styling conventions (source)

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Reviews (1): Last reviewed commit: "resume page cleanup" | Re-trigger Greptile

@icecrasher321
Copy link
Copy Markdown
Collaborator

bugbot run

@icecrasher321
Copy link
Copy Markdown
Collaborator

bugbot run

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit e0147eb. Configure here.

@icecrasher321 icecrasher321 merged commit 58571fe into staging Apr 6, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants