Skip to content

fix(execution): queued execution finalization and async correlation#3535

Merged
icecrasher321 merged 11 commits intosimstudioai:stagingfrom
PlaneInABottle:pr/phase-1-2-upstream-staging
Mar 13, 2026
Merged

fix(execution): queued execution finalization and async correlation#3535
icecrasher321 merged 11 commits intosimstudioai:stagingfrom
PlaneInABottle:pr/phase-1-2-upstream-staging

Conversation

@PlaneInABottle
Copy link

@PlaneInABottle PlaneInABottle commented Mar 12, 2026

Summary

This PR isolates the first two remediation steps from a larger webhook/async execution incident branch.

It does two things:

  1. makes queued execution finalization happen inside the core execution path instead of relying on wrapper-level recovery
  2. preserves request/execution correlation across queued async execution paths so background runs can be traced consistently

Problem

We were seeing cases where async/queued executions could finish logically, but terminal-state finalization was still dependent on detached follow-up behavior outside the core execution path.

That creates two operational problems:

  • an execution can do the actual work, but fail to persist a durable terminal state if the wrapper process is interrupted or recycled at the wrong time
  • webhook/schedule/workflow async paths do not consistently preserve request/execution correlation, which makes queued runs harder to trace and reconcile once they leave the initial request path

Root cause

1) Terminal-state finalization was not owned strongly enough by the core execution path

Some queued execution flows still depended on wrapper-side cleanup/recovery behavior to finish terminal logging/finalization. That means the execution engine and the durable terminal-state write were not fully coupled.

2) Async correlation was not propagated consistently across queued boundaries

Queued webhook/schedule/workflow execution paths did not always carry the same correlation information all the way through preprocessing, enqueueing, and background execution. That made it harder to connect:

  • the incoming request
  • the queued job
  • the final execution record

What changed

A. Finalize runs inside core execution

This PR moves terminal finalization responsibility into the core execution flow so the same code path that determines the execution outcome is also responsible for finalizing it durably.

Concretely, the changes ensure that:

  • terminal execution outcomes are finalized from core execution
  • wrapper-level duplicate/fallback finalization paths are reduced or guarded
  • timeout/cancellation cleanup is handled more predictably
  • logging/session behavior is covered by targeted regression tests

B. Preserve async correlation across queued execution paths

This PR also threads correlation data through queued execution paths so the same execution/request identity survives across:

  • workflow async execution
  • schedule async execution
  • webhook preprocessing
  • webhook background execution
  • Trigger.dev-backed async dispatch

Concretely, the changes add or preserve:

  • preassigned execution IDs where needed before queueing
  • request/execution correlation in preprocessing
  • correlation on queued payloads and metadata
  • regression coverage for async webhook/schedule/workflow correlation behavior

Files / surfaces affected

Main execution/finalization surfaces:

  • apps/sim/lib/workflows/executor/execution-core.ts
  • apps/sim/lib/logs/execution/logging-session.ts
  • apps/sim/background/workflow-execution.ts
  • apps/sim/background/webhook-execution.ts
  • apps/sim/background/schedule-execution.ts

Async correlation / queueing surfaces:

  • apps/sim/app/api/workflows/[id]/execute/route.ts
  • apps/sim/app/api/schedules/execute/route.ts
  • apps/sim/lib/webhooks/processor.ts
  • apps/sim/lib/execution/preprocessing.ts
  • apps/sim/lib/core/async-jobs/types.ts
  • apps/sim/lib/core/async-jobs/backends/trigger-dev.ts

Why this PR is intentionally scoped this way

This PR is intentionally limited to the first two fixes from the broader incident work:

  • finalization durability
  • async correlation preservation

It does not include the later follow-up work around:

  • richer stuck-execution diagnostics
  • progress markers / last-block visibility
  • stale cleanup classification and reconciliation
  • later review-driven hardening passes

That split is intentional so this PR stays focused on the minimum behavior changes needed to:

  • make terminal finalization more reliable for queued runs
  • make async runs traceable across queue boundaries

Testing

Targeted tests were rerun on this isolated branch:

Finalization / execution-core coverage

  • bun --cwd apps/sim vitest run lib/workflows/executor/execution-core.test.ts lib/logs/execution/logging-session.test.ts

Async correlation coverage

  • bun --cwd apps/sim vitest run background/async-execution-correlation.test.ts background/async-preprocessing-correlation.test.ts lib/execution/preprocessing.test.ts lib/execution/preprocessing.webhook-correlation.test.ts "app/api/workflows/[id]/execute/route.async.test.ts" "app/api/schedules/execute/route.test.ts" "app/api/webhooks/trigger/[path]/route.test.ts"

Results on this PR branch:

  • 9 test files passed
  • 34 tests passed

Reviewer notes

The large diff is mostly because the webhook execution / processor surfaces sit at the boundary where correlation has to be preserved end-to-end.

The intended review focus is:

  1. terminal-state ownership moving into core execution
  2. correlation propagation through preprocessing + queueing + background execution
  3. regression coverage matching those two behaviors

@cursor
Copy link

cursor bot commented Mar 12, 2026

PR Summary

Medium Risk
Changes core execution finalization and logging semantics, which can affect durability of execution records and failure handling across workflow/webhook/schedule paths. Correlation/threading changes touch multiple enqueue/background boundaries and could impact tracing and job metadata consumers.

Overview
Makes terminal execution finalization synchronous and owned by executeWorkflowCore. Post-run completion (success/paused/cancelled/error) is now awaited in-core, always clears cancellation via a safe helper, and introduces a guard (wasExecutionFinalizedByCore) to prevent wrapper layers from double-finalizing.

Threads a new AsyncExecutionCorrelation object across queued boundaries. Schedule, webhook, and async workflow API routes now preassign executionId/requestId, include correlation in payloads and job metadata, and background jobs + preprocessing/logging preserve and fall back for legacy payloads.

Hardens logging behavior. LoggingSession gains completion-attempt dedup/retry behavior and explicit pause/error semantics, and ExecutionLogger preserves correlation from start through completion; extensive new tests cover finalization ordering and correlation propagation.

Written by Cursor Bugbot for commit a72a671. Configure here.

@vercel
Copy link

vercel bot commented Mar 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Mar 13, 2026 7:34am

Request Review

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR fixes two related reliability issues in async execution paths: it moves execution finalization out of fire-and-forget wrappers into the core execution path so terminal state is durably written for webhook, schedule, and workflow runs, and it pre-assigns correlation identifiers at the point of enqueueing so background jobs can trace back to the originating request without generating new IDs.

Key changes:

  • execution-core.ts: finalizeExecutionOutcome and finalizeExecutionError replace the previous fire-and-forget void (async () => {...})() blocks with awaited calls. A module-level finalizedExecutionIds Set + markExecutionFinalizedByCore/wasExecutionFinalizedByCore prevent duplicate finalization in background wrappers.
  • logging-session.ts: Stores triggerData.correlation on the session and resets completionPromise to null on rejection (allowing retries after failure); buildCompletedExecutionData in logger.ts now preserves environment, trigger, and correlation from the start record through to the terminal record.
  • preprocessing.ts: Threads triggerData through all logPreprocessingError call-sites so preprocessing failure logs carry full correlation.
  • Background executors (schedule-execution.ts, workflow-execution.ts): Correctly add buildXxxCorrelation helpers, pass triggerData: { correlation } to both preprocessExecution and safeStart, and gate post-error finalization on wasExecutionFinalizedByCore.
  • webhook-execution.ts: Adds the same correlation propagation but is missing triggerData: { correlation } in its preprocessExecution call (unlike the other two executors), leaving preprocessing failure logs without correlation metadata. The file was also reformatted to double-quotes/semicolons, inconsistent with the rest of the codebase.

Confidence Score: 3/5

  • Safe to merge with two issues to address: a missing correlation argument in the webhook executor's preprocessing call and a quote-style inconsistency that will likely fail linting.
  • The core finalization logic is sound and a meaningful reliability improvement. The schedule and workflow paths are correctly and completely updated. The webhook path, however, has a gap — preprocessExecution is called without triggerData: { correlation }, breaking the correlation chain for webhook preprocessing failures in a way that's inconsistent with the other two executors. Additionally, webhook-execution.ts was reformatted to double-quotes with semicolons, which is inconsistent with the project's single-quote style and will fail linting. Both issues are straightforward to fix but should be resolved before merge.
  • Pay close attention to apps/sim/background/webhook-execution.ts — missing triggerData in preprocessExecution and inconsistent quote style throughout the file.

Important Files Changed

Filename Overview
apps/sim/lib/workflows/executor/execution-core.ts Core change: moves post-execution finalization from fire-and-forget to awaited calls, adds finalizeExecutionOutcome/finalizeExecutionError helpers, and introduces a module-level finalizedExecutionIds Set with potential unbounded growth in long-lived processes.
apps/sim/background/webhook-execution.ts Adds correlation propagation and wasExecutionFinalizedByCore guard; however preprocessExecution is called without triggerData: { correlation } (unlike schedule and workflow executors), and the entire file was reformatted to double-quotes/semicolons inconsistent with the codebase style.
apps/sim/background/schedule-execution.ts Correctly adds buildScheduleCorrelation, passes triggerData: { correlation } to preprocessExecution, adds an explicit safeStart with correlation, and gates post-error finalization on wasExecutionFinalizedByCore.
apps/sim/background/workflow-execution.ts Correctly adds buildWorkflowCorrelation, passes triggerData: { correlation } in both preprocessExecution and safeStart, and guards the catch block with wasExecutionFinalizedByCore.
apps/sim/lib/logs/execution/logging-session.ts Stores triggerData.correlation in a private field and now resets completionPromise to null on rejection — a design change that allows retried completions after failure, with a subtle risk of double-completion if internal guards aren't airtight.
apps/sim/lib/logs/execution/logger.ts Refactors completion logic into buildCompletedExecutionData, which now preserves environment, trigger, and correlation from the initial start log through to the final completion record. Clean and correct.
apps/sim/lib/execution/preprocessing.ts Threads triggerData through all error-logging paths so that preprocessing failures carry full correlation metadata when a LoggingSession error record is written.
apps/sim/lib/webhooks/processor.ts Adds a pre-assigned correlation object in checkWebhookPreprocessing and queueWebhookExecution, and includes it in the job metadata; largely reformatted to double-quotes which is inconsistent with codebase style.

Sequence Diagram

sequenceDiagram
    participant R as Route Handler<br/>(schedule/webhook/workflow)
    participant Q as Job Queue
    participant BG as Background Executor<br/>(schedule/webhook/workflow)
    participant PP as preprocessExecution
    participant LS as LoggingSession
    participant EC as executeWorkflowCore
    participant LOG as ExecutionLogger

    R->>R: generate executionId + correlation
    R->>Q: enqueue(payload { executionId, requestId, correlation }, metadata { correlation })
    Q-->>BG: deliver job

    BG->>BG: buildXxxCorrelation(payload)<br/>reuse or generate executionId/requestId
    BG->>LS: new LoggingSession(workflowId, executionId, triggerType, requestId)
    BG->>PP: preprocessExecution({ ..., triggerData: { correlation }, loggingSession })
    alt Preprocessing fails
        PP->>LS: logPreprocessingError({ ..., triggerData }) → safeCompleteWithError
        LS->>LOG: startWorkflowExecution + completeWithError<br/>(correlation preserved in executionData)
    else Preprocessing succeeds
        BG->>LS: safeStart({ triggerData: { correlation } })
        LS->>LOG: startWorkflowExecution<br/>executionData.correlation = triggerData.correlation
        BG->>EC: executeWorkflowCore({ snapshot, loggingSession })
        EC->>LS: safeStart (with metadata.correlation)
        alt Execution succeeds
            EC->>EC: finalizeExecutionOutcome()
            EC->>LS: safeComplete / safeCompleteWithCancellation / safeCompleteWithPause
            LS->>LOG: completeWorkflowExecution<br/>buildCompletedExecutionData preserves correlation
        else Execution throws
            EC->>EC: finalizeExecutionError()
            EC->>LS: safeCompleteWithError
            EC->>EC: markExecutionFinalizedByCore(error, executionId)
            EC-->>BG: rethrow error
            BG->>BG: wasExecutionFinalizedByCore(error, executionId) → true<br/>skip duplicate finalization
        end
    end
Loading

Last reviewed commit: 78db4ee

@icecrasher321
Copy link
Collaborator

icecrasher321 commented Mar 12, 2026

@PlaneInABottle I'll review this. But can you mark the bugbot/greptile comments as resolved or respond to them if they're inaccurate.

@icecrasher321 icecrasher321 self-requested a review March 12, 2026 20:50
@icecrasher321 icecrasher321 changed the title fix queued execution finalization and async correlation fix(execution): queued execution finalization and async correlation Mar 12, 2026
Let executeWorkflowCore own normal-path logging start so queued workflow and schedule executions persist the richer deployment and environment metadata instead of an earlier placeholder start record.
Prevent leaked core finalization markers from accumulating while keeping outer recovery paths idempotent. Preserve best-effort logging completion by reusing settled completion promises instead of reopening duplicate terminal writes.
Keep execution finalization cleanup best-effort so cancellation cleanup failures do not overwrite successful or failed outcomes. Restore webhook processor formatting to the repository Biome style to avoid noisy formatter churn.
Retry minimal logging for early failures, only mark core finalization after a log row actually completes, and let paused completions fall back cleanly.
PlaneInABottle and others added 2 commits March 13, 2026 10:19
Scan all finalized execution ids during TTL cleanup so refreshed keys cannot keep expired guards alive, and cover the reused-id ordering regression.
Allow error finalization to retry after a non-error completion and fallback both fail, and always persist failed/error semantics for completeWithError.
@PlaneInABottle PlaneInABottle force-pushed the pr/phase-1-2-upstream-staging branch from df55288 to ef63e9b Compare March 13, 2026 07:20
Thread preprocessing execution identity into queued webhook execution so both phases share the same correlation and logs.
@icecrasher321
Copy link
Collaborator

bugbot run

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Copy link
Collaborator

@icecrasher321 icecrasher321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clear requirement for threading through correlation. Preliminary testing works correctly. WIll continue testing on staging ennvironment with prod replica of trigger dev setup.

@icecrasher321 icecrasher321 merged commit 9229002 into simstudioai:staging Mar 13, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants