fix(cjson): santize bullmq payloads accurately by icecrasher321 · Pull Request #3892 · simstudioai/sim

icecrasher321 · 2026-04-02T01:17:05Z

Summary

Sanitize bullmq payloads since cjson within bullmq is treating empty arrays as objects (known issue).

Type of Change

Bug fix

Testing

Tested manually

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

cursor · 2026-04-02T01:17:10Z

PR Summary

Medium Risk
Touches async execution/dispatch paths (workflow/webhook/schedule) and Redis-backed queueing behavior; mistakes could break background executions or leave jobs unprocessed/over-retained. Changes are mostly defensive normalization and config simplification, but they affect production concurrency/retention settings.

Overview
Fixes BullMQ payload corruption caused by Redis Lua cjson round-tripping empty arrays as objects by adding sanitizeBullMQPayload/ensureArray and using them at worker/executor boundaries (e.g., queued-workflow-execution, workflow-execution, snapshot/selectedOutputs handling).

Simplifies async backend selection by removing shouldExecuteInline/shouldUseBullMQ and basing routing/inline execution decisions on isBullMQEnabled() plus isTriggerDevEnabled across workflow execution, schedules, and webhook polling.

Adjusts BullMQ operational settings: BullMQ is now enabled solely by presence of REDIS_URL (removing CONCURRENCY_CONTROL_ENABLED), queue job cleanup defaults are shortened and capped, workspace-dispatch Redis job TTL drops to 2h, the claim Lua script returns jobId and moves record status updates to TypeScript, and completed/failed dispatch records clear bullmqPayload to reduce retained payload size.

^{Written by Cursor Bugbot for commit 07716cd. Configure here.}

vercel · 2026-04-02T01:17:10Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
docs	Skipped		Apr 2, 2026 1:17am

greptile-apps · 2026-04-02T01:23:15Z

Greptile Summary

This PR fixes a known Lua cjson issue where BullMQ's internal Redis Lua scripts silently corrupt empty JSON arrays ([]) into empty objects ({}) when deserializing job payloads. The fix introduces a new sanitizeBullMQPayload utility that normalises all known array fields at the BullMQ worker boundary, and applies targeted Array.isArray guards at every point where those fields are consumed in the execution engine.

Alongside the core cjson fix, the PR also:

Simplifies BullMQ activation: removes the CONCURRENCY_CONTROL_ENABLED env-var gate so that BullMQ is now enabled whenever REDIS_URL is set. This is a potentially breaking change for self-hosted deployments that have Redis configured for non-BullMQ purposes (caching, sessions) without running BullMQ workers.
Refactors the Redis Lua claim script: the Lua script now returns only a jobId instead of the full record; the status: 'admitting' mutation is moved to TypeScript, while the atomicity-critical operations (lease addition and lane removal) remain in Lua.
Reduces job retention TTLs from 24 h – 7 days to 2 h across all queue types and the dispatch Redis store, cutting the post-failure investigation window.
Removes shouldExecuteInline / shouldUseBullMQ wrappers from async-jobs, replacing them with direct calls to isBullMQEnabled() and the isTriggerDevEnabled constant.

Key items to verify before merge:

Confirm that all production and self-hosted deployments that set REDIS_URL also have BullMQ workers deployed, or add documentation/guards for the new implicit activation behaviour.
Consider whether a 2-hour removeOnFail TTL is sufficient for on-call debugging and job recovery.
Add unit tests for sanitizeBullMQPayload to cover nested field paths and edge cases.

Confidence Score: 4/5

Safe to merge once the implicit BullMQ activation behaviour is confirmed intentional for all deployment targets.
The core cjson sanitization fix is correct and well-placed. The main concern is the removal of CONCURRENCY_CONTROL_ENABLED which makes BullMQ implicit for any REDIS_URL deployment — a P1 behavioural change that could silently break self-hosted setups without BullMQ workers. The TTL reduction is a P2 operational concern. All other changes (Lua refactor, wrapper removal, array guards) look correct.
apps/sim/lib/core/bullmq/connection.ts (implicit BullMQ activation), apps/sim/lib/core/bullmq/queues.ts and apps/sim/lib/core/workspace-dispatch/redis-store.ts (TTL reduction)

Important Files Changed

Filename	Overview
apps/sim/lib/core/utils/json-sanitize.ts	New utility that normalizes empty-array fields corrupted by Lua cjson round-tripping through BullMQ; logic is sound but lacks unit tests for the nested sanitization paths.
apps/sim/lib/core/bullmq/connection.ts	Removes the CONCURRENCY_CONTROL_ENABLED guard so BullMQ is now implicitly active whenever REDIS_URL is set — a potential breaking change for deployments with Redis but no BullMQ workers.
apps/sim/lib/core/bullmq/queues.ts	Uniformly reduces all job retention TTLs from 24h/7d to 2h for both completed and failed jobs, which cuts the debugging/recovery window significantly.
apps/sim/lib/core/workspace-dispatch/redis-store.ts	Significant Lua script refactor: the claim script now returns only jobId instead of the full record; status/lease mutation is moved to TypeScript, preserving atomic lane-removal + lease-add invariant. Edge case when the fetched record has already expired is handled gracefully.
apps/sim/lib/workflows/executor/queued-workflow-execution.ts	Correctly applies sanitizeBullMQPayload before constructing ExecutionSnapshot, ensuring cjson-corrupted empty arrays are restored prior to any execution logic.
apps/sim/background/workflow-execution.ts	Trigger.dev path uses ensureArray only for the top-level callChain field; full sanitizeBullMQPayload is intentionally omitted as Trigger.dev doesn't suffer from the cjson empty-array issue.
apps/sim/lib/webhooks/processor.ts	Polling webhook dispatch logic inverted to be explicit about isTriggerDevEnabled vs isBullMQEnabled paths; functional behaviour is preserved across all backend configurations.
apps/sim/lib/core/async-jobs/config.ts	Removes the shouldExecuteInline and shouldUseBullMQ wrapper functions; call sites now call isBullMQEnabled() and isTriggerDevEnabled directly, simplifying the indirection.

Sequence Diagram

sequenceDiagram
    participant R as Route/Webhook
    participant Q as WorkspaceDispatch
    participant B as BullMQ Worker
    participant S as sanitizeBullMQPayload
    participant E as ExecutionEngine

    R->>R: isBullMQEnabled()?<br/>(now: Boolean(REDIS_URL))
    alt BullMQ enabled
        R->>Q: enqueueWorkspaceDispatch(bullmqPayload)
        Q->>Q: Lua: atomically claim<br/>(add lease + remove from lane)<br/>returns jobId only
        Q->>Q: TypeScript: fetch record,<br/>update status to 'admitting',<br/>save back to Redis
        Q->>B: dispatch BullMQ job
        B->>S: sanitizeBullMQPayload(payload)<br/>fix cjson {} → [] for empty arrays
        S->>E: executeQueuedWorkflowJob(sanitized)
    else TriggerDev enabled
        R->>Q: getJobQueue().enqueue(...)
        Q->>E: executeWorkflowJob(payload)<br/>ensureArray(callChain) only
    else Database/inline mode
        R->>E: executeInline()
    end

_{Reviews (1): Last reviewed commit: "fix(cjson): santization bullmq payloads ..." | Re-trigger Greptile}

greptile-apps · 2026-04-02T01:23:19Z

apps/sim/lib/core/bullmq/connection.ts

 export function isBullMQEnabled(): boolean {
-  return isTruthy(env.CONCURRENCY_CONTROL_ENABLED) && Boolean(env.REDIS_URL)
+  return Boolean(env.REDIS_URL)
 }


Implicit BullMQ activation breaks deployments without workers

isBullMQEnabled() now returns true whenever REDIS_URL is set. Previously it also required CONCURRENCY_CONTROL_ENABLED=true, which was the explicit opt-in gate. Many self-hosted deployments configure REDIS_URL for caching, rate-limiting, or session storage without running BullMQ workers. After this change, any such deployment will start routing async workflow executions to BullMQ queues that are never drained — jobs silently queue up and never run.

The old guard (isTruthy(env.CONCURRENCY_CONTROL_ENABLED) && Boolean(env.REDIS_URL)) ensured both the infrastructure (Redis) and the operator intent (CONCURRENCY_CONTROL_ENABLED=true) were present. Removing CONCURRENCY_CONTROL_ENABLED entirely eliminates that intent signal with no migration path.

If the goal is to consolidate configuration, consider keeping REDIS_URL as the sole gate but documenting this clearly, and ensuring all deployment guides explicitly note that providing REDIS_URL now enables BullMQ.

greptile-apps · 2026-04-02T01:23:20Z

apps/sim/lib/core/bullmq/queues.ts

+        removeOnComplete: { age: 2 * 60 * 60, count: 1000 },
+        removeOnFail: { age: 2 * 60 * 60, count: 500 },
      }
    case 'webhook-execution':
      return {


Aggressive TTL reduction limits observability and retry windows

All BullMQ queue retention has been reduced from 24 h / 7 days down to 2 hours, and the same change applies to JOB_TTL_SECONDS in redis-store.ts (previously 48 h). While shorter retention helps control Redis memory, 2 hours is a very tight window:

A failed workflow job that fails at 01:00 and isn't noticed until 03:30 will have already been evicted — no state to inspect or manually retry.

removeOnFail at 2 h means failed jobs disappear before an on-call engineer in a different timezone can investigate them.

The previous removeOnFail values (7 days for workflow-execution, 3 days for webhook-execution and schedule-execution) provided a more reasonable recovery window. Consider retaining a longer failure TTL (e.g., 24–72 h) while keeping the shorter success TTL for completed jobs.

greptile-apps · 2026-04-02T01:23:21Z

apps/sim/lib/core/utils/json-sanitize.ts

+export function sanitizeBullMQPayload(payload: any): void {
+  if (!payload) return
+
+  payload.selectedOutputs = ensureArray(payload.selectedOutputs)
+
+  if (payload.metadata) {
+    payload.metadata.callChain = ensureArray(payload.metadata.callChain)
+
+    if (payload.metadata.pendingBlocks !== undefined) {
+      payload.metadata.pendingBlocks = ensureArray(payload.metadata.pendingBlocks)
+    }
+
+    if (payload.metadata.workflowStateOverride?.edges !== undefined) {
+      payload.metadata.workflowStateOverride.edges = ensureArray(
+        payload.metadata.workflowStateOverride.edges
+      )
+    }
+  }
+
+  if (payload.runFromBlock?.sourceSnapshot) {
+    const state = payload.runFromBlock.sourceSnapshot
+    for (const field of EXECUTION_STATE_ARRAY_FIELDS) {
+      if (field in state && !Array.isArray(state[field])) {
+        state[field] = []
+      }
+    }
+
+    if (state.dagIncomingEdges && typeof state.dagIncomingEdges === 'object') {
+      for (const key of Object.keys(state.dagIncomingEdges)) {
+        if (!Array.isArray(state.dagIncomingEdges[key])) {
+          state.dagIncomingEdges[key] = []
+        }
+      }
+    }
+  }


sanitizeBullMQPayload has no unit tests

The PR checklist marks "Tests added/updated and passing", but no test file for json-sanitize.ts appears in the diff. The sanitization logic handles several nested paths (payload.metadata.callChain, payload.metadata.workflowStateOverride.edges, payload.runFromBlock.sourceSnapshot.*, and dagIncomingEdges per-key) that are easy to accidentally break.

Consider adding a dedicated test file that exercises:

Top-level fields (selectedOutputs: {} → [])

Nested metadata fields

The EXECUTION_STATE_ARRAY_FIELDS loop with mixed present/absent keys

The dagIncomingEdges per-key fix

A null/undefined payload (guard at line 34)

greptile-apps · 2026-04-02T01:23:22Z

apps/sim/app/api/schedules/execute/route.ts

        )

-        if (shouldExecuteInline()) {
+        if (!isBullMQEnabled() && !isTriggerDevEnabled) {


isTriggerDevEnabled used inconsistently as a value vs. function call

isBullMQEnabled() is called as a function (parentheses), while isTriggerDevEnabled is referenced as a constant (no parentheses). Both are correct — isBullMQEnabled is a function and isTriggerDevEnabled is a module-level constant — but the asymmetry can confuse readers who might expect both to be function calls given the naming convention isFoo.

The same pattern appears at apps/sim/app/api/workflows/[id]/execute/route.ts (line 246) and apps/sim/lib/webhooks/processor.ts (line 1268). No code change is strictly necessary, but adding an inline comment clarifying that isTriggerDevEnabled is a constant (evaluated at module load) would aid readability.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

icecrasher321 added 11 commits March 30, 2026 10:58

improvement(queue): remove indirection for flag

93efac6

improvement(lua): return job id instedad of serializing within lua

6cc1233

memory improvements

584d591

cleanup code

c4987da

fix types

6b9a1bc

fix trigger dev, bullmq, inline prioritization

d97f27e

address comments

72d8a68

fix tests

f52ab39

remove unused lua code

d34d6f4

Merge branch 'staging' into fix/q-sys

9fc22a5

fix(cjson): santization bullmq payloads accurately

07716cd

icecrasher321 merged commit 7c68061 into dev Apr 2, 2026
7 checks passed

greptile-apps bot reviewed Apr 2, 2026

View reviewed changes

waleedlatif1 deleted the fix/q-sys-backup branch April 2, 2026 18:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(cjson): santize bullmq payloads accurately#3892

fix(cjson): santize bullmq payloads accurately#3892
icecrasher321 merged 11 commits intodevfrom
fix/q-sys-backup

icecrasher321 commented Apr 2, 2026

Uh oh!

cursor bot commented Apr 2, 2026 •

edited

Loading

Uh oh!

vercel bot commented Apr 2, 2026

Uh oh!

Uh oh!

greptile-apps bot commented Apr 2, 2026

Uh oh!

greptile-apps bot Apr 2, 2026

Uh oh!

greptile-apps bot Apr 2, 2026

Uh oh!

greptile-apps bot Apr 2, 2026

Uh oh!

greptile-apps bot Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

icecrasher321 commented Apr 2, 2026

Summary

Type of Change

Testing

Checklist

Uh oh!

cursor bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

vercel bot commented Apr 2, 2026

Uh oh!

Uh oh!

greptile-apps bot commented Apr 2, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cursor bot commented Apr 2, 2026 •

edited

Loading