Workflow dep resolution leaves scheduled_at stale, breaking queue delay monitoring #1185

@jguttman94

Description

When WorkflowStageJobs / WorkflowStageJobsByIDMany resolve dependencies and transition a job from pending to available, the scheduled_at column is not updated. It retains its original value from insertion time, which can be hours or months old for long-running workflows.

The UPDATE in both queries only sets state and metadata.workflow_staged_at:

UPDATE river_job
SET
  state = jobs_to_make_available.new_state,
  metadata = jsonb_set(metadata, '{workflow_staged_at}'::text[], $1::jsonb, true)
FROM jobs_to_make_available
WHERE river_job.id = jobs_to_make_available.id

The jobs_to_make_available CTE already reads scheduled_at to decide the target state (available if scheduled_at <= now() + 5s, otherwise scheduled), so by the time the UPDATE executes, the original scheduled_at value has served its purpose.

Impact

Any monitoring that computes queue delay as NOW() - scheduled_at on available jobs will overstate the delay for dependency-resolved workflow jobs by roughly the time each job spent waiting on its dependencies. For workflows whose dependencies take hours or months to resolve, this triggers false alarms on queue health metrics.
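To make the size of the error concrete, here is a small self-contained query that can be run in any PostgreSQL session. The sample_jobs rows and the became_available_at column are hypothetical stand-ins made up for illustration; they are not part of River's schema.

```sql
-- Hypothetical sample data: became_available_at is NOT a river_job column,
-- it only models the moment each job actually became available.
WITH sample_jobs (id, scheduled_at, became_available_at) AS (
  VALUES
    -- Workflow job: inserted 6 hours ago, dependencies resolved 30 seconds ago.
    (1, now() - interval '6 hours',    now() - interval '30 seconds'),
    -- Plain job: inserted and available 10 seconds ago.
    (2, now() - interval '10 seconds', now() - interval '10 seconds')
)
SELECT
  id,
  now() - scheduled_at        AS reported_delay, -- what the naive metric sees
  now() - became_available_at AS actual_wait     -- time actually spent available
FROM sample_jobs
ORDER BY id;
```

For the workflow job, the naive metric reports a delay of about six hours even though the job has only been waiting for pickup for about thirty seconds.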

Current workaround

We found that workflow_staged_at is already stamped in metadata during dependency resolution, so we use it in our metrics query and fall back to scheduled_at when it is absent:

MAX(
  CASE
    WHEN metadata ? 'workflow_staged_at'
      THEN NOW() - (metadata->>'workflow_staged_at')::timestamptz
    ELSE NOW() - scheduled_at
  END
) AS oldest_delay

This works but requires casting a JSONB string to timestamptz in an aggregate query, which is less ergonomic than using the native scheduled_at column directly.
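For anyone applying the same workaround, the CASE expression can likely be shortened with COALESCE, since ->> returns NULL when the key is missing. The sketch below runs standalone against hypothetical sample rows (sample_jobs is made up, and it assumes workflow_staged_at is stored as a timestamp string, which the cast above already relies on).

```sql
-- Hypothetical sample rows standing in for river_job.
WITH sample_jobs (id, scheduled_at, metadata) AS (
  VALUES
    -- Workflow job: stale scheduled_at, staged 30 seconds ago.
    (1, now() - interval '6 hours',
        jsonb_build_object('workflow_staged_at', now() - interval '30 seconds')),
    -- Non-workflow job: no workflow_staged_at key, so scheduled_at is used.
    (2, now() - interval '10 seconds', '{}'::jsonb)
)
SELECT
  MAX(
    now() - COALESCE((metadata->>'workflow_staged_at')::timestamptz, scheduled_at)
  ) AS oldest_delay
FROM sample_jobs;
```

One behavioral difference from the CASE version: if the key exists but holds a JSON null, COALESCE falls back to scheduled_at, whereas the CASE form yields NULL for that row.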

Proposed solutions

Either of these would address the problem:

  1. Update scheduled_at = now() in WorkflowStageJobs / WorkflowStageJobsByIDMany when transitioning jobs to available. This makes scheduled_at accurately reflect when the job became eligible for pickup, consistent with how non-workflow jobs behave. For jobs transitioning to scheduled (because their scheduled_at is still in the future), no change is needed — scheduled_at is already correct.

  2. Add a first-class available_at column to river_job that records when a job entered the available state, regardless of how it got there (direct insert, scheduled time reached, or workflow dep resolution). This would give monitoring queries a reliable, indexed timestamp without relying on scheduled_at semantics or JSONB metadata. It would also benefit non-workflow use cases like jobs inserted with Pending: true that are later moved to available by application code.
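As a rough sketch of option 1, the following script reproduces the idea against a throwaway temp table. It is not River's actual query: demo_job is a hypothetical stand-in for river_job with only the columns needed here, the CTE is a simplified guess based on the behavior described above, and to_jsonb(now()) replaces the $1 parameter.

```sql
-- Hypothetical stand-in for river_job; state is plain text here for brevity.
CREATE TEMP TABLE demo_job (
  id           bigint PRIMARY KEY,
  state        text NOT NULL,
  scheduled_at timestamptz NOT NULL,
  metadata     jsonb NOT NULL DEFAULT '{}'
);

INSERT INTO demo_job (id, state, scheduled_at) VALUES
  (1, 'pending', now() - interval '6 hours'), -- due long ago, deps just resolved
  (2, 'pending', now() + interval '1 day');   -- deps resolved, not yet due

-- Simplified version of the state decision described in this issue.
WITH jobs_to_make_available AS (
  SELECT
    id,
    CASE
      WHEN scheduled_at <= now() + interval '5 seconds' THEN 'available'
      ELSE 'scheduled'
    END AS new_state
  FROM demo_job
  WHERE state = 'pending'
)
UPDATE demo_job
SET
  state = jobs_to_make_available.new_state,
  -- Proposed change: reset scheduled_at only for jobs becoming available.
  scheduled_at = CASE
    WHEN jobs_to_make_available.new_state = 'available' THEN now()
    ELSE demo_job.scheduled_at
  END,
  metadata = jsonb_set(demo_job.metadata, '{workflow_staged_at}'::text[], to_jsonb(now()), true)
FROM jobs_to_make_available
WHERE demo_job.id = jobs_to_make_available.id;

SELECT id, state, now() - scheduled_at AS queue_delay
FROM demo_job
ORDER BY id;
```

After the update, job 1 is available with a queue delay near zero, while job 2 is scheduled and keeps its original future scheduled_at.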

Environment

  • River Pro v0.22.0
  • PostgreSQL
