
taskprocessing:worker does not atomically claim tasks → duplicate processing with multiple workers #61052

@bygadd

Description


Bug description

Running multiple occ taskprocessing:worker processes in parallel (as the AI admin docs recommend: "run the command 4 or more times") causes the same scheduled task to be processed by more than one worker. This happens in two ways:

  • concurrently — several workers pick up the same task in the same instant, and
  • sequentially — a second worker re-claims a task the first has already finished.

Each duplicate is a full extra provider/LLM invocation → wasted worker capacity and multiplied external API cost. On a busy instance the effective throughput collapses toward a single worker even when many are running.

Root cause (two parts)

1. OC\Core\Command\TaskProcessing\WorkerCommand::processNextTask() never calls lockTask().
It calls IManager::getNextScheduledTask() (a plain SELECT, no lock) and then processTask() directly. By contrast the OCS endpoint TaskProcessingApiController::getNextScheduledTask() does it correctly — it loops, calls lockTask(), and skips tasks it failed to claim. So concurrent CLI workers all SELECT the same scheduled row and all proceed to process it.
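A minimal Python model of root cause #1 follows. It is not Nextcloud code: ToyTaskTable, next_scheduled, claim and run_workers are hypothetical names, and one mutex stands in for the atomicity of a single conditional UPDATE. Every worker does a plain read of the next scheduled row, and a barrier forces all reads to finish before any worker acts. Only the variant that adds an atomic claim, playing the role of lockTask(), processes the task once.

```python
import threading
from collections import Counter

SCHEDULED, RUNNING, SUCCESSFUL = "scheduled", "running", "successful"


class ToyTaskTable:
    """Hypothetical in-memory stand-in for the task table (not Nextcloud code)."""

    def __init__(self, task_ids):
        self.status = {tid: SCHEDULED for tid in task_ids}
        # One mutex models the atomicity of a single conditional UPDATE.
        self._mutex = threading.Lock()

    def next_scheduled(self):
        # Plain read with no lock, like the SELECT behind getNextScheduledTask():
        # several workers can all see the same scheduled row.
        for tid in sorted(self.status):
            if self.status[tid] == SCHEDULED:
                return tid
        return None

    def claim(self, tid):
        # Models UPDATE ... SET status = RUNNING WHERE id = ? AND status = SCHEDULED;
        # True only for the single caller whose update "affected one row".
        with self._mutex:
            if self.status[tid] == SCHEDULED:
                self.status[tid] = RUNNING
                return True
            return False

    def finish(self, tid):
        with self._mutex:
            self.status[tid] = SUCCESSFUL


def run_workers(n_workers, use_claim):
    """Run n_workers threads against one scheduled task; count how often it is processed."""
    table = ToyTaskTable([1])
    processed = Counter()
    counter_lock = threading.Lock()
    # The barrier forces every worker to finish its read before any worker acts,
    # reproducing the "same instant" interleaving deterministically.
    barrier = threading.Barrier(n_workers)

    def worker():
        tid = table.next_scheduled()
        barrier.wait()
        if tid is None:
            return
        if use_claim and not table.claim(tid):
            return  # another worker won the claim; skip the task
        with counter_lock:
            processed[tid] += 1  # stands in for the provider/LLM invocation
        table.finish(tid)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

With 4 workers, run_workers(4, use_claim=False) counts task 1 as processed 4 times, matching the 4× duplication described here. run_workers(4, use_claim=True) counts it once.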

2. TaskMapper::lockTask() guards with status != STATUS_RUNNING instead of status = STATUS_SCHEDULED.
This lets a second worker re-claim a task that it SELECTed while the task was still scheduled, even after the first worker has already finished it. Any non-running status, including STATUS_SUCCESSFUL, passes the guard → sequential re-processing of an already completed task.
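The difference between the two guards can be shown with one row and a fixed interleaving. This is a hypothetical model: try_lock and sequential_interleaving are illustrative names, and the dict stands in for the task row. Worker B reads the task while it is scheduled. Worker A then claims and finishes it, and only afterwards does B attempt its claim.

```python
SCHEDULED, RUNNING, SUCCESSFUL = "scheduled", "running", "successful"


def try_lock(row, guard):
    """Hypothetical model of lockTask()'s conditional UPDATE on one task row.

    guard="neq_running"  -> WHERE status != RUNNING   (current guard)
    guard="eq_scheduled" -> WHERE status = SCHEDULED  (proposed guard)
    Returns True when the UPDATE would affect the row, i.e. the claim succeeded.
    """
    if guard == "neq_running":
        matches = row["status"] != RUNNING
    elif guard == "eq_scheduled":
        matches = row["status"] == SCHEDULED
    else:
        raise ValueError(f"unknown guard: {guard}")
    if matches:
        row["status"] = RUNNING
    return matches


def sequential_interleaving(guard):
    """Return how many times one task is processed when A finishes before B claims."""
    row = {"status": SCHEDULED}
    runs = 0
    # Worker B reads the task while it is still scheduled...
    b_saw_scheduled = row["status"] == SCHEDULED
    # ...then worker A claims it, processes it and marks it successful.
    if try_lock(row, guard):
        runs += 1
        row["status"] = SUCCESSFUL
    # Only now does worker B try to claim the task it read earlier.
    if b_saw_scheduled and try_lock(row, guard):
        runs += 1
        row["status"] = SUCCESSFUL
    return runs
```

sequential_interleaving("neq_running") returns 2 because the completed task is claimed and run again. sequential_interleaving("eq_scheduled") returns 1.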

Steps to reproduce

  1. Configure a synchronous TaskProcessing provider (e.g. integration_openai).
  2. Run 4 workers in parallel: occ taskprocessing:worker -v ×4.
  3. Schedule several core:text2text tasks.
  4. Watch the worker logs: the same task id is logged Processing task N / Finished processing task N by multiple PIDs.
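A small helper can count the task ids from step 4 that were picked up more than once. It assumes each worker's -v output was captured separately, for example redirected to one file per worker. It relies only on the "Processing task N" text quoted above, not on any other part of the log format. find_duplicates and the PID-keyed input are hypothetical.

```python
import re
from collections import defaultdict

# Case-sensitive on purpose: matches "Processing task N" but not "Finished processing task N".
PROCESSING = re.compile(r"\bProcessing task (\d+)\b")


def find_duplicates(logs_by_pid):
    """Map task id -> sorted PIDs, one entry per "Processing task <id>" line,
    keeping only ids that were started more than once (by any workers)."""
    seen = defaultdict(list)
    for pid, lines in logs_by_pid.items():
        for line in lines:
            match = PROCESSING.search(line)
            if match:
                seen[int(match.group(1))].append(pid)
    return {tid: sorted(pids) for tid, pids in seen.items() if len(pids) > 1}
```

For example, find_duplicates({101: ["Processing task 7", "Finished processing task 7"], 102: ["Processing task 7", "Processing task 8"]}) returns {7: [101, 102]}.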

Verified on 33.0.5 (MySQL): with 4 workers, the same task id appears 4× within the same second. After applying only fix #1, a residual rate of ~1/15 sequential re-processing of tasks already in STATUS_SUCCESSFUL remained, traced to root cause #2.

Expected behavior

Each scheduled task is processed exactly once, regardless of how many parallel workers run.

Proposed fix (verified — 8 workers, 0 duplication)

  1. In WorkerCommand::processNextTask(), call lockTask() after getNextScheduledTask(). When lockTask() returns false, add the task id to an ignore list and re-fetch, mirroring TaskProcessingApiController.
  2. In TaskMapper::lockTask(), change the guard from ->neq('status', … STATUS_RUNNING) to ->eq('status', … STATUS_SCHEDULED).
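The claim loop in fix 1 can be sketched as a Python model of the control flow only; it is not the PHP patch. TaskSource, next_scheduled, lock and process_next_task are hypothetical stand-ins for IManager::getNextScheduledTask(), lockTask() and WorkerCommand::processNextTask(). The sketch also assumes the fetch call can exclude a set of already-skipped task ids.

```python
from typing import Callable, Optional, Protocol, Set


class TaskSource(Protocol):
    """Hypothetical stand-in for IManager plus TaskMapper (illustrative names only)."""

    def next_scheduled(self, ignore_ids: Set[int]) -> Optional[int]:
        """Return the id of a scheduled task not in ignore_ids, or None (a plain read)."""
        ...

    def lock(self, task_id: int) -> bool:
        """Atomically move the task from scheduled to running; False if someone else won."""
        ...


def process_next_task(source: TaskSource, process: Callable[[int], None]) -> Optional[int]:
    """Claim and process one task; return its id, or None if nothing could be claimed."""
    ignore: Set[int] = set()
    while True:
        task_id = source.next_scheduled(ignore)
        if task_id is None:
            return None  # no claimable scheduled task left
        if source.lock(task_id):
            process(task_id)  # only the worker that won the claim gets here
            return task_id
        # Another worker claimed it between our read and our lock: skip it and re-fetch.
        ignore.add(task_id)
```

Combined with fix 2, lock() only succeeds for a task that is still scheduled. A stale read therefore never leads to processing; the worker skips that id and asks for the next one.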

With both changes, 8 concurrent workers processed every task exactly once (0 duplicates over repeated measurement windows). A PR follows.

Affected versions

The missing lockTask() call and the != running guard are both present on 30.x–33.x and current master.
