Immediate task timeout leaves task stuck in RUNNING state #7599

@dkliban

Summary

When an immediate task exceeds the 5-second IMMEDIATE_TIMEOUT, the log message is
correctly produced:

pulpcore.tasking.tasks:INFO: Immediate task <uuid> timed out after 5 seconds.

However, the task can remain stuck in running state rather than transitioning to
failed. This affects both the PostgreSQL and Redis (WORKER_TYPE=redis) worker paths,
since both share the same _execute_task/_aexecute_task code path.

Root cause

asyncio.wait_for cancels the inner coroutine on timeout, but cancelling the coroutine
does not interrupt a synchronous call that is already executing in a thread. Django's
sync_to_async (with thread_sensitive=True) serializes all ORM operations through the
main thread, so a database operation started by the cancelled coroutine may still be
running there and blocking the queue. The subsequent set_failed call needs the same
thread and is therefore delayed until that operation finishes. During this window the
task appears stuck in running.
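
The delay described above can be reproduced outside pulpcore with the standard library
alone. The following is a minimal sketch, not the pulpcore code: a single-worker
ThreadPoolExecutor stands in for the one thread that thread-sensitive sync_to_async
calls go through, and slow_db_operation, set_failed and run_immediate_task are
hypothetical placeholders. The timings (0.5 s of work, 0.1 s timeout) are arbitrary.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption of this sketch: one worker thread models the single thread
# that all thread-sensitive sync calls are serialized through.
executor = ThreadPoolExecutor(max_workers=1)


def slow_db_operation():
    # Hypothetical stand-in for a blocking ORM query inside the task.
    time.sleep(0.5)


def set_failed():
    # Hypothetical stand-in for the state update; reports when it really ran.
    return time.monotonic()


async def run_immediate_task(timeout):
    loop = asyncio.get_running_loop()
    start = time.monotonic()
    try:
        await asyncio.wait_for(
            loop.run_in_executor(executor, slow_db_operation), timeout
        )
    except asyncio.TimeoutError:
        timed_out_at = time.monotonic() - start
        # The worker thread is still busy with slow_db_operation, so this
        # call waits in the executor queue until that work finishes.
        ran_at = await loop.run_in_executor(executor, set_failed)
        return timed_out_at, ran_at - start


timed_out_at, failed_at = asyncio.run(run_immediate_task(0.1))
print(f"timeout raised after {timed_out_at:.2f}s")
print(f"set_failed ran after {failed_at:.2f}s")
```

The timeout is raised on schedule, yet the stand-in for set_failed only runs once the
blocking call has finished. That gap corresponds to the window in which the task still
reads as running.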

If a cancel request arrives during this delay:

  • set_canceling() and the delayed set_failed race to UPDATE the task row.
  • If set_failed wins: task transitions to failed with the timeout message.
  • If set_canceling wins: task transitions to canceling; set_failed then finds
    0 matching rows (state is no longer running) and raises RuntimeError, which
    propagates up uncaught, leaving the task stuck in canceling until a worker
    eventually cleans it up as canceled.
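
The losing branch of that race can be sketched with a conditional UPDATE. This is an
illustration under assumptions, using an in-memory SQLite table rather than the
pulpcore models: the table layout, the WHERE filters and the error text are
hypothetical, and only the "0 matching rows raises RuntimeError" behaviour is taken
from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, state TEXT, error TEXT)")
conn.execute("INSERT INTO task (id, state) VALUES (1, 'running')")


def set_canceling(task_id):
    # Hypothetical stand-in: move the task out of 'running'.
    conn.execute(
        "UPDATE task SET state = 'canceling' WHERE id = ? AND state = 'running'",
        (task_id,),
    )


def set_failed(task_id, error):
    # Hypothetical stand-in: only a task still in 'running' may fail.
    cur = conn.execute(
        "UPDATE task SET state = 'failed', error = ? "
        "WHERE id = ? AND state = 'running'",
        (error, task_id),
    )
    if cur.rowcount != 1:
        raise RuntimeError("set_failed matched no row in state 'running'")


# The cancel request wins the race; the delayed set_failed arrives second.
set_canceling(1)
try:
    set_failed(1, "Immediate task timed out after 5 seconds.")
    outcome = "failed"
except RuntimeError as exc:
    outcome = f"RuntimeError: {exc}"

final_state = conn.execute("SELECT state FROM task WHERE id = 1").fetchone()[0]
print(outcome)
print(final_state)
```

In this sketch the row ends in canceling and the timeout error is never recorded,
which matches the stuck state described in the last bullet.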

Expected behavior

The task transitions to failed immediately after the 5-second timeout with an error
describing the timeout.

Actual behavior

The task remains in running state. It may eventually transition to failed (with the
timeout error) after a delay, or to canceled if a cancel request races ahead of the
delayed set_failed.

Environment

  • pulpcore version: 3.108.0
  • WORKER_TYPE: redis
