
[13.x] Resumable Jobs#59975

Open
cosmastech wants to merge 60 commits into laravel:13.x from cosmastech:resumable-jobs

Conversation

@cosmastech
Contributor

@cosmastech cosmastech commented May 2, 2026

Vision

Job batches/chains don't make it easy to pass data from one job to the next. Interruptible was added to the framework last week, but handling these interrupts is neither intuitive nor easy. If a job is interrupted, we most likely want it to resume from a known state. What if Laravel offered an easy way to keep a durable object in memory so that resuming a job (or any code) from a given state had simple, beautiful primitives?

Example

class ProvisionWorkspace implements ShouldQueue, Resumable
{
    use Dispatchable;
    use InteractsWithQueue;
    use Queueable;
    use ResumableTrait;
    use SerializesModels;

    public int $tries = 3;

    // Keep the context saved after this job is completed
    public bool $deleteContextWhenCompleted = false;

    public function __construct(
        public Workspace $workspace,
    ) {
    }

    public function handle(LoggerInterface $logger): void
    {
        $billingCustomerId = $this->step(
            'create-billing-customer',
            $this->createBillingCustomer(...),
        );

        // Not everything has to be executed inside of a step
        $this->workspace->forceFill(['billing_customer_id' => $billingCustomerId])->save();

        $starterProjectId = $this->step('create-starter-project', $this->createStarterProject(...));
        $logger->info('Starter project created', ['starter_project_id' => $starterProjectId]);
        $logger->info('Workspace provisioning advanced.', [
            'workspace_id' => $this->workspace->id,
        ]);

        $this->step('enable-trial-features', $this->enableTrialFeatures(...));

        $this->step('notify-workspace-ready', fn () => $this->notifyWorkspaceReady($this->workspace));
    }

    private function createBillingCustomer(): string
    {
        return app(BillingProvider::class)->createCustomer($this->workspace->id);
    }

    private function createStarterProject(): int
    {
        return app(StarterProjectCreator::class)->create($this->workspace->id)->id;
    }

    private function enableTrialFeatures(): void
    {
        app(TrialFeatureEnabler::class)->enableFor($this->workspace->id);
    }

    private function notifyWorkspaceReady(Workspace $workspace): void
    {
        $projectId = $this->context
            ->getState()
            ->resultFor('create-starter-project');

        event(new WorkspaceProvisioned($workspace->id, $projectId));

        Mail::to($workspace->owner->email)->queue(
            new WorkspaceReady($workspace, $projectId),
        );
    }
}
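For readers who want a feel for the mechanics: below is a rough, array-backed sketch of the memoization idea behind step(). It is not the PR's actual implementation; the class and method names are invented for illustration, and the real version persists state to a repository rather than an in-memory array.

```php
<?php
// Illustrative only: a minimal sketch of the step() memoization idea,
// backed by a plain array instead of a durable store.

class InMemoryExecutionState
{
    /** @var array<string, mixed> Results keyed by step name. */
    private array $results = [];

    public function has(string $step): bool
    {
        return array_key_exists($step, $this->results);
    }

    public function resultFor(string $step): mixed
    {
        return $this->results[$step];
    }

    public function record(string $step, mixed $result): void
    {
        $this->results[$step] = $result;
    }

    /**
     * Run $callback at most once per step name; replays of the same
     * step return the stored result without re-invoking the callback.
     */
    public function step(string $step, callable $callback): mixed
    {
        if (! $this->has($step)) {
            $this->record($step, $callback());
        }

        return $this->resultFor($step);
    }
}

$state = new InMemoryExecutionState();
$calls = 0;

$first = $state->step('create-billing-customer', function () use (&$calls) {
    $calls++;
    return 'cus_123';
});

// A "resumed" run hits the stored result; the callback is not re-invoked.
$second = $state->step('create-billing-customer', function () use (&$calls) {
    $calls++;
    return 'cus_456';
});
```

On a resumed run, any step whose result was already recorded returns the stored value without re-running its side effects, which is why only the code *outside* of steps needs to be idempotent.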

Where to go from here

I would love to make these simpler to use. Judging from a thread on Twitter, people really want workflows that can be "woken up" later on. So rather than something that lives in Redis and is tightly coupled to a job execution, they want a workflow that persists the state of an execution and can later be woken up by a user action. That probably means we need a DB store and a Manager class.

I added the primitives to make this easier to do. I realize in its current state, it seems a bit more complicated than is absolutely necessary, but I really believe there is a strong appetite for this functionality. I will happily continue to expand on these after the fact.

Questions for Mr. Otwell

  • Is this a new Illuminate\Workflows component? My gut says yes, since it doesn't make sense to be coupled to the Bus component. That said, I was trying to make this as minimally scary to you as possible.
  • What's a better name for ExecutionContext? I don't love that name and I figured you would have a better suggestion for it.

@cosmastech cosmastech marked this pull request as draft May 4, 2026 16:10
@cosmastech cosmastech marked this pull request as ready for review May 5, 2026 02:13
Comment on lines +139 to +141
* @param array<array-key, mixed>|\Illuminate\Contracts\Support\Arrayable $options
* @param ExecutionStepResult $stepResult
* @return array
Contributor


You can improve the typing here:

Suggested change
* @param array<array-key, mixed>|\Illuminate\Contracts\Support\Arrayable $options
* @param ExecutionStepResult $stepResult
* @return array
* @template TKey of array-key
* @template TValue
*
* @param array<TKey, TValue>|\Illuminate\Contracts\Support\Arrayable<TKey, TValue> $options
* @param ExecutionStepResult $stepResult
* @return array<TKey, TValue>

@martinbean
Contributor

@cosmastech I’m trying to understand what problem this solves that a batch doesn’t? As in this case (provisioning a workspace) I’d just have a CreateWorkspace job that then dispatches child jobs, passing the workspace model instance as a parameter to each child:

class CreateWorkspace implements ShouldQueue
{
    use Queueable;

    public function __construct(public string $name) {}

    public function handle(): void
    {
        $workspace = Workspace::query()->create(['name' => $this->name]);

        Bus::batch([
            new CreateWorkspaceBillingCustomer($workspace),
            new CreateWorkspaceStarterProject($workspace),
            new EnableWorkspaceTrialFeatures($workspace),
        ])->then(function (Batch $batch) use ($workspace) {
            WorkspaceReady::dispatch($workspace);
        })->name(sprintf('Set up workspace [%s]', $workspace->id))->dispatch();
    }
}

The child jobs could then be idempotent if you need to re-run the batch in case of failure:

class CreateWorkspaceBillingCustomer implements ShouldQueue
{
    use Batchable;
    use Queueable;

    public function __construct(public Workspace $workspace) {}

    public function handle(): void
    {
        if ($this->workspace->billingCustomer()->exists()) {
            return;
        }

        $this->workspace->billingCustomer()->create([
            // Workspace billing customer attributes...
        ]);
    }
}

@cosmastech
Contributor Author

cosmastech commented May 5, 2026

@cosmastech I’m trying to understand what problem this solves that a batch doesn’t? As in this case (provisioning a workspace) I’d just have a CreateWorkspace job that then dispatches child jobs, passing the workspace model instance as a parameter to each child:

Thanks for the reply @martinbean. The example above could certainly be done with a chained or batched job. This is one of the pains of trying to choose a toy example 😅

Trying to think of an example where I had to implement this in the past, and if I recall correctly, it was something to the effect of:

$paymentApiResponse = Cache::get('payment-request:' . $this->transaction->id);
if (! $paymentApiResponse) {
    $paymentApiResponse = $this->submitPayment();
    Cache::put('payment-request:' . $this->transaction->id, $paymentApiResponse); // Store indefinitely, we'll handle clean up at the end
}

$this->recordPaidInDatabase($paymentApiResponse); // This is idempotent, so it doesn't matter if we run it again

$warehouseOrder = Cache::get('warehouse-order:' . $this->transaction->id);
if (! $warehouseOrder) {
    $warehouseOrder = $this->openWarehouseOrder($this->transaction);
    Cache::put('warehouse-order:' . $this->transaction->id, $warehouseOrder);
}

foreach ($this->transaction->products as $product) {
    // We wouldn't want a job retry to send the product again
    $warehouseSent = Cache::get('product-shipped:' . $this->transaction->id . ':' . $product->id);
    if ($warehouseSent) {
        continue;
    }

    $warehouseOrderResponse = $this->addProductToWarehouseOrder($warehouseOrder, $product, $this->transaction);
    Cache::put('product-shipped:' . $this->transaction->id . ':' . $product->id, $warehouseOrderResponse);

    $this->storeWarehouseOrder($product, $warehouseOrderResponse);
}

$this->closeWarehouseOrder($warehouseOrder);

foreach ($this->transaction->participants as $participant) {
    // We never want to email the participants more than once
    PaymentSent::dispatch($participant, $this->transaction);
}

// We also need to calculate their bonus points, update stock, send the info to the analytics team, etc

// Now let's clear out the cache, since this stuff isn't needed anymore
Cache::forget('payment-request:' . $this->transaction->id);
$this->transaction->products->each(fn ($product) => Cache::forget('product-shipped:' . $this->transaction->id . ':' . $product->id));

After this, we could have something like:

$paymentApiResponse = $this->step(
    'send-payment',
    $this->submitPayment(...)
);

$this->recordPaidInDatabase($paymentApiResponse);

$warehouseOrder = $this->step(
    'warehouse-order',
    fn () => $this->openWarehouseOrder($this->transaction)
);

foreach ($this->transaction->products as $product) {
    $warehouseOrderResponse = $this->step(
        'added-product:' . $product->id,
        fn () => $this->addProductToWarehouseOrder(
            $warehouseOrder,
            $product,
            $this->transaction
        )
    );

    $this->step(
        'stored-product:' . $product->id,
        fn () => $this->storeWarehouseOrder($product, $warehouseOrderResponse)
    );
}

$this->closeWarehouseOrder($warehouseOrder);

// ... other steps, but no clean up required

You could definitely make this work by daisy-chaining jobs together; there's nothing here that can't be done other ways, though I would argue the mental model of daisy-chaining jobs isn't as clean. Also, having worked on projects where queues can get backed up, I know the chain/batch approach means jobs can sit for minutes between steps when queue workers are slow.

Last week, @jackbayliss introduced an Interruptible job. Kubernetes auto-scaling of pods, intermittent API failures, and rollouts killing existing Horizon instances all mean that a job can die in any state. This makes it cleaner, within a job, to recover from any of those failures.

Where I think this is even more exciting would be resuming a workflow outside of a queued job. This is where I would like to eventually get this to go, and with the current primitives, can be done.

@romansh

romansh commented May 5, 2026

@martinbean I didn’t dive too deep into this component, but on the surface it looks similar to something I’ve already built for myself: a kind of state-machine-like workflow engine for queues with a unified interface.

The abstraction helps avoid rewriting each chain from scratch and provides a way to manage data and failures the same way I did before.

@yoeriboven
Contributor

@cosmastech I’m trying to understand what problem this solves that a batch doesn’t?

I'm assuming there are things you want to happen, of which no result is stored.

@cosmastech I haven't tested this, but let's say you break it up in separate jobs and run it as a chain. If one breaks and you retry it, does it then also resume the chain? If so, what is the benefit of this over chains?

@koomai

koomai commented May 6, 2026

@cosmastech great work with the PR.

It looks like this sits somewhere between a Bus::batch of idempotent jobs and a full workflow engine (durable-workflow/workflow in the Laravel ecosystem).

I don't know if I would reach for this because I wouldn't trust the default cache enough to not make my jobs idempotent. A single cache miss would mean everything re-runs. So this doesn't give me much over a batch.

@Lukasss93
Contributor

Lukasss93 commented May 6, 2026

This fills a real gap that neither Bus::chain nor Bus::batch covers well.

Bus::chain is simple but stateless. If a job fails midway, there's no built-in way to inspect where it stopped or resume from that exact point without re-running everything from scratch. External intervention (fixing the root cause, updating data, etc.) has to happen before you can safely retry, and a plain chain gives you no hook for that.

This implementation looks like a nice middle ground: sequential execution like a chain, but with the state tracking and recoverability you'd expect from a more structured workflow engine.

I noticed it's conceptually similar to durable-workflow/workflow, though that project doesn't support manually resuming from the last failed job without workarounds, which seems to be the key differentiator here.

For context, I built something along these lines internally for my company, mainly to orchestrate VM provisioning on Proxmox. Each step is a Laravel Job (retry = 1), managed by a coordinator class that persists everything to the database and exposes it through a GUI. A few things I ended up needing that might be worth considering here:

  • Unique workflows via a configurable lock key (prevent duplicate runs)
  • Per-task timeouts
  • Manual-only retries (failures block until someone intervenes externally)
  • Explicit states: pending -> running -> failed / completed
  • State transition events (Laravel Events + broadcasting, so the UI updates in real time)
  • Per-task start/end timestamps for execution time tracking
  • Global payload shared across all tasks in the workflow
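Several of the items on this list map naturally onto a small state machine. As a sketch only (the enum and transition table below are invented for illustration, not part of this PR or of Lukasss93's implementation), the explicit-states bullet could look like:

```php
<?php
// Illustrative only: explicit workflow states as a PHP 8.1 backed enum
// with an allowed-transition check. Names are hypothetical.

enum WorkflowState: string
{
    case Pending = 'pending';
    case Running = 'running';
    case Failed = 'failed';
    case Completed = 'completed';

    /** @return list<self> */
    public function allowedTransitions(): array
    {
        return match ($this) {
            self::Pending => [self::Running],
            self::Running => [self::Failed, self::Completed],
            self::Failed => [self::Running], // manual retry re-enters Running
            self::Completed => [],           // terminal state
        };
    }

    public function canTransitionTo(self $next): bool
    {
        return in_array($next, $this->allowedTransitions(), true);
    }
}
```

Guarding transitions this way is what makes "manual-only retries" enforceable: a failed workflow can only move back to running, never silently to completed.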

I'm also attaching a screenshot of the GUI I built for my implementation: [screenshot of the workflow GUI omitted]

@cosmastech
Contributor Author

@cosmastech I haven't tested this, but let's say you break it up in separate jobs and run it as a chain. If one breaks and you retry it, does it then also resume the chain? If so, what is the benefit of this over chains?

@yoeriboven I'm not positive I'm understanding the question. Could you give me an example?

@cosmastech
Contributor Author

I don't know if I would reach for this because I wouldn't trust the default cache enough to not make my jobs idempotent. A single cache miss would mean everything re-runs. So this doesn't give me much over a batch.

@koomai Can you say more about what you find hard to trust about the default cache? What would you reach for instead?

@koomai

koomai commented May 7, 2026

@koomai Can you say more about what you find hard to trust about the default cache?

The cache can be cleared accidentally, or keys can be evicted when memory is full (probably the most common case). It's also common to have Redis set up without persistence, so you could lose the cache on a restart.

What would you reach for instead?

Database.
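For context on the eviction point above: whether keys get dropped depends on the Redis maxmemory-policy setting. A sketch of the relevant redis.conf lines (values illustrative):

```
# redis.conf (illustrative values)
# With the default "noeviction" policy, writes fail once maxmemory is
# reached; cache-oriented setups often switch to an eviction policy
# instead, which can silently drop keys, including any workflow state
# stored alongside ordinary cache entries.
maxmemory 256mb
maxmemory-policy allkeys-lru
```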

@cosmastech
Contributor Author

What would you reach for instead?

Database.

👍 That makes sense, and it would be my immediate follow-up for this PR. I can already think of plenty of workflows that are better suited to that level of durability. 🫡

@martinbean
Contributor

@koomai Can you say more about what you find hard to trust about the default cache?

Cache can be cleared accidentally or the keys could be evicted when memory is full (probably the most common one). It's also common to have Redis setup without persistence, so you could lose the cache on restart.

What would you reach for instead?

Database.

Agree with this. A cache should not be relied upon for storing state. As @koomai mentions, if the cache fails, you're then going to get the unintended side effects of re-running the very jobs you were trying to prevent from being re-run in the first place.
