DonalEvans
Contributor

AbstractProcessWorkerExecutorService.notifyQueueRunnables() was making an incorrect assumption that all AbstractRunnables that were submitted for execution would be queued as AbstractRunnables. However, PriorityProcessWorkerExecutorService wraps AbstractRunnables in OrderedRunnable before queueing them, and since OrderedRunnable is not an AbstractRunnable, these were skipped when notifyQueueRunnables() drained the queue, leading to potential hangs.

  • Refactor notifyQueueRunnables() to allow PriorityProcessWorkerExecutorService to notify the AbstractRunnable contained within queued OrderedRunnables
  • Ensure that notifyQueueRunnables() is called and the executor is marked as shut down if an exception is thrown from start()
  • Add unit tests
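
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use the real Elasticsearch classes: `FailureAware` and `Wrapper` are hypothetical stand-ins for AbstractRunnable and OrderedRunnable, and `drainAndNotify` is an assumed simplification of the old drain logic, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SkippedNotificationDemo {
    /** Hypothetical stand-in for AbstractRunnable: a task that can be told it failed. */
    interface FailureAware extends Runnable {
        void onFailure(Exception e);
    }

    /** Hypothetical stand-in for OrderedRunnable: wraps a task but is not itself FailureAware. */
    record Wrapper(FailureAware inner) implements Runnable {
        @Override
        public void run() {
            inner.run();
        }
    }

    /** Drains the queue and notifies only items that pass the type check. Returns the count notified. */
    static int drainAndNotify(BlockingQueue<Runnable> queue, Exception cause) {
        List<Runnable> drained = new ArrayList<>();
        queue.drainTo(drained);
        int notified = 0;
        for (Runnable r : drained) {
            if (r instanceof FailureAware task) {
                task.onFailure(cause);
                notified++;
            }
            // A Wrapper falls through here, so its inner task never learns about the shutdown.
        }
        return notified;
    }

    /** Builds a task that records what happened to it in the given log. */
    static FailureAware task(List<String> log, String name) {
        return new FailureAware() {
            @Override
            public void run() {
                log.add(name + " ran");
            }

            @Override
            public void onFailure(Exception e) {
                log.add(name + " failed: " + e.getMessage());
            }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        queue.add(task(log, "direct"));
        queue.add(new Wrapper(task(log, "wrapped")));
        int notified = drainAndNotify(queue, new IllegalStateException("shutting down"));
        // Only the directly queued task is notified; the wrapped one is silently dropped.
        System.out.println(notified + " notified: " + log);
    }
}
```

In this sketch a caller waiting on the wrapped task would never receive a failure callback, which is the kind of hang the description refers to.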

Closes #134651

@DonalEvans added the >bug, :ml, Team:ML, auto-backport, v8.19.6, v9.1.6, v9.0.9, v8.18.9, v9.2.1, and v9.3.0 labels on Oct 3, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Collaborator

Hi @DonalEvans, I've created a changelog YAML for you.

Contributor

@jonathan-buttner jonathan-buttner left a comment

Great work, left one suggestion

}
}

protected abstract void notifyIfAbstractRunnable(T runnable, Exception ex, String msg);
Contributor

What do you think about having AbstractProcessWorkerExecutorService contain the logic for notifyIfAbstractRunnable and rely on an abstract method like getAsAbstractRunnable, which returns either the AbstractRunnable or null? That way child classes don't need to call back into the parent to do the notification; they only return the AbstractRunnable if the queued item is (or contains) one.

Something like:

    protected abstract AbstractRunnable getAsAbstractRunnable(T runnable);

    private void notifyIfAbstractRunnable(T runnable, Exception ex, String msg) {
        var abstractRunnable = getAsAbstractRunnable(runnable);
        if (abstractRunnable != null) {
            notifyAbstractRunnable(ex, msg, abstractRunnable);
        }
    }

Then PriorityProcessWorkerExecutorService would have something like:

    @Override
    protected AbstractRunnable getAsAbstractRunnable(OrderedRunnable orderedRunnable) {
        // The runnable contained within OrderedRunnable is always an AbstractRunnable, so no need to check the type
        return orderedRunnable.runnable();
    }
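
For reference, the shape of this suggestion can be sketched end to end as below. This is only an illustration under stated assumptions: `FailureAware`, `Ordered`, `BaseExecutor`, `PlainExecutor`, and `PriorityExecutor` are hypothetical stand-ins rather than the real Elasticsearch classes, and a `LinkedBlockingQueue` is used for both executors purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class UnwrapHookSketch {
    /** Hypothetical stand-in for AbstractRunnable. */
    interface FailureAware extends Runnable {
        void onFailure(Exception e);
    }

    /** Hypothetical stand-in for OrderedRunnable: always carries a FailureAware task. */
    record Ordered(long order, FailureAware runnable) implements Runnable {
        @Override
        public void run() {
            runnable.run();
        }
    }

    /** The parent owns the drain-and-notify logic; subclasses only say how to unwrap an item. */
    abstract static class BaseExecutor<T extends Runnable> {
        protected final BlockingQueue<T> queue;

        BaseExecutor(BlockingQueue<T> queue) {
            this.queue = queue;
        }

        /** Returns the failure-aware task behind a queued item, or null if there is none. */
        protected abstract FailureAware getAsFailureAware(T queued);

        /** Drains the queue and notifies every task the subclass can unwrap. Returns the count notified. */
        final int notifyQueued(Exception cause) {
            List<T> drained = new ArrayList<>();
            queue.drainTo(drained);
            int notified = 0;
            for (T item : drained) {
                FailureAware task = getAsFailureAware(item);
                if (task != null) {
                    task.onFailure(cause);
                    notified++;
                }
            }
            return notified;
        }
    }

    /** Queues plain Runnables, so a type check is needed before unwrapping. */
    static class PlainExecutor extends BaseExecutor<Runnable> {
        PlainExecutor() {
            super(new LinkedBlockingQueue<>());
        }

        @Override
        protected FailureAware getAsFailureAware(Runnable queued) {
            return queued instanceof FailureAware task ? task : null;
        }
    }

    /** Queues wrappers whose payload is always failure-aware, so no type check is needed. */
    static class PriorityExecutor extends BaseExecutor<Ordered> {
        PriorityExecutor() {
            super(new LinkedBlockingQueue<>());
        }

        @Override
        protected FailureAware getAsFailureAware(Ordered queued) {
            return queued.runnable();
        }
    }

    /** Builds a task that records failure notifications in the given log. */
    static FailureAware task(List<String> log, String name) {
        return new FailureAware() {
            @Override
            public void run() {}

            @Override
            public void onFailure(Exception e) {
                log.add(name + " failed: " + e.getMessage());
            }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        PriorityExecutor executor = new PriorityExecutor();
        executor.queue.add(new Ordered(1L, task(log, "wrapped")));
        executor.notifyQueued(new IllegalStateException("shutting down"));
        System.out.println(log);
    }
}
```

With this split, the subclasses contain no notification logic of their own, which is the point of the suggestion: each one only answers "what task, if any, is behind this queued item?".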

Development

Successfully merging this pull request may close these issues.

[ML] Reindex hangs when deployment forcefully shuts down