[ML] Ensure inference queue is cleared after shutdown #96738
Conversation
Pinging @elastic/ml-core (Team:ML)
Hi @davidkyle, I've created a changelog YAML for you.
Force-pushed from 2472471 to fa8c604
LGTM
I had a question, but feel free to merge without changing anything if it doesn't make sense.
What's the question?
String msg = "unable to process as " + processName + " worker service has shutdown";
Exception ex = error.get();
for (Runnable runnable : notExecuted) {
    if (runnable instanceof AbstractRunnable ar) {
Do we expect every `runnable` to be an `AbstractRunnable`? If so, then there can be an `else` branch to `assert` that it's always the case. If not, should we log something for other types of `Runnable`?
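The reviewer's first suggestion could be sketched as below, using a stand-in for Elasticsearch's `AbstractRunnable` (class and method names here are illustrative; the actual merged code differs):

```java
import java.util.List;

public class DrainSketch {
    // Stand-in for Elasticsearch's AbstractRunnable: a Runnable with a failure hook.
    abstract static class AbstractRunnable implements Runnable {
        public abstract void onFailure(Exception e);
        @Override public void run() {}
    }

    static int notified = 0;

    // Drain leftover work after shutdown, failing each task. The else branch
    // makes the "everything here is an AbstractRunnable" assumption explicit
    // instead of silently dropping plain Runnables.
    static void failAll(List<Runnable> notExecuted, String msg) {
        for (Runnable runnable : notExecuted) {
            if (runnable instanceof AbstractRunnable ar) {
                ar.onFailure(new IllegalStateException(msg));
                notified++;
            } else {
                // Trips in tests (-ea) if an unexpected task type ever lands here.
                assert false : "unexpected task type " + runnable.getClass().getName();
            }
        }
    }
}
```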
I would prefer a solution that uses typing to guarantee we have `AbstractRunnable`, by making the class generic with the bound `<T extends AbstractRunnable>` instead of `<T extends Runnable>`, but as far as I can tell that is not possible due to the inheritance hierarchy.

There are multiple uses of this class, and this code only runs when there is work left after the shutdown. I think using `Runnable` is reasonable, and I want to keep this change as small as possible.
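The generic-bound alternative discussed above, which the PR did not adopt, would look roughly like this hypothetical queue (all names here are invented for illustration): the type parameter guarantees every task has a failure hook, so no `instanceof` check is needed when draining.

```java
import java.util.ArrayDeque;

// Hypothetical typed work queue; not the actual Elasticsearch executor.
public class TypedQueueSketch<T extends TypedQueueSketch.FailableTask> {
    // A task that can be told it will never run.
    public interface FailableTask extends Runnable {
        void onFailure(Exception e);
    }

    private final ArrayDeque<T> queue = new ArrayDeque<>();

    public void submit(T task) {
        queue.add(task);
    }

    // Every drained task statically has onFailure: no runtime type check.
    public int failAll(Exception cause) {
        int n = 0;
        for (T task : queue) {
            task.onFailure(cause);
            n++;
        }
        queue.clear();
        return n;
    }
}
```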
I think you force-pushed partway through my review, so it got lost. I added it again. It's a reason why merging latest
If an inference request is inserted into the work queue after the queue has shut down, the request will never be processed, causing it to hang. When adding to the work queue there is a small window after the `isShutdown` check where the work item is added but the worker thread may have stopped. In addition, any unprocessed requests are notified when the deployment is stopped.
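The race described above can be sketched as follows. The class and its fields are illustrative, not the actual Elasticsearch executor: re-checking the shutdown flag after enqueueing closes the window in which the worker thread exits between the caller's check and its `offer`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownRaceSketch {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final AtomicBoolean running = new AtomicBoolean(true);

    // Returns false if the task was rejected; the caller must then fail the
    // request instead of leaving it to hang forever.
    boolean execute(Runnable task) {
        if (running.get() == false) {
            return false; // fast-path reject: already shut down
        }
        queue.offer(task);
        // Window: the worker may have observed shutdown and exited between the
        // check above and the offer. Re-check and drain so nothing is stranded.
        if (running.get() == false) {
            List<Runnable> notExecuted = new ArrayList<>();
            queue.drainTo(notExecuted);
            // Here each drained request would be notified of the failure.
            return notExecuted.contains(task) == false;
        }
        return true;
    }

    void shutdown() {
        running.set(false);
    }
}
```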