Aggregation keeps running long after task cancellation #108701
Labels: :Analytics/Aggregations, >bug, Team:Analytics (meta label for the analytical engine team: ESQL/Aggs/Geo)
A user reported to me that they had inadvertently run a very expensive collection of queries that put their cluster under stress. They cancelled the queries, but some `indices:data/read/search[phase/query]` tasks kept running for a very long time after cancellation, and eventually they had to restart nodes to return the cluster to a working state. They shared a thread dump showing various places where we appear to be missing cancellation detection today, most commonly in stack traces that look like this one:
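For context, a minimal sketch of why a missing check matters, assuming cancellation is cooperative: cancelling a task only flips a flag, and the running code stops only if it polls that flag. A loop with no check runs to completion regardless. All names here (`CancellableAggregation`, `sumWithCancellationChecks`, `TaskCancelledException`, `CHECK_INTERVAL`) are hypothetical and are not Elasticsearch's actual cancellation API.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical illustration of cooperative cancellation; not Elasticsearch code. */
public class CancellableAggregation {

    /** Thrown when the loop observes that its task has been cancelled. */
    public static final class TaskCancelledException extends RuntimeException {
        public TaskCancelledException(String message) {
            super(message);
        }
    }

    // Hypothetical: poll the flag once per this many documents to keep the check cheap.
    public static final int CHECK_INTERVAL = 1024;

    /** No cancellation check: even if the task's flag is set, nothing here observes it. */
    public static long sumWithoutChecks(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    /** Polls the cancellation flag every CHECK_INTERVAL documents and bails out if set. */
    public static long sumWithCancellationChecks(long[] values, AtomicBoolean cancelled) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (i % CHECK_INTERVAL == 0 && cancelled.get()) {
                throw new TaskCancelledException("cancelled after " + i + " documents");
            }
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] values = new long[10_000];
        Arrays.fill(values, 1L);
        // In a real system another thread would set this flag when the user cancels.
        AtomicBoolean cancelled = new AtomicBoolean(true);

        System.out.println(sumWithoutChecks(values)); // 10000: cancellation ignored
        try {
            sumWithCancellationChecks(values, cancelled);
        } catch (TaskCancelledException e) {
            System.out.println(e.getMessage()); // cancelled after 0 documents
        }
    }
}
```

The check interval trades responsiveness for overhead: polling every document is wasteful, while never polling inside a hot loop produces exactly the runaway behaviour described in this issue.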