Aggregation keeps running long after task cancellation #108701

Open
DaveCTurner opened this issue May 16, 2024 · 2 comments
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@DaveCTurner
Contributor

A user reported to me that they had inadvertently run a very expensive collection of queries which put their cluster under stress, so they cancelled them. However, some indices:data/read/search[phase/query] tasks continued to run for a very long time after being cancelled, and eventually they had to restart nodes to restore their cluster to a working state. They shared a thread dump that shows various places where we appear to be missing cancellation detection today. The most common stack traces look like this one:

   100.0% [cpu=99.9%, other=0.1%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[REDACTED][search_worker][T#6]'
     10/10 snapshots sharing following 35 elements
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.terms.LongKeyedBucketOrds$FromMany$1.next(LongKeyedBucketOrds.java:368)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$RemapGlobalOrds.forEach(GlobalOrdinalsStringTermsAggregator.java:560)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$ResultStrategy.buildAggregations(GlobalOrdinalsStringTermsAggregator.java:606)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregations(GlobalOrdinalsStringTermsAggregator.java:185)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.BestBucketsDeferringCollector$2.buildAggregations(BestBucketsDeferringCollector.java:245)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.BucketsAggregator.buildSubAggsForBuckets(BucketsAggregator.java:180)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.BucketsAggregator.buildSubAggsForAllBuckets(BucketsAggregator.java:242)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.access$300(GlobalOrdinalsStringTermsAggregator.java:55)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$StandardTermsResults.buildSubAggs(GlobalOrdinalsStringTermsAggregator.java:766)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$StandardTermsResults.buildSubAggs(GlobalOrdinalsStringTermsAggregator.java:715)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$ResultStrategy.buildAggregations(GlobalOrdinalsStringTermsAggregator.java:630)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregations(GlobalOrdinalsStringTermsAggregator.java:185)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.BestBucketsDeferringCollector$2.buildAggregations(BestBucketsDeferringCollector.java:245)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.BucketsAggregator.buildSubAggsForBuckets(BucketsAggregator.java:180)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.BucketsAggregator.buildSubAggsForAllBuckets(BucketsAggregator.java:242)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.terms.MapStringTermsAggregator.access$100(MapStringTermsAggregator.java:50)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.terms.MapStringTermsAggregator$StandardTermsResults.buildSubAggs(MapStringTermsAggregator.java:439)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.terms.MapStringTermsAggregator$StandardTermsResults.buildSubAggs(MapStringTermsAggregator.java:357)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.terms.MapStringTermsAggregator$ResultStrategy.buildAggregations(MapStringTermsAggregator.java:276)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.bucket.terms.MapStringTermsAggregator.buildAggregations(MapStringTermsAggregator.java:112)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.Aggregator.buildTopLevel(Aggregator.java:159)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.aggregations.AggregatorCollector.doPostCollection(AggregatorCollector.java:47)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.query.QueryPhaseCollector.doPostCollection(QueryPhaseCollector.java:379)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.internal.ContextIndexSearcher.doAggregationPostCollection(ContextIndexSearcher.java:486)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:475)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$4(ContextIndexSearcher.java:375)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.search.internal.ContextIndexSearcher$$Lambda/0x00007f8aa1962038.call(Unknown Source)
       java.base@21.0.1/java.util.concurrent.FutureTask.run(FutureTask.java:317)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
       app/org.elasticsearch.server@8.11.1/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@21.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base@21.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base@21.0.1/java.lang.Thread.runWith(Thread.java:1596)
       java.base@21.0.1/java.lang.Thread.run(Thread.java:1583)
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000
Member

nik9000 commented May 17, 2024

Looks like they are building a huge aggregation. We could certainly check for interruption periodically in there.
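A minimal sketch of the kind of periodic check described in the comment above, assuming the cancellation state can be read through a plain BooleanSupplier. All names here are hypothetical: CancellableBucketBuilder, buildBuckets, CHECK_INTERVAL and the local TaskCancelledException are illustrative only and are not Elasticsearch APIs. The idea is to poll the flag once every fixed number of buckets, so a cancelled task stops after a bounded amount of extra work without paying for a check on every single iteration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

class CancellableBucketBuilder {

    /** Hypothetical exception signalling that the owning task was cancelled. */
    static class TaskCancelledException extends RuntimeException {
        TaskCancelledException(String message) {
            super(message);
        }
    }

    /** Assumed tuning value: number of buckets processed between cancellation checks. */
    static final int CHECK_INTERVAL = 1024;

    /**
     * Builds one placeholder result per bucket ordinal, polling isCancelled
     * every CHECK_INTERVAL buckets and aborting as soon as it returns true.
     */
    static List<String> buildBuckets(long bucketCount, BooleanSupplier isCancelled) {
        List<String> results = new ArrayList<>();
        for (long ord = 0; ord < bucketCount; ord++) {
            if (ord % CHECK_INTERVAL == 0 && isCancelled.getAsBoolean()) {
                throw new TaskCancelledException("cancelled after " + ord + " buckets");
            }
            results.add("bucket-" + ord);
        }
        return results;
    }

    public static void main(String[] args) {
        // Never cancelled: every bucket is built.
        System.out.println(buildBuckets(3, () -> false).size()); // prints 3

        // Simulated cancellation that becomes visible on the third check,
        // i.e. at ordinal 2048, long before all 1,000,000 buckets are built.
        AtomicInteger checks = new AtomicInteger();
        try {
            buildBuckets(1_000_000, () -> checks.incrementAndGet() > 2);
        } catch (TaskCancelledException e) {
            System.out.println(e.getMessage()); // prints "cancelled after 2048 buckets"
        }
    }
}
```

The check interval trades responsiveness for overhead: a smaller value notices cancellation sooner, while a larger one keeps the cost of reading the flag negligible in tight loops like the bucket iteration seen in the stack trace.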
