JVM version (java -version):
openjdk version "1.8.0_141"
OpenJDK Runtime Environment (build 1.8.0_141-b16)
OpenJDK 64-Bit Server VM (build 25.141-b16, mixed mode)
OS version (uname -a if on a Unix-like system):
Linux 55d1beb7fa7e 4.9.43-17.38.amzn1.x86_64 #1 SMP Thu Aug 17 00:20:39 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
(running inside docker container)
Description of the problem including expected versus actual behavior:
Using the Search Profiler in Kibana causes runaway CPU usage on ES node(s), even when using the default query:
Index: _any
Query:
{
  "query": {
    "match_all": {}
  }
}
I can reproduce this in production (8 nodes, 7.5TB primary, 73 indices, 524 shards), and in a test environment (1 node, 1TB primary, 93 indices, 97 shards), but not on an empty cluster running locally.
In the test env, it causes CPU on the single node to spike to 100% for ~5 minutes before returning to normal.
In production, it caused CPU to spike to 100% on 4 out of 8 nodes for over half an hour (at which point I used the Task Cancellation API).
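For reference, I cancelled the runaway tasks with something like the following against the task management API (the exact `actions` filter is a guess at matching the search tasks):

```
POST /_tasks/_cancel?actions=*search*
```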
Steps to reproduce:
Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query, etc. The easier you make it for
us to reproduce, the more likely that somebody will take the time to look at it.
I have not tried to put together a minimal example yet (other than trying it on an empty cluster). Reproduction for me (on real systems) just involves navigating to https://kibana/app/kibana#/dev_tools/searchprofiler?_g=() and clicking the 'Profile' button.
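For reference, clicking 'Profile' with the defaults should be roughly equivalent to the following REST request (a sketch, assuming the Search Profiler simply sets the `profile` flag on the query):

```
GET /_search
{
  "profile": true,
  "query": {
    "match_all": {}
  }
}
```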
Provide logs (if relevant):
Hot threads (in test env, on a node that is otherwise almost idle):
::: {55d1beb7fa7e}{98KLc2f5SQebjsekSGmGDg}{TFUNTxuYRpqHvigoyx0kdg}{10.2.92.71}{10.2.92.71:9300}{aws_availability_zone=us-west-2c}
Hot threads at 2017-09-05T05:54:17.904Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
50.2% (250.7ms out of 500ms) cpu usage by thread 'elasticsearch[55d1beb7fa7e][search][T#2]'
8/10 snapshots sharing following 18 elements
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:246)
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:197)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:668)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:388)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:108)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:248)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:263)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:330)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:327)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
2/10 snapshots sharing following 18 elements
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:244)
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:197)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:668)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:388)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:108)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:248)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:263)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:330)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:327)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
50.0% (250ms out of 500ms) cpu usage by thread 'elasticsearch[55d1beb7fa7e][search][T#4]'
9/10 snapshots sharing following 18 elements
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:246)
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:197)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:668)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:388)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:108)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:248)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:263)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:330)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:327)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
unique snapshot
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:244)
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:197)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:668)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:388)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:108)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:248)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:263)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:330)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:327)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
49.8% (249.1ms out of 500ms) cpu usage by thread 'elasticsearch[55d1beb7fa7e][search][T#3]'
9/10 snapshots sharing following 18 elements
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:246)
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:197)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:668)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:388)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:108)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:248)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:263)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:330)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:327)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
unique snapshot
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:244)
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:197)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:668)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:388)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:108)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:248)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:263)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:330)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:327)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
::: {ip-10-38-75-80-62c1ea431ae5}{0Tq96erBTLWE7MT06ssANg}{Fkk03QXhTWaGX0jd4hhgUw}{10.38.75.80}{10.38.75.80:9300}{aws_availability_zone=us-west-2c}
Hot threads at 2017-09-05T05:54:17.907Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
Screenshot after clicking 'Profile', waiting for ES to recover, clicking it again, waiting for recovery:
The volume of data, not the complexity of the query, is likely the issue here. Even the smallest overhead can add up to a lot when you have a tight loop over a very large number of docs.
From the docs:
Like any profiler, the Profile API introduces a non-negligible overhead to search execution.
The act of instrumenting low-level method calls ... can be fairly expensive, since these
methods are called in tight loops
Can you confirm the number of docs you have? If it's very large, then I expect the profiler is working as designed.
I think @markharwood's explanation is the right one. org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:246) is just a loop that iterates over the matches of the query, so the more documents there are, the longer it takes.
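To make the scaling concrete, here is a back-of-envelope sketch (the per-call overhead and the doc count are illustrative assumptions, not measured figures):

```python
# Illustrative only: assume profiling adds ~50 ns of timing overhead per
# document visited (hypothetical figure), and match_all visits every doc.
per_call_overhead_ns = 50
docs_visited = 2_000_000_000  # plausible order of magnitude for a multi-TB index

overhead_seconds = per_call_overhead_ns * docs_visited / 1e9
print(f"~{overhead_seconds:.0f} s of pure instrumentation overhead")  # → ~100 s
```

A negligible per-document cost multiplied by billions of documents easily explains minutes of 100% CPU.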
For the record, profiling will have less overhead in 5.6 (#25772).
Is there a way to disable the profile API?
No. I'm not sure it would make sense to make it possible to disable it given that there are many other ways to consume a lot of CPU on a cluster that you have read access to.
Elasticsearch version (bin/elasticsearch --version): Version: 5.5.2, Build: b2f0c09/2017-08-14T12:33:14.154Z, JVM: 1.8.0_141
Plugins installed: discovery-ec2 ingest-geoip ingest-user-agent x-pack