-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Closed
Labels
Description
I frequently see hot_threads output like this:
24.6% (122.8ms out of 500ms) cpu usage by thread 'elasticsearch[elastic1012][search][T#13]'
10/10 snapshots sharing following 10 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:724)
or when the system is under lower load, I see this:
8.2% (41.1ms out of 500ms) cpu usage by thread 'elasticsearch[elastic1016][refresh][T#5]'
10/10 snapshots sharing following 9 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:702)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.poll(LinkedTransferQueue.java:1117)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:724)
It looks like hot_threads figures out what threads are hot and then takes stack trace snapshots. I imagine this is significantly less load on the system then snapshotting all the threads during the load measurements but it makes the results less useful, I think. Fine for really long running stuff like merges, but searches tend to be bursty.
Its still super useful but I find myself using jstack and thread dumps now that I've squashed most of the long running problems. Maybe something with hot_threads' stack trace merging that takes snapshots of all the threads and merges them would be more useful to me at this stage.