High contention on InternalEngine.innerIndex() #18053
Comments
The math is correct. It's just the birthday problem, which is exactly the formula that I see in your spreadsheet.
An observation here is that the number of locks should scale like the square of the number of cores for a targeted probability of collision, but we are only scaling linearly. I'm not saying that that is the approach that we should take, just saying that this is why this breaks down so hard as core counts increase.
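The quadratic scaling follows from the birthday-problem approximation `P(collision) ≈ 1 - exp(-k(k-1)/2N)`: holding the target probability `p` fixed and solving for the pool size `N` gives `N ≈ k(k-1) / (2 ln(1/(1-p)))`, which grows like `k²`. A small standalone sketch (names and the 5% target are illustrative, not from the issue):

```java
public class LockScaling {
    // Birthday-problem approximation: P(collision) ≈ 1 - exp(-k*(k-1) / (2*N)).
    // Solving for N at a fixed target probability p:
    //   N ≈ k*(k-1) / (2 * ln(1/(1-p)))
    // so the required lock count grows quadratically in the core count k.
    static long locksNeeded(int cores, double targetCollisionProbability) {
        double denominator = 2.0 * Math.log(1.0 / (1.0 - targetCollisionProbability));
        return Math.round(cores * (cores - 1) / denominator);
    }

    public static void main(String[] args) {
        // Hypothetical 5% collision target: doubling the cores roughly
        // quadruples the locks needed, while numProcessors * 10 only doubles.
        for (int cores : new int[] {8, 16, 32, 64}) {
            System.out.println(cores + " cores -> ~" + locksNeeded(cores, 0.05) + " locks");
        }
    }
}
```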
An increase of 20K docs/sec sounds substantial! What was the indexing rate before you increased the lock count? It's frustrating that ES even needs these locks ... it's because if ES sends the same id concurrently, it can't tell which operation Lucene applied last. We have talked about having Lucene return a sequence number so that callers could figure it out ... and @s1monw may have a patch somewhere that started on this. I wonder if that would be enough for ES to remove the locks? E.g. we could use this sequence number to correctly update the version map ...
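The sequence-number idea could be sketched roughly as follows. This is purely illustrative (not an actual Elasticsearch or Lucene patch): a stand-in generator plays the role of the sequence number Lucene would return, and a concurrent map keeps only the highest sequence number per id, so concurrent updates to the same id resolve without a lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of lock-free version-map maintenance: if each index
// operation returned a monotonically increasing sequence number, the version
// map could keep the entry with the highest sequence number per id via an
// atomic merge, instead of serializing same-stripe operations under a lock.
class VersionMap {
    private final AtomicLong seqNoGenerator = new AtomicLong();
    private final ConcurrentHashMap<String, Long> maxSeqNoPerId = new ConcurrentHashMap<>();

    long index(String id) {
        // Stand-in for the sequence number Lucene would return.
        long seqNo = seqNoGenerator.incrementAndGet();
        // Atomically keep only the newest operation, regardless of arrival order.
        maxSeqNoPerId.merge(id, seqNo, Long::max);
        return seqNo;
    }

    long latestSeqNo(String id) {
        return maxSeqNoPerId.getOrDefault(id, -1L);
    }
}
```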
Total throughput went from ~270,000 to ~290,000 docs/s. Note that was on a 4-node cluster; I'm unsure if similar, smaller, or better gains would be had from a single node, although I did check and the contention issues seem to exist on a single node too (which makes sense, based on what @jasontedor said about linear lock pool scaling). There was some variance between runs, but the difference seemed to hold up. I have half a mind to rerun the tests a few dozen times and do a t-test just to make sure :)
Nice find! For what it's worth, I was running a profiler on ES today and saw this same `innerIndex` code as one of the top hotspots (ES 5.0 alpha2, built from source). This was on a small 2-datanode ES cluster indexing ~20K docs/sec.
Maybe a short-term fix for this would be to use something like `KeyedLock`. This would still prevent two index/delete operations on the same id from occurring concurrently, but two different ids would never share the same lock.
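The per-key locking idea can be sketched with a minimal class (this is an illustration of the concept, not Elasticsearch's actual `KeyedLock` implementation): each distinct id gets its own lock, created on demand and removed once no thread holds or waits on it, so two different ids can never contend on a shared stripe.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Minimal per-key lock sketch: unlike a fixed striped pool, distinct keys
// never share a lock, eliminating hash-collision contention entirely.
class SimpleKeyedLock<K> {
    private static final class RefLock extends ReentrantLock {
        final AtomicInteger refs = new AtomicInteger(0); // threads holding or waiting
    }

    private final ConcurrentHashMap<K, RefLock> locks = new ConcurrentHashMap<>();

    public void acquire(K key) {
        while (true) {
            RefLock lock = locks.computeIfAbsent(key, k -> new RefLock());
            lock.refs.incrementAndGet();
            lock.lock();
            // Re-check: a releasing thread may have removed this lock from the
            // map between our computeIfAbsent and lock(); if so, retry with a
            // fresh lock instead of holding a stale one.
            if (locks.get(key) == lock) {
                return;
            }
            lock.unlock();
            lock.refs.decrementAndGet();
        }
    }

    public void release(K key) {
        RefLock lock = locks.get(key);
        lock.unlock();
        // Drop the map entry once no thread holds or waits on this lock.
        if (lock.refs.decrementAndGet() == 0) {
            locks.remove(key, lock);
        }
    }
}
```

The trade-off versus a fixed pool is the map churn per operation, which is why a sketch like this would need benchmarking before replacing simple striping.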
I was running some informal benchmarking on my home cluster for fun and found my max throughput. Interestingly, the cluster didn't appear saturated on any particular resource and I was unable to coax the indexing rate higher no matter what I did.
I reran the test while recording telemetry via Flight Recorder (using `-XX:+DebugNonSafepoints`, as recommended by @danielmitterdorfer and @rmuir). I didn't notice anything too outrageous, but there was very high contention on `InternalEngine.innerIndex()`: on a 6min test, it racked up 3242 contention events and ~24min of CPU time.

The `innerIndex()` method synchronizes on a pool of locks, which is sized at `numProcessors * 10`. My machine has 32 cores (16 physical + hyperthreading), so the pool has 320 locks. @nik9000 and @jpountz suggested it may be an issue with collisions in that pool. I wasn't really sure of the math to verify, but I found an article which gives a formula to calculate the probability of generating `k` unique draws out of `N` possible values. Since our case is a bit more complex -- pool size (`N`) scales in relation to core count (`k`) -- I put together a spreadsheet to calculate everything¹. Assuming I didn't do anything silly with the math, the collision rate is grim. For my machine, 32 simultaneous requests have a ~79% chance of at least one collision in a pool of 320 locks.

[Chart: probability of collision (y-axis) vs. number of simultaneous requests on an equal number of cores (x-axis)]
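The ~79% figure can be checked directly with the exact birthday-problem product rather than a spreadsheet (a standalone sketch, not code from the issue):

```java
public class CollisionProbability {
    // Exact birthday-problem probability that at least two of k concurrent
    // requests hash to the same lock in a pool of n locks:
    //   P = 1 - (n/n) * ((n-1)/n) * ... * ((n-k+1)/n)
    static double collisionProbability(int k, long n) {
        double pAllUnique = 1.0;
        for (int i = 0; i < k; i++) {
            pAllUnique *= (double) (n - i) / n;
        }
        return 1.0 - pAllUnique;
    }

    public static void main(String[] args) {
        // 32 concurrent requests, numProcessors * 10 = 320 locks: ~80%
        System.out.printf("320 locks:   %.1f%%%n", 100 * collisionProbability(32, 320));
        // 32 concurrent requests, numProcessors * 1000 = 32000 locks: ~1.5%
        System.out.printf("32000 locks: %.1f%%%n", 100 * collisionProbability(32, 32000));
    }
}
```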
Bumping the pool count
As a quick test, I bumped the lock multiplier from `* 10` to `* 1000`, meaning my machines had 32,000 available locks, which gives a theoretical ~1.5% collision rate. Rerunning the test consistently gives a ~20k docs/s speedup, and drops contention down to 297 events and ~3min of CPU time.

Next?
All of this is touching delicate code that I don't understand, and my informal test may be biased by something I'm not accounting for. So I'm not really sure what or if there is a next step. But I thought these charts were interesting enough that more knowledgeable folks may want to have a look.
Footnotes