
High contention on InternalEngine.innerIndex() #18053

Closed
polyfractal opened this issue Apr 28, 2016 · 6 comments · Fixed by #18060
Labels
:Core/Infra/Core Core issues without another label

Comments

@polyfractal
Contributor

I was running some informal benchmarking on my home cluster for fun and found my max throughput. Interestingly, the cluster didn't appear saturated on any particular resource and I was unable to coax the indexing rate higher no matter what I did.

I reran the test while recording telemetry via Flight Recorder (using -XX:+DebugNonSafepoints as recommended by @danielmitterdorfer and @rmuir). I didn't notice anything too outrageous, but there was very high contention on InternalEngine.innerIndex():

[Flight Recorder screenshot: lock contention with the default lock count]

On a 6min test, it racked up 3242 contention events and ~24min of CPU time. The innerIndex() method synchronizes on a pool of locks, which is sized at numProcessors * 10.
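For context, that pooled locking is essentially lock striping keyed by the hash of the document uid. A minimal sketch of the pattern (illustrative names and structure only, not the actual InternalEngine code):

```java
// Illustrative hash-striped lock pool; simplified, not copied from InternalEngine.
class StripedUidLocks {
    private final Object[] locks;

    StripedUidLocks(int numProcessors) {
        // The sizing described above: number of processors times 10.
        locks = new Object[numProcessors * 10];
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new Object();
        }
    }

    // Two different uids that hash to the same slot share a lock,
    // which is the source of the contention measured here.
    Object lockFor(Object uid) {
        return locks[Math.floorMod(uid.hashCode(), locks.length)];
    }
}

// Indexing code would then do roughly: synchronized (stripedLocks.lockFor(uid)) { ... }
```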

My machine has 32 cores (16 physical + hyperthreading), so the pool has 320 locks. @nik9000 and @jpountz suggested it may be an issue with collisions in that pool. I wasn't sure how to verify the math, but I found an article that gives a formula for the probability of generating k unique draws out of N possible values.

Our case is a bit more complex -- the pool count (N) scales with the core count (k) -- so I put together a spreadsheet to calculate everything¹. Assuming I didn't do anything silly with the math, the collision rate is grim. For my machine, 32 simultaneous requests have a ~79% chance of at least one collision in a pool of 320 locks.

[Chart: probability of at least one collision (y-axis) vs. number of simultaneous requests on an equal number of cores (x-axis)]
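For reference, a minimal sketch of the birthday-problem calculation behind these numbers (not code from the spreadsheet, just an illustration of the same formula):

```java
// Birthday-problem estimate of lock collisions.
class LockCollisionMath {
    // Probability that at least two of k concurrent requests pick the same
    // lock out of a pool of n locks.
    static double collisionProbability(int k, int n) {
        double noCollision = 1.0;
        for (int i = 1; i < k; i++) {
            noCollision *= 1.0 - (double) i / n;
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        // ~0.80 for 32 requests against 320 locks (the ~79% figure above).
        System.out.println(collisionProbability(32, 320));
    }
}
```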

Bumping the pool count

As a quick test, I bumped the lock multiplier from * 10 to * 1000, meaning my machines each had 32,000 available locks, which gives a theoretical ~1.5% collision rate. Rerunning the test consistently gave a 20k docs/s speedup and dropped contention down to 297 events and ~3min of CPU time.
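Using the same sketch as above, collisionProbability(32, 32000) comes out to roughly 0.015, which lines up with the ~1.5% figure.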

[Flight Recorder screenshot: lock contention with the increased lock count]

Next?

All of this touches delicate code that I don't understand, and my informal test may be biased by something I'm not accounting for. So I'm not really sure whether there is a next step, or what it would be. But I thought these charts were interesting enough that more knowledgeable folks may want to have a look.

Footnotes

  1. Spreadsheet calculating collision probability
  2. To verify my math, I pinned the pool count to 320 and calculated the same metric for 1–64 requests. The curve is a sigmoid like the one in the original article, so I think my math is correct.
@polyfractal added the :Core/Infra/Core label on Apr 28, 2016
@jasontedor
Member

> so I think my math is correct

The math is correct. It's just the birthday problem, which is exactly the formula that I see in your spreadsheet.

@jasontedor
Member

An observation here is that the number of locks should scale like the square of the number of cores for a targeted probability of collision, but we are only scaling linearly. That is, if we want to target a probability p for 2 * c threads running on c cores (indexing and bulk thread pools are limited to the number of cores), we would want approximately -2 * c^2 / log(1 - p) locks in the lock pool (natural log).

I'm not saying that that is the approach that we should take, just saying that this is why this breaks down so hard as core counts increase.
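As a rough sketch of that scaling (assuming the log above is the natural logarithm, and picking an arbitrary 5% target for illustration, not a number from this discussion):

```java
// Approximate pool sizing from the birthday-problem approximation
// P(collision) ~= 1 - exp(-(2c)^2 / (2N)), i.e. N ~= -2 * c^2 / ln(1 - p).
class LockPoolSizing {
    static long poolSizeFor(int cores, double targetCollisionProbability) {
        // Quadratic in the core count, unlike the current linear sizing.
        return Math.round(-2.0 * cores * cores / Math.log(1.0 - targetCollisionProbability));
    }

    public static void main(String[] args) {
        // ~39,900 locks for 32 cores at p = 0.05, versus 32 * 10 = 320
        // with the current numProcessors * 10 sizing.
        System.out.println(poolSizeFor(32, 0.05));
    }
}
```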

@mikemccand
Contributor

mikemccand commented Apr 28, 2016

An increase of 20 K docs/sec sounds substantial! What was the indexing rate before you increased the lock count?

It's frustrating that ES even needs these locks ... it's because if ES sends the same uid to Lucene concurrently, i.e. with different documents, there's no way for ES to know which document "won".

We have talked about having Lucene return a sequence number so that callers could figure it out ... and @s1monw may have a patch somewhere that started on this. I wonder if that would be enough for ES to remove the locks? E.g. we could use this sequence number to correctly update the version map ...

@polyfractal
Contributor Author

Total throughput went from ~270,000 to ~290,000 docs/s.

Note that this was on a 4-node cluster; I'm unsure whether a single node would see similar, smaller, or larger gains. I did check, though, and the contention issue exists on a single node too (which makes sense, given what @jasontedor said about linear lock pool scaling).

There was some variance between runs, but the difference seemed to hold up. I have half a mind to rerun the tests a few dozen times and do a t-test just to make sure :)

@otisg

otisg commented Apr 29, 2016

Nice find! For what it's worth, I was running a profiler on ES today and saw this same "innerIndex" stuff as one of the top hotspots (ES 5.0 alpha2, built from source). This was on a small 2-datanode ES cluster indexing ~20K docs/sec.

@jpountz
Contributor

jpountz commented Apr 29, 2016

Maybe a short-term fix for this would be to use something like KeyedLock. This would still prevent two index/delete operations on the same id from occurring concurrently, but two different ids would never share the same lock.
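For context, the general idea behind a keyed lock is one lock per in-flight key rather than a fixed hash-indexed pool, so distinct ids never contend. A minimal sketch of the concept (not the actual KeyedLock implementation):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative per-key locking: each distinct key gets its own lock.
// A production version would also remove entries once no thread holds them
// (e.g. via reference counting), which this sketch omits for brevity.
class SimpleKeyedLock<K> {
    private final ConcurrentHashMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    void runLocked(K key, Runnable action) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }
}
```

The trade-off versus the fixed pool is a per-operation map lookup and lock allocation, but false sharing of a lock between unrelated ids goes away entirely.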
