
Remove artificial default processors limit #20874

Merged
merged 2 commits into elastic:master from jasontedor:remove-artificial-processors-limit on Oct 14, 2016

@jasontedor (Member) commented Oct 11, 2016

Today Elasticsearch limits the number of processors used in computing thread counts to 32. This limit dates from a time when Elasticsearch created more threads than it does now and users would run into out-of-memory errors. It appears the real cause of these out-of-memory errors was not well understood (it's often due to ulimit settings), so users were left hitting them on boxes with high core counts. Today Elasticsearch creates fewer threads (but still a lot), and we have a bootstrap check in place to ensure that the relevant ulimit is not too low.

There are still some caveats to having too many concurrent indexing threads, as it can lead to too many small segments, and it's not a magical go-faster knob if indexing is already bottlenecked by disk, but this limitation is artificial and surprising to users and so it should be removed.

Closes #20828

jasontedor added some commits Oct 11, 2016

Remove processors limit
Increase max number of threads threshold
This commit increases the lower bound of the max processes ulimit, to prepare for a world where Elasticsearch instances might be running with more than the previous cap of 32 processors. With the current settings, Elasticsearch wants to create roughly 576 + 25 * p / 2 threads, where p is the number of processors. Add in roughly 7 * p / 8 threads for the GC threads and a fudge factor, and 4096 should cover us pretty well up to 256 cores.
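
To sanity-check that arithmetic at the top of the supported range: with p = 256, 576 + 25 * 256 / 2 = 3776 pool threads plus 7 * 256 / 8 = 224 GC threads comes to about 4000, so 4096 leaves roughly a hundred threads of headroom. A minimal sketch of the estimate (constants taken from the commit message; the class name is illustrative, not code from the PR):

```java
// Back-of-the-envelope thread-count estimate from the commit message above.
public class ThreadEstimate {
    static long estimatedThreads(final int processors) {
        final long poolThreads = 576 + (25L * processors) / 2; // thread pools
        final long gcThreads = (7L * processors) / 8;          // GC threads
        return poolThreads + gcThreads;
    }

    public static void main(final String[] args) {
        System.out.println(estimatedThreads(256)); // prints 4000, under 4096
    }
}
```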
@jasontedor (Member, Author) commented Oct 11, 2016

I've targeted this for 5.1.0, but I think that da92de7 should be backported to 5.0.0 or users will have a rude awakening when they upgrade to 5.1.0 if they've already adjusted their ulimit to 2048.

@jasontedor (Member, Author) commented Oct 11, 2016

@mikemccand Are you interested in reviewing?

@bleskes bleskes changed the title Remove artificial processors limit Remove artificial default processors limit Oct 12, 2016

@bleskes (Member) commented Oct 12, 2016

Tracing through history, it seems the original issue was about the transport client. Maybe it makes sense to set a bounded default in the client alone?

@mikemccand (Contributor) commented Oct 12, 2016

LGTM, thanks @jasontedor!

@jpountz (Contributor) left a review comment

LGTM

@jpountz (Contributor) commented Oct 12, 2016

> I've targeted this for 5.1.0, but I think that da92de7 should be backported to 5.0.0 or users will have a rude awakening when they upgrade to 5.1.0 if they've already adjusted their ulimit to 2048.

How bad would it be to only push this commit to 6.0?

@jasontedor (Member, Author) commented Oct 12, 2016

> How bad would it be to only push this commit to 6.0?

It just means another year of living with this limitation.

@jpountz (Contributor) commented Oct 12, 2016

I'm not sure I was clear, but I was commenting specifically about the change to the BootstrapCheck. Put differently: how about removing the 32 limit in 5.1 and requiring an increase to the ulimit only in 6.0? We could potentially revisit the idea of better sharing thread pools in 5.1 to make the number of threads less of an issue.
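
For readers tracing the discussion: the check in question boils down to comparing the process's max-threads ulimit against a fixed threshold at startup (2048 in 5.0, raised to 4096 by this PR). A simplified, self-contained sketch, not the actual Elasticsearch class:

```java
// Simplified sketch of a max-number-of-threads bootstrap check.
// Illustrative only; the real check reads the limit from the OS.
public class MaxNumberOfThreadsCheckSketch {
    static final long THRESHOLD = 4096; // was 2048 before this PR

    static void check(final long maxNumberOfThreads) {
        // -1 means the limit could not be determined; skip the check then
        if (maxNumberOfThreads != -1 && maxNumberOfThreads < THRESHOLD) {
            throw new IllegalStateException(
                "max number of threads [" + maxNumberOfThreads
                    + "] is too low, increase to at least [" + THRESHOLD + "]");
        }
    }
}
```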

@bleskes (Member) commented Oct 12, 2016

I think that at this stage (the RC is out), we need to ask a different question: does it really need to go into 5.0? I think if we're honest, the answer is no.


@jasontedor (Member, Author) commented Oct 12, 2016

> I'm not sure I was clear, but I was commenting specifically about the change to the BootstrapCheck.

@jpountz It definitely was not clear; I spoke with @s1monw via another channel, and I think he interpreted it the same way I did (to keep the limitation until 6.0.0).

> Put differently: how about removing the 32 limit in 5.1 and requiring an increase to the ulimit only in 6.0?

It's worth considering.

@jasontedor (Member, Author) commented Oct 12, 2016

@bleskes It's not clear what you're commenting on; the processor limitation removal is not targeted for 5.0.0, just the bootstrap check change, as otherwise users might have to set the ulimit twice: once on upgrade to 5.0.0 and again on upgrade to 5.1.0.

@s1monw (Contributor) commented Oct 13, 2016

What about this: we wait with the bootstrap checks until 6.0 and make the hard limit a soft limit in 5.1; then users that have such a high CPU count can set it higher, but it's bounded by default. To support users, we can also print a warning message that they should consider raising this limit. I am not a fan of changing something like this in a minor release.

@bleskes (Member) commented Oct 13, 2016

> @bleskes It's not clear what you're commenting on; the processor limitation removal is not targeted for 5.0.0,

I see. Then it's good in that respect. I'm still hesitant about changing the ulimit as well; I'm not sure bootstrap checks should make everyone account for the largest use case possible. One other idea I had is to take Runtime.getRuntime().availableProcessors() into account when determining the lower bound. That said, we haven't released this yet, so I tend to prefer starting with the least restrictive and simplest option and seeing how it goes.
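
As a rough illustration of that idea (purely hypothetical; the merged change keeps a fixed bound instead), a processor-aware lower bound might look something like this, reusing the thread-count estimate from the commit message:

```java
// Hypothetical sketch: scale the bootstrap check's lower bound with the
// detected processor count instead of hard-coding it. Not the merged code.
static long maxNumberOfThreadsLowerBound() {
    final int p = Runtime.getRuntime().availableProcessors();
    // ~576 fixed threads + ~25*p/2 pool threads + ~7*p/8 GC threads
    final long estimate = 576 + (25L * p) / 2 + (7L * p) / 8;
    // round up to the next power of two as a fudge factor
    return Long.highestOneBit(estimate - 1) << 1;
}
```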

@jasontedor (Member, Author) commented Oct 13, 2016

> I'm not sure bootstrap checks should make everyone account for the largest use case possible. One other idea I had is to take Runtime.getRuntime().availableProcessors() into account when determining the lower bound. That said, we haven't released this yet, so I tend to prefer starting with the least restrictive and simplest option and seeing how it goes.

@bleskes Before opening this PR, I did toy with exactly this idea. It's fine, it works, but I opted for the simplest approach precisely because the default on most systems is already quite large (total number of pages / 64); any x86 system with more than 1 GB of memory will have its default above the limit enforced here.
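
For example, assuming the common 4 KiB page size, 1 GiB of memory is 262,144 pages, and 262,144 / 64 = 4,096, exactly the new lower bound; larger machines default proportionally higher.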

@bleskes (Member) commented Oct 13, 2016

> the default on most systems is already quite large (total number of pages / 64); any x86 system with more than 1 GB of memory will have its default above the limit enforced here.

All good then!

@jasontedor jasontedor removed the v5.1.1 label Oct 14, 2016

@jasontedor (Member, Author) commented Oct 14, 2016

> What about this: we wait with the bootstrap checks until 6.0 and make the hard limit a soft limit in 5.1; then users that have such a high CPU count can set it higher, but it's bounded by default.

It's a soft limit today.

We discussed this during Fix-it-Friday and agreed to push this to 6.0.0 only since there is a workaround for the limit in the 5.x series already (see #20895).
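
(For reference, the `processors` setting in elasticsearch.yml lets users override the detected processor count explicitly, e.g. `processors: 64` on a 64-core box; #20895 is presumably what makes that a viable workaround for the default cap in 5.x.)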

@jasontedor jasontedor merged commit 595ec8c into elastic:master Oct 14, 2016

2 checks passed: CLA (commit author is a member of Elasticsearch) and elasticsearch-ci (build finished).

@jasontedor jasontedor deleted the jasontedor:remove-artificial-processors-limit branch Oct 14, 2016

jasontedor added a commit that referenced this pull request Jul 9, 2017

Fix scaling thread pool test bug
This commit adjusts the expectation for the max number of threads in the scaling thread pool configuration test. The expectation was incorrect because we removed the limitation that the number of processors maxes out at 32, instead letting it be the true number of logical processors on the machine. However, when we removed this limitation, this test was never adjusted to reflect the new reality, and the failure never arose since our tests were not running on machines with such high core counts.

Relates #20874

jasontedor added a commit that referenced this pull request Jul 9, 2017

Revert "Fix scaling thread pool test bug"
This reverts commit 037014b.

Relates #20874