Document that Transparent Huge Pages should be disabled on Linux #26551

Open
jakommo opened this Issue Sep 8, 2017 · 5 comments

Contributor

jakommo commented Sep 8, 2017

There seems to be a kernel issue, https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1644056, that causes a kernel crash under high load.
It was also reported on Discuss: https://discuss.elastic.co/t/elasticsearch-5-4-2-process-periodically-dying-with-high-cpu-load-and-kernel-message-pgtable-generic-c-33-bad-pmd/92239

It seems this can be worked around by disabling THP on Linux, e.g. with echo -n never > /sys/kernel/mm/transparent_hugepage/enabled (run as root).
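Before changing the setting, it can help to check which mode is currently active. The following is a minimal sketch, not an official procedure: the function name thp_active_mode is hypothetical, and it relies on the kernel marking the active mode with square brackets in the sysfs file (for example "always madvise [never]"). Note that a value written with echo does not survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper: extract the active THP mode from the contents of a
# transparent_hugepage sysfs file. The kernel wraps the active mode in
# square brackets, e.g. "always madvise [never]".
thp_active_mode() {
    # $1: contents of the sysfs file
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

THP_ENABLED=/sys/kernel/mm/transparent_hugepage/enabled

if [ -r "$THP_ENABLED" ]; then
    echo "THP enabled mode: $(thp_active_mode "$(cat "$THP_ENABLED")")"
else
    # The file is absent on non-Linux systems or kernels built without THP.
    echo "THP sysfs file not found: $THP_ENABLED"
fi
```

If the reported mode is not never, the echo command above can be applied and the check repeated to confirm the change.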

I had a chat with @jasontedor, and we should recommend disabling THP in general, not just on the affected kernel versions.
THP appears to be enabled by default on at least Ubuntu 14.04/16.04 and RHEL 6/7.

I think Important System Configuration would be a good place for this.

@jakommo jakommo added the >docs label Sep 8, 2017

@javanna javanna added the help wanted label Sep 14, 2017

Contributor Author

jakommo commented Sep 27, 2017

I talked to a user who still experienced the above kernel bug after disabling THP, though much less frequently.
What seems to have solved it for them is also disabling NUMA, so maybe we can add this as well (if there are no objections from the dev side).

Member

jasontedor commented Oct 6, 2017

As I mentioned in another channel, we should make the THP recommendation independent of any kernel bugs that may or may not be present.

As for NUMA, any recommendation here requires gathering more data and running some experiments. I don’t think we should base our recommendations on a single data point (which might be fixed in different kernel versions).


elasticmachine commented Apr 24, 2018

Contributor

dliappis commented Apr 24, 2018

One additional data point: @danielmitterdorfer and I are evaluating a) the stability and b) the performance of Elasticsearch with and without THP. To be more exact, tuning involves testing not only the THP enabled setting but also the THP defrag option, which as of kernel 4.6.1 offers new defrag strategies.

So far in our nightly benchmarking environment we have discovered that disabling THP (which in newer kernels is usually done by setting /sys/kernel/mm/transparent_hugepage/{defrag,enabled} to madvise) causes a performance drop in Elasticsearch.
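Since both the enabled and defrag knobs matter when comparing benchmark runs, it is worth recording the two together. The sketch below is only an illustration: thp_report is a hypothetical helper name, and the optional directory argument exists solely so the function can be pointed at a copy of the files instead of the live sysfs tree.

```shell
#!/bin/sh
# Hypothetical helper: print the active mode of both THP knobs.
# $1 (optional): directory holding the "enabled" and "defrag" files;
#                defaults to the live sysfs location.
thp_report() {
    dir="${1:-/sys/kernel/mm/transparent_hugepage}"
    for knob in enabled defrag; do
        if [ -r "$dir/$knob" ]; then
            # The active mode is the one wrapped in square brackets.
            mode=$(sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$dir/$knob")
            echo "$knob=$mode"
        else
            echo "$knob=unavailable"
        fi
    done
}

thp_report

# To switch both knobs to madvise (root required, lost on reboot):
#   echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
#   echo madvise > /sys/kernel/mm/transparent_hugepage/defrag
```

Logging this output alongside each benchmark run makes it easier to tell whether a performance change coincides with a different THP configuration.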

Additionally, more recent versions of the Ubuntu kernel (starting with 4.12.2) now set THP to madvise rather than enabled, which used to be the default; this is how we became aware of the performance regression in the first place. madvise will also be the default setting in the upcoming Ubuntu 18.04 LTS release, due in a few days.

We will provide more details once the necessary long-running benchmarks have finished and we have enough CI runs and benchmarking data to back a THP suggestion.


alexander-marquardt commented Jul 2, 2018

MongoDB recommends against THP in the following document, so the same logic might apply to Elasticsearch if our access patterns are similar: https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
