
Autodetermine heap settings based on node roles and total system memory #65905

Merged
merged 39 commits on Dec 16, 2020

Conversation


@mark-vieira mark-vieira commented Dec 4, 2020

This commit expands our JVM ergonomics to also automatically determine
an appropriate heap size based on the total available system memory as
well as the roles assigned to the node. Role determination is done via a
naive parsing of elasticsearch.yml. No settings validation is done, and
only the 'node.roles' setting is taken into consideration.

For heap purposes a node falls into one of four (4) categories:

  1. A 'master-only' node. This is a node with only the 'master' role.
  2. An 'ml-only' node. Similarly, a node with only the 'ml' role.
  3. A 'data' node. This is basically the 'other' case. A node with any
    set of roles other than only master or only ml is considered a 'data'
    node, including things like coordinating-only or "tie-breaker" nodes.
  4. Unknown. This is the case if legacy settings are used. In this
    scenario we fall back to the old default heap options of 1GB.

In all cases we short-circuit if a user provides explicit heap options,
so we only ever auto-determine heap when no explicit heap options are
present. Starting with this commit the default heap settings (1GB) are
removed from the default jvm.options, which means auto-determined heap
becomes the new default.
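
As a concrete rendering of the four categories above, here is a minimal, self-contained Java sketch of the role-to-category mapping. The MachineNodeRole constants mirror the MachineDependentHeap.MachineNodeRole values referenced later in this review, but the classify helper itself is illustrative only, not the PR's actual code, and it only covers the case where an explicit 'node.roles' list has already been parsed from elasticsearch.yml.

import java.util.List;

class NodeRoleClassificationSketch {
    enum MachineNodeRole { MASTER_ONLY, ML_ONLY, DATA, UNKNOWN }

    // Classification when an explicit 'node.roles' list was parsed. Legacy role
    // settings are not handled here; in that case the category is UNKNOWN and
    // the old 1GB default heap applies, as described above.
    static MachineNodeRole classify(List<String> roles) {
        if (roles.equals(List.of("master"))) {
            return MachineNodeRole.MASTER_ONLY;   // master-only node
        } else if (roles.equals(List.of("ml"))) {
            return MachineNodeRole.ML_ONLY;       // ml-only node
        } else {
            // Everything else, including coordinating-only ([]) and
            // "tie-breaker" role combinations, counts as a data node.
            return MachineNodeRole.DATA;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of("master")));                 // MASTER_ONLY
        System.out.println(classify(List.of("master", "voting_only")));  // DATA (tie-breaker)
        System.out.println(classify(List.of()));                         // DATA (coordinating-only)
    }
}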

@mark-vieira mark-vieira added the :Delivery/Packaging label (RPM and deb packaging, tar and zip archives, shell and batch scripts) on Dec 4, 2020
@elasticmachine elasticmachine added the Team:Delivery label (Meta label for Delivery team) on Dec 4, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

@@ -36,8 +36,8 @@
# Xms represents the initial size of the JVM heap
# Xmx represents the maximum size of the JVM heap

-Xms${heap.min}
-Xmx${heap.max}
# -Xms${heap.min}
Contributor Author

Still need to think about how to change this. Should we remove this section entirely? Or continue to include it but also recommend folks stick with the auto-determined heap size?

Contributor

We may want to leave it in with a note, since until now our users have always had to configure it, and our training courses will expect it to exist. Maybe we can remove it in master but retain it with a comment in 7.x?

Member


It's tricky. Either way (removing, or commenting out), we are going to potentially break automation here. For example, if there's automation in place that downloads our tar archive, explodes it, and uses sed to replace the heap size.

That leads me to wonder out loud if we should not break users here. I'd propose the following: in 7.x, we leave this as-is in the jvm.options file. Perhaps we provide a warning when the heap size is set manually by a user, pointing them to documentation for machine-dependent heap sizes. In Cloud, we still want to activate the machine-dependent heap sizes, so we'd need deployments there to remove the -Xms/-Xmx flags from their jvm.options file.

In 8.0, we remove this from the default jvm.options file.

Member


I understand the concern, though I'm not sure about a warning. I don't think we intend to disallow setting the heap manually, right? Yet a warning for setting the heap would seem like just that. Such a sed replacement in practice would seem fragile. It could break on any change to the default.

I'm conflicted. I don't see a way to warn some users without annoying others. But I do think this is something we can followup on, so in this PR I will restore the original value.

Member


My suggestion was:

Perhaps we provide a warning when the heap size is set manually by a user, pointing them to documentation for machine-dependent heap sizes.

Not a warning that we will disallow setting the heap size manually, but a permanent warning that the heap size is being set manually and that we recommend against doing so for the optimal experience.

But I do think this is something we can followup on, so in this PR I will restore the original value.

Are you saying then that the default experience will be to not get machine-dependent heap sizing? That seems contrary to our goals here.

Member

If we leave the jvm.options as is, the default will remain not getting machine-dependent heap sizing. If we emit a warning, any user who chooses to not want to use machine-dependent heap sizing will get an annoying warning they cannot use. Do we agree on those statements?

My suggestion was to leave the jvm.options as is (so as not to break the theoretical users that could be using sed to replace the heap size), and then follow up with whatever we decide to do regarding helping users convert within 7.x. This suggestion is because cloud will not rely on the default.

Member

If we leave the jvm.options as is, the default will remain not getting machine-dependent heap sizing. If we emit a warning, any user who chooses to not want to use machine-dependent heap sizing will get an annoying warning they cannot use. Do we agree on those statements?

Why can't they use it? The point of the warning is to tell them to stop setting the heap size manually, let Elasticsearch do it for them. The warning is useful to them.

Member
@rjernst rjernst Dec 10, 2020

Sorry, not sure how "cannot use" got into my response; it was meant to be "cannot remove". That is, it is a persistent warning that the only way to remove it is to go with ES's memory choices. The warning is useful in that they find out they could be using our automatic heap sizing, but are we saying users will never tune heap anymore? I personally get annoyed if there are warnings I cannot remove with normal usage.

Member

Jason and I spoke and came to an agreement on the above discussion. @jasontedor Let me know if I missed or misunderstood anything.

We should add a warning when heap size is manually specified, since our goal is to have users use the auto heap. There are, however, some issues that prevent us from doing that in this PR. First is a technical issue: how do we pass back the message when the only output of the jvm options parser is the final jvm options? The second issue is allowing cloud time to adapt to using auto sizing so that we do not spew errors across all cloud deployments. I created #66308 to ensure we follow up with adding this warning.

Regarding whether to leave the heap options or remove them in a minor release, we think the auto heap sizing is important enough that we don't want to have any more new users specifying heap. So, we should remove the Xmx/Xms from jvm.options. Users should no longer be relying on modifying jvm.options, but instead specifying these in jvm.options.d files.
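
As an illustration of that last point, a user who still wants a fixed heap would drop a file into config/jvm.options.d/ rather than editing jvm.options itself. The file name below is hypothetical (any name with the .options extension works) and 4g is just an example value; setting -Xms/-Xmx this way short-circuits the automatic sizing, as described in the PR description.

# config/jvm.options.d/heap.options  (hypothetical file name)
# Pinning the heap explicitly disables the automatic, role- and memory-based sizing.
-Xms4g
-Xmx4g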

Member

@mark-vieira @jasontedor I've removed Xmx/Xms in 88a81a1, but left in a comment explaining where the heap size should be set if it needs to be.

@mark-vieira
Contributor Author

@rjernst I wasn't able to get as far as adding packaging tests, as fixing existing tests took priority. I think I've got the Linux tests passing, but there's a Windows test still failing. I think the root cause is that the service blows up trying to start because it can't allocate enough heap, probably due to other contention on the machine; for example, the Gradle daemon is using 25G by default 😆

We might need to put in some conservative defaults across the board here so that running concurrent tests doesn't blow up memory. These new defaults are somewhat aggressive and assume the current host is single-purpose and there's not a lot of other contention for resources.

Other than that, the only remaining item is documentation, and that simply depends on the degree to which we intend to document this. I suspect for 7.11 it will be minimal.

@mark-vieira
Contributor Author

The only packaging tests now failing are the Windows test classes that aren't applicable (no tests are run), but they're failing in the cleanup code, it seems due to a process still running. Perhaps this is another side effect of us spinning up a JVM with a very large heap and that affecting sensitive timings in these tests.


public class MachineDependentHeapTests extends LaunchersTestCase {

    public void testDetermineHeapSize() throws IOException {
Contributor

Javadoc comments on the test cases help to ensure that the implementation of the test matches its intent.

Member

While I agree this can be helpful on complex tests, I think it is overkill on simple tests such as these, where the name of the class and method generally makes it obvious what is being tested. In this particular case, I've renamed the method to testDefaultHeapSize() since it is testing the default when no configuration exists.


    public void testMasterOnlyNode() {
        MachineDependentHeap.MachineNodeRole nodeRole = parseConfig(sb -> sb.append("node.roles: [master]"));
        assertEquals(nodeRole, MASTER_ONLY);
Contributor

My preference is to use assertThat pretty much everywhere these days. Not saying you should change it, but I find the messages it generates to be more readable.

Member

Changed to assertThat.
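
For readers unfamiliar with the difference, a small self-contained example of the Hamcrest style follows. The Role enum is a stand-in for MachineDependentHeap.MachineNodeRole, since the real launcher classes are not available here; the imports are the standard Hamcrest ones.

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.equalTo;

public class AssertThatExample {
    enum Role { MASTER_ONLY, ML_ONLY, DATA, UNKNOWN }

    public static void main(String[] args) {
        Role nodeRole = Role.MASTER_ONLY;

        // assertThat(actual, matcher): on failure Hamcrest prints both sides, e.g.
        //   Expected: <MASTER_ONLY>
        //        but: was <DATA>
        // which reads better than JUnit's bare "expected:<..> but was:<..>". Note also
        // that the original assertEquals(nodeRole, MASTER_ONLY) passes the arguments in
        // (actual, expected) order, the reverse of JUnit's (expected, actual) convention,
        // which would make its failure message misleading.
        assertThat(nodeRole, equalTo(Role.MASTER_ONLY));
    }
}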

import static org.junit.Assert.assertNotEquals;
import static org.junit.Assert.fail;

public class NodeRoleParserTests extends LaunchersTestCase {
Contributor

Can we have tests for garbage input and invalid YAML too? I was going to suggest a JSON config, but the parser ought to parse that quite happily 😄

Member

Added a test for invalid YAML (which is effectively the same as garbage input).

@@ -311,7 +311,7 @@ public void test73CustomJvmOptionsDirectoryFilesWithoutOptionsExtensionIgnored()
startElasticsearch();

final String nodesResponse = makeRequest(Request.Get("http://localhost:9200/_nodes"));
assertThat(nodesResponse, containsString("\"heap_init_in_bytes\":1073741824"));
assertThat(nodesResponse, not(containsString("\"heap_init_in_bytes\":536870912")));
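
For context on the magic numbers in these assertions: 1073741824 bytes = 1024^3 = 1 GiB, the old fixed default heap, while 536870912 bytes = 512 × 1024^2 = 512 MiB, presumably the heap size requested by the options file that this test expects to be ignored.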
Contributor

This change makes me uncomfortable. Is it not possible that we might happen to calculate the heap to be this size, causing the test to fail?

Member

I've rewritten the test to pass a bogus option, so that java should fail to start if the option is used.

* @throws IOException if unable to load elasticsearch.yml
*/
public List<String> determineHeapSettings(Path configDir, List<String> userDefinedJvmOptions) throws IOException {
if (userDefinedJvmOptions.stream().anyMatch(s -> s.startsWith("-Xms") || s.startsWith("-Xmx"))) {
Member

I don't think this check is sufficient. A user can also set -XX:MaxHeapSize and -XX:MinHeapSize, for which -Xmx and -Xms are synonyms.

Note that in JvmErgonomics#choose we start up a second JVM and extract whether or not a setting was passed in on the command line. We might want to reuse that here and then check if MinHeapSize or MaxHeapSize are passed on the command line. This might be more reliable?
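
For illustration, extending the naive prefix check to cover the -XX: synonyms would look roughly like the sketch below. The helper name is made up, and note that the PR ultimately took the other route suggested here (reusing the finalized-flags probe from JvmErgonomics), so this is only the simpler alternative spelled out.

import java.util.List;

public class HeapFlagCheckSketch {
    // True if the user has explicitly configured the heap via either the
    // short-form flags or their -XX: synonyms.
    static boolean heapExplicitlySet(List<String> userDefinedJvmOptions) {
        return userDefinedJvmOptions.stream()
            .anyMatch(
                s -> s.startsWith("-Xms")
                    || s.startsWith("-Xmx")
                    || s.startsWith("-XX:MinHeapSize=")
                    || s.startsWith("-XX:MaxHeapSize=")
            );
    }

    public static void main(String[] args) {
        System.out.println(heapExplicitlySet(List.of("-XX:MaxHeapSize=4g"))); // true
        System.out.println(heapExplicitlySet(List.of("-XX:+UseG1GC")));       // false
    }
}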

Member

I've moved the method to find the final options so it is reused here.

// Strangely formatted config, so just return defaults and let startup settings validation catch the problem
return MachineNodeRole.UNKNOWN;
} catch (YAMLException ex) {
throw new IllegalStateException("Unable to parse elasticsearch.yml:", ex);
Member

I wonder if this should contain the full path? Or if we should return UNKNOWN and let startup fail us here too?

Member

I like returning UNKNOWN and letting Elasticsearch handle the misconfiguration.

roles = (List<String>) map.get("node.roles");
}
} catch (ClassCastException ex) {
throw new IllegalStateException("Unable to parse elasticsearch.yml. Expected 'node.roles' to be a list.");
Member

Return UNKNOWN and let Elasticsearch handle it?
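
A minimal sketch of the "return UNKNOWN instead of throwing" approach discussed in these two threads, using SnakeYAML (whose YAMLException appears in the excerpt above). The class and method names are stand-ins, and the role classification itself is elided; the sketch only shows the error handling, mapping invalid YAML or a non-list 'node.roles' value to an empty result that a caller could translate to MachineNodeRole.UNKNOWN.

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.Optional;

import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.error.YAMLException;

public class RoleParsingErrorHandlingSketch {
    // Extract 'node.roles' from an elasticsearch.yml stream. Invalid YAML and a
    // non-list 'node.roles' value are treated the same as "no roles found"; the
    // caller can map an empty result to UNKNOWN and let Elasticsearch's own
    // settings validation report the real problem at startup.
    static Optional<List<String>> readNodeRoles(InputStream config) {
        try {
            Map<String, Object> map = new Yaml().load(config);
            if (map == null || map.containsKey("node.roles") == false) {
                return Optional.empty();
            }
            @SuppressWarnings("unchecked")
            List<String> roles = (List<String>) map.get("node.roles");
            return Optional.ofNullable(roles);
        } catch (YAMLException | ClassCastException ex) {
            return Optional.empty(); // misconfigured; don't fail the launcher here
        }
    }

    public static void main(String[] args) {
        // Garbage input: the unclosed flow sequence makes SnakeYAML throw, which the
        // helper converts into an empty result instead of an exception.
        InputStream garbage = new ByteArrayInputStream(
            "node.roles: [master".getBytes(StandardCharsets.UTF_8));
        System.out.println(readNodeRoles(garbage)); // Optional.empty
    }
}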

Member
@jasontedor jasontedor left a comment

Mostly looks good. I left some comments, but I also didn't review all the details; I'll leave the detailed review to someone else.

@rjernst Are you picking up this work?

@rjernst
Member

rjernst commented Dec 8, 2020

Yes, I am picking this up. I will address all the comments soon.

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Oct 6, 2021
Since elastic#65905 Elasticsearch has determined the Java heap settings
from node roles and total system memory.

This change allows the total system memory used in that calculation
to be overridden with a user-specified value. This is intended to
be used when Elasticsearch is running on a machine where some other
software that consumes a non-negligible amount of memory is running.
For example, a user could tell Elasticsearch to assume it was
running on a machine with 3GB of RAM when actually it was running
on a machine with 4GB of RAM.

The system property is `es.total_memory_bytes`, so, for example,
could be specified using `-Des.total_memory_bytes=3221225472`.
(It is specified in bytes rather than using a unit, because it
needs to be parsed by startup code that does not have access to
the utility classes that interpret byte size units.)
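
As a usage illustration, the 3GB example above works out to 3 × 1024^3 = 3221225472 bytes. Assuming the property is picked up like any other user-defined JVM option, it could be placed in a jvm.options.d file such as the hypothetical one below (the exact placement is up to the operator):

# config/jvm.options.d/total_memory.options  (hypothetical file name)
# Tell the heap auto-sizing logic to assume 3GB of RAM (3 * 1024^3 bytes),
# leaving the rest of a 4GB machine's memory for the other software on it.
-Des.total_memory_bytes=3221225472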
droberts195 added a commit that referenced this pull request Oct 16, 2021
@brsolomon-deloitte

... where is this formula actually documented?

https://www.elastic.co/guide/en/elasticsearch/reference/7.15/important-settings.html#heap-size-settings gives zero descriptive info on how the sizes are actually determined. Perhaps at the very least this PR could be linked to so those who are even a little bit curious about the determination formula don't have to go digging for it?

@mark-vieira
Contributor Author

The actual formulas are considered an implementation detail, so they are expressly not documented, since we don't want folks to rely on these things as they are likely to change in the future. If you're curious about the details, the formulas are defined here.

@brsolomon-deloitte

That's fair, I guess, though IMO it slightly misconstrues the term "implementation detail."

Documenting that, e.g., "data nodes get 50% of total system memory when greater than 1 gigabyte, up to a maximum of 31 gigabytes", maybe in a table format, is not an implementation detail. It is something that ES admins would want to know when deciding whether they should explicitly specify heap options or let them be auto-determined. Exposing this back-of-the-envelope calculation side of the equation in documentation is not an implementation detail. The implementation detail in this case is how that determination is made, such as how specifically machine resources are fetched/read. But if the human-readable definitions of what will be auto-determined should "not be relied upon", then I guess it does come close to meeting the definition of an implementation detail.

@mark-vieira
Contributor Author

It is something that ES admins would want to know when deciding whether they should explicitly specify heap options or let them be auto-determined.

The motivation with this change is that admins shouldn't need to do this; they should defer to the auto-determination, which may change across releases. Realistically, the deciding factor on whether heap should be explicitly specified is whether you encounter problematic performance with the automatic settings.

But if the human-readable definitions of what will be auto-determined should "not be relied upon", then I guess it does come close to meeting the definition of implementation detail.

This is an accurate description of the intent here. Essentially, heap size is either "whatever ES decides" (which will obviously always be less than available system memory) or "whatever you explicitly set it to"; choose one.

@ersinbyrktr

Hi @mark-vieira 👋 greetings from Germany.

I was wondering why you chose to assign the same heap memory ratio to coordinating-only nodes as to data nodes in this PR. Unlike data nodes, coordinating-only nodes don't really use the filesystem cache, afaik. Non-heap process memory is also mostly quite low. Is it because of OS network buffers? Because I have a suspicion that it could be okay to set heap memory to 60-70% for coordinating-only nodes 🤔

@mark-vieira
Contributor Author

Thanks for the comment @ersinbyrktr. I've opened #88395 to have our performance team investigate improvements here.

Labels
:Delivery/Packaging (RPM and deb packaging, tar and zip archives, shell and batch scripts), Team:Core/Infra (Meta label for core/infra team), Team:Delivery (Meta label for Delivery team), v7.11.0, v8.0.0-alpha1