
Autodetermine heap settings based on node roles and total system memory #65905

Merged
merged 39 commits on Dec 16, 2020

Conversation


@mark-vieira mark-vieira commented Dec 4, 2020

This commit expands our JVM ergonomics to also automatically determine
an appropriate heap size based on the total available system memory as
well as the roles assigned to the node. Role determination is done via a
naive parsing of elasticsearch.yml. No settings validation is done, and
only the 'node.roles' setting is taken into consideration.

For heap purposes a node falls into one of four (4) categories:

  1. A 'master-only' node. This is a node with only the 'master' role.
  2. An 'ml-only' node. Similarly, a node with only the 'ml' role.
  3. A 'data' node. This is basically the 'other' case. A node with any
    set of roles other than only master or only ml is considered a 'data'
    node, including things like coordinating-only or "tie-breaker" nodes.
  4. Unknown. This is the case if legacy settings are used. In this
    scenario we fall back to the old default heap options of 1GB.

In all cases we short-circuit if a user provides explicit heap options,
so we only ever auto-determine heap when no explicit heap options are
present. Starting with this commit the default heap settings (1GB) are
removed from the default jvm.options, which means auto-determined heap
becomes the new default.
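
As a concrete rendering of the four categories above, here is a minimal, self-contained Java sketch of the role-to-category mapping. The MachineNodeRole constants mirror the MachineDependentHeap.MachineNodeRole values referenced later in this review, but the classify helper itself is illustrative only, not the PR's actual code, and it only covers the case where an explicit 'node.roles' list has already been parsed from elasticsearch.yml.

import java.util.List;

class NodeRoleClassificationSketch {
    enum MachineNodeRole { MASTER_ONLY, ML_ONLY, DATA, UNKNOWN }

    // Classification when an explicit 'node.roles' list was parsed. Legacy role
    // settings are not handled here; in that case the category is UNKNOWN and
    // the old 1GB default heap applies, as described above.
    static MachineNodeRole classify(List<String> roles) {
        if (roles.equals(List.of("master"))) {
            return MachineNodeRole.MASTER_ONLY;   // master-only node
        } else if (roles.equals(List.of("ml"))) {
            return MachineNodeRole.ML_ONLY;       // ml-only node
        } else {
            // Everything else, including coordinating-only ([]) and
            // "tie-breaker" role combinations, counts as a data node.
            return MachineNodeRole.DATA;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of("master")));                 // MASTER_ONLY
        System.out.println(classify(List.of("master", "voting_only")));  // DATA (tie-breaker)
        System.out.println(classify(List.of()));                         // DATA (coordinating-only)
    }
}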

@mark-vieira mark-vieira added the :Delivery/Packaging label (RPM and deb packaging, tar and zip archives, shell and batch scripts) on Dec 4, 2020
@elasticmachine elasticmachine added the Team:Delivery label (Meta label for Delivery team) on Dec 4, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

@@ -36,8 +36,8 @@
# Xms represents the initial size of the JVM heap
# Xmx represents the maximum size of the JVM heap

-Xms${heap.min}
-Xmx${heap.max}
# -Xms${heap.min}
Contributor Author

Still need to think about how to change this. Should we remove this section entirely? Or continue to include it but also recommend folks stick with the auto-determined heap size?

Contributor

We may want to leave it in with a note, since until now our users have always had to configure it, and our training courses will expect it to exist. Maybe we can remove it in master but retain it with a comment in 7.x?

Member


It's tricky. Either way (removing, or commenting out), we are going to potentially break automation here. For example, if there's automation in place that downloads our tar archive, explodes it, and uses sed to replace the heap size.

That leads me to wonder out loud if we should not break users here. I'd propose the following: in 7.x, we leave this as-is in the jvm.options file. Perhaps we provide a warning when the heap size is set manually by a user, pointing them to documentation for machine-dependent heap sizes. In Cloud, we still want to activate the machine-dependent heap sizes, so we'd need deployments there to remove the -Xms/-Xmx flags from their jvm.options file.

In 8.0, we remove this from the default jvm.options file.

Member


I understand the concern, though I'm not sure about a warning. I don't think we intend to disallow setting the heap manually, right? Yet a warning for setting the heap would seem like just that. Such a sed replacement in practice would seem fragile. It could break on any change to the default.

I'm conflicted. I don't see a way to warn some users without annoying others. But I do think this is something we can followup on, so in this PR I will restore the original value.

Member


My suggestion was:

Perhaps we provide a warning when the heap size is set manually by a user, pointing them to documentation for machine-dependent heap sizes.

Not a warning that we will disallow setting the heap size manually, but a permanent warning that the heap size is being set manually and that we recommend against doing so for the optimal experience.

But I do think this is something we can followup on, so in this PR I will restore the original value.

Are you saying then that the default experience will be to not get machine-dependent heap sizing? That seems contrary to our goals here.

Member

If we leave the jvm.options as is, the default will remain not getting machine-dependent heap sizing. If we emit a warning, any user who chooses to not want to use machine-dependent heap sizing will get an annoying warning they cannot use. Do we agree on those statements?

My suggestion was to leave the jvm.options as is (so as not to break the theoretical users that could be using sed to replace the heap size), and then follow up with whatever we decide to do regarding helping users convert within 7.x. This suggestion is because cloud will not rely on the default.

Member

If we leave the jvm.options as is, the default will remain not getting machine-dependent heap sizing. If we emit a warning, any user who chooses to not want to use machine-dependent heap sizing will get an annoying warning they cannot use. Do we agree on those statements?

Why can't they use it? The point of the warning is to tell them to stop setting the heap size manually, let Elasticsearch do it for them. The warning is useful to them.

Member
@rjernst rjernst Dec 10, 2020

Sorry, not sure how "cannot use" got into my response; it was meant to be "cannot remove". That is, it is a persistent warning that the only way to remove it is to go with ES's memory choices. The warning is useful in that they find out they could be using our automatic heap sizing, but are we saying users will never tune heap anymore? I personally get annoyed if there are warnings I cannot remove with normal usage.

Member

Jason and I spoke and came to an agreement on the above discussion. @jasontedor Let me know if I missed or misunderstood anything.

We should add a warning when heap size is manually specified, since our goal is to have users use the auto heap. There are, however, some issues that prevent us from doing that in this PR. First is a technical issue: how do we pass back the message when the only output of the jvm options parser is the final jvm options? The second issue is allowing cloud time to adapt to using auto sizing so that we do not spew errors across all cloud deployments. I created #66308 to ensure we follow up with adding this warning.

Regarding whether to leave the heap options or remove them in a minor release, we think the auto heap sizing is important enough that we don't want to have any more new users specifying heap. So, we should remove the Xmx/Xms from jvm.options. Users should no longer be relying on modifying jvm.options, but instead specifying these in jvm.options.d files.
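
As an illustration of that last point, a user who still wants a fixed heap would drop a file into config/jvm.options.d/ rather than editing jvm.options itself. The file name below is hypothetical (any name with the .options extension works) and 4g is just an example value; setting -Xms/-Xmx this way short-circuits the automatic sizing, as described in the PR description.

# config/jvm.options.d/heap.options  (hypothetical file name)
# Pinning the heap explicitly disables the automatic, role- and memory-based sizing.
-Xms4g
-Xmx4g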

Member

@mark-vieira @jasontedor I've removed Xmx/Xms in 88a81a1, but left in a comment explaining where the heap size should be set if it needs to be.

@mark-vieira
Contributor Author

@rjernst I wasn't able to get as far as adding packaging tests, as fixing existing tests took priority. I think I've got the Linux tests passing, but there's a Windows test still failing. I think the root cause is that the service blows up trying to start because it can't allocate enough heap, probably due to other contention on the machine; for example, the Gradle daemon is using 25G by default 😆

We might need to put in some conservative defaults across the board here so that running concurrent tests doesn't blow up memory. These new defaults are somewhat aggressive and assume the current host is single-purpose and there's not a lot of other contention for resources.

Other than that, the only remaining item is documentation, and that simply depends on the degree to which we intend to document this. I suspect for 7.11 it will be minimal.

@mark-vieira
Contributor Author

The only packaging tests now failing are the Windows test classes that aren't applicable (no tests are run), but they're failing in the cleanup code, it seems due to a process still running. Perhaps this is another side effect of us spinning up a JVM with a very large heap and that affecting sensitive timings in these tests.


public class MachineDependentHeapTests extends LaunchersTestCase {

    public void testDetermineHeapSize() throws IOException {
Contributor

Javadoc comments on the test cases help to ensure that the implementation of the test matches its intent.

Member

While I agree this can be helpful on complex tests, I think it is overkill on simple tests such as these, where the name of the class and method generally makes it obvious what is being tested. In this particular case, I've renamed the method to testDefaultHeapSize() since it is testing the default when no configuration exists.


    public void testMasterOnlyNode() {
        MachineDependentHeap.MachineNodeRole nodeRole = parseConfig(sb -> sb.append("node.roles: [master]"));
        assertEquals(nodeRole, MASTER_ONLY);
Contributor

My preference is to use assertThat pretty much everywhere these days. Not saying you should change it, but I find the messages it generates to be more readable.

Member

Changed to assertThat.
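
For readers unfamiliar with the difference, a small self-contained example of the Hamcrest style follows. The Role enum is a stand-in for MachineDependentHeap.MachineNodeRole, since the real launcher classes are not available here; the imports are the standard Hamcrest ones.

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.equalTo;

public class AssertThatExample {
    enum Role { MASTER_ONLY, ML_ONLY, DATA, UNKNOWN }

    public static void main(String[] args) {
        Role nodeRole = Role.MASTER_ONLY;

        // assertThat(actual, matcher): on failure Hamcrest prints both sides, e.g.
        //   Expected: <MASTER_ONLY>
        //        but: was <DATA>
        // which reads better than JUnit's bare "expected:<..> but was:<..>". Note also
        // that the original assertEquals(nodeRole, MASTER_ONLY) passes the arguments in
        // (actual, expected) order, the reverse of JUnit's (expected, actual) convention,
        // which would make its failure message misleading.
        assertThat(nodeRole, equalTo(Role.MASTER_ONLY));
    }
}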

import static org.junit.Assert.assertNotEquals;
import static org.junit.Assert.fail;

public class NodeRoleParserTests extends LaunchersTestCase {
Contributor

Can we have tests for garbage input and invalid YAML too? I was going to suggest a JSON config, but the parser ought to parse that quite happily 😄

Member

Added a test for invalid YAML (which is effectively the same as garbage input).

@@ -311,7 +311,7 @@ public void test73CustomJvmOptionsDirectoryFilesWithoutOptionsExtensionIgnored()
startElasticsearch();

final String nodesResponse = makeRequest(Request.Get("http://localhost:9200/_nodes"));
assertThat(nodesResponse, containsString("\"heap_init_in_bytes\":1073741824"));
assertThat(nodesResponse, not(containsString("\"heap_init_in_bytes\":536870912")));
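
For context on the magic numbers in these assertions: 1073741824 bytes = 1024^3 = 1 GiB, the old fixed default heap, while 536870912 bytes = 512 × 1024^2 = 512 MiB, presumably the heap size requested by the options file that this test expects to be ignored.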
Contributor

This change makes me uncomfortable. Is it not possible that we might happen to calculate the heap to be this size, causing the test to fail?

Member

I've rewritten the test to pass a bogus option, so that java should fail to start if the option is used.

* @throws IOException if unable to load elasticsearch.yml
*/
public List<String> determineHeapSettings(Path configDir, List<String> userDefinedJvmOptions) throws IOException {
if (userDefinedJvmOptions.stream().anyMatch(s -> s.startsWith("-Xms") || s.startsWith("-Xmx"))) {
Member

I don't think this check is sufficient. A user can also set -XX:MaxHeapSize and -XX:MinHeapSize, for which -Xmx and -Xms are synonyms.

Note that in JvmErgonomics#choose we start up a second JVM and extract whether or not a setting was passed in on the command line. We might want to reuse that here and then check if MinHeapSize or MaxHeapSize are passed on the command line. This might be more reliable?
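
For illustration, extending the naive prefix check to cover the -XX: synonyms would look roughly like the sketch below. The helper name is made up, and note that the PR ultimately took the other route suggested here (reusing the finalized-flags probe from JvmErgonomics), so this is only the simpler alternative spelled out.

import java.util.List;

public class HeapFlagCheckSketch {
    // True if the user has explicitly configured the heap via either the
    // short-form flags or their -XX: synonyms.
    static boolean heapExplicitlySet(List<String> userDefinedJvmOptions) {
        return userDefinedJvmOptions.stream()
            .anyMatch(
                s -> s.startsWith("-Xms")
                    || s.startsWith("-Xmx")
                    || s.startsWith("-XX:MinHeapSize=")
                    || s.startsWith("-XX:MaxHeapSize=")
            );
    }

    public static void main(String[] args) {
        System.out.println(heapExplicitlySet(List.of("-XX:MaxHeapSize=4g"))); // true
        System.out.println(heapExplicitlySet(List.of("-XX:+UseG1GC")));       // false
    }
}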

Member

I've moved the method to find the final options so it is reused here.

// Strangely formatted config, so just return defaults and let startup settings validation catch the problem
return MachineNodeRole.UNKNOWN;
} catch (YAMLException ex) {
throw new IllegalStateException("Unable to parse elasticsearch.yml:", ex);
Member

I wonder if this should contain the full path? Or if we should return UNKNOWN and let startup fail us here too?

Member

I like returning UNKNOWN and letting Elasticsearch handle the misconfiguration.

roles = (List<String>) map.get("node.roles");
}
} catch (ClassCastException ex) {
throw new IllegalStateException("Unable to parse elasticsearch.yml. Expected 'node.roles' to be a list.");
Member

Return UNKNOWN and let Elasticsearch handle it?
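
A minimal sketch of the "return UNKNOWN instead of throwing" approach discussed in these two threads, using SnakeYAML (whose YAMLException appears in the excerpt above). The class and method names are stand-ins, and the role classification itself is elided; the sketch only shows the error handling, mapping invalid YAML or a non-list 'node.roles' value to an empty result that a caller could translate to MachineNodeRole.UNKNOWN.

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.Optional;

import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.error.YAMLException;

public class RoleParsingErrorHandlingSketch {
    // Extract 'node.roles' from an elasticsearch.yml stream. Invalid YAML and a
    // non-list 'node.roles' value are treated the same as "no roles found"; the
    // caller can map an empty result to UNKNOWN and let Elasticsearch's own
    // settings validation report the real problem at startup.
    static Optional<List<String>> readNodeRoles(InputStream config) {
        try {
            Map<String, Object> map = new Yaml().load(config);
            if (map == null || map.containsKey("node.roles") == false) {
                return Optional.empty();
            }
            @SuppressWarnings("unchecked")
            List<String> roles = (List<String>) map.get("node.roles");
            return Optional.ofNullable(roles);
        } catch (YAMLException | ClassCastException ex) {
            return Optional.empty(); // misconfigured; don't fail the launcher here
        }
    }

    public static void main(String[] args) {
        // Garbage input: the unclosed flow sequence makes SnakeYAML throw, which the
        // helper converts into an empty result instead of an exception.
        InputStream garbage = new ByteArrayInputStream(
            "node.roles: [master".getBytes(StandardCharsets.UTF_8));
        System.out.println(readNodeRoles(garbage)); // Optional.empty
    }
}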

Member
@jasontedor jasontedor left a comment

Mostly looks good. I left some comments, but I also didn't review all the details; I'll leave the detailed review to someone else.

@rjernst Are you picking up this work?

@rjernst
Member

rjernst commented Dec 8, 2020

Yes, I am picking this up. I will address all the comments soon.

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Oct 6, 2021
Since elastic#65905 Elasticsearch has determined the Java heap settings
from node roles and total system memory.

This change allows the total system memory used in that calculation
to be overridden with a user-specified value. This is intended to
be used when Elasticsearch is running on a machine where some other
software that consumes a non-negligible amount of memory is running.
For example, a user could tell Elasticsearch to assume it was
running on a machine with 3GB of RAM when actually it was running
on a machine with 4GB of RAM.

The system property is `es.total_memory_bytes`, so, for example,
could be specified using `-Des.total_memory_bytes=3221225472`.
(It is specified in bytes rather than using a unit, because it
needs to be parsed by startup code that does not have access to
the utility classes that interpret byte size units.)
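
As a usage illustration, the 3GB example above works out to 3 × 1024^3 = 3221225472 bytes. Assuming the property is picked up like any other user-defined JVM option, it could be placed in a jvm.options.d file such as the hypothetical one below (the exact placement is up to the operator):

# config/jvm.options.d/total_memory.options  (hypothetical file name)
# Tell the heap auto-sizing logic to assume 3GB of RAM (3 * 1024^3 bytes),
# leaving the rest of a 4GB machine's memory for the other software on it.
-Des.total_memory_bytes=3221225472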
droberts195 added a commit that referenced this pull request Oct 16, 2021
@brsolomon-deloitte

... where is this formula actually documented?

https://www.elastic.co/guide/en/elasticsearch/reference/7.15/important-settings.html#heap-size-settings gives zero descriptive info on how the sizes are actually determined. Perhaps at the very least this PR could be linked to so those who are even a little bit curious about the determination formula don't have to go digging for it?

@mark-vieira
Contributor Author

The actual formulas are considered an implementation detail, so they are expressly not documented, since we don't want folks to rely on these things as they are likely to change in the future. If you're curious about the details, the formulas are defined here.

@brsolomon-deloitte

That's fair, I guess, though IMO it slightly misconstrues the term "implementation detail."

Documenting that, e.g., "data nodes get 50% of total system memory when greater than 1 gigabyte, up to a maximum of 31 gigabytes", maybe in a table format, is not an implementation detail. It is something that ES admins would want to know when deciding whether they should explicitly specify heap options or let them be auto-determined. Exposing this back-of-the-envelope calculation side of the equation in documentation is not an implementation detail. The implementation detail in this case is how that determination is made, such as how specifically machine resources are fetched/read. But if the human-readable definitions of what will be auto-determined should "not be relied upon", then I guess it does come close to meeting the definition of an implementation detail.

@mark-vieira
Contributor Author

It is something that ES admins would want to know when deciding whether they should explicitly specify heap options or let them be auto-determined.

The motivation with this change is that admins shouldn't need to do this; they should defer to the auto-determination, which may change across releases. Realistically, the deciding factor on whether heap should be explicitly specified is whether you encounter problematic performance with the automatic settings.

But if the human-readable definitions of what will be auto-determined should "not be relied upon", then I guess it does come close to meeting the definition of implementation detail.

This is an accurate description of the intent here. Essentially, heap size is either "whatever ES decides" (which will obviously always be less than available system memory) or "whatever you explicitly set it to"; choose one.

@ersinbyrktr

Hi @mark-vieira 👋 greetings from Germany.

I was wondering why you chose to assign the same heap memory ratio to coordinating-only nodes as to data nodes in this PR. Unlike data nodes, coordinating-only nodes don't really use the filesystem cache, afaik. Non-heap process memory is also mostly quite low. Is it because of OS network buffers? Because I have a suspicion that it could be okay to set heap memory to 60-70% for coordinating-only nodes 🤔

@mark-vieira
Contributor Author

Thanks for the comment @ersinbyrktr. I've opened #88395 to have our performance team investigate improvements here.

Labels
:Delivery/Packaging (RPM and deb packaging, tar and zip archives, shell and batch scripts), Team:Core/Infra (Meta label for core/infra team), Team:Delivery (Meta label for Delivery team), v7.11.0, v8.0.0-alpha1