Default max local storage nodes to one #19964

Merged
merged 3 commits into elastic:master from jasontedor:default-max-local-storage-nodes on Aug 12, 2016

Conversation
@jasontedor (Member) commented Aug 11, 2016

This commit defaults the max local storage nodes to one. The motivation
for this change is that a default value greater than one is dangerous,
as users sometimes end up unknowingly starting a second node and then
think that they have encountered data loss.

Closes #19679, supersedes #19748
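
For context, the setting in question is NodeEnvironment.MAX_LOCAL_STORAGE_NODES_SETTING, i.e. node.max_local_storage_nodes. A minimal sketch of what the registration looks like with the new default of one, assuming the Setting.intSetting(key, default, min, properties) factory (the surrounding NodeEnvironment code is not part of this PR's visible diff):

    import org.elasticsearch.common.settings.Setting;
    import org.elasticsearch.common.settings.Setting.Property;

    // Sketch only: by default, a single node may use a given data path;
    // users who deliberately co-locate nodes must raise this setting explicitly.
    public static final Setting<Integer> MAX_LOCAL_STORAGE_NODES_SETTING =
        Setting.intSetting("node.max_local_storage_nodes", 1, 1, Property.NodeScope);

With this default, a second node pointed at the same data directory refuses to start instead of silently creating a new, empty node directory that looks like data loss.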

Default max local storage nodes to one
This commit defaults the max local storage nodes to one. The motivation
for this change is that a default value greater than one is dangerous,
as users sometimes end up unknowingly starting a second node and then
think that they have encountered data loss.
@@ -261,6 +261,7 @@ class ClusterFormationTasks {
'node.attr.testattr' : 'test',
'repositories.url.allowed_urls': 'http://snapshot.test*'
]
+esConfig['node.max_local_storage_nodes'] = node.config.numNodes

@nik9000 (Contributor) commented Aug 11, 2016

Cute.

@@ -1632,6 +1633,7 @@ protected Settings nodeSettings(int nodeOrdinal) {
Settings.Builder builder = Settings.builder()
// Default the watermarks to absurdly low to prevent the tests
// from failing on nodes without enough disk space
+.put(NodeEnvironment.MAX_LOCAL_STORAGE_NODES_SETTING.getKey(), 8)

@nik9000 (Contributor) commented Aug 11, 2016

I feel like we should either use a guess from the annotation or just go with Integer.MAX_VALUE. If we do stick with 8 we should add a comment about why 8.

Also, the comment above this line is actually for the next line. Maybe just shift this line above the comment?

@jasontedor (Member, Author) commented Aug 11, 2016

That's a good point @nik9000, I will do that.

@jasontedor (Member, Author) commented Aug 12, 2016

I pushed fe563c5 and then on further thought pushed fdee065.

Adjust test node max local storage nodes settings
This commit adjusts the max local storage nodes setting value for
some tests:
 - the provided value for ESIntegTestCase is derived from the value of
   the annotations
 - the default value for the InternalTestCluster is more carefully
   calculated
 - the value for the tribe unit tests is adjusted to reflect that there
   are two clusters in play
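
Roughly, the first item amounts to sizing the limit from what the test itself declares. A hedged sketch of that idea follows; the helper name and fallback handling are illustrative, the ClusterScope attribute names are assumed from ESIntegTestCase, and this is not the actual framework change:

    import org.elasticsearch.test.ESIntegTestCase.ClusterScope;

    // Illustrative sketch, not the real ESIntegTestCase code: derive
    // node.max_local_storage_nodes for a test class from its @ClusterScope
    // annotation instead of a hard-coded constant.
    final class TestStorageNodes {
        static int maxLocalStorageNodes(Class<?> testClass, int frameworkDefault) {
            ClusterScope scope = testClass.getAnnotation(ClusterScope.class);
            if (scope == null) {
                return frameworkDefault; // unannotated tests keep the framework default
            }
            // prefer an explicit upper bound if the test declared one
            int dataNodes = scope.maxNumDataNodes() >= 0 ? scope.maxNumDataNodes() : scope.numDataNodes();
            if (dataNodes < 0) {
                return frameworkDefault; // node counts left at their annotation defaults
            }
            // leave room for client nodes that may share the same JVM and data path
            return Math.max(1, dataNodes + Math.max(0, scope.numClientNodes()));
        }
    }
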
@rmuir (Contributor) commented Aug 12, 2016

> The motivation for this change is that a default value greater than one is dangerous, as users sometimes end up unknowingly starting a second node and then think that they have encountered data loss.

Also, when locks are lost, the exception is masked and dropped on the floor and ES keeps on trucking. Oh, and did I mention they are filesystem locks?

Seems legitimately unsafe.

Simplify tests handling of max local storage nodes
This commit simplifies the handling of max local storage nodes in
integration tests by just setting the default max local storage nodes to
be the maximum possible integer.
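
In code terms, that simplification boils down to the same kind of override shown in the diff above, just with an effectively unlimited value. A minimal sketch of the test settings override (the builder usage mirrors the diff; the surrounding nodeSettings method is omitted):

    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.env.NodeEnvironment;

    // In the integration-test node settings: effectively disable the limit so a
    // test cluster can start as many co-located nodes as it needs.
    Settings.Builder builder = Settings.builder()
        .put(NodeEnvironment.MAX_LOCAL_STORAGE_NODES_SETTING.getKey(), Integer.MAX_VALUE);
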
@nik9000 (Contributor) commented Aug 12, 2016

LGTM

@mikemccand (Contributor) commented Aug 12, 2016

Can we just remove this feature entirely, such that users must always be explicit on starting a node about precisely which path.data that node gets to use, and it must not be in use by any other node?

I think this is too much magic on ES's part, trying to have multiple nodes share path.data entries, and it has hit me (well, my brother, who then asked me WTF was happening) personally when he accidentally started up another node on the same box.

@jasontedor (Member, Author) commented Aug 12, 2016

@mikemccand Doesn't this get us there? Now you have to be explicit about wanting multiple nodes that share the same path.data. Removing the feature doesn't buy us much from a code perspective (we still need the locking code anyway), and it will complicate the build.

@jasontedor merged commit 1f0673c into elastic:master on Aug 12, 2016

2 checks passed

CLA: Commit author is a member of Elasticsearch
elasticsearch-ci: Build finished.

@jasontedor deleted the jasontedor:default-max-local-storage-nodes branch on Aug 12, 2016

@jasontedor (Member, Author) commented Aug 12, 2016

Thanks for reviewing, @nik9000. I've merged this, @mikemccand, but that shouldn't stop discussion on your proposal!

@mikemccand (Contributor) commented Aug 12, 2016

> @mikemccand Doesn't this get us there? Now you have to be explicit about wanting multiple nodes that share the same path.data.

This is definitely an improvement (thank you! progress not perfection!), but what I'm saying is I don't think such dangerous magic should even be an option in ES.

@rmuir (Contributor) commented Aug 12, 2016

Personally I feel the design is broken anyway, as I've said over and over again. It relies on filesystem locking, which is unreliable by definition.

But worse, it's lenient. Start up ES, index some docs, and go nuke that node.lock. You can continue about your business, keep indexing docs, and so on. The only thing you will see is this:

[2016-08-12 09:37:25,300][WARN ][env                      ] [_Oe9-bX] lock assertion failed
java.nio.file.NoSuchFileException: /home/rmuir/workspace/elasticsearch/distribution/zip/build/distributions/elasticsearch-5.0.0-alpha6-SNAPSHOT/data/nodes/0/node.lock

Why is such an important exception dropped on the floor and merely translated into a logger WARN?

This feature is 100% unsafe.
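
For readers following along, here is a minimal sketch of the lenient pattern being criticized. It is illustrative only, not the actual NodeEnvironment code, with java.util.logging standing in for the ES logger:

    import java.io.IOException;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    import org.apache.lucene.store.Lock;

    final class LenientLockCheck {
        private static final Logger logger = Logger.getLogger("env"); // stand-in for the ES logger

        // If the node.lock file disappears, ensureValid() throws, but the failure
        // is only logged at WARN and the node keeps serving requests.
        static void assertEnvIsLocked(Lock lock) {
            try {
                lock.ensureValid(); // throws if the underlying lock is no longer valid
            } catch (IOException e) {
                logger.log(Level.WARNING, "lock assertion failed", e); // swallowed: callers never see it
            }
        }
    }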

@nik9000 (Contributor) commented Aug 12, 2016

I'm OK with removing the feature altogether. If we're already breaking backwards compatibility with the setting, maybe we can just kill it? @clintongormley, what do you think?

I like that we did this now, though, because I have a feeling that even if we do decide to remove the feature it'll take some time, since lots of tests and the gradle build rely on it.

@rjernst (Member) commented Aug 12, 2016

The gradle build does not depend on it. Integ tests have unique installations per node, and even fantasy land tests create a unique temp dir per node for path.home iirc.
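
A hedged sketch of the per-node isolation rjernst describes, with Files.createTempDirectory standing in for the test framework's temp-dir helper; the key "path.home" is the real setting name, everything else here is illustrative:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import org.elasticsearch.common.settings.Settings;

    // Each test node gets its own temporary path.home (and with it, its own data
    // directory), so no two nodes ever contend for the same on-disk state.
    static Settings isolatedNodeSettings(int nodeOrdinal) throws IOException {
        Path home = Files.createTempDirectory("test-node-" + nodeOrdinal + "-home-");
        return Settings.builder()
            .put("path.home", home.toAbsolutePath().toString())
            .build();
    }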

javanna added a commit to javanna/elasticsearch that referenced this pull request Nov 10, 2016

Remove max_local_storage_nodes from elasticsearch.yml
Given that the default is now 1, the comment in the config file was outdated. Also considering that the default value is production ready, we shouldn't list it among the values that need attention when going to production.

Relates to elastic#19964

javanna added a commit that referenced this pull request Nov 10, 2016

Remove max_local_storage_nodes from elasticsearch.yml (#21467)
Given that the default is now 1, the comment in the config file was outdated. Also considering that the default value is production ready, we shouldn't list it among the values that need attention when going to production.

Relates to #19964
