Default max local storage nodes to one #19964

Merged
jasontedor merged 3 commits into elastic:master from the default-max-local-storage-nodes branch on Aug 12, 2016

Conversation

@jasontedor (Member) commented Aug 11, 2016

This commit defaults the max local storage nodes to one. The motivation for this change is that a default value greater than one is dangerous, as users sometimes end up unknowingly starting a second node and then think that they have encountered data loss.

Closes #19679, supersedes #19748
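For readers skimming the thread: node.max_local_storage_nodes controls how many nodes may share the directories under path.data, and after this change an unintended second node started against the same path fails instead of silently moving on to data/nodes/1. Anyone who actually wants several nodes on one path now has to opt in explicitly in elasticsearch.yml; a minimal sketch of that opt-in (the value 2 is only an illustration):

```yaml
# Allow up to two nodes to share the data directories under path.data.
# The default after this change is 1, so an accidental second node fails
# to start instead of quietly claiming data/nodes/1.
node.max_local_storage_nodes: 2
```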

Commit message:

This commit defaults the max local storage nodes to one. The motivation for this change is that a default value greater than one is dangerous, as users sometimes end up unknowingly starting a second node and then think that they have encountered data loss.
```diff
@@ -261,6 +261,7 @@ class ClusterFormationTasks {
         'node.attr.testattr'           : 'test',
         'repositories.url.allowed_urls': 'http://snapshot.test*'
     ]
+    esConfig['node.max_local_storage_nodes'] = node.config.numNodes
```
A reviewer (Member) commented on the added line:

Cute.

Commit message:

This commit adjusts the max local storage nodes setting for some tests:
 - the value provided to ESIntegTestCase is derived from the values of the annotations
 - the default value for the InternalTestCluster is calculated more carefully
 - the value for the tribe unit tests is adjusted to reflect that there are two clusters in play
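For context, here is a minimal hypothetical sketch of what an explicit per-test override of this setting looks like via the standard ESIntegTestCase hook; the actual commit derives the value from the test's annotations rather than hard-coding it, and the class name and value below are illustrative only:

```java
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.test.ESIntegTestCase;

// Hypothetical test class, shown only to illustrate the override point.
public class SharedDataPathIT extends ESIntegTestCase {
    @Override
    protected Settings nodeSettings(int nodeOrdinal) {
        return Settings.builder()
                .put(super.nodeSettings(nodeOrdinal))
                // let the nodes of this test cluster share one data path;
                // 2 is an arbitrary illustrative value
                .put("node.max_local_storage_nodes", 2)
                .build();
    }
}
```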
@rmuir (Contributor) commented Aug 12, 2016

> The motivation for this change is that a default value greater than one is dangerous as users sometimes end up unknowingly starting a second node and then think that they have encountered data loss.

Also, when locks are lost, the exception is masked and dropped on the floor and ES keeps on trucking. Oh, and did I mention they are filesystem locks?

Seems legitimately unsafe.

Commit message:

This commit simplifies the handling of max local storage nodes in integration tests by just setting the default max local storage nodes to be the maximum possible integer.
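In spirit, that simplification amounts to a test-infrastructure default along these lines; this is a sketch assuming the defaults are assembled with the regular Settings builder, not a quote of the actual change:

```java
import org.elasticsearch.common.settings.Settings;

class TestClusterDefaults {
    // Illustrative only: a default that lets any number of test nodes share
    // one data path, so no per-test bound needs to be computed.
    static Settings maxLocalStorageDefault() {
        return Settings.builder()
                .put("node.max_local_storage_nodes", Integer.MAX_VALUE)
                .build();
    }
}
```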
@nik9000 (Member) commented Aug 12, 2016

LGTM

@mikemccand (Contributor) commented
Can we just remove this feature entirely, so that users must always be explicit, when starting a node, about precisely which path.data that node gets to use, and that it must not be in use by any other node?

I think this is too much magic on ES's part, trying to have multiple nodes share path.data entries, and it has hit me (well, my brother, who then asked me WTF was happening) personally when he accidentally started up another node on the same box.

@jasontedor (Member, Author) commented
@mikemccand Doesn't this get us there? Now you have to be explicit about wanting multiple nodes that share the same path.data. Removing the feature doesn't buy us much from a code perspective (we still need the locking code anyway), and it will complicate the build.

@jasontedor jasontedor merged commit 1f0673c into elastic:master Aug 12, 2016
@jasontedor jasontedor deleted the default-max-local-storage-nodes branch August 12, 2016 13:26
@jasontedor (Member, Author) commented
Thanks for reviewing @nik9000. I've merged this @mikemccand, but that shouldn't stop discussion on your proposal!

@mikemccand (Contributor) commented
> @mikemccand Doesn't this get us there? Now you have to be explicit about wanting multiple nodes that share the same path.data.

This is definitely an improvement (thank you! progress not perfection!), but what I'm saying is I don't think such dangerous magic should even be an option in ES.

@rmuir (Contributor) commented Aug 12, 2016

Personally, I feel the design is broken anyway, as I've said over and over again. It relies on filesystem locking, which is unreliable by definition.

But worse, it's lenient. Start up ES, index some docs, and go nuke that node.lock. You can continue about your business, continue indexing docs, and so on. The only thing you will see is this:

```
[2016-08-12 09:37:25,300][WARN ][env                      ] [_Oe9-bX] lock assertion failed
java.nio.file.NoSuchFileException: /home/rmuir/workspace/elasticsearch/distribution/zip/build/distributions/elasticsearch-5.0.0-alpha6-SNAPSHOT/data/nodes/0/node.lock
```

Why is such an important exception dropped on the floor and merely translated into a logger WARN?

This feature is 100% unsafe.
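The pattern being objected to, sketched in hypothetical form (the class, method, and path here are illustrative, not the actual NodeEnvironment code): the lock assertion catches the I/O failure and downgrades it to a WARN log, so callers carry on as if the data directory were still exclusively owned:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.logging.Logger;

// Hypothetical illustration of the lenient behavior described above: a missing
// node.lock surfaces only as a WARN, and execution simply continues.
class LenientLockCheck {
    private static final Logger logger = Logger.getLogger("env");
    private final Path lockFile;

    LenientLockCheck(Path nodePath) {
        this.lockFile = nodePath.resolve("node.lock");
    }

    void assertEnvIsLocked() {
        try {
            if (Files.notExists(lockFile)) {
                throw new IOException("lock file missing: " + lockFile);
            }
        } catch (IOException e) {
            // the important exception is dropped on the floor here
            logger.warning("lock assertion failed: " + e);
        }
    }

    public static void main(String[] args) {
        // nuke data/nodes/0/node.lock and this still returns normally
        new LenientLockCheck(Paths.get("data/nodes/0")).assertEnvIsLocked();
    }
}
```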

@nik9000 (Member) commented Aug 12, 2016

I'm ok with removing the feature altogether. If we're already breaking backwards compatibility with the setting maybe we can just kill it? @clintongormley, what do you think?

I like that we did this now, though, because I have a feeling that even if we do decide to remove the feature, it'll take some time, since lots of tests and the Gradle build rely on it.

@rjernst (Member) commented Aug 12, 2016

The Gradle build does not depend on it. Integ tests have unique installations per node, and even fantasy-land tests create a unique temp dir per node for path.home, IIRC.

javanna added a commit to javanna/elasticsearch that referenced this pull request Nov 10, 2016
Given that the default is now 1, the comment in the config file was outdated. Also considering that the default value is production ready, we shouldn't list it among the values that need attention when going to production.

Relates to elastic#19964
javanna added a commit that referenced this pull request Nov 10, 2016
Given that the default is now 1, the comment in the config file was outdated. Also considering that the default value is production ready, we shouldn't list it among the values that need attention when going to production.

Relates to #19964