Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't allow nodes with missing custom meta data to join cluster #15401

Conversation

Projects
None yet
4 participants
@areek
Copy link
Contributor

commented Dec 12, 2015

Currently, when some nodes in a cluster are missing plugins with custom metadata, the nodes silently fail while trying to deserialize the unknown custom metadata on the first cluster state update. Ideally, the node should fail hard and make it obvious why it failed.

This PR treats node with missing custom metadata same as a node with version incompatibility. When a node with missing custom metadata tries to join a cluster, the master detects it and the node fails to join the cluster.

closes #13445

@areek

This comment has been minimized.

Copy link
Contributor Author

commented Dec 12, 2015

would like to know your thoughts on this, @imotov & @bleskes.

@@ -189,6 +189,13 @@ public DiscoveryNode localNode() {
}

@Override
public List<String> localCustomMetaDataTypes() {
List<String> customMetaDataList = new ArrayList<>(MetaData.customPrototypes.keySet());

This comment has been minimized.

Copy link
@areek

areek Dec 12, 2015

Author Contributor

Any better way to grab registered custom metadata than this?

This comment has been minimized.

Copy link
@imotov

imotov Dec 15, 2015

Member

I think relying on order of elements in the list is brittle. I think representing this list as a set might be a more reliable solution.

@imotov

This comment has been minimized.

Copy link
Member

commented Dec 15, 2015

Custom data types can be added on three levels - cluster state level, metadata level and index metadata level. This PR only handles metadata level. I think we should handle all three level

@areek areek force-pushed the areek:enhancement/reject_nodes_with_missing_custom_metadata branch 2 times, most recently Dec 15, 2015

@areek

This comment has been minimized.

Copy link
Contributor Author

commented Dec 16, 2015

Thanks for having a look, @imotov! I updated the PR to use the node join validation request (thanks @bleskes for the tip!) to attempt to deserialize master's cluster state. Now, if the joining node has missing custom data types on any level, the deserialization will fail. The join failure is logged as an ISE hinting missing plugins on the joining node.

@@ -824,8 +824,7 @@ public static void validateStateIsFromCurrentMaster(ESLogger logger, DiscoveryNo
}
}

void handleJoinRequest(final DiscoveryNode node, final MembershipAction.JoinCallback callback) {

void handleJoinRequest(final DiscoveryNode node, final ClusterState state, final MembershipAction.JoinCallback callback) {

This comment has been minimized.

Copy link
@bleskes

bleskes Dec 16, 2015

Member

why can't we sample the state here - no need for the extra param, right?

This comment has been minimized.

Copy link
@areek

areek Dec 16, 2015

Author Contributor

We can but passing the cluster state as param makes this easily testable. Initially, I tried writing a test with two nodes, one having a custom plugin and another none, but the testing framework makes it tricky. any ideas on how to test this functionality, if we just sample the state instead of passing it as a param?

This comment has been minimized.

Copy link
@bleskes

bleskes Dec 17, 2015

Member

oh. sorry. I'm missed the usage in the test. All good.

@bleskes

View changes

core/src/main/java/org/elasticsearch/discovery/zen/ZenDiscovery.java Outdated
try {
membership.sendValidateJoinRequestBlocking(node, state, joinTimeout);
} catch (Throwable e) {
callback.onFailure(new IllegalStateException("Can't handle join request from node [" + node + "], probable cause missing plugins", ExceptionsHelper.unwrapCause(e)));

This comment has been minimized.

Copy link
@bleskes

bleskes Dec 16, 2015

Member

can we change the message to what it is? "failure why sending a validation request to node" . The plugin part can better go in the type resolution (i.e., lookupPrototypeSafe ) ?

This comment has been minimized.

Copy link
@areek

areek Dec 16, 2015

Author Contributor

I changed the message as suggested in areek@3cc23f0

@bleskes

View changes

core/src/main/java/org/elasticsearch/discovery/zen/ZenDiscovery.java Outdated
try {
membership.sendValidateJoinRequestBlocking(node, state, joinTimeout);
} catch (Throwable e) {
callback.onFailure(new IllegalStateException("failure when sending a validation request to node", ExceptionsHelper.unwrapCause(e)));

This comment has been minimized.

Copy link
@bleskes

bleskes Dec 17, 2015

Member

why do we need to unwrap the cause?

This comment has been minimized.

Copy link
@areek

areek Dec 18, 2015

Author Contributor

Now we use the original cause

@bleskes

View changes

core/src/main/java/org/elasticsearch/cluster/ClusterState.java Outdated
@@ -129,7 +129,7 @@ public static void registerPrototype(String type, Custom proto) {
@SuppressWarnings("unchecked")
T proto = (T)customPrototypes.get(type);
if (proto == null) {
throw new IllegalArgumentException("No custom state prototype registered for type [" + type + "]");
throw new IllegalArgumentException("No custom state prototype registered for type [" + type + "], probable cause missing plugins");

This comment has been minimized.

Copy link
@bleskes

bleskes Dec 17, 2015

Member

you're the english master - but "probable cause missing plugins" ?

This comment has been minimized.

Copy link
@areek

areek Dec 18, 2015

Author Contributor

changed it to "node likely missing plugins"

@bleskes

View changes

core/src/main/java/org/elasticsearch/discovery/zen/ZenDiscovery.java Outdated
try {
membership.sendValidateJoinRequestBlocking(node, state, joinTimeout);
} catch (Throwable e) {
callback.onFailure(new IllegalStateException("failure when sending a validation request to node", e));

This comment has been minimized.

Copy link
@bleskes

bleskes Dec 18, 2015

Member

can we log a warn here - "failed to validate incoming join request from node [{}]" ? I think this should be visible.

@bleskes

This comment has been minimized.

Copy link
Member

commented Dec 18, 2015

LGTM. Left one minor request but no need for another review. Also, I think this can go into 2.2, no?

@areek areek force-pushed the areek:enhancement/reject_nodes_with_missing_custom_metadata branch 2 times, most recently Dec 18, 2015

@areek areek force-pushed the areek:enhancement/reject_nodes_with_missing_custom_metadata branch to 9d9b557 Dec 18, 2015

@areek areek merged commit 9d9b557 into elastic:master Dec 18, 2015

1 check passed

CLA Commit author is a member of Elasticsearch
Details

@areek areek added the v2.2.0 label Dec 19, 2015

@areek

This comment has been minimized.

Copy link
Contributor Author

commented Dec 19, 2015

Thanks @bleskes for the review, back ported to 2.x in 8ab3ca8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.