Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add check when trying to reroute a shard to a non-data discovery node #28886

Merged
merged 4 commits into from Mar 12, 2018
Merged

Add check when trying to reroute a shard to a non-data discovery node #28886

merged 4 commits into from Mar 12, 2018

Conversation

autophagy
Copy link
Contributor

While trying to reroute a shard to or from a non-data node (a node with node.data=false), I encountered a null pointer exception. Though an exception is to be expected, the NPE was occurring because allocation.routingNodes() would not contain any non-data nodes, so when you attempt to do allocation.routingNodes.node(non-data-node) it would not find it, and thus error. This occured regardless of whether I was rerouting to or from a non-data node.

This PR adds a check (as well as a test for these use cases) to return a legible, useful exception if the discovery node you are rerouting to or from is not a data node.

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

1 similar comment
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@autophagy
Copy link
Contributor Author

Sorry - I have signed the CLA now. 🙂

@bleskes
Copy link
Contributor

bleskes commented Mar 2, 2018

@elasticmachine - please test this

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @autophagy for the PR. I left some very minor asks. Looks good otherwise.

@@ -122,6 +127,9 @@ public RerouteExplanation execute(RoutingAllocation allocation, boolean explain)
}

RoutingNode toRoutingNode = allocation.routingNodes().node(toDiscoNode.getId());
if (toRoutingNode == null && !toDiscoNode.isDataNode()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this validation up to the start of the method, next to the validation of the fromNode? then all of this is together and we can fail early.


import static java.util.Collections.emptyMap;

public class NonDataNodeShardRoutingTests extends ESAllocationTestCase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can move these to AllocationCommandsTests - no need to have a dedicated class for this.

.add(new DiscoveryNode("node1", "node1", "node1", "test1", "test1", buildNewFakeTransportAddress(), emptyMap(),
MASTER_DATA_ROLES, Version.CURRENT))
.add(new DiscoveryNode("node2", "node2", "node2", "test2", "test2", buildNewFakeTransportAddress(), emptyMap(),
Collections.unmodifiableSet(EnumSet.of(DiscoveryNode.Role.MASTER)), Version.CURRENT))).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we randomize and sometime choose a non-data/non-master nodes? i.e. give the node a random combination of roles, making sure it never gets the data role.

.add(new DiscoveryNode("node1", "node1", "node1", "test1", "test1", buildNewFakeTransportAddress(), emptyMap(),
MASTER_DATA_ROLES, Version.CURRENT))
.add(new DiscoveryNode("node2", "node2", "node2", "test2", "test2", buildNewFakeTransportAddress(), emptyMap(),
Collections.unmodifiableSet(EnumSet.of(DiscoveryNode.Role.MASTER)), Version.CURRENT))).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment on randomization of roles.

@autophagy
Copy link
Contributor Author

@bleskes Thank you for the comments! I think that fixup should address them.

for (ShardRouting shardRouting : allocation.routingNodes().node(fromDiscoNode.getId())) {
RoutingNode fromRoutingNode = allocation.routingNodes().node(fromDiscoNode.getId());
if (fromRoutingNode == null && !fromDiscoNode.isDataNode()) {
throw new IllegalArgumentException("[move_allocation] can't move " + shardId + ", from " + fromDiscoNode + ", to " + toDiscoNode + ", since its not allowed, reason: " + fromDiscoNode + " is not a data node.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think it'd be better for the message to read:

[move_allocation] can't move abc123 from node1 to node2: node1 is not a data node

(NB less punctuation, and no need to say since its not allowed)

RoutingAllocation routingAllocation = new RoutingAllocation(new AllocationDeciders(Settings.EMPTY, Collections.emptyList()),
new RoutingNodes(clusterState, false), clusterState, ClusterInfo.EMPTY, System.nanoTime());
logger.info("--> executing move allocation command to non-data node");
expectThrows(IllegalArgumentException.class, () -> command.execute(routingAllocation, false));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please could you assert that the content of the message is correct? (expectThrows returns the exception thrown, so you just need to assign it to a variable and check its message.)

}
RoutingNode toRoutingNode = allocation.routingNodes().node(toDiscoNode.getId());
if (toRoutingNode == null && !toDiscoNode.isDataNode()) {
throw new IllegalArgumentException("[move_allocation] can't move " + shardId + ", from " + fromDiscoNode + ", to " + toDiscoNode + ", since its not allowed, reason: " + toDiscoNode + " is not a data node.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As previous comment - this message could be leaner.

RoutingAllocation routingAllocation = new RoutingAllocation(new AllocationDeciders(Settings.EMPTY, Collections.emptyList()),
new RoutingNodes(clusterState, false), clusterState, ClusterInfo.EMPTY, System.nanoTime());
logger.info("--> executing move allocation command from non-data node");
expectThrows(IllegalArgumentException.class, () -> command.execute(routingAllocation, false));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As previous comment: please could you assert that the content of the message is correct?

@DaveCTurner
Copy link
Contributor

Also you seem to have signed the CLA with a different email address from the one attached to these commits (Autophagy <mika@crate.io>). Could you sign again with this email address please?

@autophagy
Copy link
Contributor Author

autophagy commented Mar 9, 2018

@DaveCTurner Thanks for the suggestion to check the message - I've refactored the test to use the same initial set up as testMoveShardCommand as the test was testing the wrong exception (but the right class).

I've also signed the CLA with the crate.io address. 🙂

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple more minor things that I missed on the first review, sorry.

new RoutingNodes(clusterState, false), clusterState, ClusterInfo.EMPTY, System.nanoTime());
logger.info("--> executing move allocation command from non-data node");
IllegalArgumentException e = expectThrows(IllegalArgumentException.class, () -> command.execute(routingAllocation, false));
assertEquals("[move_allocation] can't move 0 from node2 to node1: node2 is not a data node.", e.getMessage());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, thanks. I'm glad you did that, because it shows that this message isn't what I was expecting. The shard number on its own is likely to cause confusion, and also it's useful to have the full DiscoveryNode.toString() output rather than just using .getName().

Something like this'd be great:

DaveCTurner@c53a980

for (ShardRouting shardRouting : allocation.routingNodes().node(fromDiscoNode.getId())) {
RoutingNode fromRoutingNode = allocation.routingNodes().node(fromDiscoNode.getId());
if (fromRoutingNode == null && !fromDiscoNode.isDataNode()) {
throw new IllegalArgumentException("[move_allocation] can't move " + shardId + " from " + fromDiscoNode.getName() + " to " + toDiscoNode.getName() + ": " + fromDiscoNode.getName() + " is not a data node.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you wrap these lines to <120 characters? We're trying to cut down on overlong lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing!

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, LGTM.

@DaveCTurner
Copy link
Contributor

@elasticmachine please test this.

@DaveCTurner
Copy link
Contributor

@bleskes are you happy with the changes made?

@DaveCTurner DaveCTurner added >bug v7.0.0 v6.3.0 :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) labels Mar 9, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @autophagy . I'll pull this in.

@bleskes bleskes merged commit a7b53fd into elastic:master Mar 12, 2018
bleskes pushed a commit that referenced this pull request Mar 12, 2018
…#28886)

While trying to reroute a shard to or from a non-data node (a node with ``node.data=false``), I encountered a null pointer exception. Though an exception is to be expected, the NPE was occurring because ``allocation.routingNodes()`` would not contain any non-data nodes, so when you attempt to do ``allocation.routingNodes.node(non-data-node)`` it would not find it, and thus error. This occurred regardless of whether I was rerouting to or from a non-data node.

This PR adds a check (as well as a test for these use cases) to return a legible, useful exception if the discovery node you are rerouting to or from is not a data node.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v6.3.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants