New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add check when trying to reroute a shard to a non-data discovery node #28886
Add check when trying to reroute a shard to a non-data discovery node #28886
Conversation
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually? |
1 similar comment
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually? |
Sorry - I have signed the CLA now. 🙂 |
@elasticmachine - please test this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @autophagy for the PR. I left some very minor asks. Looks good otherwise.
@@ -122,6 +127,9 @@ public RerouteExplanation execute(RoutingAllocation allocation, boolean explain) | |||
} | |||
|
|||
RoutingNode toRoutingNode = allocation.routingNodes().node(toDiscoNode.getId()); | |||
if (toRoutingNode == null && !toDiscoNode.isDataNode()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move this validation up to the start of the method, next to the validation of the fromNode? then all of this is together and we can fail early.
|
||
import static java.util.Collections.emptyMap; | ||
|
||
public class NonDataNodeShardRoutingTests extends ESAllocationTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can move these to AllocationCommandsTests
- no need to have a dedicated class for this.
.add(new DiscoveryNode("node1", "node1", "node1", "test1", "test1", buildNewFakeTransportAddress(), emptyMap(), | ||
MASTER_DATA_ROLES, Version.CURRENT)) | ||
.add(new DiscoveryNode("node2", "node2", "node2", "test2", "test2", buildNewFakeTransportAddress(), emptyMap(), | ||
Collections.unmodifiableSet(EnumSet.of(DiscoveryNode.Role.MASTER)), Version.CURRENT))).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we randomize and sometime choose a non-data/non-master nodes? i.e. give the node a random combination of roles, making sure it never gets the data role.
.add(new DiscoveryNode("node1", "node1", "node1", "test1", "test1", buildNewFakeTransportAddress(), emptyMap(), | ||
MASTER_DATA_ROLES, Version.CURRENT)) | ||
.add(new DiscoveryNode("node2", "node2", "node2", "test2", "test2", buildNewFakeTransportAddress(), emptyMap(), | ||
Collections.unmodifiableSet(EnumSet.of(DiscoveryNode.Role.MASTER)), Version.CURRENT))).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment on randomization of roles.
@bleskes Thank you for the comments! I think that fixup should address them. |
for (ShardRouting shardRouting : allocation.routingNodes().node(fromDiscoNode.getId())) { | ||
RoutingNode fromRoutingNode = allocation.routingNodes().node(fromDiscoNode.getId()); | ||
if (fromRoutingNode == null && !fromDiscoNode.isDataNode()) { | ||
throw new IllegalArgumentException("[move_allocation] can't move " + shardId + ", from " + fromDiscoNode + ", to " + toDiscoNode + ", since its not allowed, reason: " + fromDiscoNode + " is not a data node."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think it'd be better for the message to read:
[move_allocation] can't move abc123 from node1 to node2: node1 is not a data node
(NB less punctuation, and no need to say since its not allowed
)
RoutingAllocation routingAllocation = new RoutingAllocation(new AllocationDeciders(Settings.EMPTY, Collections.emptyList()), | ||
new RoutingNodes(clusterState, false), clusterState, ClusterInfo.EMPTY, System.nanoTime()); | ||
logger.info("--> executing move allocation command to non-data node"); | ||
expectThrows(IllegalArgumentException.class, () -> command.execute(routingAllocation, false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please could you assert that the content of the message is correct? (expectThrows
returns the exception thrown, so you just need to assign it to a variable and check its message.)
} | ||
RoutingNode toRoutingNode = allocation.routingNodes().node(toDiscoNode.getId()); | ||
if (toRoutingNode == null && !toDiscoNode.isDataNode()) { | ||
throw new IllegalArgumentException("[move_allocation] can't move " + shardId + ", from " + fromDiscoNode + ", to " + toDiscoNode + ", since its not allowed, reason: " + toDiscoNode + " is not a data node."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As previous comment - this message could be leaner.
RoutingAllocation routingAllocation = new RoutingAllocation(new AllocationDeciders(Settings.EMPTY, Collections.emptyList()), | ||
new RoutingNodes(clusterState, false), clusterState, ClusterInfo.EMPTY, System.nanoTime()); | ||
logger.info("--> executing move allocation command from non-data node"); | ||
expectThrows(IllegalArgumentException.class, () -> command.execute(routingAllocation, false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As previous comment: please could you assert that the content of the message is correct?
Also you seem to have signed the CLA with a different email address from the one attached to these commits ( |
@DaveCTurner Thanks for the suggestion to check the message - I've refactored the test to use the same initial set up as I've also signed the CLA with the crate.io address. 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple more minor things that I missed on the first review, sorry.
new RoutingNodes(clusterState, false), clusterState, ClusterInfo.EMPTY, System.nanoTime()); | ||
logger.info("--> executing move allocation command from non-data node"); | ||
IllegalArgumentException e = expectThrows(IllegalArgumentException.class, () -> command.execute(routingAllocation, false)); | ||
assertEquals("[move_allocation] can't move 0 from node2 to node1: node2 is not a data node.", e.getMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, thanks. I'm glad you did that, because it shows that this message isn't what I was expecting. The shard number on its own is likely to cause confusion, and also it's useful to have the full DiscoveryNode.toString()
output rather than just using .getName()
.
Something like this'd be great:
for (ShardRouting shardRouting : allocation.routingNodes().node(fromDiscoNode.getId())) { | ||
RoutingNode fromRoutingNode = allocation.routingNodes().node(fromDiscoNode.getId()); | ||
if (fromRoutingNode == null && !fromDiscoNode.isDataNode()) { | ||
throw new IllegalArgumentException("[move_allocation] can't move " + shardId + " from " + fromDiscoNode.getName() + " to " + toDiscoNode.getName() + ": " + fromDiscoNode.getName() + " is not a data node."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you wrap these lines to <120 characters? We're trying to cut down on overlong lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing!
…n-data discovery node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, LGTM.
@elasticmachine please test this. |
@bleskes are you happy with the changes made? |
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM @autophagy . I'll pull this in.
…#28886) While trying to reroute a shard to or from a non-data node (a node with ``node.data=false``), I encountered a null pointer exception. Though an exception is to be expected, the NPE was occurring because ``allocation.routingNodes()`` would not contain any non-data nodes, so when you attempt to do ``allocation.routingNodes.node(non-data-node)`` it would not find it, and thus error. This occurred regardless of whether I was rerouting to or from a non-data node. This PR adds a check (as well as a test for these use cases) to return a legible, useful exception if the discovery node you are rerouting to or from is not a data node.
While trying to reroute a shard to or from a non-data node (a node with
node.data=false
), I encountered a null pointer exception. Though an exception is to be expected, the NPE was occurring becauseallocation.routingNodes()
would not contain any non-data nodes, so when you attempt to doallocation.routingNodes.node(non-data-node)
it would not find it, and thus error. This occured regardless of whether I was rerouting to or from a non-data node.This PR adds a check (as well as a test for these use cases) to return a legible, useful exception if the discovery node you are rerouting to or from is not a data node.