Make shard store fetch less dependent on the current cluster state, both on master and non data nodes #19044

bleskes · 2016-06-23T13:06:34Z

#18938 changed the timing with which we send requests to nodes to fetch their shard stores. Instead of doing this after the cluster state resulting from the node's join was published, #18938 made the requests go out concurrently with the publishing process. This revealed a couple of places where shard store fetching depends on the current cluster state, both on the master and on the data nodes. The problems discovered were already present without #18938, but they required failures or extreme situations to surface. This PR tries to remove as many of these dependencies as possible, making shard store fetching simpler and paving the way to re-introduce #18938, which was reverted.

These are the notable changes:

Allow callers of TransportNodesAction (from which shard store fetching is derived) to supply concrete disco nodes, so the action does not need the cluster state to resolve them. This was a problem because the cluster state containing the needed nodes was not yet available through ClusterService. Note that in the long term we can expect the REST layer to resolve node ids to concrete nodes, making this mode the only one needed.
The data node relied on the cluster state having the relevant index metadata so it could find the data when custom paths are used. We now fall back to reading the metadata from disk if needed.
The data node was relying on its own IndexService state to indicate whether the data it has corresponds to an existing allocation. This is, of course, something it cannot know until it has received (and processed) the new cluster state from the master. This flag has now been removed from the response. This is not a problem: we used the flag to protect against assigning a shard twice to the same node, and the allocation deciders already protect us from that.
I removed the redundant filterNodeIds method in TransportNodesAction; if people want to filter, they can override resolveRequest.
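A minimal sketch of the first change, assuming a simplified model: a nodes request carries either node ids, which must be resolved against some view of the cluster's nodes, or concrete nodes supplied up front by the caller, in which case no lookup happens at all. `NodeRef`, `NodesRequestSketch`, `forNodeIds`, `forConcreteNodes` and `resolve` are hypothetical names, not the actual TransportNodesAction or BaseNodesRequest API. Nulling the id array once resolved follows the javadoc quoted in the review, and treating an empty id array as "all nodes" mirrors `ALL_NODES = Strings.EMPTY_ARRAY`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a discovery node: an id plus a transport address. */
final class NodeRef {
    final String id;
    final String address;

    NodeRef(String id, String address) {
        this.id = Objects.requireNonNull(id);
        this.address = Objects.requireNonNull(address);
    }
}

/**
 * Hypothetical nodes request: built either from node ids (resolved later against a view of
 * the cluster's nodes) or from concrete nodes, which need no cluster-state lookup.
 */
final class NodesRequestSketch {
    private String[] nodeIds;             // nulled once resolved
    private List<NodeRef> concreteNodes;  // non-null once resolved or when supplied by the caller

    private NodesRequestSketch() {}

    static NodesRequestSketch forNodeIds(String... nodeIds) {
        NodesRequestSketch request = new NodesRequestSketch();
        request.nodeIds = nodeIds.clone();
        return request;
    }

    static NodesRequestSketch forConcreteNodes(List<NodeRef> nodes) {
        NodesRequestSketch request = new NodesRequestSketch();
        request.concreteNodes = Collections.unmodifiableList(new ArrayList<>(nodes));
        return request;
    }

    boolean isResolved() {
        return concreteNodes != null;
    }

    /** Resolves ids against {@code knownNodes}; does nothing if concrete nodes are already set. */
    void resolve(Map<String, NodeRef> knownNodes) {
        if (isResolved()) {
            return; // caller-supplied nodes: the cluster view is never consulted
        }
        List<NodeRef> resolved = new ArrayList<>();
        if (nodeIds.length == 0) {
            resolved.addAll(knownNodes.values()); // empty ids mean "all nodes"
        } else {
            for (String id : nodeIds) {
                NodeRef node = knownNodes.get(id);
                if (node == null) {
                    throw new IllegalStateException("unknown node [" + id + "]");
                }
                resolved.add(node);
            }
        }
        concreteNodes = Collections.unmodifiableList(resolved);
        nodeIds = null;
    }

    List<NodeRef> concreteNodes() {
        if (!isResolved()) {
            throw new IllegalStateException("request has not been resolved yet");
        }
        return concreteNodes;
    }
}
```

In this model a master that already holds the joining node's descriptor can build the request with `forConcreteNodes` and send it even while ClusterService still exposes an older cluster state that does not contain that node.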
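The metadata fallback on the data node can be sketched as "prefer the cluster state, otherwise read from disk". This is a hypothetical simplification: `IndexMetaStub`, `ShardDataLocator` and the injected `readMetaFromDisk` function are illustrative names, and the real code reads Elasticsearch's on-disk index metadata rather than calling a supplied lambda.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical, minimal index metadata: a name and an optional custom data path. */
final class IndexMetaStub {
    final String index;
    final String customDataPath; // null means "use the node's default data path"

    IndexMetaStub(String index, String customDataPath) {
        this.index = Objects.requireNonNull(index);
        this.customDataPath = customDataPath;
    }
}

/**
 * Hypothetical helper that locates an index's data on a data node. Metadata from the node's
 * current cluster state is preferred; if the index is not in it yet, the metadata is read
 * from disk instead of giving up.
 */
final class ShardDataLocator {
    private final String defaultDataPath;
    private final Map<String, IndexMetaStub> clusterStateIndices;
    private final Function<String, Optional<IndexMetaStub>> readMetaFromDisk;

    ShardDataLocator(String defaultDataPath,
                     Map<String, IndexMetaStub> clusterStateIndices,
                     Function<String, Optional<IndexMetaStub>> readMetaFromDisk) {
        this.defaultDataPath = Objects.requireNonNull(defaultDataPath);
        this.clusterStateIndices = Objects.requireNonNull(clusterStateIndices);
        this.readMetaFromDisk = Objects.requireNonNull(readMetaFromDisk);
    }

    /** Returns the directory holding the index's data, or empty if no metadata is found anywhere. */
    Optional<String> dataPathFor(String index) {
        IndexMetaStub fromClusterState = clusterStateIndices.get(index);
        Optional<IndexMetaStub> meta = fromClusterState != null
                ? Optional.of(fromClusterState)
                : readMetaFromDisk.apply(index); // cluster state may not know the index yet
        // A custom path wins; otherwise fall back to the node's default data path.
        return meta.map(m -> m.customDataPath != null ? m.customDataPath : defaultDataPath);
    }
}
```

The disk read only happens on a cluster-state miss, so the common case stays as cheap as before while a node that is behind on cluster state can still find data stored under a custom path.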
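Why dropping the data node's "already allocated" flag is safe can be illustrated with a decider-style check that consults only the master's own routing table. Elasticsearch ships a SameShardAllocationDecider for this kind of check, but the code below is a hypothetical simplification; `AssignedCopy`, `Decision` and `SameShardGuard.canAllocate` are not its API.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical routing entry: one copy of a shard assigned to a node in the master's routing table. */
final class AssignedCopy {
    final String index;
    final int shard;
    final String nodeId;

    AssignedCopy(String index, int shard, String nodeId) {
        this.index = Objects.requireNonNull(index);
        this.shard = shard;
        this.nodeId = Objects.requireNonNull(nodeId);
    }
}

enum Decision { YES, NO }

/**
 * Hypothetical guard: refuse to place a copy of a shard on a node that already holds a copy
 * of the same shard. It reads only the master's routing table, so it needs no hint from the
 * data node about whether its on-disk data belongs to an existing allocation.
 */
final class SameShardGuard {
    static Decision canAllocate(String index, int shard, String targetNodeId,
                                List<AssignedCopy> routingTable) {
        for (AssignedCopy copy : routingTable) {
            boolean sameShard = copy.index.equals(index) && copy.shard == shard;
            if (sameShard && copy.nodeId.equals(targetNodeId)) {
                return Decision.NO;
            }
        }
        return Decision.YES;
    }
}
```

Because the master is the authority on assignments, a check like this cannot be fooled by a data node whose local IndexService state lags behind the latest cluster state.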
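Filtering through a single overridable resolution hook, instead of a separate filterNodeIds method, is a template-method pattern and might look roughly like this. `NodeInfo`, `NodesActionSketch`, `DataNodesOnlyAction`, `targetIds` and this `resolveRequest` signature are all hypothetical; the real TransportNodesAction method works on its own request and cluster-state types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical node descriptor with a single role flag. */
final class NodeInfo {
    final String id;
    final boolean dataNode;

    NodeInfo(String id, boolean dataNode) {
        this.id = id;
        this.dataNode = dataNode;
    }
}

/** Hypothetical base action: one overridable hook decides which nodes a request targets. */
abstract class NodesActionSketch {
    /** Default resolution: look up each requested id; unknown ids are skipped in this sketch. */
    protected List<NodeInfo> resolveRequest(List<String> nodeIds, Map<String, NodeInfo> knownNodes) {
        List<NodeInfo> resolved = new ArrayList<>();
        for (String id : nodeIds) {
            NodeInfo node = knownNodes.get(id);
            if (node != null) {
                resolved.add(node);
            }
        }
        return resolved;
    }

    /** Ids of the nodes the request would be sent to, in request order. */
    final List<String> targetIds(List<String> nodeIds, Map<String, NodeInfo> knownNodes) {
        return resolveRequest(nodeIds, knownNodes).stream()
                .map(node -> node.id)
                .collect(Collectors.toList());
    }
}

/** Filtering without a dedicated filter hook: override resolveRequest and narrow its result. */
final class DataNodesOnlyAction extends NodesActionSketch {
    @Override
    protected List<NodeInfo> resolveRequest(List<String> nodeIds, Map<String, NodeInfo> knownNodes) {
        return super.resolveRequest(nodeIds, knownNodes).stream()
                .filter(node -> node.dataNode)
                .collect(Collectors.toList());
    }
}
```

Keeping one hook means subclasses that need custom targeting and subclasses that merely filter go through the same code path, which is the redundancy the removal of filterNodeIds addresses.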


ywelsch · 2016-06-24T08:57:25Z

core/src/main/java/org/elasticsearch/action/support/nodes/BaseNodesRequest.java

@@ -35,8 +37,23 @@

    public static String[] ALL_NODES = Strings.EMPTY_ARRAY;

+    /**
+     * the list of nodesIds that will be used to resolve this request, once resolved this array will by nulled and {@link #concreteNodes}


s/will by/will be/

ywelsch · 2016-06-24T09:48:45Z

Left minor comments. Also there is a word missing in the title ;-)


bleskes · 2016-06-25T16:00:41Z

@ywelsch thx. I pushed another commit

ywelsch · 2016-06-27T09:29:25Z

Left one more comment (to get rid of a line of code). Feel free to push once addressed. LGTM

bleskes · 2016-06-27T13:05:47Z

thanks @ywelsch ! I added the assertion and pushed.

bleskes added 8 commits June 21, 2016 20:33

wip

aedd183

proper implementation

1ef52f4

sigh

40d2f0a

sigh2

881f8e5

extra logs

b54cf8a

more logs

6a79156

more info

d67d789

fix testThrottleWhenAllocatingToMatchingNode

461342e

bleskes added >bug resiliency :Allocation labels Jun 23, 2016

bleskes assigned ywelsch Jun 23, 2016

Merge remote-tracking branch 'upstream/master' into async_fetch_use_nodes

ac78dc0

ywelsch reviewed Jun 24, 2016

bleskes changed the title ~~Make shard store fetch less dependent on the current cluster state, both on master and non nodes~~ Make shard store fetch less dependent on the current cluster state, both on master and non data nodes Jun 24, 2016

bleskes added 2 commits June 25, 2016 17:23

Merge remote-tracking branch 'upstream/master' into async_fetch_use_nodes

50ff2fc

feedback

bb8c2c9

bleskes merged commit cb0824e into elastic:master Jun 27, 2016

bleskes deleted the async_fetch_use_nodes branch June 27, 2016 13:05

bleskes added the v5.0.0-alpha5 label Jun 27, 2016

lcawl added :Distributed/Distributed and removed :Allocation labels Feb 13, 2018
