
Make shard store fetch less dependent on the current cluster state, both on master and non data nodes #19044

Merged
merged 11 commits into elastic:master from async_fetch_use_nodes on Jun 27, 2016

Conversation

@bleskes bleskes commented Jun 23, 2016

#18938 changed the timing with which we send requests to nodes to fetch their shard stores. Instead of doing this after the cluster state resulting from the node's join was published, #18938 sent the requests concurrently with the publishing process. This revealed a couple of places where shard store fetching depends on the current cluster state, both on the master and on the data nodes. The problems discovered were already present without #18938, but required failures or extreme situations to surface. This PR removes as many of these dependencies as possible, making shard store fetching simpler and paving the way to re-introduce #18938, which was reverted.

These are the notable changes:

  1. Allow callers of TransportNodesAction (from which shard store fetching is derived) to supply concrete discovery nodes, so the action does not need the cluster state to resolve them. This was a problem because the cluster state containing the needed nodes was not yet made available through ClusterService. Note that in the long term we can expect the REST layer to resolve node ids to concrete nodes, making this mode the only one needed (see the sketch after this list).
  2. The data node relied on the cluster state to contain the relevant index metadata so it could locate shard data when custom data paths are used. We now fall back to reading the metadata from disk if needed.
  3. The data node was relying on its own IndexService state to indicate whether the data it holds corresponds to an existing allocation. This is of course something it cannot know until it has received (and processed) the new cluster state from the master. This flag has been removed from the response. That is not a problem: the flag was only used to protect against double-assigning a shard to the same node, and the allocation deciders already protect against that.
  4. I removed the redundant filterNodeIds method in TransportNodesAction; if people want to filter they can override resolveRequest.
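To make change (1) concrete, here is a minimal, hypothetical sketch of a nodes-level request that can carry either node ids (resolved against the cluster state when the action runs) or concrete nodes supplied up front by the caller. The class, record, and method names below are illustrative stand-ins, simplified from the actual TransportNodesAction/BaseNodesRequest code, not the real Elasticsearch API.

```java
// Hypothetical, simplified sketch of change (1): a nodes-level request that either
// carries node ids (looked up in the current cluster state when resolved) or
// concrete nodes supplied directly by the caller, in which case no cluster-state
// lookup is needed at all. Names are illustrative, not actual Elasticsearch classes.
import java.util.Arrays;
import java.util.List;

public class NodesRequestSketch {

    // Stand-in for org.elasticsearch.cluster.node.DiscoveryNode.
    record Node(String id, String address) {}

    private final String[] nodesIds;   // resolved via the cluster state when concreteNodes is null
    private Node[] concreteNodes;      // pre-resolved nodes; bypasses cluster-state resolution

    NodesRequestSketch(String... nodesIds) {
        this.nodesIds = nodesIds;
    }

    NodesRequestSketch(Node... concreteNodes) {
        this.nodesIds = null;
        this.concreteNodes = concreteNodes;
    }

    /** Resolve the target nodes, consulting the cluster state only when necessary. */
    Node[] resolveNodes(List<Node> clusterStateNodes) {
        if (concreteNodes == null) {
            // Old mode: look the requested ids up in the current cluster state.
            concreteNodes = clusterStateNodes.stream()
                .filter(n -> Arrays.asList(nodesIds).contains(n.id()))
                .toArray(Node[]::new);
        }
        return concreteNodes;
    }

    public static void main(String[] args) {
        List<Node> clusterState = List.of(new Node("n1", "10.0.0.1"), new Node("n2", "10.0.0.2"));

        // Caller supplies ids: resolution depends on the applied cluster state.
        NodesRequestSketch byId = new NodesRequestSketch("n2");
        System.out.println(Arrays.toString(byId.resolveNodes(clusterState)));

        // Caller supplies concrete nodes (e.g. the nodes it just saw join):
        // the cluster state is never consulted.
        NodesRequestSketch byNode = new NodesRequestSketch(new Node("n3", "10.0.0.3"));
        System.out.println(Arrays.toString(byNode.resolveNodes(clusterState)));
    }
}
```

The idea, under these simplifying assumptions, is that when the master already holds the DiscoveryNode objects (for example, nodes whose join it is about to publish), it can hand them to the shard store fetch directly instead of waiting for a cluster state that contains them to become available through ClusterService.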

@@ -35,8 +37,23 @@

public static String[] ALL_NODES = Strings.EMPTY_ARRAY;

/**
* the list of nodesIds that will be used to resolve this request, once resolved this array will by nulled and {@link #concreteNodes}

s/will by/will be/

ywelsch commented Jun 24, 2016

Left minor comments. Also there is a word missing in the title ;-)

@bleskes bleskes changed the title Make shard store fetch less dependent on the current cluster state, both on master and non nodes Make shard store fetch less dependent on the current cluster state, both on master and non data nodes Jun 24, 2016
bleskes commented Jun 25, 2016

@ywelsch thx. I pushed another commit

ywelsch commented Jun 27, 2016

Left one more comment (to get rid of a line of code). Feel free to push once addressed. LGTM

@bleskes bleskes merged commit cb0824e into elastic:master Jun 27, 2016
@bleskes bleskes deleted the async_fetch_use_nodes branch June 27, 2016 13:05
bleskes commented Jun 27, 2016

thanks @ywelsch ! I added the assertion and pushed.

@lcawl lcawl added the :Distributed/Distributed label and removed the :Allocation label Feb 13, 2018
Labels
>bug, :Distributed/Distributed, resiliency, v5.0.0-alpha5
3 participants