During discovery, remove any local state and use clusterService.state instead #7834

Closed
wants to merge 19 commits into from

bleskes commented Sep 23, 2014

At the moment, ZenDiscovery contains a local copy of the disco nodes plus a flag that indicates whether the local node is master or not. This is redundant, as the same information is stored in the cluster state. Having a duplicate copy can lead to unnecessary concurrency issues. This PR removes the duplication.
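
To illustrate the idea (this is not the code in this PR), here is a minimal sketch in which discovery keeps no master flag or node list of its own and derives both from the current cluster state. `ClusterStateSnapshot`, `ClusterStateHolder` and `DiscoverySketch` are hypothetical, simplified stand-ins, not Elasticsearch classes.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the cluster state: an immutable snapshot.
final class ClusterStateSnapshot {
    final String masterNodeId;   // null when no master is elected
    final Set<String> nodeIds;

    ClusterStateSnapshot(String masterNodeId, Set<String> nodeIds) {
        this.masterNodeId = masterNodeId;
        this.nodeIds = Collections.unmodifiableSet(new LinkedHashSet<>(nodeIds));
    }
}

// Hypothetical holder playing the role of clusterService.state().
final class ClusterStateHolder {
    private final AtomicReference<ClusterStateSnapshot> current;

    ClusterStateHolder(ClusterStateSnapshot initial) {
        this.current = new AtomicReference<>(initial);
    }

    ClusterStateSnapshot state() {
        return current.get();
    }

    void publish(ClusterStateSnapshot next) {
        current.set(next);
    }
}

// Discovery holds no copy of the nodes and no "master" flag.
// Every question is answered from the latest published snapshot.
final class DiscoverySketch {
    private final String localNodeId;
    private final ClusterStateHolder clusterService;

    DiscoverySketch(String localNodeId, ClusterStateHolder clusterService) {
        this.localNodeId = localNodeId;
        this.clusterService = clusterService;
    }

    boolean localNodeIsMaster() {
        return localNodeId.equals(clusterService.state().masterNodeId);
    }

    Set<String> nodes() {
        return clusterService.state().nodeIds;
    }
}
```

Because there is only one copy of the data, there is nothing to keep in sync: once a new snapshot is published, `localNodeIsMaster()` and `nodes()` reflect it immediately.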

The PR introduces tighter control of the background joining thread to make sure it is started and stopped together with any cluster state changes. This solves potential concurrency bugs where a joining thread may fail to start.
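
A rough sketch of what such control could look like, assuming a single holder object owns the join thread. The field name `currentJoinThread` comes from the commit list of this PR; the class and method names are hypothetical and the code is an illustration, not the actual implementation.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder that owns the one and only background join thread.
final class JoinThreadControlSketch {
    private final AtomicReference<Thread> currentJoinThread = new AtomicReference<>();
    private final Runnable joinLoop;

    JoinThreadControlSketch(Runnable joinLoop) {
        this.joinLoop = joinLoop;
    }

    // True while a join thread is registered with this holder.
    boolean joinThreadActive() {
        return currentJoinThread.get() != null;
    }

    // Starts a join thread unless one is already registered.
    // Returns true if a new thread was started.
    synchronized boolean startNewThreadIfNotRunning() {
        if (currentJoinThread.get() != null) {
            return false;
        }
        Thread t = new Thread(() -> {
            try {
                joinLoop.run();
            } finally {
                // Deregister only if this thread is still the registered one.
                currentJoinThread.compareAndSet(Thread.currentThread(), null);
            }
        }, "join-thread-sketch");
        t.setDaemon(true);
        currentJoinThread.set(t);
        t.start();
        return true;
    }

    // Deregisters the join thread and interrupts it, if there is one.
    synchronized void stopRunningThread() {
        Thread t = currentJoinThread.getAndSet(null);
        if (t != null) {
            t.interrupt();
        }
    }
}
```

The point of the design is that starting and stopping go through the same small object, so a caller reacting to a cluster state change can never end up with two join threads, or with none when one is needed.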

Lastly, we add a couple of safety checks to make sure that if a node receives a cluster state from a new master while actively trying to join another one (or electing itself), it goes back to pinging in order to actively join the new master.
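
The check described above can be sketched as a pure decision function. This is only an illustration: the class, the enum and its values are hypothetical, and what happens in the "normal" case is an assumption, since the description only covers the restart case.

```java
import java.util.Objects;

final class NewMasterCheckSketch {
    enum Action { PROCESS_NORMALLY, RESTART_PINGING }

    // stateMasterId:    master that published the incoming cluster state
    // joinTargetId:     node we are currently trying to join
    //                   (our own id while electing ourselves)
    // joinThreadActive: whether a join attempt is in progress
    static Action onClusterState(String stateMasterId, String joinTargetId, boolean joinThreadActive) {
        boolean fromDifferentMaster = !Objects.equals(stateMasterId, joinTargetId);
        if (joinThreadActive && fromDifferentMaster) {
            // Abandon the current attempt and ping again to join the new master.
            return Action.RESTART_PINGING;
        }
        return Action.PROCESS_NORMALLY;
    }
}
```

For example, a node that is joining `m1` and receives a state published by `m2` gets `RESTART_PINGING`, while the same state arriving when no join is in progress is processed normally.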

Note: this PR is against the 1.x branch.

bleskes added some commits Sep 18, 2014

wip
use currentJoinThread to see if we should stop or not and use it to stop when a cluster state comes from a new master

bleskes added a commit that referenced this pull request Sep 26, 2014

Discovery: remove any local state and use clusterService.state instead
At the moment, ZenDiscovery contains a local copy of the disco nodes plus a flag that indicates whether the local node is master or not. This is redundant, as the same information is stored in the cluster state. Having a duplicate copy can lead to unnecessary concurrency issues. This PR removes the duplication, including moving the ownership of the localNode creation to ClusterState.

The PR introduces tighter control of the background joining thread to make sure it is started and stopped together with any cluster state changes. This solves potential concurrency bugs where a joining thread may fail to start.

Lastly, we add a couple of safety checks to make sure that if a node receives a cluster state from a new master while actively trying to join another one (or electing itself), it goes back to pinging in order to actively join the new master.

Closes #7834

@bleskes bleskes closed this in db54e9c Sep 26, 2014

@bleskes bleskes deleted the feature/zen_direct_on_cluster_state branch Sep 26, 2014

@bleskes bleskes restored the feature/zen_direct_on_cluster_state branch Sep 26, 2014

@clintongormley clintongormley changed the title from "Discovery: remove any local state and use clusterService.state instead" to "Resiliency: During discovery, remove any local state and use clusterService.state instead" Sep 26, 2014

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 29, 2014

Discovery: master fault detection fall back to cluster state thread upon error

With elastic#7834, we simplified ZenDiscovery by making it use the current cluster state for all its decisions. This had the side effect that a node may start its master FD before the master has fully processed the cluster state update that adds that node (or elects the master). This is because master FD is started when a node receives a cluster state from the master, while the master itself may still be publishing to other nodes.

This commit makes sure that a master FD ping is only failed once we know that there is no current cluster state update in progress.
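
One way to picture the fallback, as a sketch under assumptions rather than the actual change: the side answering the ping first checks its current view of the nodes, and only if that check fails does it queue a re-check on a single-threaded executor standing in for the cluster state thread. Because that executor runs tasks strictly in order, the re-check executes after any update already in progress. All class and method names below are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class MasterPingHandlerSketch {
    // Single thread: tasks run one after another, like cluster state updates.
    private final ExecutorService clusterStateThread = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "cluster-state-sketch");
        t.setDaemon(true);
        return t;
    });
    private final Set<String> nodes = ConcurrentHashMap.newKeySet();

    Future<?> submitStateUpdate(Runnable update) {
        return clusterStateThread.submit(update);
    }

    void addNode(String nodeId) {
        nodes.add(nodeId);
    }

    boolean knowsNode(String nodeId) {
        return nodes.contains(nodeId);
    }

    // Queues the check behind every update that was submitted before it.
    Future<Boolean> recheckOnClusterStateThread(String nodeId) {
        Callable<Boolean> check = () -> nodes.contains(nodeId);
        return clusterStateThread.submit(check);
    }

    // Returns false (ping failed) only after the queued re-check also fails.
    boolean handlePing(String nodeId) {
        if (knowsNode(nodeId)) {
            return true;
        }
        try {
            return recheckOnClusterStateThread(nodeId).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }
}
```

With this shape, a ping from a node whose join is still being processed is not rejected prematurely: the re-check simply waits its turn behind the update that adds the node.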

bleskes added a commit that referenced this pull request Sep 29, 2014

Discovery: master fault detection fall back to cluster state thread upon error

With #7834, we simplified ZenDiscovery by making it use the current cluster state for all its decisions. This had the side effect that a node may start its master FD before the master has fully processed the cluster state update that adds that node (or elects the master). This is because master FD is started when a node receives a cluster state from the master, while the master itself may still be publishing to other nodes.

This commit makes sure that a master FD ping is only failed once we know that there is no current cluster state update in progress.

Closes #7908

@jpountz jpountz removed the review label Oct 21, 2014

@clintongormley clintongormley changed the title from "Resiliency: During discovery, remove any local state and use clusterService.state instead" to "During discovery, remove any local state and use clusterService.state instead" Jun 7, 2015

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

Discovery: remove any local state and use clusterService.state instead
At the moment, ZenDiscovery contains a local copy of the disco nodes plus a flag that indicates whether the local node is master or not. This is redundant, as the same information is stored in the cluster state. Having a duplicate copy can lead to unnecessary concurrency issues. This PR removes the duplication, including moving the ownership of the localNode creation to ClusterState.

The PR introduces tighter control of the background joining thread to make sure it is started and stopped together with any cluster state changes. This solves potential concurrency bugs where a joining thread may fail to start.

Lastly, we add a couple of safety checks to make sure that if a node receives a cluster state from a new master while actively trying to join another one (or electing itself), it goes back to pinging in order to actively join the new master.

Closes elastic#7834

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

Discovery: master fault detection fall back to cluster state thread upon error

With elastic#7834, we simplified ZenDiscovery by making it use the current cluster state for all its decisions. This had the side effect that a node may start its master FD before the master has fully processed the cluster state update that adds that node (or elects the master). This is because master FD is started when a node receives a cluster state from the master, while the master itself may still be publishing to other nodes.

This commit makes sure that a master FD ping is only failed once we know that there is no current cluster state update in progress.

Closes elastic#7908

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Jan 29, 2016

Local Discovery - don't create a local DiscoNode, but use the one from cluster service

Long ago (elastic#7834) the ownership of the local disco node was centralized in the cluster service. LocalDiscovery still creates its own disco node, which is not used by the cluster service and thus creates confusion (two nodes with the same name but different ids).

This commit also removes an optimization where, when joining a new master, we would first copy the master's metadata and only then pull in the rest of the cluster state (and its nodes).

bleskes added a commit that referenced this pull request Feb 1, 2016

Local Discovery - don't create a local DiscoNode, but use the one from cluster service

Long ago (#7834) the ownership of the local disco node was centralized in the cluster service. LocalDiscovery still creates its own disco node, which is not used by the cluster service and thus creates confusion (two nodes with the same name but different ids).

This commit also removes an optimization where, when joining a new master, we would first copy the master's metadata and only then pull in the rest of the cluster state (and its nodes).

Closes #16317

bleskes added a commit that referenced this pull request Feb 2, 2016

Local Discovery - don't create a local DiscoNode, but use the one from cluster service

Long ago (#7834) the ownership of the local disco node was centralized in the cluster service. LocalDiscovery still creates its own disco node, which is not used by the cluster service and thus creates confusion (two nodes with the same name but different ids).

This commit also removes an optimization where, when joining a new master, we would first copy the master's metadata and only then pull in the rest of the cluster state (and its nodes).

Closes #16317