Discovery: back port #7558 to 1.x and add bwc protections of the new ping on master gone introduced in #7493 #7694

Closed

@bleskes
Member

commented Sep 11, 2014

The change in #7558 adds a flag to PingResponse. However, when unicast discovery is used, this extra flag cannot be serialized by the very first pings, as they do not yet know the version of the node they are pinging (i.e., they have to default to 1.0.0, which rules out changing the serialization format). This commit bypasses the problem by adding a dedicated action which only exists on nodes of version 1.4 or up. Nodes first try to ping this endpoint using 1.4.0 as the serialization version. If that fails, they fall back to the pre-1.4.0 action. This is optimal if all nodes are on 1.4.0 or higher, with a small downside if the cluster has mixed versions, but that is a temporary state.

Two further bwc protections are added:

  1. Disable the preference for nodes that previously joined the cluster if some of the pings come from nodes on version < 1.4.0
  2. Disable the rejoin-on-master-gone functionality if some nodes in the cluster are on version < 1.4.0
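
For readers who want to see the shape of the fallback, here is a minimal sketch of the "try the 1.4-only action first, then the legacy action" ping. It is not the code from this PR: the class, the action names and the `RemoteNode` stand-in are all hypothetical, and a missing handler is modelled as an empty `Optional` rather than a transport exception.

```java
import java.util.Optional;
import java.util.Set;

/** Illustrative only: models the "try the 1.4 action, then fall back" ping. */
final class PingFallbackSketch {
    // Placeholder action names invented for this sketch.
    static final String ACTION_V1_4 = "sketch/ping/v1_4";
    static final String ACTION_LEGACY = "sketch/ping/legacy";

    /** Ping result; hasJoinedOnce stays null when the legacy action answered. */
    static final class PingResponse {
        final String nodeId;
        final Boolean hasJoinedOnce;

        PingResponse(String nodeId, Boolean hasJoinedOnce) {
            this.nodeId = nodeId;
            this.hasJoinedOnce = hasJoinedOnce;
        }
    }

    /** Stand-in for a remote node that only registers some actions. */
    static final class RemoteNode {
        final String id;
        final Set<String> registeredActions;
        final boolean hasJoinedOnce;

        RemoteNode(String id, Set<String> registeredActions, boolean hasJoinedOnce) {
            this.id = id;
            this.registeredActions = registeredActions;
            this.hasJoinedOnce = hasJoinedOnce;
        }

        /** An empty result models "no handler for this action" on the remote side. */
        Optional<PingResponse> handle(String action) {
            if (!registeredActions.contains(action)) {
                return Optional.empty();
            }
            // Only the 1.4 action carries the extra flag on the wire.
            Boolean flag = ACTION_V1_4.equals(action) ? Boolean.valueOf(hasJoinedOnce) : null;
            return Optional.of(new PingResponse(id, flag));
        }
    }

    /** Try the 1.4-only action first; fall back to the legacy action if it is missing. */
    static Optional<PingResponse> ping(RemoteNode target) {
        Optional<PingResponse> response = target.handle(ACTION_V1_4);
        if (response.isPresent()) {
            return response;
        }
        return target.handle(ACTION_LEGACY);
    }
}
```

In this model a pre-1.4.0 node costs one extra round trip per ping, which is the "small downside" in mixed clusters mentioned above, and the caller sees a `null` flag for such nodes.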

bleskes added some commits Sep 3, 2014

Resiliency: Master election should demotes nodes which try to join the cluster for the first time

With the change in #7493, we introduced a pinging round when a master node goes down. That pinging round helps validate the current state of the cluster and takes, by default, 3 seconds. It may be that during that window a new node tries to join the cluster and starts pinging (this is typical when you quickly restart the current master). If this node gets elected as the new master, it will force recovery from the gateway (it has no in-memory cluster state), which in turn will cause a full cluster shard synchronisation. While this is not a problem on its own, it's a shame. This commit demotes "new" nodes during master election so they will only be elected if really needed.

Closes #7558
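
As a rough illustration of the demotion described in this commit message, the sketch below orders candidates so that nodes which have already joined the cluster come first. The class and field names are hypothetical, and breaking ties by node id is an assumption of the sketch, not something stated in this thread. A `null` flag models a pre-1.4.0 node, in which case the preference is switched off entirely, matching the first bwc protection in the PR description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: prefers nodes that have joined the cluster before. */
final class ElectionSketch {
    static final class Candidate {
        final String nodeId;
        final Boolean hasJoinedOnce; // null models a node on a version < 1.4.0

        Candidate(String nodeId, Boolean hasJoinedOnce) {
            this.nodeId = nodeId;
            this.hasJoinedOnce = hasJoinedOnce;
        }
    }

    static Candidate elect(List<Candidate> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidates");
        }
        // The preference is only usable if every ping carried the flag.
        boolean flagKnownForAll = true;
        for (Candidate c : candidates) {
            if (c.hasJoinedOnce == null) {
                flagKnownForAll = false;
            }
        }
        // Assumed tie-breaker: lowest node id wins.
        Comparator<Candidate> byId = (a, b) -> a.nodeId.compareTo(b.nodeId);
        Comparator<Candidate> order = byId;
        if (flagKnownForAll) {
            // "New" nodes (flag == false) sort after nodes that joined before.
            Comparator<Candidate> joinedFirst =
                    (a, b) -> Boolean.compare(!a.hasJoinedOnce, !b.hasJoinedOnce);
            order = joinedFirst.thenComparing(byId);
        }
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(order);
        return sorted.get(0);
    }
}
```

A new node can still win here, but only when no previously joined candidate is available, which is the "only be elected if really needed" behaviour.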
Discovery: back port #7558 to 1.x and add bwc protections of the new ping on master gone introduced in #7493


@bleskes bleskes force-pushed the bleskes:master_prefer_non_initial_join_1.x branch to 7c590fd Sep 16, 2014

@bleskes

Member Author

commented Sep 16, 2014

@s1monw, @javanna - I addressed all the feedback and rebased onto the latest 1.x. Can you have another look?

public DiscoveryNode master() {
    return master;
}

/** true if the joined has successfully joined the cluster before, null for nodes with a <1.4.0 version */

@javanna

javanna Sep 16, 2014

Member

s/if the joined/if the node ?

@bleskes

bleskes Sep 16, 2014

Author Member

oops. yes.

* the lowest possible version (i.e., 1.0.0) for serializing information on the wire. As of 1.4, we needed to extend
* the information sent in a ping, to prefer nodes which have previously joined the cluster during master election.
* This information is only needed if all the cluster is on version 1.4 or up. To bypass this issue we introduce
* a second action name which is guaranteed to exist only nodes from version 1.4.0 and up. Using this action,

@javanna

javanna Sep 16, 2014

Member

s/only/only on ?

@s1monw

Contributor

commented Sep 16, 2014

I think I am fine with it. LGTM

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 16, 2014

Discovery: back port elastic#7558 to 1.x and add bwc protections of the new ping on master gone introduced in elastic#7493

Closes elastic#7694

bleskes added a commit that referenced this pull request Sep 16, 2014

Discovery: back port #7558 to 1.x and add bwc protections of the new ping on master gone introduced in #7493

Closes #7694
@bleskes

Member Author

commented Sep 16, 2014

merged in 1.x & 1.4

@bleskes bleskes closed this Sep 16, 2014

@bleskes bleskes deleted the bleskes:master_prefer_non_initial_join_1.x branch Sep 16, 2014

@jpountz jpountz removed the review label Oct 21, 2014

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

Discovery: back port elastic#7558 to 1.x and add bwc protections of the new ping on master gone introduced in elastic#7493

Closes elastic#7694