
Master election should demote nodes which try to join the cluster for the first time #7558

Closed

3 participants
@bleskes
Member

bleskes commented Sep 3, 2014

With the change in #7493, we introduced a pinging round when a master node goes down. That pinging round helps validate the current state of the cluster and takes 3 seconds by default. During that window, a new node may try to join the cluster and start pinging (this is typical when you quickly restart the current master). If this node gets elected as the new master, it will force recovery from the gateway (it has no in-memory cluster state), which in turn will cause a full cluster shard synchronisation. While this is not a problem on its own, it's a shame. This commit demotes "new" nodes during master election so they will only be elected if really needed.
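
To make the intended election rule concrete, here is a minimal Java sketch. It assumes a simplified, hypothetical `Candidate` type with a `hasJoinedOnce` field (not the real DiscoveryNode/PingResponse API). Candidates that have already joined the cluster sort ahead of first-time joiners, so a freshly restarted node only wins when no other master-eligible node answered; the tie-break by node id is an assumption added so that every node seeing the same candidates picks the same master.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified view of a pinged master-eligible node.
final class Candidate {
    final String nodeId;
    final boolean hasJoinedOnce; // false while the node is joining for the first time

    Candidate(String nodeId, boolean hasJoinedOnce) {
        this.nodeId = nodeId;
        this.hasJoinedOnce = hasJoinedOnce;
    }
}

final class MasterElectionSketch {
    // Nodes that already joined sort first (the negated flag is false for them,
    // and false < true); node id breaks ties so all nodes agree on the winner.
    private static final Comparator<Candidate> PREFERENCE =
            Comparator.comparing((Candidate c) -> !c.hasJoinedOnce)
                      .thenComparing(c -> c.nodeId);

    /** Returns the elected candidate, or null if there are no candidates. */
    static Candidate elect(List<Candidate> candidates) {
        if (candidates.isEmpty()) {
            return null;
        }
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(PREFERENCE);
        return sorted.get(0);
    }
}
```

A first-time joiner with the lowest id still loses to any node that has joined before; it is only elected when it is the sole candidate.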

bleskes added some commits Sep 3, 2014

@bleskes bleskes added v1.4.0 labels Sep 3, 2014

@bleskes bleskes force-pushed the bleskes:master_prefer_non_initial_join branch 2 times, most recently Sep 4, 2014

@bleskes bleskes force-pushed the bleskes:master_prefer_non_initial_join branch to 32083c5 Sep 4, 2014

@bleskes

Member Author

bleskes commented Sep 4, 2014

@kimchy I pushed another update based on our discussion.

@@ -139,6 +140,9 @@

private volatile boolean rejoinOnMasterGone;

// will be set to true upon the first successful cluster join
private final AtomicBoolean hasJoinedClusterOnce = new AtomicBoolean();

@s1monw

s1monw Sep 5, 2014

Contributor

Not sure if it will help, but maybe make that an integer so we can log how often it joined? It might help when debugging. Just an idea.

@bleskes

bleskes Sep 5, 2014

Author Member

It might, and it won't hurt. I'll add it.
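
The counter s1monw suggests could look roughly like this minimal sketch; `JoinTracker` and its method names are hypothetical. An `AtomicInteger` replaces the `AtomicBoolean`, so the old boolean check becomes `count > 0` and the count itself is available for trace logging.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder: counts successful joins instead of only remembering
// whether one happened.
final class JoinTracker {
    private final AtomicInteger joinCount = new AtomicInteger();

    /** Call after every successful cluster join; returns the new count for logging. */
    int onJoinedCluster() {
        return joinCount.incrementAndGet();
    }

    /** Same meaning as the original boolean flag. */
    boolean hasJoinedClusterOnce() {
        return joinCount.get() > 0;
    }

    int joinCount() {
        return joinCount.get();
    }
}
```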

@@ -306,6 +311,14 @@ public NodeService nodeService() {
}

@Override
public boolean isFirstClusterJoin() {

@s1monw

s1monw Sep 5, 2014

Contributor

I don't like this. Can we just name it hasJoinedCluster()?

@bleskes

bleskes Sep 5, 2014

Author Member

I agree this has gotten confusing in the latest refactoring. Good call. Will change.

@@ -963,20 +976,37 @@ private DiscoveryNode findMaster() {
}
}

Set<DiscoveryNode> possibleMasterNodes = Sets.newHashSet();
// nodes discovered during pinging
Set<DiscoveryNode> activeNodes = Sets.newHashSet();

@s1monw

s1monw Sep 5, 2014

Contributor

If these collections are used for iteration, I'd use something with a stable iteration order instead of a HashSet, whose order might differ due to object identity, rehashing, etc. But we should enforce this on the interface in electMaster further down the road.

@bleskes

bleskes Sep 5, 2014

Author Member

Not sure:

  • electMaster doesn't care about order; it sorts the list anyway
  • the set dedups the results (though I don't think that's needed)
  • DiscoveryNode implements equals and hashCode, so no object pointers are used?

Before changing it, I'll have to double-check that the dedup logic is not needed. I'm not sure it's worth the risk, or am I missing something?
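
A small sketch of bleskes's argument, using a hypothetical `NodeRef` in place of `DiscoveryNode`. Because equality is value-based, the `HashSet` deduplicates a node reported by several pings; because the election sorts before picking, the set's iteration order cannot change the result. Sorting by node id here is an assumption about what electMaster sorts on.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for DiscoveryNode: equality and hash code depend only on the id.
final class NodeRef {
    final String id;

    NodeRef(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof NodeRef && ((NodeRef) o).id.equals(id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }
}

final class StableElection {
    /** Deduplicates value-equal nodes reported by several pings. */
    static Set<NodeRef> collect(List<NodeRef> pinged) {
        return new HashSet<>(pinged);
    }

    /** Sorts by id before picking, so the input's iteration order never matters. */
    static NodeRef electMaster(Collection<NodeRef> nodes) {
        List<NodeRef> sorted = new ArrayList<>(nodes);
        sorted.sort((a, b) -> a.id.compareTo(b.id));
        return sorted.isEmpty() ? null : sorted.get(0);
    }
}
```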

public interface PingContextProvider extends DiscoveryNodesProvider {

/** return true if this node is joining the cluster for the first time */
boolean isFirstClusterJoin();

@s1monw

s1monw Sep 5, 2014

Contributor

do we really need a new interface for this?

@bleskes

bleskes Sep 5, 2014

Author Member

I originally made it because DiscoveryNodesProvider is also used by the node/master fault detection. Over there it's confusing to have to implement something like isFirstClusterJoin. If I rename this method to hasJoinedClusterOnce it will be less confusing, but it still feels a bit weird to me. I don't feel strongly about it, though.
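
The trade-off can be sketched with two simplified, hypothetical interfaces: `NodesProvider` stands in for DiscoveryNodesProvider and `PingContext` for PingContextProvider. Fault detection keeps depending on the narrow interface and never has to implement the join flag, while pinging asks for the wider interface that extends it.

```java
// Simplified, hypothetical stand-ins for DiscoveryNodesProvider and PingContextProvider.
interface NodesProvider {
    String localNodeId();
}

interface PingContext extends NodesProvider {
    /** true once this node has successfully joined a cluster at least once */
    boolean hasJoinedClusterOnce();
}

// Fault detection only needs the narrow interface and never sees the join flag.
final class FaultDetectionSketch {
    private final NodesProvider nodes;

    FaultDetectionSketch(NodesProvider nodes) {
        this.nodes = nodes;
    }

    String describe() {
        return "fault detection on " + nodes.localNodeId();
    }
}

// Pinging asks for the wider interface because the flag goes into its responses.
final class PingSketch {
    private final PingContext context;

    PingSketch(PingContext context) {
        this.context = context;
    }

    String describe() {
        return context.localNodeId()
                + (context.hasJoinedClusterOnce() ? " (joined before)" : " (first join)");
    }
}
```

Any `PingContext` can still be handed to fault detection, because it is also a `NodesProvider`.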

this.initialJoin = in.readBoolean();
} else {
// we prefer to elect nodes which are not in the process of joining the cluster for the first time.
// false is the safe choice here.

@s1monw

s1monw Sep 5, 2014

Contributor

So initialJoin is true if the node has NOT already joined the cluster, i.e. it is currently joining for the first time? I must be missing something, but either we name it hasJoined or we set it to true. Please document what this means!

@bleskes

bleskes Sep 5, 2014

Author Member

Old nodes will perform election without this knowledge. We need to make sure that if an old node elects itself, we will do the same, so we can't demote it.

I'll add this to the comment and add a BWC test for this.
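
A minimal sketch of the version-gated read/write, assuming versions are plain ints (140 stands for 1.4.0) and using `java.io` data streams rather than Elasticsearch's own stream classes. The flag is only put on the wire for 1.4.0+ streams; otherwise the reader falls back to `initialJoin = false`, so a node that an old peer may elect is never demoted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical codec for the ping flag. Versions are plain ints: 140 stands for 1.4.0.
final class InitialJoinFlagCodec {
    static final int V_1_4_0 = 140;

    static void write(DataOutputStream out, int streamVersion, boolean initialJoin) {
        if (streamVersion < V_1_4_0) {
            return; // an older peer cannot read the flag, so it is not sent at all
        }
        try {
            out.writeBoolean(initialJoin);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean read(DataInputStream in, int streamVersion) {
        if (streamVersion < V_1_4_0) {
            // An old node elects without knowing about this flag. If it elects itself we
            // must agree, so we must not demote it: false is the safe default.
            return false;
        }
        try {
            return in.readBoolean();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```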

@s1monw

Contributor

s1monw commented Sep 5, 2014

I left some comments @bleskes

Rewrite firstClusterJoin to hasJoinedClusterOnce. Use a join cluster …
…to have extra trace info. Change pingResponse.target to pingResponse.node, for clarity. Added comments and docs

@bleskes bleskes added blocker and removed review labels Sep 5, 2014

@clintongormley clintongormley changed the title [Discovery] Master election should demote nodes which try to join the cluster for the first time Resiliency: Master election should demote nodes which try to join the cluster for the first time Sep 8, 2014

@bleskes bleskes added the review label Sep 10, 2014

@s1monw

Contributor

s1monw commented Sep 10, 2014

LGTM

@bleskes

Member Author

bleskes commented Sep 11, 2014

Re-opening as there is some BWC work to be done for 1.4

@bleskes bleskes reopened this Sep 11, 2014

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 12, 2014

Discovery: restore preference to latest unicast pings describing the same node

elastic#5413 introduced a change where we prefer ping responses containing a master over those that don't. The same change also altered which ping is preferred when both pings have a master indication or neither does.

elastic#7558 added a new flag to the PingResponse, which changes after a node has joined the cluster for the very first time. Giving preference to older pings causes the wrong value of this flag to be used. This commit restores the original preference.
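
The "latest ping wins" rule can be sketched as below, assuming a hypothetical `Ping` record keyed by node id. It deliberately leaves out the master-preference logic from elastic#5413 and only shows why a later response from the same node should replace the earlier one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical ping record; field names are illustrative only.
final class Ping {
    final String nodeId;
    final boolean initialJoin;

    Ping(String nodeId, boolean initialJoin) {
        this.nodeId = nodeId;
        this.initialJoin = initialJoin;
    }
}

final class PingCollector {
    // One entry per node. A later ping from the same node overwrites the earlier one,
    // so a flag that changed in between (e.g. after the node finished its first join)
    // is read from the most recent response.
    private final Map<String, Ping> latestByNode = new LinkedHashMap<>();

    void onPing(Ping ping) {
        latestByNode.put(ping.nodeId, ping);
    }

    List<Ping> responses() {
        return new ArrayList<>(latestByNode.values());
    }
}
```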

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 16, 2014

Resiliency: Master election should demote nodes which try to join the cluster for the first time

With the change in elastic#7493, we introduced a pinging round when a master node goes down. That pinging round helps validate the current state of the cluster and takes 3 seconds by default. During that window, a new node may try to join the cluster and start pinging (this is typical when you quickly restart the current master). If this node gets elected as the new master, it will force recovery from the gateway (it has no in-memory cluster state), which in turn will cause a full cluster shard synchronisation. While this is not a problem on its own, it's a shame. This commit demotes "new" nodes during master election so they will only be elected if really needed.

Closes elastic#7558

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 16, 2014

Discovery: back port elastic#7558 to 1.x and add bwc protections of the new ping on master gone introduced in elastic#7493

The change in elastic#7558 adds a flag to PingResponse. However, when unicast discovery is used, this extra flag cannot be serialized by the very first pings, as they do not yet know which node version they are pinging (i.e., they have to default to 1.0.0, which rules out changing the serialization format). This commit bypasses this problem by adding a dedicated action which only exists on nodes of version 1.4 or later. Nodes first try to ping this endpoint using 1.4.0 as the serialization version. If that fails, they fall back to the pre-1.4.0 action. This is optimal if all nodes are on 1.4.0 or higher, with a small downside if the cluster has mixed versions, but that is a temporary state.
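
The try-new-then-fall-back handshake might look roughly like the sketch below. `PingTransport`, the action names, and the int version constants are all hypothetical placeholders, and the sketch assumes that a peer without the new action surfaces as a runtime exception.

```java
// Hypothetical transport: send(action, version) returns a response or throws
// when the remote node has no handler for the action.
interface PingTransport {
    String send(String action, int serializationVersion);
}

final class FallbackPinger {
    // Placeholder action names and versions (100 = 1.0.0, 140 = 1.4.0), not the real ones.
    static final String NEW_ACTION = "ping/new";
    static final String LEGACY_ACTION = "ping/legacy";
    static final int V_1_0_0 = 100;
    static final int V_1_4_0 = 140;

    static String ping(PingTransport transport) {
        try {
            // Optimistic path: assume the peer is on 1.4.0+ and can read the new flag.
            return transport.send(NEW_ACTION, V_1_4_0);
        } catch (RuntimeException unknownAction) {
            // Older peer: retry on the legacy action with the 1.0.0 wire format.
            return transport.send(LEGACY_ACTION, V_1_0_0);
        }
    }
}
```

In an all-1.4 cluster only the first call is ever made; the extra round trip is paid only while versions are mixed.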

Two further bwc protections are added:
1) Disable the preference for nodes that previously joined the cluster if some of the pings are on version < 1.4.0
2) Disable the rejoin-on-master-gone functionality if some nodes in the cluster are on version < 1.4.0
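
Both guards reduce to an "every node is on 1.4.0 or later" check. In this sketch, `PeerInfo`, the int encoding of versions, and the `configured` parameter (standing in for the existing rejoinOnMasterGone setting) are assumptions.

```java
import java.util.List;

// Hypothetical node view: only the version matters here. 140 stands for 1.4.0.
final class PeerInfo {
    final int version;

    PeerInfo(int version) {
        this.version = version;
    }
}

final class BwcGuards {
    static final int V_1_4_0 = 140;

    private static boolean allOn140OrLater(List<PeerInfo> nodes) {
        return nodes.stream().allMatch(n -> n.version >= V_1_4_0);
    }

    /** Protection 1: demote first-time joiners only if every ping came from a 1.4.0+ node. */
    static boolean preferNodesThatJoinedBefore(List<PeerInfo> pingedNodes) {
        return allOn140OrLater(pingedNodes);
    }

    /** Protection 2: honour the rejoin-on-master-gone setting only if the whole cluster is 1.4.0+. */
    static boolean rejoinOnMasterGone(boolean configured, List<PeerInfo> clusterNodes) {
        return configured && allOn140OrLater(clusterNodes);
    }
}
```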

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 16, 2014

Discovery: back port elastic#7558 to 1.x and add bwc protections of the new ping on master gone introduced in elastic#7493

The change in elastic#7558 adds a flag to PingResponse. However, when unicast discovery is used, this extra flag cannot be serialized by the very first pings, as they do not yet know which node version they are pinging (i.e., they have to default to 1.0.0, which rules out changing the serialization format). This commit bypasses this problem by adding a dedicated action which only exists on nodes of version 1.4 or later. Nodes first try to ping this endpoint using 1.4.0 as the serialization version. If that fails, they fall back to the pre-1.4.0 action. This is optimal if all nodes are on 1.4.0 or higher, with a small downside if the cluster has mixed versions, but that is a temporary state.

Two further bwc protections are added:
1) Disable the preference for nodes that previously joined the cluster if some of the pings are on version < 1.4.0
2) Disable the rejoin-on-master-gone functionality if some nodes in the cluster are on version < 1.4.0

Closes elastic#7694

bleskes added a commit that referenced this pull request Sep 16, 2014

Resiliency: Master election should demote nodes which try to join the cluster for the first time

With the change in #7493, we introduced a pinging round when a master node goes down. That pinging round helps validate the current state of the cluster and takes 3 seconds by default. During that window, a new node may try to join the cluster and start pinging (this is typical when you quickly restart the current master). If this node gets elected as the new master, it will force recovery from the gateway (it has no in-memory cluster state), which in turn will cause a full cluster shard synchronisation. While this is not a problem on its own, it's a shame. This commit demotes "new" nodes during master election so they will only be elected if really needed.

Closes #7558

bleskes added a commit that referenced this pull request Sep 16, 2014

Discovery: back port #7558 to 1.x and add bwc protections of the new ping on master gone introduced in #7493

The change in #7558 adds a flag to PingResponse. However, when unicast discovery is used, this extra flag cannot be serialized by the very first pings, as they do not yet know which node version they are pinging (i.e., they have to default to 1.0.0, which rules out changing the serialization format). This commit bypasses this problem by adding a dedicated action which only exists on nodes of version 1.4 or later. Nodes first try to ping this endpoint using 1.4.0 as the serialization version. If that fails, they fall back to the pre-1.4.0 action. This is optimal if all nodes are on 1.4.0 or higher, with a small downside if the cluster has mixed versions, but that is a temporary state.

Two further bwc protections are added:
1) Disable the preference for nodes that previously joined the cluster if some of the pings are on version < 1.4.0
2) Disable the rejoin-on-master-gone functionality if some nodes in the cluster are on version < 1.4.0

Closes #7694
@bleskes

Member Author

bleskes commented Sep 16, 2014

This is now backported to 1.x & 1.4.

@bleskes bleskes closed this Sep 16, 2014

@bleskes bleskes deleted the bleskes:master_prefer_non_initial_join branch Sep 16, 2014

@clintongormley clintongormley changed the title Resiliency: Master election should demote nodes which try to join the cluster for the first time Master election should demote nodes which try to join the cluster for the first time Jun 7, 2015

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

Resiliency: Master election should demote nodes which try to join the cluster for the first time

With the change in elastic#7493, we introduced a pinging round when a master node goes down. That pinging round helps validate the current state of the cluster and takes 3 seconds by default. During that window, a new node may try to join the cluster and start pinging (this is typical when you quickly restart the current master). If this node gets elected as the new master, it will force recovery from the gateway (it has no in-memory cluster state), which in turn will cause a full cluster shard synchronisation. While this is not a problem on its own, it's a shame. This commit demotes "new" nodes during master election so they will only be elected if really needed.

Closes elastic#7558

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

Discovery: back port elastic#7558 to 1.x and add bwc protections of the new ping on master gone introduced in elastic#7493

The change in elastic#7558 adds a flag to PingResponse. However, when unicast discovery is used, this extra flag cannot be serialized by the very first pings, as they do not yet know which node version they are pinging (i.e., they have to default to 1.0.0, which rules out changing the serialization format). This commit bypasses this problem by adding a dedicated action which only exists on nodes of version 1.4 or later. Nodes first try to ping this endpoint using 1.4.0 as the serialization version. If that fails, they fall back to the pre-1.4.0 action. This is optimal if all nodes are on 1.4.0 or higher, with a small downside if the cluster has mixed versions, but that is a temporary state.

Two further bwc protections are added:
1) Disable the preference for nodes that previously joined the cluster if some of the pings are on version < 1.4.0
2) Disable the rejoin-on-master-gone functionality if some nodes in the cluster are on version < 1.4.0

Closes elastic#7694