UnicastZenPing should also ping last known discoNodes #7336

Closed
wants to merge 2 commits into
from

Projects

None yet

6 participants

@bleskes
Member
bleskes commented Aug 19, 2014

At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Note: this is agains the feature/improve_zen branch

@bleskes bleskes [Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.
629f0c4
@martijnvg martijnvg commented on the diff Aug 19, 2014
...ticsearch/discovery/zen/elect/ElectMasterService.java
@@ -120,6 +131,12 @@ public DiscoveryNode electMaster(Iterable<DiscoveryNode> nodes) {
@Override
public int compare(DiscoveryNode o1, DiscoveryNode o2) {
+ if (o1.masterNode() && !o2.masterNode()) {
+ return -1;
@martijnvg
martijnvg Aug 19, 2014 Member

Shouldn't this return 1 and the other if statement return -1?

@martijnvg
martijnvg Aug 19, 2014 Member

Wait this code is ok, we want nodes with master eligible nodes to be in the beginning of the list and therefor it should be deemed smaller (return -1) then a node with that isn't master.

Maybe add a unit test for the elect master logic :)

@martijnvg
Member

Left one comment regarding unit test for elect logic, other than that LGTM.

@bleskes
Member
bleskes commented Aug 19, 2014

@martijnvg added unit tests.

@martijnvg

Shouldn't we also compare ids if it is a data node?

Owner

I thought about and decided against it as it is not part of the method contract - regardless of the id, they are both equally unlikely to be chosen as a master.

Ok, that make sense, we just want the nodes that are the most likely to appear first in the list.

@martijnvg
Member

@bleskes LGTM

@kimchy
Member
kimchy commented Aug 20, 2014

lgtm

@bleskes bleskes added a commit that referenced this pull request Aug 20, 2014
@bleskes bleskes [Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336
b11a92e
@bleskes bleskes closed this Aug 20, 2014
@bleskes bleskes deleted the bleskes:unicast_ping_discovery_nodes branch Aug 20, 2014
@bleskes bleskes added a commit that referenced this pull request Aug 22, 2014
@bleskes bleskes [Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336
b942d54
@bleskes bleskes added a commit that referenced this pull request Aug 23, 2014
@bleskes bleskes [Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336
5aabcc1
@jpountz jpountz removed the review label Aug 26, 2014
@bleskes bleskes added a commit that referenced this pull request Aug 27, 2014
@bleskes bleskes [Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336
d5552a9
@bleskes bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 1, 2014
@bleskes bleskes [Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336
e3c246e
@dadoonet
Member
dadoonet commented Sep 5, 2014

@bleskes Should we label this issue with 1.4.0 and 2.0.0?

@dadoonet dadoonet referenced this pull request in elastic/elasticsearch-cloud-aws Sep 5, 2014
Closed

ZenDiscovery constructor needs ElectMasterService instance #115

@dadoonet dadoonet added a commit to elastic/elasticsearch-cloud-aws that referenced this pull request Sep 5, 2014
@dadoonet dadoonet ZenDiscovery constructor needs ElectMasterService instance
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change Ec2Discovery constructor.

Closes #115.
d8a6f21
@dadoonet dadoonet added a commit to elastic/elasticsearch-cloud-aws that referenced this pull request Sep 5, 2014
@dadoonet dadoonet ZenDiscovery constructor needs ElectMasterService instance
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change Ec2Discovery constructor.

Closes #115.
8195ab2
@dadoonet dadoonet referenced this pull request in elastic/elasticsearch-cloud-azure Sep 5, 2014
Closed

ZenDiscovery constructor needs ElectMasterService instance #34

@dadoonet dadoonet added a commit to elastic/elasticsearch-cloud-azure that referenced this pull request Sep 5, 2014
@dadoonet dadoonet ZenDiscovery constructor needs ElectMasterService instance
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change AzureDiscovery constructor.

Closes #34.
417a756
@dadoonet dadoonet added a commit to elastic/elasticsearch-cloud-azure that referenced this pull request Sep 5, 2014
@dadoonet dadoonet ZenDiscovery constructor needs ElectMasterService instance
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change AzureDiscovery constructor.

Closes #34.
f5d3500
@dadoonet dadoonet referenced this pull request in elastic/elasticsearch-cloud-gce Sep 5, 2014
Closed

ZenDiscovery constructor needs ElectMasterService instance #35

@dadoonet dadoonet added a commit to elastic/elasticsearch-cloud-gce that referenced this pull request Sep 5, 2014
@dadoonet dadoonet ZenDiscovery constructor needs ElectMasterService instance
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change GceDiscovery constructor.

Closes #35.
d951483
@dadoonet dadoonet added a commit to elastic/elasticsearch-cloud-gce that referenced this pull request Sep 5, 2014
@dadoonet dadoonet ZenDiscovery constructor needs ElectMasterService instance
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change GceDiscovery constructor.

Closes #35.
ce69736
@bleskes bleskes added a commit that referenced this pull request Sep 8, 2014
@bleskes @areek bleskes + areek [Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336
c77ebfb
@clintongormley clintongormley changed the title from [Discovery] UnicastZenPing should also ping last known discoNodes to Resiliency: Discovery - UnicastZenPing should also ping last known discoNodes Sep 11, 2014
@clintongormley clintongormley changed the title from Resiliency: Discovery - UnicastZenPing should also ping last known discoNodes to UnicastZenPing should also ping last known discoNodes Jun 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment