Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnicastZenPing should also ping last known discoNodes #7336

Conversation

Projects
None yet
6 participants
@bleskes
Copy link
Member

commented Aug 19, 2014

At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Note: this is agains the feature/improve_zen branch

[Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.
@@ -120,6 +131,12 @@ public DiscoveryNode electMaster(Iterable<DiscoveryNode> nodes) {

@Override
public int compare(DiscoveryNode o1, DiscoveryNode o2) {
if (o1.masterNode() && !o2.masterNode()) {
return -1;

This comment has been minimized.

Copy link
@martijnvg

martijnvg Aug 19, 2014

Member

Shouldn't this return 1 and the other if statement return -1?

This comment has been minimized.

Copy link
@martijnvg

martijnvg Aug 19, 2014

Member

Wait this code is ok, we want nodes with master eligible nodes to be in the beginning of the list and therefor it should be deemed smaller (return -1) then a node with that isn't master.

Maybe add a unit test for the elect master logic :)

@martijnvg

This comment has been minimized.

Copy link
Member

commented Aug 19, 2014

Left one comment regarding unit test for elect logic, other than that LGTM.

@bleskes

This comment has been minimized.

Copy link
Member Author

commented Aug 19, 2014

@martijnvg added unit tests.

@martijnvg

This comment has been minimized.

Shouldn't we also compare ids if it is a data node?

This comment has been minimized.

Copy link
Owner Author

replied Aug 19, 2014

I thought about and decided against it as it is not part of the method contract - regardless of the id, they are both equally unlikely to be chosen as a master.

This comment has been minimized.

Copy link

replied Aug 19, 2014

Ok, that make sense, we just want the nodes that are the most likely to appear first in the list.

@martijnvg

This comment has been minimized.

Copy link
Member

commented Aug 19, 2014

@bleskes LGTM

@kimchy

This comment has been minimized.

Copy link
Member

commented Aug 20, 2014

lgtm

bleskes added a commit that referenced this pull request Aug 20, 2014

[Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336

@bleskes bleskes closed this Aug 20, 2014

@bleskes bleskes deleted the bleskes:unicast_ping_discovery_nodes branch Aug 20, 2014

bleskes added a commit that referenced this pull request Aug 22, 2014

[Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336

bleskes added a commit that referenced this pull request Aug 23, 2014

[Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336

@jpountz jpountz removed the review label Aug 26, 2014

bleskes added a commit that referenced this pull request Aug 27, 2014

[Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336

bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 1, 2014

[Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes elastic#7336
@dadoonet

This comment has been minimized.

Copy link
Member

commented Sep 5, 2014

@bleskes Should we label this issue with 1.4.0 and 2.0.0?

dadoonet added a commit to elastic/elasticsearch-cloud-aws that referenced this pull request Sep 5, 2014

ZenDiscovery constructor needs ElectMasterService instance
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change Ec2Discovery constructor.

Closes #115.

dadoonet added a commit to elastic/elasticsearch-cloud-aws that referenced this pull request Sep 5, 2014

ZenDiscovery constructor needs ElectMasterService instance
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change Ec2Discovery constructor.

Closes #115.

dadoonet added a commit to elastic/elasticsearch-cloud-azure that referenced this pull request Sep 5, 2014

ZenDiscovery constructor needs ElectMasterService instance
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change AzureDiscovery constructor.

Closes #34.

dadoonet added a commit to elastic/elasticsearch-cloud-azure that referenced this pull request Sep 5, 2014

ZenDiscovery constructor needs ElectMasterService instance
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change AzureDiscovery constructor.

Closes #34.

dadoonet added a commit to elastic/elasticsearch-cloud-gce that referenced this pull request Sep 5, 2014

ZenDiscovery constructor needs ElectMasterService instance
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change GceDiscovery constructor.

Closes #35.

dadoonet added a commit to elastic/elasticsearch-cloud-gce that referenced this pull request Sep 5, 2014

ZenDiscovery constructor needs ElectMasterService instance
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change GceDiscovery constructor.

Closes #35.

bleskes added a commit that referenced this pull request Sep 8, 2014

[Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336

@clintongormley clintongormley changed the title [Discovery] UnicastZenPing should also ping last known discoNodes Resiliency: Discovery - UnicastZenPing should also ping last known discoNodes Sep 11, 2014

@clintongormley clintongormley changed the title Resiliency: Discovery - UnicastZenPing should also ping last known discoNodes UnicastZenPing should also ping last known discoNodes Jun 7, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.