Start Master|Node fault detection pinging immediately during discovery #6706

Closed

3 participants

@bleskes
Member
bleskes commented Jul 3, 2014

After a node joins the cluster, it starts pinging the master to verify its health. Previously, the cluster join request was processed asynchronously, so we had to allow some time for it to complete. With #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for faster detection of failures. A similar change can be made to the node fault detection from the master side.
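
To illustrate the effect of the timing change described above, here is a minimal sketch, not the Elasticsearch implementation. It assumes a simplified model in which pings are sent at a fixed interval, the master is declared failed after a fixed number of consecutive failed pings, and the old behaviour delayed the first ping by one interval. The names `detection_time`, `ping_interval` and `retries` are hypothetical and exist only for this example.

```python
import math


def first_failed_ping(failure_time, first_ping, ping_interval):
    """Return the time of the first ping sent at or after the master failed."""
    if failure_time <= first_ping:
        return first_ping
    # Number of whole intervals needed to reach or pass the failure time.
    k = math.ceil((failure_time - first_ping) / ping_interval)
    return first_ping + k * ping_interval


def detection_time(failure_time, join_time, ping_interval, retries, immediate):
    """Return the time at which a node declares the master failed.

    Assumption: with immediate=False the first ping is delayed by one
    ping interval after the join; with immediate=True it is sent at once.
    Ping timeouts are ignored to keep the model small.
    """
    first_ping = join_time if immediate else join_time + ping_interval
    first_bad = first_failed_ping(failure_time, first_ping, ping_interval)
    # The node gives up after `retries` consecutive failed pings.
    return first_bad + (retries - 1) * ping_interval


if __name__ == "__main__":
    # Master dies right as the node joins: immediate pinging detects it sooner.
    print(detection_time(0.0, 0.0, 1.0, 3, immediate=True))
    print(detection_time(0.0, 0.0, 1.0, 3, immediate=False))
```

In this model the gain only shows up for failures that happen shortly after the join. Once regular pinging is under way, both variants detect a failure at the same time.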

@bleskes bleskes [Discovery] immediately start Master|Node fault detection pinging
321277e
@martijnvg
Member

LGTM

@bleskes bleskes added a commit that closed this pull request Jul 3, 2014
@bleskes bleskes [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
ae16956
@bleskes bleskes closed this in ae16956 Jul 3, 2014
@bleskes bleskes removed the review label Jul 3, 2014
@bleskes bleskes deleted the bleskes:immediately_ping branch Jul 3, 2014
@bleskes bleskes added a commit that referenced this pull request Jul 3, 2014
@bleskes bleskes [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
9757a6e
@bleskes bleskes added a commit that referenced this pull request Jul 8, 2014
@bleskes bleskes Revert "[Discovery] immediately start Master|Node fault detection pinging"

In #6706 we changed the master validation to start pinging immediately after a new master was elected or a node joined. The idea is to respond more quickly to failures. This does, however, create a problem if the new master has not yet fully processed its election and responds to the ping with a NoLongerMasterException. This causes the source node to remove the current master and elect another, only to find out that it is not a master either, and so forth. We are moving this change to the feature/improve_zen branch, where the improvements we made will cause the situation to be handled properly.

This reverts commit ae16956.
caf11ff
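
The failure mode described in the revert message can be sketched with a toy simulation. This is an illustration under stated assumptions, not Elasticsearch code: `Candidate`, `NotMasterYet`, `settle_time` and `find_master` are hypothetical names, and `NotMasterYet` merely stands in for the NoLongerMasterException response mentioned above. Each candidate is assumed to need a fixed amount of time after being elected before it accepts pings as master.

```python
class NotMasterYet(Exception):
    """Stand-in for a 'not the master' ping response (hypothetical)."""


class Candidate:
    def __init__(self, name, settle_time):
        self.name = name
        # Assumed time the candidate needs to finish processing its election.
        self.settle_time = settle_time

    def ping(self, elapsed_since_election):
        """Reject the ping if the election has not been processed yet."""
        if elapsed_since_election < self.settle_time:
            raise NotMasterYet(self.name)


def find_master(candidates, first_ping_delay, max_rounds=6):
    """Elect candidates in turn and verify each with one ping.

    Returns (master_name_or_None, names_rejected_along_the_way).
    """
    rejected = []
    for round_no in range(max_rounds):
        candidate = candidates[round_no % len(candidates)]
        try:
            candidate.ping(first_ping_delay)
        except NotMasterYet:
            # The node drops this master and elects the next candidate.
            rejected.append(candidate.name)
            continue
        return candidate.name, rejected
    return None, rejected


if __name__ == "__main__":
    nodes = [Candidate("A", 0.5), Candidate("B", 0.5)]
    print(find_master(nodes, first_ping_delay=0.0))  # keeps cycling
    print(find_master(nodes, first_ping_delay=1.0))  # settles on the first candidate
```

With no delay before the first ping, every candidate is rejected before it has settled, so the node cycles through them without ever keeping a master. Waiting longer than the settle time before the first ping lets the first elected candidate pass verification.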
@bleskes bleskes added a commit that referenced this pull request Jul 8, 2014
@bleskes bleskes Revert "[Discovery] immediately start Master|Node fault detection pinging"


This reverts commit ae16956.
20cd74d
@bleskes bleskes added a commit that referenced this pull request Jul 8, 2014
@bleskes bleskes [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
4ed028b
@bleskes bleskes added a commit that referenced this pull request Jul 9, 2014
@bleskes bleskes [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
92cfa8a
@bleskes bleskes added a commit that referenced this pull request Jul 9, 2014
@bleskes @martijnvg bleskes + martijnvg [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
a54a88b
@bleskes bleskes added a commit that referenced this pull request Jul 9, 2014
@bleskes @martijnvg bleskes + martijnvg [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
7e85769
@bleskes bleskes added a commit that referenced this pull request Jul 11, 2014
@bleskes bleskes [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
f243aaf
@bleskes bleskes added a commit that referenced this pull request Jul 11, 2014
@bleskes bleskes [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
3b8fedf
@bleskes bleskes added a commit that referenced this pull request Jul 12, 2014
@bleskes bleskes [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
432042f
@bleskes bleskes added a commit that referenced this pull request Jul 12, 2014
@bleskes bleskes [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
1433b82
@bleskes bleskes added the resiliency label Jul 14, 2014
@bleskes bleskes added a commit to bleskes/elasticsearch that referenced this pull request Jul 14, 2014
@bleskes bleskes [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
bd69e0e
@bleskes bleskes added a commit to bleskes/elasticsearch that referenced this pull request Jul 14, 2014
@bleskes bleskes [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
3bde08b
@bleskes bleskes added a commit that referenced this pull request Jul 16, 2014
@bleskes bleskes [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
3e08188
@bleskes bleskes added a commit that referenced this pull request Jul 16, 2014
@bleskes bleskes [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
975b689
@clintongormley clintongormley changed the title from [Discovery] immediately start Master|Node fault detection pinging to Resiliency: Start Master|Node fault detection pinging immediately during discovery Jul 16, 2014
@bleskes bleskes added a commit that referenced this pull request Jul 17, 2014
@bleskes bleskes [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
0e9ca5e
@bleskes bleskes added a commit that referenced this pull request Jul 17, 2014
@bleskes bleskes [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
c0cd013
@bleskes bleskes added a commit that referenced this pull request Jul 21, 2014
@bleskes bleskes [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
cb3793b
@bleskes bleskes added a commit that referenced this pull request Jul 21, 2014
@bleskes bleskes [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
cbf602f
@bleskes bleskes added a commit that referenced this pull request Jul 22, 2014
@bleskes @martijnvg bleskes + martijnvg [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
79c87ce
@bleskes bleskes added a commit that referenced this pull request Jul 22, 2014
@bleskes @martijnvg bleskes + martijnvg [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
87003d7
@bleskes bleskes added a commit that referenced this pull request Aug 4, 2014
@bleskes @martijnvg bleskes + martijnvg [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
66b3931
@bleskes bleskes added a commit that referenced this pull request Aug 4, 2014
@bleskes @martijnvg bleskes + martijnvg [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
231d031
@bleskes bleskes added a commit that referenced this pull request Aug 5, 2014
@bleskes @martijnvg bleskes + martijnvg [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
1706ef2
@bleskes bleskes added a commit that referenced this pull request Aug 5, 2014
@bleskes @martijnvg bleskes + martijnvg [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
4319bdb
@bleskes bleskes added a commit that referenced this pull request Aug 14, 2014
@bleskes @martijnvg bleskes + martijnvg [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
dda5fb0
@bleskes bleskes added a commit that referenced this pull request Aug 14, 2014
@bleskes @martijnvg bleskes + martijnvg [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
ff9dcd0
@lindstromhenrik lindstromhenrik added a commit to episerver/elasticsearch that referenced this pull request Aug 21, 2014
@bleskes @lindstromhenrik bleskes + lindstromhenrik [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
388aa93
@lindstromhenrik lindstromhenrik added a commit to episerver/elasticsearch that referenced this pull request Aug 21, 2014
@bleskes @lindstromhenrik bleskes + lindstromhenrik Revert "[Discovery] immediately start Master|Node fault detection pinging"


This reverts commit ae16956.
11bdf28
@bleskes bleskes added a commit that referenced this pull request Aug 22, 2014
@bleskes bleskes [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
a268fdd
@bleskes bleskes added a commit that referenced this pull request Aug 22, 2014
@bleskes bleskes [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
cc50708
@bleskes bleskes added a commit that referenced this pull request Aug 23, 2014
@bleskes bleskes [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
95e7268
@bleskes bleskes added a commit that referenced this pull request Aug 23, 2014
@bleskes bleskes [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
5bba569
@bleskes bleskes added a commit that referenced this pull request Aug 27, 2014
@bleskes bleskes [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
5302a53
@bleskes bleskes added a commit that referenced this pull request Aug 27, 2014
@bleskes bleskes [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
3586e38
@bleskes bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 1, 2014
@bleskes bleskes [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
17874a4
@bleskes bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 1, 2014
@bleskes bleskes [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
95bc858
@bleskes bleskes added v1.4.0 v1.3.0 and removed v1.3.0 v1.4.0 labels Sep 2, 2014
@bleskes bleskes added a commit that referenced this pull request Sep 8, 2014
@bleskes @areek bleskes + areek [Discovery] immediately start Master|Node fault detection pinging

Closes #6706
6b07234
@bleskes bleskes added a commit that referenced this pull request Sep 8, 2014
@bleskes @areek bleskes + areek [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
58861c5
@clintongormley clintongormley changed the title from Resiliency: Start Master|Node fault detection pinging immediately during discovery to Start Master|Node fault detection pinging immediately during discovery Jun 7, 2015