Ignore timeouts with single-node discovery #52159

DaveCTurner · 2020-02-10T17:44:57Z

Today we use `cluster.join.timeout` to prevent nodes from waiting indefinitely
when joining a faulty master that is too slow to respond, and
`cluster.publish.timeout` to allow a faulty master to detect that it is unable
to publish its cluster state updates in a timely fashion. If either timeout
elapses then the node restarts the discovery process in an attempt to find a
healthier master.

In the special case of `discovery.type: single-node` there is no point in
looking for another, healthier master since the single node in the cluster is
all we've got. This commit suppresses these timeouts and instead lets the node
wait for joins and publications to succeed no matter how long this might take.
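
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in `Coordinator.java`: the class `TimeoutPolicySketch` and its methods are hypothetical names invented for this example, and the durations passed in are arbitrary. An empty `Optional` stands for "no timeout is scheduled, so the node waits indefinitely".

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical sketch only; these names do not exist in Elasticsearch.
final class TimeoutPolicySketch {
    private final boolean singleNodeDiscovery; // true when discovery.type is single-node
    private final Duration joinTimeout;        // stands in for cluster.join.timeout
    private final Duration publishTimeout;     // stands in for cluster.publish.timeout

    TimeoutPolicySketch(boolean singleNodeDiscovery, Duration joinTimeout, Duration publishTimeout) {
        this.singleNodeDiscovery = singleNodeDiscovery;
        this.joinTimeout = joinTimeout;
        this.publishTimeout = publishTimeout;
    }

    // Empty result: do not schedule a join timeout at all.
    Optional<Duration> effectiveJoinTimeout() {
        return singleNodeDiscovery ? Optional.empty() : Optional.of(joinTimeout);
    }

    // Empty result: do not schedule a publication timeout at all.
    Optional<Duration> effectivePublishTimeout() {
        return singleNodeDiscovery ? Optional.empty() : Optional.of(publishTimeout);
    }
}
```

A caller in this sketch would arm a timer only when the returned `Optional` is present, so with single-node discovery nothing ever fires to restart discovery.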

elasticmachine · 2020-02-10T17:44:59Z

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

original-brownbear

LGTM :)

server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java

DaveCTurner added the >bug, :Distributed/Cluster Coordination, v8.0.0, and v7.7.0 labels Feb 10, 2020

DaveCTurner requested a review from ywelsch February 10, 2020 17:44

original-brownbear approved these changes Feb 10, 2020

server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java (outdated, resolved)

b7e1d8a

DaveCTurner merged commit a304d9a into elastic:master Feb 11, 2020

DaveCTurner deleted the 2020-02-10-no-timeouts-in-single-node-discovery branch February 11, 2020 14:00

DaveCTurner added the backport pending label Feb 11, 2020

DaveCTurner removed the backport pending label Feb 11, 2020

codebrain mentioned this pull request Apr 1, 2020

7.7.0 meta ticket (Part 3) elastic/elasticsearch-net#4534

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

jrodewig mentioned this pull request Dec 6, 2021

[DOCS] Release notes for v7.16.0 #81369

Merged
