Commit
Better support lossy networks for arbitrator pings
The current implementation sets the socket timeout to 1s and, in some error cases, retries until the timeout passed as the raw_daemon_request() kwarg (5s default) is reached.

On TCP loss, the retransmit occurs after 1s, so only after the socket timeout we configured has expired. In addition, the ETIMEDOUT errno was not considered retryable. As a result, an error was logged after 1s instead of retrying for the configured or default timeout, which is usually more than 1s.

This patch:

1/ implements the ETIMEDOUT errno handler, treating it the same as a socket.timeout, i.e. retry.
2/ fixes an infinite retry loop situation, which was never actually experienced in the lab nor reported to us.
3/ sets the socket timeout to 1.5s instead of 1s to allow at least one TCP retransmit.
4/ returns the error from _ping() so the caller can decide to print or log the cause of the stale ping.

Example:

    $ om node ping --node 1.2.2.3
    1.2.2.3 is not alive: timeout daemon request (connect error)
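The retry behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: the function name `ping`, its injectable `connect` and `clock` parameters, and the returned message format are all hypothetical. It only shows a per-attempt socket timeout of 1.5s, ETIMEDOUT handled like socket.timeout, an overall deadline that bounds the retry loop, and the error being returned to the caller rather than logged.

```python
import errno
import socket
import time

# Per-attempt timeout: slightly above 1s so one TCP retransmit can complete.
SOCKET_TIMEOUT = 1.5


def ping(host, port, timeout=5.0,
         connect=socket.create_connection, clock=time.monotonic):
    """Hypothetical helper: return None if a TCP connect succeeds,
    else a string describing why the peer is considered not alive."""
    deadline = clock() + timeout
    last_error = "no attempt made"
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            # The overall deadline bounds the loop: no infinite retry.
            return "timeout (last error: %s)" % last_error
        try:
            sock = connect((host, port),
                           timeout=min(SOCKET_TIMEOUT, remaining))
        except OSError as exc:
            # socket.timeout and ETIMEDOUT are both treated as retryable.
            if isinstance(exc, socket.timeout) or exc.errno == errno.ETIMEDOUT:
                last_error = str(exc) or "timed out"
                continue
            # Any other error is returned immediately to the caller.
            return str(exc)
        sock.close()
        return None
```

A caller can then decide how to report the cause, for example `err = ping("192.0.2.1", 1215)` followed by printing `err` when it is not None. The address and port here are placeholders.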
Showing 3 changed files with 33 additions and 11 deletions.