Client does not retry alternative hosts when retry_on_status is triggered #893

Open
jackellenberger opened this issue May 19, 2020 · 1 comment

Comments

@jackellenberger

Context

When multiple hosts are provided to a client and a TransportError such as Faraday::ConnectionFailed occurs, the request is passed on to the next available host. Once every host has been tried, if retry attempts remain, all hosts are revived. This behavior seems pretty straightforward.
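To make the expected host sequence concrete, here is a minimal Ruby model of that transport-error path. `HostPool`, `next_host`, `mark_dead` and `simulate_transport_errors` are hypothetical names for illustration, not elasticsearch-transport internals; the sketch assumes round-robin selection over live hosts and that every request fails with a connection error.

```ruby
# Hypothetical, simplified model of the transport-error retry path.
# Not the gem's source: it only reproduces the behavior described above.
class HostPool
  def initialize(hosts)
    @hosts = hosts
    @dead = []
    @index = 0
  end

  # Round-robin over live hosts; once every host is dead, revive them all.
  def next_host
    @dead.clear if @dead.size == @hosts.size
    loop do
      host = @hosts[@index % @hosts.size]
      @index += 1
      return host unless @dead.include?(host)
    end
  end

  def mark_dead(host)
    @dead << host unless @dead.include?(host)
  end
end

# Every request fails; record which host each try went to.
def simulate_transport_errors(hosts, max_retries)
  pool = HostPool.new(hosts)
  tries = []
  loop do
    host = pool.next_host
    tries << host
    pool.mark_dead(host)                # failing host leaves the rotation
    break if tries.size > max_retries   # first try + max_retries retries
  end
  tries
end

p simulate_transport_errors(["foobar-us-east-1", "bazbat-us-east-2"], 6)
```

With the two hosts from the example below and a budget of six retries, the sequence alternates between foobar-us-east-1 and bazbat-us-east-2 and ends on foobar-us-east-1 after seven tries, which is the pattern in the first log excerpt.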

Problem

However, when the retry_on_status option is provided to a client with multiple hosts, all retries are attempted against the erroring host, and the secondary hosts are never queried. As far as I can tell, this is because retry is called immediately, before the host's connection can be killed.
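The retry_on_status path can be modeled the same way. In this hypothetical sketch, `simulate_status_errors` retries on the same host by default, as reported here. The `reselect: true` branch shows one possible fix (advance to the next host before retrying); that branch is an assumption for illustration, not existing library behavior.

```ruby
# Hypothetical model of the retry_on_status path as described in the issue.
# Every request gets a retryable status (e.g. 503).
def simulate_status_errors(hosts, max_retries, reselect: false)
  cursor = 0
  host = hosts[cursor]
  tries = []
  loop do
    tries << host
    break if tries.size > max_retries     # out of retries: give up
    if reselect
      # Assumed fix: move to the next host before retrying.
      cursor = (cursor + 1) % hosts.size
      host = hosts[cursor]
    end
    # reselect == false: retry immediately on the same host, as reported.
  end
  tries
end

hosts = ["foobar-us-east-1", "bazbat-us-east-2"]
p simulate_status_errors(hosts, 6)                  # current behavior
p simulate_status_errors(hosts, 6, reselect: true)  # sketched fix
```

The first call returns foobar-us-east-1 seven times, like the second log excerpt; the second alternates between both hosts, like the transport-error case.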

Not only is this somewhat unexpected, but there is an added wrinkle: for multi-host clients the retry count is scaled up by the number of hosts, regardless of whether all of those hosts are actually used. So with 2 hosts on a client and a retry_on_failure value of 3, a transport error is retried 3 times on each of host 1 and host 2, alternating between the two. An error whose status is listed in retry_on_status, however, is retried 6 times on host 1 alone (7 requests in total) before the client gives up.
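The arithmetic behind these counts, as a small Ruby sketch. It assumes, as reported above, that the retry budget is retry_on_failure multiplied by the number of hosts and that the first request is not counted as a retry; `requests_per_host` is a hypothetical helper.

```ruby
# Requests each host receives, assuming the budget is scaled by host count.
# same_host: true models the retry_on_status path; false models transport errors.
def requests_per_host(hosts, retry_on_failure, same_host:)
  max_retries = retry_on_failure * hosts.size   # 3 * 2 = 6
  total = max_retries + 1                       # first request + 6 retries = 7
  counts = Hash.new(0)
  total.times do |i|
    host = same_host ? hosts.first : hosts[i % hosts.size]
    counts[host] += 1
  end
  counts
end

hosts = ["foobar-us-east-1", "bazbat-us-east-2"]
p requests_per_host(hosts, 3, same_host: false)
p requests_per_host(hosts, 3, same_host: true)
```

For the numbers in this issue, the transport path sends 4 requests to foobar-us-east-1 and 3 to bazbat-us-east-2, while the retry_on_status path sends all 7 to foobar-us-east-1.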

Version

We are still way back on version 5.0.4, but this behavior appears to be the same all the way through 7.7.0.

Example

Here are logs from a client with two hosts, ["foobar-us-east-1", "bazbat-us-east-2"], retry_on_failure: 3, and retry_on_status: [503].
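For reference, a client configured this way might look like the sketch below. It assumes the `elasticsearch` gem is installed and uses only the options named in this issue plus a logger; the plain `Logger` is a stand-in, since the JSON log format shown presumably comes from the reporter's own logger setup, which the issue does not include.

```ruby
require "logger"
require "elasticsearch"   # assumes the elasticsearch gem is installed

# Two hosts (default port 9200), retry_on_failure: 3, retry on HTTP 503.
client = Elasticsearch::Client.new(
  hosts: ["foobar-us-east-1", "bazbat-us-east-2"],
  retry_on_failure: 3,
  retry_on_status: [503],
  logger: Logger.new($stdout)
)

# client.search   # requests /_search, as in the log excerpts
```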

Expected behavior, which is also the current behavior for TransportErrors:

{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 1 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to bazbat-us-east-2:9200 (getaddrinfo: Name or service not known) {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 2 connecting to {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 3 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to bazbat-us-east-2:9200 (getaddrinfo: Name or service not known) {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 4 connecting to {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 5 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to bazbat-us-east-2:9200 (getaddrinfo: Name or service not known) {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 6 connecting to {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
{"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 7 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
{"level":"FATAL","msg":"[Faraday::ConnectionFailed] Cannot connect to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"} after 7 tries"}

Note how the attempts bounce back and forth between foobar-us-east-1 and bazbat-us-east-2.

Current behavior of retry_on_status errors:

{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 1 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 2 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 3 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 4 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 5 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 6 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 7 to get response from http://foobar-us-east-1:9200/_search"}
{"level":"FATAL","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Cannot get response from http://foobar-us-east-1:9200/_search after 7 tries"}

Note how attempts are only made to foobar-us-east-1, and that there are retry_on_failure * number of hosts + 1 of them (3 * 2 + 1 = 7).
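To check the host pattern mechanically rather than by eye, a small script can pull the target host out of each "Attempt N" WARN line. `attempted_hosts` is a hypothetical helper; it relies only on Ruby's standard `json` library and on the two message shapes in the excerpts above, a few lines of which are copied here.

```ruby
require "json"

# Returns the host targeted by each "Attempt N" WARN entry in a JSON-lines log.
def attempted_hosts(log_text)
  hosts = log_text.each_line.map do |line|
    next if line.strip.empty?
    entry = JSON.parse(line)
    msg = entry["msg"].to_s
    next unless entry["level"] == "WARN" && msg.include?("Attempt")
    # Transport errors log a host hash; retry_on_status errors log a URL.
    msg[/:host=>"([^"]+)"/, 1] || msg[%r{https?://([^:/]+)}, 1]
  end
  hosts.compact
end

transport_log = <<~'LOG'
  {"level":"ERROR","msg":"[Faraday::ConnectionFailed] Failed to open TCP connection to foobar-us-east-1:9200 (getaddrinfo: Name or service not known) {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
  {"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 1 connecting to {:host=>\"foobar-us-east-1\", :port=>9200, :protocol=>\"http\"}"}
  {"level":"WARN","msg":"[Faraday::ConnectionFailed] Attempt 2 connecting to {:host=>\"bazbat-us-east-2\", :port=>9200, :protocol=>\"http\"}"}
LOG

status_log = <<~'LOG'
  {"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 1 to get response from http://foobar-us-east-1:9200/_search"}
  {"level":"WARN","msg":"[Elasticsearch::Transport::Transport::Errors::ServiceUnavailable] Attempt 2 to get response from http://foobar-us-east-1:9200/_search"}
LOG

p attempted_hosts(transport_log)  # ["foobar-us-east-1", "bazbat-us-east-2"]
p attempted_hosts(status_log)     # ["foobar-us-east-1", "foobar-us-east-1"]
```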

@HuntsmanX

@jackellenberger Do you have any updates on this? I have the same problem.
